Monitor the state of BGP peering sessions in a Transit Gateway Connect peer using CloudWatch
This article describes how to use Amazon CloudWatch and serverless compute with AWS Lambda to monitor the state of the BGP sessions in a Transit Gateway Connect peer.
AWS customers use AWS Transit Gateway Connect to connect their SD-WAN infrastructure to AWS without having to set up IPsec VPNs between SD-WAN network virtual appliances and Transit Gateway. However, unlike AWS Site-to-Site VPN, Transit Gateway Connect does not publish state metrics to CloudWatch.
This article demonstrates how to monitor AWS Transit Gateway Connect peers using custom CloudWatch metrics and Lambda. You can then use these metrics with CloudWatch alarms to be notified whenever a BGP peer goes down, or with CloudWatch dashboards to create customized views of the metrics and alarms.
Solution Overview
After you create the Transit Gateway Connect peer (see the AWS Transit Gateway documentation for detailed steps), deploy the solution using the following steps.
Step 1: Determine the status of the connect peers programmatically using the DescribeTransitGatewayConnectPeers API
This solution uses the AWS SDK for Python (Boto3). You can use any other SDK that is supported by AWS Lambda. Detailed information on how to call DescribeTransitGatewayConnectPeers with Boto3 can be found in the Boto3 documentation. There are two considerations here:
- Each connect peer has two BGP peerings, and the status of each is monitored individually.
- The API reports each peering status as either up or down. Similar to AWS VPN tunnel states, we will use 1 and 0 respectively as metric data point values.
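The two considerations above can be sketched as a small helper. This is only an illustration under stated assumptions, not the Lambda code used later in this article: the function names `bgp_states` and `collect_bgp_states` are hypothetical, and the sketch assumes the documented response shape (a `BgpStatus` per entry in `BgpConfigurations`, plus an optional `NextToken` when more pages exist). Treating anything other than `up` as 0 is a design choice made here.

```python
def bgp_states(connect_peer):
    """Return one metric value per BGP peering: 1 for 'up', 0 otherwise."""
    configs = connect_peer.get("ConnectPeerConfiguration", {}).get("BgpConfigurations", [])
    return [1 if cfg.get("BgpStatus") == "up" else 0 for cfg in configs]


def collect_bgp_states(ec2_client):
    """Map each connect peer ID to its BGP metric values, following NextToken."""
    states = {}
    kwargs = {}
    while True:
        page = ec2_client.describe_transit_gateway_connect_peers(**kwargs)
        for peer in page.get("TransitGatewayConnectPeers", []):
            states[peer["TransitGatewayConnectPeerId"]] = bgp_states(peer)
        token = page.get("NextToken")
        if not token:
            return states
        # Ask for the next page on the following iteration.
        kwargs = {"NextToken": token}


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(collect_bgp_states(boto3.client("ec2")))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before it runs against a real account.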
Step 2: Publish metric data for each BGP peer to CloudWatch using the PutMetricData API
This API publishes metric data points to Amazon CloudWatch which then associates the data points with the specified metric. If the specified metric does not exist, CloudWatch creates the metric. The syntax and request parameters used are as described below:
put_metric_response = cw_client.put_metric_data(
    MetricData = [
        {
            'MetricName': connect_peer_Id + '_' + 'bgp_peer_1',
            'Dimensions': [
                {
                    'Name': 'Connect_attachment_ID',
                    'Value': connect_attachment_Id
                }
            ],
            'Unit': 'None',
            'Value': bgp_peer_1_metric_value
        },
    ],
    Namespace = 'TgwConnect'
)
- Creates a custom namespace called TgwConnect
- Since we have two BGP peerings in a connect peer, each session has a corresponding metric whose name is a concatenation of the connect peer ID and either "bgp_peer_1" or "bgp_peer_2".
- A dimension (key/value pair) with the connect attachment ID is added as a way to uniquely identify the metrics
- The metric value is either 1 or 0 to represent UP or DOWN respectively.
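Because `MetricData` is a list, both BGP peerings of a connect peer could also be sent in one request. The sketch below is a hedged variation on the call above, not the approach used in the Lambda function in this article; `build_metric_data` and `publish` are hypothetical helper names, and the IDs in the usage comment are the ones from the test described under Results.

```python
def build_metric_data(connect_attachment_id, connect_peer_id, bgp_values):
    """Build one MetricData entry per BGP peering, named <peer ID>_bgp_peer_<n>."""
    return [
        {
            "MetricName": f"{connect_peer_id}_bgp_peer_{index}",
            "Dimensions": [
                {"Name": "Connect_attachment_ID", "Value": connect_attachment_id}
            ],
            "Unit": "None",
            "Value": value,  # 1 for up, 0 for down
        }
        for index, value in enumerate(bgp_values, start=1)
    ]


def publish(cw_client, connect_attachment_id, connect_peer_id, bgp_values):
    """Send all data points for one connect peer in a single PutMetricData request."""
    cw_client.put_metric_data(
        Namespace="TgwConnect",
        MetricData=build_metric_data(connect_attachment_id, connect_peer_id, bgp_values),
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   publish(boto3.client("cloudwatch"), "tgw-attach-0b7587d18b61b4052",
#           "tgw-connect-peer-04918cb17e5f005a0", [1, 0])
```

Batching halves the number of PutMetricData requests per connect peer while producing the same metric names, dimension, and values.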
Step 3: Package the operations into a Lambda function using the steps below:
- Open the AWS Lambda console at https://console.aws.amazon.com/lambda/
- Choose Create function.
- Choose Author from scratch.
- Enter a name and description for the Lambda function. For example, name the function tgw_connect_peer_status_checker.
- For Runtime, choose Python 3.12 (or the runtime that matches your SDK).
- Select an execution role that allows the Lambda function to perform the two operations above and to upload logs to Amazon CloudWatch Logs.
- Leave the rest of the options as the defaults and choose Create function.
- On the Code tab of the function page, double-click lambda_function.py.
- Replace the existing code with the following code.
import logging

import boto3
from botocore.exceptions import ClientError

ec2_client = boto3.client('ec2')
cw_client = boto3.client('cloudwatch')


def get_connect_peer_status():
    # Collect [attachment ID, connect peer ID, BGP status 1, BGP status 2] for each connect peer.
    # Note: a single call returns only the first page of results; follow NextToken
    # if the account has more connect peers than fit in one page.
    connect_peer_details = []
    try:
        describe_connect_peer_response = ec2_client.describe_transit_gateway_connect_peers()
        if 'TransitGatewayConnectPeers' in describe_connect_peer_response:
            for connect_peer in describe_connect_peer_response['TransitGatewayConnectPeers']:
                each_connect_peer_details = []
                each_connect_peer_details.append(connect_peer["TransitGatewayAttachmentId"])
                each_connect_peer_details.append(connect_peer["TransitGatewayConnectPeerId"])
                each_connect_peer_details.append(connect_peer["ConnectPeerConfiguration"]["BgpConfigurations"][0]["BgpStatus"])
                each_connect_peer_details.append(connect_peer["ConnectPeerConfiguration"]["BgpConfigurations"][1]["BgpStatus"])
                connect_peer_details.append(each_connect_peer_details)
    except ClientError as e:
        logging.warning(f"Unable to check connect peer status - {e}")
    # Always return a list (empty on error) so the caller can iterate safely.
    return connect_peer_details


def put_metric_data(connect_peer_details):
    try:
        for connect_peer in connect_peer_details:
            connect_attachment_Id = connect_peer[0]
            connect_peer_Id = connect_peer[1]
            # Map the BGP status to a metric value: 0 for down, 1 otherwise.
            if connect_peer[2] == 'down':
                bgp_peer_1_metric_value = 0
            else:
                bgp_peer_1_metric_value = 1
            if connect_peer[3] == 'down':
                bgp_peer_2_metric_value = 0
            else:
                bgp_peer_2_metric_value = 1
            # BGP peer 1 state
            put_metric_response = cw_client.put_metric_data(
                MetricData = [
                    {
                        'MetricName': connect_peer_Id + '_' + 'bgp_peer_1',
                        'Dimensions': [
                            {
                                'Name': 'Connect_attachment_ID',
                                'Value': connect_attachment_Id
                            }
                        ],
                        'Unit': 'None',
                        'Value': bgp_peer_1_metric_value
                    },
                ],
                Namespace = 'TgwConnect'
            )
            # BGP peer 2 state
            put_metric_response = cw_client.put_metric_data(
                MetricData = [
                    {
                        'MetricName': connect_peer_Id + '_' + 'bgp_peer_2',
                        'Dimensions': [
                            {
                                'Name': 'Connect_attachment_ID',
                                'Value': connect_attachment_Id
                            }
                        ],
                        'Unit': 'None',
                        'Value': bgp_peer_2_metric_value
                    },
                ],
                Namespace = 'TgwConnect'
            )
    except ClientError as e:
        logging.warning(f"Unable to create or put metric data to cloudwatch - {e}")


def lambda_handler(event, context):
    connect_peer_details = get_connect_peer_status()
    put_metric_data(connect_peer_details)
- Choose Deploy.
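For the execution role mentioned in the steps above, a least-privilege policy might look like the following sketch. It is an assumption rather than a policy taken from this article: the helper name `execution_policy` is hypothetical, the `cloudwatch:namespace` condition is an optional tightening, and the broad CloudWatch Logs resource ARN should be narrowed for your account. Verify the actions against the IAM documentation before using it.

```python
import json


def execution_policy(namespace="TgwConnect"):
    """Return an IAM policy document (as a dict) for the Lambda execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the connect peers and their BGP status.
                "Effect": "Allow",
                "Action": "ec2:DescribeTransitGatewayConnectPeers",
                "Resource": "*",
            },
            {
                # Publish data points, restricted to the custom namespace.
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
                "Condition": {"StringEquals": {"cloudwatch:namespace": namespace}},
            },
            {
                # Let the function write its own logs (ARN is deliberately broad here).
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


# Print the JSON so it can be pasted into the IAM console.
print(json.dumps(execution_policy(), indent=2))
```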
Step 4: Create an Amazon EventBridge (formerly CloudWatch Events) rule that runs on a schedule and has the Lambda function as its target. The solution detailed here triggers the Lambda function every five minutes. Detailed steps on how to create a rule with Lambda as a target are in the EventBridge documentation linked under Related Items.
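If you prefer to create the schedule programmatically instead of in the console, the same three actions (create the rule, allow EventBridge to invoke the function, attach the function as a target) might be sketched as below. The function name `schedule_lambda`, the rule name, and the statement ID are hypothetical; the clients are passed in so the sketch can be tried with stubs.

```python
def schedule_lambda(events_client, lambda_client, function_name, function_arn,
                    rule_name="tgw-connect-peer-status-every-5-min"):
    """Create a 5-minute EventBridge schedule that invokes the Lambda function."""
    # 1. Create (or update) the scheduled rule.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(5 minutes)",
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow EventBridge to invoke the function from this rule.
    #    Note: re-running with the same StatementId raises a conflict error.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Attach the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn


# Usage (assumes boto3 is installed and AWS credentials are configured;
# <function-arn> is a placeholder for your function's ARN):
#   import boto3
#   schedule_lambda(boto3.client("events"), boto3.client("lambda"),
#                   "tgw_connect_peer_status_checker", "<function-arn>")
```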
Results
You should start seeing the status of each BGP peering session from each connect peer published to CloudWatch metrics under the TgwConnect custom namespace (All > TgwConnect > Connect_attachment_ID).
In my test, I have:
- 2 Connect attachments: tgw-attach-0b7587d18b61b4052 & tgw-attach-0b36865156ef84716 that use different VPC transport attachments
- Connect attachment tgw-attach-0b7587d18b61b4052 has two connect peers: tgw-connect-peer-04918cb17e5f005a0 (one BGP peering session UP) & tgw-connect-peer-09c3592e4c0992570 (both BGP peering sessions DOWN) for a total of 4 BGP peerings
- Connect attachment tgw-attach-0b36865156ef84716 has one connect peer tgw-connect-peer-0f96b2ef90e58f2d0 which has both BGP peering sessions DOWN
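With the metrics in place, a CloudWatch alarm on one of them could be sketched as follows. This is an illustrative assumption, not a configuration from this article: the helper name `alarm_kwargs`, the alarm name, and the choice to treat missing data as breaching are all hypothetical, and the SNS topic ARN must be one you have created. The period of 300 seconds matches the five-minute schedule.

```python
def alarm_kwargs(connect_attachment_id, connect_peer_id, peer_index, sns_topic_arn):
    """Build put_metric_alarm arguments that fire when a BGP peering reports 0."""
    metric_name = f"{connect_peer_id}_bgp_peer_{peer_index}"
    return {
        "AlarmName": f"{metric_name}-down",
        "Namespace": "TgwConnect",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "Connect_attachment_ID", "Value": connect_attachment_id}
        ],
        "Statistic": "Minimum",
        "Period": 300,                      # one data point every five minutes
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",  # 0 (down) is below 1
        "TreatMissingData": "breaching",    # assumption: no data also means trouble
        "AlarmActions": [sns_topic_arn],
    }


# Usage (assumes boto3 is installed and AWS credentials are configured;
# <sns-topic-arn> is a placeholder for your notification topic):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs(
#       "tgw-attach-0b7587d18b61b4052", "tgw-connect-peer-04918cb17e5f005a0",
#       1, "<sns-topic-arn>"))
```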
Pricing Considerations
In addition to the normal Transit Gateway Connect pricing, factor in AWS Lambda costs and CloudWatch custom metric costs when deploying this solution.
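To estimate those costs, it helps to know the volumes involved. The sketch below only counts invocations, metrics, and API calls for the five-minute schedule used in this article; it deliberately contains no prices, which you should take from the current AWS pricing pages. The function name `monthly_volume` and the 30-day month are assumptions.

```python
def monthly_volume(connect_peers, interval_minutes=5, days=30):
    """Count what the solution generates per month for a number of connect peers."""
    runs = (24 * 60 // interval_minutes) * days   # Lambda invocations
    metrics = 2 * connect_peers                   # one metric per BGP peering
    return {
        "lambda_invocations": runs,
        "custom_metrics": metrics,
        # The Lambda function above makes one PutMetricData call per metric per run.
        "put_metric_data_calls": runs * metrics,
    }


# The test in this article has three connect peers.
print(monthly_volume(3))
# {'lambda_invocations': 8640, 'custom_metrics': 6, 'put_metric_data_calls': 51840}
```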
Related Items
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-run-lambda-schedule.html