This article provides troubleshooting steps for connectivity issues between an AWS VPC and an on-premises network over an AWS Site-to-Site VPN connection attached to a transit gateway.
Architecture:
If you are experiencing connectivity issues between an AWS VPC and your on-premises network, it’s important to understand the following:
- What is the Traffic Flow?
- What is the Source IP address and Destination IP address?
- How are you trying to connect and what is the type of traffic? (Protocol and Port)
Once you have gathered this information, work through the steps below:
Step 1: Check that the security groups of the source instance and the network access control lists (NACLs) of its subnet allow the traffic between source and destination.
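Because network ACLs are stateless and evaluated in rule-number order, it can help to reason about them explicitly. The following is a minimal sketch, not an AWS API: the rule dictionaries (`rule_number`, `cidr`, `protocol`, `port_range`, `action`) and the example addresses are hypothetical simplifications of real NACL entries.

```python
import ipaddress

def nacl_allows(rules, protocol, port, remote_ip):
    """Return True if the first matching rule (lowest rule number) allows the traffic."""
    ip = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r["rule_number"]):
        in_cidr = ip in ipaddress.ip_network(rule["cidr"])
        proto_ok = rule["protocol"] in ("all", protocol)
        low, high = rule.get("port_range", (0, 65535))
        if in_cidr and proto_ok and low <= port <= high:
            return rule["action"] == "allow"
    return False  # nothing matched: the implicit final deny applies

# Hypothetical rules for an instance that opens HTTPS sessions to 10.20.0.0/16.
outbound_rules = [
    {"rule_number": 100, "cidr": "10.20.0.0/16", "protocol": "tcp",
     "port_range": (443, 443), "action": "allow"},
]
# Return traffic arrives on ephemeral ports, so inbound must allow those too.
inbound_rules = [
    {"rule_number": 90, "cidr": "10.20.0.0/16", "protocol": "tcp",
     "port_range": (0, 1023), "action": "deny"},
    {"rule_number": 100, "cidr": "10.20.0.0/16", "protocol": "tcp",
     "port_range": (1024, 65535), "action": "allow"},
]

print(nacl_allows(outbound_rules, "tcp", 443, "10.20.0.5"))
print(nacl_allows(inbound_rules, "tcp", 50000, "10.20.0.5"))
```

Security groups are stateful, so only the initiating direction needs a rule there; the NACL needs both directions.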
Step 2: Check that the route table of the source instance's subnet has a route to the destination via the transit gateway [x.x.x.x/x → tgw-abcdef].
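A route table picks the most specific matching route, so a broader route to another target does not override a narrower route to the transit gateway. The sketch below illustrates that lookup. The route list is hypothetical (`igw-EXAMPLE` and the CIDRs are placeholders) and the dictionary shape is a simplification, not the EC2 API response.

```python
import ipaddress

def best_route(routes, destination_ip):
    """Return the most specific route covering destination_ip, or None."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [r for r in routes if ip in ipaddress.ip_network(r["cidr"])]
    if not matches:
        return None
    # Longest prefix wins.
    return max(matches, key=lambda r: ipaddress.ip_network(r["cidr"]).prefixlen)

# Hypothetical subnet route table.
routes = [
    {"cidr": "10.0.0.0/16", "target": "local"},
    {"cidr": "192.168.0.0/16", "target": "tgw-abcdef"},
    {"cidr": "0.0.0.0/0", "target": "igw-EXAMPLE"},
]

route = best_route(routes, "192.168.10.4")
print(route["target"] if route else "no route")
```

If the result for the on-premises address is anything other than the transit gateway, the traffic never reaches the VPN path.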
Step 3: Check that the transit gateway VPC attachment is enabled in the same Availability Zone as the source instance.
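As a rough illustration of this check, the sketch below compares the Availability Zone of the instance with the zones of the subnets selected for the attachment. The function name, subnet IDs, and the `subnet_az_by_id` mapping are hypothetical; in practice you would collect these values from the console or the EC2 API.

```python
def attachment_covers_instance(instance_az, attachment_subnet_ids, subnet_az_by_id):
    """Return (covered, enabled_azs) for a transit gateway VPC attachment."""
    enabled_azs = {subnet_az_by_id[subnet_id] for subnet_id in attachment_subnet_ids}
    return instance_az in enabled_azs, sorted(enabled_azs)

# Hypothetical VPC with subnets in two zones; the attachment uses only one of them.
subnet_az_by_id = {
    "subnet-aaa": "us-east-1a",
    "subnet-bbb": "us-east-1b",
}
covered, enabled = attachment_covers_instance(
    "us-east-1b", ["subnet-aaa"], subnet_az_by_id
)
print(covered, enabled)
```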
Step 4: Also check the network ACL associated with the subnet that hosts the transit gateway elastic network interface (ENI). If that network ACL is restrictive, it must allow the traffic in both directions.
As a transit gateway design best practice, we recommend using a separate subnet for transit gateway attachments: [1]
Use a separate subnet for each transit gateway VPC attachment. For each subnet, use a small CIDR, for example /28, so that you have more addresses for EC2 resources. When you use a separate subnet, you can configure the following:
- Keep the inbound and outbound network ACLs associated with the transit gateway subnets open.
- Depending on your traffic flow, you can apply network ACLs to your workload subnets.
Step 5: Check that the transit gateway route table associated with the VPC attachment has a correct route pointing toward the VPN attachment.
- Static AWS Site-to-Site VPN: Ensure that a valid static route exists and that its state is not blackhole. A blackhole route can mean that the VPN connection is down.
- Dynamic AWS Site-to-Site VPN: Ensure that the correct routes are being propagated into the transit gateway route table. If the expected routes are missing, check the routes advertised by the on-premises customer gateway device.
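The route checks in this step can be sketched as a small classifier. It assumes the routes were already fetched (for example with the EC2 `search_transit_gateway_routes` call) and are passed in as a list of dictionaries using that response's `DestinationCidrBlock`, `State`, and `TransitGatewayAttachments` keys. The attachment IDs and CIDRs below are placeholders.

```python
import ipaddress

def diagnose_tgw_route(routes, destination_ip, vpn_attachment_id):
    """Classify the transit gateway route that would carry traffic to destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [r for r in routes
               if ip in ipaddress.ip_network(r["DestinationCidrBlock"])]
    if not matches:
        return "no-route"
    best = max(matches,
               key=lambda r: ipaddress.ip_network(r["DestinationCidrBlock"]).prefixlen)
    if best["State"] == "blackhole":
        return "blackhole"
    attachment_ids = [a["TransitGatewayAttachmentId"]
                      for a in best.get("TransitGatewayAttachments", [])]
    return "ok" if vpn_attachment_id in attachment_ids else "wrong-attachment"

# Hypothetical route table contents.
tgw_routes = [
    {"DestinationCidrBlock": "192.168.0.0/16", "State": "active", "Type": "propagated",
     "TransitGatewayAttachments": [{"TransitGatewayAttachmentId": "tgw-attach-vpn"}]},
    {"DestinationCidrBlock": "172.16.0.0/12", "State": "blackhole", "Type": "static"},
]

print(diagnose_tgw_route(tgw_routes, "192.168.1.20", "tgw-attach-vpn"))
print(diagnose_tgw_route(tgw_routes, "172.16.4.9", "tgw-attach-vpn"))
```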
Step 6: On the on-premises customer gateway device, verify that the IPsec traffic selectors are configured correctly.
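One way to sanity-check traffic selectors is to confirm that both endpoints of the failing flow fall inside the configured local and remote networks. This is a minimal sketch; the function name and the example CIDRs and addresses are assumptions for illustration, and the actual selector values come from your customer gateway configuration.

```python
import ipaddress

def selectors_cover(onprem_selector, aws_selector, onprem_ip, vpc_ip):
    """Return True if both endpoints fall inside the IPsec traffic selectors."""
    onprem_ok = ipaddress.ip_address(onprem_ip) in ipaddress.ip_network(onprem_selector)
    aws_ok = ipaddress.ip_address(vpc_ip) in ipaddress.ip_network(aws_selector)
    return onprem_ok and aws_ok

# Hypothetical selectors: on-premises 192.168.0.0/16, AWS side 10.0.0.0/16.
print(selectors_cover("192.168.0.0/16", "10.0.0.0/16", "192.168.1.10", "10.0.5.20"))
print(selectors_cover("192.168.0.0/16", "10.0.0.0/16", "192.168.1.10", "10.1.5.20"))
```

Traffic whose addresses fall outside the selectors is not encrypted into the tunnel, which shows up as a one-sided or silent failure.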
Step 7: Check the CloudWatch metrics for the VPN attachment and the VPN tunnel (data in and data out) to confirm that traffic is actually flowing.
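For the tunnel metrics, a query can be prepared as sketched below. The helper only builds the keyword arguments for the CloudWatch `get_metric_statistics` call and inspects returned datapoints. It does not contact AWS, and `vpn-EXAMPLE` is a placeholder ID. The sketch assumes the `AWS/VPN` namespace with the `TunnelDataIn` and `TunnelDataOut` metrics and the `VpnId` dimension.

```python
from datetime import datetime, timedelta, timezone

def tunnel_metric_query(vpn_id, metric_name, minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/VPN",
        "MetricName": metric_name,  # e.g. "TunnelDataIn" or "TunnelDataOut"
        "Dimensions": [{"Name": "VpnId", "Value": vpn_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,
        "Statistics": ["Sum"],
    }

def has_traffic(datapoints):
    """Return True if any datapoint reports a non-zero byte count."""
    return any(dp.get("Sum", 0) > 0 for dp in datapoints)

query = tunnel_metric_query("vpn-EXAMPLE", "TunnelDataIn", minutes=30,
                            now=datetime(2024, 1, 1, tzinfo=timezone.utc))
# With boto3 installed and credentials configured, the call would look like:
#   response = boto3.client("cloudwatch").get_metric_statistics(**query)
#   print(has_traffic(response["Datapoints"]))
print(query["MetricName"], query["StartTime"], query["EndTime"])
```

Data out without data in (or the reverse) usually points at routing or selectors on one side rather than at the tunnel itself.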
Step 8: Check that the transit gateway route table associated with the VPN attachment has a route for this traffic with the VPC attachment as the next hop.
Step 9: If it is an accelerated VPN connection, ensure that NAT traversal (NAT-T) is enabled.
Step 10: If everything looks correct and you still observe connectivity issues, enable VPC flow logs on the transit gateway ENIs and on the instance ENI [2], using the custom log format ${action} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} [3], and analyze the traffic pattern. In the flow logs for the transit gateway ENI, confirm that the traffic is seen and that it is not being rejected (action REJECT).
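Records written with the custom format from this step are space-separated in the order of the format string, so they are straightforward to filter. The sketch below parses such lines and returns the rejected records between two endpoints. The sample lines and addresses are made up for illustration, not real log output.

```python
FIELDS = ["action", "srcaddr", "dstaddr", "srcport", "dstport",
          "pkt-srcaddr", "pkt-dstaddr"]

def parse_flow_record(line):
    """Split one flow log line into a dict keyed by the custom format's fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

def rejected_between(lines, ip_a, ip_b):
    """Return REJECT records whose original packet endpoints are ip_a and ip_b."""
    rejected = []
    for line in lines:
        record = parse_flow_record(line)
        endpoints = {record["pkt-srcaddr"], record["pkt-dstaddr"]}
        if record["action"] == "REJECT" and endpoints == {ip_a, ip_b}:
            rejected.append(record)
    return rejected

# Illustrative records: the request is accepted, the reply is rejected.
sample = [
    "ACCEPT 10.0.1.10 192.168.1.20 49152 443 10.0.1.10 192.168.1.20",
    "REJECT 192.168.1.20 10.0.1.10 443 49152 192.168.1.20 10.0.1.10",
]

for record in rejected_between(sample, "10.0.1.10", "192.168.1.20"):
    print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
```

A pattern like the sample, where only the reply direction is rejected, typically points at a stateless network ACL missing the ephemeral port range.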
Step 11: Enable transit gateway flow logs on the transit gateway and check whether the flow appears on the respective attachments. Also check the 'packets-lost-blackhole', 'packets-lost-no-route', and 'packets-lost-mtu-exceeded' fields for dropped packets.
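Once the transit gateway flow log records are parsed into dictionaries, the loss counters named above can be totaled to see which kind of drop dominates. This is a sketch under the assumption that each record is already a dict keyed by field name; the sample records are invented, and non-numeric values (such as a placeholder for missing data) are skipped.

```python
LOSS_FIELDS = ("packets-lost-no-route", "packets-lost-blackhole",
               "packets-lost-mtu-exceeded")

def loss_summary(records):
    """Total the packet-loss counters and return only the non-zero ones."""
    totals = {field: 0 for field in LOSS_FIELDS}
    for record in records:
        for field in LOSS_FIELDS:
            value = str(record.get(field, ""))
            if value.isdigit():  # skip missing or non-numeric values
                totals[field] += int(value)
    return {field: count for field, count in totals.items() if count > 0}

# Invented records for illustration.
tgw_records = [
    {"packets-lost-no-route": "0", "packets-lost-blackhole": "12",
     "packets-lost-mtu-exceeded": "0"},
    {"packets-lost-no-route": "3", "packets-lost-blackhole": "5",
     "packets-lost-mtu-exceeded": "-"},
]

print(loss_summary(tgw_records))
```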
For troubleshooting specific to the AWS Site-to-Site VPN connection itself, see the referenced article on VPN tunnel connectivity. [4]
- [1] Transit gateway design best practices: https://docs.aws.amazon.com/vpc/latest/tgw/tgw-best-design-practices.html
- [2] Publish flow logs to CloudWatch Logs: https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-cwl.html
- [3] Available flow logs fields: https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-logs-fields
- [4] How do I troubleshoot VPN tunnel connectivity to an Amazon VPC?: https://repost.aws/knowledge-center/vpn-tunnel-troubleshooting