Best Practice for VPN Failover with Dual Direct Connect (DX) + DX Gateway + Multi-Region TGW

We currently have two AWS Direct Connect (DX) connections:

  1. One DX in us-east-1
  2. One DX in us-east-2

Both DX connections use Transit VIFs and are attached to a single Direct Connect Gateway (DXGW). The DX Gateway is associated with two Transit Gateways (TGW):

  • TGW in us-east-1
  • TGW in us-east-2

Our on-premises network exchanges routes with AWS via BGP over DX, and routes are learned and propagated into the TGW route tables via the DXGW attachments.

Our business requirement is to add a site-to-site VPN as a failover path in case both DX connections become unavailable.

We are primarily focused on us-east-2, as most production workloads are hosted there.

Current State

  • On-prem ↔ DX ↔ DX Gateway ↔ TGW (us-east-1 and us-east-2)
  • TGW route tables receive and propagate routes via the DXGW attachments
  • DX is the primary path for on-prem connectivity

Proposed Approach (VPN Failover)

My initial thought is:

  1. Create a TGW VPN attachment in us-east-2
  2. Establish BGP over the VPN to advertise on-prem routes
  3. Associate and propagate the VPN attachment into the same TGW route table that currently contains the DXGW attachment

Questions

  1. Is this the recommended best practice for providing VPN failover for Direct Connect in a multi-region TGW + DXGW architecture?
  2. Will there be any route conflicts or unexpected behavior when the same prefixes are learned via:
     • DXGW attachment (primary)
     • TGW VPN attachment (backup)
  3. What is the recommended way to ensure DX is always preferred, and VPN is only used during DX failure (for example, using BGP attributes such as AS path prepending, local preference, or MED)?
  4. Are there any architecture patterns or AWS reference designs for this specific use case (dual DX + DXGW + multi-region TGW + VPN failover)?

We would appreciate guidance on the best-practice architecture and routing design to implement clean DX primary + VPN secondary failover, especially with focus on us-east-2 as the primary production region.

Thank you in advance for any recommendations.

2 Answers
Accepted Answer

Your approach is correct. Please also consider the following:

  1. For traffic sent from AWS to your on-premises network, DX is preferred over VPN only if the prefixes received over DX are the same as, or more specific than, those received over VPN. BGP attributes (AS path prepending, MED) do not influence this decision. They only come into play when the routing decision is between two attachments of the same type, i.e., you cannot use them to prefer DX over VPN or vice versa. When the prefixes are the same and both are received via BGP, AWS always prefers DX over VPN.
  2. For traffic you send from on-premises to AWS, you can use the BGP local preference attribute on your own routers to prefer DX when the prefixes advertised from AWS are the same or more specific on the VPN connection.
  3. Size the VPN connections so that a failover does not cause bandwidth contention and possible packet drops. For high-bandwidth requirements, AWS now supports VPN connections with up to 5 Gbps per tunnel.
  4. Ensure that your routing configuration does not cause asymmetric routing where this is not desired.
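The preference rules in point 1 can be sketched as a small model. This is an illustrative simplification and not AWS's implementation: `Route`, `select_route`, and `ATTACHMENT_PRIORITY` are hypothetical names, only the two attachment types discussed in this thread are modeled, and AS path length is used only as a tie-breaker between routes of the same attachment type.

```python
import ipaddress
from dataclasses import dataclass

# Lower value = more preferred. Assumption: only DXGW and VPN attachments exist.
ATTACHMENT_PRIORITY = {"dxgw": 0, "vpn": 1}


@dataclass(frozen=True)
class Route:
    prefix: str          # CIDR learned via BGP, e.g. "10.0.0.0/16"
    attachment: str      # "dxgw" or "vpn"
    as_path_len: int = 1


def select_route(routes, destination):
    """Return the route this simplified model would use, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,  # 1. longest prefix wins
            ATTACHMENT_PRIORITY[r.attachment],          # 2. DXGW before VPN
            r.as_path_len,                              # 3. same-type tie-breaker only
        ),
    )
```

In this model, a `10.0.0.0/16` route over DX beats the same prefix over VPN even if the VPN route has a shorter AS path, while a `10.0.1.0/24` route over VPN would attract traffic for `10.0.1.5` despite the DX route being present. That second case is why the prefixes on DX need to be the same or more specific.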
AWS
EXPERT
answered a month ago
AWS
EXPERT
reviewed a month ago

Your proposed approach for VPN failover with your dual Direct Connect setup is on the right track, but I can provide some additional guidance to ensure it works as expected.

For your specific architecture with dual DX connections, DX Gateway, and multi-region Transit Gateways, AWS recommends using Site-to-Site VPN terminating on an AWS Transit Gateway as a backup for mission-critical workloads. This is exactly what you're proposing.

To implement this properly:

  1. Create a Site-to-Site VPN attachment to your Transit Gateway in us-east-2 (since that's your primary production region)
  2. Configure two VPN tunnels as part of this connection for high availability, with each tunnel terminating in a different Availability Zone
  3. Use BGP for dynamic routing over the VPN connection

For route preference to ensure DX is always the primary path:

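  1. On the Transit Gateway, routes propagated from the Direct Connect gateway attachment are preferred over routes propagated from the VPN attachment when the prefixes are identical. This comes from the TGW route evaluation order by attachment type, not from AS path length
  2. BGP attributes influence other parts of the path:
  • AS path prepending and MED only affect the choice between attachments of the same type on the AWS side, so they cannot be used to prefer DX over VPN
  • Setting a higher local preference for DX-learned routes on your on-premises routers steers traffic from on-premises to AWS over DX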
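For the on-premises side, the effect of local preference on a single prefix learned over both sessions can be sketched as follows. This is a simplified model of the first steps of BGP best-path selection (highest local preference, then shortest AS path, then lowest MED), not a full implementation; `BgpPath` and `best_path` are hypothetical names, and the values 200 and 100 are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPath:
    via: str               # label for the session the path was learned on
    local_pref: int = 100  # set by on-premises router policy; higher wins
    as_path_len: int = 1   # shorter wins when local_pref ties
    med: int = 0           # lower wins when the above tie


def best_path(paths):
    """Pick the preferred path for one prefix, or None if no path is available."""
    if not paths:
        return None
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.med))


# Example policy (assumed values): prefer DX by raising its local preference.
dx = BgpPath(via="dx", local_pref=200, as_path_len=2)
vpn = BgpPath(via="vpn", local_pref=100, as_path_len=1)
```

With this policy, `best_path([dx, vpn])` selects the DX path even though its AS path is longer, and `best_path([vpn])` selects the VPN path once the DX route has been withdrawn.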

To avoid route conflicts when the same prefixes are learned via both DXGW and VPN attachments, advertise the same prefixes over both paths, or more specific prefixes over DX. The TGW then prefers the DX route for as long as it is present and falls back to the VPN route once the DX route is withdrawn, so there shouldn't be conflicts.

For additional resilience, consider:

  • Using redundant customer gateway devices for your VPN connection
  • Ensuring your on-premises VPN endpoints have multiple internet connections from different ISPs
  • Configuring Equal-Cost Multi-Path (ECMP) routing across multiple VPN tunnels for increased throughput (up to 50 Gbps in aggregate, though each tunnel is capped at 1.25 Gbps)
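As a rough sizing aid for the ECMP option, the arithmetic can be sketched like this. It assumes the 1.25 Gbps per-tunnel cap quoted above and two tunnels per VPN connection; `tunnels_needed` and `connections_needed` are hypothetical helper names, and the result is an upper bound on aggregate throughput rather than a guarantee for any single flow.

```python
import math

PER_TUNNEL_GBPS = 1.25       # per-tunnel cap quoted in this answer
TUNNELS_PER_CONNECTION = 2   # each Site-to-Site VPN connection has two tunnels


def tunnels_needed(required_gbps, per_tunnel_gbps=PER_TUNNEL_GBPS):
    """Minimum number of ECMP tunnels whose combined cap covers required_gbps."""
    if required_gbps <= 0:
        raise ValueError("required_gbps must be positive")
    return math.ceil(required_gbps / per_tunnel_gbps)


def connections_needed(required_gbps, tunnels_per_connection=TUNNELS_PER_CONNECTION):
    """Minimum number of VPN connections if every tunnel carries traffic."""
    return math.ceil(tunnels_needed(required_gbps) / tunnels_per_connection)
```

For example, covering a 10 Gbps DX connection would take at least `tunnels_needed(10)` tunnels under these assumptions. Plan for headroom beyond that minimum, since ECMP distributes traffic per flow and the load is unlikely to spread perfectly evenly.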

This approach aligns with AWS best practices for providing VPN failover in a multi-region TGW + DXGW architecture, particularly for your us-east-2 production region.
Sources
Redundant AWS Site-to-Site VPN connections for failover - AWS Site-to-Site VPN
AWS Direct Connect | Create resilient hybrid networks | AWS
REL02-BP02 Provision redundant connectivity between private networks in the cloud and on-premises environments - AWS Well-Architected Framework

answered a month ago
EXPERT
reviewed a month ago
