You describe the scenario precisely. What I do not understand is why it works most of the time but sometimes does not. My understanding of Appliance Mode is that the TGW uses a hash algorithm to return the packet to the original ingress VPC TGW ENI, and in some cases that isn't working. Anyway, thank you for the answer.
There's a lot to unpack in this question and it's not 100% clear to me exactly what you've configured, so I'm going to have to guess a little. A diagram would be really helpful here.
What I think you're doing is having ingress traffic from the internet come into a couple of firewalls, each of which has an EIP associated with it. Sessions come into the firewall and the destination address is NATted to "something" (load balancer perhaps) in your services VPC which is connected via Transit Gateway.
Through this you're seeing asymmetric flows but only sometimes.
The problem you're seeing is because the traffic arriving in the services VPC still has a public source IP address (I think - again, I'm guessing here). So the return traffic from the services VPC will be going to a public IP address (the original client's IP address). I'm not sure how you have the routing set up, but what is probably happening is that the return traffic is "landing" in a random AZ and being routed to the firewall in that AZ, which may not be the firewall that the original session came through - hence the asymmetric flows.
The easiest way to fix this is to do source NAT on the sessions coming in through the firewalls so that the network traffic coming from the firewalls going to the services VPC appears to come from the private IP address of the firewall.
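To make the difference concrete, here's a toy Python model of the two cases. It's only a sketch of my guess at your setup: the firewall names, the client addresses and the `pick_return_firewall` helper are all made up for illustration, and the hash is just a stand-in for "the return traffic lands in some AZ" rather than the algorithm Transit Gateway actually uses.

```python
import hashlib
from typing import NamedTuple

# Hypothetical setup: one firewall per AZ, each with its own EIP.
FIREWALLS = ["fw-az-a", "fw-az-b"]


class Flow(NamedTuple):
    client_ip: str      # public IP of the original client
    client_port: int
    ingress_fw: str     # firewall the session originally came in through


def pick_return_firewall(flow: Flow, snat: bool) -> str:
    """Return the firewall that sees the return traffic in this toy model."""
    if snat:
        # With source NAT, the services VPC replies to the firewall's own
        # private IP, so the reply is routed back to that same firewall.
        return flow.ingress_fw
    # Without source NAT, the reply is addressed to the client's public IP.
    # Assumption: the AZ it lands in is unrelated to the ingress firewall;
    # a hash of the client address and port stands in for that choice.
    key = f"{flow.client_ip}:{flow.client_port}".encode()
    digest = hashlib.sha256(key).digest()
    return FIREWALLS[digest[0] % len(FIREWALLS)]


def count_asymmetric(flows: list[Flow], snat: bool) -> int:
    """Count flows whose return path uses a different firewall."""
    return sum(1 for f in flows if pick_return_firewall(f, snat) != f.ingress_fw)


# 100 made-up client sessions, alternating between the two firewalls.
flows = [
    Flow(f"203.0.113.{i}", 40000 + i, FIREWALLS[i % len(FIREWALLS)])
    for i in range(1, 101)
]

print("asymmetric flows without source NAT:", count_asymmetric(flows, snat=False))
print("asymmetric flows with source NAT:   ", count_asymmetric(flows, snat=True))
```

In this model some of the flows without source NAT come back through the other firewall while the rest happen to match, which is the "works most of the time" behaviour you describe. With source NAT every flow returns through the firewall it arrived on.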
Edit: A colleague pointed out that we do document this - in the highlighted section at the top:
Traffic can drop if the centralized VPC receives the traffic from a different gateway — for example, an Internet gateway — and then sends that traffic to the transit gateway attachment after inspection.
Bigger picture though: It sounds like you're trying to build a centralised ingress solution using firewalls. I'm strongly opinionated about this and I think that it isn't a good model because those firewalls become a significant single point of failure - even though you have two, failover and scaling are a challenge. You can hear me talk about this in more detail on the AWS Podcast.