As far as I understand the initial point, both previous answers miss the core requirement of cost-optimizing for thousands of dedicated customer IPs, or they introduce architectural anti-patterns for trading.
Here is why I think the previous suggestions are not the best fit:
- The NAT-Gateway-per-customer approach (AI Response) is commercially unviable: Deploying a dedicated NAT Gateway per customer costs ~$32/month per customer just for idle provisioning ($0.045/hr). For 1,000 customers, that is over $32,000/month in fixed networking costs. Also, "NAT Gateway regional availability mode" is a technical hallucination.
- The Centralized Egress pattern (Riku's Response) breaks multi-tenancy: While centralized egress via Transit Gateway is an official AWS Best Practice for standard enterprise IT, it is designed to pool traffic. Sharing a central NAT Gateway means multiple customers share the same IP pool, defeating your requirement for a dedicated, deterministic per-customer IP. Additionally, Transit Gateway adds an extra hop, introducing additional latency and jitter, which is undesirable for algorithmic trading.
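To make the cost figure above easy to re-run for your own customer count, here is a small sketch of the arithmetic. It assumes the $0.045/hr rate quoted above and an average of 730 hours per month (my assumption), and it ignores data processing charges; the function name is just illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_nat_gateway_cost(customers: int, hourly_rate: float = 0.045) -> float:
    """Fixed monthly cost of one idle NAT Gateway per customer (no data processing)."""
    return customers * hourly_rate * HOURS_PER_MONTH


print(f"per customer:    ${monthly_nat_gateway_cost(1):,.2f}/month")
print(f"1,000 customers: ${monthly_nat_gateway_cost(1000):,.2f}/month")
```

Check the current rate for your region before relying on the result, since the hourly price varies.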
The Correct SaaS Best Practice: Distributed Forward Proxy Fleet
Since your workloads run on EC2, the industry standard for scaling to thousands of dedicated outbound IPs without paying for thousands of NAT Gateways is to abstract the egress layer using a proxy fleet:
- Design: Deploy a highly available, ultra-low-latency EC2 Auto Scaling Group running Envoy or Squid in a central public subnet.
- Mechanism: Associate each customer Elastic IP (EIP) with a secondary private IP on the proxy instances' network interfaces. Configure the proxy to identify the tenant for each incoming connection (via source subnet, custom HTTP headers, or TLS SNI) and send that tenant's outbound traffic from the customer's specific address, using the proxy's outgoing-address setting or Linux policy routing and NAT rules (ip rule / iptables).
- Benefits:
  - Cost Savings: You only pay for a few EC2 proxy instances and the raw EIPs, completely eliminating thousands of hourly NAT Gateway charges.
  - Strict Isolation: Guarantees 1:1 deterministic EIP mapping per customer for broker whitelisting.
  - Low Latency: Eliminates Transit Gateway hops, keeping trading execution tight and predictable.
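As a rough illustration of the mechanism, the sketch below renders Squid configuration that pins each tenant's source subnet to one outgoing address, using Squid's `acl ... src` and `tcp_outgoing_address` directives. It assumes tenants are identified by source subnet and that each outgoing address is a secondary private IP on the proxy with the customer's EIP associated to it. The tenant names, CIDRs and IPs are made up, and this is not a complete proxy configuration.

```python
from ipaddress import ip_address, ip_network


def squid_egress_config(tenants: dict[str, tuple[str, str]]) -> str:
    """Render one ACL plus one tcp_outgoing_address line per tenant.

    `tenants` maps a tenant name to (source CIDR, proxy secondary private IP).
    """
    lines = []
    for name, (source_cidr, egress_ip) in sorted(tenants.items()):
        ip_network(source_cidr)  # raises ValueError on a malformed CIDR
        ip_address(egress_ip)    # raises ValueError on a malformed IP
        lines.append(f"acl {name} src {source_cidr}")
        lines.append(f"tcp_outgoing_address {egress_ip} {name}")
    return "\n".join(lines)


# Hypothetical tenants: names, subnets and private IPs are placeholders.
print(squid_egress_config({
    "tenant_a": ("10.0.1.0/24", "10.0.100.11"),
    "tenant_b": ("10.0.2.0/24", "10.0.100.12"),
}))
```

Generating the mapping from a single source of truth (rather than hand-editing it) also gives you something to diff and audit when a broker asks which IP belongs to which customer.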
Hello.
Does each customer need a dedicated EC2 instance?
If a dedicated EC2 instance is required, I think using a NAT Gateway is the easiest way to obtain a static IP address.
If you need to partition not only the EC2 instances but also the VPC, you can share the NAT Gateway using the configuration described in the following document.
Because NAT Gateway is scalable, I believe it will be able to handle the traffic even if the number of customers increases in the future.
https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-nat-gateway-for-centralized-egress.html
A single NAT gateway can support up to 55,000 simultaneous connections per assigned IP address to each unique destination. You can request a quota adjustment to allow up to eight assigned IP addresses, allowing for 440,000 simultaneous connections to a single destination IP and port. NAT gateway provides 5 Gbps of bandwidth and automatically scales to 100 Gbps. Transit Gateway generally does not act as a load balancer and would not distribute your traffic evenly across NAT gateways in the multiple Availability Zones. The traffic across the Transit Gateway will stay within an Availability Zone, if possible. If the Amazon EC2 instance initiating traffic is in Availability Zone 1, traffic will flow out of the Transit Gateway elastic network interface in the same Availability Zone 1 in the egress VPC and will flow to the next hop based on that subnet route table that elastic network interface resides in. For a complete list of rules, refer to NAT gateways in the Amazon Virtual Private Cloud documentation.
While a forward proxy using EC2 certainly guarantees better performance than a configuration using Transit Gateway and NAT Gateway, it significantly increases operational costs due to the need to manage a large number of Elastic IPs and complex iptables routing configurations. While this may not be a problem in the initial stages when the number of customers is small, the configuration will become more complex as the number of users increases, so it would be best to use AWS managed services if possible. You should choose based on whether you prioritize operational costs or performance.
For your retail algorithmic trading platform requiring dedicated fixed outbound IPs per customer, I recommend a VPC-based architecture using NAT Gateways with Elastic IP addresses, combined with per-customer or per-customer-group VPC isolation.
Recommended Architecture Pattern
Core Design:
- Deploy separate VPCs (or shared VPCs with isolated subnets) for customer workloads
- Use NAT Gateway in each VPC/subnet with a dedicated Elastic IP address
- Route customer egress traffic through their assigned NAT Gateway
- This ensures each customer consistently egresses through their dedicated fixed public IP
Implementation Approach:
- VPC Structure: Create private subnets for customer workloads and public subnets for NAT Gateways
- NAT Gateway Deployment: Deploy NAT Gateways in public subnets, each associated with a unique Elastic IP address
- Routing Configuration: Configure route tables in private subnets to direct outbound traffic (0.0.0.0/0) through the customer's designated NAT Gateway
- Internet Gateway: Attach an Internet Gateway to the VPC for NAT Gateway connectivity
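The implementation steps above could be sketched roughly as follows. This is a minimal sketch, not a production script: it assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), that the public subnet already routes to an Internet Gateway, and that the private route table has no default route yet. The function name and the subnet/route table IDs are placeholders.

```python
def provision_customer_egress(ec2, public_subnet_id: str, private_route_table_id: str) -> dict:
    """Allocate an EIP, create a NAT Gateway on it, and add the default route.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=eip["AllocationId"],
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # Wait so the route does not point at a gateway that is still pending.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return {"public_ip": eip["PublicIp"], "nat_gateway_id": nat_id}
```

The returned `public_ip` is the address a broker would allowlist for that customer. Note that every call of this function adds one more hourly NAT Gateway charge.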
For High Availability: Deploy NAT Gateways across multiple Availability Zones with separate Elastic IPs. The new NAT Gateway in regional availability mode can automatically expand and contract across availability zones, eliminating operational overhead of managing NAT infrastructure in multiple zones while maintaining the fixed IP requirement.
Scaling and Cost Optimization Strategies
Multi-Tenancy Options:
- Shared VPC with isolated subnets: Group customers with similar traffic patterns into shared VPCs, each customer getting dedicated subnets and NAT Gateway
- VPC-per-customer-tier: Segment by customer size/traffic volume rather than pure 1:1 mapping
Cost Controls:
- NAT Gateway pricing is based on hourly charges plus data processing, making costs predictable
- Consolidate low-traffic customers into shared infrastructure where broker APIs allow multiple customer IPs
- Use Amazon VPC IP Address Manager (IPAM) policies for centralized control over IP allocation strategies and automated management at scale
- Consider Bring Your Own IPv4 (BYOIPv4) if you have existing IP blocks, which can be managed through IPAM policies for cost optimization with automated enforcement
Operational Efficiency:
- IPAM with prefix lists integration automates partner allowlisting by tracking contiguous IP blocks and automatically updating managed prefix lists
- Share prefix lists with broker partners via AWS Resource Access Manager for streamlined whitelisting
- Use IPAM policies to map NAT Gateway public IPv4 address allocation to designated pools for centralized management
Alternative Considerations
NAT Instances: Not recommended due to higher operational overhead, single points of failure, and manual scaling requirements compared to managed NAT Gateway service.
Transit Gateway: Useful for VPC interconnection but doesn't directly solve the fixed outbound IP requirement; still requires NAT Gateways for internet egress with fixed IPs.
EKS-based routing: While technically possible using pod-level networking and egress gateways, it adds complexity without addressing the fundamental requirement more efficiently than VPC-level NAT Gateway architecture.
Trading Platform Specific Considerations
For your low-latency trading requirements:
- NAT Gateway provides consistent, predictable performance with automatic scaling
- Deploy NAT Gateways in the same Availability Zone as workloads to minimize latency
- Regional availability mode reduces complexity while maintaining availability
- Monitor NAT Gateway metrics and IPAM pool utilization for capacity planning
This architecture scales from hundreds to thousands of customers while maintaining operational simplicity, cost predictability, and the deterministic fixed IP requirement essential for broker API whitelisting.
Sources
Generate a static outbound IP address using a Lambda function, Amazon VPC, and a serverless architecture - AWS Prescriptive Guidance
Build scalable IPv4 addressing with AWS NAT gateway in regional availability mode, Amazon VPC IPAM policies and Prefix Lists | Networking & Content Delivery
Building on Florian's proxy fleet approach and the cell-based architecture discussed in the comments, two implementation details that might help:
BYOIP cost savings: BYOIP addresses are exempt from the $0.005/hr public IPv4 charge. At 1000 IPs, that's roughly $3,600/month in address fees you avoid on top of eliminating NAT GW costs. The trade-offs: you need /24 blocks (minimum per BYOIP provision) registered with ARIN/RIPE/APNIC, a Route Origin Authorization, and provisioning takes up to 3 weeks. Worth running the numbers for your specific scale. BYOIP documentation has the full requirements.
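If you want to run those numbers, here is a quick sketch. It assumes the $0.005/hr charge and /24 minimum mentioned above, an average of 730 hours per month, and that every address in a provisioned block is usable for customers; those last two are my assumptions, and the function names are just illustrative.

```python
import math

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def public_ipv4_monthly_fee(ip_count: int, hourly_rate: float = 0.005) -> float:
    """Monthly public IPv4 address charge that BYOIP addresses would avoid."""
    return ip_count * hourly_rate * HOURS_PER_MONTH


def byoip_blocks_needed(ip_count: int, block_prefix: int = 24) -> int:
    """Number of blocks of the given prefix length needed to cover ip_count addresses."""
    addresses_per_block = 2 ** (32 - block_prefix)
    return math.ceil(ip_count / addresses_per_block)


print(f"avoided fee for 1,000 IPs: ${public_ipv4_monthly_fee(1000):,.2f}/month")
print(f"/24 blocks needed: {byoip_blocks_needed(1000)}")
```

The cost of acquiring or leasing the blocks themselves is not included here, so treat the result as the upper bound on savings.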
ENI failover within cells: If you put each cell's EIP pool on a dedicated secondary ENI with DeleteOnTermination set to false, failover to the standby becomes a single ENI reattachment rather than individually reassociating each EIP. The ENI persists with all EIP associations intact and you attach it to the standby in one call. Constraint: ENIs can't move cross-AZ. ENI failover pattern.
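A rough sketch of that ENI move, assuming `ec2` is a boto3 EC2 client and the standby instance is in the same AZ as the ENI. The function name and the device index default are illustrative. If the failed instance still holds the ENI, a forced detach is needed first, so in that case it is two API calls plus a wait rather than one.

```python
def fail_over_eni(ec2, eni_id: str, standby_instance_id: str, device_index: int = 1) -> str:
    """Move a secondary ENI (and its EIP associations) to the standby instance.

    `ec2` is expected to behave like a boto3 EC2 client. Returns the attachment ID.
    """
    eni = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])["NetworkInterfaces"][0]
    attachment = eni.get("Attachment")

    if attachment and attachment.get("InstanceId") == standby_instance_id:
        return attachment["AttachmentId"]  # already on the standby, nothing to do

    if attachment:
        # Still attached to the failed instance: force-detach and wait until free.
        ec2.detach_network_interface(AttachmentId=attachment["AttachmentId"], Force=True)
        ec2.get_waiter("network_interface_available").wait(NetworkInterfaceIds=[eni_id])

    result = ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=standby_instance_id,
        DeviceIndex=device_index,
    )
    return result["AttachmentId"]
```

The standby's OS still has to bring up the interface and its secondary addresses after the attach, so measure the end-to-end failover time rather than just the API latency.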

Small addition to this answer: Scaling the Squid/Envoy fleet might be tricky, since every EIP needs to be in use all the time (so that each customer's EIP is always active). This means scaling would essentially need to re-shard the IPs between instances (i.e., 1000 IPs are spread across 50 instances, and when scaling adds 5 more instances, a few EIPs from each original instance need to be moved to the new ones). That could be a bit tricky to do in real time. Normally I'd think a larger fleet of T instances might be good here, but the CPU jitter on the T family probably isn't great for a trading platform.
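To get a feel for how much re-sharding that means, here is a small sketch that rebalances an EIP-to-instance assignment and moves only the surplus from existing instances. It is plain bookkeeping with no AWS calls; the instance and EIP names are made up, and the even-spread policy is my assumption rather than anything from the answer.

```python
def rebalance(assignment: dict[str, list[str]], new_instances: list[str]):
    """Spread EIPs evenly over old and new instances, moving only surplus EIPs.

    Returns (new_assignment, moves), where moves is a list of (eip, src, dst).
    """
    result = {inst: list(eips) for inst, eips in assignment.items()}
    for inst in new_instances:
        result.setdefault(inst, [])

    total = sum(len(eips) for eips in result.values())
    base, extra = divmod(total, len(result))

    # Fullest instances get the larger targets (base + 1) so fewer EIPs move.
    order = sorted(result, key=lambda inst: len(result[inst]), reverse=True)
    target = {inst: base + (1 if i < extra else 0) for i, inst in enumerate(order)}

    # Collect everything above each instance's target, then trim the lists.
    surplus = [(eip, inst) for inst in order for eip in result[inst][target[inst]:]]
    for inst in order:
        del result[inst][target[inst]:]

    # Hand the surplus to instances that are below their target.
    moves = []
    for inst in order:
        while len(result[inst]) < target[inst]:
            eip, src = surplus.pop()
            result[inst].append(eip)
            moves.append((eip, src, inst))
    return result, moves


# The scenario from the comment: 1000 EIPs on 50 instances, then 5 instances added.
fleet = {f"i-{n:02d}": [f"eip-{n:02d}-{k:02d}" for k in range(20)] for n in range(50)}
new_fleet, moves = rebalance(fleet, [f"i-new-{n}" for n in range(5)])
print(len(moves), "EIPs would have to move")
```

Each entry in `moves` corresponds to one EIP re-association, so the length of that list is a reasonable proxy for how disruptive a scale-out event would be.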
Spot on, @ShahadC - regarding the day-2 operational challenges. Dynamic auto-scaling (ASG) is definitely an anti-pattern here, as re-associating EIPs via the AWS API is too slow and causes unacceptable downtime.
To solve this, the architecture needs two adjustments:
*.n variants) ensures predictable, ultra-low Nitro-powered networking latency and supports high ENI/IP limits per instance.
This setup keeps the massive cost benefits while ensuring the deterministic performance trading requires.