Architecture Guidance Request for Cost-Optimized Fixed Outbound IP Design

0

We are building a retail algorithmic trading platform on AWS and need architecture guidance for designing the most scalable and cost-effective networking solution.

Current Architecture

Our trading workloads run on Amazon EC2.

Broker trading APIs require source IP whitelisting, so each customer must have a deterministic fixed outbound public IP for API connectivity.

Current traffic flow:

Customer Workload → EC2 → Dedicated Elastic IP → Broker API

Today, we achieve this by attaching multiple Elastic IPs through EC2 networking configurations.

Challenge

This approach is becoming operationally and commercially unsustainable as customer count grows.

Primary concerns:

  • Managing large numbers of Elastic IPs becomes complex
  • EC2 networking limits (ENIs, secondary IPs, routing complexity) create scaling bottlenecks
  • Adding EC2 instances purely to overcome networking constraints increases infrastructure cost significantly
  • Instance failure impacts multiple customers
  • Trading workloads require low latency, predictable network behavior, and high availability

Our key requirement is:

Each customer must consistently egress through a dedicated fixed public IP for broker API whitelisting.

Guidance Requested

We are looking for the most cost-optimized AWS-native architecture for this requirement.

Specifically:

  • Best AWS design pattern for deterministic per-customer outbound static IP mapping
  • Alternatives better than EC2 + multiple Elastic IP attachment
  • Whether EKS-based egress routing is practical for this use case
  • Whether NAT instances, NAT Gateway, Transit Gateway, BYOIP, or other AWS networking services are recommended
  • Best architecture for scaling from hundreds to potentially thousands of customers while controlling cost

Core Question

What is the most scalable, operationally efficient, and cost-effective AWS-native architecture for a trading platform where each customer requires a dedicated fixed outbound public IP for third-party broker API whitelisting?

4 Answers
2

As I read the question, both previous answers either miss the core requirement (cost-optimizing for thousands of dedicated customer IPs) or introduce architectural anti-patterns for trading.

Here is why I think the previous suggestions are not the best fit:

  1. The NAT-Gateway-per-customer approach (AI Response) is commercially unviable: Deploying a dedicated NAT Gateway per customer costs ~$32/month per customer just for idle provisioning ($0.045/hr). For 1,000 customers, that is over $32,000/month in fixed networking costs. Also, "NAT Gateway regional availability mode" is a technical hallucination.

  2. The Centralized Egress pattern (Riku's Response) breaks multi-tenancy: While centralized egress via Transit Gateway is an official AWS Best Practice for standard enterprise IT, it is designed to pool traffic. Sharing a central NAT Gateway means multiple customers share the same IP pool, defeating your requirement for a dedicated, deterministic per-customer IP. Additionally, Transit Gateway adds extra hops, introducing unwanted latency (jitter) for algorithmic trading.
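To make the cost argument in point 1 concrete, here is a small worked calculation. It uses the $0.045/hr NAT Gateway rate quoted above; the 730-hour month is an assumption, and per-GB data processing charges are deliberately left out, so treat the result as a lower bound rather than a quote.

```python
# Rough fixed-cost estimate for one NAT Gateway per customer.
# Assumption: 730 hours in an average month; data processing fees are excluded.
HOURS_PER_MONTH = 730
NAT_GATEWAY_HOURLY_USD = 0.045  # hourly rate quoted in the answer above


def monthly_cost(hourly_rate, count, hours=HOURS_PER_MONTH):
    """Return the monthly cost in USD of `count` resources billed hourly."""
    return hourly_rate * count * hours


if __name__ == "__main__":
    for customers in (1, 100, 1000):
        cost = monthly_cost(NAT_GATEWAY_HOURLY_USD, customers)
        print(f"{customers:>5} customers: ${cost:,.2f}/month")
```

Under these assumptions the idle cost grows linearly with customer count, which is the core of the objection.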

The Correct SaaS Best Practice: Distributed Forward Proxy Fleet

Since your workloads run on EC2, the industry standard for scaling to thousands of dedicated outbound IPs without paying for thousands of NAT Gateways is to abstract the egress layer using a proxy fleet:

  • Design: Deploy a highly available, ultra-low-latency EC2 Auto Scaling Group running Envoy or Squid in a central public subnet.

  • Mechanism: Spread your pool of customer Elastic IPs (EIPs) across these proxy instances by associating each EIP with a secondary private IP on an instance's ENI. Configure the proxy to identify the incoming tenant workload (via source subnet, custom HTTP headers, or TLS SNI) and send its outbound traffic from the private IP that maps to that customer's EIP, using Linux policy routing and source NAT (ip rule / iptables).

  • Benefits:

    • Cost Savings: You only pay for a few EC2 proxy instances and the raw EIPs, completely eliminating thousands of hourly NAT Gateway charges.
    • Strict Isolation: Guarantees 1:1 deterministic EIP mapping per customer for broker whitelisting.
    • Low Latency: Eliminates Transit Gateway hops, keeping trading execution tight and predictable.
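
As an illustration of the per-tenant mapping, one way a proxy process can pin a tenant's outbound connection is to bind the socket to the private IP that the customer's EIP is associated with; AWS then translates that private address to the EIP at the internet gateway. This is only a sketch: the tenant IDs and private addresses are hypothetical, and a real deployment would more likely express the same mapping in Envoy or Squid configuration.

```python
import socket

# Hypothetical mapping: tenant ID -> secondary private IP on the proxy's ENI.
# Each private IP is assumed to have exactly one customer EIP associated with it.
TENANT_SOURCE_IP = {
    "tenant-a": "10.0.1.11",
    "tenant-b": "10.0.1.12",
}


def source_address_for(tenant_id, mapping=TENANT_SOURCE_IP):
    """Return the (ip, port) pair to bind to for this tenant; port 0 means any."""
    try:
        return (mapping[tenant_id], 0)
    except KeyError:
        # Fail closed: an unknown tenant must never egress from a default IP.
        raise LookupError(f"no egress IP configured for {tenant_id!r}") from None


def connect_as_tenant(tenant_id, host, port, timeout=5.0):
    """Open a TCP connection whose source address is the tenant's private IP."""
    return socket.create_connection(
        (host, port),
        timeout=timeout,
        source_address=source_address_for(tenant_id),
    )
```

Raising on an unknown tenant is a deliberate choice here: for broker whitelisting, a request leaving from the wrong IP is worse than a request that is rejected.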
EXPERT
answered 23 days ago
  • Small addition to this answer: Scaling the Squid/Envoy fleet might be tricky, since every EIP needs to be in use all the time (so that each customer's EIP is always active). This means scaling would essentially need to re-shard the IPs between instances (i.e., 1,000 IPs are spread across 50 instances, and when scaling adds 5 more instances, a few EIPs from each original instance need to be moved to the new ones). That could be a bit tricky to do in real time. Normally I'd think a larger fleet of T instances might be good here, but the CPU jitter on the T family probably isn't great for a trading platform.

  • Spot on, @ShahadC, regarding the day-2 operational challenges. Dynamic auto-scaling (ASG) is definitely an anti-pattern here, as re-associating EIPs via the AWS API is too slow and causes unacceptable downtime.

    To solve this, the architecture needs two adjustments:

    1. Static Sharding (Cell-based Architecture): Instead of one scaling fleet, group customers into fixed "cells" (e.g., 50–100 tenants per cell). Each cell gets a static pair of proxies. If one fails, the entire secondary ENI (holding that cell's EIP pool) fails over to the standby instance. Onboarding new customers simply means deploying a new cell via IaC; no real-time IP shuffling is required.
    2. Network-Optimized Hardware (No T-Family): To eliminate the CPU jitter you accurately pointed out, the fleet must run on non-burstable compute. Using network-optimized instances such as C6in or C7gn (the "n" variants) gives predictable, low-latency Nitro networking and supports high ENI/IP limits per instance.

    This setup keeps the massive cost benefits while ensuring the deterministic performance trading requires.
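
The contrast between the two comments above can be sketched with a little arithmetic. The first function estimates the minimum number of EIP re-associations needed to rebalance an evenly loaded fleet after a scale-out (using the 1,000 IPs / 50 instances / 5 new instances example); the second shows how a fixed cell assignment avoids moving anything. Function names and the 50-tenant cell size are illustrative assumptions, not part of any AWS API.

```python
def eips_to_move_on_scale_out(total_eips, old_instances, new_instances):
    """Minimum EIP moves to rebalance evenly after adding instances.

    Assumes the old fleet was evenly balanced and the new fleet should be
    balanced to within one EIP per instance.
    """
    added = new_instances - old_instances
    base, remainder = divmod(total_eips, new_instances)
    # Every new instance must receive at least `base` EIPs; if there are more
    # "base + 1" slots than old instances, the overflow lands on new ones too.
    return added * base + max(0, remainder - old_instances)


def cell_for_customer(customer_index, tenants_per_cell=50):
    """Fixed cell number for the Nth onboarded customer (0-based index)."""
    return customer_index // tenants_per_cell


if __name__ == "__main__":
    print("EIPs to move when scaling 50 -> 55 instances:",
          eips_to_move_on_scale_out(1000, 50, 55))
    print("Cell for customer #1000 (index 999):", cell_for_customer(999))
```

With static cells, a customer's cell never changes after onboarding, so growth adds cells instead of triggering re-association of existing EIPs.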

0

Hello.

Does each customer need a dedicated EC2 instance?
If a dedicated EC2 instance is required, I think using a NAT Gateway is the easiest way to obtain a static IP address.

If you need to partition not only the EC2 instances but also the VPC, you can share the NAT Gateway using the configuration described in the following document.
Because NAT Gateway is scalable, I believe it will be able to handle the traffic even if the number of customers increases in the future.
https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-nat-gateway-for-centralized-egress.html

A single NAT gateway can support up to 55,000 simultaneous connections per assigned IP address to each unique destination. You can request a quota adjustment to allow up to eight assigned IP addresses, allowing for 440,000 simultaneous connections to a single destination IP and port. NAT gateway provides 5 Gbps of bandwidth and automatically scales to 100 Gbps. Transit Gateway generally does not act as a load balancer and would not distribute your traffic evenly across NAT gateways in the multiple Availability Zones. The traffic across the Transit Gateway will stay within an Availability Zone, if possible. If the Amazon EC2 instance initiating traffic is in Availability Zone 1, traffic will flow out of the Transit Gateway elastic network interface in the same Availability Zone 1 in the egress VPC and will flow to the next hop based on that subnet route table that elastic network interface resides in. For a complete list of rules, refer to NAT gateways in the Amazon Virtual Private Cloud documentation.

EXPERT
answered 23 days ago
  • A forward proxy on EC2 certainly performs better than a configuration using Transit Gateway and NAT Gateway, but it significantly increases operational costs because you have to manage a large number of Elastic IPs and complex iptables routing configurations. This may not be a problem in the initial stages when the number of customers is small, but the configuration will become more complex as the number of customers grows, so it would be best to use AWS managed services if possible. You should choose based on whether you prioritize operational costs or performance.

0

For your retail algorithmic trading platform requiring dedicated fixed outbound IPs per customer, I recommend a VPC-based architecture using NAT Gateways with Elastic IP addresses, combined with per-customer or per-customer-group VPC isolation.

Recommended Architecture Pattern

Core Design:

  • Deploy separate VPCs (or shared VPCs with isolated subnets) for customer workloads
  • Use NAT Gateway in each VPC/subnet with a dedicated Elastic IP address
  • Route customer egress traffic through their assigned NAT Gateway
  • This ensures each customer consistently egresses through their dedicated fixed public IP

Implementation Approach:

  1. VPC Structure: Create private subnets for customer workloads and public subnets for NAT Gateways
  2. NAT Gateway Deployment: Deploy NAT Gateways in public subnets, each associated with a unique Elastic IP address
  3. Routing Configuration: Configure route tables in private subnets to direct outbound traffic (0.0.0.0/0) through the customer's designated NAT Gateway
  4. Internet Gateway: Attach an Internet Gateway to the VPC for NAT Gateway connectivity

For High Availability: Deploy NAT Gateways across multiple Availability Zones with separate Elastic IPs. The new NAT Gateway in regional availability mode can automatically expand and contract across availability zones, eliminating operational overhead of managing NAT infrastructure in multiple zones while maintaining the fixed IP requirement.

Scaling and Cost Optimization Strategies

Multi-Tenancy Options:

  • Shared VPC with isolated subnets: Group customers with similar traffic patterns into shared VPCs, each customer getting dedicated subnets and NAT Gateway
  • VPC-per-customer-tier: Segment by customer size/traffic volume rather than pure 1:1 mapping

Cost Controls:

  • NAT Gateway pricing is based on hourly charges plus data processing, making costs predictable
  • Consolidate low-traffic customers into shared infrastructure where broker APIs allow multiple customer IPs
  • Use Amazon VPC IP Address Manager (IPAM) policies for centralized control over IP allocation strategies and automated management at scale
  • Consider Bring Your Own IPv4 (BYOIPv4) if you have existing IP blocks, which can be managed through IPAM policies for cost optimization with automated enforcement

Operational Efficiency:

  • IPAM with prefix lists integration automates partner allowlisting by tracking contiguous IP blocks and automatically updating managed prefix lists
  • Share prefix lists with broker partners via AWS Resource Access Manager for streamlined whitelisting
  • Use IPAM policies to map NAT Gateway public IPv4 address allocation to designated pools for centralized management

Alternative Considerations

NAT Instances: Not recommended due to higher operational overhead, single points of failure, and manual scaling requirements compared to managed NAT Gateway service.

Transit Gateway: Useful for VPC interconnection but doesn't directly solve the fixed outbound IP requirement; still requires NAT Gateways for internet egress with fixed IPs.

EKS-based routing: While technically possible using pod-level networking and egress gateways, it adds complexity without addressing the fundamental requirement more efficiently than VPC-level NAT Gateway architecture.

Trading Platform Specific Considerations

For your low-latency trading requirements:

  • NAT Gateway provides consistent, predictable performance with automatic scaling
  • Deploy NAT Gateways in the same Availability Zone as workloads to minimize latency
  • Regional availability mode reduces complexity while maintaining availability
  • Monitor NAT Gateway metrics and IPAM pool utilization for capacity planning

This architecture scales from hundreds to thousands of customers while maintaining operational simplicity, cost predictability, and the deterministic fixed IP requirement essential for broker API whitelisting.
Sources
Generate a static outbound IP address using a Lambda function, Amazon VPC, and a serverless architecture - AWS Prescriptive Guidance
Build scalable IPv4 addressing with AWS NAT gateway in regional availability mode, Amazon VPC IPAM policies and Prefix Lists | Networking & Content Delivery

answered 23 days ago
EXPERT
reviewed 23 days ago
0

Building on Florian's proxy fleet approach and the cell-based architecture discussed in the comments, two implementation details that might help:

BYOIP cost savings: BYOIP addresses are exempt from the $0.005/hr public IPv4 charge. At 1000 IPs, that's roughly $3,600/month in address fees you avoid on top of eliminating NAT GW costs. The trade-offs: you need /24 blocks (minimum per BYOIP provision) registered with ARIN/RIPE/APNIC, a Route Origin Authorization, and provisioning takes up to 3 weeks. Worth running the numbers for your specific scale. BYOIP documentation has the full requirements.
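
As a rough way to run those numbers, the sketch below computes the public IPv4 fee avoided and the number of /24 blocks needed for a given customer count. It uses the $0.005/hr rate mentioned above; the 730-hour month is an assumption (a 720-hour month gives the $3,600 figure), and the cost of acquiring the address blocks themselves is not modeled.

```python
import math

PUBLIC_IPV4_HOURLY_USD = 0.005  # per-address rate quoted above
HOURS_PER_MONTH = 730           # assumption: average hours in a month
ADDRESSES_PER_SLASH_24 = 256    # a /24 contains 2**(32 - 24) addresses


def monthly_ipv4_fee(address_count, hours=HOURS_PER_MONTH):
    """Monthly public IPv4 charge in USD for Amazon-provided addresses."""
    return address_count * PUBLIC_IPV4_HOURLY_USD * hours


def slash_24_blocks_needed(address_count):
    """Smallest number of /24 blocks that covers `address_count` addresses."""
    return math.ceil(address_count / ADDRESSES_PER_SLASH_24)


if __name__ == "__main__":
    customers = 1000
    print(f"Fee avoided: ${monthly_ipv4_fee(customers):,.0f}/month")
    print(f"/24 blocks needed: {slash_24_blocks_needed(customers)}")
```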

ENI failover within cells: If you put each cell's EIP pool on a dedicated secondary ENI with DeleteOnTermination set to false, failover to the standby becomes a single ENI reattachment rather than individually reassociating each EIP. Because each EIP is associated with a private IP on the ENI, the ENI persists with all EIP associations intact and you attach it to the standby in one call. Constraint: an ENI cannot be moved to another Availability Zone, so the standby must be in the same AZ. ENI failover pattern.
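
A minimal sketch of that failover step, assuming `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`) passed in by the caller. The function name and the device index of 1 are illustrative assumptions. If the failed instance still holds the ENI, a forced detach has to complete before the attach, so in practice the "one call" is preceded by a detach and a wait.

```python
def fail_over_eni(ec2, eni_id, standby_instance_id, device_index=1):
    """Move a cell's secondary ENI to the standby instance (same AZ only).

    `ec2` is expected to behave like a boto3 EC2 client. Returns the new
    attachment ID reported by attach_network_interface.
    """
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id]
    )["NetworkInterfaces"][0]

    attachment = eni.get("Attachment")
    if attachment:
        # The failed primary may still own the ENI; force the detach and wait
        # until the interface is reported as available again.
        ec2.detach_network_interface(
            AttachmentId=attachment["AttachmentId"], Force=True
        )
        ec2.get_waiter("network_interface_available").wait(
            NetworkInterfaceIds=[eni_id]
        )

    response = ec2.attach_network_interface(
        NetworkInterfaceId=eni_id,
        InstanceId=standby_instance_id,
        DeviceIndex=device_index,
    )
    return response["AttachmentId"]
```

Passing the client in keeps the function easy to exercise with a stub before pointing it at a real account. The standby's operating system still needs the secondary private IPs configured on the attached interface, which this sketch does not cover.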

AWS
answered 14 days ago
