S3 Express One Zone throughput capped at NAT Gateway bandwidth: use a VPC gateway endpoint

12 minute read
Content level: Intermediate

Customers running high-throughput workloads on S3 Express One Zone in private subnets may see aggregate throughput plateau well below their instance's network capacity. The cause may be traffic routing through a NAT Gateway instead of using a VPC gateway endpoint. This article explains the symptom, the mechanism, how to diagnose it in two minutes, and how to fix it for free.

You moved a high-throughput workload to Amazon S3 Express One Zone for its single-digit millisecond latency. Your compute is in the same Availability Zone as the directory bucket, exactly as recommended. But aggregate throughput plateaus far below your instance's network bandwidth.

The usual reflex, adding more parallel connections, either does nothing or makes tail latency worse.

Before you open a support case, check one thing: how your packets leave the VPC to reach the bucket. In most cases where I see this, the workload is in a private subnet and is reaching S3 Express through a NAT Gateway instead of through a VPC gateway endpoint. That single routing detail is the difference between saturating your network pipe and hitting a hard throughput ceiling.

Scope: this article focuses on the private-subnet-plus-NAT case, which is where I see the most dramatic impact. If your compute runs in a public subnet, the picture is different, and the "What about public subnets?" section covers what I measured there.

This article shows what the symptom looks like, why it happens, how to confirm it in two minutes, and how to fix it for free.

TL;DR

  • The fix is a VPC gateway endpoint for S3 Express (com.amazonaws.<region>.s3express), associated with your subnet's route table. It is free, with no hourly and no per-GB charge (Amazon VPC pricing).
  • Without it, a workload in a private subnet reaches the bucket through your NAT Gateway, whose bandwidth starts at 5 Gbps and scales toward 100 Gbps (NAT gateway basics). For a workload built to push tens of Gbps, that path becomes the bottleneck.
  • The symptom is a throughput ceiling, not a per-request latency penalty. Adding connections past the ceiling inflates tail latency instead of adding throughput.
  • AWS recommends the gateway endpoint as "the most optimal networking path" for S3 Express (tutorial).
  • S3 Express uses its own endpoint service. com.amazonaws.<region>.s3express is separate from the standard com.amazonaws.<region>.s3 gateway endpoint. If you already have a standard S3 gateway endpoint, it does not route S3 Express traffic; you need the s3express one.

Background: how S3 Express traffic leaves your VPC

S3 Express One Zone stores objects in a directory bucket pinned to a single Availability Zone, so you can co-locate storage next to compute for low-latency access (overview). You reach it through Regional and Zonal API endpoints that are different from standard S3 endpoints (networking for directory buckets).

Those zonal endpoints (for example s3express-euw1-az1.eu-west-1.amazonaws.com) resolve to public IP addresses, the same as standard Amazon S3. That detail matters: it means an instance can reach the endpoint whether or not you deploy a gateway endpoint. It just takes a different network path depending on your routing.

  • With a gateway endpoint: your subnet's route table gets a route for the S3 Express prefix list pointing at the endpoint, and traffic stays on the AWS network on the optimal path. AWS documents this as keeping traffic on the AWS network "to reduce the amount of time your packets spend on the network," at no additional cost (AZ networking).
  • Without one, from a private subnet: the bucket is reached via your default route (your NAT Gateway), which then egresses to the endpoint. Your S3 Express traffic now shares, and is bounded by, the NAT Gateway.

The AWS tutorial is clear: gateway endpoints "allow traffic to reach S3 Express One Zone without traversing a NAT Gateway... We strongly recommend using gateway endpoints as they provide the most optimal networking path" (Step 1: configure a gateway VPC endpoint).

What the ceiling looks like

To isolate the effect, I ran an identical GET workload against the same directory bucket from two instances in the same Availability Zone as the bucket, differing only in the network path:

  • Endpoint path: instance in a subnet whose route table has the S3 Express gateway endpoint.
  • NAT path: instance in a private subnet whose only route to the endpoint is the NAT Gateway.

Both used the same benchmark (many parallel whole-object GETs, sweeping the number of connections) on a high-bandwidth instance (a 150 Gbps-class NIC), reading 32 MB objects. Throughput is the effective application throughput; CPU is the instance's peak during the run; p99 is the per-request latency at the 99th percentile.

Connections   Endpoint Gbps   NAT Gbps   Endpoint p99   NAT p99    NAT client CPU
1             1.0             1.0        255 ms         253 ms     ~3%
8             7.9             7.9        267 ms         266 ms     ~3%
16            15.4            13.7       293 ms         345 ms     ~5%
32            29.6            24.2       313 ms         384 ms     ~11%
48            44.3            22.5       319 ms         899 ms     ~15%
64            55.7            24.9       326 ms         1211 ms    ~17%

Read the bottom rows. The endpoint path keeps scaling: 55.7 Gbps at 64 connections and still climbing. The NAT path flattens at roughly 25 Gbps and goes no further. At that point the NAT client's peak CPU is only ~17%, so the instance has plenty of headroom: it is not the bottleneck. The network path is.

Two more things stand out:

  1. Per-request latency at low concurrency is identical on both paths (~250 ms for a 32 MB object at one connection). The gateway endpoint is not a per-request-latency optimization. Its value is throughput headroom.
  2. Adding connections past the ceiling hurts. On the NAT path, p99 latency climbs from ~345 ms at 16 connections to over 1.2 seconds at 64 as connections increase against the cap, while the endpoint path's p99 stays between 255 ms and 326 ms across the whole sweep. This is why "just add more parallelism" doesn't rescue a NAT-bound workload: past the bandwidth limit, extra requests queue rather than complete.

The ~25 Gbps figure is specific to this test. The general result is robust: a NAT Gateway has a finite, shared bandwidth that starts at 5 Gbps and scales toward 100 Gbps (NAT gateway basics), and a workload designed to push tens of Gbps will find that ceiling. The gateway endpoint removes the NAT from the path entirely.

Why parallelism stops helping: a quick mental model

Aggregate throughput is roughly:

throughput  ≈  concurrency  ×  object_size  /  per_request_latency

While there's spare bandwidth, per_request_latency stays roughly constant, so increasing concurrency increases throughput almost linearly. You can see that in the upper rows of the table, where both paths track each other. But once you hit the NAT bandwidth ceiling, the path can't move bytes any faster. Adding concurrency still raises the numerator, but requests now wait for bandwidth, so per_request_latency rises in proportion and the two effects cancel. Throughput stays flat (or dips) and tail latency climbs. That's exactly the NAT-path behavior above.
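As a rough check of this model against the table above, the sketch below rearranges the formula to get the mean per-request latency implied by a measured throughput. It assumes "32 MB" means 32 × 10^6 bytes, and the function names are illustrative. Note that the table reports p99 latency while the formula relates throughput to mean latency, so the two should agree in trend rather than in exact value.

```python
# Assumption: "32 MB" is 32 * 10^6 bytes; adjust if your objects are 32 MiB.
OBJECT_BITS = 32 * 10**6 * 8


def throughput_gbps(concurrency: int, latency_s: float) -> float:
    """throughput = concurrency * object_size / per_request_latency, in Gbps."""
    return concurrency * OBJECT_BITS / latency_s / 1e9


def implied_latency_ms(concurrency: int, measured_gbps: float) -> float:
    """Rearranged model: mean latency (ms) that explains a measured throughput."""
    return concurrency * OBJECT_BITS / (measured_gbps * 1e9) * 1e3


if __name__ == "__main__":
    # One connection at ~255 ms per object.
    print(round(throughput_gbps(1, 0.255), 2))
    # 64 connections: 55.7 Gbps on the endpoint path, 24.9 Gbps on the NAT path.
    print(round(implied_latency_ms(64, 55.7)))
    print(round(implied_latency_ms(64, 24.9)))
```

With the table's figures, one connection works out to about 1 Gbps, and the implied mean latency at 64 connections is roughly 294 ms on the endpoint path versus roughly 658 ms on the NAT path: the same concurrency, with more than twice the time spent waiting per request.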

What about public subnets?

If your compute is in a public subnet (default route to an internet gateway, no NAT), the result is different. I tested that path too: a same-AZ instance in a public subnet reaching the directory bucket with no gateway endpoint, versus the same instance with the endpoint.

In my tests, per-request latency was the same with and without the endpoint (~6.5 ms for a 512 KB GET, ~5.6 ms for 64 KB, at one connection), and the public-subnet path scaled throughput to match the endpoint path. The reason: the S3 Express zonal endpoint resolves to a public IP, and a public-subnet instance with an internet gateway route reaches it over the AWS network, so there's no NAT in the path to become a bottleneck.

So the dramatic throughput ceiling in this article is specific to the NAT path. That said, AWS still recommends the gateway endpoint on any path as "the most optimal networking path" (tutorial). If you are in a public subnet and still see high per-request latency, that points to something other than the public-vs-NAT distinction (for example a cross-AZ or cross-Region mismatch between compute and bucket, or a client configuration issue) and is worth investigating separately.

How to diagnose it in two minutes

1. Inspect the route table of the subnet your compute runs in. Look for a route whose destination is a prefix list (an ID of the form pl-68a54012) and whose target is a gateway endpoint (an ID of the form vpce-0a1b2c3d4e5f67890):

aws ec2 describe-route-tables \
  --route-table-ids <your-subnet-route-table-id> \
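  --query "RouteTables[0].Routes[?DestinationPrefixListId!=null]"

  • If you see a pl-... route pointing at a vpce-..., a gateway endpoint is in place. Confirm that it is the S3 Express one, because a standard S3 or DynamoDB gateway endpoint produces the same kind of route. Running aws ec2 describe-prefix-lists --prefix-list-ids <pl-id> shows the prefix list name, which should be com.amazonaws.<region>.s3express.
  • If you don't, and your default route (0.0.0.0/0) points at a nat-..., your S3 Express traffic is going through the NAT Gateway. That's the issue.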

Not sure whether you're in a public or private subnet? Check the target of the subnet's default route: if 0.0.0.0/0 points at an igw-... you're in a public subnet; if it points at a nat-... you're in a private subnet. This article's throughput ceiling applies to the NAT case.
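The route-table check in this step can be sketched as a small classifier over the Routes list that describe-route-tables returns. This is a minimal sketch, not an official tool: the function name, the return labels, and the s3express_prefix_list_id parameter are hypothetical, and you would look up the real prefix list ID for your Region rather than guess it.

```python
from typing import Optional


def classify_s3express_path(routes: list,
                            s3express_prefix_list_id: Optional[str] = None) -> str:
    """Classify how a subnet reaches S3 Express from its route table entries.

    `routes` is the "Routes" list from `aws ec2 describe-route-tables` output.
    `s3express_prefix_list_id` is the prefix list ID of the s3express service
    in your Region (hypothetical parameter; look it up, do not guess it).
    """
    for route in routes:
        pl_id = route.get("DestinationPrefixListId")
        target = route.get("GatewayId") or ""
        if pl_id and target.startswith("vpce-"):
            if s3express_prefix_list_id is None:
                # A gateway endpoint exists, but it may be for s3 or dynamodb.
                return "gateway-endpoint-unverified"
            if pl_id == s3express_prefix_list_id:
                return "gateway-endpoint"
    # No matching endpoint route: fall back to the default route's target.
    for route in routes:
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            if (route.get("NatGatewayId") or "").startswith("nat-"):
                return "nat-gateway"
            if (route.get("GatewayId") or "").startswith("igw-"):
                return "internet-gateway"
    return "unknown"
```

A result of "nat-gateway" is the case this article is about. Passing the S3 Express prefix list ID explicitly matters because a route created by a standard S3 gateway endpoint has the same pl-... to vpce-... shape.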

2. Watch for the symptom signature:

  • Aggregate throughput plateaus well below your instance's documented network bandwidth.
  • Adding connections or processes doesn't raise throughput; p99/p99.9 latency rises instead.
  • Client CPU is low while throughput is stuck (the client isn't the bottleneck).
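One way to spot the first two signs in your own connection sweep is to flag the first step where throughput stops growing with concurrency. The sketch below is illustrative only: find_plateau and its min_gain threshold are hypothetical choices, not a standard metric, and it does not look at CPU.

```python
def find_plateau(sweep, min_gain=0.5):
    """Return the connection count at which throughput stops scaling, or None.

    `sweep` is a list of (connections, gbps) pairs sorted by connections.
    A step counts as scaling if throughput grows by at least `min_gain` times
    what perfect linear scaling would add (min_gain is a hypothetical threshold).
    """
    for (c0, g0), (c1, g1) in zip(sweep, sweep[1:]):
        expected_gain = g0 * (c1 - c0) / c0  # gain under perfect linear scaling
        if (g1 - g0) < min_gain * expected_gain:
            return c0
    return None


# Throughput columns from the measurement table earlier in this article.
ENDPOINT_SWEEP = [(1, 1.0), (8, 7.9), (16, 15.4), (32, 29.6), (48, 44.3), (64, 55.7)]
NAT_SWEEP = [(1, 1.0), (8, 7.9), (16, 13.7), (32, 24.2), (48, 22.5), (64, 24.9)]

if __name__ == "__main__":
    print(find_plateau(ENDPOINT_SWEEP))
    print(find_plateau(NAT_SWEEP))
```

On the article's data this reports no plateau for the endpoint path and a plateau starting at 32 connections for the NAT path, which matches where the NAT column flattens.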

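3. Check the NAT Gateway in CloudWatch. For a GET-heavy workload like the one above, the payload flows from the bucket back to your clients, so look at the BytesInFromDestination and BytesOutToSource metrics for the NAT Gateway during the run. For a PUT-heavy workload, look at BytesOutToDestination and BytesInFromSource instead. A plateau in bytes that coincides with your throughput ceiling is consistent with hitting the NAT's bandwidth limit (NAT gateway metrics and dimensions).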
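The NAT Gateway byte metrics are counts per period, not rates, so they need converting before you compare them with your benchmark's Gbps figure. A minimal sketch, assuming you read the Sum statistic over a 60-second period (the function name and default are illustrative):

```python
def bytes_sum_to_gbps(byte_sum: float, period_seconds: int = 60) -> float:
    """Convert a CloudWatch Sum of bytes over one period to average Gbps."""
    return byte_sum * 8 / period_seconds / 1e9


if __name__ == "__main__":
    # 187.5 GB moved in one 60-second period.
    print(bytes_sum_to_gbps(187.5e9))
```

For example, a one-minute Sum of 187.5 × 10^9 bytes corresponds to an average of 25 Gbps, the level at which the NAT path flattened in the test above.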

A useful tell from the data above: this is a throughput problem, not a latency problem and not service throttling. If your single-request latency is normal but you can't scale up aggregate throughput, suspect the path, not the service.

The fix

Create a gateway VPC endpoint for S3 Express and associate it with the route table of the subnet your compute runs in. There is no hourly or per-GB charge for gateway endpoints, and the VPC pricing page explicitly notes you can avoid NAT Gateway data-processing charges for traffic to S3 by using a gateway endpoint, so this often saves money as well.
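To get a feel for the cost side, the sketch below estimates the NAT Gateway data-processing charge for a sustained throughput. The per-GB rate is a parameter on purpose: the 0.045 used in the example is an assumed figure for illustration, so check the Amazon VPC pricing page for the rate in your Region. It also assumes 1 GB = 10^9 bytes.

```python
def nat_processing_cost_per_hour(gbps: float, usd_per_gb: float) -> float:
    """Estimated NAT data-processing cost (USD) for one hour at a steady rate."""
    gb_per_hour = gbps * 1e9 / 8 * 3600 / 1e9  # bits/s -> bytes/s -> GB/hour
    return gb_per_hour * usd_per_gb


if __name__ == "__main__":
    # 25 Gbps sustained, at an assumed example rate of 0.045 USD per GB.
    print(round(nat_processing_cost_per_hour(25, 0.045), 2))
```

At a sustained 25 Gbps, that is 11,250 GB per hour, or about 506 USD per hour at the assumed rate, all of which disappears when the traffic uses the gateway endpoint instead.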

Console steps (from the AWS tutorial):

  1. Open the Amazon VPC console and choose Endpoints → Create endpoint.
  2. For Services, filter by Type = Gateway and choose com.amazonaws.<region>.s3express.
  3. Choose your VPC.
  4. Select the route table(s) for the subnet(s) where your compute runs. A prefix-list route is added automatically.
  5. Choose an access policy (Full access, or a custom policy) and create the endpoint.

CLI equivalent:

aws ec2 create-vpc-endpoint \
  --vpc-id <your-vpc-id> \
  --service-name com.amazonaws.<region>.s3express \
  --vpc-endpoint-type Gateway \
  --route-table-ids <your-subnet-route-table-id>

After it's created, re-run the route-table check above; you should now see the pl-... -> vpce-... route. Re-run your workload and the throughput ceiling should lift toward your instance's network capacity.

One operational note: a gateway endpoint affects whichever route tables you associate it with. Make sure you associate it with the route table used by the subnet(s) your compute actually runs in. Associating it with an unrelated route table won't change your data path. Also use the AZ ID (for example use1-az1), not the AZ name, when lining up compute and bucket, because AZ names map to different physical zones per account (S3 Express AZs and Regions).
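A quick way to check the AZ alignment is to read the AZ ID out of the directory bucket name, which follows the pattern base-name--zone-id--x-s3, and compare it with the AZ ID of your instance. The helper names below are hypothetical, and the sketch relies only on that naming pattern.

```python
def az_id_from_directory_bucket(bucket_name: str) -> str:
    """Extract the AZ ID from a directory bucket name (base--azid--x-s3)."""
    parts = bucket_name.split("--")
    if len(parts) < 3 or parts[-1] != "x-s3":
        raise ValueError(f"not a directory bucket name: {bucket_name!r}")
    return parts[-2]


def is_co_located(bucket_name: str, instance_az_id: str) -> bool:
    """True if the bucket and the instance are in the same physical AZ."""
    return az_id_from_directory_bucket(bucket_name) == instance_az_id


if __name__ == "__main__":
    # Hypothetical bucket name and instance AZ ID, for illustration only.
    print(is_co_located("my-data--use1-az1--x-s3", "use1-az1"))
```

You can find the AZ ID of a given AZ name in your account in the ZoneId field of aws ec2 describe-availability-zones.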

A common mistake: assuming an existing standard Amazon S3 gateway endpoint (com.amazonaws.<region>.s3) covers this. It does not. S3 Express uses a distinct service name, com.amazonaws.<region>.s3express, and you need a gateway endpoint for that service specifically for directory-bucket traffic to take the optimal path.

Takeaways

  • For S3 Express One Zone in a private subnet, the VPC gateway endpoint isn't about latency; it's about keeping high-throughput traffic off the NAT Gateway so it can scale to your instance's network capacity. In a public subnet I measured no latency or throughput penalty from omitting it, though AWS still recommends it as the optimal path.
  • The cause is a directory bucket accessed from a private subnet with no gateway endpoint: it works fine at low load and silently caps under the high-concurrency pattern Express is built for.
  • Diagnose by checking the subnet route table for a pl-... -> vpce-... route, and by watching for a throughput plateau with low client CPU and rising tail latency.
  • The fix is free, fast, and AWS-recommended.

Methodology and caveats

The measurements above come from a controlled test I ran in a single AWS Region and account: a directory bucket and two same-AZ instances (150 Gbps-class NICs), reading pre-seeded 32 MB objects with a parallel-GET benchmark derived from a public AWS workshop tool, swept across connection counts. These are measured results from instances I deployed, not synthetic or illustrative figures.

Each run transferred all requested objects with zero failures; the benchmark aborted and discarded any run where a request failed, so no partial-data artifacts are included. A separate probe with SDK retries disabled confirmed there were no 503 SlowDown (throttling) responses on either path, so the throughput ceiling is the network path, not the service.

The absolute numbers are specific to this configuration and single run; a NAT Gateway's bandwidth also scales over time and with multiple subnets/addresses, so the exact cap you observe will vary (NAT gateway basics). S3 Express One Zone directory buckets support very high request rates, up to 2 million GET and 200,000 PUT transactions per second (performance best practices), so for most workloads the limiting factor is the network path out of your VPC, which is exactly what the gateway endpoint addresses.