How do I troubleshoot connectivity issues when I use NAT gateway on my private Amazon VPC?


I want to troubleshoot connectivity issues when I use NAT gateway on my private Amazon Virtual Private Cloud (Amazon VPC).

Short description

Private subnet resources might experience connectivity timeouts, sudden connection drops, or connectivity slowness for the following reasons:

  • Network access control lists (network ACL) rules
  • ErrorPortAllocation error on the NAT gateway
  • Client instance port exhaustion
  • IdleTimeoutCount error to release capacity
  • Bandwidth limitation per NAT gateway

Resolution

Troubleshoot connectivity timeouts, sudden connection drops, or connectivity slowness based on the following causes:

Network ACL rules

Make sure that the network ACL that's associated with the public subnet of the NAT gateway allows traffic from the ephemeral port range (1024-65535). If the network ACL allows only a subset of the port range and the instances use an out-of-range source port, then traffic is dropped. For more information, see Example: VPC with servers in private subnets and NAT.
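
As a minimal sketch of the check described above, the following shell function tests whether a rule's port range spans the whole ephemeral range. The function name covers_ephemeral_range and the sample port values are hypothetical. In practice, the range would come from the network ACL entries for the NAT gateway's subnet, for example from the output of aws ec2 describe-network-acls.

```shell
# Hypothetical helper: succeeds only when a rule's port range
# (first port, last port) spans all of 1024-65535.
covers_ephemeral_range() {
  rule_from=$1
  rule_to=$2
  [ "$rule_from" -le 1024 ] && [ "$rule_to" -ge 65535 ]
}

# Sample values only: a rule that allows 32768-65535 leaves
# source ports 1024-32767 blocked.
if covers_ephemeral_range 32768 65535; then
  echo "rule covers the full ephemeral range"
else
  echo "rule allows only a subset; some source ports are dropped"
fi
```

With the sample values, the second message is printed because the rule starts above port 1024.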

ErrorPortAllocation error on the NAT gateway

NAT gateways support a maximum of 55,000 simultaneous connections to each destination. If this threshold is crossed, then new connections to the destination fail and the ErrorPortAllocation metric for the NAT gateway increases in Amazon CloudWatch. To resolve this issue, associate up to eight IPv4 addresses with your NAT gateway to increase the limit. You can associate one primary and seven secondary IPv4 addresses.

Note: Secondary IPv4 addresses increase the number of available ports. This also increases the number of concurrent connections that your workloads can establish through the NAT gateway.
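
The arithmetic behind this limit can be sketched as follows, assuming that each associated IPv4 address contributes the same 55,000 connections per destination. The function name is hypothetical, and the resource IDs in the commented command are placeholders.

```shell
# Per-address limit taken from the text above.
CONNECTIONS_PER_ADDRESS=55000

# Hypothetical helper: upper bound on simultaneous connections to one
# destination for a NAT gateway with the given number of IPv4 addresses.
max_connections_per_destination() {
  echo $(( $CONNECTIONS_PER_ADDRESS * $1 ))
}

max_connections_per_destination 1   # primary address only: 55000
max_connections_per_destination 8   # primary plus seven secondary: 440000

# Associating a secondary address with a public NAT gateway might look
# like this (both IDs are placeholders, not real resources):
# aws ec2 associate-nat-gateway-address \
#   --nat-gateway-id nat-EXAMPLE --allocation-ids eipalloc-EXAMPLE
```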

For more information, see How do I resolve the ErrorPortAllocation error on my NAT gateway?

Client instance port exhaustion

Check if the client instances in the private subnet have reached their operating system (OS) connection limits:

View the number of active connections:

Linux:

netstat -ano | grep ESTABLISHED | wc -l
netstat -ano | grep TIME_WAIT | wc -l

Windows:

netstat -ano | find /i "estab" /c
netstat -ano | find /i "TIME_WAIT" /c

If either of the preceding commands returns a value that's near the size of the allowed local port range (the source ports available for client connections), then you might have port exhaustion.

To reduce port exhaustion, complete the following tasks:

  • Resolve any application-level issues that drain the available connections.
  • Increase the operating system local (ephemeral) port range. For example, on Linux, set the following kernel parameter:
net.ipv4.ip_local_port_range = 1025 61000

Note: The total port range number might or might not help troubleshoot the port allocation issue because of silent closures of connections.
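
To judge how close a connection count is to the local port range, a rough calculation such as the following can help. This is a sketch only: the function names are hypothetical, and the connection count of 28000 is a made-up sample. On a Linux host, the current range can be read from /proc/sys/net/ipv4/ip_local_port_range.

```shell
# Hypothetical helper: number of local ports in an inclusive range.
port_range_size() {
  echo $(( $2 - $1 + 1 ))
}

# Hypothetical helper: whole-number percentage of the range that a
# given connection count would occupy.
port_usage_percent() {
  conn_count=$1
  range_size=$(port_range_size "$2" "$3")
  echo $(( conn_count * 100 / range_size ))
}

port_range_size 1025 61000             # widened range from the text: 59976
port_usage_percent 28000 32768 60999   # sample count, common Linux default range: 99

# On a live Linux host, the two range values could be read with:
# read low high < /proc/sys/net/ipv4/ip_local_port_range
```

A percentage near 100 for connections to the same destination suggests that new outbound connections are likely to fail until ports are released.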

IdleTimeoutCount error to release capacity

If a connection that uses a NAT gateway is idle for 350 seconds or more, then the connection times out. You also see a spike in the IdleTimeoutCount metric. When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection. The NAT gateway doesn't send a FIN packet.

To resolve or work around the IdleTimeoutCount error, complete the following tasks:

  • Use the IdleTimeoutCount metric in Amazon CloudWatch to monitor for increases in idle connections. Configure CloudWatch Contributor Insights to identify the clients that contribute the most idle connections.
  • Close idle connections from clients to release capacity.
  • Initiate more traffic over the connection.
  • Turn on TCP keepalive on the instance with a value that's less than 350 seconds.
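
The keepalive guidance above can be sketched as a simple comparison against the 350-second idle timeout. The function name is hypothetical, and 300 is only an example value. The value 7200 is the usual Linux default for net.ipv4.tcp_keepalive_time. Note that this kernel setting affects only sockets on which the application has turned on keepalive.

```shell
NAT_IDLE_TIMEOUT=350   # seconds, from the text above

# Hypothetical helper: succeeds when a keepalive idle time (in seconds)
# is short enough to send a probe before the NAT gateway times out.
keepalive_beats_nat_timeout() {
  [ "$1" -lt "$NAT_IDLE_TIMEOUT" ]
}

for seconds in 7200 300; do
  if keepalive_beats_nat_timeout "$seconds"; then
    echo "$seconds: probes are sent before the idle timeout"
  else
    echo "$seconds: connection can time out before the first probe"
  fi
done

# On Linux, the idle time before the first probe is a kernel parameter
# (requires root; 300 is an example value):
# sysctl -w net.ipv4.tcp_keepalive_time=300
```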

Bandwidth limitation from the NAT gateway

NAT gateways support 5 Gbps of bandwidth and automatically scale up to 100 Gbps. If the combined network throughput of all instances that use the NAT gateway reaches 100 Gbps, then traffic slows down. For more information, see NAT gateway metrics and dimensions.

To resolve or work around a bandwidth limitation from the NAT gateway, split the resources between multiple subnets and create multiple NAT gateways.
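
To compare observed traffic with the 5 Gbps and 100 Gbps figures, a byte count over a period can be converted to an average rate. The following is a sketch under assumptions: the function name is hypothetical, the byte counts are made-up samples, and the input is assumed to be the Sum statistic of a NAT gateway byte metric (for example, BytesOutToDestination) over the period. The calculation assumes a shell with 64-bit integer arithmetic.

```shell
# Hypothetical helper: average throughput in Mbps for a byte count
# observed over a period in seconds.
average_mbps() {
  total_bytes=$1
  period_seconds=$2
  echo $(( total_bytes * 8 / period_seconds / 1000000 ))
}

average_mbps 37500000000 60    # sample: 5000 Mbps (5 Gbps)
average_mbps 750000000000 60   # sample: 100000 Mbps (100 Gbps)
```

An average over a 60-second period can hide shorter bursts, so a result well below the limit doesn't rule out brief saturation.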

For more information, see How can I use Amazon CloudWatch metrics to identify NAT gateway bandwidth issues?

Related information

How do I resolve intermittent connection issues when using a NAT instance?

Troubleshoot NAT gateways

AWS OFFICIAL
Updated 7 months ago
2 Comments

The Linux command above doesn't work... needs a line break or something to look like this:

netstat -ano | grep ESTABLISHED | wc --l

netstat -ano | grep TIME_WAIT | wc --l

Justin
replied 3 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 months ago