NLB Timeout Despite Healthy Targets & Verified Network Path (No VPC Flow Logs for NLB Traffic)

Hello AWS Community,

I'm facing a persistent connection timeout issue with an internet-facing Network Load Balancer (NLB) configured for TCP port 22, and I'm seeking advice on further diagnostics.

Setup:

Internet-facing NLB spanning two public subnets in different AZs (us-east-1).

NLB uses static Elastic IPs (EIPs) assigned via subnet_mapping.

A TCP listener on port 22 forwards to a Target Group (TG).

The TG targets the private IP addresses of a VPC Endpoint for an AWS Transfer Family server (SFTP, VPC endpoint type).

Target Group has preserve_client_ip enabled.

Public subnets have a route table with 0.0.0.0/0 pointing to an Internet Gateway (IGW).

The default Network ACL is associated with the public subnets, allowing all inbound and outbound traffic.

Problem: Connections initiated from external clients (tested from multiple networks/ISPs) using nc <NLB_EIP> 22 or nc <NLB_DNS_Name> 22 consistently time out.
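The nc test above can also be scripted so that a silent timeout is clearly distinguished from an active refusal. This is a minimal sketch using only the Python standard library; the function name probe_tcp and the 5-second default timeout are illustrative choices, not part of the original setup.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP handshake and report how it ended.

    Returns "open" if the handshake completed, "timeout" if no reply
    arrived in time (the symptom described above), "refused" if the
    peer answered with a reset, or "error: ..." for anything else
    (for example a DNS failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe_tcp("<NLB_DNS_Name>", 22) returning "timeout" matches the behavior reported here, while "refused" would instead point at something actively rejecting the connection.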

Diagnostics Performed:

NLB/Target Health: describe-load-balancers shows the NLB is active and correctly configured with its EIPs in both specified AZs/subnets. describe-target-health consistently shows the Transfer Family VPC endpoint IPs as healthy.

Routing & NACLs: Verified public route table routes to IGW, and default NACLs allow all traffic.

Security Groups: The VPC Endpoint's Security Group allows TCP/22 from the necessary source IPs (client/NLB subnets). (This is technically correct, but less relevant as the timeout happens before this point.)

Public Subnet Test: Launched a temporary EC2 instance in the same public subnet as one of the NLB nodes. Configured its Security Group to allow TCP/22 only from my external client IP. Successfully initiated an SSH connection (got past TCP handshake, failed on key auth as expected) to this test instance's public IP. This confirms the path from my client -> IGW -> Public Route Table -> Public Subnet NACL -> Instance works for TCP/22.

VPC Flow Logs: Enabled Flow Logs for the VPC, logging ACCEPT and REJECT traffic. Queries in CloudWatch Logs Insights show NO entries (neither ACCEPT nor REJECT) for traffic originating from my external client IP destined for either of the NLB's public EIPs on port 22 during the timeout periods. Flow logs do show the successful traffic to the test EC2 instance.

NLB Recreation: Used terraform taint to destroy and recreate the NLB listener, and subsequently the entire NLB resource. The new NLB instance exhibits the exact same timeout behavior, and Flow Logs still show no relevant traffic reaching its ENIs.
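One thing worth ruling out in the Flow Logs check: for inbound traffic, flow log records carry the network interface's private IPv4 address as the destination, so a filter on the public EIP would match nothing even when packets do arrive. The sketch below builds a CloudWatch Logs Insights query that filters on the client source address and, optionally, the NLB ENI IDs instead. It assumes the default flow log format delivered to CloudWatch Logs (where fields such as srcAddr, dstPort, and interfaceId are discovered automatically); the helper name and the sample values are hypothetical.

```python
def build_flow_log_query(client_ip, port=22, eni_ids=(), limit=50):
    """Build a Logs Insights query for flow log records from one client.

    Filters on the source address and destination port rather than on
    the public EIP, because dstAddr holds the ENI's private IP.
    """
    conditions = [f'srcAddr = "{client_ip}"', f"dstPort = {port}"]
    if eni_ids:
        # Optionally narrow the search to the NLB's network interfaces.
        quoted = ", ".join(f'"{eni}"' for eni in eni_ids)
        conditions.append(f"interfaceId in [{quoted}]")
    return "\n".join(
        [
            "fields @timestamp, interfaceId, srcAddr, dstAddr, dstPort, action",
            "| filter " + " and ".join(conditions),
            "| sort @timestamp desc",
            f"| limit {limit}",
        ]
    )


# Placeholder values: a documentation-range IP and a made-up ENI ID.
print(build_flow_log_query("203.0.113.10", eni_ids=["eni-0123456789abcdef0"]))
```

If this query returns records for the NLB ENIs, the traffic is reaching the load balancer and the problem lies further along the path, not in EIP mapping.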

Core Question: Given that the NLB targets are healthy, the network path into the public subnets is functional (proven by the test EC2 instance), and security controls appear permissive, why would connection attempts to the NLB's public EIPs time out without any corresponding entries appearing in VPC Flow Logs for the NLB's network interfaces?

This seems to indicate the traffic isn't reaching the NLB ENIs at the VPC level, despite the successful test with the EC2 instance. What further steps can be taken to diagnose potential issues within the NLB service itself, EIP mapping/propagation, or internal AWS routing affecting only the NLB EIPs?

Thanks for any insights!

1 Answer
Accepted Answer

According to the documentation:

Client IP preservation is not supported when a target group contains AWS PrivateLink ENIs, or the ENI of another Network Load Balancer. This will cause loss of communication to those targets.

Try disabling the preserve_client_ip attribute on your target group, then test again to confirm whether connections start working.
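For a quick test outside Terraform, the attribute can also be flipped through the ELBv2 API. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the wrapper function name is hypothetical, while modify_target_group_attributes and the preserve_client_ip.enabled key are the ELBv2 API call and attribute for this setting.

```python
def disable_client_ip_preservation(elbv2_client, target_group_arn):
    """Turn off client IP preservation on a single target group.

    elbv2_client is expected to be a boto3 "elbv2" client; it is passed
    in so the function can also be exercised with a stub.
    """
    return elbv2_client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=[{"Key": "preserve_client_ip.enabled", "Value": "false"}],
    )


# Example call (needs boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   disable_client_ip_preservation(boto3.client("elbv2"), "<target-group-arn>")
```

Since the NLB here is managed with Terraform, a change made this way will show up as drift on the next plan, so set preserve_client_ip to false in the target group resource as well if the test succeeds.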

AWS EXPERT, answered 6 months ago
AWS EXPERT, reviewed 6 months ago
