How do I troubleshoot Application Load Balancer HTTP 502 errors?


I want to use Amazon CloudWatch metrics and access logs to troubleshoot HTTP 502 "Bad gateway" errors that I receive with my Application Load Balancer.

Resolution

Identify the source of the HTTP 502 errors

There are several possible causes for HTTP 502: Bad gateway errors. The source of the error can be your target or your load balancer. To identify the source of the error, use CloudWatch metrics or access logs.

Use CloudWatch metrics

If the HTTPCode_ELB_502_Count metric has data points, then your load balancer is the source of the error. If the HTTPCode_Target_5XX_Count metric has data points, then your target is the source of the error.
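As a rough sketch of this check, the following Python builds the request parameters for the CloudWatch GetMetricStatistics API and interprets the result. The helper names build_metric_query and error_sources are hypothetical, and the load balancer value is the placeholder used elsewhere in this article. Passing the parameters to boto3 is shown only in comments because it assumes that boto3 is installed and that credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(load_balancer, metric_name, end_time, hours=1):
    """Return keyword arguments for CloudWatch get_metric_statistics (hypothetical helper)."""
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": metric_name,
        # The LoadBalancer dimension is the final portion of the load balancer ARN.
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Sum"],
    }


def error_sources(elb_502_datapoints, target_5xx_datapoints):
    """Map the presence of data points to the likely source of the errors."""
    sources = []
    if elb_502_datapoints:
        sources.append("load_balancer")
    if target_5xx_datapoints:
        sources.append("target")
    return sources


params = build_metric_query(
    "app/my-loadbalancer/50dc6c495c0c9188",  # placeholder value
    "HTTPCode_ELB_502_Count",
    datetime.now(timezone.utc),
)

# Assumed usage with boto3 (not executed here):
#   import boto3
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**params)["Datapoints"]
```

Run the same query with HTTPCode_Target_5XX_Count, and then pass both lists of data points to error_sources.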

Use access logs

If the elb_status_code is "502" and the target_status_code is "-", then your load balancer is the source of the error. If the elb_status_code is "502" and the target_status_code is "502", then your target is the source of the error.
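The rule above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name classify_502_source is hypothetical, the sample entries are made up for the example, and the code assumes the standard access log layout in which elb_status_code and target_status_code are the ninth and tenth space-separated fields.

```python
import shlex


def classify_502_source(log_line):
    """Return the likely source of an HTTP 502 error for one access log entry."""
    # shlex.split keeps quoted fields, such as the request line, together.
    fields = shlex.split(log_line)
    elb_status_code, target_status_code = fields[8], fields[9]
    if elb_status_code != "502":
        return None  # not an HTTP 502 entry
    if target_status_code == "-":
        return "load_balancer"
    if target_status_code == "502":
        return "target"
    return "unknown"


# Illustrative entries (shortened to the leading fields plus the request line).
from_load_balancer = (
    "http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    '192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 '
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0"'
)
from_target = (
    "http 2022-04-15T16:52:51.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.004 0.000 502 502 86 155 '
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0"'
)

print(classify_502_source(from_load_balancer))
print(classify_502_source(from_target))
```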

Troubleshoot HTTP 502 errors

To determine the cause, filter the access logs by elb_status_code = "502" and target_status_code. Then, complete the resolution for the issue that you're experiencing.

The load balancer receives a TCP RST from the target when the load balancer tries to establish a connection

The load balancer might receive a TCP RST from the target when it tries to establish a connection. In this case, the load balancer can't complete the TCP three-way handshake with the target. As a result, the load balancer can't forward the user request to the target.

Check the TargetConnectionErrorCount metric for data points that show that the target is rejecting connections from the load balancer with a TCP RST.

In the access logs, check whether the request_processing_time, target_processing_time, and response_processing_time fields are set to -1. A value of -1 in these fields means that the load balancer couldn't send the request to the target because it couldn't establish a connection to the target.

The following is an example of an access log entry that has the request_processing_time, target_processing_time and response_processing_time set to -1:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 "GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"
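To find entries like this one in a larger log file, a check along the following lines might help. It is a sketch under the assumption that the three processing-time fields are the sixth through eighth space-separated fields of each entry. The function name connection_not_established is hypothetical.

```python
import shlex


def connection_not_established(log_line):
    """Return True for an HTTP 502 entry whose three processing times are all -1."""
    fields = shlex.split(log_line)
    # request_processing_time, target_processing_time, response_processing_time
    timings = fields[5:8]
    return fields[8] == "502" and all(float(value) == -1 for value in timings)


# The example entry from this article.
entry = (
    "http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    '192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 '
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - '
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)

print(connection_not_established(entry))
```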

The load balancer receives an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when the load balancer tries to connect

In the access logs, check whether the request_processing_time, target_processing_time, and response_processing_time fields are set to -1. Then, confirm that the load balancer subnets allow traffic to the targets on the target port.

The target closes the connection with a TCP RST or a TCP FIN message when the load balancer has an outstanding request to the target

The load balancer receives a request and forwards the request to the target. The target starts to process the request, but closes the connection to the load balancer too early. This occurs when the duration of the keep-alive timeout that you configured on the target is shorter than the idle timeout value of the load balancer. Make sure that the duration of the keep-alive timeout is longer than the idle timeout value.
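The comparison can be expressed as a small check. The following sketch assumes that you already know both values in seconds. The function name keep_alive_is_safe is hypothetical, and 60 seconds is used as the default because it is the default idle timeout of an Application Load Balancer.

```python
def keep_alive_is_safe(target_keep_alive_seconds, alb_idle_timeout_seconds=60):
    """Return True when the target keeps idle connections open longer than the load balancer."""
    return target_keep_alive_seconds > alb_idle_timeout_seconds


# Hypothetical values: a target that closes idle connections after 5 seconds
# is at risk, and one that waits 75 seconds is not.
print(keep_alive_is_safe(5))
print(keep_alive_is_safe(75))
```

You configure the keep-alive timeout in the web server or application software that runs on the target, not on the load balancer or the target group.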

The target response is malformed or contains HTTP headers that aren't valid

To troubleshoot the target response, perform a packet capture on the target for the timeframe of the issue.

To perform a packet capture for Linux, run the following command:

sudo tcpdump -i any -w filename.pcap

Important: If the target instance has a large amount of traffic, then the Packet Capture (PCAP) file that the tcpdump collection generates might affect your disk space. A large amount of traffic can also affect the services of your target instance.

For Windows, download Wireshark from the Wireshark website, and then use it to perform the packet capture.

To use tcpdump to test packet capture samples, see How do I troubleshoot network performance issues between Amazon Elastic Compute Cloud (Amazon EC2) Linux or Windows instances in an Amazon Virtual Private Cloud (Amazon VPC) and an on-premises host over the internet gateway?

The load balancer encounters an SSL handshake error when it connects to a target

The TCP connection from the load balancer to the target's HTTPS listener is successful, but the subsequent SSL handshake encounters an error. As a result, the load balancer can't forward the request to the target.

If the target group uses the HTTPS protocol, then perform a packet capture on the target for the timeframe of the issue.

Make sure that your server uses a TLS cipher suite that's supported by the security policy that your load balancer uses for backend connections.

The deregistration delay period elapsed for a request that a deregistered target was handling

In your AWS CloudTrail events, check for the DeregisterTargets API action during the timeframe of the issue. If the target deregistered too early, then an HTTP 502 error occurs. To resolve the issue, increase the deregistration delay period so that lengthy operations can complete.
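The two steps above might be scripted as follows. This sketch only builds the request parameters. The helper names are hypothetical, the target group ARN is the placeholder from this article, and the boto3 calls are shown in comments because they assume that boto3 is installed and that credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_deregister_lookup(start_time, end_time):
    """Return keyword arguments for CloudTrail lookup_events (hypothetical helper)."""
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "DeregisterTargets"}
        ],
        "StartTime": start_time,
        "EndTime": end_time,
    }


def build_deregistration_delay_update(target_group_arn, seconds):
    """Return keyword arguments for elbv2 modify_target_group_attributes (hypothetical helper)."""
    # The deregistration delay accepts values from 0 to 3600 seconds.
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            {"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}
        ],
    }


end = datetime.now(timezone.utc)
lookup = build_deregister_lookup(end - timedelta(hours=1), end)
update = build_deregistration_delay_update(
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067",
    600,  # hypothetical value, chosen to exceed the longest expected request
)

# Assumed usage with boto3 (not executed here):
#   import boto3
#   events = boto3.client("cloudtrail").lookup_events(**lookup)["Events"]
#   boto3.client("elbv2").modify_target_group_attributes(**update)
```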

Troubleshoot HTTP 502 errors when the target is a Lambda function

For requests that fail to an AWS Lambda function, check the Lambda error reason codes in the error_reason field of the load balancer's access logs.
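As an illustration, the following Python reads the error_reason field from one access log entry. It assumes the standard access log layout, in which error_reason is the 25th field after quoted fields are grouped. The function name lambda_error_reason is hypothetical, and the sample entry is made up for the example.

```python
import shlex

ERROR_REASON_INDEX = 24  # zero-based position of error_reason (assumed standard layout)


def lambda_error_reason(log_line):
    """Return the error_reason value of an access log entry, or None if it is absent."""
    fields = shlex.split(log_line)
    if len(fields) <= ERROR_REASON_INDEX:
        return None
    reason = fields[ERROR_REASON_INDEX]
    return None if reason == "-" else reason


# Illustrative entry for a failed request to a Lambda target.
entry = (
    "http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    '192.168.131.39:2817 - 0.000 0.001 0.000 502 - 86 155 '
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - '
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" 0 '
    '2022-04-15T16:52:50.757000Z "forward" "-" "LambdaResponseTooLarge" "-" "-" "-" "-"'
)

print(lambda_error_reason(entry))
```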

The target is a Lambda function, and the response body exceeds 1 MB

To confirm the issue, check whether there's a data point for the LambdaUserError metric. Or, check whether the error_reason field in the load balancer access log is set to LambdaResponseTooLarge.

To troubleshoot the issue, update your Lambda code and add error handling logic.

The target is a Lambda function that didn't respond before the configured timeout

To confirm the issue, check whether there's a data point for the LambdaUserError metric. Or, check whether the error_reason field in the load balancer access log is set to LambdaUnhandled.

The target is a Lambda function that returned an error, or Lambda throttled the function

To confirm whether Lambda throttled the function, check the Throttles metric for a data point. If Lambda throttled the function, then use AWS Service Quotas to request a quota increase for Lambda concurrent execution.

For more information, see Understanding Lambda invoke throttling limits.

AWS OFFICIAL
Updated 3 months ago
4 Comments

Hi,

In this case -

The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target

You instruct the following - The load balancer receives a request and forwards it to the target. The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the duration of the keep-alive timeout for the target is shorter than the idle timeout value of the load balancer. Make sure that the duration of the keep-alive timeout is greater than the idle timeout value.

where do i configure the keep-alive timeout for the connection between the alb and the target exactly ? neither do i see this in the alb attributes console nor in target groups attributes console.

replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 10 months ago

Hi, Can we get someone from AWS to either update the article or give us fine-grained instruction on how to do this ? It's been a week since the comment above was posted...

replied 10 months ago

@Luke - The keep alive timeout would be configured on whatever software you are using as your target, for Apache & Nginx, see https://repost.aws/knowledge-center/apache-backend-elb

replied 9 months ago