I want to use Amazon CloudWatch metrics and access logs to troubleshoot HTTP 502 "Bad gateway" errors that I receive with my Application Load Balancer.
Resolution
Identify the source of the HTTP 502 errors
There are several possible causes for HTTP 502 "Bad gateway" errors. The source of the error can be your target or your load balancer. To identify the source of the error, use CloudWatch metrics or access logs.
Use CloudWatch metrics
If the HTTPCode_ELB_502_Count metric has data points, then your load balancer is the source of the error. If the HTTPCode_Target_5XX_Count metric has data points, then your target is the source of the error.
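The metric check above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes that boto3 is installed and that AWS credentials are configured, and the function names (build_metric_query, error_sources, metric_sum) are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(metric_name, load_balancer, hours=1, now=None):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": metric_name,
        # The LoadBalancer dimension value is the final part of the load
        # balancer ARN, for example "app/my-loadbalancer/50dc6c495c0c9188".
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Sum"],
    }


def error_sources(elb_502_sum, target_5xx_sum):
    """Map the two metric sums to the likely source(s) of the 502 errors."""
    sources = []
    if elb_502_sum > 0:
        sources.append("load balancer")
    if target_5xx_sum > 0:
        sources.append("target")
    return sources


def metric_sum(metric_name, load_balancer, hours=1):
    """Sum all data points of one metric (requires boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_metric_query(metric_name, load_balancer, hours)
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

For example, error_sources(metric_sum("HTTPCode_ELB_502_Count", lb), metric_sum("HTTPCode_Target_5XX_Count", lb)) returns the list of likely sources, where lb is the load balancer dimension value.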
Use access logs
If the elb_status_code is "502" and the target_status_code is "-", then your load balancer is the source of the error. If the elb_status_code is "502" and the target_status_code is "502", then your target is the source of the error.
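The access log rule above can be expressed as a small Python helper. This is a sketch under stated assumptions: it relies on the documented field order of Application Load Balancer access logs, where elb_status_code is the ninth field and target_status_code is the tenth, and the function name classify_502 is hypothetical.

```python
ELB_STATUS_INDEX = 8      # elb_status_code (zero-based position)
TARGET_STATUS_INDEX = 9   # target_status_code (zero-based position)


def classify_502(log_line):
    """Return the likely source of a 502 error for one access log entry.

    Returns None when the entry isn't a 502 response from the load balancer.
    """
    # The first twelve fields never contain spaces, so a plain split is
    # enough to reach both status code fields.
    fields = log_line.split(" ")
    if len(fields) <= TARGET_STATUS_INDEX:
        return None
    if fields[ELB_STATUS_INDEX] != "502":
        return None
    target_status = fields[TARGET_STATUS_INDEX]
    if target_status == "-":
        return "load balancer"
    if target_status == "502":
        return "target"
    return "unknown"
```

Run the helper over each line of a downloaded log file and count the results to see which source dominates.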
Troubleshoot HTTP 502 errors
To determine the cause, filter the access logs for entries where elb_status_code is "502", and note the target_status_code of each entry. Then, complete the resolution for the issue that you're experiencing.
The load balancer receives a TCP RST from the target when the load balancer tries to establish a connection
The load balancer might receive a TCP RST message from the target when it tries to establish a connection. In this case, the load balancer can't complete the TCP 3-way handshake with the target. As a result, the load balancer can't forward the user request to the target.
Check the TargetConnectionErrorCount metric for data points that show that the target is rejecting connections from the load balancer with a TCP RST.
In the access logs, check whether the request_processing_time, target_processing_time, and response_processing_time fields are set to -1. A value of -1 indicates that the load balancer couldn't send the request to the target because it couldn't establish a connection to the target.
The following is an example of an access log entry that has the request_processing_time, target_processing_time and response_processing_time set to -1:
http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 "GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"
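To find which targets reject connections, you can scan the access logs for 502 entries where all three processing times are -1 and tally the target addresses. The following Python sketch assumes the documented field order of Application Load Balancer access logs (target:port is the fifth field, followed by the three processing times and elb_status_code). The function names are hypothetical.

```python
from collections import Counter


def connection_failed(log_line):
    """True when a 502 entry shows -1 for all three processing times."""
    fields = log_line.split(" ")
    if len(fields) < 9:
        return False
    # Positions 5-7: request, target, and response processing times.
    times = fields[5:8]
    return fields[8] == "502" and all(value == "-1" for value in times)


def count_failed_targets(log_lines):
    """Count failed connection attempts per target (the target:port field)."""
    failures = Counter()
    for line in log_lines:
        if connection_failed(line):
            failures[line.split(" ")[4]] += 1
    return failures
```

A target that appears far more often than the others in the result is a good candidate for a packet capture or a security group review.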
The load balancer receives an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when the load balancer tries to connect
In the access logs, check whether the request_processing_time, target_processing_time, and response_processing_time fields are set to -1. Then, confirm that the load balancer subnets allow traffic to the targets on the target port.
The target closes the connection with a TCP RST or a TCP FIN message when the load balancer has an outstanding request to the target
The load balancer receives a request and forwards the request to the target. The target starts to process the request, but closes the connection to the load balancer too early. This occurs when the duration of the keep-alive timeout that you configured on the target is shorter than the idle timeout value of the load balancer. Make sure that the duration of the keep-alive timeout is longer than the idle timeout value.
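The timeout comparison above can be checked with a short Python sketch. It assumes that you already retrieved the load balancer attributes, for example with the boto3 elbv2 client's describe_load_balancer_attributes call, and that you know the keep-alive timeout that's configured on the target's web server. The function names are hypothetical.

```python
IDLE_TIMEOUT_KEY = "idle_timeout.timeout_seconds"


def idle_timeout_seconds(attributes_response):
    """Extract the idle timeout from a describe_load_balancer_attributes response."""
    for attribute in attributes_response["Attributes"]:
        if attribute["Key"] == IDLE_TIMEOUT_KEY:
            return int(attribute["Value"])  # attribute values are strings
    raise KeyError(IDLE_TIMEOUT_KEY)


def keep_alive_is_safe(target_keep_alive_seconds, alb_idle_timeout_seconds):
    """The target must hold idle connections longer than the load balancer does."""
    return target_keep_alive_seconds > alb_idle_timeout_seconds
```

If keep_alive_is_safe returns False, either raise the keep-alive timeout on the target or lower the idle timeout of the load balancer.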
The target response is malformed or contains HTTP headers that aren't valid
To troubleshoot the target response, perform a packet capture on the target for the timeframe of the issue.
To perform a packet capture for Linux, run the following command:
sudo tcpdump -i any -w filename.pcap
Important: If the target instance has a large amount of traffic, then the packet capture (PCAP) file that tcpdump generates can grow quickly and consume significant disk space. A capture under heavy traffic can also affect the services that run on your target instance.
For Windows, download the Wireshark application from the Wireshark website, and use it to capture packets.
To use tcpdump to test packet capture samples, see How do I troubleshoot network performance issues between Amazon Elastic Compute Cloud (Amazon EC2) Linux or Windows instances in an Amazon Virtual Private Cloud (Amazon VPC) and an on-premises host over the internet gateway?
The load balancer encounters an SSL handshake error when it connects to a target
The TCP connection from the load balancer to the target's HTTPS listener is successful, but the subsequent SSL handshake encounters an error. As a result, the load balancer can't forward the request to the target.
If the target group uses the HTTPS protocol, then perform a packet capture on the target for the timeframe of the issue.
The target must support at least one TLS cipher suite that's included in the security policy that the load balancer uses for backend connections.
The deregistration delay period passed for a request that a deregistered target manages
In your AWS CloudTrail events, check for the DeregisterTargets API action during the timeframe of the issue. If the target was deregistered before its in-flight requests completed, then an HTTP 502 error occurs. To resolve the issue, increase the deregistration delay period so that lengthy operations can complete.
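Both steps can be sketched in Python. The helpers below only build the request parameters. They assume that you pass them to the boto3 CloudTrail client's lookup_events call and to the elbv2 client's modify_target_group_attributes call. The helper names and the 30-minute search window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def window_around(incident_time, minutes=30):
    """Return a (start, end) search window centered on the incident time."""
    delta = timedelta(minutes=minutes)
    return incident_time - delta, incident_time + delta


def deregistration_lookup(start, end):
    """Parameters for cloudtrail.lookup_events to find DeregisterTargets calls."""
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "DeregisterTargets"}
        ],
        "StartTime": start,
        "EndTime": end,
    }


def deregistration_delay_update(target_group_arn, seconds):
    """Parameters for elbv2.modify_target_group_attributes."""
    # The deregistration delay accepts values from 0 to 3600 seconds.
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            {"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}
        ],
    }
```

For example, boto3.client("cloudtrail").lookup_events(**deregistration_lookup(start, end)) lists the matching events, and boto3.client("elbv2").modify_target_group_attributes(**deregistration_delay_update(arn, 600)) raises the delay to 10 minutes.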
Troubleshoot HTTP 502 errors when the target is a Lambda function
For requests that fail to an AWS Lambda function, check the Lambda error reason codes in the error_reason field of the load balancer's access logs.
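To pull the error_reason value out of each access log entry, you can use a small parser such as the following Python sketch. It assumes the documented field order of Application Load Balancer access logs, where error_reason is the 25th space-delimited field and some fields are enclosed in double quotes. The function names and the hint table are hypothetical, and the hints cover only the two reason codes that this article discusses.

```python
import re

# A field is either a double-quoted string or a run of non-space characters.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')

ERROR_REASON_INDEX = 24  # assumption: zero-based position of error_reason

HINTS = {
    "LambdaResponseTooLarge": "The response body exceeds 1 MB.",
    "LambdaUnhandled": "The function didn't respond before the configured timeout.",
}


def split_fields(log_line):
    """Split an access log entry into fields, removing surrounding quotes."""
    return [bare or quoted for quoted, bare in _FIELD.findall(log_line)]


def error_reason(log_line):
    """Return the error_reason value, or None when it's absent or "-"."""
    fields = split_fields(log_line)
    if len(fields) <= ERROR_REASON_INDEX:
        return None
    value = fields[ERROR_REASON_INDEX]
    return None if value == "-" else value


def explain(log_line):
    """Return a short hint for the reason codes covered in this article."""
    return HINTS.get(error_reason(log_line))
```

Entries that were written before the error_reason field existed, or truncated entries, return None instead of raising an error.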
The target is a Lambda function, and the response body exceeds 1 MB
To confirm the issue, check whether there's a data point for the LambdaUserError metric. Or, check whether the error_reason field in the load balancer access log is set to LambdaResponseTooLarge.
To resolve the issue, update your Lambda code so that the response body stays within the 1 MB limit, and add error handling logic.
The target is a Lambda function that didn't respond before the configured timeout
To confirm the issue, check whether there's a data point for the LambdaUserError metric. Or, check whether the error_reason field in the load balancer access log is set to LambdaUnhandled.
The target is a Lambda function that returned an error, or Lambda throttled the function
To confirm whether Lambda throttled the function, check the Throttles metric for a data point. If Lambda throttled the function, then use AWS Service Quotas to request a quota increase for Lambda concurrent execution.
For more information, see Understanding Lambda invoke throttling limits.