My Amazon CloudWatch metric SurgeQueueLength for my Classic Load Balancer has an increased maximum statistic. Clients also receive HTTP 503 Service Unavailable or HTTP 504 Gateway Timeout errors when they try to connect to my Classic Load Balancer. I want to troubleshoot Elastic Load Balancing (ELB) capacity issues.
Short description
The Classic Load Balancer metric SurgeQueueLength measures the total number of requests that your Classic Load Balancer has queued while they wait to be routed to a healthy instance. An increased maximum statistic for SurgeQueueLength shows that backend systems can't process incoming requests as fast as the requests are received. Possible reasons for a high SurgeQueueLength metric include:
- Overloaded Amazon Elastic Compute Cloud (Amazon EC2) instances behind the Classic Load Balancer that can't process all incoming requests
- Application dependencies that respond slowly because of performance issues in external resources
- Instances that have reached their maximum allowable connection limits
The maximum SurgeQueueLength is 1,024. When the queue is full, the load balancer rejects additional requests, and the SpilloverCount metric counts the rejected requests.
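To see how the two metrics relate in practice, the following minimal Python sketch reads both from CloudWatch and summarizes them. It assumes the boto3 library with configured AWS credentials, the AWS/ELB namespace with the LoadBalancerName dimension, and a 60-second period. The function names fetch_datapoints and summarize_capacity and the load balancer name my-classic-lb are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

SURGE_QUEUE_MAX = 1024  # maximum surge queue size stated in this article


def fetch_datapoints(load_balancer_name, metric_name, statistic, minutes=60):
    """Fetch recent datapoints for one Classic Load Balancer metric.

    Hypothetical helper: assumes boto3 is installed and credentials are set.
    """
    import boto3  # imported here so summarize_capacity works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName=metric_name,
        Dimensions=[{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,  # assumed one-minute granularity
        Statistics=[statistic],
    )
    return response["Datapoints"]


def summarize_capacity(surge_points, spillover_points):
    """Summarize SurgeQueueLength (Maximum) and SpilloverCount (Sum) datapoints."""
    peak_queue = max((p["Maximum"] for p in surge_points), default=0)
    rejected = sum(p["Sum"] for p in spillover_points)
    return {
        "peak_queue": peak_queue,
        "queue_full": peak_queue >= SURGE_QUEUE_MAX,
        "rejected_requests": rejected,
    }


if __name__ == "__main__":
    name = "my-classic-lb"  # hypothetical load balancer name
    surge = fetch_datapoints(name, "SurgeQueueLength", "Maximum")
    spill = fetch_datapoints(name, "SpilloverCount", "Sum")
    print(summarize_capacity(surge, spill))
```

If queue_full is true or rejected_requests is greater than zero, the queue reached its limit during the sampled window and some requests were likely rejected. A peak_queue value that is elevated but below 1,024 suggests that the backend is falling behind without yet dropping requests.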
Resolution