I get health check failures on my Amazon Elastic Container Service (Amazon ECS) tasks on AWS Fargate.
Short description
Health check failures for Amazon ECS tasks on Fargate can occur for the following reasons:
- Container health check errors
- A target that's in an Availability Zone that isn't activated for the load balancer
- Resource constraints for CPU or memory
- Misconfigured health check settings
- Network connectivity issues
When a task fails a health check, you receive one of the following error messages in your Amazon ECS service events:
- "(service AWS-service) (port 8080) is unhealthy in (target-group arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789) due to (reason Health checks failed with these codes: [5xx]/[4xx]/[3xx]) or (Request timed out)."
- "(service AWS-Service) (task eaa3ec9e9f104070b461490987654321) failed container health checks."
- "(service AWS-Service) (instance 10.122.144.145) (port 8080) is unhealthy in (target-group arn:aws:elasticloadbalancing:ap-south-1:120987654321:targetgroup/AWS-Service-TG/159c835dc9d8cf84) due to (reason Target is in an Availability Zone that is not enabled for the load balancer)."
In the Amazon ECS console, the task's stopped reason might also show the following error: "Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:ap-south-1:111111111111:targetgroup/aws-targetgroup/123456789)".
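To sort recent service events by cause, you can filter the messages in a shell. The following is a minimal sketch. The classify_event function is a hypothetical helper, and its patterns assume the message wording shown in the examples above, so other event formats fall through to "other".

```sh
#!/bin/sh
# Hypothetical helper: maps one ECS service event message to a likely cause.
# The patterns are substrings of the example messages in this article (an assumption,
# not a complete list of ECS event formats).
classify_event() {
  local msg="$1"
  case "$msg" in
    *"Availability Zone that is not enabled"*)
      echo "az-not-enabled" ;;
    *"failed container health checks"*)
      echo "container-health-check" ;;
    *"Health checks failed with these codes"*|*"Request timed out"*|*"Task failed ELB health checks"*)
      echo "lb-health-check" ;;
    *)
      echo "other" ;;
  esac
}

# Usage (placeholder names): print the cause next to each of the 10 most recent events.
#   aws ecs describe-services --cluster cluster-name --services service-name \
#     --query 'services[0].events[:10].[message]' --output text |
#   while IFS= read -r line; do printf '%s\t%s\n' "$(classify_event "$line")" "$line"; done
```

Wrapping message in brackets in the --query expression makes the text output print one message per line.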
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Troubleshoot an HTTP [5xx]/[4xx]/[3xx] or (Request timed out) error
Check your load balancer's health check configuration
You receive these errors when the Elastic Load Balancing (ELB) load balancer expects an HTTP 200 response (the default success code) but instead receives a 3xx, 4xx, or 5xx status code. The load balancer then marks the target as Unhealthy, and Amazon ECS stops the task and replaces it. To resolve this issue, configure the correct health check path, or modify the health check settings.
For information about HTTP errors in your Application Load Balancer, see The load balancer generates an HTTP error.
If the Amazon ECS task doesn't respond to the load balancer health check within the timeout period, then you receive the Request timed out error.
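To see why each target fails, you can also check the target health reason codes that the load balancer reports. The following sketch assumes that you pipe in the tab-separated output of aws elbv2 describe-target-health with the --query expression shown in the comment. The explain_target_health function is a hypothetical helper, and its hints only point back to the sections of this article.

```sh
#!/bin/sh
# Hypothetical helper: reads tab-separated lines of
#   target-id <TAB> port <TAB> state <TAB> reason
# and prints each target that isn't healthy, together with a hint.
# Example input (placeholder ARN):
#   aws elbv2 describe-target-health --target-group-arn target-group-arn \
#     --query 'TargetHealthDescriptions[].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason]' \
#     --output text | explain_target_health
explain_target_health() {
  local tab id port state reason hint
  tab=$(printf '\t')
  # "|| [ -n "$id" ]" also processes a last line that has no trailing newline.
  while IFS="$tab" read -r id port state reason || [ -n "$id" ]; do
    [ "$state" = "healthy" ] && continue
    case "$reason" in
      Target.ResponseCodeMismatch) hint="check the health check path and success codes" ;;
      Target.Timeout)              hint="check security groups and application response time" ;;
      Target.NotInUse)             hint="check that the target group is attached and the Availability Zone is activated for the load balancer" ;;
      *)                           hint="check TargetHealth.Description for details" ;;
    esac
    printf '%s:%s %s %s (%s)\n' "$id" "$port" "$state" "$reason" "$hint"
  done
}
```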
Verify that the load balancer can perform health checks
Confirm the following configurations:
- The security group that's associated with your load balancer allows outbound traffic to your Amazon ECS task elastic network interfaces on the registered container port.
- The security group that's associated with your Amazon ECS tasks allows inbound traffic on the registered container port from the security group that's associated with your load balancer.
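To check the inbound rule from the command line, you can list the rules in the task security group that reference the load balancer security group, and then test them against the container port. In this sketch, port_allowed is a hypothetical helper, and sg-ecs-tasks and sg-load-balancer are placeholder security group IDs.

```sh
#!/bin/sh
# Hypothetical helper: reads tab-separated "protocol from-port to-port" lines and
# prints "allowed" if any rule permits TCP traffic on PORT, otherwise "blocked".
# Example input (placeholder group IDs), listing only rules whose source is the
# load balancer security group:
#   aws ec2 describe-security-groups --group-ids sg-ecs-tasks \
#     --query "SecurityGroups[0].IpPermissions[?contains(UserIdGroupPairs[].GroupId, 'sg-load-balancer')].[IpProtocol,FromPort,ToPort]" \
#     --output text | port_allowed 8080
port_allowed() {
  local port="$1" tab proto from to
  tab=$(printf '\t')
  while IFS="$tab" read -r proto from to || [ -n "$proto" ]; do
    case "$proto" in
      -1)
        # "-1" means all protocols and all ports.
        echo allowed; return 0 ;;
      tcp|6)
        if [ "$from" -le "$port" ] && [ "$port" -le "$to" ]; then
          echo allowed; return 0
        fi ;;
    esac
  done
  echo blocked
  return 1
}
```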
Configure your timeout parameters
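If your Amazon ECS tasks take a long time to start and respond to their first health checks, then it's a best practice to increase the service's health check grace period (the healthCheckGracePeriodSeconds parameter). During the grace period, Amazon ECS ignores unhealthy load balancer health checks for newly started tasks.
If your application typically takes a long time to respond to health checks, then you can also increase the target group's health check timeout value (HealthCheckTimeoutSeconds). This parameter defines how long Amazon ECS tasks have to respond to each health check.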
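To choose a starting value for the grace period, you can use a rough estimate: the time that your application takes to start, plus the time that the target needs to pass the healthy threshold. This is a simplified model, not the exact ELB or Amazon ECS algorithm, and suggest_grace_period is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper based on a simplified timing model (not the exact ELB or ECS logic):
# after the application starts answering, the target needs roughly healthy_threshold
# consecutive successful checks, one per interval, to become healthy. A grace period
# shorter than startup + that window lets ECS act on failed health checks before the
# task has a chance to become healthy.
suggest_grace_period() {
  local startup_seconds="$1" interval_seconds="$2" healthy_threshold="$3"
  echo $(( startup_seconds + healthy_threshold * interval_seconds ))
}

# Read the current values (placeholder ARN):
#   aws elbv2 describe-target-groups --target-group-arns target-group-arn \
#     --query 'TargetGroups[0].[HealthCheckIntervalSeconds,HealthCheckTimeoutSeconds,HealthyThresholdCount]' \
#     --output text
# Apply the estimate (placeholder names):
#   aws ecs update-service --cluster cluster-name --service service-name \
#     --health-check-grace-period-seconds "$(suggest_grace_period 60 30 5)"
```

For example, with a 60-second startup time, a 30-second interval, and a healthy threshold of 5, the sketch returns 60 + 5 * 30 = 210 seconds. Treat the result as a starting point and adjust it based on the service events that you observe.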
Check your resource use
If your application requires backend connectivity to a database, then check for high resource use at the database level. High resource use on the database can slow your application's initialization and cause load balancer health check requests to fail or time out.
Troubleshoot slow responses
Use ECS Exec to open a shell in your container, and then check how the application responds on the health check path and port. To verify that the backend returns a successful response without delay, run the following command:
curl -iv localhost:container-port/path
Note: Replace container-port with the port that your container uses and path with the health check path.
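If the curl -iv output is hard to read, then you can print only the status code and the total response time, and compare them with the target group timeout. This sketch assumes that curl is installed in the container image and that the expected success code is 200. The probe and evaluate_probe functions are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helpers. Run them inside the container, for example from a shell opened with:
#   aws ecs execute-command --cluster cluster-name --task task-id \
#     --container container-name --interactive --command "/bin/sh"

# probe URL: prints "<http_code> <total_seconds>" for one request (000 if the connection fails).
probe() {
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' --max-time 60 "$1"
}

# evaluate_probe "CODE SECONDS" TIMEOUT_SECONDS
# Prints ok, bad-code (any code other than 200), or too-slow (not faster than the timeout).
evaluate_probe() {
  local timeout_seconds="$2" code secs
  # Split "CODE SECONDS" into two words.
  set -- $1
  code="$1" secs="$2"
  if [ "$code" != "200" ]; then
    echo bad-code
    return 1
  fi
  # awk does the floating-point comparison that the shell can't do.
  if awk -v s="$secs" -v t="$timeout_seconds" 'BEGIN { exit !((s + 0) < (t + 0)) }'; then
    echo ok
  else
    echo too-slow
    return 1
  fi
}

# Example: evaluate_probe "$(probe http://localhost:8080/health)" 5
```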
Check your health check configuration
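First, make sure that your application serves the health check resource at the path that you configured in the target group health check. For example, if the health check path points to a file, then confirm that the file exists at that location.
If the path is correct but health checks still fail, then complete the following steps: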
- Open the Amazon Elastic Compute Cloud (Amazon EC2) console.
- In the navigation pane, under Load balancing, choose Target groups.
- Select the target group.
- Choose the Health checks tab, and then choose Edit.
- On the Edit target group page, for Success codes, enter the HTTP code from your error message. For example, enter 404.
- Choose Save.
Important: Make sure that the target group protocol matches the protocol that your application supports, either HTTP or HTTPS.
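To test whether a status code passes a Success codes value before you change the target group, you can use a small check. The code_matches function is a hypothetical helper that handles only the single code (200), list (200,202), and range (200-299) forms. The AWS CLI command in the comment is an alternative to the console steps, and /health is a placeholder path.

```sh
#!/bin/sh
# Hypothetical helper: code_matches CODE MATCHER
# Prints "yes" if CODE is covered by MATCHER, such as "200", "200,202", or "200-299".
code_matches() {
  local code="$1" part lo hi
  # Split the matcher on commas.
  for part in $(printf '%s' "$2" | tr ',' ' '); do
    case "$part" in
      *-*)
        lo=${part%-*}; hi=${part#*-}
        if [ "$code" -ge "$lo" ] && [ "$code" -le "$hi" ]; then echo yes; return 0; fi ;;
      *)
        if [ "$code" -eq "$part" ]; then echo yes; return 0; fi ;;
    esac
  done
  echo no
  return 1
}

# Apply a new path and success codes (placeholder ARN and path):
#   aws elbv2 modify-target-group --target-group-arn target-group-arn \
#     --health-check-path /health --matcher HttpCode=200-399
```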
Check for resource constraints in Amazon ECS
Check the Amazon ECS service metrics for CPU and memory utilization. If CPUUtilization or MemoryUtilization is consistently high, then your tasks can experience performance issues that cause health checks to fail.
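To check whether utilization is consistently high, you can pull the Average datapoints for the service from Amazon CloudWatch and compare them with a threshold. The 80 percent threshold in this sketch is an assumption, and consistently_high is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: reads whitespace-separated utilization datapoints (percent)
# and prints "high" if every datapoint is at or above THRESHOLD, otherwise "ok".
# Example input (placeholder names and times):
#   aws cloudwatch get-metric-statistics --namespace AWS/ECS --metric-name CPUUtilization \
#     --dimensions Name=ClusterName,Value=cluster-name Name=ServiceName,Value=service-name \
#     --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T03:00:00Z \
#     --period 300 --statistics Average \
#     --query 'Datapoints[].Average' --output text | consistently_high 80
consistently_high() {
  awk -v t="$1" '
    { for (i = 1; i <= NF; i++) { n++; if ($i + 0 < t + 0) low++ } }
    END { if (n > 0 && low == 0) print "high"; else print "ok" }'
}
```

Run the same check with --metric-name MemoryUtilization to evaluate memory.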
To resolve resource constraints, take the following actions:
- Allocate more CPU and memory resources in the task definition.
- To scale horizontally, increase the number of running tasks. More tasks distribute the workload evenly and improve performance during sudden traffic spikes.
- Regularly test your application's performance under various load conditions to identify bottlenecks and determine the right resource allocation.
- Monitor your application's network usage, connection patterns, and response times to identify potential bottlenecks and optimize inefficient network usage patterns.
It's a best practice to frequently monitor your system's behavior and adjust your configuration as needed.
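When you allocate more CPU and memory in the task definition, Fargate accepts only certain combinations of task-level cpu and memory values. The following sketch checks a combination before you register a new task definition revision. It encodes the Linux task sizes that the Amazon ECS documentation lists at the time of writing, so verify the values against the current documentation. The fargate_size_valid function is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: fargate_size_valid CPU_UNITS MEMORY_MIB -> valid | invalid
# Values reflect the documented Linux Fargate task sizes at the time of writing.
fargate_size_valid() {
  local cpu="$1" mem="$2" min max step
  case "$cpu" in
    256)
      case "$mem" in
        512|1024|2048) echo valid; return 0 ;;
        *) echo invalid; return 1 ;;
      esac ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *)     echo invalid; return 1 ;;
  esac
  # Memory must be inside the range and on an increment boundary.
  if [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $(( (mem - min) % step )) -eq 0 ]; then
    echo valid
  else
    echo invalid
    return 1
  fi
}

# Example (placeholder file name):
#   fargate_size_valid 1024 3072 && aws ecs register-task-definition \
#     --cli-input-json file://task-definition.json --cpu 1024 --memory 3072
```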
Troubleshoot a failed container health check error
If a container in your task definition has a container health check that the container doesn't pass, then you receive the failed container health checks error. To troubleshoot this issue, see How do I troubleshoot the container health check failures for Amazon ECS tasks?
Troubleshoot Availability Zone issues with the target Amazon ECS tasks
If you register targets in an Availability Zone that isn't activated for your load balancer, then the registered targets don't receive traffic. For more information, see Availability Zones and load balancer nodes.
For example, your Amazon ECS subnets belong to the us-east-1x and us-east-1y Availability Zones, but the load balancer has only the us-east-1p and us-east-1q Availability Zones activated. Because your tasks run in Availability Zones that the load balancer doesn't use, you receive the Availability Zone error.
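To confirm the mismatch, compare the Availability Zones of your task subnets with the zones that the load balancer uses. In this sketch, missing_zones is a hypothetical helper that prints each task zone that the load balancer doesn't have activated. The subnet IDs and load balancer name in the comments are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: missing_zones "TASK_AZS" "LB_AZS"
# Both arguments are whitespace-separated zone names. Prints each task zone that is
# not in the load balancer's list, once per zone.
# Example inputs (placeholder IDs and names):
#   task_azs=$(aws ec2 describe-subnets --subnet-ids subnet-1 subnet-2 \
#     --query 'Subnets[].AvailabilityZone' --output text)
#   lb_azs=$(aws elbv2 describe-load-balancers --names load-balancer-name \
#     --query 'LoadBalancers[0].AvailabilityZones[].ZoneName' --output text)
#   missing_zones "$task_azs" "$lb_azs"
missing_zones() {
  local task_az lb_az found
  for task_az in $1; do
    found=no
    for lb_az in $2; do
      if [ "$task_az" = "$lb_az" ]; then found=yes; fi
    done
    if [ "$found" = no ]; then echo "$task_az"; fi
  done | sort -u
}
```

An empty result means that every task subnet is in an Availability Zone that the load balancer uses.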
To update your Amazon ECS service to use subnets in the load balancer's Availability Zones, run the update-service AWS CLI command:
aws ecs update-service --cluster cluster-name --service service-name --region region-code --network-configuration '{"awsvpcConfiguration": {"subnets": ["subnet-1","subnet-2"],"securityGroups": ["sg-abcdxyz"]}}'
Note: Replace cluster-name with your cluster name, service-name with your service name, and region-code with your AWS Region. Also, replace subnet-1 and subnet-2 with subnets in Availability Zones that are activated for your load balancer, and replace sg-abcdxyz with your security group.
Related information
Container definitions
Automatically scale your Amazon ECS service
Amazon ECS best practices
Monitoring Amazon ECS