
How do I troubleshoot container health check failures for Amazon ECS tasks?


My Amazon Elastic Container Service (Amazon ECS) task fails the container health check.

Short description

If a container in your Amazon ECS task can't pass the health check that's defined for it, then you receive the following error:

"(service AWS-Service) (task ff3e71a4-d7e5-428b-9232-2345657889) failed container health checks."

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Locally test the container to make sure that it passes the container health check

Before you deploy your container to Amazon ECS, make sure that it works as expected. Test your container locally with a HEALTHCHECK instruction in the Dockerfile (see HEALTHCHECK on the Docker Docs website), and confirm that the container passes that health check. Then, specify the health check configuration in the task definition so that the Amazon ECS container agent can monitor and report the health check.

Note: Amazon ECS doesn't monitor Docker health checks that are embedded in a container image and aren't specified in the container definition. Health check parameters that are specified in a container definition override Docker health checks that exist in the container image.
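When you test locally, Docker records the result of each HEALTHCHECK run in the container state, and `docker inspect --format '{{json .State.Health}}' <container>` prints that record as JSON. The following is a minimal Python sketch that summarizes that JSON so that the exit code and output of each probe are easy to read. The function name is illustrative, and the sketch assumes the `Status`, `FailingStreak`, and `Log` fields that Docker reports, and that the command prints `null` when no health check is configured.

```python
import json


def summarize_health(state_health_json):
    """Summarize the JSON printed by
    docker inspect --format '{{json .State.Health}}' <container>.
    """
    health = json.loads(state_health_json)
    if not health:
        # Assumption: docker prints "null" when the image has no HEALTHCHECK.
        return "no health check configured"
    lines = [
        f"status={health.get('Status')} "
        f"failing_streak={health.get('FailingStreak')}"
    ]
    # Each log entry is one probe run, with its exit code and captured output.
    for entry in health.get("Log") or []:
        output = (entry.get("Output") or "").strip()
        lines.append(f"exit={entry.get('ExitCode')} output={output!r}")
    return "\n".join(lines)
```

A probe that exits with a non-zero code and prints nothing shows up here with an empty output, which is a sign that the health check command itself needs better error messages.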

Confirm that you use the correct syntax for your Amazon ECS tasks

Confirm that the command that you pass to the container is accurate and that it uses the correct syntax. With CMD-SHELL, don't split the shell command into separately quoted array elements, for example ["CMD-SHELL", "healthcheck.sh", "||", "exit 1"]. Instead, pass the whole shell command as a single string:

["CMD-SHELL", "healthcheck.sh || exit 1"]
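The syntax rule above can be checked before you register a task definition. The following is a minimal Python sketch of such a check. The helper name is hypothetical, and it encodes only the one rule described in this section: a CMD-SHELL array should contain exactly one command string.

```python
def check_health_command(command):
    """Return a list of problems found in a health check command array.

    Only the CMD-SHELL rule from this article is checked; other forms
    of the array are passed through without comment.
    """
    problems = []
    if command and command[0] == "CMD-SHELL" and len(command) != 2:
        problems.append(
            "CMD-SHELL expects the whole shell command as one string, "
            f"got {len(command) - 1} separate elements"
        )
    return problems
```

For example, `check_health_command(["CMD-SHELL", "healthcheck.sh", "||", "exit 1"])` returns one problem, and the single-string form returns an empty list.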

Give your container enough time to initiate

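If your container takes a long time to initialize, then it might fail the container health check before the application is ready. Set the startPeriod parameter in the healthCheck section of the container definition to a grace period that covers your container's startup time. Failed health checks during the start period don't count toward the maximum number of retries.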
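The startPeriod value sits next to the other health check parameters in the container definition. The following Python sketch builds a `healthCheck` object and rejects values outside the ranges that the Amazon ECS API reference documents (interval 5-300 seconds, timeout 2-60 seconds, retries 1-10, startPeriod 0-300 seconds). The function name and the 120-second value in the usage note are illustrative assumptions, not recommendations.

```python
# Allowed ranges for ECS health check parameters.
# All values are in seconds except retries, which is a count.
LIMITS = {
    "interval": (5, 300),
    "timeout": (2, 60),
    "retries": (1, 10),
    "startPeriod": (0, 300),
}


def build_health_check(command, interval=30, timeout=5, retries=3, start_period=0):
    """Build the healthCheck object for a container definition."""
    health_check = {
        "command": command,
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
    # Fail early on values that the service would reject.
    for name, (low, high) in LIMITS.items():
        value = health_check[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    return health_check
```

For a container that needs about two minutes to start, `build_health_check(["CMD-SHELL", "healthcheck.sh || exit 1"], start_period=120)` returns a dictionary that can be used as the `healthCheck` value in a container definition.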

For tasks that are running for a long time, check your application logs

If your container is running for a long time and fails the container health check, then check your application logs. If your task uses the awslogs log driver, then check your application logs in Amazon CloudWatch.
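When the awslogs driver is configured with the `awslogs-stream-prefix` option, the log stream name has the form `prefix-name/container-name/ecs-task-id`. The following Python sketch builds that name and reads recent messages through a CloudWatch Logs client that the caller supplies, for example `boto3.client("logs")`. The function names are illustrative, and the log group, prefix, and container names that you pass in come from your own task definition.

```python
def log_stream_name(stream_prefix, container_name, task_id):
    """Build the stream name that the awslogs driver uses when a prefix is set."""
    return f"{stream_prefix}/{container_name}/{task_id}"


def recent_messages(logs_client, log_group, stream_name, limit=50):
    """Return up to `limit` of the most recent messages from one log stream.

    `logs_client` is assumed to expose get_log_events in the same way as
    the boto3 CloudWatch Logs client.
    """
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=stream_name,
        limit=limit,
        startFromHead=False,  # read from the end of the stream
    )
    return [event["message"] for event in response.get("events", [])]
```

Passing the client in as a parameter keeps the sketch free of credentials and Region settings, which depend on your environment.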

Note: AWS Fargate is a managed service, so you can't access the underlying infrastructure. To troubleshoot this issue for Fargate tasks, launch the same tasks with the Amazon Elastic Compute Cloud (Amazon EC2) launch type. Then, use SSH to connect to your Amazon EC2 instances. You can also use ECS Exec to interact directly with your Amazon ECS containers, including containers that run on Fargate.
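Before you connect to an instance or a container, it can help to confirm which container in the task reports an unhealthy status. The following Python sketch extracts the task ID from a service event like the one in the short description, and then lists the unhealthy containers in a DescribeTasks response. The function names are illustrative, and `response` is assumed to be the dictionary that the ECS DescribeTasks API returns, for example from the boto3 `describe_tasks` call.

```python
import re

# Matches the task ID in a service event such as
# "(service AWS-Service) (task <id>) failed container health checks."
TASK_ID_PATTERN = re.compile(r"\(task ([0-9a-f-]+)\)")


def extract_task_id(event_message):
    """Return the task ID from a service event message, or None if absent."""
    match = TASK_ID_PATTERN.search(event_message)
    return match.group(1) if match else None


def unhealthy_containers(response):
    """List (task ARN, container name) pairs whose healthStatus is UNHEALTHY."""
    results = []
    for task in response.get("tasks", []):
        for container in task.get("containers", []):
            if container.get("healthStatus") == "UNHEALTHY":
                results.append((task.get("taskArn"), container.get("name")))
    return results
```

With boto3, the response could come from `boto3.client("ecs").describe_tasks(cluster=cluster_name, tasks=[task_id])`, where `cluster_name` is your own cluster.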

Related information

How do I get my Amazon ECS tasks that use the Amazon EC2 launch type to pass the Application Load Balancer health check?

5 Comments

Is there a way to alert on this health check?

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 years ago

Exactly my thinking. The current state of these logs and the UI is horrible.

replied 2 years ago


Yep. The UI for knowing WHY something failed is still horrible. Amazon has done nothing to help out in this area. But our health check commands are usually to blame. Most are just one-line Linux curl or wget commands with a || exit 1 tacked onto the end. That yields only the frustratingly unhelpful message "health check return exit code 1" in the ECS console and nothing more, because the exit code is the only thing the command returns. So the use of exit 1 is really the problem here. What we want is a command that prints something and returns a non-zero code if it fails, but prints nothing and returns 0 if everything passes. This probably means that if you are going to use a health check command, you should write a proper shell script that can respond to all of the special conditions your health check procedure will encounter, and print good debugging messages when it fails. The other option is to forgo health check commands altogether, because typically ELB will health check your app for you. You won't get auto-restart of containers, but you'll have more stability in situations like this, where poorly chosen health check commands make the problem harder to debug.

replied 5 months ago