I want to troubleshoot a failed Amazon Elastic Container Service (Amazon ECS) task in an ECS cluster.
Short description
Your containers can exit because of image issues, application issues, resource constraints, or other issues.
If your task fails because of image issues, then see How do I resolve the "Image does not exist" error when my tasks fail to start in my Amazon ECS cluster?
For AWS Fargate tasks that unexpectedly stop, see Amazon ECS stopped tasks error messages.
To analyze other issues that cause an ECS task not to start, you can use the AWSSupport-TroubleshootECSTaskFailedToStart AWS Systems Manager automation runbook.
You can also manually troubleshoot your Amazon ECS cluster.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Run the runbook to troubleshoot your cluster
Before you start the AWSSupport-TroubleshootECSTaskFailedToStart runbook, make sure that your IAM user or role has the required permissions. For more information, see the Required IAM permissions section of AWSSupport-TroubleshootECSTaskFailedToStart.
To launch the runbook, complete the following steps:
- Open the AWSSupport-TroubleshootECSTaskFailedToStart document on the AWS Systems Manager console.
- Choose Execute automation.
- Enter the following values for the input parameters:
  - (Optional) AutomationAssumeRole: The ARN of the IAM role that allows Automation to perform actions for you. If you don't specify a role, then the automation uses the permissions of the user that starts the runbook.
  - ClusterName: The name of the ECS cluster where the task failed to start.
  - (Optional) CloudwatchRetentionPeriod: The retention period, in days, to store the AWS Lambda function logs in Amazon CloudWatch Logs.
    Note: Define this value when the analysis determines that you must test network connectivity. The default value is 30 days. Other valid values are 1, 3, 5, 7, 14, 60, and 90.
  - TaskId: The ID of the most recently failed task.
- Choose Execute.
Note: For more information about the runbook's steps, see the Document Steps section of AWSSupport-TroubleshootECSTaskFailedToStart.
- Review the detailed results in the Outputs section. The output includes the following information:
  - TaskFailureReason: An analysis of the reason for the final task failure.
  - ExecutionLogs: The output logs of each step that the runbook performed.
  - ENI_Deletion_Message.Status: The status of Lambda virtual private cloud (VPC) elastic network interface deletion when the runbook creates a Lambda function to test network connectivity.
    Note: If ENI_Deletion_Message shows that the network interface wasn't deleted, then manually delete the network interface.
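If you prefer to start the runbook from a script instead of the console, the following is a minimal sketch that uses the AWS SDK for Python (Boto3). It assumes that Boto3 is installed and that your credentials have the required permissions. The helper names build_runbook_parameters and start_runbook are hypothetical and aren't part of the runbook. The sketch validates CloudwatchRetentionPeriod against the values listed in the preceding steps and passes each parameter as a list of strings, which is the shape that the StartAutomationExecution API expects.

```python
from typing import Dict, List, Optional

RUNBOOK_NAME = "AWSSupport-TroubleshootECSTaskFailedToStart"
# Retention periods (in days) listed in the steps above; 30 is the default.
VALID_RETENTION_DAYS = {1, 3, 5, 7, 14, 30, 60, 90}


def build_runbook_parameters(
    cluster_name: str,
    task_id: str,
    retention_days: Optional[int] = None,
    assume_role_arn: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Build the Parameters mapping for the runbook (hypothetical helper).

    Automation expects every parameter value as a list of strings.
    """
    if not cluster_name or not task_id:
        raise ValueError("ClusterName and TaskId are required")
    params = {"ClusterName": [cluster_name], "TaskId": [task_id]}
    if retention_days is not None:
        if retention_days not in VALID_RETENTION_DAYS:
            raise ValueError(
                f"CloudwatchRetentionPeriod must be one of {sorted(VALID_RETENTION_DAYS)}"
            )
        params["CloudwatchRetentionPeriod"] = [str(retention_days)]
    if assume_role_arn:
        params["AutomationAssumeRole"] = [assume_role_arn]
    return params


def start_runbook(parameters: Dict[str, List[str]], region_name: Optional[str] = None) -> str:
    """Start the automation and return its execution ID.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the helper above works without boto3

    ssm = boto3.client("ssm", region_name=region_name)
    response = ssm.start_automation_execution(
        DocumentName=RUNBOOK_NAME, Parameters=parameters
    )
    return response["AutomationExecutionId"]
```

For example, start_runbook(build_runbook_parameters("my-cluster", "my-task-id")) starts the automation with the default retention period. The cluster name and task ID in this call are placeholders. You can then review the results in the Outputs section of the execution on the Systems Manager console.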
Manually troubleshoot your ECS cluster
Check for diagnostic information in the service event log. Also, use the Amazon ECS console or the AWS CLI to check stopped tasks for errors.
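To check stopped tasks from a script, you can condense the DescribeTasks response to the fields that usually explain a failure. The following is a minimal sketch that assumes Boto3 is installed and credentials are configured. The function names are hypothetical. It reads the stopCode, stoppedReason, and per-container exitCode and reason fields from the response.

```python
from typing import Any, Dict, List, Optional


def summarize_stopped_tasks(describe_tasks_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce a DescribeTasks response to the fields that explain why tasks stopped."""
    summaries = []
    for task in describe_tasks_response.get("tasks", []):
        summaries.append(
            {
                "taskArn": task.get("taskArn"),
                "stopCode": task.get("stopCode"),
                "stoppedReason": task.get("stoppedReason"),
                "containers": [
                    {
                        "name": container.get("name"),
                        "exitCode": container.get("exitCode"),
                        "reason": container.get("reason"),
                    }
                    for container in task.get("containers", [])
                ],
            }
        )
    return summaries


def fetch_stopped_tasks(cluster: str, region_name: Optional[str] = None) -> Dict[str, Any]:
    """List stopped tasks in the cluster and describe them.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the summary helper works without boto3

    ecs = boto3.client("ecs", region_name=region_name)
    task_arns = ecs.list_tasks(cluster=cluster, desiredStatus="STOPPED")["taskArns"]
    if not task_arns:
        return {"tasks": []}
    return ecs.describe_tasks(cluster=cluster, tasks=task_arns)
```

For example, summarize_stopped_tasks(fetch_stopped_tasks("my-cluster")) returns one entry for each stopped task, where my-cluster is a placeholder for your cluster name.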
To address memory constraint issues, see How can I allocate memory to tasks in Amazon ECS?
If you already configured a log driver, then check your application logs for application issues. For example, if you configured the awslogs log driver in your task definition, then view the awslogs container logs in Amazon CloudWatch. Or, use the log configuration options in your task definition to send logs to a supported log driver for the container.
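When the awslogs log driver is configured with the awslogs-stream-prefix option, the log stream name takes the form prefix-name/container-name/ecs-task-id. The following is a minimal sketch that builds that stream name and reads the most recent events. It assumes that Boto3 is installed, that credentials are configured, and that you know the log group name from your task definition. The function names are hypothetical.

```python
from typing import List, Optional


def task_id_from_arn(task_arn: str) -> str:
    """Return the task ID, which is the last path segment of a task ARN."""
    return task_arn.rsplit("/", 1)[-1]


def awslogs_stream_name(stream_prefix: str, container_name: str, task_id: str) -> str:
    """Build the stream name that awslogs uses when awslogs-stream-prefix is set."""
    return f"{stream_prefix}/{container_name}/{task_id}"


def fetch_container_logs(
    log_group: str,
    stream_name: str,
    limit: int = 50,
    region_name: Optional[str] = None,
) -> List[str]:
    """Return the most recent log messages from one log stream.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    logs = boto3.client("logs", region_name=region_name)
    response = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream_name,
        limit=limit,
        startFromHead=False,  # read from the end of the stream
    )
    return [event["message"] for event in response["events"]]
```

The log group name, stream prefix, and container name come from the logConfiguration section of your task definition, so replace any example values with your own.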
ECS tasks other than Fargate
If you use the default json-file log driver with the Amazon Elastic Compute Cloud (Amazon EC2) launch type, then connect to your ECS container instance and run the following docker logs command:
docker logs dc7240fe892a
Note: Replace dc7240fe892a with your container ID.
The preceding command shows the Docker logs of the container on your ECS container instance. For more information about the json-file log driver, see JSON File logging driver on the Docker website.
Fargate tasks
If your Fargate task uses the awslogs log driver, then the driver passes the container logs from Docker to CloudWatch Logs. The logs contain the STDOUT and STDERR I/O streams from the command outputs.
Related information
AWS Support Automation Workflows (SAW)
Setting up Automation
Run an automated operation powered by Systems Manager Automation