How do I troubleshoot an Amazon ECS task that failed to start in an ECS cluster?


I want to use the AWSSupport-TroubleshootECSTaskFailedToStart runbook to troubleshoot a failed Amazon Elastic Container Service (Amazon ECS) task in an ECS cluster.

Short description

Use the AWSSupport-TroubleshootECSTaskFailedToStart AWS Systems Manager automation runbook to analyze the issues that might prevent an ECS task from starting. The runbook analyzes Amazon Virtual Private Cloud (Amazon VPC) and network connectivity, security group and logging configurations, AWS Identity and Access Management (IAM) permissions, and AWS Secrets Manager secrets.

Before you start the AWSSupport-TroubleshootECSTaskFailedToStart runbook, make sure that your IAM user or role has the required permissions. For more information, see the Required IAM permissions section of AWSSupport-TroubleshootECSTaskFailedToStart.

Resolution

To launch the runbook, complete the following steps:

  1. Navigate to the AWSSupport-TroubleshootECSTaskFailedToStart document in the AWS Systems Manager console.
  2. Choose Execute automation.
  3. Enter the following values for the input parameters:
    AutomationAssumeRole (optional): The ARN of the IAM role that allows Automation to perform actions. If you don't specify a role, then the automation uses the permissions of the user that starts the runbook.
    ClusterName (required): The name of the ECS cluster where the task failed to start.
    CloudwatchRetentionPeriod (optional): The retention period, in days, for the AWS Lambda function logs that are stored in Amazon CloudWatch Logs. The runbook uses this value only when the analysis determines that network connectivity must be tested. Valid values are 1 | 3 | 5 | 7 | 14 | 30 | 60 | 90. The default value is 30.
    TaskId (required): The ID of the most recently failed task.
  4. Choose Execute.
    Note: For more information about the runbook's steps, see the Document Steps section of AWSSupport-TroubleshootECSTaskFailedToStart.
  5. After the automation completes, review the detailed results in the Outputs section. The output includes the following information:
    TaskFailureReason: The final task failure reason analysis with an explanation.
    ExecutionLogs: The output logs of each step performed by the runbook.
    ENI_Deletion_Message.Status: The deletion status of the Lambda function's VPC elastic network interface. This output applies when the runbook creates a Lambda function to test network connectivity.
    Note: If ENI_Deletion_Message shows that the elastic network interface hasn't been deleted, then delete the network interface manually.
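
If you prefer to start the runbook from a script instead of the console, the preceding steps can be approximated with the AWS SDK for Python (Boto3). The following is a minimal sketch and not part of the runbook's documentation. The helper names build_parameters, flatten_outputs, and run_runbook are hypothetical. The set of statuses treated as "still running" is an assumption, and the exact keys that appear in the execution's Outputs map are not confirmed here, so the sketch returns all of them.

```python
RUNBOOK = "AWSSupport-TroubleshootECSTaskFailedToStart"
# Retention values listed for CloudwatchRetentionPeriod, including the default of 30.
VALID_RETENTION_DAYS = {1, 3, 5, 7, 14, 30, 60, 90}


def build_parameters(cluster_name, task_id, retention_days=30, assume_role_arn=None):
    """Build the runbook's input parameters (hypothetical helper).

    Systems Manager Automation expects every parameter value as a list of strings.
    """
    if not cluster_name or not task_id:
        raise ValueError("ClusterName and TaskId are required")
    if retention_days not in VALID_RETENTION_DAYS:
        raise ValueError(
            f"CloudwatchRetentionPeriod must be one of {sorted(VALID_RETENTION_DAYS)}"
        )
    params = {
        "ClusterName": [cluster_name],
        "TaskId": [task_id],
        "CloudwatchRetentionPeriod": [str(retention_days)],
    }
    # AutomationAssumeRole is optional; omit it to use the caller's permissions.
    if assume_role_arn:
        params["AutomationAssumeRole"] = [assume_role_arn]
    return params


def flatten_outputs(outputs):
    """Join each output's list of strings into one string for easier reading."""
    return {key: "\n".join(values) for key, values in outputs.items()}


def run_runbook(cluster_name, task_id, poll_seconds=15, **kwargs):
    """Start the automation, wait for it to finish, and return (status, outputs)."""
    import time

    import boto3  # imported here so the helpers above work without Boto3 installed

    ssm = boto3.client("ssm")
    execution_id = ssm.start_automation_execution(
        DocumentName=RUNBOOK,
        Parameters=build_parameters(cluster_name, task_id, **kwargs),
    )["AutomationExecutionId"]

    # Assumption: these statuses mean the automation is still running.
    in_progress = {"Pending", "InProgress", "Waiting", "Cancelling"}
    while True:
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status not in in_progress:
            return status, flatten_outputs(execution.get("Outputs", {}))
        time.sleep(poll_seconds)
```

For example, run_runbook("my-cluster", "my-task-id") (both values are placeholders) starts the automation with the default retention period and returns the final status together with the outputs. Look through the returned outputs for the TaskFailureReason, ExecutionLogs, and ENI_Deletion_Message entries that are described in step 5. The caller needs the same IAM permissions that the runbook requires when you start it from the console.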

Related information

AWS Support Automation Workflows (SAW)

Setting up Automation

Running automations