ecs execute-command not working for pid_mode=task

I have an ECS task running on Fargate with three containers. The task is configured with pid_mode=task because I want to collect process metrics via an agent in one of the containers. However, when I try to connect to my containers via "ecs execute-command", it only works for the container that is defined first in my task. For the other two containers I always get the error "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later". If I remove the setting pid_mode=task, I can connect to all three containers of my task without any problem.

The utility "check-ecs-exec.sh" doesn't show any issues either; everything is green or at least yellow:

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/1.27.44 Python/3.10.12 Linux/5.15.0-91-generic botocore/1.29.44)
  Session Manager Plugin | OK (1.2.536.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: ecstest-stage
Task   : 97d1a…
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::…
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "datadog-agent"
         2. RUNNING for "log-router"
         3. RUNNING for "ecstest-webserver"
    ----------
      Init Process Enabled (ecstest-stage:28)
    ----------
         1. Enabled - "ecstest-webserver"
         2. Enabled - "log-router"
         3. Enabled - "datadog-agent"
    ----------
      Read-Only Root Filesystem (ecstest-stage:28)
    ----------
         1. Disabled - "ecstest-webserver"
         2. Disabled - "log-router"
         3. Disabled - "datadog-agent"
  Task Role Permissions  | arn:aws:iam::…
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-0d10e…:
      - com.amazonaws.eu-west-1.s3
    SSM PrivateLink "com.amazonaws.eu-west-1.ssmmessages" not found. You must ensure your task has proper outbound internet connectivity.
  Environment Variables  | (ecstest-stage:28)
       1. container "ecstest-webserver"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       2. container "log-router"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       3. container "datadog-agent"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
MMoench
asked 4 months ago, 386 views
3 Answers
Accepted Answer

At this time, the behavior of Amazon ECS is non-deterministic with respect to enableExecuteCommand when pidMode is set to task. The AWS SSM agent (which powers the feature) will be running in one of the containers only, but right now you cannot specify which container is the one in which it will run, nor can you specify that you want it to run in all of them.

The ECS service team is aware of this limitation. If you'd like to track the progress of the issue, I'd recommend you create a GitHub Issue at https://github.com/aws/containers-roadmap/issues and discuss your use case.
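
Since you cannot tell in advance which container hosts the agent, one practical way to find out is to try each container in turn. The following is a minimal Python sketch, not an official AWS tool: `build_exec_probes` is a hypothetical helper that only assembles the `aws ecs execute-command` invocations, `<TASK_ID>` is a placeholder, and the container names are the ones from the question. It also assumes `/bin/sh` exists in each image.

```python
import shlex


def build_exec_probes(cluster, task, containers, command="/bin/sh"):
    """Return one `aws ecs execute-command` argv list per container.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    return [
        [
            "aws", "ecs", "execute-command",
            "--cluster", cluster,
            "--task", task,
            "--container", name,
            "--interactive",
            "--command", command,
        ]
        for name in containers
    ]


# Container names taken from the question; <TASK_ID> is a placeholder.
for argv in build_exec_probes(
    "ecstest-stage",
    "<TASK_ID>",
    ["ecstest-webserver", "log-router", "datadog-agent"],
):
    print(shlex.join(argv))
```

Running the printed commands one by one should show which container accepts the session; based on the behavior described in the question, the others would be expected to fail with TargetNotConnectedException.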

AWS
EXPERT
answered 3 months ago

Hi MMoench,

Did you make sure that "enableExecuteCommand" is set to true for your task? You can check it in the JSON returned by:

aws ecs describe-tasks --cluster <CLUSTER> --tasks <TASK_ARN>

If not, try enabling it.
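
To make that check less manual, here is a small Python sketch that summarises the relevant fields of the describe-tasks output. `exec_readiness` is a hypothetical helper, and the sample JSON (including the task ARN) is made up for illustration; only the container names come from the question. It assumes the response carries `enableExecuteCommand` on the task and a `managedAgents` list on each container.

```python
import json


def exec_readiness(describe_tasks_output):
    """Summarise ECS Exec readiness from `aws ecs describe-tasks` JSON text.

    Returns {task_arn: {"enableExecuteCommand": bool, "agents": {container: status}}}.
    """
    report = {}
    for task in json.loads(describe_tasks_output).get("tasks", []):
        agents = {}
        for container in task.get("containers", []):
            status = "MISSING"  # no ExecuteCommandAgent entry found
            for agent in container.get("managedAgents", []):
                if agent.get("name") == "ExecuteCommandAgent":
                    status = agent.get("lastStatus", "UNKNOWN")
            agents[container["name"]] = status
        report[task["taskArn"]] = {
            "enableExecuteCommand": task.get("enableExecuteCommand", False),
            "agents": agents,
        }
    return report


# Made-up sample input; the ARN is a placeholder, not a real task.
sample = json.dumps({
    "tasks": [{
        "taskArn": "arn:aws:ecs:eu-west-1:123456789012:task/ecstest-stage/EXAMPLE",
        "enableExecuteCommand": True,
        "containers": [
            {"name": "ecstest-webserver",
             "managedAgents": [{"name": "ExecuteCommandAgent", "lastStatus": "RUNNING"}]},
            {"name": "log-router",
             "managedAgents": [{"name": "ExecuteCommandAgent", "lastStatus": "RUNNING"}]},
            {"name": "datadog-agent", "managedAgents": []},
        ],
    }]
})

print(json.dumps(exec_readiness(sample), indent=2))
```

Keep in mind that in the question every agent reports RUNNING while exec still fails for two of the containers, so a clean result here does not prove that exec will work in each container.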

Best,

Didier

AWS
EXPERT
answered 4 months ago
  • Yes, "enableExecuteCommand" is indeed set to true for the task, which I have confirmed by running "describe-tasks". Additionally, I am able to log in to the first container in my example above (ecstest-webserver); only the other two containers don't work while "pid_mode=task" is enabled.

Hello, is there a solution to this? I have the same issue. I'm using a Datadog sidecar for metrics and logging and need to be able to send a flare as described here: https://docs.datadoghq.com/agent/troubleshooting/send_a_flare/?tab=agentv6v7#ecs-fargate

answered 2 months ago
