ecs execute-command not working for pid_mode=task

I have an ECS task running on Fargate with three containers. The task is configured with pid_mode=task because I want to collect process metrics via an agent in one of the containers. However, when I try to connect to my containers via "ecs execute-command", it only works for the container that is defined first in my task. For the other two containers I always get the error "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later". If I remove the pid_mode=task setting, I can connect to all three containers of my task without any problem.

The utility "check-ecs-exec.sh" also doesn't show any issues, everything is either green or at least yellow:

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/1.27.44 Python/3.10.12 Linux/5.15.0-91-generic botocore/1.29.44)
  Session Manager Plugin | OK (1.2.536.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: ecstest-stage
Task   : 97d1a…
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::…
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "datadog-agent"
         2. RUNNING for "log-router"
         3. RUNNING for "ecstest-webserver"
    ----------
      Init Process Enabled (ecstest-stage:28)
    ----------
         1. Enabled - "ecstest-webserver"
         2. Enabled - "log-router"
         3. Enabled - "datadog-agent"
    ----------
      Read-Only Root Filesystem (ecstest-stage:28)
    ----------
         1. Disabled - "ecstest-webserver"
         2. Disabled - "log-router"
         3. Disabled - "datadog-agent"
  Task Role Permissions  | arn:aws:iam::…
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-0d10e…:
      - com.amazonaws.eu-west-1.s3
    SSM PrivateLink "com.amazonaws.eu-west-1.ssmmessages" not found. You must ensure your task has proper outbound internet connectivity.
  Environment Variables  | (ecstest-stage:28)
       1. container "ecstest-webserver"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       2. container "log-router"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       3. container "datadog-agent"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
MMoench
asked 4 months ago, 424 views
3 Answers
Accepted Answer

At this time, the behavior of Amazon ECS is non-deterministic with respect to enableExecuteCommand when pidMode is set to task. The AWS SSM agent (which powers the feature) will be running in one of the containers only, but right now you cannot specify which container is the one in which it will run, nor can you specify that you want it to run in all of them.

The ECS service team is aware of this limitation. If you'd like to track the progress of the issue, I'd recommend you create a GitHub Issue at https://github.com/aws/containers-roadmap/issues and discuss your use case.
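
Until that changes, one way to find out which container currently accepts sessions is simply to try each one in turn. The following is a minimal Python sketch under stated assumptions, not an official procedure: it only builds the `aws ecs execute-command` invocations and prints them, so nothing is sent to AWS. The helper name `build_exec_commands` and the `<TASK_ID>` placeholder are hypothetical; the cluster and container names are taken from the question.

```python
import shlex


def build_exec_commands(cluster, task, containers, command="/bin/sh"):
    """Return one `aws ecs execute-command` argv list per container.

    Hypothetical helper: it does not call AWS, it only assembles the
    command lines so they can be reviewed or run one by one.
    """
    return [
        [
            "aws", "ecs", "execute-command",
            "--cluster", cluster,
            "--task", task,
            "--container", name,
            "--interactive",
            "--command", command,
        ]
        for name in containers
    ]


# "<TASK_ID>" is a placeholder; substitute the ID or ARN of your own task.
commands = build_exec_commands(
    "ecstest-stage",
    "<TASK_ID>",
    ["ecstest-webserver", "log-router", "datadog-agent"],
)
for argv in commands:
    print(shlex.join(argv))
```

With pidMode set to task, expect at most one of these commands to open a session; the others should fail with the TargetNotConnectedException shown in the question.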

AWS
EXPERT
answered 4 months ago

Hi MMoench,

Did you make sure that "enableExecuteCommand" is set to true for your task? You can check it in the output of:

aws ecs describe-tasks --cluster <CLUSTER> --tasks <TASK_ARN>

If not, try enabling it.
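
To make that check less error-prone, the relevant fields can be pulled out of the describe-tasks JSON programmatically. This is a rough Python sketch, assuming the response has been parsed into a dict: it reads the task's `enableExecuteCommand` flag and the `lastStatus` of each container's `ExecuteCommandAgent` managed agent. The function name `exec_readiness` and the trimmed `sample` response are made up for illustration.

```python
import json


def exec_readiness(describe_tasks_output):
    """Summarise ECS Exec state for the first task in a describe-tasks response."""
    task = describe_tasks_output["tasks"][0]
    agents = {}
    for container in task.get("containers", []):
        # "MISSING" is this sketch's own label for "no exec agent listed".
        status = "MISSING"
        for agent in container.get("managedAgents", []):
            if agent.get("name") == "ExecuteCommandAgent":
                status = agent.get("lastStatus", "UNKNOWN")
        agents[container["name"]] = status
    return {
        "enableExecuteCommand": task.get("enableExecuteCommand", False),
        "agents": agents,
    }


# Hypothetical, heavily trimmed response used only for illustration.
sample = {
    "tasks": [
        {
            "enableExecuteCommand": True,
            "containers": [
                {
                    "name": "ecstest-webserver",
                    "managedAgents": [
                        {"name": "ExecuteCommandAgent", "lastStatus": "RUNNING"}
                    ],
                },
                {"name": "log-router", "managedAgents": []},
            ],
        }
    ]
}

print(json.dumps(exec_readiness(sample), indent=2))
```

For a real task, the dict could come from `json.load(sys.stdin)` with the describe-tasks output piped in. Note that a RUNNING agent status does not by itself guarantee that a session can be opened: in the question above, all three agents report RUNNING while two containers still reject execute-command.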

Best,

Didier

AWS
EXPERT
answered 4 months ago
  • Yes, "enableExecuteCommand" is indeed set to true, which I have confirmed by running "describe-tasks". Additionally, I am able to log in to the first container in my example above (ecstest-webserver); only the other two containers fail while "pid_mode=task" is enabled.

Hello, is there a solution to this? I have the same issue. I'm using a Datadog sidecar for metrics and logging and need to be able to send a flare as per: https://docs.datadoghq.com/agent/troubleshooting/send_a_flare/?tab=agentv6v7#ecs-fargate

answered 2 months ago
