ecs execute-command not working for pid_mode=task

0

I have an ECS task running on Fargate with three containers. The task is configured to run with the pid_mode=task (because I want to collect process-metrics via an agent in one of the containers). However when I try to connect to my containers via "ecs execute-command", I am only able to do so for the container which is defined first in my task, for the other two containers I always get the error "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later". If I remove the setting pid_mode=task, then I am able to connect to all three containers of my task without any problem.

The utility "check-ecs-exec.sh" also doesn't show any issues, everything is either green or at least yellow:

-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/1.27.44 Python/3.10.12 Linux/5.15.0-91-generic botocore/1.29.44)
  Session Manager Plugin | OK (1.2.536.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: ecstest-stage
Task   : 97d1a…
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::…
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks | 
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "datadog-agent"
         2. RUNNING for "log-router"
         3. RUNNING for "ecstest-webserver"
    ----------
      Init Process Enabled (ecstest-stage:28)
    ----------
         1. Enabled - "ecstest-webserver"
         2. Enabled - "log-router"
         3. Enabled - "datadog-agent"
    ----------
      Read-Only Root Filesystem (ecstest-stage:28)
    ----------
         1. Disabled - "ecstest-webserver"
         2. Disabled - "log-router"
         3. Disabled - "datadog-agent"
  Task Role Permissions  | arn:aws:iam::…
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | 
    Found existing endpoints for vpc-0d10e…:
      - com.amazonaws.eu-west-1.s3
    SSM PrivateLink "com.amazonaws.eu-west-1.ssmmessages" not found. You must ensure your task has proper outbound internet connectivity.  Environment Variables  | (ecstest-stage:28)
       1. container "ecstest-webserver"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       2. container "log-router"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
       3. container "datadog-agent"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: not defined
       - AWS_SECRET_ACCESS_KEY: not defined
MMoench
posta 4 mesi fa424 visualizzazioni
3 Risposte
0
Risposta accettata

At this time, the behavior of Amazon ECS is non-deterministic with respect to enableExecuteCommand when pidMode is set to task. The AWS SSM agent (which powers the feature) will be running in one of the containers only, but right now you cannot specify which container is the one in which it will run, nor can you specify that you want it to run in all of them.

The ECS service team is aware of this limitation. If you'd like to track the progress of the issue, I'd recommend you create a GitHub Issue at https://github.com/aws/containers-roadmap/issues and discuss your use case.

AWS
ESPERTO
con risposta 4 mesi fa
0

Hi MMoench,

Did you make sure that in your task definition's structure/JSON "enableExecuteCommand" is set to true ?

aws ecs describe-tasks --cluster <CLUSTER> --tasks <TASK_ARN>

If not try to enable it.

Best,

Didier

profile pictureAWS
ESPERTO
con risposta 4 mesi fa
  • Yes, "enableExecuteCommand" is indeed set to true in the task definition, which I have confirmed by running "describe-tasks". Additionally I am able to log in to the first container in my example above (ecstest-webserver), just the other two containers don't work while "pid_mode=task" is enabled.

0

hello - is there a solution to this? I have the same issue. I'm using a datadog sidecar for metrics and logging and need to be able to send a flare as per: https://docs.datadoghq.com/agent/troubleshooting/send_a_flare/?tab=agentv6v7#ecs-fargate

con risposta 2 mesi fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande