ECS Fargate Task Losing Internet Connectivity


Running a task using aws ecs run-task works well for one-off tasks, except that the tasks always lose internet connectivity within about 5 minutes of starting.

Here is the script I'm using to start the task:

export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

cmd=$(cat <<EOF
{
    "containerOverrides": [
        {
            "name": "${CONTAINER}",
            "command": ${COMMAND}
        }
    ]
}
EOF
)

taskDefinition=$( \
    aws ecs list-task-definitions --region us-east-1 \
        --family-prefix ${TASK_DEFINITION} \
        --output text \
    | grep ${TASK_DEFINITION} | tail -n1 | awk -F '/' '{print $NF}' \
)

echo "Running '${SCRIPT_NAME}' using '$taskDefinition' task definition"
aws ecs run-task --region us-east-1 \
    --cluster ${ECS_CLUSTER} \
    --launch-type FARGATE \
    --started-by "ManualScript" \
    --task-definition $taskDefinition \
    --network-configuration "awsvpcConfiguration={subnets=[${TASK_SUBNET}],securityGroups=[${TASK_SECURITY_GROUP}],assignPublicIp=ENABLED}" \
    --overrides "$cmd"

My first thought was auto-scaling, but neither the container nor the task definition family has any auto-scaling rules, so I'm not sure how to track down the cause.

2 Answers

ECS Fargate uses the awsvpc network mode. An ENI is attached to the ECS task at launch and stays attached until the task is terminated. Your issue might not be related to ECS itself; it may instead be caused by a network configuration on your end.
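
To confirm that the ENI is still attached and healthy while the task is running, something like the following sketch may help. The helper names task_eni and eni_summary are hypothetical, and the region is assumed to be us-east-1 as in the question's script.

```shell
# Hypothetical helper: print the ID of the ENI attached to a Fargate task.
# Usage: task_eni <cluster> <task-arn-or-id>
task_eni() {
    aws ecs describe-tasks --region us-east-1 \
        --cluster "$1" --tasks "$2" \
        --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" \
        --output text
}

# Hypothetical helper: show status, public IP and security groups of an ENI.
# Usage: eni_summary <eni-id>
eni_summary() {
    aws ec2 describe-network-interfaces --region us-east-1 \
        --network-interface-ids "$1" \
        --query 'NetworkInterfaces[0].[Status,Association.PublicIp,Groups[].GroupId]' \
        --output json
}
```

Running eni_summary "$(task_eni "$ECS_CLUSTER" <task-id>)" shortly after launch and again once connectivity drops would show whether the public IP or the security groups changed in between.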

I see that you are assigning a public IP to your ECS task. It is possible that a security compliance process (probably AWS Config) is modifying the task's security group or other network configuration to deny network access to your task for security reasons.

To figure this out, go to CloudTrail and check whether any events were triggered around the time you lost network connectivity. I hope this helps!
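
The same CloudTrail search can be done from the CLI. This is a minimal sketch, assuming the us-east-1 region from the question; events_between is a hypothetical helper name, and lookup-events accepts only one lookup attribute per call, so it is run once per event name of interest.

```shell
# Hypothetical helper: list CloudTrail management events with a given name
# inside a time window, showing when each happened and who made the call.
# Usage: events_between <event-name> <start-time> <end-time>
events_between() {
    aws cloudtrail lookup-events --region us-east-1 \
        --lookup-attributes "AttributeKey=EventName,AttributeValue=$1" \
        --start-time "$2" --end-time "$3" \
        --query 'Events[].[EventTime,EventName,Username]' \
        --output text
}
```

For example, events_between RevokeSecurityGroupEgress <start> <end> with a window of a few minutes around the failure would show whether anything removed an egress rule; the timestamps are whatever matches your own incident.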

AWS
SUPPORT ENGINEER
answered 2 years ago
  • This is on a new AWS account and we're not using AWS Config. There are also no security processes around using a public IP, unless AWS configures one by default.

    Looking in CloudTrail, I don't see any events that look related. I see the DeleteNetworkInterface event, but its timestamp is about 10-15 seconds after the network error is thrown, so the container is probably already being decommissioned at that stage. Other than that, I see some Describe events, which are read-only anyway.

    Could there be any related quotas we might be hitting?

  • Can you please check who initiated the DeleteNetworkInterface API action in the CloudTrail log? Will your container get terminated if it loses network connectivity?
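
To see whether and why ECS stopped the task, the stop fields on the task itself can be inspected. A rough sketch, assuming us-east-1 as in the question; task_stop_info is a hypothetical helper name, and stopped tasks remain visible to describe-tasks only for a limited time after they stop.

```shell
# Hypothetical helper: print why a task stopped, plus each container's
# exit code and stop reason, as reported by ECS.
# Usage: task_stop_info <cluster> <task-arn-or-id>
task_stop_info() {
    aws ecs describe-tasks --region us-east-1 \
        --cluster "$1" --tasks "$2" \
        --query 'tasks[0].{stopCode:stopCode,stoppedReason:stoppedReason,containers:containers[].{name:name,exitCode:exitCode,reason:reason}}' \
        --output json
}
```

If the container shows a non-zero exit code, the application most likely exited on its own after the network error, which would explain the ENI being deleted a few seconds later.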


Good advice from Venkat above. If your account is part of an AWS organization, you wouldn't necessarily have visibility into the Config rules. As for your question on quotas: by default, all AWS accounts are limited to five (5) Elastic IP addresses per Region, because public (IPv4) internet addresses are a scarce public resource. So it's possible you are hitting this limit as well.
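
A quick way to compare allocated Elastic IPs against that limit is sketched below. The helper names eip_count and eip_usage are hypothetical, the default of 5 is the figure quoted above (your account's actual quota may differ), and the check only counts allocated Elastic IPs; whether the public IP auto-assigned to a Fargate task counts toward this quota is not established here.

```shell
# Hypothetical helper: count Elastic IPs allocated in the Region.
eip_count() {
    aws ec2 describe-addresses --region us-east-1 \
        --query 'length(Addresses)' --output text
}

# Hypothetical helper: report usage against a limit (default 5) and
# return non-zero when the limit has been reached.
# Usage: eip_usage [limit]
eip_usage() {
    local limit used
    limit="${1:-5}"
    used=$(eip_count) || return 1
    echo "Elastic IPs in use: ${used} of ${limit}"
    [ "$used" -lt "$limit" ]
}
```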

AWS
answered 2 years ago
