How do I troubleshoot auto scaling issues in Amazon ECS?

The auto scaling that I configured for Amazon Elastic Container Service (Amazon ECS) doesn't scale my service's desired task count in or out as expected.

Short description

To automatically update the desired task count, integrate Amazon ECS with AWS Application Auto Scaling and Amazon CloudWatch alarms.

Auto scaling might not add or remove tasks as expected for the following reasons:

  • You didn't correctly configure the scaling policies.
  • You deleted or edited the CloudWatch alarm that invokes the scaling policies.
  • You incorrectly specified the cron expression format in the scheduled action.
  • The auto scaling is suspended.
  • You used AWS CloudFormation or AWS Cloud Development Kit (AWS CDK) to update the desired task count and specified an incorrect DesiredCount value, such as a value outside the scalable target's minimum and maximum capacity.
  • Your ECS cluster doesn't have enough resources or capacity to run new tasks.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

Troubleshoot CloudWatch alarms and scaling policies

To troubleshoot CloudWatch alarms and scaling policies, complete the following tasks based on your scaling policy type.

Scalable target

You must register your ECS service as a scalable target with Application Auto Scaling.

To determine if the service is registered as a scalable target, run the describe-scalable-targets command:

aws application-autoscaling describe-scalable-targets --service-namespace ecs --region example-region

Note: Replace example-region with your AWS Region.
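
To limit the output to one service, you can pass the service's resource ID with --resource-ids and trim the fields with the AWS CLI's --query option. In the following sketch, show_scalable_target is a hypothetical helper name, and your-cluster, your-service-name, and example-region are placeholders.

```shell
# Hypothetical helper: show the scalable target that's registered for one ECS service.
# Usage: show_scalable_target your-cluster your-service-name example-region
show_scalable_target() {
  local cluster="$1" service="$2" region="$3"
  # --resource-ids limits the output to this service. An empty list ([])
  # means that the service isn't registered as a scalable target.
  aws application-autoscaling describe-scalable-targets \
    --service-namespace ecs \
    --resource-ids "service/${cluster}/${service}" \
    --query 'ScalableTargets[].{ResourceId:ResourceId,Min:MinCapacity,Max:MaxCapacity,Suspended:SuspendedState}' \
    --output json \
    --region "$region"
}
```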

If the service isn't registered, then run the register-scalable-target command to register the service.

Example:

aws application-autoscaling register-scalable-target --service-namespace ecs --scalable-dimension ecs:service:DesiredCount --resource-id service/your-cluster/your-service-name --min-capacity 1 --max-capacity 10 --region example-region

Then, configure the scaling policies and CloudWatch alarms. For more information, see How can I configure Amazon ECS Service Auto Scaling on Fargate?

In the output of the describe-scalable-targets command, check whether any of the following SuspendedState values are set to true:

  • DynamicScalingInSuspended
  • DynamicScalingOutSuspended
  • ScheduledScalingSuspended

When DynamicScalingInSuspended or DynamicScalingOutSuspended is set to true, Application Auto Scaling doesn't update capacity when you initiate a scaling policy. When ScheduledScalingSuspended is set to true, Application Auto Scaling doesn't initiate the scaling actions that you scheduled to run during the suspension period. For more information, see Suspend and resume scaling for Application Auto Scaling.
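
To check the flags from a script, one option is to print them with --output text and test each value. The following sketch assumes that text output prints the booleans as True or False and prints None when no scalable target matches. report_suspended is a hypothetical helper name, and the values in the usage comment are placeholders.

```shell
# Hypothetical helper: given the three SuspendedState flags as printed by
#   aws application-autoscaling describe-scalable-targets \
#     --service-namespace ecs \
#     --resource-ids service/your-cluster/your-service-name \
#     --query 'ScalableTargets[0].SuspendedState.[DynamicScalingInSuspended,DynamicScalingOutSuspended,ScheduledScalingSuspended]' \
#     --output text --region example-region
# print the name of each flag that's set to true.
# Usage: report_suspended "$flags"
report_suspended() {
  # No scalable target found: the query result is empty or None.
  case "$1" in
    ''|None) echo "not-registered"; return 0 ;;
  esac
  local found=0 name value
  set -- $1    # split the line on whitespace (text output uses tabs)
  if [ "$#" -ne 3 ]; then
    echo "unexpected input" >&2
    return 1
  fi
  for name in DynamicScalingInSuspended DynamicScalingOutSuspended ScheduledScalingSuspended; do
    value=$1
    shift
    case "$value" in
      True|true) echo "$name"; found=1 ;;
    esac
  done
  # Print "none" when every flag is false.
  [ "$found" -eq 1 ] || echo "none"
}
```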

To resume auto scaling, run the register-scalable-target command with all suspended states set to false:

aws application-autoscaling register-scalable-target --service-namespace ecs --scalable-dimension ecs:service:DesiredCount --resource-id service/your-cluster/your-service-name --suspended-state DynamicScalingInSuspended=false,DynamicScalingOutSuspended=false,ScheduledScalingSuspended=false --region example-region

Note: Replace example-region with your Region and your-cluster and your-service-name with your values.

To retrieve information about the auto scaling of your ECS service, run the describe-scaling-policies and describe-scaling-activities commands:

aws application-autoscaling describe-scaling-policies --service-namespace ecs --region example-region

aws application-autoscaling describe-scaling-activities --service-namespace ecs --scalable-dimension ecs:service:DesiredCount --resource-id service/your-cluster/your-service-name --region example-region
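
To narrow down failed scaling, you can limit describe-scaling-activities to a few activities and to the fields that explain their outcome. In the following sketch, recent_scaling_activities is a hypothetical helper, the limit of five items is arbitrary, and the activities are assumed to be returned newest first. For activities with a StatusCode of Failed or Unfulfilled, the StatusMessage field usually explains why.

```shell
# Hypothetical helper: show the start time, status, and status message of a few
# scaling activities for one ECS service (assumed to be returned newest first).
# Usage: recent_scaling_activities your-cluster your-service-name example-region
recent_scaling_activities() {
  local cluster="$1" service="$2" region="$3"
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --max-items 5 \
    --query 'ScalingActivities[].[StartTime,StatusCode,StatusMessage]' \
    --output table \
    --region "$region"
}
```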

When you create or update the CloudWatch alarms for your auto scaling, specify the correct metrics, dimensions, statistics, period, condition, and threshold values. Otherwise, the alarm doesn't change to the ALARM state when you expect it to, and it doesn't invoke the associated scaling policy.
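
To compare an alarm with what its scaling policy expects, you can print only the relevant fields with describe-alarms. In the following sketch, show_alarm_config is a hypothetical helper, and the alarm name and Region are placeholders. For ECS service metrics, the namespace is typically AWS/ECS with the ClusterName and ServiceName dimensions, and for a step scaling alarm, AlarmActions should contain the ARN of the scaling policy.

```shell
# Hypothetical helper: print the settings of one CloudWatch alarm that affect
# whether it invokes a scaling policy.
# Usage: show_alarm_config your-alarm-name example-region
show_alarm_config() {
  local alarm_name="$1" region="$2"
  aws cloudwatch describe-alarms \
    --alarm-names "$alarm_name" \
    --query 'MetricAlarms[].{Metric:MetricName,Namespace:Namespace,Dimensions:Dimensions,Statistic:Statistic,Period:Period,Operator:ComparisonOperator,Threshold:Threshold,Actions:AlarmActions}' \
    --output json \
    --region "$region"
}
```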

Step scaling

Check whether the CloudWatch alarms that are associated with the scaling policies change to the ALARM state and invoke the policies. To check for errors, view the history of each alarm.
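
The alarm history is also available from the AWS CLI. The following sketch filters the history to Action items, which record the actions that the alarm ran, such as invoking a scaling policy, and whether they succeeded. alarm_action_history is a hypothetical helper name, and the alarm name and Region are placeholders.

```shell
# Hypothetical helper: list the Action entries in an alarm's history.
# Failed scaling invocations are expected to show up in HistorySummary.
# Usage: alarm_action_history your-alarm-name example-region
alarm_action_history() {
  local alarm_name="$1" region="$2"
  aws cloudwatch describe-alarm-history \
    --alarm-name "$alarm_name" \
    --history-item-type Action \
    --query 'AlarmHistoryItems[].[Timestamp,HistorySummary]' \
    --output table \
    --region "$region"
}
```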

Check whether the threshold is set correctly in the CloudWatch alarm, and whether the step adjustments and adjustment type are set in the scaling policy. If no step adjustment covers the breach delta, then you see the following error message in the alarm history:

"Failed to execute AutoScaling action: No step adjustment found for metric value [xx, xx] and breach delta xx"

To resolve this issue, make sure that the step adjustments in your policies cover the full range of breach deltas. For a scale-in policy, the step adjustments must cover -infinity to 0. For a scale-out policy, they must cover 0 to +infinity.

Note: For a scale-out policy, typically only the upper bound can be null (+infinity) in the step adjustments. For a scale-in policy, only the lower bound can be null (-infinity). For more information, see Step adjustments.
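
As an illustration, the following sketch creates a scale-out step scaling policy whose step adjustments leave no gaps above the alarm threshold: the first step covers breach deltas from 0 up to 20, and the second step omits MetricIntervalUpperBound so that it extends to +infinity. The helper name, policy name, interval boundary, cooldown, and adjustment values are hypothetical. The command returns the policy ARN, which you then add to the CloudWatch alarm's actions. A scale-in policy would instead end with a step that omits MetricIntervalLowerBound and sets MetricIntervalUpperBound to 0.

```shell
# Hypothetical helper: create a scale-out step scaling policy whose step
# adjustments cover every breach delta from 0 to +infinity.
# Usage: put_scale_out_step_policy your-cluster your-service-name example-region
put_scale_out_step_policy() {
  local cluster="$1" service="$2" region="$3"
  # Bounds are relative to the alarm threshold (breach delta = metric - threshold):
  #   breach delta 0 up to 20 -> add 1 task
  #   breach delta 20 and up  -> add 2 tasks (no MetricIntervalUpperBound)
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --policy-name example-scale-out-step-policy \
    --policy-type StepScaling \
    --step-scaling-policy-configuration '{
      "AdjustmentType": "ChangeInCapacity",
      "StepAdjustments": [
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 1},
        {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 2}
      ],
      "Cooldown": 60,
      "MetricAggregationType": "Average"
    }' \
    --region "$region"
}
```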

Target tracking scaling

Application Auto Scaling creates CloudWatch alarms for scaling events based on a target metric value that you specify when you create target tracking scaling policies. Because it can affect scaling behavior, don't edit or delete these alarms. If you modify or delete these alarms, then recreate the target tracking policy.
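
If you must recreate a target tracking policy, one approach is to delete it and then create it again. Deleting a target tracking policy also deletes the alarms that Application Auto Scaling created for it, and creating the policy creates new alarms. The following sketch assumes a policy that tracks average CPU utilization; the helper name, policy name, target value of 75 percent, and cooldowns are hypothetical. While the policy is deleted, the service doesn't scale on this metric.

```shell
# Hypothetical helper: delete and recreate a CPU target tracking policy so that
# Application Auto Scaling creates new CloudWatch alarms for it.
# Usage: recreate_cpu_target_tracking your-cluster your-service-name example-region
recreate_cpu_target_tracking() {
  local cluster="$1" service="$2" region="$3"
  local policy="example-cpu75-target-tracking"   # hypothetical policy name
  # Deleting the policy also deletes its alarms. Ignore the error if the
  # policy doesn't exist.
  aws application-autoscaling delete-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --policy-name "$policy" \
    --region "$region" || true
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --policy-name "$policy" \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
      "TargetValue": 75.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "ScaleOutCooldown": 60,
      "ScaleInCooldown": 60
    }' \
    --region "$region"
}
```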

As long as each policy uses a different metric, you can have multiple target tracking scaling policies for an ECS service. However, when you configure multiple scaling policies, the policies can conflict and cause consecutive scale-in and scale-out activity.
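
To look for overlapping policies, you can list each policy on the service with its type, its predefined metric, and the alarms that it uses. In the following sketch, list_scaling_policies is a hypothetical helper, and the cluster, service, and Region are placeholders. The Metric field is null for step scaling policies and for target tracking policies that use a customized metric.

```shell
# Hypothetical helper: list the scaling policies on one ECS service with their
# type, predefined metric (target tracking only), and associated alarm names.
# Usage: list_scaling_policies your-cluster your-service-name example-region
list_scaling_policies() {
  local cluster="$1" service="$2" region="$3"
  aws application-autoscaling describe-scaling-policies \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --query 'ScalingPolicies[].{Name:PolicyName,Type:PolicyType,Metric:TargetTrackingScalingPolicyConfiguration.PredefinedMetricSpecification.PredefinedMetricType,Alarms:Alarms[].AlarmName}' \
    --output json \
    --region "$region"
}
```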

Troubleshoot incorrect cron expression

Make sure that the cron expression is correct in the scheduled actions configuration for Application Auto Scaling. The cron format that Application Auto Scaling supports is cron(fields), where fields consists of six values that are separated by spaces: [Minutes] [Hours] [Day_of_Month] [Month] [Day_of_Week] [Year].

For more information, see Create scheduled actions for Application Auto Scaling using the AWS CLI.
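
For example, cron(0 8 ? * MON-FRI *) runs at 08:00 on weekdays, in UTC when no time zone is set. The day-of-month field is ? because the day-of-week field specifies the days. The following sketch checks the field count before it calls put-scheduled-action. The helper names, scheduled action name, and capacity values are hypothetical, and the check catches only a wrong number of fields, not every invalid expression.

```shell
# Hypothetical helper: succeed only if the expression is wrapped in cron( )
# and contains exactly six space-separated fields:
# [Minutes] [Hours] [Day_of_Month] [Month] [Day_of_Week] [Year]
cron_has_six_fields() {
  local inner
  case "$1" in
    "cron("*")") ;;
    *) return 1 ;;
  esac
  inner=${1#"cron("}
  inner=${inner%")"}
  # Turn off filename expansion so that * and ? aren't matched against files.
  set -f
  set -- $inner
  set +f
  [ "$#" -eq 6 ]
}

# Hypothetical helper: schedule a weekday scale-up at 08:00.
# Usage: put_weekday_schedule your-cluster your-service-name example-region
put_weekday_schedule() {
  local cluster="$1" service="$2" region="$3"
  local schedule="cron(0 8 ? * MON-FRI *)"
  if ! cron_has_six_fields "$schedule"; then
    echo "Invalid schedule: $schedule" >&2
    return 1
  fi
  # Without --timezone, the schedule is evaluated in UTC.
  aws application-autoscaling put-scheduled-action \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "service/${cluster}/${service}" \
    --scheduled-action-name example-weekday-scale-up \
    --schedule "$schedule" \
    --scalable-target-action MinCapacity=2,MaxCapacity=10 \
    --region "$region"
}
```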

Troubleshoot the desired task count update

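When you manually update the desired task count for an ECS service, you might set it outside the scalable target's minimum and maximum capacity. If the desired count is below the minimum capacity and an alarm invokes a scale-out activity, then service auto scaling scales the desired count up to the minimum capacity value and then continues to scale out as required by the scaling policy that's associated with the alarm. A scale-in activity doesn't adjust the desired count because the count is already below the minimum capacity. If the desired count is above the maximum capacity and an alarm invokes a scale-in activity, then service auto scaling scales the desired count down to the maximum capacity value and then continues to scale in as required by the scaling policy. A scale-out activity doesn't adjust the desired count because the count is already above the maximum capacity.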

If you create your ECS service with CloudFormation or AWS CDK and don't specify the DesiredCount field, then DesiredCount is set to 1. If you later update the service with CloudFormation or AWS CDK and don't specify the DesiredCount field, then the existing desired count is used for new deployments. If you do specify DesiredCount during a service update, then set it to a value between the scalable target's minimum and maximum capacity.
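
Before you update the stack, you can look up the scalable target's capacity range and confirm that the DesiredCount value in your template or AWS CDK app falls within it. In the following sketch, check_desired_count is a hypothetical helper: it returns 1 when the value is out of range and 2 when it can't find a scalable target for the service. The cluster, service, Region, and count are placeholders.

```shell
# Hypothetical helper: check a planned DesiredCount against the scalable
# target's MinCapacity and MaxCapacity.
# Usage: check_desired_count your-cluster your-service-name example-region 3
check_desired_count() {
  local cluster="$1" service="$2" region="$3" desired="$4" range min max n
  range=$(aws application-autoscaling describe-scalable-targets \
    --service-namespace ecs \
    --resource-ids "service/${cluster}/${service}" \
    --query 'ScalableTargets[0].[MinCapacity,MaxCapacity]' \
    --output text \
    --region "$region") || return 2
  # Text output is expected to be "MIN<TAB>MAX", or "None" if nothing matched.
  set -- $range
  min=${1-} max=${2-}
  for n in "$min" "$max"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "No scalable target found for service/${cluster}/${service}" >&2
        return 2 ;;
    esac
  done
  if [ "$desired" -lt "$min" ] || [ "$desired" -gt "$max" ]; then
    echo "DesiredCount ${desired} is outside ${min}-${max}" >&2
    return 1
  fi
  echo "DesiredCount ${desired} is within ${min}-${max}"
}
```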

Troubleshoot cluster capacity issues

When your ECS cluster doesn't have enough resources to run new tasks, the tasks that a scale-out activity adds can't be placed, so the service doesn't reach the new desired count. To avoid capacity issues, use Amazon ECS capacity providers with managed scaling to automatically provision Amazon Elastic Compute Cloud (Amazon EC2) instances.
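
To confirm that a cluster uses capacity providers that can add EC2 instances, you can list the cluster's capacity providers and check whether managed scaling is turned on for each one. In the following sketch, show_capacity_providers is a hypothetical helper, and the cluster name and Region are placeholders. Fargate capacity providers don't use an Auto Scaling group, so the managed scaling field is expected to be empty for them.

```shell
# Hypothetical helper: show the capacity providers that are attached to a cluster
# and whether managed scaling is ENABLED or DISABLED for each one.
# Usage: show_capacity_providers your-cluster example-region
show_capacity_providers() {
  local cluster="$1" region="$2" providers
  providers=$(aws ecs describe-clusters \
    --clusters "$cluster" \
    --query 'clusters[0].capacityProviders' \
    --output text \
    --region "$region") || return 1
  if [ -z "$providers" ] || [ "$providers" = "None" ]; then
    echo "No capacity providers are associated with cluster ${cluster}."
    return 0
  fi
  # $providers is left unquoted on purpose so that each name becomes one argument.
  aws ecs describe-capacity-providers \
    --capacity-providers $providers \
    --query 'capacityProviders[].{Name:name,ManagedScaling:autoScalingGroupProvider.managedScaling.status}' \
    --output table \
    --region "$region"
}
```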

To prevent excessive scaling, auto scaling uses the service's actual running task count as the starting point for a scaling activity, not the desired count. If there aren't enough container instance resources to place the additional tasks, then the scale-out activity can't be completed. If container instance capacity becomes available later, then the pending scaling activity might succeed. After the cooldown period ends, further scaling activities continue.
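
To check whether task placement is the bottleneck, you can compare the service's desired, running, and pending task counts and read its recent service events. Events that say the service was unable to place a task, for example because no container instance met all of its requirements, point to insufficient cluster capacity. In the following sketch, service_capacity_snapshot is a hypothetical helper, and the limit of five events is arbitrary.

```shell
# Hypothetical helper: compare desired, running, and pending task counts and
# print five service events (assumed to be returned newest first).
# Usage: service_capacity_snapshot your-cluster your-service-name example-region
service_capacity_snapshot() {
  local cluster="$1" service="$2" region="$3"
  aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].{Desired:desiredCount,Running:runningCount,Pending:pendingCount}' \
    --output table \
    --region "$region"
  # Task placement failures appear here as service events.
  aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].events[:5].[createdAt,message]' \
    --output text \
    --region "$region"
}
```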

Related information

Automatically scale your Amazon ECS service

application-autoscaling

AWS OFFICIAL
Updated 2 months ago