AWS AutoScaling group created many more instances than the configured max capacity. Why?

I have an AWS AutoScaling group configured with min/desired/max capacity of 1/1/5.

To this ASG, I've attached a dynamic scaling policy that targets a CPU utilization of 75%.

In response to a bug in some code I deployed to EC2 last night, CPU utilization in the one running instance shot up to 100%. The ASG provisioned more instances in response, but because the same buggy code was being deployed to each, they each had high CPU utilization.

In this scenario, I expected the ASG to increase capacity to 5 instances and then stop there. To my surprise, however, the ASG kept provisioning more instances, apparently increasing the max capacity by itself. After running this way for some time, it had provisioned a total of 31 instances.

I want to know why the ASG went over the max capacity I configured. The ASG documentation (https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html?icmpid=docs_ec2as_help_panel#as-how-scaling-policies-work) says this should not be possible.

ack_inc
asked 2 months ago, 134 views
1 Answer
Accepted Answer

There are only two situations [1] in which an ASG can change its own max capacity, and neither should apply to your case. I suggest searching CloudTrail for EventName=UpdateAutoScalingGroup to see whether a user or some automated script increased the max.
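If it helps, here is a minimal sketch of that CloudTrail search in Python. It assumes you pass in a boto3 CloudTrail client, and it assumes the `requestParameters` of the recorded event use the camelCase keys `autoScalingGroupName` and `maxSize`; please check one of your own events to confirm. The function name and the group name `my-asg` are just placeholders.

```python
import json
from datetime import datetime, timedelta, timezone


def find_max_size_changes(cloudtrail, group_name, days=30):
    """Return UpdateAutoScalingGroup events that set a max size on group_name.

    `cloudtrail` is expected to behave like boto3.client("cloudtrail").
    """
    end = datetime.now(timezone.utc)
    kwargs = {
        "LookupAttributes": [
            {"AttributeKey": "EventName",
             "AttributeValue": "UpdateAutoScalingGroup"}
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
    }
    changes = []
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            # CloudTrailEvent is a JSON string holding the full record.
            detail = json.loads(event["CloudTrailEvent"])
            params = detail.get("requestParameters") or {}
            # Assumption: these camelCase key names match the real records.
            if params.get("autoScalingGroupName") != group_name:
                continue
            if "maxSize" in params:
                changes.append({
                    "time": event.get("EventTime"),
                    "user": event.get("Username"),
                    "max_size": params["maxSize"],
                })
        token = page.get("NextToken")
        if not token:
            return changes
        kwargs["NextToken"] = token
```

You would call it as `find_max_size_changes(boto3.client("cloudtrail"), "my-asg")`. Note that `lookup_events` only covers the last 90 days of management events, so an empty result for an older incident does not prove anything.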

Also, just to confirm: did the max actually get changed from 5 to 31, or were there 31 total instances while the max was still set to 5? If the maxed-out CPU was causing frequent health check replacements, the ASG could have had more instances than the max. The max is a boundary for the desired capacity, not for the total number of instances. While an instance is being replaced due to health check failures, it does not count toward the group's capacity calculations, so the total instance count can temporarily exceed the max.
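To tell those two cases apart on a live group, you can compare the configured max against the instance list. Below is a rough sketch that summarizes one group entry from the `describe_auto_scaling_groups` response (`MaxSize`, `DesiredCapacity`, and `Instances[].LifecycleState` are fields of that response; the helper name and the output keys are my own).

```python
from collections import Counter


def summarize_group(group):
    """Summarize one item of describe_auto_scaling_groups()["AutoScalingGroups"]."""
    instances = group.get("Instances", [])
    # Count instances per lifecycle state (InService, Pending, Terminating, ...).
    by_state = Counter(i["LifecycleState"] for i in instances)
    return {
        "max_size": group["MaxSize"],
        "desired_capacity": group["DesiredCapacity"],
        "total_instances": len(instances),
        "by_state": dict(by_state),
        # True when the group holds more instances than MaxSize, which can
        # happen while unhealthy instances are still being terminated.
        "total_exceeds_max": len(instances) > group["MaxSize"],
    }
```

With boto3 this would look like `summarize_group(boto3.client("autoscaling").describe_auto_scaling_groups(AutoScalingGroupNames=["my-asg"])["AutoScalingGroups"][0])`, where `my-asg` is a placeholder. If `max_size` is still 5 while `total_exceeds_max` is true and several instances are in a terminating state, that points to replacements rather than a changed max.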

[1]

  1. Scheduled scaling: If a scheduled action is created, it can be set to change the min, max, and/or desired capacity of the ASG. You would have had to explicitly create this, so I doubt it's applicable here.
  2. Predictive scaling, when MaxCapacityBreachBehavior is set to IncreaseMaxCapacity. However, predictive scaling only changes the group's capacity based on the past 1-14 days of usage. It does not react to real-time changes in the group's usage like the one you described.
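Both cases can be ruled out from the API. The sketch below assumes a boto3 Auto Scaling client is passed in; it lists scheduled actions that set a `MaxSize` and predictive scaling policies whose `MaxCapacityBreachBehavior` is `IncreaseMaxCapacity`. The helper name is made up, and for brevity it ignores pagination, so treat it as a starting point only.

```python
def find_max_changers(autoscaling, group_name):
    """List configuration on group_name that could change its max capacity.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    Returns tuples of (kind, name, value).
    """
    findings = []

    # Case 1: scheduled actions that explicitly set MaxSize.
    actions = autoscaling.describe_scheduled_actions(
        AutoScalingGroupName=group_name)
    for action in actions.get("ScheduledUpdateGroupActions", []):
        if "MaxSize" in action:
            findings.append(
                ("scheduled", action["ScheduledActionName"], action["MaxSize"]))

    # Case 2: predictive scaling policies allowed to exceed the max.
    policies = autoscaling.describe_policies(AutoScalingGroupName=group_name)
    for policy in policies.get("ScalingPolicies", []):
        config = policy.get("PredictiveScalingConfiguration") or {}
        if config.get("MaxCapacityBreachBehavior") == "IncreaseMaxCapacity":
            findings.append(
                ("predictive", policy["PolicyName"],
                 config.get("MaxCapacityBuffer")))

    return findings
```

An empty list from `find_max_changers(boto3.client("autoscaling"), "my-asg")` (placeholder group name) would suggest that neither mechanism is configured on the group.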
AWS
answered 2 months ago
EXPERT
reviewed a month ago
  • Thanks for the answer, Shahad. I'm unable to verify whether the max capacity actually changed from 5 to 31, since I never enabled monitoring, but what you described (health check replacements caused by the CPU being maxed out) is most likely what happened.

  • If you don't see the max as 31 currently, and you don't see any UpdateAutoScalingGroup calls in CloudTrail lowering it back down, then it's a pretty safe bet it didn't change. AutoScaling metrics are free, so I'd recommend enabling them now to make future troubleshooting simpler: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-metrics.html
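For reference, group metrics can also be enabled from code. This is a small sketch assuming a boto3 Auto Scaling client; `enable_metrics_collection` takes the group name, a `Granularity` of `1Minute`, and an optional `Metrics` list (leaving it out enables all group metrics). The wrapper function and the group name in the usage line are placeholders.

```python
def enable_group_metrics(autoscaling, group_name, metrics=None):
    """Enable Auto Scaling group metrics; returns the arguments that were sent.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    """
    kwargs = {
        "AutoScalingGroupName": group_name,
        "Granularity": "1Minute",
    }
    if metrics:
        # Restrict collection to specific metrics, e.g. "GroupMaxSize".
        kwargs["Metrics"] = list(metrics)
    autoscaling.enable_metrics_collection(**kwargs)
    return kwargs
```

For the kind of troubleshooting discussed here, something like `enable_group_metrics(boto3.client("autoscaling"), "my-asg", ["GroupMaxSize", "GroupDesiredCapacity", "GroupTotalInstances"])` would record the max, the desired capacity, and the total instance count over time.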
