AWS AutoScaling group created many more instances than the configured max capacity. Why?

Question

I have an AWS Auto Scaling group (ASG) configured with min/desired/max capacity of 1/1/5.

To this ASG, I've attached a dynamic (target tracking) scaling policy that targets 75% CPU utilization.

Because of a bug in some code I deployed to EC2 last night, CPU utilization on the one running instance shot up to 100%. The ASG provisioned more instances in response, but because the same buggy code was deployed to each of them, they all had high CPU utilization too.

In this scenario, I expected the ASG to increase capacity to 5 instances and stop there. To my surprise, however, the ASG kept provisioning more instances, as if it were increasing the max capacity by itself. After running this way for some time, it had provisioned a total of 31 instances.
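To make the expected behavior concrete, here is a rough sketch of how a 75% CPU target should ramp a 1/1/5 group. It assumes a simple proportional rule (capacity needed to bring average CPU back to the target, rounded up); that rule is an illustrative approximation, not AWS's documented target tracking algorithm, and `next_desired`/`simulate` are hypothetical helpers.

```python
def next_desired(current: int, cpu_percent: int, target_percent: int,
                 min_size: int, max_size: int) -> int:
    """Approximate the next desired capacity under a proportional rule (assumption)."""
    # Integer ceiling of current * cpu / target, avoiding floating-point rounding.
    proposed = -(-(current * cpu_percent) // target_percent)
    # The max only ever bounds the desired capacity.
    return max(min_size, min(proposed, max_size))


def simulate(start: int, steps: int, cpu_percent: int = 100, target_percent: int = 75,
             min_size: int = 1, max_size: int = 5) -> list[int]:
    """Apply next_desired repeatedly while CPU stays pinned at cpu_percent."""
    history = [start]
    for _ in range(steps):
        history.append(next_desired(history[-1], cpu_percent, target_percent,
                                    min_size, max_size))
    return history


print(simulate(start=1, steps=6))  # [1, 2, 3, 4, 5, 5, 5]
```

Even with CPU stuck at 100%, the desired capacity plateaus at the configured max of 5 under this model.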

I want to know why the ASG went over the max capacity I configured. The ASG documentation (https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html?icmpid=docs_ec2as_help_panel#as-how-scaling-policies-work) says this should not be possible.

Accepted Answer

There are only two situations[1] in which an ASG can change its own max capacity, and neither should apply to your situation. I suggest searching CloudTrail for EventName=UpdateAutoScalingGroup to see whether a user or some automated script increased the max.
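As a sketch of that CloudTrail check, the helpers below page through `UpdateAutoScalingGroup` events and keep only those that set a new max on a given group. They assume a boto3 CloudTrail client (whose `lookup_events` operation is pageable) and assume the embedded event JSON carries `requestParameters.autoScalingGroupName` and `requestParameters.maxSize`; verify those field names against a real event before relying on them.

```python
import json
from typing import Any, Iterable


def max_size_changes(events: Iterable[dict[str, Any]], asg_name: str) -> list[dict[str, Any]]:
    """Return when, by whom, and to what value MaxSize was changed on asg_name."""
    changes = []
    for event in events:
        # Each LookupEvents record embeds the full CloudTrail event as a JSON string.
        detail = json.loads(event["CloudTrailEvent"])
        params = detail.get("requestParameters") or {}
        # Assumed field names; skip other groups and calls that did not touch the max.
        if params.get("autoScalingGroupName") != asg_name or "maxSize" not in params:
            continue
        changes.append({
            "time": detail.get("eventTime"),
            "who": detail.get("userIdentity", {}).get("arn"),
            "new_max": params["maxSize"],
        })
    return changes


def fetch_update_events(cloudtrail_client) -> list[dict[str, Any]]:
    """Collect all UpdateAutoScalingGroup events from CloudTrail event history."""
    paginator = cloudtrail_client.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventName",
                           "AttributeValue": "UpdateAutoScalingGroup"}]
    )
    return [event for page in pages for event in page["Events"]]
```

With boto3 installed, `max_size_changes(fetch_update_events(boto3.client("cloudtrail")), "my-asg")` lists any max changes, where `my-asg` is a placeholder for your group name. Note that CloudTrail event history only covers the last 90 days in the current region.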

Also, just to confirm: did the max actually change from 5 to 31, or were there 31 total instances while the max was still set to 5? If frequent health check replacements were happening because instance CPU was maxed out, the ASG could have had more instances than the max. The max is a bound on the desired capacity, not on the total number of instances. While an instance is being replaced due to health check failures, it doesn't contribute to the group's capacity calculations, so the total instance count can temporarily exceed the max.
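A simplified model of that distinction: the max clamps the desired capacity, while instances that failed health checks are excluded from capacity and replaced. This sketch assumes replacements launch while the unhealthy instances are still shutting down; it is not the ASG's actual internal logic, and both function names are hypothetical.

```python
def clamp_desired(requested: int, min_size: int, max_size: int) -> int:
    """Scaling requests are clamped to [min_size, max_size]; max bounds desired capacity."""
    return max(min_size, min(requested, max_size))


def running_instances(desired: int, unhealthy_being_replaced: int) -> int:
    """Simplified count of running instances during health check replacement.

    Unhealthy instances no longer count toward capacity, so `desired` healthy
    instances are maintained while the unhealthy ones are still terminating.
    """
    return desired + unhealthy_being_replaced


desired = clamp_desired(requested=12, min_size=1, max_size=5)
print(desired)                        # 5
print(running_instances(desired, 3))  # 8, temporarily above max=5
```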

[1]
1) Scheduled scaling: A scheduled action can be set to change the min, max, and/or desired capacity of the ASG. You would have had to create one explicitly, so I doubt it applies here.
2) Predictive scaling, when [the max capacity behavior](https://docs.aws.amazon.com/autoscaling/ec2/userguide/predictive-scaling-create-policy.html) (`MaxCapacityBreachBehavior`) is set to IncreaseMaxCapacity. However, predictive scaling only changes the group's capacity based on the past 1 to 14 days of usage. It does not react to real-time changes in usage like the one you described.
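To rule out both cases quickly, the sketch below flags scheduled actions that set a max and predictive scaling policies allowed to raise it. It assumes its inputs are the `ScheduledUpdateGroupActions` list from a boto3 `describe_scheduled_actions` call and the `ScalingPolicies` list from `describe_policies`, and that an action which doesn't set a max simply omits `MaxSize`; `max_changing_configs` is a hypothetical helper.

```python
from typing import Any


def max_changing_configs(scheduled_actions: list[dict[str, Any]],
                         policies: list[dict[str, Any]]) -> list[str]:
    """Describe any configuration that can change the group's max on its own."""
    findings = []
    # Case 1: a scheduled action that explicitly sets MaxSize.
    for action in scheduled_actions:
        if "MaxSize" in action:
            findings.append(f"scheduled action {action.get('ScheduledActionName')} "
                            f"sets MaxSize={action['MaxSize']}")
    # Case 2: a predictive scaling policy permitted to exceed the max.
    for policy in policies:
        config = policy.get("PredictiveScalingConfiguration") or {}
        if config.get("MaxCapacityBreachBehavior") == "IncreaseMaxCapacity":
            findings.append(f"predictive policy {policy.get('PolicyName')} uses "
                            f"IncreaseMaxCapacity (MaxCapacityBuffer="
                            f"{config.get('MaxCapacityBuffer')})")
    return findings
```

Usage, with a placeholder group name: pass `client.describe_scheduled_actions(AutoScalingGroupName="my-asg")["ScheduledUpdateGroupActions"]` and `client.describe_policies(AutoScalingGroupName="my-asg", PolicyTypes=["PredictiveScaling"])["ScalingPolicies"]`, where `client = boto3.client("autoscaling")`. An empty result means neither footnote case applies.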
