Why didn’t Amazon EC2 Auto Scaling terminate an unhealthy instance?

I have an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group (ASG) set up, but it's not terminating an unhealthy Amazon EC2 instance.

Short description

Amazon EC2 Auto Scaling automatically determines the health status of an instance in an Auto Scaling group (ASG) through the following checks:

  • Amazon EC2 status checks.
  • Amazon Elastic Block Store (EBS) status checks.
  • Elastic Load Balancing (ELB) health checks.
  • VPC Lattice health checks.
  • Custom health checks.

Amazon EC2 Auto Scaling logs all ASG scaling actions in Activity History in the Auto Scaling section of the Amazon EC2 console. However, Activity History alone doesn't show why Amazon EC2 Auto Scaling didn't terminate an unhealthy instance.

You can find more information about an unhealthy instance's state, and how to terminate that instance, in the Amazon EC2 console.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

Note the state of the instance in Amazon EC2 Auto Scaling:

  1. Sign in to the Amazon EC2 console.
  2. In the navigation pane under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  3. Choose the Instance Management view and note the health state of the instance.
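The same information is also available programmatically. The following is a minimal Python sketch, not an official procedure. It assumes that you already have a response dictionary in the shape that the boto3 Auto Scaling client's `describe_auto_scaling_groups` call returns. The `instance_health` helper, the group name, and the instance IDs are hypothetical names used for illustration.

```python
def instance_health(response, instance_id):
    """Return (HealthStatus, LifecycleState) for an instance, or None if absent.

    `response` is assumed to have the shape returned by boto3's
    autoscaling `describe_auto_scaling_groups` call.
    """
    for group in response.get("AutoScalingGroups", []):
        for inst in group.get("Instances", []):
            if inst.get("InstanceId") == instance_id:
                return inst.get("HealthStatus"), inst.get("LifecycleState")
    return None


# Hypothetical sample data, for illustration only.
sample = {
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "my-asg",
            "Instances": [
                {"InstanceId": "i-0aaa", "HealthStatus": "Healthy",
                 "LifecycleState": "InService"},
                {"InstanceId": "i-0bbb", "HealthStatus": "Unhealthy",
                 "LifecycleState": "Standby"},
            ],
        }
    ]
}

print(instance_health(sample, "i-0bbb"))  # ('Unhealthy', 'Standby')
```

With live data, the response could come from `boto3.client("autoscaling").describe_auto_scaling_groups(AutoScalingGroupNames=["my-asg"])`.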

Health check grace period

Amazon EC2 Auto Scaling doesn't mark an instance as Unhealthy based on EC2 status checks or ELB health checks until the health check grace period expires. To find the grace period length, select the Details view in the instance's group, and then note the Health Check Grace Period value. There's no health check grace period for warm pool instances. Amazon EC2 Auto Scaling doesn't start checking instance health until the lifecycle hook finishes.
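To estimate whether an instance is still inside its grace period, you can do the arithmetic yourself. The following Python sketch is an illustration under two assumptions: that the grace period is counted from the time the instance entered the InService state, and that you know that time. The `grace_period_remaining` helper and the timestamps are hypothetical. The grace period value corresponds to the `HealthCheckGracePeriod` field (in seconds) of the group.

```python
from datetime import datetime, timedelta, timezone


def grace_period_remaining(in_service_at, grace_seconds, now=None):
    """Return the time left in the grace period (zero if it has expired).

    Assumption: the grace period starts when the instance enters InService.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    grace_ends = in_service_at + timedelta(seconds=grace_seconds)
    return max(grace_ends - now, timedelta(0))


# Hypothetical values: InService at 12:00 UTC with a 300-second grace period.
in_service_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 1, 12, 2, tzinfo=timezone.utc)

remaining = grace_period_remaining(in_service_at, 300, now=check_time)
print(remaining.total_seconds())  # 180.0
```

A result of zero means that the grace period is no longer the reason that the instance isn't marked Unhealthy.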

Suspended processes

Suspending processes such as HealthCheck, ReplaceUnhealthy, Launch, or Terminate affects the ability of Amazon EC2 Auto Scaling to detect, replace, or terminate unhealthy instances. To resume suspended processes:

  1. Under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Details view.
  3. Choose Edit, and then remove any of the following processes from Suspended Processes: HealthCheck, ReplaceUnhealthy, Launch, and Terminate.
  4. Choose Save to resume the processes.
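The console steps above can also be sketched in Python. This is a hedged illustration, not a definitive implementation. It assumes that `client` behaves like a boto3 Auto Scaling client, and it uses that client's `describe_auto_scaling_groups` and `resume_processes` calls. The helper names are hypothetical.

```python
# Processes that, when suspended, can keep unhealthy instances in the group.
BLOCKING_PROCESSES = {"HealthCheck", "ReplaceUnhealthy", "Launch", "Terminate"}


def blocking_suspended_processes(group):
    """Return the suspended processes in `group` that affect replacement."""
    suspended = {p["ProcessName"] for p in group.get("SuspendedProcesses", [])}
    return sorted(suspended & BLOCKING_PROCESSES)


def resume_blocking_processes(client, group_name):
    """Resume only the blocking processes and return their names.

    `client` is assumed to be boto3.client("autoscaling") or a compatible
    object. Other suspended processes are left untouched.
    """
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response.get("AutoScalingGroups", [])
    if not groups:
        return []
    to_resume = blocking_suspended_processes(groups[0])
    if to_resume:
        client.resume_processes(
            AutoScalingGroupName=group_name,
            ScalingProcesses=to_resume,
        )
    return to_resume
```

The sketch resumes only the four processes named in this section, so a deliberate suspension of an unrelated process stays in place.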

Instance status in Amazon EC2 console

Amazon EC2 Auto Scaling doesn't immediately terminate instances with an Impaired status. Instead, Amazon EC2 Auto Scaling waits a few minutes for the instance to recover. To check if an instance is impaired:

  1. On the Amazon EC2 console navigation pane, under Instances, choose Instances, and then select the instance.
  2. Select the Status and Alarms column and note whether the instance's status is Impaired.
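As a rough programmatic equivalent, the following Python sketch reads the status checks from a response in the shape that the boto3 EC2 client's `describe_instance_status` call returns. The helper names and sample data are hypothetical. Note that `describe_instance_status` returns only running instances unless you pass `IncludeAllInstances=True`.

```python
def status_check_summary(response, instance_id):
    """Return the instance and system status check results, or None.

    `response` is assumed to have the shape returned by boto3's
    ec2 `describe_instance_status` call.
    """
    for entry in response.get("InstanceStatuses", []):
        if entry.get("InstanceId") == instance_id:
            return {
                "instance": entry.get("InstanceStatus", {}).get("Status"),
                "system": entry.get("SystemStatus", {}).get("Status"),
            }
    return None


def is_impaired(summary):
    """True if either status check reports 'impaired'."""
    return summary is not None and "impaired" in summary.values()


# Hypothetical sample data, for illustration only.
sample_status = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0aaa",
            "InstanceStatus": {"Status": "impaired"},
            "SystemStatus": {"Status": "ok"},
        }
    ]
}

summary = status_check_summary(sample_status, "i-0aaa")
print(summary, is_impaired(summary))
```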

Amazon EC2 Auto Scaling might delay terminating, or not terminate, instances that fail to report data for status checks. This happens when there is insufficient data for the status check metrics in Amazon CloudWatch. To terminate these instances manually:

  1. On the Amazon EC2 console navigation pane, under Instances, choose Instances, and then select the instance.
  2. Choose the Monitoring view and note the status of the instance.
  3. If the status is Insufficient Data, select the instance again, and then choose the Actions menu.
  4. Choose Instance State.
  5. Choose Terminate.
    Note: To terminate the instance through the ASG, run the terminate-instance-in-auto-scaling-group AWS CLI command or call the TerminateInstanceInAutoScalingGroup API.
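The TerminateInstanceInAutoScalingGroup API mentioned in the preceding note is exposed in boto3 as `terminate_instance_in_auto_scaling_group`. The following is a minimal sketch that assumes `client` behaves like a boto3 Auto Scaling client. The `terminate_via_group` wrapper and its `keep_capacity` parameter are hypothetical names.

```python
def terminate_via_group(client, instance_id, keep_capacity=True):
    """Terminate an instance through its Auto Scaling group.

    `client` is assumed to be boto3.client("autoscaling") or a compatible
    object. With keep_capacity=True the desired capacity is unchanged, so
    the group is expected to launch a replacement instance.
    """
    return client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=not keep_capacity,
    )


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# terminate_via_group(boto3.client("autoscaling"), "i-0123456789abcdef0")
```

`ShouldDecrementDesiredCapacity` is a required parameter of the API, which is why the wrapper always sets it explicitly.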

Instance state in Auto Scaling group

Instance is in Standby state

Amazon EC2 Auto Scaling does not perform health checks on instances in the Standby state or any state other than InService (such as Pending:Wait state). To set Standby instances back to the InService state:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling Groups, select the instance's group, and then choose the Instances view.
  2. In the Lifecycle column, check whether the instance is in the Standby state.
  3. To resume health checks, select the instance.
  4. Select the Actions menu for an instance, and then choose Set to InService to exit the Standby state.

Note: If your Auto Scaling group is unbalanced across Availability Zones, then Amazon EC2 Auto Scaling automatically redistributes your instances evenly across the Availability Zones.
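The Standby check and the Set to InService action can be sketched in Python as follows. This is an illustration only. It assumes that `client` behaves like a boto3 Auto Scaling client and uses its `describe_auto_scaling_groups` and `exit_standby` calls. The helper names are hypothetical.

```python
def standby_instance_ids(group):
    """Return the IDs of instances whose lifecycle state is Standby."""
    return [
        inst["InstanceId"]
        for inst in group.get("Instances", [])
        if inst.get("LifecycleState") == "Standby"
    ]


def return_standby_to_service(client, group_name):
    """Move every Standby instance in the group back to InService.

    `client` is assumed to be boto3.client("autoscaling") or a compatible
    object. Returns the instance IDs that were submitted.
    """
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response.get("AutoScalingGroups", [])
    ids = standby_instance_ids(groups[0]) if groups else []
    if ids:
        client.exit_standby(InstanceIds=ids, AutoScalingGroupName=group_name)
    return ids
```

After the instances return to InService, Amazon EC2 Auto Scaling resumes health checks on them.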

Instance is waiting on Lifecycle Hooks

Amazon EC2 Auto Scaling terminates an instance only after its lifecycle hook completes. To find the lifecycle status and complete the lifecycle hook:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Instances view and note the Lifecycle status for the instance.

If the status is Terminating:Wait, then the instance stays in that state until you complete the lifecycle action, or until the Heartbeat timeout period ends. Then the instance transitions to the next state (Terminating:Proceed).
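To complete the lifecycle action from code instead of waiting for the heartbeat timeout, you can call the CompleteLifecycleAction API. The following Python sketch assumes that `client` behaves like a boto3 Auto Scaling client and that you know the lifecycle hook name. The helper names and the hook name in the usage comment are hypothetical.

```python
def instances_waiting_on_termination_hook(group):
    """Return the IDs of instances in the Terminating:Wait state."""
    return [
        inst["InstanceId"]
        for inst in group.get("Instances", [])
        if inst.get("LifecycleState") == "Terminating:Wait"
    ]


def complete_termination_hook(client, group_name, hook_name, instance_id):
    """Complete the lifecycle action so the instance can move on.

    `client` is assumed to be boto3.client("autoscaling") or a compatible
    object. CONTINUE lets any remaining lifecycle hooks run before the
    instance proceeds to Terminating:Proceed.
    """
    return client.complete_lifecycle_action(
        AutoScalingGroupName=group_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# complete_termination_hook(
#     boto3.client("autoscaling"), "my-asg", "my-terminate-hook", "i-0aaa"
# )
```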

Instance is waiting for ELB connection draining to complete

If Amazon EC2 Auto Scaling is waiting for an ELB connection draining period to complete, then it delays terminating the instance. To check whether the group is waiting:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Instances view and check that the instance's Lifecycle is Terminating.
  3. Choose the Activity view.
  4. For Filter, select Waiting for ELB connection draining to check if the group is waiting for ELB connection draining to stop the instance.
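The same filter can be approximated in Python. The following sketch assumes a response in the shape that the boto3 Auto Scaling client's `describe_scaling_activities` call returns, where each activity carries a `StatusCode`. The helper name and the sample activities are hypothetical.

```python
# Status code that scaling activities report while connections drain.
DRAINING_STATUS = "WaitingForELBConnectionDraining"


def draining_activities(response):
    """Return the activities that are waiting for ELB connection draining.

    `response` is assumed to have the shape returned by boto3's
    autoscaling `describe_scaling_activities` call.
    """
    return [
        activity
        for activity in response.get("Activities", [])
        if activity.get("StatusCode") == DRAINING_STATUS
    ]


# Hypothetical sample data, for illustration only.
sample_activities = {
    "Activities": [
        {"ActivityId": "a-1", "StatusCode": "Successful",
         "Description": "Launching a new EC2 instance: i-0ccc"},
        {"ActivityId": "a-2", "StatusCode": "WaitingForELBConnectionDraining",
         "Description": "Terminating EC2 instance: i-0bbb"},
    ]
}

for activity in draining_activities(sample_activities):
    print(activity["ActivityId"], activity["Description"])
```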

ELB health checks

ELB settings can affect health checks and instance replacements. Note the instance's status on the ELB console:

For Classic Load Balancers (CLB):

  1. On the Amazon EC2 console navigation pane, under Load Balancing, choose Load Balancers, and then select the load balancer that the instance is registered with.
  2. Choose the Target Instances view and note the Health status and description.

For Application Load Balancers (ALB) or Network Load Balancers (NLB):

  1. On the Amazon EC2 console navigation pane, under Load Balancing, choose Target Groups, and then select the target group that the instance is registered with.
  2. Choose the Targets view and note the Health status and details of the instance.
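For target groups, the health state and reason are also available from the DescribeTargetHealth API. The following Python sketch assumes a response in the shape that the boto3 `elbv2` client's `describe_target_health` call returns. The helper name and sample data are hypothetical.

```python
def unhealthy_targets(response):
    """Map each unhealthy target ID to its (Reason, Description).

    `response` is assumed to have the shape returned by boto3's
    elbv2 `describe_target_health` call.
    """
    result = {}
    for entry in response.get("TargetHealthDescriptions", []):
        health = entry.get("TargetHealth", {})
        if health.get("State") == "unhealthy":
            target_id = entry.get("Target", {}).get("Id")
            result[target_id] = (health.get("Reason"), health.get("Description"))
    return result


# Hypothetical sample data, for illustration only.
sample_targets = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0aaa", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "i-0bbb", "Port": 80},
         "TargetHealth": {"State": "unhealthy",
                          "Reason": "Target.Timeout",
                          "Description": "Request timed out"}},
    ]
}

print(unhealthy_targets(sample_targets))
```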

When the group's health check type is set to EC2, Amazon EC2 Auto Scaling doesn't use ELB health checks to determine instance health status. As a result, Amazon EC2 Auto Scaling doesn't terminate instances that fail ELB health checks. If an instance's status is OutOfService or Unhealthy on the ELB console, but the instance's status is Healthy on the Amazon EC2 Auto Scaling console, then check that the health check type is set to ELB:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Details view, and then under Health Check, note the Health Check Type.
  3. Choose Edit and select ELB for Health Check Type, and then choose Save.
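The health check type can also be inspected and changed through the API. The following Python sketch is an illustration under stated assumptions: `client` behaves like a boto3 Auto Scaling client, and the `HealthCheckType` field might hold either a single value or a comma-separated list. The helper names are hypothetical.

```python
def uses_elb_health_checks(group):
    """True if the group's HealthCheckType includes ELB.

    Assumption: the value may be a single type ("EC2") or a
    comma-separated list ("EC2,ELB"), so it is split before comparing.
    """
    raw = group.get("HealthCheckType", "EC2")
    return "ELB" in [part.strip() for part in raw.split(",")]


def enable_elb_health_checks(client, group_name):
    """Set the health check type to ELB if it isn't already included.

    `client` is assumed to be boto3.client("autoscaling") or a compatible
    object. Returns True if an update was sent.
    """
    response = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )
    groups = response.get("AutoScalingGroups", [])
    if not groups or uses_elb_health_checks(groups[0]):
        return False
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckType="ELB",
    )
    return True
```

After the change, instances that fail ELB health checks are marked Unhealthy in the group once the health check grace period has passed.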

For more information, see Application Load Balancers Health check reason codes, Network Load Balancers Health check reason codes, and Troubleshoot a Classic Load Balancer: Health checks.

Instance maintenance policy

By default, an Auto Scaling group doesn't have an instance maintenance policy, so it responds to instance maintenance events with the default behaviors. An instance maintenance policy affects the Amazon EC2 Auto Scaling events that cause instances to be replaced.
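To see whether a group has an instance maintenance policy, you can read it from the group description. The following Python sketch assumes a group dictionary in the shape that boto3's `describe_auto_scaling_groups` returns, with an optional `InstanceMaintenancePolicy` entry that holds `MinHealthyPercentage` and `MaxHealthyPercentage`. It also assumes that a missing entry, or values of -1, mean that no policy is set. The helper name is hypothetical.

```python
def maintenance_policy(group):
    """Return (min_healthy_pct, max_healthy_pct), or None if no policy is set.

    Assumption: an absent InstanceMaintenancePolicy, or values of -1,
    mean that the group uses the default behaviors.
    """
    policy = group.get("InstanceMaintenancePolicy") or {}
    min_pct = policy.get("MinHealthyPercentage", -1)
    max_pct = policy.get("MaxHealthyPercentage", -1)
    if min_pct == -1 and max_pct == -1:
        return None
    return min_pct, max_pct


# Hypothetical sample data, for illustration only.
with_policy = {
    "InstanceMaintenancePolicy": {
        "MinHealthyPercentage": 90,
        "MaxHealthyPercentage": 120,
    }
}

print(maintenance_policy(with_policy))
print(maintenance_policy({}))
```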

Auto Scaling minimizes downtime to your application

To avoid disruptions, the ASG replaces unhealthy instances gradually. This measured approach helps the ASG maintain capacity and stability during the recovery process. For more information, see How Amazon EC2 Auto Scaling minimizes downtime.

Related information

Troubleshoot Amazon EC2 Linux instances with failed status checks

Why did Amazon EC2 Auto Scaling terminate an instance?

How do I troubleshoot and fix failing health checks for Classic Load Balancers?

How do I troubleshoot failed health checks for Application Load Balancers?

Troubleshoot your Network Load Balancer

AWS OFFICIAL
Updated 4 months ago