Even though I can see that the alarm metric exceeds the configured threshold on my Amazon CloudWatch graphs, my CloudWatch alarm isn't activated. I want to make sure that my CloudWatch alarms are activated and perform the alarm actions.
Short description
CloudWatch alarms continuously watch time-aggregated metrics in a rolling window. If the required number of data points in the evaluation period doesn't exceed the configured threshold, then the CloudWatch alarm isn't activated, even when individual data points on the graph exceed the threshold.
CloudWatch alarms start actions when the alarm state changes and is maintained for a specified number of periods. For more information, see Using Amazon CloudWatch alarms.
Important: If an alarm is in a specified state, then the CloudWatch alarm continuously activates the Amazon EC2 Auto Scaling actions. If no states change and the alarm remains in the specified state, then the activity continues.
Resolution
When you create alarms, verify how CloudWatch aggregates the metric over time, because the alarm evaluates the aggregated data points and not the raw values.
To test that the alarm works correctly, lower the alarm's threshold for the metric data.
Troubleshooting example
In the following example, an alarm watches average CPU utilization. The alarm is configured with a threshold of greater than 45% and a period of 5 minutes. Both Evaluation Period and Datapoints to Alarm are set to 3, so the alarm evaluates the three most recent consecutive 5-minute periods. If all the data points in those three periods are above the threshold, then the alarm changes to the ALARM state.
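As a rough illustration, the configuration described above might map to the following PutMetricAlarm parameters. The alarm name and instance ID are hypothetical placeholders, and the namespace and metric name assume an Amazon EC2 CPUUtilization metric, which the example doesn't state explicitly.

```python
# Sketch only: parameter values mirror the example alarm; names marked
# "hypothetical" are placeholders and don't come from the article.
alarm_params = {
    "AlarmName": "example-high-cpu",          # hypothetical name
    "Namespace": "AWS/EC2",                   # assumed namespace
    "MetricName": "CPUUtilization",           # assumed metric
    "Dimensions": [
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"}  # hypothetical ID
    ],
    "Statistic": "Average",                   # alarm watches the average
    "Period": 300,                            # 5 minutes, in seconds
    "EvaluationPeriods": 3,                   # size of the rolling window
    "DatapointsToAlarm": 3,                   # breaching data points required
    "Threshold": 45.0,
    "ComparisonOperator": "GreaterThanThreshold",
}

# The evaluation interval is the period multiplied by the evaluation periods.
evaluation_interval_minutes = (
    alarm_params["Period"] * alarm_params["EvaluationPeriods"] // 60
)
print(evaluation_interval_minutes)  # 15

# With boto3 installed and credentials configured, the dictionary could be
# passed as: boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```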
The evaluation interval is 15 minutes for the time-aggregated metrics:
- 05:25:00: data: {Avg=61.123}
- 05:30:00: data: {Avg=57.847}
- 05:35:00: data: {Avg=60.503}
- 05:40:00: data: {Avg=55.473}
- 05:45:00: data: {Avg=41.685}
- 05:50:00: data: {Avg=58.390}
- 05:55:00: data: {Avg=57.846}
- 06:00:00: data: {Avg=61.123}
For more information, see Evaluating an alarm.
The preceding data points result in the following alarm states:
- 05:35 ALARM
- 05:40 ALARM
- 05:45 ALARM to OK
- 05:50 OK
- 05:55 OK
- 06:00 OK to ALARM
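The state sequence above can be reproduced with a small simulation. This is a simplified sketch of the rolling-window evaluation, not CloudWatch's actual implementation: it assumes that no data points are missing and that the alarm is evaluated once for each 5-minute data point. The function and variable names are illustrative.

```python
THRESHOLD = 45.0          # a data point breaches when Avg > 45
EVALUATION_PERIODS = 3    # rolling window size, in 5-minute periods
DATAPOINTS_TO_ALARM = 3   # breaching data points required in the window

datapoints = [
    ("05:25", 61.123),
    ("05:30", 57.847),
    ("05:35", 60.503),
    ("05:40", 55.473),
    ("05:45", 41.685),
    ("05:50", 58.390),
    ("05:55", 57.846),
    ("06:00", 61.123),
]


def evaluate(points, threshold, evaluation_periods, datapoints_to_alarm):
    """Return (timestamp, state) for every evaluation that has a full window."""
    states = []
    for end in range(evaluation_periods, len(points) + 1):
        window = points[end - evaluation_periods:end]
        breaching = sum(1 for _, value in window if value > threshold)
        state = "ALARM" if breaching >= datapoints_to_alarm else "OK"
        states.append((window[-1][0], state))
    return states


states = evaluate(datapoints, THRESHOLD, EVALUATION_PERIODS, DATAPOINTS_TO_ALARM)
for timestamp, state in states:
    print(timestamp, state)
# 05:35 ALARM
# 05:40 ALARM
# 05:45 OK
# 05:50 OK
# 05:55 OK
# 06:00 ALARM
```

The 05:45 data point stays inside the window for the 05:45, 05:50, and 05:55 evaluations, which is why the simulated alarm returns to ALARM only at 06:00.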
The data point collected at 05:55 exceeds the average CPU utilization threshold of 45%. However, the alarm remains in the OK state and doesn't activate the action at 05:55. No action happens because the data point collected at 05:45:00 doesn't exceed the threshold and is still included in the evaluation at 05:55. Five minutes later, at 06:00, the 05:45:00 data point is no longer in the evaluation window. The alarm state changes from OK to ALARM, and the alarm starts the action.
For the following time-aggregated metrics, all the data points exceed the average CPU utilization threshold of 45%, so the alarm state changes to ALARM after 05:25:00 and remains in the ALARM state for every later evaluation. Because there are no further state changes, the alarm action isn't activated again.
- 05:25:00: data: {Avg=61.123}
- 05:30:00: data: {Avg=57.847}
- 05:35:00: data: {Avg=60.503}
- 05:40:00: data: {Avg=55.473}
- 05:45:00: data: {Avg=45.075}
- 05:50:00: data: {Avg=58.390}
- 05:55:00: data: {Avg=57.847}
- 06:00:00: data: {Avg=61.123}
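To contrast the two data sets, the following sketch lists only the state transitions, because alarm actions start on a state change. It assumes the same 3-out-of-3 configuration with no missing data, and it considers only evaluations that have a full window of three data points. The function name is illustrative.

```python
def alarm_transitions(values, threshold=45.0, window=3):
    """Return (previous_state, new_state) for each change between evaluations."""
    states = [
        "ALARM" if all(v > threshold for v in values[i - window:i]) else "OK"
        for i in range(window, len(values) + 1)
    ]
    return [(prev, cur) for prev, cur in zip(states, states[1:]) if prev != cur]


# Second data set: every data point is above 45, including 45.075.
steady = [61.123, 57.847, 60.503, 55.473, 45.075, 58.390, 57.847, 61.123]
# First data set: the 05:45 data point (41.685) is below the threshold.
with_dip = [61.123, 57.847, 60.503, 55.473, 41.685, 58.390, 57.846, 61.123]

print(alarm_transitions(steady))    # []
print(alarm_transitions(with_dip))  # [('ALARM', 'OK'), ('OK', 'ALARM')]
```

With no transitions in the second data set, nothing triggers the action again while the alarm stays in ALARM.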
Related information
Dynamic scaling for Amazon EC2 Auto Scaling
View available metrics