Why did my CloudWatch alarm initiate when the monitored metric doesn't have any breaching datapoints?

My Amazon CloudWatch alarm changed to the ALARM state. When I check the metric that's monitored, the CloudWatch graph doesn't show any breaching datapoints. However, the Alarm History contains an entry with a breaching datapoint. I want to know what initiated my CloudWatch alarm.

Short description

CloudWatch alarms evaluate metrics based on the datapoints that are immediately available. The alarm's history shows a record of the datapoints that the alarm evaluated at that timestamp. However, after the alarm evaluation, CloudWatch can publish new samples. The new samples might affect the value that's calculated when CloudWatch aggregates the metric data.

Resolution

Find the breaching datapoints

If your CloudWatch graph doesn't show any breaching datapoints, then the samples that changed the aggregated value were likely published after the alarm evaluation occurred.

For example, X number of samples are available when an alarm evaluation occurs. The X number of samples result in an aggregated value of A. Then, new samples are posted. So, Y number of samples are retrieved for the same timestamp. The Y number of samples result in an aggregated value of B.

In the following example, an alarm is configured with the following parameters:

Namespace: Web_App
Metric: ResponseTime
Dimension: host,h_04254448d4e964956
Statistic: Average
Threshold: 0.005
ComparisonOperator: GreaterThanThreshold
Period: 60 seconds (1 minute)
Evaluation Period: 1

When the alarm evaluates the period from 12:00:00 - 12:01:00 UTC, the alarm retrieves the following samples:

Sample-1: 12:00:00 UTC, numeric value: 0.00675  
Sample-2: 12:00:00 UTC, numeric value: 0.00789  
Sample-3: 12:00:00 UTC, numeric value: 0.00421

Because the average of these values is 0.006283333, the average breaches the threshold of 0.005 seconds, and the alarm changes to the ALARM state. The alarm's history shows the aggregated values that exceed the threshold.
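The arithmetic behind this evaluation can be checked with a short Python sketch. The variable names are illustrative only, and the comparison mirrors the GreaterThanThreshold operator from the example configuration. It isn't how CloudWatch itself is implemented.

```python
from statistics import mean

THRESHOLD = 0.005  # seconds, from the example alarm configuration

# Samples that were available for the 12:00 period at evaluation time
samples_at_evaluation = [0.00675, 0.00789, 0.00421]

# Statistic: Average
average_at_evaluation = mean(samples_at_evaluation)

# ComparisonOperator: GreaterThanThreshold
breaching = average_at_evaluation > THRESHOLD

print(f"average={average_at_evaluation:.9f} breaching={breaching}")
```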

If the host temporarily experiences a performance issue, then the client application that publishes the metrics can also be affected. As a result, the host might not post datapoints at equal intervals. In this case, additional samples for 12:00 are published after the alarm evaluation occurs.

The following example represents all the samples for the 12:00 timestamp:

Sample-1: 12:00:00 UTC, numeric value: 0.00675  
Sample-2: 12:00:00 UTC, numeric value: 0.00789  
Sample-3: 12:00:00 UTC, numeric value: 0.00421  
Sample-4: 12:00:00 UTC, numeric value: 0.00002  
Sample-5: 12:00:00 UTC, numeric value: 0.00007

When you receive an alert from the alarm, generate a CloudWatch graph to review the metric behavior. CloudWatch retrieves the five samples from 12:00:00 - 12:01:00 UTC and aggregates them as an average of 0.003788. This value differs from the value that the alarm calculated and is below the threshold of 0.005. If additional samples are posted after the alarm evaluation occurs, then the breaching datapoints aren't visible in the graph for that time range.
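As a rough illustration, the following Python sketch compares the average that the alarm calculated with the average that the graph shows after the late samples arrive. The list names are hypothetical. The values are the five samples from the example.

```python
from statistics import mean

THRESHOLD = 0.005  # seconds, from the example alarm configuration

# Samples available when the alarm evaluated the 12:00 period
early_samples = [0.00675, 0.00789, 0.00421]
# Samples published for the same timestamp after the evaluation
late_samples = [0.00002, 0.00007]

average_at_evaluation = mean(early_samples)
average_on_graph = mean(early_samples + late_samples)

print(f"alarm saw {average_at_evaluation:.6f}, breaching: {average_at_evaluation > THRESHOLD}")
print(f"graph shows {average_on_graph:.6f}, breaching: {average_on_graph > THRESHOLD}")
```

The same period produces two different aggregated values, so the alarm history and the graph can disagree without either being wrong.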

Increase the alarm evaluation interval

If an alarm generates false alerts because of delayed metrics, then increase the evaluation interval. With a longer evaluation interval, the delayed datapoints are more likely to be included in the alarm evaluation. The inclusion of delayed datapoints reduces the number of false alerts.

To increase the evaluation interval, use one of the following options.

Increase the period. In the following example, the period is increased to 5 minutes:

Namespace: Web_App
Metric: ResponseTime
Dimension: host,h_04254448d4e964956
Statistic: Average
Threshold: 0.005
ComparisonOperator: GreaterThanThreshold
Period: 300 seconds (5 minutes)
Evaluation Period: 1
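If you manage alarms with the AWS SDK for Python (Boto3), then the same configuration might look like the following sketch. The alarm name is a hypothetical placeholder, and the dimension is assumed to have the name host and the value h_04254448d4e964956. The call to put_metric_alarm is left as a comment so that the example runs without AWS credentials.

```python
# Parameters for the CloudWatch PutMetricAlarm API, mirroring the example above.
alarm_params = {
    "AlarmName": "web-app-response-time",  # hypothetical name, not from the article
    "Namespace": "Web_App",
    "MetricName": "ResponseTime",
    "Dimensions": [{"Name": "host", "Value": "h_04254448d4e964956"}],
    "Statistic": "Average",
    "Threshold": 0.005,
    "ComparisonOperator": "GreaterThanThreshold",
    "Period": 300,           # seconds (5 minutes)
    "EvaluationPeriods": 1,
}

# The evaluation interval is the period multiplied by the number of evaluation periods.
evaluation_interval = alarm_params["Period"] * alarm_params["EvaluationPeriods"]
print(f"evaluation interval: {evaluation_interval} seconds")

# To create or update the alarm (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```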

Or, configure M out of N Datapoints to Alarm. For example, configure two out of three datapoints.

When you configure Evaluation Periods and Datapoints to Alarm as different values, the alarm is an M out of N alarm. Datapoints to Alarm is set to M and Evaluation Periods is set to N. The evaluation interval is the period multiplied by N. For example, if you configure four out of five datapoints with a period of 1 minute, then the evaluation interval is 5 minutes. If you configure three out of three datapoints with a period of 10 minutes, then the evaluation interval is 30 minutes.
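The two interval calculations in the preceding paragraph can be reproduced with a small helper. The function name is illustrative only.

```python
def evaluation_interval_seconds(period_seconds: int, evaluation_periods: int) -> int:
    """Return the evaluation interval: the period multiplied by N (Evaluation Periods)."""
    return period_seconds * evaluation_periods


# Four out of five datapoints with a period of 1 minute
four_of_five = evaluation_interval_seconds(60, 5)
# Three out of three datapoints with a period of 10 minutes
three_of_three = evaluation_interval_seconds(600, 3)

print(four_of_five // 60, "minutes")
print(three_of_three // 60, "minutes")
```

Note that M (Datapoints to Alarm) doesn't affect the interval. Only the period and N do.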

If you configure Datapoints to Alarm as a lower value than Evaluation Periods, then the alarm evaluates the N most recent datapoints. The alarm changes to the ALARM state only when at least M of those datapoints breach the threshold. You can use the parameter to make the alarm activate on a single datapoint, or to require multiple breaching datapoints before the alarm transitions to the ALARM state.
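The following Python sketch shows a simplified version of the M out of N check. It assumes that every period has a datapoint and ignores how CloudWatch treats missing data, so it's only an approximation of the real evaluation. The function name and the datapoint values are made up for illustration.

```python
def is_in_alarm(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Simplified M out of N check for a GreaterThanThreshold alarm.

    datapoints: aggregated values, oldest first, one for each period.
    """
    recent = datapoints[-evaluation_periods:]            # the N most recent datapoints
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= datapoints_to_alarm              # at least M must breach


# Hypothetical aggregated values for three consecutive periods
one_breach = [0.006, 0.004, 0.003]
two_breaches = [0.006, 0.004, 0.007]

print(is_in_alarm(one_breach, 0.005, 1, 3))    # 1 out of 3
print(is_in_alarm(one_breach, 0.005, 2, 3))    # 2 out of 3
print(is_in_alarm(two_breaches, 0.005, 2, 3))  # 2 out of 3
```

With two out of three datapoints, a single breaching datapoint that's caused by delayed samples isn't enough to change the alarm state.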

For more information, see Create a CloudWatch alarm based on a static threshold and Configuring how CloudWatch alarms treat missing data.

Related information

Why didn't I receive an SNS notification for my CloudWatch alarm trigger?

How do I troubleshoot my CloudWatch alarm in the INSUFFICIENT_DATA state?

Why did my CloudWatch alarm send me a notification after a single breached data point?

