Why is my CloudWatch alarm in INSUFFICIENT_DATA state?
My Amazon CloudWatch alarm is in INSUFFICIENT_DATA state. How can I find out what's causing this?
The INSUFFICIENT_DATA state can indicate any of the following:
An Amazon CloudWatch alarm just started.
The metric is unavailable.
The metric parameters, like namespace, metric name, or dimensions, have been misconfigured.
There's not enough data for the metric to determine the alarm state.
When your alarm is unexpectedly in the INSUFFICIENT_DATA state, review the following troubleshooting steps for some of the most common causes.
Normal metric behavior
When you create a CloudWatch alarm, its first state by default is INSUFFICIENT_DATA. It remains in this state until it completes its first evaluation of the metric being monitored. Typically, an alarm transitions out of INSUFFICIENT_DATA within a few minutes of creation.
An alarm in INSUFFICIENT_DATA state might reflect the normal behavior of a metric. There are two types of metrics based on how they are pushed to CloudWatch: period-driven and event-driven. Some services send periodic data points to their metrics at regular intervals. Other services push metric data when triggered by certain events, and might have periods without data points.
An example of a period-driven metric is the default CPUUtilization metric of an Amazon Elastic Compute Cloud (Amazon EC2) instance. This metric has a data point every five minutes. But, if you stop the instance, the service doesn't push any data points to it. An example of an event-driven metric is the HTTPCode_ELB_5XX_Count metric for an Application Load Balancer. The Elastic Load Balancing (ELB) service sends data points to this metric only when the load balancer returns a 5XX response. If there are no 5XX errors during a period, then the result is an empty period (rather than a zero value).
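The difference between an empty period and a zero value can be illustrated with a small, self-contained sketch. It does not call CloudWatch. The event timestamps and the helper names (bucket_events, empty_periods) are hypothetical and exist only to show why an event-driven metric has gaps.

```python
from collections import Counter


def bucket_events(event_times, start, end, period):
    """Group event timestamps (seconds) into periods.

    Only periods that contain at least one event get an entry,
    which mirrors how an event-driven metric has no data point
    (not a zero) when nothing happened.
    """
    counts = Counter()
    for t in event_times:
        if start <= t < end:
            period_start = start + ((t - start) // period) * period
            counts[period_start] += 1
    return dict(counts)


def empty_periods(datapoints, start, end, period):
    """List the period start times that have no data point at all."""
    return [p for p in range(start, end, period) if p not in datapoints]


# Hypothetical 5XX error timestamps, in seconds from the start of the window.
errors = [10, 20, 130]

datapoints = bucket_events(errors, start=0, end=300, period=60)
missing = empty_periods(datapoints, start=0, end=300, period=60)

print(datapoints)  # {0: 2, 120: 1}
print(missing)     # [60, 180, 240]
```

An alarm that evaluates the periods starting at 60, 180, or 240 finds nothing to evaluate, which is different from finding a value of 0.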
Each metric is defined by a namespace, a metric name, and up to 30 dimensions. When retrieving a data point, you must specify a timestamp (and optionally, a unit). If you provide an incorrect value for one of these parameters, then CloudWatch attempts to retrieve a metric that doesn't exist. The result is an empty dataset.
Note: Data points are usually pushed to a metric with a single unit, but you aren't required to specify the unit when creating an alarm. If you don't specify a unit, you don't encounter issues related to incorrect unit configurations. But, if the data points in your metric have multiple units, then it's a best practice to use the correct unit.
Use the DescribeAlarms API to get a complete list of parameters for your monitored metrics. You can compare this with the ListMetrics output. Check the parameters for:
Misspellings and improper use of uppercase and lowercase letters. Namespaces, metric names, and dimension keys/values are case-sensitive.
Incorrectly specified or missing dimensions.
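The comparison described above can be sketched in Python. The helper below is hypothetical and works on plain dictionaries shaped like the entries in the MetricAlarms list returned by DescribeAlarms and the Metrics list returned by ListMetrics (Namespace, MetricName, and Dimensions as a list of Name/Value pairs). It assumes a single-metric alarm, not a metric math alarm, and the instance ID is a made-up placeholder.

```python
def dimension_map(dimensions):
    """Convert [{'Name': ..., 'Value': ...}] into a plain dict."""
    return {d["Name"]: d["Value"] for d in dimensions}


def metric_key(metric):
    """Build a comparable identity: namespace, name, sorted dimensions."""
    dims = dimension_map(metric.get("Dimensions", []))
    return (metric["Namespace"], metric["MetricName"], tuple(sorted(dims.items())))


def lower_key(key):
    """Lowercase every part of a metric key for a case-insensitive check."""
    namespace, name, dims = key
    lowered = tuple(sorted((k.lower(), v.lower()) for k, v in dims))
    return (namespace.lower(), name.lower(), lowered)


def diagnose(alarm, metrics):
    """Classify how the alarm's metric relates to the listed metrics."""
    target = metric_key(alarm)
    keys = [metric_key(m) for m in metrics]
    if target in keys:
        return "match"
    if lower_key(target) in [lower_key(k) for k in keys]:
        return "case mismatch"
    if any(k[0] == target[0] and k[1] == target[1] for k in keys):
        return "dimension mismatch"
    return "not found"


# Hypothetical alarm with a miscapitalized dimension name.
alarm = {
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceID", "Value": "i-0123456789abcdef0"}],
}
metrics = [
    {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    }
]

print(diagnose(alarm, metrics))  # case mismatch
```

With boto3, the inputs could come from describe_alarms(AlarmNames=[...])["MetricAlarms"] and from list_metrics(Namespace=..., MetricName=...)["Metrics"]. ListMetrics results are paginated, so a complete comparison needs to follow NextToken.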
Incorrectly configured alarm periods
You can configure an alarm to retrieve data points with your desired frequency. But, you might get undesired states if the alarm uses a shorter period than the period used by the service (or source) to send the data points to the metric. To avoid unwanted INSUFFICIENT_DATA states, it's a best practice to configure the alarm's period to be equal to or greater than the period in which the data points of the metric are pushed. You can also use the M out of N settings for the alarm.
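To see why a short alarm period produces empty evaluations, consider this simplified simulation. It is an illustration only: it does not reproduce the exact evaluation logic that CloudWatch uses, and the function name and the timestamps are assumptions made for the example.

```python
def evaluate_periods(datapoint_times, start, end, alarm_period):
    """Return one boolean per alarm period: True if the period has data."""
    results = []
    for period_start in range(start, end, alarm_period):
        period_end = period_start + alarm_period
        has_data = any(period_start <= t < period_end for t in datapoint_times)
        results.append(has_data)
    return results


# A source that publishes one data point every 300 seconds (five minutes).
published = list(range(0, 900, 300))  # timestamps 0, 300, 600

# Alarm period shorter than the publishing interval: most periods are empty.
short = evaluate_periods(published, start=0, end=900, alarm_period=60)

# Alarm period equal to the publishing interval: every period has data.
matched = evaluate_periods(published, start=0, end=900, alarm_period=300)

print(short.count(False), "of", len(short), "periods empty")  # 12 of 15 periods empty
print(matched.count(False), "periods empty")                  # 0 periods empty
```

With a 60-second period, only the minute in which each data point lands has data. With a 300-second period, each period contains exactly one data point.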
Delayed delivery of data points
If data points reach CloudWatch later than the time that their timestamps indicate, then an alarm that monitors the metric can enter the INSUFFICIENT_DATA state unexpectedly.
For example, you might have a custom application that sends data points from software deployed in an EC2 instance to a custom metric. To avoid losing data, you might configure the application to retry failed API calls. But, due to an external factor (for example, a modification of the VPC settings), the instance loses connectivity with CloudWatch. In this scenario, your environment still generates data. But the data points being sent are failing until the external issue is fixed.
If you have set up a standard alarm, the alarm evaluates the metric every minute. During evaluation, the alarm gets the latest available data points from the configured metric. During this period without connectivity, the alarm is still evaluating the metric. Because the data points are not successfully being delivered to CloudWatch, the alarm can't retrieve any data points for those evaluation periods. This triggers an INSUFFICIENT_DATA state.
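The scenario can be modeled with a short simulation. It is a simplified sketch, not the CloudWatch evaluation algorithm: the DataPoint class, the outage window, and the recovery time are hypothetical values chosen for the example. The sketch also delivers the backlog after connectivity returns, to contrast what the alarm saw at evaluation time with what the metric contains later.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    timestamp: int     # when the measurement was taken (seconds)
    delivered_at: int  # when the data point actually reached the metric


def visible_at(points, period_start, period, now):
    """Data points that belong to the period and were delivered by 'now'."""
    return [
        p for p in points
        if period_start <= p.timestamp < period_start + period
        and p.delivered_at <= now
    ]


PERIOD = 60
WINDOW_END = 420
OUTAGE_START, OUTAGE_END = 120, 300  # hypothetical loss of connectivity
RECOVERY = 330                       # backlog is delivered at this time

# The application measures every 60 seconds. Points taken during the
# outage keep their own timestamp but arrive only at RECOVERY.
points = []
for ts in range(0, WINDOW_END, PERIOD):
    delivered = RECOVERY if OUTAGE_START <= ts < OUTAGE_END else ts
    points.append(DataPoint(timestamp=ts, delivered_at=delivered))

# Evaluate each period right after it closes, as the alarm would have.
missing_at_eval = [
    start for start in range(0, WINDOW_END, PERIOD)
    if not visible_at(points, start, PERIOD, now=start + PERIOD)
]

# Look at the same periods again after the backlog has arrived.
missing_after_backfill = [
    start for start in range(0, WINDOW_END, PERIOD)
    if not visible_at(points, start, PERIOD, now=WINDOW_END)
]

print(missing_at_eval)         # [120, 180, 240]
print(missing_after_backfill)  # []
```

The periods starting at 120, 180, and 240 had no data when they were evaluated, even though the metric shows no gaps once the backlog is stored.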
After recovering connectivity, the application sends the backlog of data points, each one with its own timestamp. Because the data points are sent after this delay, the alarm can now retrieve the recent data points based on the period and evaluation period that you specified on it. At this point, you no longer see blank spaces in the metric, because the data points are now stored in CloudWatch. But, because the alarm has already evaluated that time frame, the alarm history still shows a message similar to: