According to the Amazon CloudWatch User Guide, CloudWatch acts as a passive repository for metrics. It does not actively "poll" or "pull" data from your RDS instance; it waits for the instance to push data to it.
- Metric Ingestion: The documentation confirms that metrics are only stored when the service (RDS) publishes them.
- The Cause: During a recovery event (unlike a simple reboot), the underlying host or the components responsible for collecting and publishing the metrics are likely suspended or in the process of restarting.
- The Result: Since no data is sent during this window, CloudWatch graphs show a gap, and any alarm on the metric may move to INSUFFICIENT_DATA. CloudWatch cannot "backfill" this time period because the metric data was never generated.
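If you have an alarm on one of these metrics, how it reacts to the silent window depends on its missing-data setting. The sketch below builds the arguments for CloudWatch's `put_metric_alarm` with `TreatMissingData` set explicitly. The alarm name, threshold, and evaluation window are placeholder assumptions, not recommended values, and `create_alarm` needs boto3 and AWS credentials to run.

```python
def build_missing_data_alarm(db_instance_id, treat_missing_data="breaching"):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    TreatMissingData controls what the alarm does while RDS is silent:
    "missing" (the default) lets it fall to INSUFFICIENT_DATA, while
    "breaching" counts the gap as a threshold violation.
    """
    allowed = {"breaching", "notBreaching", "ignore", "missing"}
    if treat_missing_data not in allowed:
        raise ValueError(f"treat_missing_data must be one of {sorted(allowed)}")
    return {
        "AlarmName": f"{db_instance_id}-cpu-high-or-silent",  # hypothetical name
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "Statistic": "Average",
        "Period": 60,                # 1-minute data points
        "EvaluationPeriods": 3,      # assumed window, tune to taste
        "Threshold": 90.0,           # assumed threshold
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": treat_missing_data,
    }


def create_alarm(db_instance_id):
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_missing_data_alarm(db_instance_id))
```

Using "breaching" turns the gap itself into a signal, which may or may not be what you want if recoveries are expected and short.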
Correction / Refinement
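Do not rely on CPUUtilization or DatabaseConnections to verify whether your DB is up during recovery, as these are only published while the instance is healthy enough to report them.
Instead, check the RDS events for the instance and the instance status that RDS itself reports (in the console or via DescribeDBInstances). These reflect the state of the instance even when the database engine is unresponsive. Note that StatusCheckFailed is an Amazon EC2 metric and is not published in the AWS/RDS namespace.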
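As a rough sketch of that check, the function below reads the instance status and recent events through the RDS API (`describe_db_instances` and `describe_events` in boto3). The function name, the 60-minute lookback, and the instance identifier in the usage note are illustrative assumptions.

```python
def recovery_snapshot(rds_client, db_instance_id, lookback_minutes=60):
    """Return (status, [(date, message), ...]) for one DB instance.

    rds_client is expected to be a boto3 RDS client, i.e.
    boto3.client("rds"); it is passed in so the function can be
    exercised without AWS access.
    """
    # The status RDS reports for the instance, e.g. "available".
    instances = rds_client.describe_db_instances(
        DBInstanceIdentifier=db_instance_id
    )["DBInstances"]
    status = instances[0]["DBInstanceStatus"]

    # Events recorded for this instance in the lookback window.
    # Duration is expressed in minutes.
    events = rds_client.describe_events(
        SourceIdentifier=db_instance_id,
        SourceType="db-instance",
        Duration=lookback_minutes,
    )["Events"]

    messages = [(event["Date"], event["Message"]) for event in events]
    return status, messages
```

With credentials configured, a call might look like `recovery_snapshot(boto3.client("rds"), "my-sqlserver-db")`, where the identifier is a placeholder for your own instance.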
In Simple Terms
Think of the RDS instance as a person on a phone call with CloudWatch.
- Normal: The person is talking constantly (sending metrics).
- Recovery: The person drops the phone and has to pick it up. During this time, there is silence on the line.
- Resume: When they pick the phone back up, they start talking again.
The "gap" you see is just that silence. CloudWatch can't fill it in because the person wasn't talking while the phone was on the floor.
Based on the information available, there's no explicit documentation stating that CloudWatch metrics stop reporting during RDS SQL Server recovery states. However, this behavior could be expected for a few reasons:
When an RDS instance is in recovery mode, it may not be fully operational, which could affect the collection and reporting of metrics. Since CloudWatch metrics for RDS track operational aspects like CPU utilization, database connections, and SQL Server Agent jobs, these metrics may not be available or may not be collected when the database instance is not in a normal operating state.
Amazon RDS automatically sends metric data to CloudWatch in 1-minute periods under normal circumstances. These metrics include CPUUtilization, DatabaseConnections, and FailedSQLServerAgentJobsCount, which are the ones you mentioned seeing gaps in. If the instance is in recovery, the normal metric collection process might be interrupted.
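Because the data normally arrives once a minute, you can locate the recovery windows yourself by looking for consecutive data points that are more than one period apart. This is a minimal sketch: `find_gaps` is plain Python, while `fetch_timestamps` is a hypothetical helper around CloudWatch's `get_metric_statistics` that needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(timestamps, period_seconds=60):
    """Return (last_seen, next_seen) pairs where data points are missing.

    A gap is reported whenever two consecutive timestamps are more
    than one period apart.
    """
    period = timedelta(seconds=period_seconds)
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > period
    ]


def fetch_timestamps(db_instance_id, start, end, metric_name="CPUUtilization"):
    """Fetch 1-minute data point timestamps for an RDS metric.

    Requires boto3 and AWS credentials. A single call returns at most
    1,440 data points, so keep the window to 24 hours or less.
    """
    import boto3  # imported lazily so find_gaps works without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to be ordered; find_gaps sorts them.
    return [point["Timestamp"] for point in response["Datapoints"]]
```

Comparing the gaps this reports against the timestamps of the RDS recovery events should show whether the missing data lines up with the recovery windows.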
Once the instance returns to normal operation, you should see the metrics resume without any action required on your part. It's worth noting that CloudWatch retains data points with a 1-minute period for 15 days (coarser aggregates are kept longer), but the gaps during recovery periods will remain, as CloudWatch doesn't backfill metric data that was never collected.
If you need to monitor the recovery process itself, you might want to look at RDS events and logs rather than CloudWatch metrics, as these would provide more relevant information about the recovery status.
Sources
Amazon Relational Database Service (RDS) - Amazon CloudWatch
Monitoring Amazon RDS metrics with Amazon CloudWatch - Amazon Relational Database Service
Amazon CloudWatch metrics for Amazon RDS - Amazon Relational Database Service
