I want to troubleshoot write latency spikes in my Amazon Relational Database Service (Amazon RDS) DB instance.
Short description
The WriteLatency metric reports the average amount of time that each disk write I/O operation takes. As a best practice, keep write latency in the single-digit millisecond range.
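Your DB instance might experience a spike in write latency after you perform actions that cause lazy loading, such as a point-in-time recovery, a change from a Single-AZ instance to a Multi-AZ instance, or the creation of a read replica.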
The spike might also result from input/output operations per second (IOPS) or a throughput bottleneck that's caused by a heavy workload on the database.
Resolution
Troubleshoot latency spikes
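To identify the cause of high write latency on your DB instance, check the Amazon CloudWatch metrics for Amazon RDS that report write latency, write IOPS, write throughput, disk queue depth, and, for gp2 storage, burst balance. These include WriteLatency, WriteIOPS, WriteThroughput, DiskQueueDepth, and BurstBalance.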
The following values show that your DB instance is under a heavy workload and requires more resources:
- The latency values are high.
- The throughput and IOPS values reached their maximum quotas.
- The DiskQueueDepth value is high.
- For gp2, the BurstBalance value is low.
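The checks in the preceding list can be sketched as a small Python helper. This is a minimal illustration, not an AWS tool: the WriteMetrics class, the heavy_workload_signs function, and every threshold value are assumptions made for this example. Replace the limits with the provisioned IOPS and throughput of your own storage, and note that CloudWatch reports WriteLatency in seconds and WriteThroughput in bytes per second, so convert the values first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WriteMetrics:
    """One observation of write-related metrics (hypothetical container)."""
    write_latency_ms: float          # WriteLatency converted from seconds to ms
    write_iops: float                # WriteIOPS
    write_throughput_mibps: float    # WriteThroughput converted to MiB/s
    disk_queue_depth: float          # DiskQueueDepth
    burst_balance_pct: Optional[float] = None  # BurstBalance, gp2 only


def heavy_workload_signs(
    m: WriteMetrics,
    iops_limit: float,
    throughput_limit_mibps: float,
    latency_ms_threshold: float = 10.0,     # assumption: above single-digit ms
    queue_depth_threshold: float = 10.0,    # assumption: tune for your workload
    burst_balance_threshold: float = 20.0,  # assumption: percent remaining
    near_limit: float = 0.95,               # assumption: "reached" means >= 95%
) -> List[str]:
    """Return the signs of a heavy workload that this observation shows."""
    signs = []
    if m.write_latency_ms >= latency_ms_threshold:
        signs.append("write latency is high")
    if m.write_iops >= near_limit * iops_limit:
        signs.append("IOPS reached the limit")
    if m.write_throughput_mibps >= near_limit * throughput_limit_mibps:
        signs.append("throughput reached the limit")
    if m.disk_queue_depth >= queue_depth_threshold:
        signs.append("disk queue depth is high")
    if m.burst_balance_pct is not None and m.burst_balance_pct <= burst_balance_threshold:
        signs.append("gp2 burst balance is low")
    return signs


# Example with made-up values for a volume limited to 3,000 IOPS and 125 MiB/s.
sample = WriteMetrics(
    write_latency_ms=42.0,
    write_iops=2990.0,
    write_throughput_mibps=60.0,
    disk_queue_depth=25.0,
    burst_balance_pct=5.0,
)
signs = heavy_workload_signs(sample, iops_limit=3000, throughput_limit_mibps=125)
print(signs)
```

If the returned list is empty, the observation doesn't show any of the listed signs under the assumed thresholds.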
To troubleshoot an IOPS or throughput bottleneck, take one of the following actions:
If CloudWatch metrics don't show that your resources are throttled, then use Enhanced Monitoring to check the writeIOsPS metric.
Note: CloudWatch records metrics in 60-second intervals, so it might not capture every spike or drop. To see shorter-lived changes, you can set the Enhanced Monitoring Granularity property to an interval as short as 1 second.
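The following Python sketch illustrates why a 60-second average can hide a short spike. The sample values are made up, and the calculation assumes that each second carries the same number of write operations so that a plain mean is a fair stand-in for the one-minute average.

```python
from statistics import mean

# Hypothetical per-second write latency samples in milliseconds:
# 59 seconds at 2 ms and a single 1-second spike at 300 ms.
samples_ms = [2.0] * 59 + [300.0]

one_minute_average = mean(samples_ms)  # what a 60-second data point would show
one_second_peak = max(samples_ms)      # what 1-second granularity would show

print(f"60-second average: {one_minute_average:.2f} ms")
print(f"1-second peak: {one_second_peak:.0f} ms")
```

In this example, the one-minute average stays below 10 ms even though one second reached 300 ms, so only the finer granularity reveals the spike.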
If the preceding metrics don't show the cause for your latency, then check the NetworkReceiveThroughput and NetworkTransmitThroughput CloudWatch metrics for network issues.
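As a minimal sketch, the following Python code builds the request parameters to retrieve both network metrics from CloudWatch. The metric_request helper and the my-db-instance identifier are hypothetical names used only for illustration. The code only builds dictionaries and doesn't call AWS. Assuming that boto3 is installed and credentials are configured, you can pass each dictionary to the CloudWatch client's get_metric_statistics call.

```python
from datetime import datetime, timedelta, timezone


def metric_request(db_instance_id, metric_name, end_time, minutes=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": 60,  # seconds per data point
        "Statistics": ["Average", "Maximum"],
    }


# Fixed end time so that the example is reproducible.
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
network_requests = [
    metric_request("my-db-instance", name, end)  # hypothetical instance name
    for name in ("NetworkReceiveThroughput", "NetworkTransmitThroughput")
]

# With boto3 and credentials available, each request could be sent like this:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   response = cloudwatch.get_metric_statistics(**network_requests[0])
for params in network_requests:
    print(params["MetricName"], params["StartTime"].isoformat())
```

Compare the time range of any drop or plateau in network throughput with the time range of the write latency spike.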
Reduce lazy loading
Lazy loading can occur when you perform a point-in-time recovery (PITR), change a Single-AZ instance to a Multi-AZ instance, or create a new read replica. If you try to access data that Amazon RDS hasn't loaded yet, then the DB instance immediately downloads the requested data from Amazon Simple Storage Service (Amazon S3). The instance continues to load the rest of the data in the background.
To reduce lazy loading on tables that you require access to quickly, perform operations that involve full-table scans, such as SELECT *. This lets Amazon RDS download all backed-up table data from Amazon S3.
To reduce lazy loading after you change a Single-AZ instance to a Multi-AZ instance, you can also take one of the following actions:
- Perform a manual failover soon after you change the instance.
- Run a full dump or just the required queries to load all the data from the tables. For Amazon RDS for PostgreSQL instances, you can run the pg_prewarm command.
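For RDS for PostgreSQL, the pg_prewarm approach can be sketched as a small Python helper that generates the SQL statements to run. The prewarm_statements function and the table names are hypothetical. The helper accepts only simple lowercase identifiers so that it doesn't need to handle quoting, and it assumes that the pg_prewarm extension is available on your instance. Run the generated statements with your usual SQL client.

```python
import re

# Accept only simple names such as "orders" or "public.orders" (assumption:
# lowercase, unquoted identifiers) so that no escaping is required.
_SIMPLE_NAME = re.compile(r"^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$")


def prewarm_statements(tables):
    """Return SQL statements that prewarm each table with pg_prewarm."""
    statements = ["CREATE EXTENSION IF NOT EXISTS pg_prewarm;"]
    for table in tables:
        if not _SIMPLE_NAME.match(table):
            raise ValueError(f"unsupported table name: {table!r}")
        # pg_prewarm reads the blocks of the relation, which forces them
        # to be fetched from storage.
        statements.append(f"SELECT pg_prewarm('{table}');")
    return statements


# Hypothetical tables that need fast access after the change.
for statement in prewarm_statements(["public.orders", "public.customers"]):
    print(statement)
```

Each pg_prewarm call targets a single relation, so to prewarm the indexes of a table, add a separate call for each index name.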
Implement best practices for high latency
If you have high latency in your DB instance, then implement the following best practices:
Important: Performance Insights will reach its end of life on June 30, 2026. You can upgrade to the Advanced mode of Database Insights before June 30, 2026. If you don't upgrade, then DB clusters that use Performance Insights will default to the Standard mode of Database Insights. Only the Advanced mode of Database Insights will support execution plans and on-demand analysis. If your clusters default to the Standard mode, then you might not be able to use these features on the console. To turn on the Advanced mode, see Turning on the Advanced mode of Database Insights for Amazon RDS and Turning on the Advanced mode of Database Insights for Amazon Aurora.
Related information
Best practices for Amazon RDS
Understanding burst vs. baseline performance with Amazon RDS and gp2
Multi-AZ DB instance deployments for Amazon RDS