How do I troubleshoot the increase to my DiskQueueDepth in Amazon RDS for PostgreSQL?

I want to troubleshoot the increase to my DiskQueueDepth in Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

Short description

DiskQueueDepth is the number of input/output (I/O) requests that the application submitted but that haven't been sent to the storage device yet. The requests are pending because the storage device is busy with other requests.

A high DiskQueueDepth indicates that I/O requests arrive faster than the storage system can process them. Amazon RDS reports this metric as an average over one-minute intervals. It's a best practice to target a queue length of 1 for every 1,000 IOPS available.
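The rule of thumb above is simple arithmetic. The following minimal sketch shows it in Python; the function names are hypothetical and aren't part of any AWS SDK.

```python
def target_queue_depth(available_iops: float) -> float:
    """Return the rule-of-thumb queue depth: 1 per 1,000 IOPS available."""
    if available_iops <= 0:
        raise ValueError("available_iops must be positive")
    return available_iops / 1000.0


def is_queue_depth_high(observed_depth: float, available_iops: float) -> bool:
    """Return True when the observed average queue depth exceeds the target."""
    return observed_depth > target_queue_depth(available_iops)


# Example: a volume with 3,000 IOPS available has a target queue depth of 3.
print(target_queue_depth(3000))
print(is_queue_depth_high(10, 3000))
```

For example, with 3,000 IOPS available, a sustained DiskQueueDepth of 10 is well above the target of 3 and is worth investigating.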

Note: To measure I/O performance, check the instance's throughput and number of I/O operations.

Resolution

To troubleshoot the increase to your DiskQueueDepth in Amazon RDS for PostgreSQL, complete the following:

Identify the high DiskQueueDepth

To identify high DiskQueueDepth, check the following:

  • Check your Amazon CloudWatch metrics when you have performance issues. Look for spikes in the DiskQueueDepth metric that are higher than normal. If DiskQueueDepth remains elevated for an extended time period, then queries will wait longer before they run.
  • Check your instance type limits. Confirm whether the ReadIOPS, WriteIOPS, ReadThroughput, and WriteThroughput metrics reach their instance type limits.
  • Check your ReadThroughput and WriteThroughput metrics. CloudWatch presents this data in bytes per second, so for clearer interpretation, convert these values to MB/s or GB/s. To determine if you have reached your allocated throughput limit, sum both metrics for the same time period. For details on the throughput allocated to your instance, refer to the Amazon RDS instance types documentation.
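  • Check your EBSByteBalance% metric. This metric shows the percentage of throughput credits that remain in the instance's burst bucket. If EBSByteBalance% is at or near zero, then the instance has sustained high throughput and used up its credits. A byte balance of zero indicates that your bandwidth is throttled.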
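The throughput check above can be sketched as follows. This example assumes that you already have ReadThroughput and WriteThroughput values (in bytes per second) for the same time period. The function names and the 500 MB/s limit are hypothetical placeholders, so substitute the limit that's documented for your instance type.

```python
BYTES_PER_MB = 1_000_000  # decimal megabytes; use 1024 * 1024 for MiB


def combined_throughput_mbps(read_bytes_per_sec: float,
                             write_bytes_per_sec: float) -> float:
    """Sum read and write throughput and convert bytes/s to MB/s."""
    return (read_bytes_per_sec + write_bytes_per_sec) / BYTES_PER_MB


def throughput_utilization(read_bytes_per_sec: float,
                           write_bytes_per_sec: float,
                           limit_mbps: float) -> float:
    """Return combined throughput as a fraction of the allocated limit."""
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    total = combined_throughput_mbps(read_bytes_per_sec, write_bytes_per_sec)
    return total / limit_mbps


# Hypothetical datapoints for one period, in bytes per second.
read_bps = 150_000_000
write_bps = 100_000_000
print(combined_throughput_mbps(read_bps, write_bps))
print(throughput_utilization(read_bps, write_bps, limit_mbps=500))
```

A utilization that stays close to 1.0 suggests that the instance is at its throughput limit for that period.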

Check your storage type and size

To check your storage type and size, take the following actions:

  • Optimize your workload. Explore available options to tune and improve your workload's efficiency.
  • For General Purpose SSD (gp2), make sure that your storage size has enough baseline IOPS and throughput.
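  • For General Purpose SSD (gp3) and Provisioned IOPS SSD (io1 and io2), provision more IOPS if the storage limit is the bottleneck. If the instance limit is the bottleneck, then tune your workload or scale to a larger instance with increased CPU, memory, and I/O capacity.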
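To estimate whether a gp2 volume's size gives enough baseline IOPS, you can use a rough calculation like the following sketch. It assumes the published Amazon EBS gp2 figures for a single volume: 3 IOPS per GiB, a 100 IOPS minimum, and a 16,000 IOPS maximum. Amazon RDS might stripe larger allocations across multiple volumes, which changes the cap, so confirm the numbers in the Amazon RDS storage documentation. The function name is hypothetical.

```python
def gp2_baseline_iops(size_gib: float,
                      iops_per_gib: float = 3,
                      floor: int = 100,
                      cap: int = 16_000) -> int:
    """Estimate gp2 baseline IOPS for one volume (assumed EBS gp2 figures)."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return int(min(max(size_gib * iops_per_gib, floor), cap))


# Small volumes get the floor, large volumes hit the cap.
for size in (20, 500, 10_000):
    print(size, gp2_baseline_iops(size))
```

If the estimated baseline is lower than the IOPS that your workload sustains, then the volume depends on burst credits and the queue depth can grow when the credits run out.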

Check for sudden traffic increases

Unexpected spikes in database activity might lead to increased DiskQueueDepth. To identify spikes in your connections, take the following actions:

  • Check the DatabaseConnections graph in your Amazon CloudWatch metrics. Too many concurrent connections might lead to heavy read and write operations.
  • Target a queue length of 1 for every 1,000 IOPS available. (The available IOPS are the baseline for General Purpose SSD volumes and the provisioned amount for Provisioned IOPS SSD volumes.)
  • Monitor your application performance and make adjustments based on your requirements.

Check for micro-bursting issues

If your instance doesn't reach its throughput or IOPS limit, but experiences high latency and queue length, then you might have micro-bursting on your instance. To resolve this issue, see How can I identify if my Amazon EBS volume is micro-bursting and then prevent this from happening? Also, make sure that Enhanced Monitoring is turned on, and then set the granularity to 1 second to identify if micro-bursting is an issue.
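The following sketch illustrates why one-minute averages can hide micro-bursting. It assumes that you have per-second IOPS samples, for example from Enhanced Monitoring at 1-second granularity. The sample values, the 3,000 IOPS limit, and the function name are hypothetical.

```python
def summarize_bursts(per_second_iops, iops_limit):
    """Compare the one-minute average with per-second peaks."""
    if not per_second_iops:
        raise ValueError("per_second_iops must not be empty")
    average = sum(per_second_iops) / len(per_second_iops)
    # Count the seconds in which the volume was at (or over) its limit.
    saturated = sum(1 for value in per_second_iops if value >= iops_limit)
    return {
        "average": average,
        "peak": max(per_second_iops),
        "seconds_at_or_over_limit": saturated,
    }


# Hypothetical minute: 55 quiet seconds and 5 seconds pinned at the limit.
samples = [500] * 55 + [3000] * 5
summary = summarize_bursts(samples, iops_limit=3000)
print(summary)
```

In this example, the average for the minute is far below the limit, but the volume is saturated for 5 seconds. During those seconds, requests queue up and latency increases.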

Optimize bad queries

To optimize bad queries, take the following actions:

  • To check the query types that run at high DiskQueueDepth time periods, use Performance Insights.
  • To generate query run plans and identify optimization opportunities, use EXPLAIN. For more information, see EXPLAIN on the PostgreSQL website.
  • Check for full table scans, inefficient joins, or missing indexes that might cause excessive I/O.
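As one way to spot full table scans, you can run EXPLAIN (FORMAT JSON) and search the plan for Seq Scan nodes. The following sketch walks a plan that's already loaded as Python objects. The sample plan and its table names are hypothetical and hand-written for illustration, and the function name is an assumption.

```python
def find_seq_scans(plan_node):
    """Return the relation names of all Seq Scan nodes in a plan tree."""
    found = []
    if plan_node.get("Node Type") == "Seq Scan":
        found.append(plan_node.get("Relation Name"))
    # Child nodes are nested under the "Plans" key.
    for child in plan_node.get("Plans", []):
        found.extend(find_seq_scans(child))
    return found


# Hypothetical EXPLAIN (FORMAT JSON) output: a list with one "Plan" entry.
sample_explain_output = [{
    "Plan": {
        "Node Type": "Hash Join",
        "Plans": [
            {"Node Type": "Seq Scan", "Relation Name": "orders"},
            {"Node Type": "Hash", "Plans": [
                {"Node Type": "Index Scan", "Relation Name": "customers"},
            ]},
        ],
    }
}]

print(find_seq_scans(sample_explain_output[0]["Plan"]))
```

A sequential scan isn't always a problem, for example on small tables. Treat the result as a list of candidates to review, not as a list of confirmed issues.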