How do I identify whether my Amazon EBS volume is micro-bursting and then make sure it doesn’t affect performance?

I have an Amazon Elastic Block Store (Amazon EBS) volume that hasn’t reached its throughput or input/output operations per second (IOPS) limit in Amazon CloudWatch. But the volume is throttled and experiences high latency and queue length.

Resolution

CloudWatch metrics help you monitor the IOPS and throughput for all Amazon EBS volume types. CloudWatch collects samples at one-minute intervals. However, I/O operations are performed at a millisecond rate. When the volume experiences bursts of high IOPS or throughput for a shorter time than the collection interval, CloudWatch doesn't capture the burst. This occurs because the metrics are aggregated at a per-minute rate, so a short burst is averaged out over the full interval.

Use CloudWatch metrics to identify possible micro-bursting

Check the VolumeIdleTime metric

The VolumeIdleTime metric graph shows the number of seconds in a specified period during which no read or write operation was submitted. If VolumeIdleTime is high, then the volume remained idle for most of the period. If IOPS or throughput was also high during the same period, then the volume experienced micro-bursting.

Calculate the average throughput and the average IOPS that the EBS volume received

Use the following formula to calculate the average throughput of the EBS volume:

Estimated Average Throughput in Bps = (Sum(VolumeReadBytes) + Sum(VolumeWriteBytes) ) / (Period - Sum(VolumeIdleTime) )

Use the following formula to calculate the average IOPS for the EBS volume:

Estimated Average IOPS in ops/s = (Sum(VolumeReadOps) + Sum(VolumeWriteOps) ) / ( Period - Sum(VolumeIdleTime) )
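
As a minimal sketch of the two formulas above, the following Python function divides the summed reads and writes by the time the volume was actually active. The function name and the sample numbers are hypothetical and only illustrate the arithmetic. Period and VolumeIdleTime are assumed to be in seconds.

```python
def estimated_average_rate(total_read, total_write, idle_seconds, period_seconds=60):
    """Average rate over the time the volume was actually active.

    total_read and total_write are CloudWatch Sum values for the period
    (bytes for throughput, operations for IOPS).
    """
    active_seconds = period_seconds - idle_seconds
    if active_seconds <= 0:
        # Volume was idle for the whole period: there is no activity to average.
        return 0.0
    return (total_read + total_write) / active_seconds


# Hypothetical one-minute sample: 90,000 reads and 30,000 writes,
# with the volume idle for 50 of the 60 seconds.
naive_iops = (90_000 + 30_000) / 60
burst_iops = estimated_average_rate(90_000, 30_000, idle_seconds=50)
print(naive_iops)  # 2000.0
print(burst_iops)  # 12000.0
```

In this made-up sample, the per-minute average suggests 2,000 IOPS, but the same operations squeezed into 10 active seconds correspond to roughly 12,000 IOPS.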

Use CloudWatch to get the micro-bursting event

Complete the following steps:

  1. Open the CloudWatch console.
  2. Choose All Metrics.
  3. Use the volume ID to search for the volume that has experienced micro-bursting.
  4. Choose EBS, then Per-Volume Metrics.
  5. To view throughput metrics, select VolumeReadBytes, VolumeWriteBytes, and VolumeIdleTime.
  6. Choose Graphed metrics.
  7. For Statistics, select Sum, and for Period, select 1 minute.
  8. For Add Math, choose Start with empty expression.
  9. For Details of Expression, use the assigned graph IDs for VolumeReadBytes, VolumeWriteBytes, and VolumeIdleTime to build the Estimated Average Throughput in Bps formula.
    For example: (m1+m2)/(60-m3).

If the graph shows a value that is greater than the maximum throughput for the volume, then the workload is micro-bursting.
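
The same expression can also be evaluated outside the console. The following is a sketch, assuming the boto3 package and valid AWS credentials, that builds the query structure for the CloudWatch GetMetricData API. The function names and the query IDs (m1, m2, m3, e1) are illustrative choices, not part of the original procedure.

```python
def build_throughput_queries(volume_id, period=60):
    """Build MetricDataQueries for the estimated average throughput expression."""
    def metric(query_id, metric_name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EBS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the expression result is returned
        }

    return [
        metric("m1", "VolumeReadBytes"),
        metric("m2", "VolumeWriteBytes"),
        metric("m3", "VolumeIdleTime"),
        {
            "Id": "e1",
            "Expression": f"(m1+m2)/({period}-m3)",
            "Label": "Estimated average throughput (Bps)",
            "ReturnData": True,
        },
    ]


def fetch_estimated_throughput(volume_id, start_time, end_time):
    """Run the query with boto3 (needs AWS credentials and the boto3 package)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_throughput_queries(volume_id),
        StartTime=start_time,
        EndTime=end_time,
    )
    # Only e1 has ReturnData set, so it is the single result.
    result = response["MetricDataResults"][0]
    return list(zip(result["Timestamps"], result["Values"]))
```

This sketch doesn't handle pagination. For the IOPS variant, swap in VolumeReadOps and VolumeWriteOps for the first two metrics.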

To check whether there was micro-bursting in I/O operations, follow the previous steps. However, in step 5, select VolumeReadOps, VolumeWriteOps, and VolumeIdleTime instead. If the resulting graph shows a value that is greater than the maximum IOPS for the volume, then the workload is micro-bursting.

Use an OS level tool to confirm micro-bursting

An EBS volume can experience micro-bursting even when the volume is busy (VolumeIdleTime is low). For volumes with low VolumeIdleTime, OS-level tools that collect samples at a finer granularity are a more effective way to identify whether the workload is micro-bursting.

Linux

To report I/O statistics for all your mounted volumes with one-second granularity, run the iostat command:

iostat -xdmzt 1

For more information, see iostat(1) on the Linux man page website.

Note: The iostat tool is part of the sysstat package. If you can't find the iostat command, then run the following command to install sysstat on Amazon Linux Amazon Machine Images (AMIs):

sudo yum install sysstat -y

To determine whether you reach the throughput limit, review the rMB/s and wMB/s columns in the output. If rMB/s + wMB/s is greater than the maximum throughput for the volume, then the volume is micro-bursting.

To determine whether you reach the IOPS limit, review the r/s and w/s columns in the output. If r/s + w/s is greater than the volume's maximum IOPS, then the volume is micro-bursting.
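
To apply these two checks to a captured iostat log, a small parser can help. The following is a sketch only: the function name, the trimmed sample output, and the limits of 3,000 IOPS and 125 MB/s are hypothetical. It locates the columns from the header row because the column order differs between sysstat versions.

```python
def find_bursts(iostat_text, max_iops, max_throughput_mbps):
    """Return (device, iops, MB/s) for each iostat sample that exceeds a limit."""
    columns = {}
    bursts = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Header row: remember where each column is.
            columns = {name: index for index, name in enumerate(fields)}
            continue
        if not columns or len(fields) != len(columns):
            continue  # timestamp line or anything else that is not a device row
        try:
            iops = float(fields[columns["r/s"]]) + float(fields[columns["w/s"]])
            mbps = float(fields[columns["rMB/s"]]) + float(fields[columns["wMB/s"]])
        except (KeyError, ValueError):
            continue
        if iops > max_iops or mbps > max_throughput_mbps:
            bursts.append((fields[0], iops, mbps))
    return bursts


# Hypothetical, trimmed output in the style of "iostat -xdmzt 1".
sample = """\
01/15/2024 10:00:01 AM
Device r/s w/s rMB/s wMB/s %util
nvme1n1 2500.00 2100.00 80.00 60.00 99.00

01/15/2024 10:00:02 AM
Device r/s w/s rMB/s wMB/s %util
nvme1n1 150.00 100.00 4.00 2.00 6.00
"""
print(find_bursts(sample, max_iops=3000, max_throughput_mbps=125))
# [('nvme1n1', 4600.0, 140.0)]
```

Replace the limits with the provisioned values for your volume. Note that the first report that iostat prints covers the time since boot, so it's usually better to ignore it when you look for bursts.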

Windows

Run the perfmon command to open Windows Performance Monitor. For more information, see Determine your IOPS and throughput requirements.

Change your volume size or type to accommodate your applications and prevent micro-bursting

Micro-bursting might cause I/O throttling or I/O latency in your application. To prevent this, modify the volume to a type and size that accommodates your required IOPS and throughput, even at micro-bursting levels. For more information, see Amazon EBS volume types. Also note that each instance has its own limits on the total IOPS and throughput that it can push to all attached EBS volumes.

It's a best practice to benchmark your volumes against your workload to verify which volume types can safely accommodate your workload in a test environment. For more information, see Benchmark Amazon EBS volumes.

AWS OFFICIAL. Updated a month ago.
1 Comment

Got Latency? This was very helpful in identifying microbursting on our MS SQL cluster as the culprit. The normal volume monitoring metrics do not show the correct and accurate IOPS use (at least they did not for us). When we followed these steps (with the help of an AWS rep) we were able to see that where we thought we were over provisioning the SQL data volume at 4K IOPS (since normal IOPS monitoring was saying we never came close to 4K)... it was actually using/bursting between 6K and 10K, so that is why we were seeing a lot of latency (using SQL queries to display disk performance). Why AWS does not include this metric in disk monitoring is beyond me, but I'm glad that AWS support was able to assist us in getting to the truth.

replied a year ago