ElasticSearch Disk Throughput Throttle


We have been getting "Disk Throughput Throttle" notifications 4-5 times every day on our production cluster, which runs Elasticsearch 6.8.

"Your domain is experiencing throttling as you have hit disk throughput limits. Please consider scaling your domain to suit your throughput needs. Please refer to the documentation for more information: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/monitoring-events.html."

The ThroughputThrottle metric jumps to 1 several times a day, even when WriteThroughput and ReadThroughput are below 100 KB/s on both nodes. At those timestamps, the maximum ReadIOPS and WriteIOPS are below 1 count/second, and the maximum ReadThroughput and WriteThroughput are below 1 MBps.

We have enterprise support with AWS and they have advised us to move from gp2 to gp3 to increase our baseline throughput, but we do not believe that is the right solution because the requests to our cluster are very low and we do not see any spike in read/write throughput on CloudWatch. We also have a dev cluster with a similar number of requests per day and a smaller instance type for data nodes (t3.medium) with gp2 32GB but that is not being throttled, so we are unable to justify gp2 as the culprit.

Looking for a second opinion before we make a decision on scale-up. Thanks in advance.

Production cluster config:
2 data nodes of type m4.large.search, using gp2 EBS storage, 35 GiB
3 master nodes of type t3.medium.search

asked a year ago | 3962 views
1 Answer

Hi There,

Given that the ThroughputThrottle metric frequently spikes to 1, it does indicate that the underlying gp2 EBS volume on an individual data node is being throttled. As you may already know, a value of 1 means that read/write requests are being throttled due to the limits of the EBS volumes attached to the data nodes.

As AWS Premium Support has already advised, switching to gp3 EBS volumes should stop this CloudWatch metric from jumping to 1 and prevent throttling on the underlying EBS volume. Furthermore, with gp3 volumes you can scale IOPS and throughput independently, which is not possible with gp2 volumes. gp3 volumes also offer better baseline performance than their gp2 counterparts, at a lower cost.
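To make the baseline difference concrete, here is a rough Python sketch of the gp2 arithmetic for small volumes such as the 35 GiB and 32 GiB ones mentioned in the question. The figures (3 IOPS per GiB with a 100 IOPS floor and a 16,000 IOPS ceiling for gp2, 3,000 IOPS and 125 MiB/s baseline for gp3) come from the General Purpose SSD volumes documentation linked below. The function and constant names are my own, and the throughput estimate assumes the documented maximum I/O size of 256 KiB once burst credits are exhausted, so treat it as an approximation rather than what the service actually enforces. It also does not by itself explain why the throttle metric fires while the observed throughput is so low.

```python
# Illustrative gp2 vs gp3 baseline arithmetic (names are hypothetical).
GP2_IOPS_PER_GIB = 3        # gp2 baseline scales with volume size
GP2_MIN_IOPS = 100          # gp2 floor for small volumes
GP2_MAX_IOPS = 16000        # gp2 ceiling
GP2_MAX_IO_KIB = 256        # assumed I/O size for the throughput estimate
GP2_MAX_MIBPS = 250         # gp2 throughput ceiling
GP3_BASELINE_IOPS = 3000    # gp3 baseline, independent of size
GP3_BASELINE_MIBPS = 125    # gp3 baseline, independent of size


def gp2_baseline_iops(size_gib):
    """Baseline IOPS of a gp2 volume once burst credits are used up."""
    return min(GP2_MAX_IOPS, max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_baseline_throughput_mibps(size_gib):
    """Approximate baseline throughput: baseline IOPS times 256 KiB per I/O."""
    estimate = gp2_baseline_iops(size_gib) * GP2_MAX_IO_KIB / 1024
    return min(GP2_MAX_MIBPS, estimate)


if __name__ == "__main__":
    for size in (32, 35):
        print(
            f"gp2 {size} GiB: {gp2_baseline_iops(size)} IOPS, "
            f"~{gp2_baseline_throughput_mibps(size)} MiB/s baseline"
        )
    print(f"gp3: {GP3_BASELINE_IOPS} IOPS, {GP3_BASELINE_MIBPS} MiB/s baseline")
```

Under these assumptions, a 35 GiB gp2 volume has a baseline of about 105 IOPS and roughly 26 MiB/s, well below the gp3 baseline, although gp2 volumes of this size can burst above that while they have credits.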

The AWS documentation below should help you make an informed decision.
[+] Monitoring OpenSearch cluster metrics with Amazon CloudWatch - Cluster metrics: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-cloudwatchmetrics.html#managedomains-cloudwatchmetrics-cluster-metrics
[+] Monitoring OpenSearch cluster metrics with Amazon CloudWatch - EBS volume metrics: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-cloudwatchmetrics.html#managedomains-cloudwatchmetrics-master-ebs-metrics
[+] General Purpose SSD volumes: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose.html
[+] Amazon OpenSearch Service Pricing: https://aws.amazon.com/opensearch-service/pricing/

Hope this helps in making an informed decision!

answered a year ago
