Assessing the Stability and Recovery from Potential Throttling of High Throughput Queries on DynamoDB with On-Demand Mode


Hello,

I am currently working on an application where we leverage AWS Lambda in conjunction with DynamoDB for real-time data rendering. Specifically, we call table.query on a DynamoDB table in on-demand mode, relying extensively on Global Secondary Indexes (GSIs) for our query filtering.

Our system occasionally experiences high-throughput situations, and there is a concern about the stability of this feature under such stress, especially in terms of maintaining consistent performance. We want to better understand how DynamoDB behaves under frequent, high-volume query operations, and what throttling could occur despite opting for on-demand capacity, as indicated by certain sources [1].

The critical points we are seeking insight into include:

Throttling Recovery Time: If throttling conditions are encountered, what is the typical recovery time, and how does DynamoDB manage sudden spikes in requests in on-demand mode? Understanding this is vital, as it directly affects our real-time data rendering requirements.

Stability and Performance: Can DynamoDB sustain occasional high-throughput loads, specifically using the table.query function with GSI? Is the performance hit noticeable, and could it disrupt real-time operations?

Best Practices for Query Optimization: In the context of ensuring minimal latency and maintaining high availability and performance, would using table.scan be a more efficient approach compared to table.query in scenarios of high request volume? Are there recommended strategies for optimizing query patterns, especially when dealing with GSI and high-throughput situations?

Given the real-time nature of our application, the emphasis is on consistent, high-performance read operations. Any suggestions or insights into the architecture design, capacity planning (even in an on-demand setup), or practical experiences you could share would be immensely valuable.

Thank you for your time and assistance.

Best regards,

Ben

[1] "Why are on-demand tables getting throttled in Amazon DynamoDB?" Knowledge Center, AWS.

1 Answer

Throttling Recovery Time: If throttling conditions are encountered, what is the typical recovery time, and how does DynamoDB manage sudden spikes in requests in on-demand mode? Understanding this is vital, as it directly affects our real-time data rendering requirements.

On-demand mode lets you scale instantly to twice your previous peak. If you need more than twice your previous peak within a short period, you may be throttled. Recovery time varies with your data model and throughput, but DynamoDB will typically take action to scale within minutes.
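
If a spike does cause brief throttling, clients typically ride it out with retries and backoff while on-demand capacity scales. A minimal sketch in Python with boto3, using the SDK's built-in retry modes (the table name is a hypothetical placeholder):

```python
import boto3
from botocore.config import Config

# Sketch: botocore's built-in "adaptive" retry mode backs off and
# retries throttled requests automatically, which smooths over short
# throttling windows while on-demand capacity scales out.
dynamodb = boto3.resource(
    "dynamodb",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
table = dynamodb.Table("my-table")  # hypothetical table name
```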

Stability and Performance: Can DynamoDB sustain occasional high-throughput loads, specifically using the table.query function with GSI? Is the performance hit noticeable, and could it disrupt real-time operations?

Reads and writes consume separate capacity, so a high volume of read requests will not impact your write throughput. Furthermore, because DynamoDB is horizontally scalable, your performance should remain the same under high-load conditions.

Best Practices for Query Optimization: In the context of ensuring minimal latency and maintaining high availability and performance, would using table.scan be a more efficient approach compared to table.query in scenarios of high request volume? Are there recommended strategies for optimizing query patterns, especially when dealing with GSI and high-throughput situations?

Query is a much more efficient request than Scan: Query targets items under a single partition key, whereas Scan reads every item in the table or index. Optimization should start with a correct data model, built on a high-cardinality partition key with well-distributed traffic.
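
To make the difference concrete, here is a minimal boto3 sketch; the table, index, and attribute names are hypothetical placeholders:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical name

# Query touches only the items under one partition key value on the GSI,
# so consumed capacity scales with the items returned.
response = table.query(
    IndexName="my-gsi",  # hypothetical GSI name
    KeyConditionExpression=Key("gsi_pk").eq("customer#123"),
)

# A Scan of the same index reads (and bills for) every item, even when a
# FilterExpression discards most of the results:
# response = table.scan(IndexName="my-gsi")
```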

Given the real-time nature of our application, the emphasis is on consistent, high-performance read operations. Any suggestions or insights into the architecture design, capacity planning (even in an on-demand setup), or practical experiences you could share would be immensely valuable.

Ensure you understand how to model data correctly on DynamoDB, as that is the most important factor in getting the performance you require. You can start with our data modeling course:

https://explore.skillbuilder.aws/learn/course/external/view/elearning/17754/amazon-dynamodb-data-modeling-techniques

If you expect very high throughput from the outset, consider pre-warming your table and indexes. This sets a high previous peak, which helps you avoid throttling during sudden high load:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.PreWarming
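
As a rough illustration of the mode-switch approach those docs describe, here is a hedged boto3 sketch. The table name and capacity numbers are hypothetical; note that billing-mode switches are limited to once per 24 hours, and a table with GSIs must also supply per-index throughput via GlobalSecondaryIndexUpdates when moving to PROVISIONED:

```python
import boto3

# Sketch of mode-switch pre-warming: temporarily run the table at
# provisioned capacity matching your expected peak so DynamoDB
# partitions for it, then return to on-demand.
client = boto3.client("dynamodb")

client.update_table(
    TableName="my-table",  # hypothetical table name
    BillingMode="PROVISIONED",
    ProvisionedThroughput={
        "ReadCapacityUnits": 12000,   # illustrative numbers only
        "WriteCapacityUnits": 4000,
    },
)

# Wait until the table is ACTIVE at the new capacity, then switch back.
client.get_waiter("table_exists").wait(TableName="my-table")
client.update_table(TableName="my-table", BillingMode="PAY_PER_REQUEST")
```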

AWS EXPERT · answered 8 months ago
AWS EXPERT · reviewed 8 months ago
  • I appreciate the detailed insights provided on DynamoDB's capacity to handle high-throughput scenarios. I'm particularly interested in understanding the behavior of table.get_item operations concerning throttling.

    Could you clarify if table.get_item is generally less susceptible to throttling compared to table.query ? Also, in the event of throttling, does it recover within the same timeframe as other operations?

    Thank you for your guidance.

  • Operations don't dictate whether you're throttled or not; that depends on the consumed capacity. For example, a GetItem on a 6 KB item consumes exactly the same capacity as a Query that returns 3 × 2 KB items. In that regard, there is no difference between the APIs; Query can simply consume more capacity because it can return more than one item. Recovery time is the same for all APIs. You can check what a request consumes with ReturnConsumedCapacity, as shown in the sketch after this thread.

  • Thank you for your explanation about capacity consumption in DynamoDB. I'm curious about any distinct risks between table.query with a GSI and table.get_item under identical capacity consumption. Specifically, are there efficiency or error-rate differences in high-throughput scenarios? And do they share similar recovery mechanisms in case of errors or throttling? (I want to know whether the risk of getting throttled is the same.)

    Your insights would greatly assist our efforts in system optimization. Thanks again !

  • You cannot do a GetItem on an index. But as I stated, they are exactly the same: throttling is based on consumed capacity, not the API. For example, a GetItem and a Scan with Limit=1 consume exactly the same capacity. However, a Scan with no limit can consume much more. So it's not the API that determines whether you will be throttled; that depends on your data model and access patterns.

  • Thank you for your detailed explanation. It has enhanced my understanding of the DynamoDB architecture. I have a humble inquiry regarding a specific aspect. A source from "https://repost.aws/questions/QUpYPN-PF0RGCEjU5JccvjTg/assessing-the-stability-and-recovery-from-potential-throttling-of-high-throughput-queries-on-dynamodb-with-on-demand-mode" mentions that "By default, the table throughput has a maximum of 40,000 read request units and a maximum of 40,000 write request units."

    Based on this, in addition to your previous insights, I gather that it's imperative to ensure each individual table operates under the 40,000 RCU/WCU threshold to maintain what is considered a safe range. Would that be a correct assessment?
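
As a follow-up to the consumed-capacity discussion above, here is a minimal boto3 sketch showing how to observe what each request actually consumes; the table and key names are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical name

# Both calls report the capacity they actually consumed; throttling is
# driven by these numbers, not by which API you call.
get_resp = table.get_item(
    Key={"pk": "customer#123"},  # hypothetical key schema
    ReturnConsumedCapacity="TOTAL",
)
query_resp = table.query(
    KeyConditionExpression=Key("pk").eq("customer#123"),
    ReturnConsumedCapacity="TOTAL",
)

print(get_resp["ConsumedCapacity"])    # e.g. {'TableName': 'my-table', 'CapacityUnits': 1.0}
print(query_resp["ConsumedCapacity"])
```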
