ThrottlingExceptions while using on-demand Bedrock runtime for invoking Claude v2.1


Hi! We are currently using the Bedrock runtime to invoke the Claude v2.1 model on-demand. During our testing, we noticed Bedrock throwing ThrottlingExceptions indicating that too many requests had been made:

java.util.concurrent.ExecutionException: software.amazon.awssdk.services.bedrockruntime.model.ThrottlingException: Too many requests, please wait before trying again. You have sent too many requests.  Wait before trying again. (Service: BedrockRuntime, Status Code: 429, Request ID: )

We then cross-checked the quota guide, which lists the Claude v2.1 quotas as 100 requests/min and 200,000 tokens/min. We did not come close to these limits: we barely reached 1 request per minute in that timeframe. Refer to the CloudWatch screenshot: Bedrock CloudWatch

So my questions regarding this:

  1. Why did we get these ThrottlingExceptions?
  2. Can these ThrottlingExceptions occur even when individual quota limits are not reached just because throughput is shared across all customers?
  3. Is switching to provisioned throughput the only option to mitigate this issue?

Thanks for your time!

Osama
asked 7 months ago, 3675 views
1 Answer
Accepted Answer

Hello, I understand that you are seeing ThrottlingExceptions when invoking the Claude v2.1 model in on-demand mode, even though your request rate is below the quota limits in the documentation. Let me address each of your questions below.

  1. Why did we get these ThrottlingExceptions?

In on-demand mode, requests are served from a capacity pool that is shared across multiple customers. When overall demand is high and the base model is processing a large number of requests, you can be throttled even though your own account quotas have not been reached.

  2. Can these ThrottlingExceptions occur even when individual quota limits are not reached just because throughput is shared across all customers?

Yes. Because on-demand models use a shared capacity pool, individual accounts may be throttled below their expected rates during periods of high demand across the service. The internal team is working on long-term fixes to expand capacity and address this issue, but we currently do not have an ETA.

  3. Is switching to provisioned throughput the only option to mitigate this issue?

No. You can also mitigate throttling with a retry mechanism that uses exponential backoff. That said, we suggest considering provisioned throughput, because it reserves capacity specifically for your account, so you avoid the inherent peaks and valleys of on-demand and maintain a consistent level of performance [1].
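As a rough illustration of the retry approach, here is a minimal sketch of exponential backoff with full jitter in plain Java. It deliberately has no AWS SDK dependency: `BackoffRetry`, `callWithBackoff`, `backoffCeilingMs`, and all of the parameters are hypothetical names chosen for this example, not part of any AWS API. In a real application the predicate would check for `software.amazon.awssdk.services.bedrockruntime.model.ThrottlingException`, and the delays and attempt count would need tuning for your workload.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the AWS SDK.
final class BackoffRetry {

    // Upper bound for the wait before retry number `attempt` (0-based):
    // baseDelayMs * 2^attempt, capped at maxDelayMs.
    static long backoffCeilingMs(int attempt, long baseDelayMs, long maxDelayMs) {
        int shift = Math.min(attempt, 30); // bound the shift so typical base delays cannot overflow
        return Math.min(maxDelayMs, baseDelayMs << shift);
    }

    // Runs `operation`, retrying while `isRetryable` accepts the thrown exception,
    // up to `maxAttempts` calls in total. The last failure is rethrown unchanged.
    static <T> T callWithBackoff(Supplier<T> operation,
                                 Predicate<RuntimeException> isRetryable,
                                 int maxAttempts,
                                 long baseDelayMs,
                                 long maxDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                boolean lastAttempt = attempt + 1 >= maxAttempts;
                if (lastAttempt || !isRetryable.test(e)) {
                    throw e;
                }
                long ceiling = backoffCeilingMs(attempt, baseDelayMs, maxDelayMs);
                // Full jitter: sleep a random time in [0, ceiling] so that many
                // throttled clients do not all retry at the same instant.
                long delayMs = ThreadLocalRandom.current().nextLong(ceiling + 1);
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
            }
        }
    }
}
```

A call might look like `BackoffRetry.callWithBackoff(() -> invokeModel(), e -> e instanceof ThrottlingException, 5, 500, 20_000)`, where `invokeModel()` stands in for your own Bedrock call. If you use the async client, as the `ExecutionException` in your stack trace suggests, the throttling error may arrive wrapped in another exception, so the predicate may need to inspect `e.getCause()` as well.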

I hope you found this helpful. If you face any other issues or require further assistance, please reach out to AWS Support [2] along with your use case details, and we would be happy to assist you further. Thank you!

References:

[1] https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html

[2] https://console.aws.amazon.com/support/home#/case/create

AWS
answered 7 months ago
EXPERT
reviewed 7 months ago
