Hello. A ThrottlingException while invoking models in on-demand mode, even when your requests are below the documented quota limit, can arise because on-demand mode uses a capacity pool shared across multiple customers. Consequently, during periods of high demand, when the base model is processing a substantial number of requests, throttling may occur even though your account-level limits are not exceeded.
It's important to note that individual accounts can therefore be throttled below their expected rates during high-demand periods. The internal team is actively working on long-term solutions to expand capacity, but a specific timeline is currently unavailable.
To mitigate this issue, you can implement retries with exponential backoff. However, switching to provisioned throughput may be the most effective option, as it reserves capacity specifically for your account. This ensures consistent performance by avoiding the inherent peaks and valleys of on-demand mode.
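As a minimal sketch of the retry-with-backoff suggestion: the helper below is illustrative (the function name and parameters are not an AWS API), and in practice you would pass it a closure around a `bedrock-runtime` `invoke_model` call and restrict `retryable` to `botocore.exceptions.ClientError` with error code `ThrottlingException`.

```python
import random
import time

def invoke_with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0,
                        retryable=(Exception,)):
    """Retry `call` with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retryable:
            if attempt == max_retries:
                raise  # out of retries; surface the throttling error
            # Full jitter: sleep a random amount up to the capped exponential delay,
            # so concurrent clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Jitter matters here: because the throttling comes from a shared pool, many clients retrying on the same schedule would just collide again.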
Additionally, you could try using a different AWS region to see if that alleviates the throttling issues.
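Both suggestions can also be handled at the client level. The region below is a placeholder; the `retries` options (`max_attempts`, and the `"standard"`/`"adaptive"` modes, where adaptive adds client-side rate limiting) are existing botocore configuration.

```python
import boto3
from botocore.config import Config

# Let botocore retry throttling errors automatically before they
# reach your code, and point the client at an alternate region.
cfg = Config(
    region_name="us-west-2",  # placeholder: pick a region with more headroom
    retries={"max_attempts": 10, "mode": "adaptive"},
)

bedrock_runtime = boto3.client("bedrock-runtime", config=cfg)
# bedrock_runtime.invoke_model(modelId=..., body=...) calls now get
# built-in exponential backoff on ThrottlingException.
```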
If further assistance is needed, please feel free to reach out to AWS Support.
Have had the same issue with Llama 3. Had to pull it from our production application because of this. No other models have been an issue. Happy to find this question after wasting 3 days with the support center. Thank you OP and Zeekg
The only issue here is that you cannot purchase provisioned throughput for Llama 3 8B as of now. Hopefully this gets fixed one way or the other.