The input context window is 200,000 tokens, but as the model generates tokens to answer your question you are most likely hitting the output token limit, which is 4,096 tokens; see the Anthropic user guide for more details.
You should also ensure the model gives you the most concise answer possible, e.g. use prompt engineering to minimize preamble.
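For reference, here is a minimal sketch of an invoke_model call that sets max_tokens explicitly, assuming boto3 and the us-east-1 Claude 3.5 Sonnet model ID (substitute your own region and model version); checking stop_reason tells you whether the answer was truncated at the output limit:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,  # output ceiling; responses are cut off here
    "messages": [
        {"role": "user", "content": "Summarize this document in one paragraph: ..."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
    body=body,
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
# "max_tokens" here means the answer hit the output limit mid-generation
print(result["stop_reason"])
```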
You would need to give more context on why you are being throttled. For Claude 3.5 Sonnet on Bedrock with on-demand provisioning, you must stay under both a requests-per-minute and a tokens-per-minute limit. More details are available in the Quotas for Amazon Bedrock documentation.
If you are being throttled, the best course of action depends on your use case: provisioned throughput offers dedicated capacity, or a batch process may make sense.
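If the throttling is transient, a simple retry with exponential backoff is often enough. A sketch, assuming a hypothetical call_model callable that wraps your actual Bedrock request:

```python
import random
import time

from botocore.exceptions import ClientError


def invoke_with_backoff(call_model, max_retries=5):
    for attempt in range(max_retries):
        try:
            return call_model()  # hypothetical wrapper around invoke_model
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise  # only retry throttling; surface everything else
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("Still throttled after retries; consider "
                       "provisioned throughput or batch processing")
```

Alternatively, botocore can retry for you if you create the client with Config(retries={"mode": "adaptive"}).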
In AI models, a token is a chunk of text that the model processes at a time. For Claude 3.5 Sonnet, the context window is 200,000 tokens, which is roughly 150,000 words or 300 pages of text. If your input exceeds this limit, the model may return incomplete answers or errors indicating that the max token limit has been reached.
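To avoid hitting the limit in the first place, you can estimate your prompt size up front. A rough sketch using the common ~4-characters-per-token heuristic for English text (an approximation, not Claude's actual tokenizer):

```python
CONTEXT_WINDOW = 200_000  # Claude 3.5 Sonnet context window, in tokens


def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a rough average for English prose
    return len(text) // 4


def fits_in_context(prompt: str, reserved_for_output: int = 4096) -> bool:
    # Leave headroom for the model's answer within the shared window
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW


print(fits_in_context("Summarize: " + "word " * 50_000))  # True (~62k tokens)
```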