Skip to content

AWS Bedrock Llama limit tokens to 8k

0

I tried all available Meta models (some of them are not available), and when I read this page: https://aws.amazon.com/bedrock/meta/ , I can see some models have Max tokens of 128k, but when I use AWS Bedrock, all of them have a limited response of 8k.

Is there something else that I missed in setting up to use the full model potential?

Models that I tried:

Llama 3.1 8B Instruct meta.llama3-1-8b-instruct-v1:0 - 8192 limit.

Llama 3.1 70B Instruct meta.llama3-1-70b-instruct-v1:0 - 8192 limit.

Llama 3.2 1B Instruct meta.llama3-2-1b-instruct-v1:0 - 8192 limit.

Llama 3.2 3B Instruct meta.llama3-2-3b-instruct-v1:0 - 8192 limit.

Llama 3.3 70B Instruct meta.llama3-3-70b-instruct-v1:0 - 8192 limit.

Llama 3.2 11B Instruct meta.llama3-2-11b-instruct-v1:0 - "Access to multimodal models is not available for customers of Amazon Web Services EMEA SARL".

Llama 3.2 90B Instruct meta.llama3-2-90b-instruct-v1:0 - "Access to multimodal models is not available for customers of Amazon Web Services EMEA SARL".

Llama 4 Maverick 17B Instruct meta.llama4-maverick-17b-instruct-v1:0 - "Access to multimodal models is not available for customers of Amazon Web Services EMEA SARL".

Llama 4 Scout 17B Instruct meta.llama4-scout-17b-instruct-v1:0 - "Access to multimodal models is not available for customers of Amazon Web Services EMEA SARL".

2 Answers
0

You are correct that some Meta Llama models on AWS Bedrock have larger context windows than what you're experiencing. However, there's an important distinction between the model's context window (input tokens) and the maximum output tokens.

The 8,192 token limit you're encountering is likely the maximum number of tokens for the generated response, not the total context window. This is a common limitation set by the API to manage resource usage and response times.

To utilize the full context window of these models (e.g., 128K for Llama 3.3), you can input longer prompts or documents up to that limit. However, the generated response will still be capped at 8,192 tokens.

If you need to generate longer outputs, you may need to implement a strategy of breaking down your task into smaller chunks or using multiple API calls to generate a longer response in parts.

It's worth noting that the actual token limits can vary between different API endpoints and may be subject to change. Always refer to the most up-to-date AWS documentation for the specific limits of each model and API endpoint you're using.

If you believe you should have access to generate longer outputs with these models, it would be best to contact AWS support for clarification on your account's specific capabilities and any potential configuration options that might allow for longer outputs.
Sources
InferenceConfiguration - Amazon Bedrock
TextInferenceConfig - Amazon Bedrock
InferenceConfiguration - Amazon Bedrock

answered a year ago
0

You're encountering a discrepancy between the advertised 128K token context window for Meta's Llama models on AWS Bedrock and the 8K token limit you're experiencing during usage. This issue likely stems from the specific model versions you're accessing and potential regional restrictions. 1

Understanding the Token Limits

While AWS Bedrock supports Llama models with up to a 128K token context window, not all model versions or configurations offer this capacity. For instance, earlier versions like Llama 3 8B and 70B have an 8K token limit. In contrast, newer versions such as Llama 3.1, 3.2, and 3.3 are designed to support the extended 128K context window. 1, 2

Regional Availability and Access

Access to specific Llama models can vary by AWS region. For example, some models are available in the US West (Oregon) region but may not be accessible in other regions. Additionally, certain models, especially those with multimodal capabilities, might have restrictions based on your AWS account's region or configuration. 3

Steps to Access Models with 128K Token Limit

  1. Verify Model Version: Ensure you're selecting a Llama model version that supports the 128K token context window, such as Llama 3.1, 3.2, or 3.3. 1

  2. Check Regional Availability: Confirm that the desired model is available in your AWS region. If not, consider using cross-region inference or switching to a region where the model is accessible.

  3. Request Access: If a model isn't available in your region, you might need to request access through the AWS Management Console or contact AWS support for assistance.

  4. Review Documentation: Consult the AWS Bedrock documentation for detailed information on model capabilities, regional availability, and access procedures.

By ensuring you're using the appropriate model versions and verifying regional availability, you should be able to leverage the full 128K token context window offered by the newer Llama models on AWS Bedrock.

AWS
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.