
how many tokens are served in 1 "Model unit"-hour of Llama 3.1 8B in US West?

1 Answer

Based on the available information, I cannot provide a specific number of tokens served in 1 "Model unit"-hour for Llama 3.1 8B in US West (Oregon). The documentation does not explicitly state the token throughput per model unit for Llama 3.1 8B.

What we do know is that Llama 3.1 8B supports a context length of 27,000 tokens when deployed on the default instance type (ml.g5.4xlarge) in Amazon SageMaker JumpStart. The model is also available in Amazon Bedrock in the US West (Oregon) region with a maximum context length of 128,000 tokens.

For specific figures, you would need to check the AWS Bedrock quotas documentation, which lists a "Tokens processed per minute" quota for each model, or contact AWS Support, since token throughput varies by model.
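Once you find your account's "Tokens processed per minute" quota, converting it to tokens per model-unit-hour is straightforward arithmetic. A minimal sketch, using a purely hypothetical quota value (AWS does not publish this figure for Llama 3.1 8B; check the Service Quotas console for your actual number):

```python
def tokens_per_model_unit_hour(tokens_per_minute: int, model_units: int = 1) -> int:
    """Convert a per-minute token quota into tokens per model-unit-hour.

    tokens_per_minute: the "Tokens processed per minute" quota from the
    Service Quotas console (the 50,000 used below is a made-up example).
    model_units: number of provisioned model units the quota applies to.
    """
    return tokens_per_minute * 60 // model_units

# Hypothetical example: a quota of 50,000 tokens/min on 1 model unit
print(tokens_per_model_unit_hour(50_000))  # 3,000,000 tokens per model-unit-hour
```

Any real quota you plug in here should come from your own account, since quotas vary by model, region, and account.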
Sources
Llama 3.1 models are now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog
Amazon Bedrock Knowledge Bases now supports Llama 3.1 405B, 70B, and 8B - AWS
Bedrock: 1 model unit = how many tokens/min throughput? | AWS re:Post

answered a month ago

