Based on the available information, I cannot provide a specific number of tokens served per "Model unit"-hour for Llama 3.1 8B in US West (Oregon); the documentation doesn't explicitly state the token throughput per model unit for this model.
What we do know is that Llama 3.1 8B supports a context length of 27,000 tokens when deployed on the default instance type (ml.g5.4xlarge) in Amazon SageMaker JumpStart. The model is also available in Amazon Bedrock in the US West (Oregon) region with a maximum context length of 128,000 tokens.
For specific information about tokens per model unit-hour, check the AWS Bedrock quotas documentation, which lists "Tokens processed per minute" for each model, or contact AWS Support, since token throughput varies by model.
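As a starting point, the per-model token quotas can be pulled programmatically through the Service Quotas API rather than read off the documentation page. The sketch below is a minimal, hedged example: it assumes boto3 with working AWS credentials, and the quota names in the test data are illustrative, not the exact strings Bedrock publishes. The filtering matches any quota whose name mentions "tokens per minute".

```python
"""Sketch: surface Bedrock token-per-minute quotas via AWS Service Quotas.

Assumes boto3 is installed and AWS credentials are configured.
Quota names shown in examples are illustrative placeholders.
"""
from typing import Dict, List


def tokens_per_minute_quotas(quotas: List[Dict]) -> Dict[str, float]:
    """Filter Service Quotas entries whose name mentions tokens per minute."""
    return {
        q["QuotaName"]: q["Value"]
        for q in quotas
        if "tokens per minute" in q["QuotaName"].lower()
    }


def fetch_bedrock_quotas(region: str = "us-west-2") -> List[Dict]:
    """Fetch all Bedrock quotas for a region (requires AWS credentials)."""
    import boto3  # imported here so the filter above works without boto3

    client = boto3.client("service-quotas", region_name=region)
    quotas: List[Dict] = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        quotas.extend(page["Quotas"])
    return quotas
```

Usage would be `tokens_per_minute_quotas(fetch_bedrock_quotas("us-west-2"))`, printing one entry per model-level token quota. Note this reports on-demand throttling quotas, not Provisioned Throughput model-unit capacity, which AWS does not expose through this API.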
Sources
Llama 3.1 models are now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog
Amazon Bedrock Knowledge Bases now supports Llama 3.1 405B, 70B, and 8B - AWS
Bedrock: 1 model unit = how many tokens/min throughput? | AWS re:Post