Rate Limits and Statistics for Llama 3 8B on Bedrock


I want to understand the following benchmarks for Llama 3 8B on AWS Bedrock:

  1. Max concurrency for a single user/API
  2. Latency
  3. Tokens per second
  4. Rate limits, if any (number of requests per minute/day, or number of tokens per minute/day)

We have a use case where we plan to scale to 30 concurrent users, with a significant amount of token and request usage.
1 Answer

Hi. It's hard to give a precise answer because it depends on many factors. To give a few examples, latency can depend on the region where the model is deployed, the prompt itself, and so on. However, I'd suggest you use the benchmarking tooling that you can find on GitHub. Here is the repo link:

https://github.com/aws-samples/foundation-model-benchmarking-tool
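If you just want a quick read on latency and tokens per second before running the full benchmarking tool, a minimal sketch like the one below can help. It assumes boto3 with configured AWS credentials, the Bedrock Converse API, and the `meta.llama3-8b-instruct-v1:0` model ID in `us-east-1` (adjust both for your account and region); the `tokens_per_second` helper and `benchmark_once` function names are my own.

```python
import time

def tokens_per_second(output_tokens, elapsed_s):
    """Throughput from an output token count and elapsed wall-clock time."""
    return output_tokens / elapsed_s if elapsed_s > 0 else 0.0

def benchmark_once(prompt, model_id="meta.llama3-8b-instruct-v1:0",
                   region="us-east-1", max_tokens=256):
    """Send one request via the Bedrock Converse API and report
    (latency in seconds, output tokens per second)."""
    import boto3  # AWS SDK; requires configured credentials
    client = boto3.client("bedrock-runtime", region_name=region)
    start = time.perf_counter()
    resp = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens},
    )
    elapsed = time.perf_counter() - start
    out_tokens = resp["usage"]["outputTokens"]
    return elapsed, tokens_per_second(out_tokens, elapsed)
```

Run `benchmark_once` in a loop with your real prompts and take percentiles rather than a single sample, since latency varies per request.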

The quotas can be found here:

https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html
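The quotas page lists defaults, but you can also inspect the values applied to your own account programmatically through the Service Quotas API. A sketch, assuming boto3 with credentials and the `bedrock` service code (the `filter_quotas` / `bedrock_quotas` helpers are my own names, and quota names may differ by region):

```python
def filter_quotas(quotas, keyword):
    """Keep quota records whose name contains the keyword (case-insensitive)."""
    return [q for q in quotas if keyword.lower() in q["QuotaName"].lower()]

def bedrock_quotas(keyword="llama", region="us-east-1"):
    """List the account's Bedrock quotas matching a keyword."""
    import boto3  # AWS SDK; requires configured credentials
    client = boto3.client("service-quotas", region_name=region)
    quotas = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="bedrock"):
        quotas.extend(page["Quotas"])
    return filter_quotas(quotas, keyword)
```

The matching records include the quota value and whether it is adjustable, which tells you if you can request an increase for your 30-concurrent-user target.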

Answered a month ago by AWS
Reviewed a month ago by an AWS EXPERT
  • Can I just hit the API for the LLM, while all my agents/knowledge bases/context history are stored separately in another cloud? E.g., the way we hit bare OpenAI APIs, with LangChain, the knowledge base, and everything else custom-built on our side.
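Yes, that pattern works: Bedrock's runtime API is stateless per request, so conversation history can live anywhere and be replayed into each call. A minimal sketch, assuming boto3, the Converse API, and the `meta.llama3-8b-instruct-v1:0` model ID (the `to_bedrock_messages` / `chat` helpers and the `(role, text)` history format are my own illustration, not a Bedrock requirement):

```python
def to_bedrock_messages(history):
    """Convert a [(role, text), ...] history, stored wherever you like,
    into the Converse API message format."""
    return [{"role": role, "content": [{"text": text}]}
            for role, text in history]

def chat(history, model_id="meta.llama3-8b-instruct-v1:0",
         region="us-east-1"):
    """Replay externally stored history into a stateless Bedrock call."""
    import boto3  # AWS SDK; requires configured credentials
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.converse(modelId=model_id,
                           messages=to_bedrock_messages(history))
    return resp["output"]["message"]["content"][0]["text"]
```

LangChain's Bedrock integrations wrap this same runtime API, so a custom-built retrieval layer on another cloud can feed context into the prompt exactly as with bare OpenAI APIs.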
