I've looked into this, and it turns out the <|eot_id|> tokens appearing in your responses are actually expected behavior for Llama 4. Here's what's happening:
Llama 4 returns special tokens like <|eot_id|>, <|start_header_id|>, and <|end_header_id|> as part of its native output. These are structural markers that Meta's Llama models use internally to indicate things like end-of-turn in conversations. The difference you're noticing compared to Llama 3.1 and 3.3 is that those models have built-in wrappers that automatically strip out these tokens before returning the response to you. With Llama 4, you're getting the raw, unfiltered output from the model.
The good news is this is straightforward to handle on your end. I'd recommend adding a simple post-processing step in your code to strip out these tokens. Something like:
import re

text = re.sub(r'<\|(eot_id|start_header_id|end_header_id)\|>', '', text)
This should clean up the output and give you just the JSON response you're expecting.
Alternatively, if you're open to using a wrapper library like LangChain or llama-index-llms-bedrock, those can handle this token cleanup automatically.
For more details on this behavior, you can refer to the official documentation here:
- Meta Llama models - Request and response: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html#model-parameters-meta-request-response
It appears you're encountering an issue with Llama 4 Maverick in Amazon Bedrock where the model is returning duplicate JSON responses separated by the <|eot_id|> token.
This behavior seems specific to Llama 4 Maverick, as you've mentioned that other models like Llama 3.1 and Llama 3.3 don't exhibit this issue with the same prompt format. Llama 4 Maverick is Meta's multimodal model designed for image and text understanding, and it may have different response handling characteristics compared to other models in the Llama family.
When using Amazon Bedrock's invoke_model API, the response format can vary between models. For Meta's Llama models, the response typically includes fields like "generation" (the generated text), "prompt_token_count", "generation_token_count", and "stop_reason".
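To make that concrete, here's a minimal sketch of parsing those fields and stripping the special tokens from the generation. The payload below is mocked (in practice you'd get it from `client.invoke_model(...)["body"].read()`), but the field names match what Bedrock documents for Meta Llama models:

```python
import json
import re

# Mocked response body in the shape Bedrock returns for Meta Llama models;
# in practice this comes from client.invoke_model(...)["body"].read()
raw_body = json.dumps({
    "generation": '{"answer": "ok"}<|eot_id|>',
    "prompt_token_count": 42,
    "generation_token_count": 12,
    "stop_reason": "stop",
})

body = json.loads(raw_body)

# Strip any Llama special tokens (<|eot_id|>, <|start_header_id|>, etc.)
# that the model may leave in its raw output
generation = re.sub(r"<\|[a-z_]+\|>", "", body["generation"])

print(generation)           # the cleaned generated text
print(body["stop_reason"])  # e.g. "stop"
```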
This duplicate response issue could be related to:
- How Llama 4 Maverick handles the end-of-turn tokens in your prompt format
- A potential configuration issue with how the model is deployed in Bedrock
- A mismatch between the prompt format and what Llama 4 Maverick expects
Since this appears to be model-specific behavior, you might want to:
- Try adjusting your prompt format slightly for Llama 4 Maverick
- Parse your response to handle this duplicate output pattern
- Consider reaching out to AWS Support if this is impacting your application functionality
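If you go the parsing route, one defensive approach is to split the output on the token and return the first chunk that parses as valid JSON. This is just a sketch, assuming the duplicated-output shape described in the question:

```python
import json

def first_json_chunk(text: str, sep: str = "<|eot_id|>"):
    """Split a duplicated response on the end-of-turn token and
    return the first chunk that parses as valid JSON, else None."""
    for chunk in text.split(sep):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            return json.loads(chunk)
        except json.JSONDecodeError:
            continue
    return None

# Duplicated output like the one described in the question
raw = '{"result": "A"}<|eot_id|>{"result": "A"}<|eot_id|>'
print(first_json_chunk(raw))  # {'result': 'A'}
```

This also degrades gracefully if the model stops duplicating the response later: with a single clean JSON payload, the function simply returns it unchanged.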
Sources:
- Meta Llama models - Amazon Bedrock
- Meta's Llama - Models in Amazon Bedrock – AWS
This doesn't solve the issue
Hey, thanks for the answer. It helps clear up why this happens with Llama 4 but not the Llama 3 models. But why doesn't it happen with the Converse API either? Has AWS built internal stripping of these tokens into Converse?
Also, if possible, could you share the source where you found out about the internal stripping of the end-of-turn token in the Llama 3 family?