Bedrock Llama 3 empty string response problem

0

I'm experimenting with several LLMs in Bedrock. I have no problem with the Claude models, but there is a problem I cannot solve with the Llama 3 instruct models. I'm using the correct prompt format for Llama 3 models, as described in this [Llama 3 guide](https://aws.amazon.com/blogs/aws/metas-llama-3-models-are-now-available-in-amazon-bedrock/#:~:text=Input%3A-,%3C%7Cbegin_of_text%7C%3E%3C%7Cstart_header_id%7C%3Euser%3C%7Cend_header_id,%7Cstart_header_id%7C%3Eassistant%3C%7Cend_header_id%7C%3E%5Cn%5Cn,-Output%3A%20The%20Eiffel). The problem is that I sometimes get an empty string as output:

"output": {
        "outputContentType": "application/json",
        "outputBodyJson": {
            "generation": "",
            "prompt_token_count": 1056,
            "generation_token_count": 1,
            "stop_reason": "stop"
        },
        "outputTokenCount": 1
    }

Example input (only ending part):

<|eot_id|><|start_header_id|>assistant<|end_header_id|>/n

I'm expecting the model to continue with the assistant answer, and I get a meaningful response on about 80% of my calls. On the other 20%, I get this empty response. I suspect the model emits the <|end_of_text|> token right at the start, but that doesn't make sense to me. (I'm not sure about this, since AWS doesn't expose the raw model output.) When I try the same prompt in the playground, it returns correct output. Is there a problem with the Llama 3 API, or is it something else?

This is the input prompt I see in the invocation logs:

"inputBodyJson": {
            "temperature": 0.5,
            "prompt": "[INST] <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n<|eot_id|>
--REST OF THE PROMPT--
<|start_header_id|>assistant<|end_header_id|>/n/n [/INST]"
        },
        "inputTokenCount": 1056

As far as I know, the [INST] [/INST] format was used for Llama 2. The API seems to add these tokens automatically; I did not include them in my prompt. This might be the cause of the problem. What do you think?
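
For reference, based on the guide above, this is how I understand a bare InvokeModel call should look. This is only a minimal sketch with boto3; the model ID, region, and max_gen_len are placeholders rather than my exact setup.

    import json
    import boto3

    # Sketch only: region and model ID are placeholders.
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Llama 3 instruct format from the guide: headers end with "\n\n",
    # each turn ends with <|eot_id|>, and the prompt ends with an open
    # assistant header so the model writes the assistant turn.
    prompt = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "You are a helpful assistant.<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        "What is the capital of France?<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

    response = bedrock.invoke_model(
        modelId="meta.llama3-8b-instruct-v1:0",
        body=json.dumps({"prompt": prompt, "temperature": 0.5, "max_gen_len": 512}),
    )
    body = json.loads(response["body"].read())
    print(body["generation"], body["stop_reason"])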

  • Sounds a bit like an intermittent issue. Check out the answer I submitted.

2 Answers
0
Accepted Answer

I found the problem. It stems from LangChain.

Ihsan
answered 20 days ago
  • Hi @Ihsan, would you mind sharing more information on the cause and solution of this problem? I'm stuck on the same issue. Thank you.

  • Do not use LangChain. It changes the prompt internally. @Yuz

  • Thanks @Ihsan. Do you mean the LangChain PromptTemplate rewrites the prompt and causes the problem, since Llama requires a different prompt format? Can you share how it changes it, if possible? Sorry for all the questions.
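
To make the rewriting concrete: the inputBodyJson in the question shows a Llama-2-style [INST] ... [/INST] wrapper placed around the already-formatted Llama 3 special tokens. The snippet below is only an illustration of that kind of rewriting, not LangChain's actual internals:

    # Illustration only: a Llama-2-style chat wrapper applied on top of an
    # already-formatted Llama 3 prompt. This is NOT LangChain's internal code,
    # just a sketch of the rewriting visible in the question's logs.
    def llama2_style_wrap(prompt: str) -> str:
        return f"[INST] {prompt} [/INST]"

    llama3_prompt = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "You are a helpful assistant.<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        "Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

    # The model then sees both templates at once, e.g.
    # "[INST] <|begin_of_text|>...<|end_header_id|>\n\n [/INST]",
    # which matches the logged prompt in the question and can make Llama 3
    # stop immediately with an empty generation. Sending the Llama 3 prompt
    # verbatim avoids the extra wrapper.
    print(llama2_style_wrap(llama3_prompt))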

-1

It seems like you're experiencing intermittent issues with receiving empty responses from the Llama 3 model in Bedrock, despite providing correct input prompts. Here are a few possible reasons and troubleshooting steps to consider:

  1. Prompt Format: Ensure that the prompt format follows the guidelines provided for Llama 3 models. Based on the example provided in the AWS blog post you mentioned, the prompt should start with <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n followed by your actual prompt content, and end with <|start_header_id|>assistant<|end_header_id|>\n\n.

  2. Token Count: Check the token count of your input prompt. In your example it is 1056 tokens. Verify that this matches the prompt you think you are sending; a higher count than expected suggests extra tokens are being added to the request.

  3. Temperature Setting: Experiment with different temperature settings to see if it affects the model's behavior. Higher temperatures can result in more creative but potentially less coherent responses, while lower temperatures can produce more conservative and predictable outputs.

  4. Retry Logic: Implement retry logic in your application to handle cases where you receive an empty response. Retrying the same request may yield a valid response (see the sketch after this list).

  5. Check API Limits: Verify that you're not hitting any API rate limits or quotas that could cause intermittent issues with model responses. AWS provides documentation on API usage limits for Bedrock services that you can review.

  6. Contact Support: If the issue persists and you're unable to determine the cause, consider reaching out to AWS support for assistance. They can investigate further and provide guidance based on the specifics of your situation.
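
For points 4 and 5, a minimal retry sketch is shown below. It assumes boto3, a direct InvokeModel call, and the meta.llama3-70b-instruct-v1:0 model ID; adapt it to your own invocation code.

    import json
    import time

    import boto3
    from botocore.exceptions import ClientError

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def invoke_with_retry(prompt: str, max_attempts: int = 3) -> str:
        """Retry when Bedrock throttles the call or the model returns an empty generation."""
        for attempt in range(1, max_attempts + 1):
            try:
                response = bedrock.invoke_model(
                    modelId="meta.llama3-70b-instruct-v1:0",  # assumed model ID
                    body=json.dumps(
                        {"prompt": prompt, "temperature": 0.5, "max_gen_len": 512}
                    ),
                )
            except ClientError as err:
                # Back off and retry only on throttling; re-raise everything else.
                if err.response["Error"]["Code"] != "ThrottlingException" or attempt == max_attempts:
                    raise
                time.sleep(2 ** attempt)
                continue
            body = json.loads(response["body"].read())
            if body.get("generation", "").strip():
                return body["generation"]
            time.sleep(2 ** attempt)  # empty generation: back off and try again
        return ""  # still empty after all attempts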

Regarding your suspicion about the [INST] tokens, if these tokens are automatically added by the API and are not part of the expected input format for Llama 3 models, they could potentially cause issues. You may want to confirm with AWS support or consult the documentation to ensure that your input prompt format is correct.

Overall, troubleshooting intermittent issues with model responses can be challenging, but by systematically checking and adjusting various factors, you may be able to identify the root cause and resolve the problem.

Mustafa
answered 20 days ago
