Using my own data with an AWS Bedrock GenAI chatbot


Hi All,

Thanks for everyone's help so far. It's been indispensable to my progress as a semi-beginner coder.

I am currently building a generative AI chatbot using the following GitHub repository, which relies on AWS Bedrock, OpenSearch, and LangChain: https://github.com/aws-samples/rag-using-langchain-amazon-bedrock-and-opensearch.git

I've managed to replicate the chatbot from that repository, which uses RAG over the repository's own sample dataset.
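For context on what RAG does with a dataset: the question is embedded, the most similar passages are retrieved from OpenSearch, and those passages are pasted into the prompt sent to Claude. If none of your own text is in the index, the model only sees irrelevant context and falls back to a generic answer. The template below is a hypothetical sketch of the prompt-building step (build_rag_prompt is an illustrative name, not the repository's code):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt.

    Illustrative template only; the repository builds its prompt through LangChain.
    """
    # Number each retrieved passage so the model can refer to it.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


prompt = build_rag_prompt("What does the sample dataset contain?", ["Passage one.", "Passage two."])
print(prompt)
```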

However, an issue arises when I try to replace the dataset used in the code above ("https://huggingface.co/datasets/sentence-transformers/embedding-training-data/resolve/main/gooaq_pairs.jsonl.gz") with my own dataset.
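The sample dataset is gzip-compressed JSON Lines: one JSON value per line. As far as I can tell, each line of gooaq_pairs is a two-element array of [question, answer], so any replacement file has to match whatever the loader expects to find on each line. A minimal reader, assuming that layout (read_jsonl_gz and the two sample records are made up for illustration):

```python
import gzip
import json
import os
import tempfile
from typing import Iterator


def read_jsonl_gz(path: str) -> Iterator[object]:
    """Yield one parsed JSON value per line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Round trip with two made-up records in the assumed [question, answer] shape.
sample = [
    ["what is rag?", "retrieval-augmented generation"],
    ["what is opensearch?", "a search and analytics suite"],
]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

records = list(read_jsonl_gz(path))
print(records[0][0])  # what is rag?
```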

My dataset is currently a Word document (.docx), whereas the code above processes data in jsonl.gz format. I have tried converting my document to JSONL, but I didn't expect it to work (and it didn't), because my .docx file is just paragraphs of text and doesn't fit the JSONL format the code expects.
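Getting plain paragraphs out of a .docx is the first step either way. A .docx file is a ZIP archive whose main body lives in word/document.xml, so the standard library alone can read it. A simplified sketch (docx_paragraphs is a hypothetical helper; it ignores headers, footers, and footnotes):

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path: str) -> list[str]:
    """Return the non-empty paragraphs of a .docx file's main body, in document order."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):  # <w:p> = paragraph (also found inside table cells)
        # A paragraph's text is split across one or more <w:t> text runs.
        text = "".join(t.text or "" for t in p.iter(W + "t")).strip()
        if text:
            paragraphs.append(text)
    return paragraphs
```

If installing a package is an option, python-docx gives similar output with `[p.text for p in docx.Document(path).paragraphs]`.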

I then tried modifying the parts of the repository that handle the dataset (I believe that's just dataset.py and load-data-to-opensearch.py) to read data from .docx files on my local device rather than jsonl.gz files from a URL, but I can't quite make that work either.
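Whatever the file format, the text still has to be split into passages before each one is embedded and indexed in OpenSearch; embedding a whole document as a single vector tends to retrieve poorly. A simple greedy chunker, as a hypothetical sketch (character-based, with 1000 characters as an arbitrary default; not the repository's code):

```python
def chunk_paragraphs(paragraphs: list[str], max_chars: int = 1000) -> list[str]:
    """Greedily pack consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits (or is the first paragraph of a chunk)
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then take the place of one dataset record at the point where the loader embeds and indexes text.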

Any suggestions would be immensely valued.

I can provide the modified code if it would help you see where I'm at with it, but wanted to keep this question fairly short and easy to follow!

Thanks, Em

3 Answers

You may be confusing this with Amazon Q, where customisation is indeed limited. Bedrock Knowledge Bases, however, is an API-based service: the UI you see in the Bedrock console is only a playground, and you have complete freedom to build your own custom UI on top of it. Please check this example: https://aws.amazon.com/blogs/machine-learning/knowledge-bases-in-amazon-bedrock-now-simplifies-asking-questions-on-a-single-document/
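As a minimal sketch of what "API-based" means here: your own backend can query a knowledge base through boto3's bedrock-agent-runtime client and its RetrieveAndGenerate API. The knowledge base ID and model ARN are placeholders from your own account, and the client is passed in (ask_knowledge_base is a hypothetical name) so the function can be exercised without AWS access:

```python
def ask_knowledge_base(client, question: str, kb_id: str, model_arn: str) -> str:
    """Query a Bedrock knowledge base via RetrieveAndGenerate and return the answer text.

    `client` is expected to be boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,  # placeholder: your knowledge base ID
                "modelArn": model_arn,     # placeholder: ARN of the generation model
            },
        },
    )
    return response["output"]["text"]

# Usage (requires boto3, AWS credentials, and an existing knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# print(ask_knowledge_base(client, "What makes us different?", "KB_ID", "MODEL_ARN"))
```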

AWS
answered 4 months ago
  • Thank you! I'll give this a try tomorrow (:


Hi,

what you see is fully normal: the dataset is loaded at the line compressed_file_path = dataset.download_dataset(dataset_url)

If you look at the implementation of download_dataset in utils.py, you'll see that it downloads the file at that URL via HTTP.

You either have to put your own file somewhere it can be downloaded via HTTP (AWS S3, Google Drive, etc.) or modify the code more deeply so that it can access your file locally.
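A sketch of the "access your file locally" option, assuming the call site compressed_file_path = dataset.download_dataset(dataset_url) is kept: a hypothetical helper that accepts either a local path or a URL. Whatever it returns is still handed to the repository's decompression and parsing steps, so the file must also be in the format those steps expect.

```python
import os
import shutil
import urllib.parse
import urllib.request


def get_dataset_path(source: str, dest_dir: str = ".") -> str:
    """Return a local path for `source`.

    Uses `source` directly if it is an existing file; otherwise treats it as an
    HTTP(S) URL and downloads it into dest_dir. Hypothetical stand-in for
    dataset.download_dataset, not the repository's code.
    """
    if os.path.isfile(source):
        return source
    filename = os.path.basename(urllib.parse.urlparse(source).path) or "dataset"
    dest = os.path.join(dest_dir, filename)
    # Stream the response body to disk instead of holding it in memory.
    with urllib.request.urlopen(source) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```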

Best,

Didier

AWS
EXPERT
answered 4 months ago
  • Hi Didier,

    Thank you very much for your reply.

    That makes sense. I've obtained an HTTPS download link for my dataset in my AWS S3 bucket. I found it under "Object URL" when I click on the object (file) inside my S3 bucket. The link is as follows: https://my-bucket-name.s3.amazonaws.com/my-file-name.docx

    Unfortunately, when I put this link into the code (line 50 of load-data-to-opensearch.py, in place of the URL in dataset_url = "https://huggingface.co/datasets/sentence-transformers/embedding-training-data/resolve/main/gooaq_pairs.jsonl.gz") and then query ask-bedrock-with-rag.py with the terminal command python3 ask-bedrock-with-rag.py --ask "What makes [my company name] better than its competitors?", the model still cannot retrieve the correct answer, even though the answer is in the file at that URL. It just responds with: "The answer from Bedrock anthropic.claude-3-sonnet-20240229-v1:0 is: I don't have specific information about [my company name] or its competitors to make a comparison. As an AI assistant without direct knowledge about that particular company and its industry, I cannot provide a substantive answer to what makes [my company name] better than its rivals. I'd need more context or details about their products, services, and the specific industry landscape to give an informed response."

    Have I used the correct link, or am I meant to find the https download link elsewhere?
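Pointing dataset_url at the .docx only changes where the bytes come from; the rest of the loader still treats them as gzip-compressed JSON Lines. A .docx is a ZIP archive, not a gzip stream, and the check below (a hypothetical diagnostic helper, not part of the repository) reads a file's leading "magic" bytes to tell the two apart. Separately, an S3 Object URL only downloads over plain HTTPS if the object is publicly readable; a private object returns Access Denied. Either problem would keep your text out of the OpenSearch index, which is consistent with the generic answer above.

```python
def sniff_format(path: str) -> str:
    """Guess a file's container format from its first bytes (magic numbers)."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == b"\x1f\x8b":
        return "gzip"  # e.g. gooaq_pairs.jsonl.gz
    if head == b"PK\x03\x04":
        return "zip"   # .docx files are ZIP archives
    return "unknown"
```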


Hi. Have you considered Bedrock Knowledge Bases? It does all the heavy lifting you described, transparently.
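With Knowledge Bases, ingestion is mostly upload-and-sync: Bedrock parses, chunks, and embeds the documents found in the data source's S3 location (Word files are among the supported formats, but check the current list). A sketch assuming a knowledge base with an S3 data source already exists; the bucket, key, and IDs are placeholders, upload_and_sync is a hypothetical name, and the boto3 clients are passed in:

```python
def upload_and_sync(path, bucket, key, kb_id, data_source_id, s3_client, agent_client) -> str:
    """Upload a document to the knowledge base's S3 data source, then start an ingestion job.

    s3_client: boto3.client("s3"); agent_client: boto3.client("bedrock-agent").
    Returns the ingestion job ID.
    """
    s3_client.upload_file(path, bucket, key)  # e.g. key="docs/my-file-name.docx"
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )
    return job["ingestionJob"]["ingestionJobId"]
```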

AWS
answered 4 months ago
  • Hi @Mi_Sha,

    The reason I want to code everything myself is so that I can host the chatbot on my website for customers to ask questions, rather than using the AWS Bedrock console. I believe a custom chatbot hosted on our website is only possible by writing the code and then using something like Streamlit to display it on my website.

    Do you know if it's possible to host a Bedrock chatbot on my own website and customise the UI without writing code?
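On hosting: a common shape is a small backend endpoint that the website's chat widget calls, with the Bedrock call kept server-side so AWS credentials never reach the browser. A stdlib-only WSGI sketch with the answering function injected (make_chat_app is a hypothetical name; Streamlit, Flask, or a Lambda behind API Gateway are equally valid choices):

```python
import json
from typing import Callable


def make_chat_app(answer: Callable[[str], str]):
    """Build a minimal WSGI app: POST {"question": ...} returns {"answer": ...}.

    `answer` maps a question to an answer, e.g. a wrapper around a Bedrock call.
    """
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST a JSON body with a 'question' field\n"]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
            question = str(payload["question"])
        except (ValueError, KeyError, TypeError):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b'expected JSON like {"question": "..."}\n']
        body = json.dumps({"answer": answer(question)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app

# Local trial run with an echo stub in place of a real Bedrock call:
# from wsgiref.simple_server import make_server
# make_server("localhost", 8000, make_chat_app(lambda q: f"You asked: {q}")).serve_forever()
```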
