Textract is trained on a wide variety of documents and excels at extracting data. Amazon Bedrock large language models (LLMs) are great at both classification with few-shot learning and summarization tasks. Depending on your use case, Textract offers low latency, relatively lower cost (whether you are dealing with hundreds or millions of documents), and integrations with other AWS services such as Amazon Comprehend.
I would recommend a hybrid approach: use Textract to extract the data, then either Comprehend or Bedrock LLMs for classification, and LLMs for summarization and any Q&A-type tasks.
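To make the hybrid approach concrete, here is a minimal sketch using boto3: Textract pulls the raw text and a Bedrock model classifies it. The label set, bucket/key names, and the choice of Claude 3 Haiku are illustrative assumptions, not part of the original answer.

```python
import json

# Hypothetical label set for the classification step
CLASSIFICATION_LABELS = ["invoice", "contract", "resume"]

def build_classification_prompt(text, labels):
    """Build a simple prompt asking the LLM to pick exactly one label."""
    return (
        "Classify the following document into exactly one of these "
        f"categories: {', '.join(labels)}.\n\n"
        f"Document:\n{text}\n\nCategory:"
    )

def extract_text(bucket, key):
    """Extract raw lines from a scanned document with Amazon Textract."""
    import boto3  # imported lazily so the pure helpers above work offline
    textract = boto3.client("textract")
    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return "\n".join(
        block["Text"] for block in resp["Blocks"]
        if block["BlockType"] == "LINE"
    )

def classify(text, labels=CLASSIFICATION_LABELS):
    """Send the extracted text to a Bedrock model for classification."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 20,
        "messages": [{
            "role": "user",
            "content": build_classification_prompt(text, labels),
        }],
    })
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body
    )
    return json.loads(resp["body"].read())["content"][0]["text"].strip()
```

In practice you would swap `classify` for a summarization or Q&A prompt in the same way; only the prompt text changes, the Textract extraction step stays identical.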
The blog post Enhancing AWS intelligent document processing with generative AI offers methods to include generative AI LLMs in an intelligent document processing pipeline.
Also review the workshop Intelligent Document Processing with AWS AI Services and its companion GitHub repository of the same name.
Hi Scott,
I would like to share some of my understanding and thoughts on Textract and Bedrock. First of all, I believe Textract and Bedrock are not mutually exclusive; in fact, they work well together. We can use Textract to recognize unstructured data and then use Bedrock to organize and process it in a structured way. I believe the following two examples will be helpful to you:
- serverless-pdf-chat: This sample application allows you to ask natural language questions about any PDF document you upload.
- Amazon Bedrock: This blog introduces how to use Amazon Bedrock to accelerate accurate extraction in OCR scenarios, and it is also a good reference resource.
I hope this information is helpful to you!
Best regards, Keith
This was extremely helpful, Francis! I was able to follow both examples and have made considerable progress.
Do you have any suggestions or articles with tips on sending long prompts/documents to an LLM? When including an LLM in the IDP pipeline for Q&A, the LLM endpoint runs out of memory due to the size of the prompt.
I was able to find the information I was looking for - https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.html
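For anyone else hitting the same limit: the core idea in that retrieval-augmented generation (RAG) example is to avoid sending the whole document, and instead chunk it and pass only the most relevant chunks to the model. Below is a dependency-free sketch of that idea; the chunk size, overlap, and the naive word-overlap scoring are illustrative stand-ins for the embedding-based retrieval used in the linked LangChain example.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping windows so context isn't cut mid-sentence."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

def top_chunks(question, chunks, k=3):
    """Rank chunks by word overlap with the question and keep the top k.

    A real RAG pipeline would score with vector embeddings instead;
    plain word overlap is used here only to keep the sketch runnable.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_qa_prompt(question, document):
    """Assemble a prompt containing only the retrieved context."""
    context = "\n---\n".join(top_chunks(question, chunk_text(document)))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Because only the top-k chunks reach the endpoint, the prompt size stays bounded regardless of how long the source document is, which is exactly what keeps the endpoint from running out of memory.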