Bedrock versus Textract for document text/meaning extraction

0

I'm looking to start a project focused on text/meaning extraction from semi-structured to unstructured documents. There are specific categories of data that I'm looking for, but they may be presented in different formats in the document.

Which would be the right technology to use for such a use case? Would it be Bedrock or Textract? What is the difference between these two services in the context of document text/meaning extraction? What types of use cases are better on Textract versus Bedrock?

Thank you for the help!

Scott

Scott
asked 6 months ago783 views
2 Answers
0
Accepted Answer

Textract is trained on a wide variety of documents and is great for extracting the data. Amazon Bedrock Large Language Models (LLMs) are great at both classification with few shot learning and summarisation tasks. Depending on your usecase textract will have low latency, relaitively lower costs (are you dealing with 100's or millions of documents), and offer integrations with other AWS servies such as Amazon Comprehend.

I would recommend a hybrid approach using textract to extract the data and then either comprehend or Bedrock LLMs for classification, LLMs for summarisation, and LLMs for any Q&A type tasks.

The blog post Enhancing AWS intelligent document processing with generative AI offers methods to include generative AI LLMs in an intelligent document processing pipeline.

Also review the following workshop, Intelligent Document Processing with AWS AI Services , and the companion github Intelligent Document Processing with AWS AI Services

AWS
EXPERT
answered 6 months ago
0

Hi Scott,

I would like to share with you some of my understanding and thoughts on Textract and Bedrock. First of all, I believe that Textract and Bedrock are not contradictory, in fact, they can work well together. We can use Textract to recognize unstructured data and then use Bedrock to organize and process the data in a structured way. I believe the following two examples will be helpful to you:

  1. serverless-pdf-chat: This sample application allows you to ask natural language questions about any PDF document you upload.
  2. Amazon Bedrock: This blog introduces how to use Amazon Bedrock to accelerate accurate extraction in OCR scenes, and it is also a good reference resource.

I hope this information is helpful to you!

Best regards, Keith

AWS
keithyu
answered 6 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions