Train an Adapter without providing an answer to each query

Hello, I am creating a Custom Query Adapter to process tables in PDF files.

I want to use the custom query to ask these questions:

  • "What is the total quantity ordered?"
  • "What is the total quantity of small units ordered?"
  • "What is the total quantity of large units ordered?"

Therefore, I need to provide a label for each of those questions in my training set. However, not every PDF file contains all of this information.

How can I keep a label empty when labelling the files? Or should I select a random word in my PDF instead?

Thanks and best regards, Anton

asked 2 months ago, 79 views
1 Answer
Accepted Answer

When you create a Custom Query Adapter for Amazon Textract to process tables in PDF files, you need to handle documents that do not contain an answer to every query. Here's how to approach this situation:

  1. Do not select a random word or provide incorrect information when labeling your training documents. This would reduce the adapter's accuracy.

  2. When a piece of information is not present in a document, leave the label empty for that query. Only annotate answers that actually appear in the document.

  3. Apply this consistently across all your training documents: leave the label empty in every document where the answer is missing.

  4. Use a diverse set of training documents that represents the scenarios your adapter might encounter, including documents with only partial information.

  5. During training, Amazon Textract learns from the patterns in your annotations, including the cases where certain information is missing. This helps the adapter handle similar documents during inference.

By following these practices, you'll create a more robust and accurate Custom Query Adapter that can handle variations in your PDF files, including cases where not all queried information is present in every document.
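
At inference time the same situation shows up on the response side: a query may come back without an answer. The sketch below is one way to handle that in Python, not an official recipe. It assumes the usual Textract response shape, in which a QUERY block links to its QUERY_RESULT through an ANSWER relationship and, as far as I know, an unanswered query simply has no such relationship. The aliases, the helper names `build_request` and `answers_by_alias`, and the sample response are all made up for illustration.

```python
from typing import Dict, Optional

# The questions from the post. The aliases are hypothetical names for this sketch.
QUERIES = {
    "TOTAL_QTY": "What is the total quantity ordered?",
    "SMALL_QTY": "What is the total quantity of small units ordered?",
    "LARGE_QTY": "What is the total quantity of large units ordered?",
}


def build_request(document_bytes: bytes, adapter_id: str, adapter_version: str) -> dict:
    """Build keyword arguments for boto3's Textract analyze_document call."""
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": text, "Alias": alias} for alias, text in QUERIES.items()]
        },
        # adapter_id and adapter_version are placeholders for your own adapter.
        "AdaptersConfig": {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version}]
        },
    }


def answers_by_alias(response: dict) -> Dict[str, Optional[str]]:
    """Map each query alias to its answer text, or None when no answer was returned."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}
    answers: Dict[str, Optional[str]] = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block["Query"]
        key = query.get("Alias") or query["Text"]
        answer_ids = [
            block_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "ANSWER"
            for block_id in rel.get("Ids", [])
        ]
        # Take the first linked QUERY_RESULT, or None when nothing is linked.
        answers[key] = next(
            (blocks[i].get("Text") for i in answer_ids if i in blocks),
            None,
        )
    return answers


# Hand-written sample shaped like a Textract response (not real output).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": QUERIES["TOTAL_QTY"], "Alias": "TOTAL_QTY"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}]},
        {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "120", "Confidence": 97.0},
        {"Id": "q2", "BlockType": "QUERY",
         "Query": {"Text": QUERIES["SMALL_QTY"], "Alias": "SMALL_QTY"}},
    ]
}

if __name__ == "__main__":
    print(answers_by_alias(SAMPLE_RESPONSE))
```

With real documents you would pass the request to `boto3.client("textract").analyze_document(**build_request(...))` and feed the response to `answers_by_alias`. As far as I know, the synchronous call only accepts single-page PDFs, so multi-page files would need `start_document_analysis` instead. Returning `None` for a missing answer keeps "not in this document" distinct from an empty string or a guessed value, which mirrors leaving the label empty during annotation.
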
Sources
Customizing your Queries Responses - Amazon Textract
Best practices for Amazon Textract Custom Queries - Amazon Textract

answered 2 months ago
EXPERT
reviewed 2 months ago
  • It turns out you can submit your query label when it is empty. Since it was marked red, I thought I had to provide something 😅
