When creating a Custom Query Adapter for Amazon Textract to process tables in PDF files, it's important to handle cases where not all information is present in every document. Here's how you should approach this situation:
- Do not select a random word or provide incorrect information when labeling your training documents. This would negatively impact the adapter's performance and accuracy.
- Instead, when a particular piece of information is not present in a document, you should leave the label empty for that specific query. This is the correct way to handle missing information in your training set.
- When annotating your documents, only provide labels for the information that is actually present. For queries where the answer doesn't exist in a particular document, simply don't annotate anything for that query.
- Consistency is key in annotation. Make sure you follow the same approach across all your training documents. If you decide to leave a field empty when the information is not present, do this consistently for all similar cases.
- It's beneficial to have a diverse set of training documents that represent all possible scenarios your adapter might encounter, including documents with partial information.
- When training your adapter, Amazon Textract will learn from the patterns in your annotations, including the cases where certain information is missing. This will help the adapter handle similar situations during inference.
By following these practices, you'll create a more robust and accurate Custom Query Adapter that can handle variations in your PDF files, including cases where not all queried information is present in every document.
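At inference time, a query with no answer in the document typically comes back as a `QUERY` block with no `ANSWER` relationship. The sketch below is one possible way to handle that, not an official helper: it assumes the usual shape of the `Blocks` list in a Textract `AnalyzeDocument` response, and the function name `extract_query_answers`, the aliases, and the sample values are all made up for illustration.

```python
from typing import Dict, List, Optional


def extract_query_answers(blocks: List[dict]) -> Dict[str, Optional[str]]:
    """Map each query alias (or query text) to its answer, or None if no answer was returned."""
    by_id = {block["Id"]: block for block in blocks}
    answers: Dict[str, Optional[str]] = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        key = query.get("Alias") or query.get("Text", "")
        answer = None
        # A query without an answer is assumed to have no ANSWER relationship.
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for result_id in rel.get("Ids", []):
                result = by_id.get(result_id)
                if result and result.get("BlockType") == "QUERY_RESULT":
                    answer = result.get("Text")
                    break
        answers[key] = answer
    return answers


# Hand-built sample (hypothetical values) mimicking a response's "Blocks" list.
sample_blocks = [
    {
        "Id": "q1",
        "BlockType": "QUERY",
        "Query": {"Text": "What is the invoice number?", "Alias": "INVOICE_NUMBER"},
        "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}],
    },
    {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "INV-001", "Confidence": 98.0},
    {
        "Id": "q2",
        "BlockType": "QUERY",
        "Query": {"Text": "What is the PO number?", "Alias": "PO_NUMBER"},
    },
]

print(extract_query_answers(sample_blocks))
```

With a real call, the same function could be given the `Blocks` list from the response of boto3's `analyze_document` when it is invoked with `FeatureTypes=["QUERIES"]` and your adapter in `AdaptersConfig`. Treating a missing answer as `None` mirrors the empty labels in the training set, so downstream code can distinguish "not present" from a wrong value.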
Sources
Customizing your Queries Responses - Amazon Textract
Best practices for Amazon Textract Custom Queries - Amazon Textract
It turns out you can submit your query label when it is empty. Since it was marked red I thought I had to provide something 😅