AWS Textract Training - Preparing annotation files

I have been using AWS Textract to scan forms and invoices. Previously, I trained an adapter by auto-labelling the documents and then reviewing the annotations. I now want to prepare the dataset components myself, using polygons I already have, as described in the following docs: https://docs.aws.amazon.com/textract/latest/dg/textract-preparing-training-testing.html#:~:text=and%20testing%20sets.-,Dataset%20components,-Datasets%20contain%20the This would let me avoid the tedious step of verifying auto-labelled annotations. However, apart from "Prelabeling files", which are the result of AnalyzeDocument operations, how do we prepare annotation files? Is there a structure we can follow? I was not able to find one in the docs.

asked 2 months ago · 642 views
1 Answer
Accepted Answer

Hi Hamza,

Thank you for reaching out with your question.

Pre-labeling and annotation files have the same format: both follow the Textract AnalyzeDocument (Queries) JSON response. The primary difference is that annotation files have the query answers corrected, which is what the Custom Queries console does when you annotate the right answers. Since you have the geometry / polygon information available, you can use it to programmatically update the Geometry and Text of the QUERY_RESULT block for each QUERY. If a QUERY in the pre-labeling file has no QUERY_RESULT block (i.e. pre-labeling didn't detect an answer), create a new QUERY_RESULT block with a random ID and add that ID to the QUERY block's Relationships.
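As a rough illustration of the update described above, here is a minimal Python sketch. It assumes the pre-labeling file has already been loaded into a dict with a top-level "Blocks" list, that QUERY blocks link to their answers through a Relationships entry of type "ANSWER" (as in the AnalyzeDocument response), and that your polygon is a list of (x, y) points already normalized to the 0-1 range Textract uses. The helper names `polygon_to_geometry` and `set_query_answer` are hypothetical, not part of any AWS SDK. It is worth checking the output against a console-generated annotation file, since the console may set additional fields (for example Confidence) that this sketch leaves untouched.

```python
import uuid


def polygon_to_geometry(polygon):
    """Build a Textract-style Geometry dict from normalized (x, y) points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return {
        "BoundingBox": {
            "Left": min(xs),
            "Top": min(ys),
            "Width": max(xs) - min(xs),
            "Height": max(ys) - min(ys),
        },
        "Polygon": [{"X": x, "Y": y} for x, y in polygon],
    }


def set_query_answer(response, query_text, answer_text, polygon):
    """Correct, or create, the QUERY_RESULT block for one query (in place)."""
    blocks = response["Blocks"]
    by_id = {block["Id"]: block for block in blocks}

    # Locate the QUERY block by its question text (assumed unique here).
    query = next(
        b for b in blocks
        if b["BlockType"] == "QUERY" and b["Query"]["Text"] == query_text
    )

    # Collect the IDs of any answers already linked to this query.
    relationships = query.get("Relationships") or []
    answer_ids = [
        block_id
        for rel in relationships
        if rel["Type"] == "ANSWER"
        for block_id in rel["Ids"]
    ]

    if answer_ids:
        # Pre-labeling found an answer: overwrite it below.
        result = by_id[answer_ids[0]]
    else:
        # No answer detected: create a block with a random ID and link it.
        result = {"BlockType": "QUERY_RESULT", "Id": str(uuid.uuid4())}
        if "Page" in query:
            result["Page"] = query["Page"]
        blocks.append(result)
        query["Relationships"] = relationships + [
            {"Type": "ANSWER", "Ids": [result["Id"]]}
        ]

    result["Text"] = answer_text
    result["Geometry"] = polygon_to_geometry(polygon)
    return result
```

You would call `set_query_answer` once per query, then write the dict back out with `json.dump` under the annotation file name your dataset expects.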

I recommend inspecting the annotation and pre-labeling files that the console has created for a sample. For ease of use, you can refer to the annotations here: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/custom-queries/samples/checks-annotations.zip

Compare the block with ID "9eb821b4-4c12-4be5-b521-b915ac7fef44" across both files:

  • Prelabel: /checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1
  • Annotations: /checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json
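To make that comparison less manual, a small sketch like the following could be used once the zip is extracted. It only assumes that both files are JSON and that the block is a JSON object with an "Id" field somewhere in the document, so it searches recursively instead of relying on a particular top-level layout. The function names are hypothetical helpers, not AWS APIs.

```python
import json


def find_block(node, block_id):
    """Depth-first search of parsed JSON for a dict whose "Id" equals block_id."""
    if isinstance(node, dict):
        if node.get("Id") == block_id:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_block(child, block_id)
        if found is not None:
            return found
    return None


def changed_fields(prelabel_block, annotation_block):
    """Map each differing top-level key to (prelabel value, annotation value)."""
    keys = sorted(set(prelabel_block) | set(annotation_block))
    return {
        key: (prelabel_block.get(key), annotation_block.get(key))
        for key in keys
        if prelabel_block.get(key) != annotation_block.get(key)
    }


def compare_block(prelabel_path, annotation_path, block_id):
    """Load both files and report how one block differs between them."""
    with open(prelabel_path, encoding="utf-8") as f:
        prelabel_block = find_block(json.load(f), block_id)
    with open(annotation_path, encoding="utf-8") as f:
        annotation_block = find_block(json.load(f), block_id)
    if prelabel_block is None or annotation_block is None:
        raise KeyError(f"Block {block_id} not found in both files")
    return changed_fields(prelabel_block, annotation_block)
```

Calling `compare_block` with the two paths above (relative to wherever you extracted the zip) and the block ID should show which fields the console changed when the answer was corrected.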

Please mark this response if it answered your question.

Regards, Keith

AWS
keithm
answered a month ago
EXPERT
reviewed 25 days ago
