AWS Textract Training - Preparing annotation files


I have been using AWS Textract to scan forms and invoices. Previously, I trained an adapter after auto-labeling the documents and reviewing the annotations. Now I want to prepare my own dataset components from polygons I already have, as described in the Dataset components section of the docs (https://docs.aws.amazon.com/textract/latest/dg/textract-preparing-training-testing.html#:~:text=and%20testing%20sets.-,Dataset%20components,-Datasets%20contain%20the). I want to do this to avoid the tedious work of verifying auto-labeled annotations. However, apart from the "prelabeling files", which are the output of AnalyzeDocument operations, how do we prepare annotation files? Is there a structure we can follow? I couldn't find one in the docs.

Asked 2 months ago · Viewed 653 times

1 answer

Accepted answer

Hi Hamza,

Thank you for reaching out with your question.

Pre-labeling and annotation files have the same format: both follow the Textract AnalyzeDocument (Queries) JSON response. The primary difference is that the annotation files contain corrected query answers, which is what the Custom Queries console produces when you annotate the correct answers. Since you already have the geometry/polygon information, you can use it to programmatically update the Geometry and Text of the QUERY_RESULT block for each QUERY. If a QUERY in the pre-labeling file has no QUERY_RESULT block (i.e. pre-labeling didn't detect an answer), create a new QUERY_RESULT block with a randomly generated ID and add that ID to the QUERY block's Relationships.
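A minimal sketch of that programmatic update, assuming the pre-labeling response is already loaded as a Python dict, that your polygon points are normalized `{"X", "Y"}` dicts like Textract's own output, and that a QUERY links to its QUERY_RESULT through a relationship of type `ANSWER` (as in AnalyzeDocument Queries responses). The helper names `set_query_answer` and `bbox_from_polygon`, and the confidence of 100.0 for hand-labeled answers, are hypothetical choices for illustration, not a documented part of the annotation format.

```python
from __future__ import annotations

import copy
import uuid
from typing import Any


def bbox_from_polygon(polygon: list[dict[str, float]]) -> dict[str, float]:
    """Derive an axis-aligned BoundingBox from normalized polygon points."""
    xs = [p["X"] for p in polygon]
    ys = [p["Y"] for p in polygon]
    return {
        "Width": max(xs) - min(xs),
        "Height": max(ys) - min(ys),
        "Left": min(xs),
        "Top": min(ys),
    }


def set_query_answer(
    response: dict[str, Any],
    query_text: str,
    answer_text: str,
    polygon: list[dict[str, float]],
) -> dict[str, Any]:
    """Return a copy of an AnalyzeDocument (Queries) response in which the
    QUERY whose Query.Text equals `query_text` answers `answer_text` at `polygon`."""
    doc = copy.deepcopy(response)  # leave the pre-labeling dict untouched
    blocks = doc["Blocks"]
    by_id = {b["Id"]: b for b in blocks}

    query = next(
        (b for b in blocks
         if b["BlockType"] == "QUERY" and b.get("Query", {}).get("Text") == query_text),
        None,
    )
    if query is None:
        raise KeyError(f"no QUERY block with text {query_text!r}")

    geometry = {
        "BoundingBox": bbox_from_polygon(polygon),
        "Polygon": copy.deepcopy(polygon),
    }

    # Assumption: the QUERY points at its QUERY_RESULT via an ANSWER relationship.
    answer_rel = next(
        (r for r in query.get("Relationships", []) if r["Type"] == "ANSWER"), None
    )
    if answer_rel and answer_rel["Ids"]:
        # Pre-labeling found an answer: overwrite the first result's text and geometry.
        result = by_id[answer_rel["Ids"][0]]
        result["Text"] = answer_text
        result["Geometry"] = geometry
        return doc

    # No answer detected: create a QUERY_RESULT block with a random ID.
    result = {
        "BlockType": "QUERY_RESULT",
        "Id": str(uuid.uuid4()),
        "Text": answer_text,
        "Confidence": 100.0,  # assumption: value used for a hand-labeled answer
        "Geometry": geometry,
    }
    if "Page" in query:
        result["Page"] = query["Page"]  # keep the page number consistent with the QUERY
    blocks.append(result)

    if answer_rel is None:
        answer_rel = {"Type": "ANSWER", "Ids": []}
        query.setdefault("Relationships", []).append(answer_rel)
    answer_rel["Ids"].append(result["Id"])
    return doc
```

Serialize the returned dict with `json.dump` and diff it against an annotation file the console generated before relying on it for adapter training.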

I recommend inspecting the annotation and pre-labeling files for a sample the console has created. For convenience, you can use the sample annotations here: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/custom-queries/samples/checks-annotations.zip.

Compare the ID: "9eb821b4-4c12-4be5-b521-b915ac7fef44" across both files:

  • Prelabel: /checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1
  • Annotations: /checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json
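One way to make that comparison is to index each file's `Blocks` by `Id` and list the fields that differ. This sketch assumes the sample zip has been extracted and that both files are AnalyzeDocument-style JSON with a top-level `Blocks` list; `load_blocks`, `diff_block`, and `compare_block` are hypothetical helpers, not part of any AWS SDK.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_blocks(path: str | Path) -> dict[str, dict[str, Any]]:
    """Read an AnalyzeDocument-style JSON file and index its Blocks by Id."""
    with open(path, encoding="utf-8") as f:
        return {block["Id"]: block for block in json.load(f)["Blocks"]}


def diff_block(before: dict[str, Any], after: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (before_value, after_value)} for every field that differs."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


def compare_block(
    block_id: str, prelabel_path: str | Path, annotation_path: str | Path
) -> dict[str, tuple[Any, Any]]:
    """Show how one block changed between the pre-labeling and annotation files."""
    pre = load_blocks(prelabel_path).get(block_id, {})  # {} if the block is absent
    ann = load_blocks(annotation_path).get(block_id, {})
    return diff_block(pre, ann)
```

With the zip extracted into the working directory, call `compare_block` with the ID above and the two paths (dropping the leading `/`); the returned dict lists only the fields that differ between the two files.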

Please accept this answer if it resolved your question.

Regards, Keith

AWS
keithm
Answered 2 months ago
Expert, reviewed 1 month ago
