AWS Textract Training - Preparing annotation files


I have been using AWS Textract to scan forms and invoices. Previously, I trained an adapter after auto-labelling the documents and reviewing the annotations. Now I want to prepare my own dataset components from polygons I already have, following the "Dataset components" section of these docs: https://docs.aws.amazon.com/textract/latest/dg/textract-preparing-training-testing.html#:~:text=and%20testing%20sets.-,Dataset%20components,-Datasets%20contain%20the. I want to do this to avoid the tedious step of verifying auto-labelled annotations. However, apart from "prelabeling files", which are the output of AnalyzeDocument operations, how do we prepare annotation files? Is there a structure we can follow? I could not find one in the docs.

Asked 2 months ago

Accepted answer

Hi Hamza,

Thank you for reaching out with your question.

Pre-labeling and annotation files have the same format: both follow the Textract AnalyzeDocument (Queries) JSON response. The main difference is that annotation files contain the corrected query answers; this is what the Custom Queries console produces when you annotate the right answers. Since you already have the geometry/polygon information, you can programmatically update the Geometry and Text of the QUERY_RESULT block for each QUERY. If a query in the pre-labeling file has no QUERY_RESULT block (i.e. pre-labeling didn't detect an answer), create a new QUERY_RESULT block with a random ID and add that ID to the QUERY block's Relationships (under its ANSWER relationship).
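As a rough illustration of that update, here is a minimal Python sketch that edits a pre-labeling document (loaded as a dict) in place. It assumes the usual AnalyzeDocument layout: a top-level Blocks list, QUERY blocks carrying Query.Text and an ANSWER relationship, and normalized 0..1 coordinates. The helper names `set_query_answer` and `geometry_from_polygon` are hypothetical, and whether the console needs extra fields (such as Confidence) on a newly created block is not covered here, so compare the output against the console's own sample files.

```python
import uuid


def geometry_from_polygon(points):
    """Build a Textract-style Geometry dict from (x, y) points.

    Assumes the points are already normalized to the 0..1 page range,
    which is how Textract expresses BoundingBox and Polygon values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, top = min(xs), min(ys)
    return {
        "BoundingBox": {
            "Width": max(xs) - left,
            "Height": max(ys) - top,
            "Left": left,
            "Top": top,
        },
        "Polygon": [{"X": x, "Y": y} for x, y in points],
    }


def set_query_answer(document, query_text, answer_text, points):
    """Hypothetical helper: write a corrected answer for one query into a
    pre-labeling (AnalyzeDocument Queries) JSON dict, modifying it in place."""
    blocks = document["Blocks"]
    by_id = {b["Id"]: b for b in blocks}

    query = next(
        (b for b in blocks
         if b["BlockType"] == "QUERY" and b["Query"]["Text"] == query_text),
        None,
    )
    if query is None:
        raise KeyError(f"no QUERY block with text {query_text!r}")

    # A QUERY block points at its QUERY_RESULT blocks via an ANSWER relationship.
    answer_rel = next(
        (r for r in query.get("Relationships", []) if r["Type"] == "ANSWER"),
        None,
    )
    result = None
    if answer_rel is not None:
        result = next(
            (by_id[i] for i in answer_rel["Ids"]
             if by_id.get(i, {}).get("BlockType") == "QUERY_RESULT"),
            None,
        )

    if result is None:
        # Pre-labeling found no answer: create a QUERY_RESULT with a random ID
        # on the same page as the query, and link it from the QUERY block.
        result = {
            "BlockType": "QUERY_RESULT",
            "Id": str(uuid.uuid4()),
            "Page": query.get("Page", 1),
        }
        blocks.append(result)
        if answer_rel is None:
            answer_rel = {"Type": "ANSWER", "Ids": []}
            query.setdefault("Relationships", []).append(answer_rel)
        answer_rel["Ids"].append(result["Id"])

    # Overwrite the answer with the known-good text and polygon.
    result["Text"] = answer_text
    result["Geometry"] = geometry_from_polygon(points)
    return result
```

For example, `set_query_answer(doc, "What is the check amount?", "$100.00", [(0.1, 0.2), (0.3, 0.2), (0.3, 0.25), (0.1, 0.25)])` (the query text and coordinates are made up) fills in or replaces that query's answer; the modified dict can then be written out with `json.dump` as the annotation file.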

I recommend inspecting the annotation and pre-labeling files for a sample created by the console. For convenience, you can refer to these sample annotations: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/custom-queries/samples/checks-annotations.zip.

Compare the ID: "9eb821b4-4c12-4be5-b521-b915ac7fef44" across both files:

  • Prelabel: /checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1
  • Annotations: /checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json
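To make that comparison quicker, here is a minimal Python sketch that pulls the block with a given Id out of each file and returns its BlockType, Text, and Geometry side by side. It assumes both files are JSON documents with a top-level Blocks list, like an AnalyzeDocument response (the prelabel file has no extension in the sample, so treating it as JSON is an assumption); `find_block` and `compare_block` are hypothetical names.

```python
import json


def find_block(path, block_id):
    """Load a Textract-style JSON file and return the block with the given Id, or None."""
    with open(path, encoding="utf-8") as f:
        document = json.load(f)
    return next((b for b in document.get("Blocks", []) if b.get("Id") == block_id), None)


def compare_block(prelabel_path, annotation_path, block_id):
    """Return the BlockType, Text and Geometry of one block as it appears in each file."""
    summary = {}
    for label, path in (("prelabel", prelabel_path), ("annotation", annotation_path)):
        block = find_block(path, block_id)
        summary[label] = None if block is None else {
            "BlockType": block.get("BlockType"),
            "Text": block.get("Text"),
            "Geometry": block.get("Geometry"),
        }
    return summary
```

With the sample unzipped, a call such as `compare_block("checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1", "checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json", "9eb821b4-4c12-4be5-b521-b915ac7fef44")` should show how the annotated answer differs from the pre-labeled one.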

Please mark this response as accepted if it answered your question.

Regards, Keith

Answered by keithm (AWS) 2 months ago; reviewed by an expert 1 month ago.
