AWS Textract Training - Preparing annotation files

I have been using AWS Textract to scan forms and invoices. Previously, I trained an adapter by auto-labelling the documents and then reviewing the annotations. Now I want to prepare the dataset components myself from polygons I already have, following these docs: https://docs.aws.amazon.com/textract/latest/dg/textract-preparing-training-testing.html#:~:text=and%20testing%20sets.-,Dataset%20components,-Datasets%20contain%20the I want to do this to avoid the tedious process of verifying auto-labelled annotations. However, apart from the "Prelabeling files", which are the output of AnalyzeDocument operations, how do we prepare the annotation files? Is there a structure we can follow? I was not able to find one in the docs.

Asked 2 months ago, 654 views
1 answer
Accepted answer

Hi Hamza,

Thank you for reaching out with your question.

Pre-labeling files and annotation files have the same format: both follow the Textract AnalyzeDocument (Queries) JSON response. The primary difference is that the annotation files have the query answers corrected. This is what the Custom Queries console does when you annotate the right answers. Since you already have the geometry / polygon information, you can use it to programmatically update the Geometry and Text of the QUERY_RESULT block for each QUERY. If a QUERY in the pre-labeling file has no QUERY_RESULT block (i.e. pre-labeling did not detect an answer), create a new QUERY_RESULT block with a random ID and add that ID to the QUERY block's Relationships.
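As a rough illustration of that update, here is a minimal Python sketch. It assumes the pre-labeling file has been loaded into a dict with a top-level "Blocks" list, that a QUERY block links to its QUERY_RESULT through a relationship of type "ANSWER" (as in AnalyzeDocument responses), and that your polygons are already normalized to the page (0 to 1). The helper names `polygon_to_geometry` and `set_query_answer` are made up for this example, and the sketch does not set fields such as Confidence, so check a console-generated annotation file to see which fields it carries.

```python
import copy
import uuid


def polygon_to_geometry(polygon):
    """Build a Textract-style Geometry dict from a list of (x, y) points.

    Assumption: coordinates are normalized to the page (0..1).
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return {
        "BoundingBox": {
            "Left": min(xs),
            "Top": min(ys),
            "Width": max(xs) - min(xs),
            "Height": max(ys) - min(ys),
        },
        "Polygon": [{"X": x, "Y": y} for x, y in polygon],
    }


def set_query_answer(prelabel, query_text, answer_text, polygon, page=1):
    """Return a copy of `prelabel` with the answer to one query corrected."""
    doc = copy.deepcopy(prelabel)
    blocks = doc["Blocks"]
    by_id = {b["Id"]: b for b in blocks}

    # Find the QUERY block by its question text.
    query = next(
        b for b in blocks
        if b["BlockType"] == "QUERY" and b["Query"]["Text"] == query_text
    )

    # Collect the IDs of any QUERY_RESULT blocks already linked to it.
    answer_ids = [
        block_id
        for rel in query.get("Relationships", [])
        if rel["Type"] == "ANSWER"
        for block_id in rel["Ids"]
    ]

    if answer_ids:
        # Pre-labeling found an answer: correct it in place.
        result = by_id[answer_ids[0]]
    else:
        # No answer detected: create a QUERY_RESULT with a random ID
        # and link it from the QUERY block.
        result = {
            "BlockType": "QUERY_RESULT",
            "Id": str(uuid.uuid4()),
            "Page": page,
        }
        blocks.append(result)
        query.setdefault("Relationships", []).append(
            {"Type": "ANSWER", "Ids": [result["Id"]]}
        )

    result["Text"] = answer_text
    result["Geometry"] = polygon_to_geometry(polygon)
    return doc
```

You would call it once per query, e.g. `set_query_answer(prelabel, "What is the amount?", "$10", [(0.1, 0.2), (0.3, 0.2), (0.3, 0.25), (0.1, 0.25)])`, and then write the result out with `json.dump` as the annotation file.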

I recommend inspecting an annotation file and a pre-labeling file that the console has created for a sample. For convenience, you can use the annotations here: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/custom-queries/samples/checks-annotations.zip.

Compare the block with ID "9eb821b4-4c12-4be5-b521-b915ac7fef44" across both files:

  • Prelabel: /checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1
  • Annotations: /checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json
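If you would rather do that comparison in code than by eye, a small sketch like the following may help. It assumes both files are JSON documents with a top-level "Blocks" list and that the archive has been unzipped into the working directory. `load_blocks` and `diff_block` are hypothetical helper names, not part of any AWS SDK.

```python
import json


def load_blocks(path):
    """Load a Textract-style JSON file and index its blocks by Id."""
    with open(path, encoding="utf-8") as f:
        return {block["Id"]: block for block in json.load(f)["Blocks"]}


def diff_block(prelabel_blocks, annotation_blocks, block_id):
    """Return {field: (prelabel_value, annotation_value)} for fields that differ."""
    before = prelabel_blocks.get(block_id, {})
    after = annotation_blocks.get(block_id, {})
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# Example usage (paths taken from the list above, relative to the unzip location):
# pre = load_blocks("checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1")
# ann = load_blocks("checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json")
# print(json.dumps(diff_block(pre, ann, "9eb821b4-4c12-4be5-b521-b915ac7fef44"), indent=2))
```

The fields that show up in the diff are the ones the console changed when the answer was annotated, which tells you what your own annotation files need to set.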

Please mark this response if it answered your question.

Regards, Keith

keithm (AWS)
Answered 2 months ago
Reviewed by an expert a month ago
