AWS Textract Training - Preparing annotation files

I have been using AWS Textract to scan forms and invoices. Previously, I trained the adapter by auto-labelling the documents and then reviewing the annotations. Now I want to prepare my own dataset components from polygons I already have, as described in the docs: https://docs.aws.amazon.com/textract/latest/dg/textract-preparing-training-testing.html#:~:text=and%20testing%20sets.-,Dataset%20components,-Datasets%20contain%20the

I want to do this to avoid the tedious process of verifying auto-labelled annotations. However, other than "Prelabeling files", which are the result of AnalyzeDocument operations, how do we prepare annotation files? Is there a structure we can follow? I could not find one in the docs.

asked 2 months ago · 654 views
1 Answer
Accepted Answer

Hi Hamza,

Thank you for reaching out with your question.

Pre-labeling and annotation files have the same format: both follow the JSON response of the Textract Analyze Document (Queries) operation. The main difference is that the annotation files contain the corrected query answers. This is what the Custom Queries console produces when you annotate the right answers. Since you already have the geometry / polygon information, you can use it to programmatically update the Geometry and Text of the QUERY_RESULT block for each QUERY. If a QUERY in the pre-labeling file has no QUERY_RESULT block (i.e. the pre-labeling did not detect an answer), create a new QUERY_RESULT block with a random ID and add that ID to the QUERY block's Relationships.
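
As a rough illustration of that update, here is a minimal Python sketch. It assumes the file has a top-level `Blocks` list shaped like a standard AnalyzeDocument response, that answers are linked through a relationship of type `ANSWER`, and that polygon points are normalized `X`/`Y` values. The helper name `set_query_answer` and the fixed `Confidence` of 100.0 are my own choices for illustration, not something the console is confirmed to use, so check the result against a console-generated annotation file.

```python
import uuid


def set_query_answer(response, query_text, answer_text, polygon, page=1):
    """Update or create the QUERY_RESULT block for one query (hypothetical helper).

    response: dict loaded from a pre-labeling JSON file.
    polygon:  list of {"X": float, "Y": float} points, normalized to 0..1.
    """
    blocks = response["Blocks"]
    by_id = {b["Id"]: b for b in blocks}

    # Find the QUERY block by its question text.
    query = next(
        (b for b in blocks
         if b["BlockType"] == "QUERY" and b["Query"]["Text"] == query_text),
        None,
    )
    if query is None:
        raise ValueError(f"no QUERY block with text {query_text!r}")

    # Derive the bounding box from the polygon you already have.
    xs = [p["X"] for p in polygon]
    ys = [p["Y"] for p in polygon]
    geometry = {
        "BoundingBox": {
            "Width": max(xs) - min(xs),
            "Height": max(ys) - min(ys),
            "Left": min(xs),
            "Top": min(ys),
        },
        "Polygon": [dict(p) for p in polygon],
    }

    relationships = query.get("Relationships") or []
    answer_ids = [i for rel in relationships
                  if rel["Type"] == "ANSWER" for i in rel["Ids"]]

    if answer_ids:
        # Pre-labeling found an answer: correct it in place.
        result = by_id[answer_ids[0]]
    else:
        # No answer detected: create a block with a random ID and link it.
        result = {
            "BlockType": "QUERY_RESULT",
            "Id": str(uuid.uuid4()),
            "Page": page,
            "Confidence": 100.0,  # assumption: manual labels treated as certain
        }
        blocks.append(result)
        relationships.append({"Type": "ANSWER", "Ids": [result["Id"]]})
        query["Relationships"] = relationships

    result["Text"] = answer_text
    result["Geometry"] = geometry
    return result
```

You would load the pre-labeling file with `json.load`, call the helper once per query, and write the modified dict back out with `json.dump` as the annotation file.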

I recommend inspecting the annotation file and the pre-labeling file that the console created for a sample document. For convenience, you can use the sample annotations here: https://github.com/aws-samples/amazon-textract-code-samples/blob/master/python/custom-queries/samples/checks-annotations.zip.

Compare the ID: "9eb821b4-4c12-4be5-b521-b915ac7fef44" across both files:

  • Prelabel: /checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1
  • Annotations: /checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json
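
If it helps, the comparison can be done with a few lines of Python. This sketch assumes the ID above is the `Id` of an entry in each file's top-level `Blocks` list and that both files are plain JSON; the helper names are hypothetical.

```python
import json


def load_json(path):
    """Read one pre-labeling or annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_block(doc, block_id):
    """Return the block with the given Id, or None if it is absent."""
    return next((b for b in doc.get("Blocks", []) if b.get("Id") == block_id), None)


def diff_block(prelabel, annotation, block_id):
    """Map each differing field to a (prelabel value, annotation value) pair."""
    before = find_block(prelabel, block_id) or {}
    after = find_block(annotation, block_id) or {}
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# Example usage after unzipping the sample (paths taken from the list above):
# pre = load_json("checks-annotations/prelabels/93cffd3a4649a5bf6ed80e7895dd841b9397c11c2bf04a9d15604000aa0dc2a0/1")
# ann = load_json("checks-annotations/annotations/0cdc276f-7346-4fd4-ab32-5d7e375d5941_1.jpg.json")
# print(diff_block(pre, ann, "9eb821b4-4c12-4be5-b521-b915ac7fef44"))
```

The fields that show up in the diff are the ones the console changed when the answer was corrected, which tells you what your own annotation files need to set.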

Please mark this response as accepted if it answered your question.

Regards, Keith

keithm (AWS)
answered 2 months ago
EXPERT
reviewed a month ago