AWS Textract Adapters - How to get geometry for annotation files


I am trying to create my own browser-based console that I can use instead of the AWS Custom Queries console to create and manage Textract adapters. I've found this answer very helpful for generating the annotation files - in short, the annotation files are similar to the prelabel files, but with the correct text and geometry filled in for query results.

Filling in the correct text is easy, but how about the correct geometry (BoundingBox and in particular Polygon)? I can use DetectDocumentText to discern where the words and lines of text are on a page, but how do I use that information to get the geometry for a specific selection of words?

Matthew
asked 4 months ago, 193 views
2 Answers

Hello,


A bounding box (BoundingBox) has the following properties:

  • Height – The height of the bounding box as a ratio of the overall document page height.
  • Left – The X coordinate of the top-left point of the bounding box as a ratio of the overall document page width.
  • Top – The Y coordinate of the top-left point of the bounding box as a ratio of the overall document page height.
  • Width – The width of the bounding box as a ratio of the overall document page width.

Each BoundingBox property has a value between 0 and 1. The value is a ratio of the overall image width (applies to Left and Width) or height (applies to Height and Top). For example, if the input image is 700 x 200 pixels, and the top-left coordinate of the bounding box is (350,50) pixels, the API returns a Left value of 0.5 (350/700) and a Top value of 0.25 (50/200).
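The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name `bbox_to_pixels` and the `Width` and `Height` values in the example are assumptions, and the page size is the 700 x 200 pixel image from the example above.

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Convert a normalized BoundingBox dict (values 0..1) to pixel units."""
    return {
        "left": bbox["Left"] * page_width,     # X values scale with page width
        "top": bbox["Top"] * page_height,      # Y values scale with page height
        "width": bbox["Width"] * page_width,
        "height": bbox["Height"] * page_height,
    }


# Left and Top come from the example above; Width and Height are made up.
example_bbox = {"Left": 0.5, "Top": 0.25, "Width": 0.1, "Height": 0.2}
pixel_box = bbox_to_pixels(example_bbox, page_width=700, page_height=200)
print(pixel_box["left"], pixel_box["top"])
```

Going the other way (pixels back to ratios) is the same calculation with division instead of multiplication.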


The polygon returned by AnalyzeDocument is an array of Point objects. Each Point has an X and Y coordinate for a specific location on the document page. Like the BoundingBox coordinates, the polygon coordinates are normalized to the document width and height, and are between 0 and 1.

You can use points in the polygon array to display a finer-grain bounding box around a Block object. You calculate the position of each polygon point on the document page by using the same technique used for BoundingBoxes. Multiply the X coordinate by the document page width, and multiply the Y coordinate by the document page height.
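As a minimal sketch of that multiplication, assuming the polygon is the list of `{"X": ..., "Y": ...}` points found under a block's `Geometry`, the conversion could look like this. The helper name `polygon_to_pixels` and the sample points are hypothetical.

```python
def polygon_to_pixels(polygon, page_width, page_height):
    """Convert normalized polygon points to (x, y) pixel tuples."""
    return [(p["X"] * page_width, p["Y"] * page_height) for p in polygon]


# Hypothetical polygon for a slightly skewed word on a 700 x 200 pixel page.
sample_polygon = [
    {"X": 0.5, "Y": 0.25},
    {"X": 0.75, "Y": 0.25},
    {"X": 0.75, "Y": 0.5},
    {"X": 0.5, "Y": 0.5},
]
pixel_points = polygon_to_pixels(sample_polygon, page_width=700, page_height=200)
print(pixel_points)
```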


The following documentation shows how to use the BoundingBox and Polygon values:

This document [1] shows how to calculate the bounding box, and this document [2] shows how to use the polygon values.

For a full working example, please follow this guide [3]. It uploads an image for text detection, uses Python code to draw bounding boxes around the detected text, and then displays the image in a browser.

References:

[1]. https://docs.aws.amazon.com/textract/latest/dg/text-location.html#bounding-box

[2]. https://docs.aws.amazon.com/textract/latest/dg/text-location.html#polygon

[3]. https://docs.aws.amazon.com/textract/latest/dg/detecting-document-text.html

AWS
answered 4 months ago

Hi, you will need to compute the bounding box coordinates for the selection of words you are annotating, using the coordinates of the corresponding OCR words.
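One possible way to do this, sketched under assumptions: take the WORD blocks returned by DetectDocumentText for the selected words, compute the smallest axis-aligned box that encloses all of their BoundingBox values, and build a four-point Polygon from the corners of that box. The helper name `union_geometry` is hypothetical, and using an axis-aligned rectangle as the Polygon is an assumption that fits upright text; it is not confirmed as what the adapter annotation format requires, and rotated text may need a tighter polygon built from the words' own Polygon points.

```python
def union_geometry(word_blocks):
    """Merge the geometry of several WORD blocks into one BoundingBox and Polygon.

    Assumes each block has the shape {"Geometry": {"BoundingBox": {...}}}
    with normalized Left, Top, Width and Height values.
    """
    if not word_blocks:
        raise ValueError("at least one word block is required")

    boxes = [block["Geometry"]["BoundingBox"] for block in word_blocks]
    left = min(b["Left"] for b in boxes)
    top = min(b["Top"] for b in boxes)
    right = max(b["Left"] + b["Width"] for b in boxes)
    bottom = max(b["Top"] + b["Height"] for b in boxes)

    return {
        "BoundingBox": {
            "Left": left,
            "Top": top,
            "Width": right - left,
            "Height": bottom - top,
        },
        # Corners in clockwise order starting at the top-left (an assumption).
        "Polygon": [
            {"X": left, "Y": top},
            {"X": right, "Y": top},
            {"X": right, "Y": bottom},
            {"X": left, "Y": bottom},
        ],
    }


# Two made-up adjacent words on the same line.
selected_words = [
    {"Geometry": {"BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.125, "Height": 0.0625}}},
    {"Geometry": {"BoundingBox": {"Left": 0.5, "Top": 0.5, "Width": 0.25, "Height": 0.125}}},
]
merged = union_geometry(selected_words)
print(merged["BoundingBox"])
```

The merged values stay normalized (between 0 and 1), so they use the same coordinate system as the geometry Textract returns.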

AWS
answered 4 months ago
