Textract: Layout response objects are incorrect and not in the desired order

Question

Background: I am using the Textract AnalyzeDocument API to detect Layout response objects in a PDF page. The page has page headers, a title, sub-headers, tables, figures, and text. It is divided into 3 vertical columns, each containing some text and tables.
Challenge: I have 2 challenges:
1) With the Layout option of the AnalyzeDocument API, Textract correctly identifies about 90% of the response objects. Some sub-headers are identified as Text, and some are identified as part of a Table. How can I train the model to identify the response objects correctly?
2) The order in which the Layout response objects are returned is completely wrong. E.g., I want all the response objects of column 1 to be presented first, followed by those of column 2, and so on. Is there a way to train Textract to identify and print the objects from column 1 first, followed by column 2?

I am attaching some snippets to illustrate my challenges:

![Enter image description here](/media/postImages/original/IMMbp43guwQ3ilAHknjrk__w)
![Enter image description here](/media/postImages/original/IMB2R-edgRQ12WzntNigz0rA)

Accepted Answer

Using the bounding boxes returned with each block might be helpful for reordering the results. You should try the [Textractor package](https://aws-samples.github.io/amazon-textract-textractor/index.html) ([amazon-textract-overlayer](https://pypi.org/project/amazon-textract-overlayer/)).
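
As a rough illustration of the bounding-box idea, here is a minimal sketch that re-sorts Layout blocks column by column using only the `Geometry.BoundingBox` values from the AnalyzeDocument response. It does not use the Textractor package. The function name `order_layout_by_column`, the `num_columns` parameter, and the sample blocks are made up for illustration, and the sketch assumes the page is split into equal-width columns, which may not match your document.

```python
def order_layout_by_column(blocks, num_columns=3):
    """Return LAYOUT_* blocks sorted column by column, then top to bottom.

    Assumption: columns are of equal width across the page. BoundingBox
    values are ratios of the page size (0.0 to 1.0), as Textract returns them.
    """
    layout = [b for b in blocks if b.get("BlockType", "").startswith("LAYOUT_")]

    def sort_key(block):
        box = block["Geometry"]["BoundingBox"]
        center_x = box["Left"] + box["Width"] / 2
        # Map the horizontal center to a column index; clamp at the last column.
        column = min(int(center_x * num_columns), num_columns - 1)
        return (column, box["Top"])

    return sorted(layout, key=sort_key)


def _block(block_id, block_type, left, top, width=0.28, height=0.2):
    # Hypothetical helper that builds a block in the shape of a Textract response.
    return {
        "Id": block_id,
        "BlockType": block_type,
        "Geometry": {
            "BoundingBox": {"Left": left, "Top": top, "Width": width, "Height": height}
        },
    }


# Hand-written sample data, not real Textract output.
sample_blocks = [
    _block("c2-text", "LAYOUT_TEXT", left=0.36, top=0.10),
    _block("c1-table", "LAYOUT_TABLE", left=0.03, top=0.40),
    _block("c3-text", "LAYOUT_TEXT", left=0.69, top=0.10),
    _block("c1-header", "LAYOUT_SECTION_HEADER", left=0.03, top=0.10),
    _block("a-line", "LINE", left=0.03, top=0.12),  # non-Layout blocks are skipped
]

ordered_ids = [b["Id"] for b in order_layout_by_column(sample_blocks)]
print(ordered_ids)
```

Elements that span the full page width, such as a page header or title, would land in whichever column contains their horizontal center, so you may want to handle those separately (for example by checking `Width`) before sorting the rest.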
