Textract multiple answers missing geometry

Hello, I would like to know whether anything has changed in the way Textract returns answers. If I ask "What is the title of this doc?" and restrict the query to page 1, I get an answer with text and coordinates. However, for 'interpreted' answers, e.g. "What are the standards of this doc?" with the same page 1 lookup, the geometry comes back as None:

query is TBlock(geometry=None, id='d1a1bac6-8c00-4b8b-91ef-72ff7d3398d9', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards'))

rels is TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3']) [TBlock(geometry=None, id='d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None)]

I have a fairly large chunk of code that depends on the coordinates, and for five months straight I had no issues. I also checked with the other Textract-related libraries set to their old versions and tested on old git branches.

So, is this a new way for Textract to answer questions?

Please and thank you!

  • Were you seeing a bounding box on interpreted answers previously with the same document?

  • To be honest, I inherited a tiny piece of code, grew it from there, and didn't have to look into it because everything was running smoothly. So I assume there was geometry before, as it didn't crash at this step in the app.

  • I use the polygon coordinates. Here is what I get from Textract without polygon and geometry, which is where it now fails: TBlock(geometry=None, id='6e5deb40-4c90-47e7-b99d-933ac8c73231', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None), TBlock(geometry=None, id='d84596f0-3e59-4279-b907-f8f39a3b49dd', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['6e5deb40-4c90-47e7-b99d-933ac8c73231'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards')),

    My result has no TPoints with coordinates. Maybe this helps.

  • Answer from Textract with coordinates: TBlock(geometry=TGeometry(bounding_box=TBoundingBox(width=0.061864323914051056, height=0.010403391905128956, left=0.5233812928199768, top=0.3567923903465271), polygon=[TPoint(x=0.5233926177024841, y=0.3567923903465271), TPoint(x=0.5852456092834473, y=0.35685184597969055), TPoint(x=0.585235059261322, y=0.3671957850456238), TPoint(x=0.5233812928199768, y=0.367136150598526)]), id='d7fe92f2-c1d0-4298-857a-77cfd5d95c8e', block_type='QUERY_RESULT', relationships=None, confidence=94.0, text='803.28 kg', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None), TBlock(geometry=None, id='2aaa4f9f-6f0e-4ba0-8973-6a4462ca9bce', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d7fe92f2-c1d0-4298-857a-77cfd5d95c8e'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what is the net shipping weight?', alias='tc_net_shipping_weight')),

anyaovi
asked a year ago, 314 views
1 Answer

On May 15, 2023, the Queries feature of Amazon Textract's AnalyzeDocument API received an update that improved the quality of its machine-learning models. The update reduced latency when using AnalyzeDocument with Queries and improved data extraction accuracy for 14 new document types. To take advantage of these improvements, please make sure you have updated your AWS CLI/SDK to the latest version.

If the issue persists, I suggest opening a case with AWS Premium Support. Their team has access to internal tools that can help identify and resolve the root cause of the issue.
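In the meantime, code that depends on coordinates could treat geometry as optional, so that an answer without a bounding box does not crash the pipeline. The following is a minimal sketch in Python, assuming you work with the raw AnalyzeDocument JSON response (a dict with a `Blocks` list) rather than the `TBlock` objects shown in the question. The function name `query_answers` and the shape of the returned dicts are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List


def query_answers(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each QUERY block with its QUERY_RESULT blocks.

    Hypothetical helper: the polygon is returned as None when the
    answer block carries no Geometry, instead of raising an error.
    """
    blocks = response.get("Blocks", [])
    blocks_by_id = {block["Id"]: block for block in blocks}
    results = []
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        # Relationships can be missing when a query has no answer.
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for answer_id in rel.get("Ids", []):
                answer = blocks_by_id.get(answer_id)
                if answer is None:
                    continue
                # Geometry may be absent or None on some answers.
                geometry = answer.get("Geometry") or {}
                polygon = geometry.get("Polygon")
                results.append({
                    "alias": query.get("Alias"),
                    "text": answer.get("Text"),
                    "confidence": answer.get("Confidence"),
                    "polygon": [(p["X"], p["Y"]) for p in polygon] if polygon else None,
                })
    return results
```

Callers can then check whether `polygon` is None and skip or log such answers, rather than failing at the step that reads the coordinates.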

AWS
answered a year ago
