Textract returning fields irrelevant to the block type


I'm seeing some odd behaviour with Textract when performing StartDocumentAnalysis operations, specifically with the TABLES feature selected.

In the result JSON, every field is returned, with null values for the fields that are irrelevant to the block type. I've never seen this before, and it causes the amazon-textract-response-parser to break.

For example:

{
  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "ColumnIndex": null,
      "ColumnSpan": null,
      "Confidence": null,
      "EntityTypes": null,
      "Geometry": {
        "BoundingBox": {
          "Height": 1.0,
          "Left": 0.0,
          "Top": 0.0,
          "Width": 0.9997645020484924
        },
...

Where previously this would have been more like:

  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "Geometry": {
        "BoundingBox": {
          "Height": 1.0,
          "Left": 0.0,
          "Top": 0.0,
          "Width": 0.9997645020484924
        },
...

What's going on here?

Edited to add: this is the eu-west-2 region.

Asked 2 years ago, viewed 301 times
3 Answers

Thank you for using AWS Textract. Sorry to hear that you are seeing a discrepancy in the response when using the TABLES feature. Can you please tell us which region you are operating in? Thanks!

AWS
Answered 2 years ago

To update this with some more information: the null values are returned when a StartDocumentAnalysis call is submitted from a Lambda function with the output configuration set to write the JSON to an S3 bucket. Running GetDocumentAnalysis for the same job ID gives the correct JSON output.

I would expect to get the same output for the same job ID.
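
Until the two outputs match, one possible stopgap is to strip the null-valued fields from the S3 JSON before handing it to the parser. The following is only a minimal sketch under that assumption: `strip_nulls` is a hypothetical helper (not part of Textract or amazon-textract-response-parser), the sample is a trimmed copy of the block shown in the question, and it has not been confirmed that this is enough for every block type.

```python
import json


def strip_nulls(value):
    """Recursively drop dict keys whose value is None; leave other values as-is."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(item) for item in value]
    return value


# Trimmed sample shaped like the S3 output shown in the question.
raw = """
{
  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "ColumnIndex": null,
      "ColumnSpan": null,
      "Confidence": null,
      "EntityTypes": null,
      "Geometry": {
        "BoundingBox": {
          "Height": 1.0,
          "Left": 0.0,
          "Top": 0.0,
          "Width": 0.9997645020484924
        }
      }
    }
  ]
}
"""

cleaned = strip_nulls(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```

The cleaned dictionary keeps only the keys that carry a value, which is the shape GetDocumentAnalysis returns for the same block in the question.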

Answered 2 years ago

Hello, did you find a solution to this problem?

Rik
Answered 5 months ago
