Textract returning fields irrelevant to the block type


I'm seeing some odd behaviour with Textract when performing StartDocumentAnalysis operations, specifically with the TABLES feature selected.

In the result JSON, every field is returned with a null value wherever it is irrelevant to the block type. I've never seen this before, and it causes the amazon-textract-response-parser to break.

For example:

{
  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "ColumnIndex": null,
      "ColumnSpan": null,
      "Confidence": null,
      "EntityTypes": null,
      "Geometry": {
        "BoundingBox": {
          "Height": 1.0,
          "Left": 0.0,
          "Top": 0.0,
          "Width": 0.9997645020484924
        },
...

Where previously this would have been more like:

{
  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "Geometry": {
        "BoundingBox": {
          "Height": 1.0,
          "Left": 0.0,
          "Top": 0.0,
          "Width": 0.9997645020484924
        },
...

What's going on here?

Edited to add: this is the eu-west-2 region.

Asked 2 years ago · 301 views
3 Answers

Thank you for using AWS Textract. Sorry to hear that you are seeing a discrepancy in the response when using the TABLES feature. Can you please tell us which region you are operating in? Thanks!

AWS
Answered 2 years ago

To update this with some more information: the null values are returned when a StartDocumentAnalysis call is submitted from a Lambda function with the output configuration set to write the JSON to an S3 bucket. Running GetDocumentAnalysis for the same job ID gives the correct JSON output.

I would expect to get the same output for the same job ID.
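
As a possible stopgap until the two outputs match, the null-valued keys can be removed from the S3 JSON before it is handed to a parser. This is only a minimal sketch: `strip_nulls` is a hypothetical helper name, the sample document is cut down from the example in the question, and it assumes that dropping every null-valued key is enough to make the S3 output look like the GetDocumentAnalysis output. I have not confirmed that amazon-textract-response-parser accepts the result.

```python
import json


def strip_nulls(node):
    """Recursively drop dict keys whose value is None (JSON null).

    Hypothetical helper, not part of any AWS library.
    """
    if isinstance(node, dict):
        return {k: strip_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [strip_nulls(item) for item in node]
    return node


# Cut-down sample shaped like the S3 output shown in the question.
raw = """
{
  "AnalyzeDocumentModelVersion": "1.0",
  "Blocks": [
    {
      "BlockType": "PAGE",
      "ColumnIndex": null,
      "ColumnSpan": null,
      "Confidence": null,
      "EntityTypes": null,
      "Geometry": {
        "BoundingBox": {"Height": 1.0, "Left": 0.0, "Top": 0.0, "Width": 0.9997645020484924}
      }
    }
  ]
}
"""

cleaned = strip_nulls(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```

In a real job the JSON would be read from the S3 output objects rather than from a string, and `cleaned` would be what gets passed on to the parser.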

Answered 2 years ago

Hello, did you find a solution to this problem?

Rik
Answered 5 months ago
