AWS Comprehend Custom Entity Recognition Error: The augmented manifest referenced in your InputDataConfig.AugmentedManifests at index 0 doesn't have any annotations.


Hello

I followed this page https://docs.aws.amazon.com/comprehend/latest/dg/cer-annotation-pdf.html to prepare data for the training. All data are collected in the dedicated S3 bucket.

When I try to create a model, it fails with this error message: *"The augmented manifest referenced in your InputDataConfig.AugmentedManifests at index 0 doesn't have any annotations. Check the AttributeNames you provided for this Augmented Manifest. The AttributeNames in the API must match the top-level keys containing the annotation json or reference in json lines of the manifest file., exit code: 255"*

I do not understand why this happens, because I double-checked the manifest file and it looks OK. For example, a sample line looks like this (with a fake account ID):

    {
    "source-ref":"s3://comprehend-semi-structured-docs-us-east-1-11111111111/src/I9-001f.pdf",
    "page":"1",
    "metadata":{
        "pages":"3",
        "use-textract-only":false,
        "labels":[
            "LastName",
            "FirstName",
            "DocumentNumber"
        ]
    },
    "annotator-metadata":{
        "info":"I hope it will work",
        "Priority":"It has medium priority"
    },
    "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059":{
        "annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I9-001f-1-7a2216d3-ann.json"
    },
    "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059-metadata":{
        "type":"groundtruth/custom",
        "job-name":"first-job-for-jacek-and-matt-labeling-job-20230517t115059",
        "human-annotated":"yes",
        "creation-date":"2023-05-17T12:26:54.162000"
    }
    }

To create the model I use the AWS Console with the following paths:

  • SageMaker Ground Truth augmented manifest file S3 location: s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/manifests/output/output.manifest

  • S3 prefix for Annotation data files: s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/

  • S3 prefix for Source documents: s3://comprehend-semi-structured-docs-us-east-1-11111111111/src/

and the attribute name is first-job-for-Jacek-and-Matt-labeling-job-20230517T115059 - this attribute exists in the manifest file printed above.
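
For reference, here is a minimal sketch of how the console fields above might map to the `InputDataConfig` structure named in the error message. The field names follow my understanding of the Comprehend `CreateEntityRecognizer` API and should be checked against the API reference. The helper `build_input_data_config` is a hypothetical name, and the entity types are taken from the `labels` list in the manifest sample. The sketch only builds and prints the request body; it does not call AWS.

```python
import json


def build_input_data_config(manifest_uri, annotations_prefix, sources_prefix,
                            attribute_name, entity_types):
    """Build an InputDataConfig dict for a semi-structured augmented manifest.

    Hypothetical helper: the keys mirror the console fields, as assumed above.
    """
    return {
        "DataFormat": "AUGMENTED_MANIFEST",
        "EntityTypes": [{"Type": name} for name in entity_types],
        "AugmentedManifests": [
            {
                "S3Uri": manifest_uri,
                "Split": "TRAIN",
                # Must match a top-level key in each manifest line.
                "AttributeNames": [attribute_name],
                "AnnotationDataS3Uri": annotations_prefix,
                "SourceDocumentsS3Uri": sources_prefix,
                "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            }
        ],
    }


BUCKET = "s3://comprehend-semi-structured-docs-us-east-1-11111111111"
JOB = "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059"

config = build_input_data_config(
    manifest_uri=f"{BUCKET}/output/{JOB}/manifests/output/output.manifest",
    annotations_prefix=(
        f"{BUCKET}/output/{JOB}/annotations/consolidated-annotation/"
        "consolidation-response/iteration-1/annotations/"
    ),
    sources_prefix=f"{BUCKET}/src/",
    attribute_name=JOB,
    entity_types=["LastName", "FirstName", "DocumentNumber"],
)

print(json.dumps(config, indent=2))
```

With boto3, a dict like this would be passed as the `InputDataConfig` argument of `create_entity_recognizer`, together with a recognizer name, a data access role ARN, and a language code, none of which are shown in this post.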

Any ideas what I am doing wrong? I already spent a lot of time on this without success. Maybe the error message is misleading and the problem is somewhere else? Maybe this name is too long or has invalid chars?

-Jacek

  • I thought that maybe the attribute name should not contain the date-time suffix, but I watched this https://youtu.be/oDk5aOd400c?t=366 and they use the attribute as is, including the date-time suffix.

kicaj29
asked a year ago, 405 views
3 Answers

Hi Jacek

Could you please share the contents of the "annotation-ref" JSON file from the sample line you shared above? It is located at: "s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I9-001f-1-7a2216d3-ann.json"

AWS
SUPPORT ENGINEER
Njabs
answered a year ago

In the meantime I removed the previous files, so I am sharing new files with even less data. I still get the same error when I use these files to train a new model.

output.manifest

{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_003.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_003-1-1419c853-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:41:59.467000"}}
{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_002.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_002-1-5dc996bf-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:41:59.504000"}}
{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_001.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_001-1-5241949c-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:43:05.079000"}}
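
A quick local check of the manifest against the attribute name can be sketched as follows. It parses each JSON line and reports the lines where the attribute is missing as a top-level key or has no `annotation-ref`. The function name `find_missing_annotations` is hypothetical, and the sample record is an abridged copy of the first manifest line above. The comparison is an exact string match, which is an assumption about how the service matches names.

```python
import json


def find_missing_annotations(manifest_lines, attribute_name):
    """Return 1-based line numbers lacking attribute_name -> annotation-ref."""
    missing = []
    for number, line in enumerate(manifest_lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        value = record.get(attribute_name)
        if not isinstance(value, dict) or "annotation-ref" not in value:
            missing.append(number)
    return missing


PREFIX = "s3://comprehend-semi-structured-docs-us-east-1-111111111111"
ATTRIBUTE = "job-for-labeling-02-labeling-job-20230518T063720"

# Abridged copy of the first line of output.manifest.
sample_line = json.dumps({
    "source-ref": f"{PREFIX}/src_02/I-9_003.pdf",
    "page": "1",
    ATTRIBUTE: {
        "annotation-ref": (
            f"{PREFIX}/output/{ATTRIBUTE}/annotations/consolidated-annotation/"
            "consolidation-response/iteration-1/annotations/"
            "I-9_003-1-1419c853-ann.json"
        )
    },
    ATTRIBUTE + "-metadata": {"job-name": ATTRIBUTE.lower()},
})

print(find_missing_annotations([sample_line], ATTRIBUTE))          # []
print(find_missing_annotations([sample_line], ATTRIBUTE.lower()))  # [1]
```

The second call uses the lowercased `job-name` value from the metadata block. Under this exact-match check it does not match the top-level key, which has an uppercase `T`.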

I-9_003-1-1419c853-ann.json

{
	"Blocks": [
		{
			"BlockType": "LINE",
			"Id": "6f1f5d4b-c5bd-4474-9bfd-1a4aab73dead",
			"Text": "Employment Eligibility Verification USCIS",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.578890522875817,
					"Top": 0.03134469696969697,
					"Left": 0.32302287581699348,
					"Height": 0.016321969696969697
				},
				"Polygon": [
					{
						"X": 0.32302287581699348,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.32302287581699348,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"40893cf7-2e45-4e10-802a-c7c8b96dbe6b",
						"1516182f-5b28-4dc3-904e-8411f42be966",
						"96fca50a-f19f-40f0-8124-fcfacb999594",
						"2fb46811-30e5-4b48-9e69-e75946dacb2a"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "40893cf7-2e45-4e10-802a-c7c8b96dbe6b",
			"Text": "Employment",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.10801960784313726,
					"Top": 0.03251515151515151,
					"Left": 0.32302287581699348,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.32302287581699348,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.4310424836601307,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.4310424836601307,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.32302287581699348,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "1516182f-5b28-4dc3-904e-8411f42be966",
			"Text": "Eligibility",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.08268627450980393,
					"Top": 0.03251515151515151,
					"Left": 0.43594444444444449,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.43594444444444449,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.5186307189542484,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.5186307189542484,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.43594444444444449,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "96fca50a-f19f-40f0-8124-fcfacb999594",
			"Text": "Verification",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.10011764705882354,
					"Top": 0.03251515151515151,
					"Left": 0.5232777777777777,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.5232777777777777,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.6233954248366013,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.6233954248366013,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.5232777777777777,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "2fb46811-30e5-4b48-9e69-e75946dacb2a",
			"Text": "USCIS",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.05292647058823529,
					"Top": 0.03134469696969697,
					"Left": 0.8489869281045752,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8489869281045752,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.045284090909090909
					},
					{
						"X": 0.8489869281045752,
						"Y": 0.045284090909090909
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "89f02916-fe15-4221-b583-308e502dd7ed",
			"Text": "Form I-9",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.0694117647058825,
					"Top": 0.04801136363636364,
					"Left": 0.8409477124183006,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8409477124183006,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.8409477124183006,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"8c93cfde-2682-4a58-a72a-2956699f5f35",
						"0aa40458-88cf-4244-944f-c2edf2bd12af"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "8c93cfde-2682-4a58-a72a-2956699f5f35",
			"Text": "Form",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.04307843137254902,
					"Top": 0.04801136363636364,
					"Left": 0.8409477124183006,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8409477124183006,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.8840261437908497,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.8840261437908497,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.8409477124183006,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "0aa40458-88cf-4244-944f-c2edf2bd12af",
			"Text": "I-9",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.02215686274509804,
					"Top": 0.04801136363636364,
					"Left": 0.888202614379085,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.888202614379085,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.888202614379085,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "a676e7a0-71e5-4b6c-9ecb-30f0ac39a4aa",
			"Text": "Department of Homeland Security",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.2651764705882353,
					"Top": 0.05422348484848485,
					"Left": 0.3379248366013072,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.3379248366013072,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.3379248366013072,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"34c1d0c0-b7b2-4b84-a6da-a0306eb18d54",
						"adecd3fc-d54e-423d-80f4-5115fd76801f",
						"c7135e9e-d61a-49ee-809a-5e167005a3fc",
						"7658976a-32a0-4674-b257-97a6d83b2a8d"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "34c1d0c0-b7b2-4b84-a6da-a0306eb18d54",
			"Text": "Department",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.09284803921568628,
					"Top": 0.05422348484848485,
					"Left": 0.3379248366013072,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.3379248366013072,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4307728758169935,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4307728758169935,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.3379248366013072,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "adecd3fc-d54e-423d-80f4-5115fd76801f",
			"Text": "of",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.015026143790849673,
					"Top": 0.05422348484848485,
					"Left": 0.43533660130718956,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.43533660130718956,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4503627450980392,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4503627450980392,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.43533660130718956,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "c7135e9e-d61a-49ee-809a-5e167005a3fc",
			"Text": "Homeland",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.08002124183006536,
					"Top": 0.05422348484848485,
					"Left": 0.45472875816993466,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.45472875816993466,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.53475,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.53475,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.45472875816993466,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "7658976a-32a0-4674-b257-97a6d83b2a8d",
			"Text": "Security",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.06409313725490197,
					"Top": 0.05422348484848485,
					"Left": 0.5390081699346405,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.5390081699346405,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.5390081699346405,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "932b6392-79b7-4ebc-8c2c-308c27776c3a",
			"Text": "OMB No. 1615-0047",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.11224836601307187,
					"Top": 0.06476893939393939,
					"Left": 0.8193790849673203,
					"Height": 0.010151515151515148
				},
				"Polygon": [
					{
						"X": 0.8193790849673203,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8193790849673203,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"2ae600f4-2715-4c8f-af20-915ef9fcf1a8",
						"a9f4d9a4-6ffe-4f8e-b2ab-6459350f7ce1",
						"73f669bf-ecc4-4b75-9e7f-6fcf8fa030d8"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "2ae600f4-2715-4c8f-af20-915ef9fcf1a8",
			"Text": "OMB",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.029926470588235299,
					"Top": 0.06476893939393939,
					"Left": 0.8193790849673203,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8193790849673203,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8493055555555555,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8493055555555555,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8193790849673203,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "a9f4d9a4-6ffe-4f8e-b2ab-6459350f7ce1",
			"Text": "No.",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.019142156862745099,
					"Top": 0.06476893939393939,
					"Left": 0.8526813725490197,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8526813725490197,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8718235294117648,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8718235294117648,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8526813725490197,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "73f669bf-ecc4-4b75-9e7f-6fcf8fa030d8",
			"Text": "1615-0047",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.056625816993464059,
					"Top": 0.06476893939393939,
					"Left": 0.8750016339869281,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8750016339869281,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8750016339869281,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "399fbd18-1ee5-47bc-ae34-ca318957f363",
			"Text": "U.S. Citizenship and Immigration Services",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.3084346405228758,
					"Top": 0.07104166666666667,
					"Le
kicaj29
answered a year ago

I see that the content of the file I-9_003-1-1419c853-ann.json was cropped, so here are some lines from the bottom of the file.

			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "1a6b5a92-0ded-489c-a299-dd883b778179",
			"Text": "3",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.007352941176470588,
					"Top": 0.9692727272727273,
					"Left": 0.9340653594771242,
					"Height": 0.011363636363636364
				},
				"Polygon": [
					{
						"X": 0.9340653594771242,
						"Y": 0.9692727272727273
					},
					{
						"X": 0.9414183006535948,
						"Y": 0.9692727272727273
					},
					{
						"X": 0.9414183006535948,
						"Y": 0.9806363636363636
					},
					{
						"X": 0.9340653594771242,
						"Y": 0.9806363636363636
					}
				]
			},
			"Relationships": [],
			"Page": 1
		}
	],
	"BlocksS3Ref": "s3://comprehend-semi-structured-docs-us-east-1-111111111111/comprehend-semi-structured-docs-intermediate-output/job-for-labeling-02-labeling-job-20230518t063720/I-9_003_1_blocks.json",
	"DocumentMetadata": {
		"Pages": "1",
		"PageNumber": "1"
	},
	"Version": "2021-04-30",
	"DocumentType": "NativePDF",
	"Entities": [
		{
			"BlockReferences": [
				{
					"BlockId": "3e4881f0-fa14-4a72-b31f-b3323817a36f",
					"ChildBlocks": [
						{
							"BeginOffset": 0,
							"EndOffset": 35,
							"ChildBlockId": "94572034-c5d3-4b33-beec-9485d5ddcbeb"
						}
					],
					"BeginOffset": 0,
					"EndOffset": 35
				}
			],
			"Text": "Wolfeschlegelsteinhausenbergerdorff",
			"Type": "LastName",
			"Score": 1
		},
		{
			"BlockReferences": [
				{
					"BlockId": "3e4881f0-fa14-4a72-b31f-b3323817a36f",
					"ChildBlocks": [
						{
							"BeginOffset": 0,
							"EndOffset": 6,
							"ChildBlockId": "025da007-2c08-4cc7-9b47-ee08e99b9a52"
						}
					],
					"BeginOffset": 36,
					"EndOffset": 42
				}
			],
			"Text": "Hubert",
			"Type": "FirstName",
			"Score": 1
		}
	],
	"File": "I-9_003-1-1419c853-ann.json"
}
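
A structural sanity check of an annotation file like the one above could look like this sketch. It verifies that `Entities` is non-empty and that every `BlockId` and `ChildBlockId` referenced by an entity exists in `Blocks`. Whether Comprehend applies these same checks is an assumption. The function name `check_annotation` and the block IDs in the sample documents are hypothetical.

```python
def check_annotation(annotation):
    """Return a list of problems found in one parsed annotation document."""
    problems = []
    entities = annotation.get("Entities", [])
    if not entities:
        problems.append("no Entities")
    block_ids = {block["Id"] for block in annotation.get("Blocks", [])}
    for entity in entities:
        for ref in entity.get("BlockReferences", []):
            if ref["BlockId"] not in block_ids:
                problems.append(
                    f"{entity['Type']}: unknown BlockId {ref['BlockId']}"
                )
            for child in ref.get("ChildBlocks", []):
                if child["ChildBlockId"] not in block_ids:
                    problems.append(
                        f"{entity['Type']}: unknown ChildBlockId "
                        f"{child['ChildBlockId']}"
                    )
    return problems


# Hypothetical miniature documents; the IDs are made up for illustration.
good = {
    "Blocks": [{"Id": "line-1"}, {"Id": "word-1"}],
    "Entities": [{
        "Type": "LastName",
        "BlockReferences": [{
            "BlockId": "line-1",
            "ChildBlocks": [{"ChildBlockId": "word-1"}],
        }],
    }],
}
empty = {"Blocks": [{"Id": "line-1"}], "Entities": []}

print(check_annotation(good))   # []
print(check_annotation(empty))  # ['no Entities']
```

To run it on a real file, load the JSON with `json.load` and pass the resulting dict to `check_annotation`.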
kicaj29
answered a year ago
