AWS Comprehend Custom Entity Recognition Error: The augmented manifest referenced in your InputDataConfig.AugmentedManifests at index 0 doesn't have any annotations.

Hello

I followed this page https://docs.aws.amazon.com/comprehend/latest/dg/cer-annotation-pdf.html to prepare data for the training. All data are collected in the dedicated S3 bucket.

When I try to create a model, it fails with this error message: *"The augmented manifest referenced in your InputDataConfig.AugmentedManifests at index 0 doesn't have any annotations.Check the AttributeNames you provided for this Augmented Manifest. The AttributeNames in the API must match the top-level keys containing the annotation json or reference in json lines of the manifest file. , exit code: 255"*

I do not understand why this happens, because I double-checked the manifest file and it looks OK. For example, a sample line looks like this (I used a fake account ID here):

    {
    "source-ref":"s3://comprehend-semi-structured-docs-us-east-1-11111111111/src/I9-001f.pdf",
    "page":"1",
    "metadata":{
        "pages":"3",
        "use-textract-only":false,
        "labels":[
            "LastName",
            "FirstName",
            "DocumentNumber"
        ]
    },
    "annotator-metadata":{
        "info":"I hope it will work",
        "Priority":"It has medium priority"
    },
    "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059":{
        "annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I9-001f-1-7a2216d3-ann.json"
    },
    "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059-metadata":{
        "type":"groundtruth/custom",
        "job-name":"first-job-for-jacek-and-matt-labeling-job-20230517t115059",
        "human-annotated":"yes",
        "creation-date":"2023-05-17T12:26:54.162000"
    }
    }

For model creation I use AWS Console with the following paths:

  • SageMaker Ground Truth augmented manifest file S3 location s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/manifests/output/output.manifest

  • S3 prefix for Annotation data files s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/

  • S3 prefix for Source documents s3://comprehend-semi-structured-docs-us-east-1-11111111111/src/

and the attribute name is first-job-for-Jacek-and-Matt-labeling-job-20230517T115059 - this attribute exists in the manifest file printed above.
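
For comparison with the console fields above, here is a minimal sketch of how the same values would map onto the `InputDataConfig` structure of the Comprehend `CreateEntityRecognizer` API. It only builds the dictionary and does not call AWS. The helper name `build_input_data_config` is hypothetical, and the `Split` value and label list are assumptions taken from the sample manifest line.

```python
# Sketch only: builds the request dictionary, it does not call AWS.
BUCKET = "s3://comprehend-semi-structured-docs-us-east-1-11111111111"
JOB = "first-job-for-Jacek-and-Matt-labeling-job-20230517T115059"


def build_input_data_config(bucket, job, labels):
    """Return an InputDataConfig dict for one semi-structured augmented manifest.

    Hypothetical helper: the S3 layout mirrors the paths entered in the console.
    """
    output = f"{bucket}/output/{job}"
    annotations = (
        f"{output}/annotations/consolidated-annotation/"
        "consolidation-response/iteration-1/annotations/"
    )
    return {
        "DataFormat": "AUGMENTED_MANIFEST",
        "EntityTypes": [{"Type": label} for label in labels],
        "AugmentedManifests": [
            {
                "S3Uri": f"{output}/manifests/output/output.manifest",
                "Split": "TRAIN",  # assumption: a single training manifest
                # Must equal the top-level key that holds "annotation-ref".
                "AttributeNames": [job],
                "AnnotationDataS3Uri": annotations,
                "SourceDocumentsS3Uri": f"{bucket}/src/",
                "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            }
        ],
    }


config = build_input_data_config(
    BUCKET, JOB, ["LastName", "FirstName", "DocumentNumber"]
)
print(config["AugmentedManifests"][0]["AttributeNames"])
```

With boto3, a dictionary like this would be passed as the `InputDataConfig` argument of `create_entity_recognizer`, together with `RecognizerName`, `DataAccessRoleArn` and `LanguageCode`. Creating the model through the API instead of the console can help rule out a console-side problem.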

Any ideas what I am doing wrong? I already spent a lot of time on this without success. Maybe the error message is misleading and the problem is somewhere else? Maybe this name is too long or has invalid chars?

-Jacek

  • I thought that maybe the attribute name should not contain the date-time suffix, but I watched https://youtu.be/oDk5aOd400c?t=366 and they use the attribute as it is, including the date-time suffix.

kicaj29
asked a year ago, 412 views
3 Answers

Hi Jacek

Could you please share the contents of the "annotation-ref" JSON file from the sample line you shared above? It is located at: "s3://comprehend-semi-structured-docs-us-east-1-11111111111/output/first-job-for-Jacek-and-Matt-labeling-job-20230517T115059/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I9-001f-1-7a2216d3-ann.json"

AWS
SUPPORT ENGINEER
Njabs
answered a year ago

In the meantime I removed the previous files, so I am sharing new files with even less data. I still get the same error when I use these files to train a new model.

output.manifest

{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_003.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_003-1-1419c853-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:41:59.467000"}}
{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_002.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_002-1-5dc996bf-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:41:59.504000"}}
{"source-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/src_02/I-9_001.pdf","page":"1","metadata":{"pages":"1","use-textract-only":false,"labels":["LastName","FirstName"]},"annotator-metadata":{"info":"I hope it will work","Priority":"It has medium priority"},"job-for-labeling-02-labeling-job-20230518T063720":{"annotation-ref":"s3://comprehend-semi-structured-docs-us-east-1-111111111111/output/job-for-labeling-02-labeling-job-20230518T063720/annotations/consolidated-annotation/consolidation-response/iteration-1/annotations/I-9_001-1-5241949c-ann.json"},"job-for-labeling-02-labeling-job-20230518T063720-metadata":{"type":"groundtruth/custom","job-name":"job-for-labeling-02-labeling-job-20230518t063720","human-annotated":"yes","creation-date":"2023-05-18T06:43:05.079000"}}
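
Since the error message points at the attribute name, one way to sanity-check manifest lines like these locally is sketched below. It is not an official validator: it only checks that each line is valid JSON, that the attribute name is a top-level key, and that the key holds an `annotation-ref`. The function name `find_manifest_problems` and the `example-bucket` URIs in the sample are hypothetical.

```python
import json


def find_manifest_problems(manifest_text, attribute_name):
    """Return a list of human-readable problems found in a JSON Lines manifest."""
    problems = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: not valid JSON ({exc.msg})")
            continue
        value = record.get(attribute_name)
        if value is None:
            problems.append(f"line {number}: missing top-level key {attribute_name!r}")
        elif not isinstance(value, dict) or "annotation-ref" not in value:
            problems.append(f"line {number}: {attribute_name!r} has no 'annotation-ref'")
    return problems


ATTRIBUTE = "job-for-labeling-02-labeling-job-20230518T063720"

# Shortened stand-in for one manifest line; the bucket name is a placeholder.
sample = json.dumps({
    "source-ref": "s3://example-bucket/src_02/I-9_003.pdf",
    "page": "1",
    ATTRIBUTE: {
        "annotation-ref": "s3://example-bucket/annotations/I-9_003-1-1419c853-ann.json"
    },
})

print(find_manifest_problems(sample, ATTRIBUTE))          # exact key
print(find_manifest_problems(sample, ATTRIBUTE.lower()))  # lowercased key
```

JSON keys are case-sensitive, so the second call reports a missing key. This matters here because the `job-name` field in the `-metadata` block is lowercased (`...20230518t063720`), while the top-level attribute key keeps the uppercase `T`.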

I-9_003-1-1419c853-ann.json

{
	"Blocks": [
		{
			"BlockType": "LINE",
			"Id": "6f1f5d4b-c5bd-4474-9bfd-1a4aab73dead",
			"Text": "Employment Eligibility Verification USCIS",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.578890522875817,
					"Top": 0.03134469696969697,
					"Left": 0.32302287581699348,
					"Height": 0.016321969696969697
				},
				"Polygon": [
					{
						"X": 0.32302287581699348,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.32302287581699348,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"40893cf7-2e45-4e10-802a-c7c8b96dbe6b",
						"1516182f-5b28-4dc3-904e-8411f42be966",
						"96fca50a-f19f-40f0-8124-fcfacb999594",
						"2fb46811-30e5-4b48-9e69-e75946dacb2a"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "40893cf7-2e45-4e10-802a-c7c8b96dbe6b",
			"Text": "Employment",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.10801960784313726,
					"Top": 0.03251515151515151,
					"Left": 0.32302287581699348,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.32302287581699348,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.4310424836601307,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.4310424836601307,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.32302287581699348,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "1516182f-5b28-4dc3-904e-8411f42be966",
			"Text": "Eligibility",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.08268627450980393,
					"Top": 0.03251515151515151,
					"Left": 0.43594444444444449,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.43594444444444449,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.5186307189542484,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.5186307189542484,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.43594444444444449,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "96fca50a-f19f-40f0-8124-fcfacb999594",
			"Text": "Verification",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.10011764705882354,
					"Top": 0.03251515151515151,
					"Left": 0.5232777777777777,
					"Height": 0.015151515151515152
				},
				"Polygon": [
					{
						"X": 0.5232777777777777,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.6233954248366013,
						"Y": 0.03251515151515151
					},
					{
						"X": 0.6233954248366013,
						"Y": 0.04766666666666666
					},
					{
						"X": 0.5232777777777777,
						"Y": 0.04766666666666666
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "2fb46811-30e5-4b48-9e69-e75946dacb2a",
			"Text": "USCIS",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.05292647058823529,
					"Top": 0.03134469696969697,
					"Left": 0.8489869281045752,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8489869281045752,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.03134469696969697
					},
					{
						"X": 0.9019133986928105,
						"Y": 0.045284090909090909
					},
					{
						"X": 0.8489869281045752,
						"Y": 0.045284090909090909
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "89f02916-fe15-4221-b583-308e502dd7ed",
			"Text": "Form I-9",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.0694117647058825,
					"Top": 0.04801136363636364,
					"Left": 0.8409477124183006,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8409477124183006,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.8409477124183006,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"8c93cfde-2682-4a58-a72a-2956699f5f35",
						"0aa40458-88cf-4244-944f-c2edf2bd12af"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "8c93cfde-2682-4a58-a72a-2956699f5f35",
			"Text": "Form",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.04307843137254902,
					"Top": 0.04801136363636364,
					"Left": 0.8409477124183006,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.8409477124183006,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.8840261437908497,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.8840261437908497,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.8409477124183006,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "0aa40458-88cf-4244-944f-c2edf2bd12af",
			"Text": "I-9",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.02215686274509804,
					"Top": 0.04801136363636364,
					"Left": 0.888202614379085,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.888202614379085,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.04801136363636364
					},
					{
						"X": 0.9103594771241831,
						"Y": 0.061950757575757579
					},
					{
						"X": 0.888202614379085,
						"Y": 0.061950757575757579
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "a676e7a0-71e5-4b6c-9ecb-30f0ac39a4aa",
			"Text": "Department of Homeland Security",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.2651764705882353,
					"Top": 0.05422348484848485,
					"Left": 0.3379248366013072,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.3379248366013072,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.3379248366013072,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"34c1d0c0-b7b2-4b84-a6da-a0306eb18d54",
						"adecd3fc-d54e-423d-80f4-5115fd76801f",
						"c7135e9e-d61a-49ee-809a-5e167005a3fc",
						"7658976a-32a0-4674-b257-97a6d83b2a8d"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "34c1d0c0-b7b2-4b84-a6da-a0306eb18d54",
			"Text": "Department",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.09284803921568628,
					"Top": 0.05422348484848485,
					"Left": 0.3379248366013072,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.3379248366013072,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4307728758169935,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4307728758169935,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.3379248366013072,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "adecd3fc-d54e-423d-80f4-5115fd76801f",
			"Text": "of",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.015026143790849673,
					"Top": 0.05422348484848485,
					"Left": 0.43533660130718956,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.43533660130718956,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4503627450980392,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.4503627450980392,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.43533660130718956,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "c7135e9e-d61a-49ee-809a-5e167005a3fc",
			"Text": "Homeland",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.08002124183006536,
					"Top": 0.05422348484848485,
					"Left": 0.45472875816993466,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.45472875816993466,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.53475,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.53475,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.45472875816993466,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "7658976a-32a0-4674-b257-97a6d83b2a8d",
			"Text": "Security",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.06409313725490197,
					"Top": 0.05422348484848485,
					"Left": 0.5390081699346405,
					"Height": 0.013939393939393939
				},
				"Polygon": [
					{
						"X": 0.5390081699346405,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.05422348484848485
					},
					{
						"X": 0.6031013071895425,
						"Y": 0.06816287878787879
					},
					{
						"X": 0.5390081699346405,
						"Y": 0.06816287878787879
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "932b6392-79b7-4ebc-8c2c-308c27776c3a",
			"Text": "OMB No. 1615-0047",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.11224836601307187,
					"Top": 0.06476893939393939,
					"Left": 0.8193790849673203,
					"Height": 0.010151515151515148
				},
				"Polygon": [
					{
						"X": 0.8193790849673203,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8193790849673203,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [
				{
					"Ids": [
						"2ae600f4-2715-4c8f-af20-915ef9fcf1a8",
						"a9f4d9a4-6ffe-4f8e-b2ab-6459350f7ce1",
						"73f669bf-ecc4-4b75-9e7f-6fcf8fa030d8"
					],
					"Type": "CHILD"
				}
			],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "2ae600f4-2715-4c8f-af20-915ef9fcf1a8",
			"Text": "OMB",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.029926470588235299,
					"Top": 0.06476893939393939,
					"Left": 0.8193790849673203,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8193790849673203,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8493055555555555,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8493055555555555,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8193790849673203,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "a9f4d9a4-6ffe-4f8e-b2ab-6459350f7ce1",
			"Text": "No.",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.019142156862745099,
					"Top": 0.06476893939393939,
					"Left": 0.8526813725490197,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8526813725490197,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8718235294117648,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.8718235294117648,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8526813725490197,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "73f669bf-ecc4-4b75-9e7f-6fcf8fa030d8",
			"Text": "1615-0047",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.056625816993464059,
					"Top": 0.06476893939393939,
					"Left": 0.8750016339869281,
					"Height": 0.010151515151515151
				},
				"Polygon": [
					{
						"X": 0.8750016339869281,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.06476893939393939
					},
					{
						"X": 0.9316274509803921,
						"Y": 0.07492045454545454
					},
					{
						"X": 0.8750016339869281,
						"Y": 0.07492045454545454
					}
				]
			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "LINE",
			"Id": "399fbd18-1ee5-47bc-ae34-ca318957f363",
			"Text": "U.S. Citizenship and Immigration Services",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.3084346405228758,
					"Top": 0.07104166666666667,
					"Le
kicaj29
answered a year ago

I see that the content of the file I-9_003-1-1419c853-ann.json has been cropped, so I am adding some lines from the bottom of the file here.

			},
			"Relationships": [],
			"Page": 1
		},
		{
			"BlockType": "WORD",
			"Id": "1a6b5a92-0ded-489c-a299-dd883b778179",
			"Text": "3",
			"Geometry": {
				"BoundingBox": {
					"Width": 0.007352941176470588,
					"Top": 0.9692727272727273,
					"Left": 0.9340653594771242,
					"Height": 0.011363636363636364
				},
				"Polygon": [
					{
						"X": 0.9340653594771242,
						"Y": 0.9692727272727273
					},
					{
						"X": 0.9414183006535948,
						"Y": 0.9692727272727273
					},
					{
						"X": 0.9414183006535948,
						"Y": 0.9806363636363636
					},
					{
						"X": 0.9340653594771242,
						"Y": 0.9806363636363636
					}
				]
			},
			"Relationships": [],
			"Page": 1
		}
	],
	"BlocksS3Ref": "s3://comprehend-semi-structured-docs-us-east-1-111111111111/comprehend-semi-structured-docs-intermediate-output/job-for-labeling-02-labeling-job-20230518t063720/I-9_003_1_blocks.json",
	"DocumentMetadata": {
		"Pages": "1",
		"PageNumber": "1"
	},
	"Version": "2021-04-30",
	"DocumentType": "NativePDF",
	"Entities": [
		{
			"BlockReferences": [
				{
					"BlockId": "3e4881f0-fa14-4a72-b31f-b3323817a36f",
					"ChildBlocks": [
						{
							"BeginOffset": 0,
							"EndOffset": 35,
							"ChildBlockId": "94572034-c5d3-4b33-beec-9485d5ddcbeb"
						}
					],
					"BeginOffset": 0,
					"EndOffset": 35
				}
			],
			"Text": "Wolfeschlegelsteinhausenbergerdorff",
			"Type": "LastName",
			"Score": 1
		},
		{
			"BlockReferences": [
				{
					"BlockId": "3e4881f0-fa14-4a72-b31f-b3323817a36f",
					"ChildBlocks": [
						{
							"BeginOffset": 0,
							"EndOffset": 6,
							"ChildBlockId": "025da007-2c08-4cc7-9b47-ee08e99b9a52"
						}
					],
					"BeginOffset": 36,
					"EndOffset": 42
				}
			],
			"Text": "Hubert",
			"Type": "FirstName",
			"Score": 1
		}
	],
	"File": "I-9_003-1-1419c853-ann.json"
}
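
A rough consistency check for an annotation file of this shape is sketched below. It assumes, without confirmation from the Comprehend documentation, that every `BlockId` under `Entities` should match an `Id` in `Blocks` of the same file, and that every entity `Type` should appear in the `labels` list of the manifest metadata. The function name `check_annotation` and the IDs in the sample are hypothetical.

```python
def check_annotation(annotation, allowed_labels):
    """Return a list of problems found in a parsed annotation JSON document."""
    problems = []
    block_ids = {block["Id"] for block in annotation.get("Blocks", [])}
    entities = annotation.get("Entities", [])
    if not entities:
        problems.append("no entries under 'Entities'")
    for index, entity in enumerate(entities):
        entity_type = entity.get("Type")
        if entity_type not in allowed_labels:
            problems.append(f"entity {index}: type {entity_type!r} not in labels")
        for ref in entity.get("BlockReferences", []):
            block_id = ref.get("BlockId")
            if block_id not in block_ids:
                problems.append(
                    f"entity {index}: BlockId {block_id!r} not found in 'Blocks'"
                )
    return problems


# Tiny hypothetical document with the same keys as the file above.
sample_annotation = {
    "Blocks": [{"BlockType": "LINE", "Id": "block-1", "Text": "Hubert", "Page": 1}],
    "Entities": [
        {
            "BlockReferences": [{"BlockId": "block-1", "BeginOffset": 0, "EndOffset": 6}],
            "Text": "Hubert",
            "Type": "FirstName",
            "Score": 1,
        },
        {
            "BlockReferences": [{"BlockId": "block-2", "BeginOffset": 0, "EndOffset": 3}],
            "Text": "Doe",
            "Type": "Surname",
            "Score": 1,
        },
    ],
}

for problem in check_annotation(sample_annotation, ["LastName", "FirstName"]):
    print(problem)
```

To run it against a real file, load the downloaded `-ann.json` with `json.load` and pass the `labels` list from the manifest line as `allowed_labels`.
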
kicaj29
answered a year ago
