Textract does not read a bold, clear, computer-generated 'X'. Need suggestions on how to tackle this.


Use Case: We are extracting data from a table to convert it into a JSON file.

Issue: Despite the bold, clear lettering, Textract is unable to read the 'X' in the table cells. Please check the attachment for reference (the image as well as the JSON, where the data is missing). For example, the 'X' beside Demographics is not being captured.

Textract API Used : response = client.analyze_document(Document={'Bytes': bytes_test}, FeatureTypes=['TABLES'])

Remedies Tried: adjusting the image threshold, de-blurring, adjusting contrast, and cropping and magnifying the object in the page, all with no visible success.

I would greatly appreciate it if you could provide guidance, suggest best practices, or help troubleshoot the issues we are experiencing. Your expertise and assistance are crucial to the success of our project.

Asked 6 months ago, 242 views
1 Answer

Thank you for using Textract and bringing this issue to our notice. With machine learning models, we cannot guarantee 100% accuracy. We are continuously improving the accuracy of our models in response to our customer feedback.

For now, these Xs may be returned either as a selection element or as a word when they are detected. We recommend adding some post-processing logic to address this issue.
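As a rough illustration of what such post-processing could look like, here is a minimal sketch that walks the `Blocks` list of an `analyze_document` response and rebuilds each table, writing a mark for any `SELECTION_ELEMENT` whose `SelectionStatus` is `SELECTED` alongside ordinary `WORD` text. The function name `table_rows` and the `selected_mark` parameter are made up for this example, and it assumes the response contains `TABLE`, `CELL`, `WORD`, and `SELECTION_ELEMENT` blocks linked by `CHILD` relationships. It only helps when the X was detected in one of those two forms; it cannot recover a mark that Textract did not return at all.

```python
from collections import defaultdict


def table_rows(response, selected_mark="X"):
    """Rebuild tables from a Textract AnalyzeDocument response dict.

    Hypothetical helper: returns one list of rows per TABLE block,
    where each row is a list of cell strings ordered by column.
    """
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block):
        # Yield the Ids listed under every CHILD relationship of a block.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    def cell_text(cell):
        parts = []
        for cid in child_ids(cell):
            child = blocks.get(cid)
            if child is None:
                continue
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif (child["BlockType"] == "SELECTION_ELEMENT"
                  and child.get("SelectionStatus") == "SELECTED"):
                # Treat a selected element the same as a literal mark.
                parts.append(selected_mark)
        return " ".join(parts)

    tables = []
    for block in response.get("Blocks", []):
        if block["BlockType"] != "TABLE":
            continue
        rows = defaultdict(dict)
        for cid in child_ids(block):
            cell = blocks.get(cid)
            if cell and cell["BlockType"] == "CELL":
                rows[cell["RowIndex"]][cell["ColumnIndex"]] = cell_text(cell)
        tables.append(
            [[cols[c] for c in sorted(cols)] for _, cols in sorted(rows.items())]
        )
    return tables
```

It would be called on the response from the question, for example `table_rows(response)`, and the resulting nested lists can then be serialized with `json.dumps`. Cells that still come back empty where an X is expected can be flagged for manual review.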

AWS
Alan_L
Answered 6 months ago
