Document cleaning tools for AWS Textract

Hey, we're interested in using Amazon Textract to extract tables from our financial documents. However, we noticed that it falters when a document has watermarks. Does AWS offer document cleaning tools that work with Amazon Textract? Are there any AWS tools to remove watermarks or enhance document quality that we can apply before uploading documents to Textract?

Shivam
asked 7 months ago · 265 views
1 Answer

While AWS doesn't have any explicit tools for watermark removal or advanced document quality enhancement, there are ways to improve the documents you send to Textract. First, make sure you are using high-quality images (a resolution of at least 150 DPI). Second, use one of the supported formats: PDF, TIFF, JPEG or PNG. Third, try to avoid converting or downsampling the documents before uploading them to Amazon Textract.
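As a rough illustration of the checks above, here is a minimal pre-upload sketch. It is not an AWS tool: the function names `effective_dpi` and `preflight` are made up for this example, and it assumes you already know the scan's pixel dimensions and the physical page size (US Letter by default).

```python
from pathlib import Path

# Formats listed above as supported by Textract.
SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}
MIN_DPI = 150  # minimum resolution recommended above


def effective_dpi(width_px: int, height_px: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical scan resolutions."""
    return min(width_px / page_width_in, height_px / page_height_in)


def preflight(filename: str, width_px: int, height_px: int,
              page_width_in: float = 8.5,
              page_height_in: float = 11.0) -> list[str]:
    """Return a list of problems; an empty list means the file passes.

    The page size defaults to US Letter, which is an assumption of this
    sketch, not something Textract requires.
    """
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    if dpi < MIN_DPI:
        problems.append(f"resolution too low: {dpi:.0f} DPI < {MIN_DPI} DPI")
    return problems
```

For example, `preflight("statement.png", 1275, 1650)` returns an empty list, because a 1275 x 1650 pixel scan of a Letter page works out to exactly 150 DPI.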

For extracting text from tables in these documents, make sure the tables are clearly separated from surrounding content and that the text in them is upright rather than rotated. You may also run into challenges with merged cells that span multiple columns or with tables that have inconsistent cell structures. In these cases, consider using Textract's text detection as a workaround. For more information, check out the best practices for queries in Textract.
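To make the workaround concrete, here is a hedged sketch using boto3. `analyze_document` with `FeatureTypes=["TABLES"]` and `detect_document_text` are the real Textract calls; the helper names (`analyze_tables`, `detect_lines`, `tables_from_response`) are made up for this example. It assumes a single-page image passed as bytes and AWS credentials already configured, and it ignores merged cells entirely.

```python
def analyze_tables(image_bytes: bytes) -> dict:
    """Call Textract table analysis on a single-page document."""
    import boto3  # imported here so the parsing helper works without boto3
    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes}, FeatureTypes=["TABLES"]
    )


def detect_lines(image_bytes: bytes) -> list[str]:
    """Fallback: plain text detection, returned as a list of lines."""
    import boto3
    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def tables_from_response(response: dict) -> list[list[list[str]]]:
    """Rebuild each TABLE block as a grid (rows of cell strings)."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block: dict) -> list[str]:
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for i in rel["Ids"]]

    tables = []
    for block in blocks.values():
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = blocks[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [blocks[w]["Text"] for w in child_ids(cell)
                     if blocks[w]["BlockType"] == "WORD"]
            # RowIndex and ColumnIndex are 1-based in Textract responses.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables
```

One possible pattern is to call `tables_from_response(analyze_tables(data))` first and fall back to `detect_lines(data)` when it returns no tables or the grid looks wrong, then reconstruct the rows yourself from the detected lines.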

AWS
answered 7 months ago
