Textract: which font is best recognized?


Hi, I'm using Textract to read certain forms, and it mixes up visually similar characters: 1 & I, 0 & O, S & 5, Z & 2, i & 1, and so on.

Can someone clarify whether there are fonts in which Textract is less likely to mix up these characters? The Best Practices page for Textract (https://docs.aws.amazon.com/textract/latest/dg/textract-best-practices.html) does not provide such a recommendation.

Thanks in advance. Ed.

asked a month ago, 44 views
1 Answer

Amazon Textract is designed to recognize a wide variety of fonts and text styles, but no specific font guarantees perfect recognition for the character pairs you mentioned (1 & I, 0 & O, S & 5, Z & 2, i & 1). Textract uses machine learning models to detect and recognize text, and it can still struggle with characters that are visually similar.

To improve recognition accuracy, consider the following recommendations:

  1. Image Quality: Ensure that the documents or images you're processing are of high quality. Clear, well-lit, and properly focused images can significantly improve recognition accuracy.

  2. Font Choice: While there's no "best" font for Textract, using clear, sans-serif fonts with distinct character shapes can help. Avoid stylized or decorative fonts that might make character distinction more difficult.

  3. Character Spacing: Ensure adequate spacing between characters to help Textract distinguish them more easily.

  4. Document Layout: Make sure the text is well-separated from other elements on the document, such as lines, boxes, or images.

  5. Preprocessing: Before submitting images to Textract, you might consider applying image enhancement techniques to improve text clarity.

  6. Post-processing: Implement post-processing logic in your application to handle known character confusions. For example, you could use context or expected patterns to disambiguate between similar characters.

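  7. Queries: For specific data fields where you know what you are looking for (e.g., dates, ID numbers), you can use Textract's Queries feature to ask for that field in natural language instead of parsing all of the detected text. Queries target a field, not a character pattern, so it is still worth validating the returned value against the format you expect.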
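
As a rough illustration of the post-processing idea in item 6, here is a minimal Python sketch. It assumes you know the expected layout of a field and can describe it with a mask; the mask syntax (`D` for digit, `A` for letter), the mapping tables, and the function names `normalize` and `is_valid` are all hypothetical and not part of Textract.

```python
import re

# Hypothetical lookup tables covering the confusable pairs from the question.
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "i": "1", "S": "5", "Z": "2"})
TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "2": "Z"})


def normalize(text: str, mask: str) -> str:
    """Coerce each character to the class its mask position expects.

    Mask characters: 'D' = digit expected, 'A' = letter expected,
    anything else is a literal and the character is left untouched.
    """
    if len(text) != len(mask):
        raise ValueError(f"length mismatch: {text!r} vs mask {mask!r}")
    out = []
    for ch, kind in zip(text, mask):
        if kind == "D":
            out.append(ch.translate(TO_DIGIT))
        elif kind == "A":
            out.append(ch.translate(TO_LETTER))
        else:
            out.append(ch)
    return "".join(out)


def is_valid(text: str, mask: str) -> bool:
    """Check that the text matches the mask after normalization."""
    pattern = "".join(
        "[0-9]" if k == "D" else "[A-Za-z]" if k == "A" else re.escape(k)
        for k in mask
    )
    return re.fullmatch(pattern, text) is not None


# A field expected to look like two letters, a dash, and four digits:
print(normalize("AB-I2O5", "AA-DDDD"))  # AB-1205
```

This only works for fields with a fixed, known layout. Free text has no mask to check against, so there you would need dictionary or context-based checks instead.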

If you continue to experience issues with specific document types or formats, you may want to consider training an Amazon Textract adapter (Custom Queries). An adapter tailors the Queries feature to sample documents you provide, which could potentially improve extraction for your particular use case.
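
For reference, a query request (optionally with an adapter) might be assembled roughly like this. It is a sketch, not official sample code: the helper names `build_query_request` and `query_answers` are made up, and the request and response field names (`QueriesConfig`, `AdaptersConfig`, `QUERY` / `QUERY_RESULT` blocks) are written as I understand the AnalyzeDocument API, so please check them against the API reference before relying on them.

```python
def build_query_request(document_bytes, questions, adapter_id=None, adapter_version=None):
    """Build keyword arguments for Textract's AnalyzeDocument operation.

    questions: mapping of alias -> natural-language question.
    adapter_id / adapter_version: optional, only if you trained an adapter.
    """
    request = {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in questions.items()
            ]
        },
    }
    if adapter_id and adapter_version:
        request["AdaptersConfig"] = {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version}]
        }
    return request


def query_answers(response):
    """Map each query alias to (answer text, confidence) from a response dict."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias", block["Query"]["Text"])
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for result_id in rel["Ids"]:
                result = blocks[result_id]
                answers[alias] = (result["Text"], result["Confidence"])
    return answers


# Usage with boto3 (needs AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("textract")
# response = client.analyze_document(
#     **build_query_request(image_bytes, {"member_id": "What is the member ID?"})
# )
# print(query_answers(response))
```

The confidence returned with each answer is useful for deciding which values to send for manual review.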

Remember that while these strategies can help improve results, some level of error or confusion may still occur, especially with visually similar characters. It's important to implement appropriate error checking and validation in your application to handle these cases.
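
One possible form of that error checking is to use the per-word confidence score that Textract returns and route doubtful words to manual review. The sketch below assumes the usual response shape (a `Blocks` list whose `WORD` blocks carry `Text` and `Confidence`); the function name `words_needing_review` and the threshold of 90 are arbitrary choices for illustration, so tune the threshold on your own documents.

```python
# Characters from the confusable pairs mentioned in the question.
CONFUSABLE = set("1I0OS5Z2i")


def words_needing_review(response, min_confidence=90.0):
    """Return (text, confidence) for low-confidence words that contain
    at least one easily confused character."""
    flagged = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue
        text = block.get("Text", "")
        confidence = block.get("Confidence", 0.0)
        is_low = confidence < min_confidence
        is_risky = any(ch in CONFUSABLE for ch in text)
        if is_low and is_risky:
            flagged.append((text, confidence))
    return flagged


# Example with a hand-written response fragment (not real Textract output):
sample = {
    "Blocks": [
        {"BlockType": "WORD", "Text": "INV0ICE", "Confidence": 71.2},
        {"BlockType": "WORD", "Text": "Total", "Confidence": 99.4},
        {"BlockType": "WORD", "Text": "date", "Confidence": 60.0},
    ]
}
print(words_needing_review(sample))  # [('INV0ICE', 71.2)]
```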

Sources
Amazon Textract Features | AWS
Best Practices - Amazon Textract
Amazon Textract Training | AWS re:Post

answered a month ago
