1 Answer
Textract can misread certain special characters, and it does not handle words hyphenated across line breaks well. It also does not detect italics.
These limitations are inherent to Textract, and there is no direct way to work around them within the service itself. Post-processing the Textract output with custom code can mitigate some of them.
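As a rough illustration of the post-processing idea, here is a minimal sketch that rejoins words split by a hyphen at the end of a line. It assumes you already have the response dict from a text detection call and reads the `Text` of each `LINE` block; the helper names `lines_from_textract` and `join_hyphenated` are made up for this example, and the "next line starts with a lowercase letter" rule is only a heuristic, not something Textract provides.

```python
import re


def lines_from_textract(response):
    """Collect the text of every LINE block from a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def join_hyphenated(lines):
    """Merge a line ending in 'letter-' with the next line if that line
    starts with a lowercase letter (heuristic, hypothetical helper)."""
    merged = []
    for line in lines:
        line = line.strip()
        previous_ends_with_hyphen = bool(
            merged and re.search(r"[A-Za-z]-$", merged[-1])
        )
        if previous_ends_with_hyphen and re.match(r"[a-z]", line):
            # Drop the trailing hyphen and glue the two halves together.
            merged[-1] = merged[-1][:-1] + line
        else:
            merged.append(line)
    return merged


if __name__ == "__main__":
    sample = ["The implemen-", "tation is simple.", "Next line"]
    for text in join_hyphenated(sample):
        print(text)
```

Note that this will also collapse genuine compounds that happen to break at the hyphen (for example "well-" followed by "known"), so you may want to check joined words against a dictionary before accepting them. It does nothing for special characters or italics.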
Alternatives to Amazon Textract include Docparser, Nanonets, and Rossum. These services may handle the issues you're facing better, but you should test them against your own documents before switching.