1 Answer
- Newest
- Most votes
- Most comments
0
Hi,
I did a quick test strikethrough doesn't seem to work in PDF. But if you manage to convert it into an Image it works well (at least for my example below)
Thanks, Rama
Relevant content
- asked 4 months ago
- Accepted Answerasked a year ago
- asked 4 years ago
- AWS OFFICIALUpdated 2 months ago
- AWS OFFICIALUpdated 2 months ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a year ago
Thank you! I'm not sure I understand. From what I can see, Textract is reading and including the strikethrough text. I want to do the exact opposite. I want option to remove the strikethrough text from extraction. Or, at least identify that there is strikethrough text. Is this possible?
Apologies, I didn't understand your question correctly. In this case, you may want to try Claude Sonnet 3 or the newer 3.5. It was able to detect strikethrough text. I think you discussed this here https://repost.aws/questions/QUyLEIglprTPiGEZN1L6j2eg/extracting-data-from-pdf-that-contains-strikeout-text-using-amazon-textract-in-python