AWS Textract not detecting Trademark or superscripts

Hi, I am using Textract to extract text from a PDF file. Most of the time, Textract does not return the trademark symbol correctly: sometimes it shows "R" in place of "®", and sometimes it shows nothing at all. Please let me know how to handle this, as it is important to capture this symbol in the text.

Thanks

Waqas
asked 4 months ago, 133 views
2 Answers
Hi Waqas,

To improve the detection of the trademark symbol (®) with AWS Textract, try the following:

  1. Increase PDF Resolution: Ensure your PDF is high resolution to enhance Textract's accuracy.
  2. Convert PDF to Image: Convert the PDF to a high-resolution image (e.g., PNG) before processing it with Textract.

These steps can help improve symbol detection.
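For a long PDF, the two steps above do not have to be done by hand. Here is a minimal sketch, assuming the third-party `pdf2image` package (which needs Poppler installed) for rendering and a `boto3` Textract client for OCR. The helper names `pdf_pages_as_png`, `extract_lines`, and `ocr_pages` are hypothetical, and higher resolution is not guaranteed to recover "®" in every case.

```python
import io


def pdf_pages_as_png(pdf_path, dpi=300):
    """Yield each PDF page as PNG bytes rendered at the given DPI.

    Assumption: the third-party pdf2image package and Poppler are installed.
    """
    from pdf2image import convert_from_path  # lazy import: optional dependency

    for page in convert_from_path(pdf_path, dpi=dpi):
        buffer = io.BytesIO()
        page.save(buffer, format="PNG")
        yield buffer.getvalue()


def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_pages(png_pages, client):
    """Run synchronous text detection on each page image, in page order."""
    results = []
    for png_bytes in png_pages:
        response = client.detect_document_text(Document={"Bytes": png_bytes})
        results.append(extract_lines(response))
    return results
```

Usage would look roughly like `ocr_pages(pdf_pages_as_png("input.pdf"), boto3.client("textract"))`, where `input.pdf` is a placeholder path. Note that `convert_from_path` renders all pages into memory at once, so for a 200-page file you may want to render in batches using its `first_page` and `last_page` parameters.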

If possible, please share the document so we can further investigate and provide more specific guidance.

EXPERT
answered 4 months ago
  • Basically, my PDFs are 100 to 200 pages long, so:

    1. It is not practical to split a PDF into 200 image files.
    2. I am sharing an example screenshot.
Hello Waqas,

As others have mentioned, to give you an accurate solution, we need a sample document where you are seeing this issue. Please share one so we can best assist.

As Vitor mentioned, images tend to work better for character recognition. Please try that as an alternative.

AWS
answered 4 months ago
