AWS Textract not detecting Trademark or superscripts

0

Hi, I am using Textract to extract text from a PDF file. Most of the time Textract does not return the trademark symbol correctly: sometimes it shows "R" in place of "®", and sometimes it shows nothing at all. Please let me know how to handle this, as it is important to capture this symbol in the text.

Thanks

asked 4 months ago · 146 views
2 Answers
1

Hi Waqas,

To improve the detection of the trademark symbol (®) with AWS Textract, try the following:

  1. Increase PDF Resolution: Ensure your PDF is high resolution to enhance Textract's accuracy.
  2. Convert PDF to Image: Convert the PDF to a high-resolution image (e.g., PNG) before processing it with Textract.

These steps can help improve symbol detection.
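
The two steps above can be automated page by page, so a long PDF does not have to be split into image files by hand. The following is only a minimal sketch under several assumptions: it uses the third-party `pdf2image` package (which needs the poppler utilities) to render pages, assumes `boto3` is installed with credentials and a region already configured, and uses 300 DPI as an illustrative value rather than a recommended setting. The helper names are hypothetical.

```python
import io
from typing import Iterator


def iter_page_png_bytes(pdf_path: str, dpi: int = 300) -> Iterator[bytes]:
    """Yield each PDF page as in-memory PNG bytes, one page at a time."""
    # Third-party dependency (assumption): pdf2image, which requires poppler.
    from pdf2image import convert_from_path, pdfinfo_from_path

    page_count = pdfinfo_from_path(pdf_path)["Pages"]
    for page_number in range(1, page_count + 1):
        # Render a single page per call so a 200-page PDF is never
        # held in memory all at once and no image files are written.
        (image,) = convert_from_path(
            pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
        )
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        yield buffer.getvalue()


def lines_from_response(response: dict) -> list[str]:
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_pdf_text(pdf_path: str, dpi: int = 300) -> list[list[str]]:
    """Return one list of text lines per page of the PDF."""
    # Third-party dependency (assumption): boto3 with AWS credentials configured.
    import boto3

    textract = boto3.client("textract")
    pages = []
    for png_bytes in iter_page_png_bytes(pdf_path, dpi):
        # Synchronous call on inline bytes; Textract limits the size of
        # inline documents, so a very large render may need a lower DPI.
        response = textract.detect_document_text(Document={"Bytes": png_bytes})
        pages.append(lines_from_response(response))
    return pages
```

Calling `extract_pdf_text("contract.pdf")` (a placeholder file name) would send each page as a separate image. Whether this actually recovers the ® symbol depends on the document, so it is worth trying on a few problem pages first.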

If possible, please share the document so we can further investigate and provide more specific guidance.

EXPERT
answered 4 months ago
  • Basically, my PDFs are 100 to 200 pages long, so:

    1. It is not practical to split a PDF into 200 image files.
    2. I am sharing an example screenshot.
0

Hello Waqas,

As others have mentioned, please share a sample document that shows this issue so we can assist you accurately.

As Vitor mentioned, images tend to work better for character recognition. Please try that as an alternative.
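
To check whether the image route really helps on a given page, one option is to compare the Textract responses from the PDF run and the image run. The sketch below is an assumption-laden illustration, not an official method: the function name and the 90.0 confidence threshold are made up for this example, and it relies only on the `Blocks`, `BlockType`, `Text`, and `Confidence` fields of a Textract response.

```python
def symbol_report(response: dict, symbol: str = "®",
                  min_confidence: float = 90.0) -> dict:
    """Summarize how often a symbol appears and which words look doubtful."""
    words = [
        block for block in response.get("Blocks", [])
        if block.get("BlockType") == "WORD" and "Text" in block
    ]
    return {
        # Number of times the symbol was recognized inside WORD blocks.
        "symbol_count": sum(block["Text"].count(symbol) for block in words),
        # Words below the (arbitrary) threshold; a stray "R" that should
        # have been "®" may show up here with a low Confidence score.
        "low_confidence_words": [
            block["Text"] for block in words
            if block.get("Confidence", 0.0) < min_confidence
        ],
    }
```

Running this on the response for the same page from both runs gives a rough side-by-side view: a higher `symbol_count` and fewer low-confidence words suggest the image version is being read more reliably.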

AWS
answered 4 months ago
