How to formulate Textract queries for documents in languages other than English?

0

I have been testing Textract on German insurance documents, and plain text extraction works well. However, I'm not sure how queries should be phrased to get the best results when extracting values:

  • Should the queries be formulated in English? (e.g. "What is the customer's insurance policy number?")
  • Should the queries be formulated in English with German field names? (e.g. "What is the customer's Versicherungsscheinnummer or VSNR?")
  • Should the queries be formulated entirely in German?

Thank you for your tips!

Asked a year ago, 614 views
3 Answers
1
Accepted Answer

At the moment, Textract Queries officially supports only English (see Textract Quotas). I have seen other languages work as well, but there is no official support for them and no official statement on what works best. For now, testing and evaluating on your own documents is your best option.
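One way to do that testing is to send several phrasings of the same question in a single request and compare the confidence Textract returns for each. The sketch below is only an illustration: the helper names and aliases are made up, the German phrasing is my own example, and it assumes boto3 with AWS credentials and a single-page image passed as bytes.

```python
def build_queries_config(phrasings):
    """Turn {alias: query text} into the QueriesConfig shape Textract expects."""
    return {
        "Queries": [
            {"Text": text, "Alias": alias} for alias, text in phrasings.items()
        ]
    }


def rank_answers(blocks):
    """Return (alias, answer text, confidence) tuples, best confidence first."""
    by_id = {block["Id"]: block for block in blocks}
    rows = []
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias", block["Query"]["Text"])
        answer_ids = [
            block_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "ANSWER"
            for block_id in rel["Ids"]
        ]
        if not answer_ids:
            # Textract found no answer for this phrasing.
            rows.append((alias, None, 0.0))
        for block_id in answer_ids:
            answer = by_id[block_id]
            rows.append((alias, answer.get("Text"), answer.get("Confidence", 0.0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def compare_phrasings(document_bytes, phrasings):
    """Run all phrasings against one page and rank them by confidence."""
    import boto3  # imported here so the helpers above work without AWS access

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"Bytes": document_bytes},
        FeatureTypes=["QUERIES"],
        QueriesConfig=build_queries_config(phrasings),
    )
    return rank_answers(response["Blocks"])


# Hypothetical candidate phrasings for one field (aliases are arbitrary labels).
PHRASINGS = {
    "english": "What is the customer's insurance policy number?",
    "mixed": "What is the customer's Versicherungsscheinnummer or VSNR?",
    "german": "Wie lautet die Versicherungsscheinnummer des Kunden?",
}
```

Calling `compare_phrasings(page_bytes, PHRASINGS)` on a handful of representative pages should show fairly quickly which style holds up on your documents.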

AWS
Answered a year ago
Expert
Reviewed 7 hours ago
  • Thank you! Just to avoid opening yet another related question: the Quotas page doesn't say anything about language limitations of the receipt analysis API. Does that mean receipts in German are officially supported?

1

Hello, I had the same problem as you, but with French. At first my query was "What is the city or zip-code on this page?", but it couldn't find the city on every page. So I tried to be more explicit, and my new query is "This page is part of a French contract for selling a house. What is the city or zip-code of this house?", which works every time. You need to talk to Textract.
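As a rough sketch of that idea, a small helper can prepend the document context to each question before it is sent. The helper name is made up, and the 200-character cap is my assumption about the query text limit, so check it against the current Textract API reference.

```python
MAX_QUERY_CHARS = 200  # assumed limit on query text; verify in the API reference


def with_context(context, question, max_chars=MAX_QUERY_CHARS):
    """Build a Textract query entry that states the document context first."""
    query = f"{context.strip()} {question.strip()}"
    if len(query) > max_chars:
        raise ValueError(f"query is {len(query)} characters, limit is {max_chars}")
    return {"Text": query}


# The bare query that missed the city on some pages.
bare = {"Text": "What is the city or zip-code on this page?"}

# The same question with the document context spelled out.
contextual = with_context(
    "This page is part of a French contract for selling a house.",
    "What is the city or zip-code of this house?",
)

# Either entry can be placed in QueriesConfig={"Queries": [...]}.
```

Keeping the context in one place also makes it easy to reuse the same sentence for every field on the same document type.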

Answered 9 months ago
  • This is fascinating, we'll try to add some context to our queries, thanks! Never thought that would make a (positive) difference.

  • That is proper prompt-engineering. Thx @Andemion!

0

The results of some limited testing on German documents:

  • Queries written in German seem to be possible, but generally lead to lower confidence
  • Interestingly, possessives improve confidence ("company's name" vs. "company name")
  • English queries with German nouns seem to give the best confidence
  • English queries with translated nouns (the German terms rendered in English) don't seem to work
  • Umlauts in queries are not supported

Not bad for an easy start!
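Since umlauts are rejected, one possible workaround is to transliterate the German nouns before building the query. This is only a sketch: the helper name is made up, and I have not verified that Textract matches the transliterated spelling against a document that prints the umlaut, so treat that as something to test.

```python
# Common German transliterations for characters that queries do not accept.
_GERMAN_TO_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def to_ascii_query(text):
    """Replace umlauts and sharp s, then confirm the query is plain ASCII."""
    converted = text.translate(_GERMAN_TO_ASCII)
    if not converted.isascii():
        raise ValueError(f"query still contains non-ASCII characters: {converted!r}")
    return converted


# Hypothetical field name containing an umlaut.
query = to_ascii_query("What is the customer's Versicherungsprämie?")
```

Queries that are already ASCII, such as the Versicherungsscheinnummer example from the question, pass through unchanged.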

Answered a year ago
