Hi,
If you want to extract the structure of the document, the best way is to use the AnalyzeDocument API. It extracts the relationships and structural elements in the document, such as tables, key-value pairs, ...
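As a rough illustration of what "structural elements" means in practice, here is a minimal sketch that reads key-value pairs out of an AnalyzeDocument response. It assumes the response was requested with the FORMS feature (for example via boto3's `analyze_document(Document=..., FeatureTypes=["FORMS"])`); the function name `extract_key_values` and the `sample` data are made up for this example, and the sample is a heavily trimmed stand-in for a real response.

```python
def extract_key_values(response):
    """Return {key_text: value_text} from an AnalyzeDocument (FORMS) response dict."""
    blocks = {block["Id"]: block for block in response["Blocks"]}

    def child_text(block):
        # Join the WORD blocks that this block points to through CHILD relationships.
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for child_id in rel["Ids"]:
                    child = blocks[child_id]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    pairs = {}
    for block in blocks.values():
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        # A KEY block links to its VALUE block(s) through a VALUE relationship.
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(child_text(blocks[value_id]))
        pairs[child_text(block)] = " ".join(value_parts)
    return pairs


# Hypothetical, trimmed-down response used only to show the shape of the data.
sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
]}

print(extract_key_values(sample))
```

Tables come back in a similar way (TABLE and CELL blocks linked by relationships), so the same walk-the-relationships approach should carry over.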
However, if you only want to use the DetectText APIs, you will get the bounding box coordinates for each WORD or LINE detected, which you can use to reconstruct the document by placing the text in its original position (the information is in Geometry: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-document-layout.html). With this you will have only the text, with no information about table structure or any other structure that was in the original document.
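To make the bounding-box idea concrete, here is a minimal sketch that places each detected LINE on a plain-text grid according to its `Geometry.BoundingBox`, whose `Left` and `Top` values are ratios of the page width and height. The helper names (`fetch_lines_response`, `layout_lines`), the grid size, and the sample data are assumptions for illustration. The fetch helper needs boto3 and AWS credentials and is not called here.

```python
def fetch_lines_response(path):
    """Call Textract DetectDocumentText on a local file (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the layout code below works without it

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.detect_document_text(Document={"Bytes": f.read()})


def layout_lines(response, cols=80, rows=40):
    """Approximate the page layout by writing each LINE block onto a character grid."""
    grid = [[" "] * cols for _ in range(rows)]
    for block in response["Blocks"]:
        if block["BlockType"] != "LINE":
            continue
        box = block["Geometry"]["BoundingBox"]
        # Left/Top are fractions of the page size; scale them to grid cells.
        row = min(int(box["Top"] * rows), rows - 1)
        col = min(int(box["Left"] * cols), cols - 1)
        for i, ch in enumerate(block["Text"]):
            if col + i < cols:  # drop characters that would run past the right edge
                grid[row][col + i] = ch
    text = "\n".join("".join(r).rstrip() for r in grid)
    return text.rstrip("\n")


# Hypothetical, trimmed-down response used only to show the shape of the data.
sample = {"Blocks": [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Invoice",
     "Geometry": {"BoundingBox": {"Left": 0.25, "Top": 0.0, "Width": 0.3, "Height": 0.05}}},
    {"BlockType": "LINE", "Text": "Total: 42",
     "Geometry": {"BoundingBox": {"Left": 0.5, "Top": 0.5, "Width": 0.4, "Height": 0.05}}},
]}

print(layout_lines(sample, cols=20, rows=4))
```

This only approximates positions, and lines that land on the same grid cells overwrite each other, so a finer grid or a drawing library would be needed for anything more faithful.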
Regarding your second question, Textract doesn't do document conversion: it extracts text and structural information from the document, but it does not recreate a document similar to the one you sent.
I hope it helps. Happy New Year to you as well :)
Not being a developer, it's a bit complicated for me. May I ask where you have to put the JSON code? I thought there was a link where you upload the PDF file to get the OCR. Thanks for the answer though :)
Hey, may I know if you finally figured it out? I have a similar requirement. Thanks.