The blog post Didier linked is great, and in particular it shows that merging tables that span multiple pages is something you need to do in post-processing: Textract won't do it for you.
However, it sounds like your problem might be more straightforward: you just can't iterate through all the tables detected in multi-page documents? I can confirm that Textract does return them. Judging from your code snippet, I'd guess there's a bug somewhere, most likely in the `getTableCsvResults` function, which you haven't shared.
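To check whether every table is actually present in the response, something like the following can help. It's a minimal sketch, not a replacement for your `getTableCsvResults`: it assumes `blocks` is the concatenated `Blocks` array from *all* result pages of the job (analysis run with the `TABLES` feature), and the helper names (`tablesByPage`, `tableToRows`, etc.) are hypothetical. It relies only on standard Block fields (`BlockType`, `Page`, `Relationships`, `RowIndex`, `ColumnIndex`, `Text`).

```javascript
// Hypothetical helpers: list every TABLE block per page and turn each one
// into a 2D array of cell text. Assumes `blocks` holds ALL result pages.

function buildBlockMap(blocks) {
  const map = new Map();
  for (const block of blocks) map.set(block.Id, block);
  return map;
}

// Ids listed under the block's CHILD relationships (empty if none).
function childIds(block) {
  return (block.Relationships || [])
    .filter((rel) => rel.Type === "CHILD")
    .flatMap((rel) => rel.Ids);
}

// RowIndex/ColumnIndex are 1-based in Textract. Only WORD children are read;
// selection elements and merged-cell info are ignored in this sketch.
function tableToRows(table, blockMap) {
  const rows = [];
  for (const cellId of childIds(table)) {
    const cell = blockMap.get(cellId);
    if (!cell || cell.BlockType !== "CELL") continue;
    const text = childIds(cell)
      .map((id) => blockMap.get(id))
      .filter((b) => b && b.BlockType === "WORD")
      .map((b) => b.Text)
      .join(" ");
    const r = cell.RowIndex - 1;
    const c = cell.ColumnIndex - 1;
    if (!rows[r]) rows[r] = [];
    rows[r][c] = text;
  }
  return rows;
}

// Map of page number -> array of tables (each a 2D array of cell text).
function tablesByPage(blocks) {
  const blockMap = buildBlockMap(blocks);
  const result = new Map();
  for (const block of blocks) {
    if (block.BlockType !== "TABLE") continue;
    const page = block.Page ?? 1; // treat a missing Page as page 1
    if (!result.has(page)) result.set(page, []);
    result.get(page).push(tableToRows(block, blockMap));
  }
  return result;
}
```

If `tablesByPage(allBlocks).size` only ever shows page 1, the blocks from later result pages probably never made it into the array.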
You might be interested in our Amazon Textract Response Library for JS (TRP.js), which simplifies the code for parsing and navigating the returned blocks. It has built-in iterators like:
```javascript
doc.listPages().forEach((page) => {
  const tables = page.listTables();
});
```
While the TRP library for Python already implements inter-page table merging, TRP.js doesn't support it yet. If you're interested in that, please do raise it as a feature request on GitHub!
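Until then, a naive post-processing merge could look like the sketch below. It is not how the Python TRP library does it; the heuristic is an assumption chosen for illustration: a table is treated as a continuation when it is the first table on the page immediately after the previous table's last page and has the same column count, and a repeated header row on the continuation is dropped. The input shape `{ page, rows }` and the function names are hypothetical. Real documents usually need extra checks (e.g. table position near the page edges).

```javascript
// Naive inter-page table merge (hypothetical heuristic, see lead-in).
// Input: array of { page, rows } where rows is a 2D array of cell text.
// Output: array of { firstPage, lastPage, rows }.

function columnCount(rows) {
  return rows.reduce((max, row) => Math.max(max, row.length), 0);
}

function sameRow(a, b) {
  return a.length === b.length && a.every((cell, i) => cell === b[i]);
}

function mergeAcrossPages(tables) {
  // Reading order by page; sort is stable, so within-page order is kept.
  const ordered = [...tables].sort((x, y) => x.page - y.page);
  const merged = [];
  for (const table of ordered) {
    const prev = merged[merged.length - 1];
    // Only the first table on a page can continue the last table of the
    // previous page: once anything from this page is in `merged`,
    // prev.lastPage equals this page and the check fails.
    const continues =
      prev !== undefined &&
      table.page === prev.lastPage + 1 &&
      columnCount(table.rows) === columnCount(prev.rows);
    if (continues) {
      // Drop a repeated header row on the continuation, if present.
      const body =
        table.rows.length > 0 && sameRow(table.rows[0], prev.rows[0])
          ? table.rows.slice(1)
          : table.rows;
      prev.rows.push(...body);
      prev.lastPage = table.page;
    } else {
      merged.push({ firstPage: table.page, lastPage: table.page, rows: [...table.rows] });
    }
  }
  return merged;
}
```

The rows could come from any table-to-array conversion of the CELL blocks; the merge itself doesn't depend on Textract types.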
Hi,
I think you'll want to read this blog post, which details how to handle multi-page tables with AWS Textract: https://aws.amazon.com/blogs/machine-learning/postprocessing-with-amazon-textract-multi-page-table-handling/
Best
Didier
Thanks for your answers. I found the solution: I used the `push` method to append the paginated responses to an array, and that worked.
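For anyone else hitting this, here is a minimal sketch of that approach. It assumes the async job has already finished with `JobStatus` `SUCCEEDED`, and `getPage(nextToken)` is a hypothetical callback returning one page of results (`{ Blocks, NextToken }`). With the AWS SDK for JavaScript v3 it would wrap `client.send(new GetDocumentAnalysisCommand({ JobId, NextToken: nextToken }))`. The key point is to keep requesting while `NextToken` is set and to push every page's blocks into one array before looking for tables.

```javascript
// Collect every Block from a paginated async Textract job.
// `getPage(nextToken)` is a hypothetical callback (see lead-in) that
// resolves to one result page: { Blocks, NextToken }.
async function collectAllBlocks(getPage) {
  const allBlocks = [];
  let nextToken; // undefined for the first request
  do {
    const response = await getPage(nextToken);
    // Spread so the blocks are appended individually, not as a nested array.
    allBlocks.push(...(response.Blocks ?? []));
    nextToken = response.NextToken; // absent on the last page
  } while (nextToken);
  return allBlocks;
}
```

Spreading into `push` is fine here because each result page is bounded by `MaxResults` (1,000 blocks at most); for much larger arrays a loop or `concat` avoids argument-count limits.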