I'm using Textract on a table of contents where each line has the form **TITLE . . . . Author PageNo.** The resulting table merges Title and Author into one column, ignoring the dot leader, and puts the page numbers in a second column. How can I get Textract to treat the dot leader as a column separator?

2 Answers

Could you provide a sample image for better understanding?

Would it be feasible to preprocess the document before sending it to Textract? You could then insert a well-known separator that the ML model behind it can easily recognize.
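
If preprocessing the input is not practical, a related option is to split the merged cell after Textract returns it. Note that this is post-processing, not the preprocessing suggested above. Here is a minimal sketch in plain Python, assuming the dot leader survives in the recognized text (not verified against real Textract output). The function name `split_toc_line` and both regular expressions are hypothetical, written for illustration only.

```python
import re

# Assumption: three or more dots, optionally separated by spaces, form a dot leader.
DOT_LEADER = re.compile(r"\s*(?:\.\s*){3,}")
# Assumption: the page number is a run of digits at the end of the line,
# optionally followed by a period.
PAGE_NO = re.compile(r"\s+(\d+)\.?\s*$")


def split_toc_line(line):
    """Split 'Title . . . . Author 12' into (title, author, page).

    page is None when the line has no trailing number, e.g. when the
    page number was already placed in its own column.
    """
    page = None
    match = PAGE_NO.search(line)
    if match:
        page = match.group(1)
        line = line[:match.start()]
    # Split only on the first dot leader; the rest belongs to the author.
    parts = DOT_LEADER.split(line, maxsplit=1)
    title = parts[0].strip()
    author = parts[1].strip() if len(parts) > 1 else ""
    return title, author, page


if __name__ == "__main__":
    print(split_toc_line("Moby Dick . . . . Herman Melville 12"))
```

You would call this on the text of each merged Title/Author cell. A title that itself contains an ellipsis ("...") would be split at the wrong place, so the pattern may need tightening for your data.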

