We have an asynchronous pipeline that extracts query answers from PDF documents. Here's a sample from the JSON output:
" block_type='QUERY_RESULT', relationships=None, confidence=77.0, text='1 NON-SENSITIVE', "
This confirms that our AWS pipeline is working for us.
However, no matter which combination of the sample AWS scripts we use, we get errors.
Does anybody have an idea how to format the extraction script so that it works as AWS intends? Here is the traceback:
Input In [29], in <cell line: 19>()
16 print('d =', d)
18 #get_query_answers
---> 19 query_answers = d.get_query_answers(page=page)
21 #for x in query_answers:
22 # print(f"{image_filename},{x[1]},{x[2]}")
24 print(tabulate(query_answers, tablefmt="github"))
File ~/anaconda3/envs/aws-local/lib/python3.9/site-packages/trp/trp2.py:569, in TDocument.get_query_answers(self, page)
567 if answers:
568 for answer in answers:
--> 569 result_list.append([query.query.text, query.query.alias, answer.text])
570 else:
571 result_list.append([query.query.text, query.query.alias, ""])
AttributeError: 'NoneType' object has no attribute 'text'
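
For what it's worth, the traceback suggests that a QUERY block lists an answer ID that the parser cannot resolve to a block, so `answer` comes back as None. One possible cause, which is a guess and not confirmed by the traceback, is that the async results span several response pages and only some of them were loaded. Below is a minimal sketch of a workaround that reads the raw blocks directly and skips answer IDs it cannot find. It assumes the JSON uses the usual block keys (`Id`, `BlockType`, `Query`, `Relationships`, `Text`); the function name `extract_query_answers` is made up for illustration and is not part of `trp`.

```python
from typing import Any, Dict, List


def extract_query_answers(blocks: List[Dict[str, Any]]) -> List[List[str]]:
    """Return [query text, alias, answer text] rows from raw response blocks.

    Hypothetical helper, not part of trp. Assumes each block is a dict with
    the keys Id, BlockType, Query, Relationships and Text.
    """
    # Index every block by Id so answer lookups do not depend on block order.
    by_id = {block["Id"]: block for block in blocks}
    rows: List[List[str]] = []

    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue

        query = block.get("Query") or {}
        text = query.get("Text", "")
        alias = query.get("Alias", "")

        answers: List[str] = []
        # Relationships may be missing or None when a query has no answer.
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for answer_id in rel.get("Ids") or []:
                answer = by_id.get(answer_id)
                # Skip IDs that point at blocks we never loaded,
                # instead of failing on None.
                if answer is not None:
                    answers.append(answer.get("Text", ""))

        if answers:
            rows.extend([text, alias, a] for a in answers)
        else:
            rows.append([text, alias, ""])

    return rows
```

To try it, collect the `Blocks` lists from every page of the job's results into one list and pass that list in; the returned rows can go straight into `tabulate(rows, tablefmt="github")` as in the failing cell. If the empty answers disappear once all pages are combined, that would point to missing pages as the cause.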