How can I improve the retrieval speed of Textract document analysis?

Question

I am using a Python Lambda function, triggered by the Textract SNS message, to read the results of my document analysis. Most of the PDFs I process on a daily basis are only a few pages long, so the time to get the results is not an issue.
However, I have a new use case where I am processing documents that are up to 50 pages long. Retrieving all of the data from Textract using the get_document_analysis function and the NextToken values now takes several minutes. Is there a way I can retrieve the data any faster? I want to avoid the overhead of splitting the document into smaller chunks if at all possible. I would prefer to solve this issue after Textract rather than before it in the execution cycle.

Any insight is appreciated, thank you.

Answer

I understand that you need guidance on a use case where retrieving results with the get_document_analysis function takes much longer than expected for documents with a larger number of pages.

There are several ways to improve the performance of your function, specifically when it calls a downstream API endpoint:

- Increasing the memory of your function, which can range between 128 MB and 10,240 MB. Please note that Lambda allocates CPU in proportion to the configured memory, so raising the memory also increases the overall computational power available. If a function is CPU-, network- or memory-bound, then changing the memory setting can dramatically improve its performance.
You can find the memory consumption of your function in its CloudWatch invocation logs. The value to monitor is 'Max Memory Used' in the REPORT line.

```
...
REPORT RequestId: 1234abcd-1730-4395-85cb-4f71bd2b87b8  Duration: 79.67 ms  Billed Duration: 80 ms  Memory Size: 128 MB  Max Memory Used: 73 MB
```
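
As a rough illustration of reading that value, the sketch below pulls 'Memory Size' and 'Max Memory Used' out of a REPORT line shaped like the one above. The helper name `memory_headroom` and the returned keys are hypothetical, chosen for this example only, and the pattern assumes the two fields appear next to each other as they do in the sample line.

```python
import re

# Matches the two memory fields of a Lambda REPORT log line.
# Assumption: the fields are adjacent and separated by whitespace, as in the sample above.
REPORT_PATTERN = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)


def memory_headroom(report_line):
    """Return configured and peak memory in MB, or None if the line does not match."""
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        return None
    size = int(match["size"])
    used = int(match["used"])
    return {"size_mb": size, "used_mb": used, "used_fraction": used / size}


sample = (
    "REPORT RequestId: 1234abcd-1730-4395-85cb-4f71bd2b87b8  Duration: 79.67 ms  "
    "Billed Duration: 80 ms  Memory Size: 128 MB  Max Memory Used: 73 MB"
)
print(memory_headroom(sample))
```

If the used fraction stays low while the duration is still high, the function is probably not memory-bound, although a higher memory setting may still help through the extra CPU that comes with it.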

- Using static initialization to define the clients that connect to any downstream API or database. The static initialization code is executed before the handler code starts running in a function. This "INIT" code is often used to import libraries and dependencies, set up configuration and initialize connections to other services.
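
A minimal sketch of the static initialization point, applied to the retrieval loop from the question, might look like the following. It is not a definitive implementation: the names `TEXTRACT`, `fetch_all_blocks` and `handler`, the guarded import, and the `us-east-1` region fallback are assumptions made for illustration. The client is created once outside the handler, and the loop follows `NextToken` until Textract stops returning one. `MaxResults` is passed explicitly at 1,000, which to my knowledge is both the default and the largest page size the API accepts, so this only guards against a smaller value slipping in.

```python
import json
import os

try:
    import boto3  # bundled with the Lambda Python runtime
except ImportError:
    # Assumption: only reached outside Lambda, e.g. local tests with a stub client.
    boto3 = None

# Static initialization: runs during INIT, not on every invocation,
# so warm invocations reuse this client and its connections.
# The region fallback is an assumption for running outside Lambda.
TEXTRACT = (
    boto3.client("textract", region_name=os.environ.get("AWS_REGION", "us-east-1"))
    if boto3 is not None
    else None
)


def fetch_all_blocks(job_id, client=None, page_size=1000):
    """Collect every Block of a finished analysis job by following NextToken."""
    client = client if client is not None else TEXTRACT
    blocks = []
    kwargs = {"JobId": job_id, "MaxResults": page_size}
    while True:
        page = client.get_document_analysis(**kwargs)
        blocks.extend(page.get("Blocks", []))
        token = page.get("NextToken")
        if not token:
            return blocks
        kwargs["NextToken"] = token


def handler(event, context):
    # Textract publishes its completion notice as JSON in the SNS message body.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("Status") != "SUCCEEDED":
        return {"jobId": message.get("JobId"), "blocks": 0}
    blocks = fetch_all_blocks(message["JobId"])
    return {"jobId": message["JobId"], "blocks": len(blocks)}
```

Because each page can only be requested with the token from the previous one, the calls themselves stay sequential. Any gain from this layout comes from not recreating the client on each invocation and from keeping per-page work inside the loop small.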

For in-depth troubleshooting, however, I would recommend creating a support case with our Premium Support Lambda technical team. Please navigate to Support Center and follow the steps described at this [link](https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case). Please include the code snippet used in your function along with the complete invocation logs with timestamps, so that the assigned engineer has all the data at hand and can assist you better.
