Textract Throttling

I just had my GetDocumentTextDetection quota increased to 35, but everything else was kept the same. Here is the error I still get: An error occurred (ProvisionedThroughputExceededException) when calling the GetDocumentTextDetection operation (reached max retries: 4): Provisioned rate exceeded

My documents are 1-3 pages long and I'm only using the detect-text functionality. I don't understand why it takes 5 minutes to OCR 15 documents that are each only 1-3 pages long.

Bobby
asked 2 years ago · 1712 views
1 Answer

Hi,

If you're processing just 15 documents but hitting 35+ TPS on GetDocumentTextDetection, my guess is that you're polling for job completion and the polling/retry configuration is too aggressive.
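As a rough illustration of gentler polling, here is a minimal sketch that backs off exponentially (with jitter) between status checks. The helper name `wait_for_job` and its parameters are made up for this example, and `get_status` is assumed to be any callable that returns a dict with a `JobStatus` key, as GetDocumentTextDetection responses do.

```python
import random
import time


def wait_for_job(get_status, job_id, base_delay=1.0, max_delay=30.0,
                 timeout=600.0, sleep=time.sleep):
    """Poll get_status(job_id) with exponential backoff until the job ends.

    get_status is a hypothetical callable returning a dict that contains
    "JobStatus" (IN_PROGRESS, SUCCEEDED, FAILED or PARTIAL_SUCCESS).
    """
    waited = 0.0
    delay = base_delay
    while True:
        response = get_status(job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            return response
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f}s")
        # Jitter spreads out pollers that were all started at the same moment.
        pause = random.uniform(delay / 2, delay)
        sleep(pause)
        waited += pause
        # Double the delay after every unfinished check, up to max_delay.
        delay = min(delay * 2, max_delay)
```

With boto3 you could pass something like `lambda job_id: client.get_document_text_detection(JobId=job_id)` as `get_status`. With 15 jobs each polled this way, the call rate should stay far below 35 TPS.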

In my experience, yes, you should be able to get better throughput than 15 documents of 1-3 pages in 5 minutes. It's worth mentioning, though, that response time through the async APIs can vary, and high throughput across many documents is a different topic from end-to-end response time for one small document.

The more docs you process in parallel (the default quota limit for concurrent jobs is in the hundreds), the less appropriate polling becomes as a strategy for fetching results. Using SNS callbacks via Lambda ensures you only try to fetch each result once it is available. In that Lambda function you should still check your AWS SDK retry configuration (e.g. the retry settings documented for Python boto3), to avoid issues if, for example, many jobs happen to finish at once.
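A minimal sketch of that callback approach, assuming the Lambda is subscribed to the SNS topic passed as the job's NotificationChannel. The function names (`make_client`, `fetch_lines`, `handler`), the optional `client` parameter and the retry values (10 attempts, adaptive mode) are illustrative choices, not a recommended configuration.

```python
import json


def make_client():
    # Imported lazily so the rest of the module can be exercised without AWS.
    import boto3
    from botocore.config import Config

    # Illustrative values: more attempts than the default, with client-side
    # rate limiting ("adaptive" mode) for bursts of job completions.
    retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
    return boto3.client("textract", config=retry_config)


def fetch_lines(client, job_id):
    """Collect the text of every LINE block, following NextToken pagination."""
    lines, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_document_text_detection(**kwargs)
        lines += [b["Text"] for b in page.get("Blocks", [])
                  if b["BlockType"] == "LINE"]
        token = page.get("NextToken")
        if not token:
            return lines


def handler(event, context, client=None):
    """Lambda entry point: fetch results only for jobs SNS reports as done."""
    client = client or make_client()
    results = {}
    for record in event["Records"]:
        # Textract publishes a JSON string containing JobId and Status.
        message = json.loads(record["Sns"]["Message"])
        if message["Status"] == "SUCCEEDED":
            results[message["JobId"]] = fetch_lines(client, message["JobId"])
    return results
```

Each result is then requested once per page instead of repeatedly while the job is still running, so GetDocumentTextDetection calls scale with the number of finished jobs rather than with polling frequency.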

Code samples like the Amazon-Textract-Caller (not sure if you're using this?) aim to be a helpful utility for small-scale projects, but they have to trade off staying a simple client-only solution against being scalable for bigger workloads.

Stacks like "Large scale document processing with Amazon Textract" optimize more for scalability, but that means you need to deploy some cloud components too (SNS, Lambda, SQS, etc.).

For CDK infrastructure-as-code, you could check out the patterns in amazon-textract-idp-stack-samples or maybe (a bit more complex because it combines with post-processing models) amazon-textract-transformer-pipeline.

AWS
Expert
Alex_T
answered 2 years ago
