Textract Throttling


So I just had my GetDocumentTextDetection quota increased to 35 TPS, but all the other quotas were left the same. Here is the error I still get: An error occurred (ProvisionedThroughputExceededException) when calling the GetDocumentTextDetection operation (reached max retries: 4): Provisioned rate exceeded

My documents are 1-3 pages long and I'm only using the text detection functionality. I don't understand why it takes 5 minutes to OCR just 15 documents of 1-3 pages each.

Bobby
Asked 2 years ago, 1,713 views

1 answer

Hi,

If you're processing just 15 documents but hitting 35+ TPS on GetDocumentTextDetection, I'd suspect you're polling for job completion and the polling/retry configuration is a bit off.
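If you do stay with polling, the main fix is to back off between status checks rather than calling GetDocumentTextDetection in a tight loop. Here's a minimal sketch, assuming a boto3 Textract client; the function name `wait_for_text_detection` and its delay/timeout parameters are hypothetical, and the values are illustrative rather than tuned for any particular quota.

```python
import random
import time


def wait_for_text_detection(client, job_id, initial_delay=1.0, max_delay=20.0,
                            timeout=300.0, sleep=time.sleep):
    """Poll a Textract text-detection job until it leaves IN_PROGRESS.

    `client` is assumed to be a boto3 Textract client (or any object exposing
    get_document_text_detection with the same response shape). `sleep` is
    injectable so the loop can be tested without real waiting.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        # MaxResults=1 keeps each status check small; fetch the full
        # results separately once the job has finished.
        resp = client.get_document_text_detection(JobId=job_id, MaxResults=1)
        status = resp["JobStatus"]
        if status != "IN_PROGRESS":
            return status  # SUCCEEDED, FAILED or PARTIAL_SUCCESS
        if waited >= timeout:
            raise TimeoutError(f"Job {job_id} still IN_PROGRESS after {waited:.0f}s")
        # Exponential backoff with "equal jitter": wait between delay/2 and
        # delay, so concurrent pollers don't hit the API in lockstep.
        pause = delay / 2 + random.uniform(0, delay / 2)
        sleep(pause)
        waited += pause
        delay = min(delay * 2, max_delay)
```

The growing delay means a job that runs for a minute or two gets checked far fewer times than a fixed-interval loop would check it, which is usually where the extra TPS comes from.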

In my experience, yes, you should be able to achieve better throughput than 15 documents of 1-3 pages in 5 minutes. But it's worth noting that response time through the async APIs can vary, and high throughput across many documents is a different topic from end-to-end response time for one small document.

The more documents you process in parallel (the default quota for that is in the hundreds), the less appropriate polling becomes as a strategy for fetching the results. Using SNS callbacks via Lambda ensures you only try to fetch each result once it becomes available. In that Lambda function you should still check your AWS SDK retry configuration (e.g. the retry configuration docs for Python boto3) to avoid issues if, for example, many jobs happen to finish at once.
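A rough sketch of the SNS + Lambda approach in Python, assuming each job is started with a `NotificationChannel` pointing at an SNS topic that the Lambda function subscribes to. The function names (`get_client`, `start_job`, `get_all_blocks`, `handler`), the bucket/key/topic/role arguments, and the retry values (`max_attempts=10`, `mode="adaptive"`) are illustrative assumptions, not settings tuned for your account.

```python
import json

_client = None


def get_client():
    """Create the Textract client once per Lambda container and reuse it."""
    global _client
    if _client is None:
        # Imported lazily so the helpers below can be exercised without boto3.
        import boto3
        from botocore.config import Config

        # Illustrative retry settings: "adaptive" mode adds client-side rate
        # limiting on top of the "standard" retry behaviour.
        _client = boto3.client(
            "textract",
            config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
        )
    return _client


def start_job(client, bucket, key, topic_arn, role_arn):
    """Start async text detection and have Textract notify SNS on completion."""
    resp = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        NotificationChannel={"SNSTopicArn": topic_arn, "RoleArn": role_arn},
    )
    return resp["JobId"]


def get_all_blocks(client, job_id):
    """Collect every Block for a finished job, following NextToken pages."""
    blocks = []
    kwargs = {"JobId": job_id, "MaxResults": 1000}
    while True:
        resp = client.get_document_text_detection(**kwargs)
        blocks.extend(resp.get("Blocks", []))
        token = resp.get("NextToken")
        if not token:
            return blocks
        kwargs["NextToken"] = token


def handler(event, context, client=None):
    """Lambda entry point subscribed to the Textract completion SNS topic."""
    if client is None:
        client = get_client()
    results = {}
    for record in event["Records"]:
        # The SNS payload is a JSON string that includes JobId and Status.
        message = json.loads(record["Sns"]["Message"])
        if message.get("Status") != "SUCCEEDED":
            # Skipped here; real code might log, alert, or re-queue the job.
            continue
        blocks = get_all_blocks(client, message["JobId"])
        lines = [b["Text"] for b in blocks if b.get("BlockType") == "LINE"]
        results[message["JobId"]] = "\n".join(lines)
    return results
```

Because each invocation only fetches results for jobs that have actually finished, no GetDocumentTextDetection calls are spent on status checks; the retry config then only has to absorb bursts when many jobs complete together.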

Code samples like the Amazon-Textract-Caller (not sure if you're using this?) try to offer a helpful utility for small-scale projects, but they have to trade off keeping to a simple client-only solution against being scalable for bigger workloads.

Stacks like "Large scale document processing with Amazon Textract" optimize more for scalability, but that means you need to deploy some cloud components too (SNS, Lambda, SQS, etc.).

For CDK infrastructure-as-code, you could check out the patterns in amazon-textract-idp-stack-samples, or maybe amazon-textract-transformer-pipeline (a bit more complex, because it combines Textract with post-processing models).

Answered 2 years ago by Alex_T (AWS Expert)
