Textract is timing out


I have been using a Lambda function for the last two years. Recently one of the databases moved into a VPC, so the Lambda function is now configured to connect through that VPC. Since then, whenever I try to extract text with the Python Textract methods, such as textract.detect_document_text(Document=doc_spec), I get a timeout error. I increased the Lambda function timeout to 2 minutes, but the error "Task timed out after 120.03 seconds" still shows up.

asked 4 months ago, 257 views
2 Answers

Hi Ajay,

To address the timeout issue when using Amazon Textract with a Lambda function within a VPC, you can try the following steps:

  1. Increase the Lambda Function Timeout: You have already increased the timeout to 2 minutes, but it might be necessary to increase it further depending on the size of the document you are processing. Consider increasing the timeout to a higher value, such as 5 minutes (300 seconds).

  2. Configure the Lambda Function to Access Amazon Textract: Ensure your Lambda function is configured correctly to access Amazon Textract. This includes:

    • Configuring the IAM role with the necessary permissions to call Textract.
    • Ensuring that the Lambda function has a network path to the Textract API, which is a public AWS endpoint outside your VPC (e.g., through a NAT Gateway or a VPC endpoint for Textract).
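  3. Subnets and Security Groups: Make sure your Lambda function is associated with private subnets whose route tables send outbound traffic to a NAT Gateway. A route straight to an Internet Gateway is not enough, because Lambda network interfaces in a VPC do not get public IP addresses. Also, configure the security groups to allow the necessary outbound traffic, in particular HTTPS on port 443.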

  4. Check the Regional Endpoint for Textract: If you are using VPC endpoints for AWS services, ensure that the endpoint for Amazon Textract is correctly configured. Add a VPC endpoint for Textract if needed.

  5. Divide the Document: If the document is very large, consider splitting the document into smaller parts and processing them separately.

  6. Logs and Metrics: Use CloudWatch Logs to get more information about the timeout error. Check the Lambda function's metrics in CloudWatch to see the average execution time and adjust accordingly.
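To make step 6 more useful, it can help to have the Textract call fail quickly with a clear network error instead of silently using up the whole Lambda timeout. botocore's default connect and read timeouts are 60 seconds each, and failed calls are retried, so an unreachable endpoint can easily outlast a 2-minute function timeout. The sketch below is only an illustration: the names `FAIL_FAST` and `make_textract_client` and the specific timeout values are assumptions, not something from the question.

```python
# Hypothetical fail-fast settings; tune the values for your documents.
FAIL_FAST = {
    "connect_timeout": 5,   # seconds to wait for the TCP/TLS connection
    "read_timeout": 30,     # seconds to wait for a response once connected
    "retries": {"max_attempts": 2, "mode": "standard"},
}


def make_textract_client(region_name=None):
    """Build a Textract client that gives up long before the Lambda timeout."""
    # Imported here so the settings above can be inspected without boto3;
    # boto3 is already available in the Lambda Python runtime.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "textract",
        region_name=region_name,
        config=Config(**FAIL_FAST),
    )
```

With a client built this way, a call such as `make_textract_client().detect_document_text(Document=doc_spec)` should raise a connection error (for example botocore's `ConnectTimeoutError`) within seconds when there is no route to Textract, and that exception then appears in CloudWatch Logs rather than a bare "Task timed out" line.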

If these solutions do not resolve the issue, please share some logs with us for further investigation.

Best regards.

EXPERT
answered 4 months ago
EXPERT
reviewed 4 months ago

In the middle of the other answer is the thing that I would rate as the most likely:

Make sure that you add a VPC endpoint for the Textract service. If you have not done this then the Lambda function (most probably) cannot reach the Textract API endpoint.
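One way to confirm whether the function can reach Textract at all is a plain TCP probe from inside the Lambda function. This is a rough sketch using only the standard library; the helper names `textract_host` and `can_reach` are made up for illustration, and the hostname assumes the standard regional endpoint pattern `textract.<region>.amazonaws.com`.

```python
import socket


def textract_host(region):
    """Return the standard regional Textract hostname (assumed pattern)."""
    return f"textract.{region}.amazonaws.com"


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Inside the handler you could log something like `can_reach(textract_host(os.environ["AWS_REGION"]))` (after `import os`). A `False` result points at routing, DNS, or security groups rather than at Textract itself. Note that this checks the network path only, not IAM permissions.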

Note that you can also use a NAT Gateway/Internet Gateway combination - so what you do here depends on the routing and other networking within your VPC.

For reference, here's a list of the services that can be reached via VPC endpoints: https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html
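If you go the VPC endpoint route, the endpoint can be created from the console, or with boto3 roughly as sketched here. Treat this as an assumption-laden example: the helper names are invented, the service name is assumed to follow the usual `com.amazonaws.<region>.textract` pattern (check it against the list linked above or with `describe_vpc_endpoint_services`), and the VPC, subnet, and security group IDs are placeholders you would supply yourself.

```python
def textract_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build the arguments for an interface VPC endpoint to Textract."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Assumed service name pattern; verify it for your region.
        "ServiceName": f"com.amazonaws.{region}.textract",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS lets the default Textract hostname resolve to the endpoint,
        # so existing boto3 code needs no endpoint_url change.
        "PrivateDnsEnabled": True,
    }


def create_textract_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint and return its ID (requires ec2:CreateVpcEndpoint)."""
    import boto3  # imported lazily so the parameter builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = textract_endpoint_params(region, vpc_id, subnet_ids, security_group_ids)
    response = ec2.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

The security group attached to the endpoint needs to allow inbound HTTPS (port 443) from the Lambda function's security group, otherwise calls will still time out.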

AWS
EXPERT
answered 4 months ago
EXPERT
reviewed 4 months ago
