Textract API with Lambda - Getting InvalidS3ObjectException error


Hi, I am trying to run the same code in two ways:

  1. directly from AWS CloudShell with the 'python3 textract_doc_analysis.py' command,
  2. through Lambda.

In both cases I modified the code, but neither worked.

For the Lambda role, I added policies for S3 and Textract full access, in addition to the Lambda credentials. I also explicitly added the S3 object paths.

----------- Lambda Role --------------

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:us-east-2:xxxxxxxxx:"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:us-east-2:xxxxxxxx:log-group:/aws/lambda/textract_doc_analysis:"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::learn-textract-bucket-20230626*",
                "arn:aws:s3:::learn-textract-bucket-20230626/*",
                "arn:aws:s3:::learn-textract-bucket-20230626/pdf-invoices3.pdf",
                "arn:aws:s3:::learn-textract-bucket-20230626/pdf-sample1.pdf"
            ]
        }
    ]
}

---- error ------

"errorMessage": "An error occurred (InvalidS3ObjectException) when calling the StartDocumentAnalysis operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.",
"errorType": "InvalidS3ObjectException",
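The error text names three things to check: the object key, the region, and access permissions. The following is a minimal sketch of a pre-flight check that could be run with the same role before calling Textract. The helper names `normalize_s3_key` and `check_s3_object` are hypothetical and not part of boto3. The sketch assumes the object was uploaded without a leading slash in its key, which is typical but not guaranteed, and that the role is also allowed to call `GetBucketLocation`.

```python
def normalize_s3_key(key: str) -> str:
    """Strip leading slashes (assumption: the object was stored without one)."""
    return key.lstrip("/")


def check_s3_object(s3_client, bucket: str, key: str) -> dict:
    """Report whether the object is reachable and which region the bucket is in.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    report = {"bucket": bucket, "key": key, "exists": False,
              "region": None, "error": None}
    try:
        # HeadObject raises an error when the key is wrong or access is denied.
        s3_client.head_object(Bucket=bucket, Key=key)
        report["exists"] = True
        # LocationConstraint is None for buckets in us-east-1.
        location = s3_client.get_bucket_location(Bucket=bucket)
        report["region"] = location.get("LocationConstraint") or "us-east-1"
    except Exception as exc:  # botocore raises ClientError; kept broad for this sketch
        report["error"] = str(exc)
    return report
```

Inside the Lambda function this might be called as `check_s3_object(boto3.client("s3"), bucket, normalize_s3_key(doc))`. If `exists` is false, the key or the S3 permissions are the likely cause. If `region` differs from the Textract client's `region_name`, the region is the likely cause.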

----- code -------

import json
import boto3

def lambda_handler(event, context):
    boto3.set_stream_logger(name='botocore')

    # s = boto3.Session(profile_name="default")
    s = boto3.Session()  # ???? Not sure whether this is right ?????

    tx = s.client("textract", region_name='us-east-2')
    doc = "/pdf-files/sample_pay_stub.pdf"
    bucket = "learn-textract-bucket-20230626"

    resp = tx.start_document_analysis(
        DocumentLocation={
            "S3Object": {
                "Bucket": bucket,
                "Name": doc
            }
        },
        # FeatureTypes is required by the API; the value used in the
        # original post was lost, so this one is an assumption.
        FeatureTypes=["TABLES", "FORMS"]
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

asked 10 months ago, 284 views
2 Answers

Please refer to the sample Lambda function that uses Textract: https://docs.aws.amazon.com/textract/latest/dg/lambda.html

In its simplest form, it should look like this:

import boto3
import json
import os

def lambda_handler(event, context):
    # Create the Textract client
    textract = boto3.client('textract')
    # Call Amazon Textract on the S3 object that triggered the event
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': os.environ['BUCKET_NAME'],
                'Name': event['Records'][0]['s3']['object']['key']
            }
        }
    )
    # Return the full Textract response, which contains the detected text blocks
    return response
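The question calls the asynchronous StartDocumentAnalysis operation rather than detect_document_text, so a rough sketch of that variant may also help. It assumes a boto3 Textract client is passed in. The helper names `build_analysis_request` and `run_document_analysis`, the default feature types, and the polling interval are illustrative choices, not taken from the question or the AWS sample.

```python
import time


def build_analysis_request(bucket, key, feature_types=("TABLES", "FORMS")):
    """Build keyword arguments for Textract's start_document_analysis."""
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": list(feature_types),
    }


def run_document_analysis(textract_client, bucket, key,
                          poll_seconds=5, max_polls=60):
    """Start an asynchronous analysis job and poll until it leaves IN_PROGRESS.

    Returns only the first page of results; follow NextToken for the rest.
    """
    request = build_analysis_request(bucket, key)
    job_id = textract_client.start_document_analysis(**request)["JobId"]
    for _ in range(max_polls):
        result = textract_client.get_document_analysis(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Textract job {job_id} still in progress after {max_polls} polls"
    )
```

Polling inside a single Lambda invocation is the simplest approach but counts against the function timeout. For longer documents, the NotificationChannel parameter of StartDocumentAnalysis can publish the completion status to an SNS topic instead.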

answered 10 months ago

Yes. Got it. Thanks.

answered 10 months ago
