How can I refactor this function in my Lambda? It contains a polling loop.

Hello,

I am using a Lambda function with Amazon Textract, and I just noticed a problem in the following code:

export async function startExpenseAnalysis(fileObj: FileObj): Promise<string> {
  try {
    const command = new StartExpenseAnalysisCommand({
      DocumentLocation: {
        S3Object: {
          Bucket: fileObj.s3.bucket.name,
          Name: fileObj.s3.object.key,
        },
      },
    });
    const response = await textractClient.send(command);
    return response.JobId;
  } catch (error) {
    console.error(
      `Error starting expense analysis for object ${fileObj.s3.object.key} from bucket ${fileObj.s3.bucket.name}:`,
      error
    );
    throw error;
  }
}

export async function waitForCompletionAndDeleteMessage(
  jobId: string,
  receiptHandle: string
): Promise<ApiAnalyzeExpenseResponse> {
  let data;
  while (!data || data.JobStatus !== "SUCCEEDED") {
    const command = new GetExpenseAnalysisCommand({ JobId: jobId });
    data = await textractClient.send(command);
    console.log(data.JobStatus);
    if (data.JobStatus !== "SUCCEEDED") {
      // await new Promise((resolve) => setTimeout(resolve, 12000000));
      await new Promise((resolve) => setTimeout(resolve, 5000));
    }
  }
  await deleteMessageFromQueue(receiptHandle);
  return data;
}

At first it worked well because I was testing with one request at a time. With more than one request, however, if one request takes a long time or gets stuck in the while loop, no new requests are processed, because the first one keeps the function busy.

How should I call GetExpenseAnalysisCommand only once I know the job has finished?

I looked into this on my own and found some possible solutions (adding a timeout, limiting the number of retries, or using a polling service), but I am not sure which one to choose.

GerLC
asked 2 months ago, 162 views
2 Answers
Accepted Answer

I am not sure what exactly the problem is. You start an analysis operation, you get back a JobId, and then you use it to poll the Textract service. If you have more than one command running at a time, they should not affect each other.

BTW, you may want to consider moving to an asynchronous flow: start the analysis in one function and use the SNS notification channel to invoke a different function when the operation finishes. This way you do not need to poll the service. Of course, this works only if no client is waiting for the response from the first function.
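
As a rough illustration of the second function in that flow, here is a minimal sketch of parsing the completion message that arrives through SNS. It assumes the message body is JSON carrying `JobId` and `Status` fields, as the Textract documentation describes for completion notifications. The names `parseTextractNotifications` and `succeededJobIds` are hypothetical, and the event types are simplified local stand-ins for the ones in `@types/aws-lambda`.

```typescript
// Simplified local stand-ins for the SNS event types from @types/aws-lambda.
interface SnsRecord {
  Sns: { Message: string };
}
interface SnsEvent {
  Records: SnsRecord[];
}

// Assumed shape of the Textract completion message (only the fields used here).
interface TextractNotification {
  JobId: string;
  Status: string;
}

// Hypothetical helper: decode every SNS record into a Textract notification.
export function parseTextractNotifications(
  event: SnsEvent
): TextractNotification[] {
  return event.Records.map((record) => {
    const body = JSON.parse(record.Sns.Message) as Partial<TextractNotification>;
    if (typeof body.JobId !== "string" || typeof body.Status !== "string") {
      throw new Error("Unexpected SNS message shape: missing JobId or Status");
    }
    return { JobId: body.JobId, Status: body.Status };
  });
}

// Hypothetical helper: keep only the jobs that finished successfully.
export function succeededJobIds(event: SnsEvent): string[] {
  return parseTextractNotifications(event)
    .filter((notification) => notification.Status === "SUCCEEDED")
    .map((notification) => notification.JobId);
}
```

The handler subscribed to the topic could then send one GetExpenseAnalysisCommand per returned JobId, with no loop and no waiting, since the job is already finished when the message arrives.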

Uri (AWS, EXPERT)
answered 2 months ago
reviewed by an EXPERT 2 months ago
  • Hello Uri, thanks for your time!

    There is indeed a client waiting for the response, which is why I am looking for alternatives. A frontend client makes the request and waits for the answer extracted by Textract.

  • What exactly is the problem you have at the moment?

  • I want to refactor the waitForCompletionAndDeleteMessage function so that it doesn't loop until GetExpenseAnalysisCommand returns SUCCEEDED. Some of my colleagues told me to change it because it could get stuck in an infinite loop and stop receiving new requests. How can I know when the JobStatus is SUCCEEDED, so that I can call GetExpenseAnalysisCommand without a while loop? I tried making it asynchronous. The flow is as follows: a receipt or invoice is uploaded to an S3 bucket, an SQS message with the bucket information is sent, and that message triggers the Lambda with the functions above (StartExpenseAnalysisCommand and GetExpenseAnalysisCommand). After processing, it sends another SQS message with the extracted data to another Lambda function that my team created.

    I am currently implementing your suggestion, using SNS.

  • If there is a client, i.e., the client made a call to API Gateway, which invoked your Lambda, and your Lambda invokes Textract, you have no option but to loop and wait for the job to complete. Otherwise, you can't return the response to the client synchronously. I have no idea why you think it can get stuck in an infinite loop, but if you are concerned, add a condition to your while loop to abort after a few tries.

    If you are open to changing the way you notify the client, then you should start the operation in one function and wait for the notification from Textract via SNS in a different function, which will send the response to the client. You can send the notification to the client in different ways: if it is a mobile device, send an SMS or mobile push notification (using SNS); establish a WebSocket and send the response on that connection; or poll from the client (I am not very fond of this option).
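
For the "abort after a few tries" suggestion above, a minimal sketch could look like the following. The helper `pollUntilDone` and its options are hypothetical names, not part of the AWS SDK. The `fetchJob` callback is assumed to wrap something like `textractClient.send(new GetExpenseAnalysisCommand({ JobId: jobId }))` from the question. The sketch also stops early on FAILED, which the original loop would never leave.

```typescript
// Only the field the loop inspects; the real Textract response has more.
interface JobResult {
  JobStatus?: string;
}

interface PollOptions {
  maxAttempts: number; // give up after this many status checks
  delayMs: number; // pause between checks
}

// Hypothetical helper: poll a job a bounded number of times.
export async function pollUntilDone<T extends JobResult>(
  fetchJob: () => Promise<T>,
  { maxAttempts, delayMs }: PollOptions
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = await fetchJob();
    if (data.JobStatus === "SUCCEEDED") {
      return data;
    }
    if (data.JobStatus === "FAILED") {
      // A failed job will never succeed, so stop instead of waiting.
      throw new Error(`Job failed (seen on attempt ${attempt})`);
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

The total wait is bounded by roughly `maxAttempts * delayMs` plus the time spent in the calls themselves, so it is worth choosing values that stay below the Lambda timeout (180 seconds in the configuration shown in the other answer).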

Hello,

I was trying to add the SNS notification channel to the command, as follows:

    const command = new StartExpenseAnalysisCommand({
      DocumentLocation: {
        S3Object: {
          Bucket: fileObj.s3.bucket.name,
          Name: fileObj.s3.object.key,
        },
      },
      NotificationChannel: {
        SNSTopicArn: process.env.SNS_TOPIC_ARN,
        RoleArn: process.env.ROLE_ARN || undefined,
      },
    });

And I have defined in my serverless.yml the following:

provider:
......
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - sqs:SendMessage
            - sqs:ReceiveMessage
            - sqs:DeleteMessage
          Resource: !GetAtt MySQSQueue.Arn
        - Effect: Allow
          Action:
            - textract:StartExpenseAnalysis
            - textract:GetExpenseAnalysis
          Resource: "*"
        - Effect: Allow
          Action:
            - s3:GetObject
            - s3:PutObject
          Resource: "*"
        - Effect: Allow
          Action:
            - sns:Publish
          Resource: !GetAtt TextractCompletionTopic.TopicArn
        - Effect: Allow
          Action:
            - lambda:InvokeFunction
          Resource: "*"
functions:
  startExpenseAnalysisJob:
    handler: src/functions/startExpenseAnalysisJob.handler
    timeout: 180
    maximumRetryAttempts: 0
    events:
      - sqs:
          arn: !GetAtt MySQSQueue.Arn
          batchSize: 1
  processExpenseAnalysisJob:
    handler: src/functions/processExpenseAnalysisJob.handler
    events:
      - sns:
          arn: !GetAtt TextractCompletionTopic.TopicArn
          topicName: TextractCompletionTopic-${self:provider.stage}
resources:
  Resources:
    TextractCompletionTopic:
      Type: AWS::SNS::Topic
      Properties:
        TopicName: TextractCompletionTopic-${self:provider.stage}
......

But the processExpenseAnalysisJob Lambda function is not being triggered: it does not receive any SNS messages. If I publish a message directly from the SNS page in the AWS console, the event does fire, but publishing through this part of the command does not trigger the Lambda function:

      NotificationChannel: {
        SNSTopicArn: process.env.SNS_TOPIC_ARN,
        RoleArn: process.env.ROLE_ARN || undefined,
      },

According to the docs (https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-textract/Interface/StartExpenseAnalysisRequest/), NotificationChannel is "The Amazon SNS topic ARN that you want Amazon Textract to publish the completion status of the operation to".

Please help :(
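
One thing that may be worth checking here: as far as I can tell from the Textract API reference, NotificationChannel needs both SNSTopicArn and RoleArn, and RoleArn is a role that Textract itself assumes to publish to the topic, which is separate from the Lambda execution role that has sns:Publish in the serverless.yml above. With `process.env.ROLE_ARN || undefined`, a missing variable is passed along silently. A minimal sketch that fails fast instead, assuming the same SNS_TOPIC_ARN and ROLE_ARN variable names as in the snippet above (the helper name `buildNotificationChannel` is hypothetical):

```typescript
// Local stand-in for the NotificationChannel shape used by the Textract client.
interface NotificationChannel {
  SNSTopicArn: string;
  RoleArn: string;
}

// Hypothetical helper: build the channel or throw if a variable is missing.
export function buildNotificationChannel(
  env: Record<string, string | undefined>
): NotificationChannel {
  const topicArn = env.SNS_TOPIC_ARN;
  const roleArn = env.ROLE_ARN;
  if (!topicArn) {
    throw new Error("SNS_TOPIC_ARN is not set");
  }
  if (!roleArn) {
    throw new Error("ROLE_ARN is not set");
  }
  return { SNSTopicArn: topicArn, RoleArn: roleArn };
}
```

It could be used as `NotificationChannel: buildNotificationChannel(process.env)` when constructing StartExpenseAnalysisCommand. This only surfaces a missing variable early; whether the role can actually be assumed by Textract and is allowed to publish to the topic still has to be verified in IAM.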

GerLC
answered 2 months ago