Using DynamoDB to synchronize instances of the same Lambda function in Java

Hello,

What is the fastest recommended way to check whether an item exists in a DynamoDB table (without retrieving any data from it) using the AWS SDK for Java?

The use case: I want to use the result to prevent an idempotent Lambda function from re-running complex calculations (a kind of synchronization across the pool of instances of the same Lambda function). The function receives events with different keys that refer to the same picture in S3, performs some calculations, and inserts the result into DynamoDB. Is it advisable to synchronize on this DynamoDB item, so that the Lambda function processes each picture only once?

Does AWS have another preferred method for this (faster, cheaper, or more standard)?

Thank you,
Mihai ADAM

Asked 8 months ago · 183 views
3 Answers

Hi all,

I will think about the solution, because I need to balance the complexity of making the Lambda code idempotent when multiple executions happen against the complexity of a mechanism built to prevent multiple executions. (Idempotent function = a function that has no additional effect when called more than once with the same input parameters.)

What do you think about using a static field in the Lambda function (it would hold the inserted keys, similar to solution 2)? Is that good practice? Does the Lambda model support it?

Thank you, Mihai

Answered 8 months ago
  • Lambda functions are stateless. You can use global variables to keep values between invocations; however, there is no guarantee that the next invocation will reach the same instance, so you cannot rely on that.

  • Hello,

    When I mentioned a static field in a Java Lambda function, I meant a field that belongs to the loaded class rather than to each object. I assumed that, because the class is loaded only once, the next invocation would see the value even if it was set by a different instance (there is a lot of theory around the Java keywords static and volatile).

    But I had a surprise: after some tests I found that not even the class is unique. The class is loaded again for each instance. Isn't this a performance drawback, or is there a reason in the Java Lambda model for classes to be loaded so often? Or have I done something wrong or misconfigured something?

    In the end, I chose something like solution 1, but it only retrieves the existing record for processing (no fast check with partial retrieval), and I added an SQS queue to delay events and avoid concurrency in the Lambda function. If concurrency does happen, perhaps for a very few records, the same record will simply be overwritten in DynamoDB. For my use case this was the least complex to implement and the most reliable.

    Does DynamoDB have a transactional lock at the database level for updates?

    Thank you,
    Mihai

  • Each Lambda instance runs in its own micro VM. For that reason, instances share nothing between them (just as if they were running on two different hosts), and each instance creates its own copy of the classes, even if a class is a singleton.

    There is some performance penalty for that. It is called a cold start. There are ways to reduce it, such as SnapStart for Java. Usually the percentage of cold starts compared to warm starts is very low (as we reuse execution environments). You will see a higher number of cold starts in dev environments because you change code often and because warm instances remain in memory for only a few minutes.
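
To make the point about static fields concrete, here is a minimal sketch, assuming the static field is used only as a best-effort local cache in front of a shared store. The class `PictureDedupSketch`, the method `shouldProcess`, and the `claimInSharedStore` predicate are hypothetical names for illustration and are not part of any AWS API; the predicate stands in for whatever atomic claim the shared store offers.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: class and method names are illustrative, not an AWS API.
final class PictureDedupSketch {

    // One copy per execution environment; other environments never see it.
    private static final Set<String> SEEN_LOCALLY = ConcurrentHashMap.newKeySet();

    // Stand-in for an atomic claim against a shared store
    // (returns true only if this caller won the claim).
    private final Predicate<String> claimInSharedStore;

    PictureDedupSketch(Predicate<String> claimInSharedStore) {
        this.claimInSharedStore = claimInSharedStore;
    }

    /** Returns true if the caller should run the calculation for this picture. */
    boolean shouldProcess(String pictureKey) {
        if (SEEN_LOCALLY.contains(pictureKey)) {
            return false; // fast path: this environment already handled the key
        }
        // Authoritative decision comes from the shared store, not the static field.
        boolean claimed = claimInSharedStore.test(pictureKey);
        SEEN_LOCALLY.add(pictureKey); // remember the key either way
        return claimed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the shared store, for demonstration only.
        Set<String> sharedStore = ConcurrentHashMap.newKeySet();
        PictureDedupSketch handler = new PictureDedupSketch(sharedStore::add);
        System.out.println(handler.shouldProcess("demo-picture")); // true
        System.out.println(handler.shouldProcess("demo-picture")); // false
    }
}
```

The static set only saves a round trip when the same execution environment happens to see the same key again. Correctness still depends entirely on the shared store, because a second environment starts with its own empty set.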

Hey Mihai ADAM,

For your use case, the fastest and recommended way to check whether an item exists in a DynamoDB table using the AWS SDK for Java is the GetItem operation with a ProjectionExpression that retrieves only a minimal set of attributes (ideally just the primary key). This minimizes the data transferred; note that read capacity is still charged based on the size of the item read, not on the projected attributes.

Expert
Answered 8 months ago

I don't think that checking whether an item exists is enough. You may run into a race condition where two instances check, both get a negative answer, and both start processing.

What you should do is use conditional operations. For example, when you start processing the object, insert an item with a specific key into the table, conditioned on that key not already existing. If two instances try to insert the same key at the same time, one will fail. Another approach, when the item is already in the table, is to use UpdateItem to change, e.g., a Status attribute to InProgress, but only if the current value is Idle (or something similar).
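
The semantics of both approaches can be sketched with a small in-memory model. This is not the DynamoDB API: `ConditionalWriteModel` and its methods are hypothetical stand-ins backed by a `ConcurrentHashMap`. In DynamoDB itself, the first method would correspond to a PutItem request with a condition expression such as `attribute_not_exists(...)` on the key attribute, and the second to an UpdateItem request conditioned on the current Status value; a failed condition is reported as a ConditionalCheckFailedException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// In-memory stand-in for a table keyed by picture key; NOT the DynamoDB API.
final class ConditionalWriteModel {
    enum Status { IDLE, IN_PROGRESS, DONE }

    private final Map<String, Status> items = new ConcurrentHashMap<>();

    /** Models a conditional insert: succeeds only for the first caller per key. */
    boolean insertIfAbsent(String key) {
        return items.putIfAbsent(key, Status.IN_PROGRESS) == null;
    }

    /** Models a conditional update: changes the status only if it equals 'expected'. */
    boolean transition(String key, Status expected, Status next) {
        return items.replace(key, expected, next);
    }

    Status statusOf(String key) {
        return items.get(key);
    }

    public static void main(String[] args) throws Exception {
        ConditionalWriteModel table = new ConditionalWriteModel();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // Eight concurrent "instances" try to claim the same picture.
        Callable<Boolean> attempt = () -> table.insertIfAbsent("picture-123");
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            results.add(pool.submit(attempt));
        }
        int winners = 0;
        for (Future<Boolean> result : results) {
            if (result.get()) {
                winners++;
            }
        }
        pool.shutdown();
        System.out.println("winners = " + winners); // prints "winners = 1"
    }
}
```

Because the check and the write happen as one atomic step, exactly one caller wins and the others learn immediately that they should skip the calculation. A separate "does it exist?" read followed by a write has no such guarantee.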

AWS
Expert
Uri
Answered 8 months ago
