Handle Duplicate S3 events.

Hey, I got to know that there's a chance S3 might trigger duplicate event notifications. In this case, how can I develop my code (i.e. the Lambda which handles the event) in such a way that it ignores the duplicate event?

Can we make use of the S3 sequencer field to keep track of S3 event notifications? If yes, how? If not, is there any other solution? It would be great if the solution is given in TypeScript.

Kishore
asked a month ago · 245 views
4 Answers

Handling duplicate S3 event notifications in a Lambda function can be achieved by implementing some form of deduplication mechanism. While using the S3 sequencer field is a potential approach, it's not specifically designed for deduplication. Instead, you can leverage other strategies like maintaining a record of processed events and comparing incoming events against that record. Here's how you can implement this in TypeScript:

import { S3Handler, S3EventRecord } from 'aws-lambda';

// In-memory record of processed object keys. Note that this only persists
// while the Lambda execution environment stays warm.
const processedEventKeys = new Set<string>();

export const s3EventHandler: S3Handler = async (event, context) => {
    try {
        for (const record of event.Records) {
            const s3Event: S3EventRecord['s3'] = record.s3;
            const eventKey: string = s3Event.object.key;

            // Check if the event key has been processed before
            if (!processedEventKeys.has(eventKey)) {
                // Process the S3 event
                await processS3Event(s3Event);

                // Add the event key to the processed set
                processedEventKeys.add(eventKey);
            } else {
                console.log(`Skipping duplicate event for key: ${eventKey}`);
            }
        }
    } catch (error) {
        console.error('Error processing S3 event:', error);
        throw error;
    }
};

async function processS3Event(s3Event: S3EventRecord['s3']) {
    // Your S3 event processing logic goes here
    console.log('Processing S3 event:', s3Event);
}

We maintain a Set called processedEventKeys to store the keys of S3 events that have been processed.

When a new S3 event is received, we check if its key is already in the processedEventKeys set.

If the event key is not in the set, we proceed with processing the event and add its key to the set. Otherwise, we skip the event as it's considered a duplicate.

The processS3Event function represents your actual S3 event processing logic.

answered a month ago
  • Sorry, I don't think this solves my problem, because if a user has pushed data to the same S3 location or object, I need to process it. I should skip processing only when AWS duplicates the event due to its own infra issue.

Hi Kishore

Yes, you can use the S3 event's sequencer field to help prevent processing duplicate events in your Lambda function. The sequencer provides a way to determine the order of events for a given object key, and duplicate notifications of the same event carry the same sequencer value, so it can be used to detect and skip them.

Here's how you can implement this in TypeScript:

import { S3Handler } from 'aws-lambda';

// In-memory record of sequencers already handled (only survives while this
// execution environment stays warm)
const processedSequencers = new Set<string>();

export const s3EventHandler: S3Handler = async (event) => {
  try {
    for (const record of event.Records) {
      const sequencer = record.s3.object.sequencer;

      // Check if the sequencer has been processed before
      if (processedSequencers.has(sequencer)) {
        console.log(`Duplicate event detected. Skipping processing for sequencer: ${sequencer}`);
        continue;
      }

      // Process the S3 event
      console.log(`Processing S3 event for sequencer: ${sequencer}`);
      
      // Your processing logic here
      
      // Add the sequencer to the set of processed sequencers
      processedSequencers.add(sequencer);
    }
  } catch (error) {
    console.error('Error processing S3 event:', error);
    throw error;
  }
};

In this code:

We use a Set processedSequencers to keep track of the sequencers that have been processed. For each S3 event record, we check if the sequencer has already been processed. If it has, we skip processing the event. If the sequencer has not been processed before, we proceed with processing the event and add the sequencer to the set of processed sequencers.

answered a month ago
  • What if I rebuild my service? I will lose the stored sequencer IDs, right? The example you have shown is fine, but isn't it going to increase the Lambda's memory usage and crash it?

Hi,

I'd suggest you read this blog post in detail for the right solution: https://aws.amazon.com/blogs/storage/manage-event-ordering-and-duplicate-events-with-amazon-s3-event-notifications/

I am not sure that the solutions proposed above deal with the fact that different Lambda instances may execute in different containers, with no access to a shared runtime data structure. The solution proposed in the post instead uses DynamoDB, which can be shared across containers, whether on the same server or different servers.
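
As a rough sketch of the pattern described in the post (not the exact code from it; the table name, attribute names, and TTL value below are just examples), each notification performs a conditional write of the object key plus sequencer into a DynamoDB table, and a failed condition check means the event was already handled:

import { DynamoDBClient, PutItemCommand, ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';

const ddb = new DynamoDBClient({});
const TABLE_NAME = process.env.DEDUP_TABLE_NAME ?? 's3-event-dedup'; // example table name

// Returns true if this (key, sequencer) pair has not been seen before,
// false if another invocation already recorded it.
async function claimEvent(objectKey: string, sequencer: string): Promise<boolean> {
  try {
    await ddb.send(new PutItemCommand({
      TableName: TABLE_NAME,
      Item: {
        // The object key plus sequencer identifies one S3 event; duplicates share both.
        pk: { S: `${objectKey}#${sequencer}` },
        // Optional TTL attribute so old entries expire after a day (attribute name is an example).
        expireAt: { N: String(Math.floor(Date.now() / 1000) + 24 * 60 * 60) },
      },
      // Fail the write if this event was already recorded.
      ConditionExpression: 'attribute_not_exists(pk)',
    }));
    return true;
  } catch (err) {
    if (err instanceof ConditionalCheckFailedException) {
      return false; // duplicate notification
    }
    throw err;
  }
}

In the Lambda handler you would call claimEvent(record.s3.object.key, record.s3.object.sequencer) for each record and skip processing when it returns false. Because the check-and-write happens in DynamoDB, it works across concurrent Lambda instances and survives redeployments, unlike an in-memory Set.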

Best,

Didier

answered a month ago

You can leverage AWS Lambda Powertools for TypeScript to achieve idempotency. Extract a unique key from the payload (e.g., the ETag or object key) as the idempotency key, and use the Idempotency utility to ensure your Lambda function performs idempotent operations. This uses a DynamoDB table to store idempotency keys and is safe to run across multiple Lambda instances. A minimal sketch is shown after the links below.

Repository: https://github.com/aws-powertools/powertools-lambda-typescript

Workshop: https://catalog.workshops.aws/powertools-for-aws-lambda/en-US/003-workshop/002-typescript/module1/004-making-the-operation-idempotent
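
A minimal sketch of how this could look, assuming a DynamoDB table named IdempotencyTable, the object key plus sequencer as the idempotency key, and one record per invocation (which is how S3 notifications are typically delivered); check the repository above for the current API:

import { makeIdempotent, IdempotencyConfig } from '@aws-lambda-powertools/idempotency';
import { DynamoDBPersistenceLayer } from '@aws-lambda-powertools/idempotency/dynamodb';
import type { Context, S3Event } from 'aws-lambda';

// Table where Powertools stores its idempotency records (name is an example).
const persistenceStore = new DynamoDBPersistenceLayer({
  tableName: 'IdempotencyTable',
});

// Derive the idempotency key from the object key and sequencer, so a genuine
// re-upload to the same key (new sequencer) is still processed while a
// duplicated notification of the same event is skipped.
const config = new IdempotencyConfig({
  eventKeyJmesPath: '[Records[0].s3.object.key, Records[0].s3.object.sequencer]',
});

export const handler = makeIdempotent(
  async (event: S3Event, _context: Context) => {
    for (const record of event.Records) {
      console.log('Processing S3 object:', record.s3.object.key);
      // Your processing logic here
    }
  },
  { persistenceStore, config },
);

Duplicate invocations with the same idempotency key return the stored result instead of re-running your logic, and the records are shared through DynamoDB across all Lambda instances.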

AWS
answered 23 days ago
