Handling duplicate S3 event notifications in a Lambda function requires some form of deduplication mechanism. While the S3 sequencer field is a potential input, it is designed primarily for ordering rather than deduplication. Instead, you can maintain a record of processed events and compare incoming events against that record. Here's how you can implement this in TypeScript:
```typescript
import { S3Handler, S3EventRecord } from 'aws-lambda';

// In-memory set of processed object keys (persists only for a warm container)
const processedEventKeys = new Set<string>();

export const s3EventHandler: S3Handler = async (event, context) => {
  try {
    for (const record of event.Records) {
      const eventKey: string = record.s3.object.key;

      // Check if the event key has been processed before
      if (!processedEventKeys.has(eventKey)) {
        // Process the S3 event
        await processS3Event(record);
        // Add the event key to the processed set
        processedEventKeys.add(eventKey);
      } else {
        console.log(`Skipping duplicate event for key: ${eventKey}`);
      }
    }
  } catch (error) {
    console.error('Error processing S3 event:', error);
    throw error;
  }
};

async function processS3Event(record: S3EventRecord) {
  // Your S3 event processing logic goes here
  console.log('Processing S3 event:', record.s3.object.key);
}
```
- We maintain a `Set` called `processedEventKeys` to store the keys of S3 events that have already been processed.
- When a new S3 event is received, we check whether its key is already in the `processedEventKeys` set.
- If the key is not in the set, we process the event and add its key to the set; otherwise we skip the event as a duplicate.
- The `processS3Event` function represents your actual S3 event processing logic.
Hi Kishore
Yes, you can use the S3 event's sequencer field to help prevent processing duplicate events in your Lambda function. The sequencer field provides a way to determine the order of events for a given object key, and a redelivered notification carries the same sequencer value, so it can be used to identify duplicates.
Here's how you can implement this in TypeScript:
```typescript
import { S3Handler } from 'aws-lambda';

// In-memory set of processed sequencers (persists only for a warm container)
const processedSequencers = new Set<string>();

export const s3EventHandler: S3Handler = async (event) => {
  try {
    for (const record of event.Records) {
      const sequencer = record.s3.object.sequencer;

      // Check if the sequencer has been processed before
      if (processedSequencers.has(sequencer)) {
        console.log(`Duplicate event detected. Skipping processing for sequencer: ${sequencer}`);
        continue;
      }

      // Process the S3 event
      console.log(`Processing S3 event for sequencer: ${sequencer}`);
      // Your processing logic here

      // Add the sequencer to the set of processed sequencers
      processedSequencers.add(sequencer);
    }
  } catch (error) {
    console.error('Error processing S3 event:', error);
    throw error;
  }
};
```
In this code:
We use a Set processedSequencers to keep track of the sequencers that have been processed. For each S3 event record, we check if the sequencer has already been processed. If it has, we skip processing the event. If the sequencer has not been processed before, we proceed with processing the event and add the sequencer to the set of processed sequencers.
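Beyond exact-duplicate detection, sequencers for the same object key can also be compared to spot out-of-order delivery. A minimal sketch; the `isNewerSequencer` helper is hypothetical, and the rule of right-padding the shorter hex string with zeros before comparing is an assumption based on AWS's published guidance, not part of the answer above:

```typescript
// Hypothetical helper: compare two S3 sequencer values for the same object
// key. Sequencers are hex strings that may differ in length; we right-pad
// the shorter one with '0' before a lexicographic comparison (an assumption
// based on AWS guidance). Returns true if `a` represents a later event than `b`.
function isNewerSequencer(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  return a.padEnd(len, '0') > b.padEnd(len, '0');
}

// Example: the lexicographically larger hex string is the later event
console.log(isNewerSequencer('0055AED6DCD90281E6', '0055AED6DCD90281E5')); // true
```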
What if I rebuild my service? I will lose the stored sequencer IDs, right? The example you have shown is fine, but isn't it going to grow the Lambda's memory and eventually crash?
Hi,
I'd suggest you read this blog post in detail for the right solution: https://aws.amazon.com/blogs/storage/manage-event-ordering-and-duplicate-events-with-amazon-s3-event-notifications/
I am not sure the solutions proposed above deal with the fact that different Lambda instances may execute in different containers with no access to a shared, runtime-stored data structure. The solution proposed in the post instead uses DynamoDB, which can be shared across containers on the same server, different servers, etc.
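One way to make the deduplication shared across containers, in the spirit of the DynamoDB approach the blog describes, is a conditional write: record each object's latest sequencer and let the condition reject duplicates and out-of-order events. A minimal sketch, not the blog's exact code; the table name, the `pk` key schema, and the `buildDedupPut` helper are assumptions, and the request shape targets `PutItemCommand` from `@aws-sdk/client-dynamodb`:

```typescript
// Hypothetical sketch: build the input for a conditional DynamoDB PutItem.
// Sending this via PutItemCommand (@aws-sdk/client-dynamodb) succeeds only
// for new or newer events; a ConditionalCheckFailedException signals a
// duplicate or out-of-order event that should be skipped.
// Table name and key schema ('pk' as partition key) are assumptions.
function buildDedupPut(tableName: string, objectKey: string, sequencer: string) {
  return {
    TableName: tableName,
    Item: {
      pk: { S: objectKey },
      sequencer: { S: sequencer },
    },
    // Write only if no record exists yet, or the stored sequencer is older
    ConditionExpression: 'attribute_not_exists(pk) OR sequencer < :seq',
    ExpressionAttributeValues: { ':seq': { S: sequencer } },
  };
}
```

In the handler you would `await client.send(new PutItemCommand(buildDedupPut(...)))` in a try/catch and skip the record when the condition fails; unlike an in-memory set, the state survives cold starts and is visible to all concurrent instances.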
Best,
Didier
You can leverage AWS Lambda Powertools for TypeScript to achieve idempotency. Extract a unique key from the payload (e.g., the eTag or object key) as the idempotency key, and use the Idempotency utility to ensure your Lambda function performs idempotent operations. It uses a DynamoDB table to store idempotency keys, and it's safe to run across multiple Lambda instances.
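As a sketch of the key-extraction step only (the `idempotencyKey` helper and the bucket/key/eTag combination are my assumptions, not Powertools API): combining the object's location with its eTag means a genuine re-upload to the same key produces a new eTag and is processed as new work, while a redelivered notification for the same upload keeps the same eTag and is recognized as a duplicate.

```typescript
// Minimal structural type covering only the fields we read
// (a subset of the real S3Event type from 'aws-lambda').
type S3EventLike = {
  Records: Array<{
    s3: {
      bucket: { name: string };
      object: { key: string; eTag: string };
    };
  }>;
};

// Hypothetical helper: derive a stable idempotency key for one record.
// A re-upload of the same object yields a new eTag and therefore a new
// key (so it is processed); a redelivered event keeps the same eTag and
// maps to a key already seen (so it is deduplicated).
function idempotencyKey(event: S3EventLike, index = 0): string {
  const { bucket, object } = event.Records[index].s3;
  return `${bucket.name}/${object.key}#${object.eTag}`;
}
```

With Powertools you would point the Idempotency utility's configuration at this value (for example via a JMESPath expression selecting the eTag) rather than computing it by hand.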
Repository: https://github.com/aws-powertools/powertools-lambda-typescript
Sorry, I don't think this solves my problem. If a user has pushed data to the same S3 location or object, I need to process it; I should skip processing only when AWS duplicates the event due to its own infrastructure issue.