What is the best practice for archiving items to S3 using DynamoDB TTL?


I am considering ways to archive items to S3 using DynamoDB TTL. According to this official AWS blog, the architecture is

**DynamoDB Stream => Lambda => Firehose => S3**

Why not write directly from Lambda to S3?

**DynamoDB Stream => Lambda => S3**

Thank you!

hai
asked 2 years ago · 1602 views
3 Answers
Accepted Answer

**DynamoDB Stream => Lambda => Firehose => S3**

This would be the recommended method. As Marco has already mentioned, Firehose acts as a buffer. While the Lambda may take a batch of 100 records, for example, writing them out individually would result in 100 PutObject requests to S3. Using Firehose instead, it merges the records into larger files and can also handle partitioning, which results in cheaper Puts to S3 and more efficient retrieval if needed. See CompressionFormat here.
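As a rough illustration of the Lambda step in that pipeline, the function below converts DynamoDB Stream `REMOVE` records into Firehose batch entries (newline-delimited JSON, so Firehose can concatenate many items into one large S3 object). This is a sketch, not the blog's actual code; the delivery stream name in the commented call is hypothetical.

```python
import json

def stream_event_to_firehose_records(event):
    """Convert DynamoDB Stream REMOVE records into Firehose batch entries.

    Each entry is newline-terminated JSON so Firehose can concatenate
    many entries into one large S3 object.
    """
    records = []
    for rec in event.get("Records", []):
        if rec.get("eventName") != "REMOVE":
            continue  # only archive items deleted from the table
        old_image = rec["dynamodb"].get("OldImage", {})
        records.append({"Data": (json.dumps(old_image) + "\n").encode("utf-8")})
    return records

# Inside the Lambda handler you would then forward the batch, e.g.:
# boto3.client("firehose").put_record_batch(
#     DeliveryStreamName="ttl-archive-stream",  # hypothetical name
#     Records=stream_event_to_firehose_records(event),
# )
```

Firehose then buffers these entries by size or time before writing a single combined object to S3.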

Furthermore, you can make use of Lambda Event Filters to invoke your function only when an item is evicted due to an expired TTL. I wrote a short blog about it here; it highlights how you can make the entire process more efficient.
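For reference, TTL deletions appear in the stream with a service identity, so a filter pattern along these lines (a sketch of an event source mapping filter, based on the DynamoDB Streams record format) lets Lambda skip ordinary deletes:

```json
{
  "Filters": [
    {
      "Pattern": "{\"userIdentity\":{\"type\":[\"Service\"],\"principalId\":[\"dynamodb.amazonaws.com\"]}}"
    }
  ]
}
```

With this applied to the event source mapping, only records removed by the TTL process invoke the function, so you are not billed for invocations on user-initiated deletes.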

AWS
EXPERT
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • Thank you @Leeroy. Is it possible to merge 100 records inside my Lambda function and do one PutObject to S3?

  • Yes, you can do that, but you will have to write the code to merge and compress the items yourself, and you will also lose the partitioning ability.
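If you do go that direct route, the merge-and-compress step might look like the sketch below: it packs a batch of expired items into one gzipped JSON-lines blob suitable for a single PutObject call (the bucket and key in the commented call are hypothetical).

```python
import gzip
import json

def merge_batch_to_gzip(old_images):
    """Merge a batch of expired items into one gzipped JSON-lines blob,
    suitable for a single S3 PutObject call."""
    body = "".join(json.dumps(img) + "\n" for img in old_images)
    return gzip.compress(body.encode("utf-8"))

# One upload per invocation instead of one PutObject per item, e.g.:
# boto3.client("s3").put_object(
#     Bucket="my-archive-bucket",          # hypothetical bucket
#     Key="archive/batch-0001.json.gz",    # hypothetical key
#     Body=merge_batch_to_gzip(images),
# )
```

Note you still only get one object per Lambda invocation, whereas Firehose merges across many invocations and handles date-based partitioning for you.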


I also reached out to the author of the blog post, and he gave a similar answer with more detail, which I would like to share:

> Lambda has a max invocation payload of 6 MB, so even if your batch size is 10,000, with larger records you'll be limited by the 6 MB limit. With Firehose it's easy to get objects in the 100s of MB range on a busy stream.

Thank you all! I feel great support here and have learned a lot!

hai
answered 2 years ago

Firehose acts as a buffer. When you write directly to S3, this can result in a lot of requests, and you also pay for each request to S3. Without that buffering, S3 request charges can end up being a huge portion of your bill.
This happened to me with an IoT example a long time ago.

AWS
Marco
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • @Marco Thank you! However, according to this official AWS blog, a batch size of 100 is configured between the DynamoDB stream and Lambda. That means Lambda receives a batch of 100 records; can it then write those 100 records to S3 directly?
