What is the best practice to archive items to S3 using DynamoDB TTL?


I am considering ways to archive items to S3 using DynamoDB TTL. According to this official AWS blog, the architecture is

**DynamoDB Stream => Lambda => Firehose => S3**

Why not write directly from Lambda to S3?

**DynamoDB Stream => Lambda => S3**

Thank you!

hai
Asked 2 years ago · 1667 views
3 Answers
Accepted Answer

**DynamoDB Stream => Lambda => Firehose => S3**

This would be the recommended method. As Marco has already mentioned, Firehose acts as a buffer. While the Lambda does take a batch of 100 records, for example, writing them straight to S3 would result in 100 PutObject requests. Using Firehose instead, the records are merged into larger files, and Firehose can also handle partitioning, which results in cheaper Puts to S3 and more efficient retrieval if needed. See CompressionFormat here.
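To make that middle hop concrete, here is a minimal sketch of what the Lambda stage might look like, assuming a Firehose delivery stream named `ttl-archive` already exists and delivers to your S3 bucket (the stream name and field handling are illustrative, not from the blog post):

```python
import json
import boto3

firehose = boto3.client("firehose")

DELIVERY_STREAM = "ttl-archive"  # hypothetical delivery stream name

def handler(event, context):
    """Forward expired items from the DynamoDB stream batch to Firehose."""
    records = []
    for record in event["Records"]:
        # TTL evictions arrive as REMOVE events; OldImage holds the expired item.
        if record["eventName"] == "REMOVE":
            item = record["dynamodb"].get("OldImage", {})
            records.append({"Data": (json.dumps(item) + "\n").encode("utf-8")})

    if records:
        # Firehose buffers these into larger, optionally compressed S3 objects,
        # so we avoid one PutObject per item. Production code should also check
        # FailedPutCount in the response and retry any failed records.
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM,
            Records=records,
        )
```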

Furthermore, you can make use of Lambda event filters so that your function is only invoked when an item is evicted because its TTL expired. I wrote a short blog about it here; it highlights how you can make the entire process more efficient.
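For reference, the filter pattern for TTL-driven deletions checks the `userIdentity` field that DynamoDB attaches to those stream records. A sketch of wiring it up with boto3, with a placeholder stream ARN and function name:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Matches only records written by the DynamoDB TTL service,
# not user-initiated deletes.
ttl_filter = {
    "userIdentity": {
        "type": ["Service"],
        "principalId": ["dynamodb.amazonaws.com"],
    }
}

lambda_client.create_event_source_mapping(
    # Placeholder ARN and function name.
    EventSourceArn="arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2023-01-01T00:00:00.000",
    FunctionName="ttl-archiver",
    StartingPosition="TRIM_HORIZON",
    BatchSize=100,
    FilterCriteria={"Filters": [{"Pattern": json.dumps(ttl_filter)}]},
)
```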

AWS
EXPERT
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • Thank you @Leeroy. Would it be possible to merge the 100 records inside my Lambda function and make a single PutObject call to S3?

  • Yes, you can do that, but you will have to write the code to merge and compress the items yourself, and you will also lose the partitioning ability; a rough sketch follows.
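For completeness, here is a rough sketch of that do-it-yourself approach, with a made-up bucket name and key scheme. Note that you give up Firehose's buffering across invocations and its dynamic partitioning:

```python
import gzip
import json
import uuid
import boto3

s3 = boto3.client("s3")

BUCKET = "my-ttl-archive-bucket"  # made-up bucket name

def handler(event, context):
    """Merge the whole stream batch into one gzipped S3 object."""
    lines = [
        json.dumps(record["dynamodb"].get("OldImage", {}))
        for record in event["Records"]
        if record["eventName"] == "REMOVE"
    ]
    if lines:
        body = gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
        # One PutObject for the whole batch instead of one per item.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"archive/{uuid.uuid4()}.json.gz",
            Body=body,
        )
```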


I also reached out to the author of the blog post, and he gave a similar answer with more detail, which I would like to share:

> Lambda has a max invocation payload of 6 MB, so even if your batch size is 10,000, with larger records you'll be limited by the 6 MB limit. With Firehose it's easy to get objects in the 100s of MB range on a busy stream.

Thank you all! I feel well supported and have learned a lot!

hai
answered 2 years ago

Firehose acts as a buffer. When you write directly to S3, each record can become its own request, and you also pay for the number of requests to S3. Without that buffering, S3 request charges can end up being a huge portion of your bill.
This happened to me with an IoT project a long time ago.
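As a back-of-the-envelope illustration of why those request charges add up. The PUT price below is an assumption (roughly $0.005 per 1,000 requests for S3 Standard in us-east-1); check current pricing for your region:

```python
# Illustrative only: the PUT price is an assumption, not from this thread.
records_per_day = 10_000_000
put_price_per_1k = 0.005  # USD per 1,000 PutObject requests (assumed)

# One PutObject per record:
direct = records_per_day / 1_000 * put_price_per_1k  # $50.00 per day

# Firehose buffering ~10,000 records into each delivered object:
buffered = (records_per_day / 10_000) / 1_000 * put_price_per_1k  # ~$0.005 per day

print(f"direct: ${direct:.2f}/day vs buffered: ${buffered:.3f}/day")
```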

AWS
Marco
answered 2 years ago
AWS
EXPERT
reviewed 2 years ago
  • @Marco Thank you! However, according to this official AWS blog, a batch size of 100 is configured between the DynamoDB stream and Lambda. Does that mean Lambda receives a batch of 100 records and can then write those 100 records to S3 directly?
