What is the best practice to archive items to S3 using DynamoDB TTL?


I am considering ways to archive items to S3 using DynamoDB TTL. According to this official AWS blog, the architecture is

**DynamoDB Stream => Lambda => Firehose => S3**

Why not write directly from Lambda to S3?

**DynamoDB Stream => Lambda => S3**

Thank you!

hai
Asked 2 years ago · 1,667 views

3 Answers
Accepted Answer

**DynamoDB Stream => Lambda => Firehose => S3** would be the recommended method. As Marco has already mentioned, Firehose acts as a buffer. While the Lambda does take a batch of, say, 100 records, writing directly would result in 100 PutObject requests to S3. Using Firehose instead, it merges the objects into larger files and can also handle partitioning, which results in cheaper Puts to S3 and more efficient retrieval if needed. See CompressionFormat here.

Furthermore, you can make use of Lambda Event Filters to invoke your function only when an item is evicted because its TTL expired. I wrote a short blog about it here. It highlights how you can make the entire process more efficient.
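As a rough sketch of what such an event filter can look like (the function name and stream ARN in the commented-out call are hypothetical; the pattern assumes TTL deletions arrive as stream records whose `userIdentity` is the DynamoDB service principal):

```python
import json

# Event filter pattern for a DynamoDB Streams -> Lambda event source mapping.
# TTL deletions are performed by the DynamoDB service, so filtering on
# userIdentity makes the function fire only for TTL-expired items.
ttl_filter_pattern = {
    "userIdentity": {
        "type": ["Service"],
        "principalId": ["dynamodb.amazonaws.com"],
    }
}

# This is what you would pass as FilterCriteria when creating the event
# source mapping, e.g. with boto3 (ARN and function name hypothetical):
#   lambda_client.create_event_source_mapping(
#       EventSourceArn=stream_arn,
#       FunctionName="archive-to-s3",
#       FilterCriteria=filter_criteria,
#   )
filter_criteria = {"Filters": [{"Pattern": json.dumps(ttl_filter_pattern)}]}
```

With this in place, invocations (and their cost) are skipped entirely for ordinary inserts and updates on the table.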

AWS Expert
Answered 2 years ago
Reviewed by an AWS Expert 2 years ago
  • Thank you @Leeroy. Is it possible to merge 100 records inside my Lambda function and do one PutObject to S3?

  • Yes, you can do that, but you will have to write the code to compress the items manually, and you will also lose the partitioning ability.
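For completeness, a minimal sketch of that manual approach (the record shape is simplified, and the bucket and key names in the commented-out call are hypothetical), showing the compression code you would have to maintain yourself:

```python
import gzip
import json

def merge_records(records):
    # Merge a batch of DynamoDB stream records into one newline-delimited,
    # gzip-compressed payload suitable for a single S3 PutObject.
    lines = "\n".join(json.dumps(r["dynamodb"]["OldImage"]) for r in records)
    return gzip.compress(lines.encode("utf-8"))

# In the Lambda handler, one PutObject per batch (hypothetical bucket/key):
#   s3 = boto3.client("s3")
#   s3.put_object(
#       Bucket="my-archive-bucket",
#       Key="archive/" + context.aws_request_id + ".json.gz",
#       Body=merge_records(event["Records"]),
#   )
```

Note this only merges within one invocation's batch; Firehose additionally buffers across invocations and handles partitioned key prefixes for you.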


I also reached out to the author of the blog post, and he gave a similar answer with more detail, which I would like to share:

> Lambda has a max invocation payload of 6 MB, so even if your batch size is 10,000, with larger records you'll be limited by the 6 MB limit. With Firehose it's easy to get objects in the 100s of MB range on a busy stream.
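A quick back-of-envelope calculation (the 2 KB average record size is an assumption) shows why the 6 MB payload limit bites before a large batch size does:

```python
# Lambda's synchronous invocation payload limit, in bytes
payload_limit = 6 * 1024 * 1024

# Hypothetical average size of one stream record, in bytes
record_size = 2 * 1024

# Records that actually fit in one invocation, regardless of batch size
max_records = payload_limit // record_size  # 3072, well under 10,000
```

So with records of this size, each invocation tops out around 3,000 records even with a batch size of 10,000, while Firehose keeps buffering across invocations until its size or time threshold is reached.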

Thank you all! I feel greatly supported and have learned a lot!

hai
Answered 2 years ago

Firehose acts as a buffer. When you write directly to S3, this can result in a lot of requests, and you also pay for the number of requests to S3. Without that buffering, S3 request charges can end up being a huge portion of your bill.
This happened to me with an IoT example a long time ago.
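A minimal sketch of the recommended path, with the Lambda forwarding each batch to Firehose in one PutRecordBatch call (the delivery stream name is hypothetical, and the record shape is simplified):

```python
import json

def to_firehose_records(event):
    # Convert DynamoDB stream records into Firehose PutRecordBatch entries,
    # one newline-terminated JSON document per expired item.
    return [
        {"Data": (json.dumps(r["dynamodb"]["OldImage"]) + "\n").encode("utf-8")}
        for r in event.get("Records", [])
    ]

# In the Lambda handler (hypothetical delivery stream name):
#   firehose = boto3.client("firehose")
#   firehose.put_record_batch(
#       DeliveryStreamName="ttl-archive-stream",
#       Records=to_firehose_records(event),
#   )
```

Firehose then buffers these records across many invocations and delivers them to S3 as a few large, optionally compressed objects, instead of one PutObject per record.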

Marco
AWS
Answered 2 years ago
Reviewed by an AWS Expert 2 years ago
  • @Marco Thank you! However, according to this official AWS blog, a batch size of 100 has been configured between the DynamoDB stream and Lambda. Does that mean Lambda receives a batch of 100 records and can then write those 100 records to S3 directly?
