Incremental Data Capture from DynamoDB to S3


I have a use case where I would like to replicate data from DynamoDB to S3 (for backup and, later, for analytical processing). I don't need real-time data updates or any notifications on DDB item changes.

The "regular" design with DDB Streams, Lambda, Kinesis Firehouse and S3 destination is possible, but I am concerned about high cost, complexity and effort for initial setup.

I have implemented the following design: https://aws.amazon.com/blogs/database/simplify-amazon-dynamodb-data-extraction-and-analysis-by-using-aws-glue-and-amazon-athena/

(Download data from DDB via ETL Glue Job to S3)
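For context, the Glue job is roughly along these lines - a minimal PySpark sketch where the table name, bucket and read-throughput ratio are placeholders, not my actual values:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Full scan of the DynamoDB table; the ratio below caps how much read capacity is used.
ddb_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "my-ddb-table",   # placeholder
        "dynamodb.throughput.read.percent": "0.5",    # use at most ~50% of the table's RCUs
    },
)

# Write the result as Parquet so Athena can query it later.
glue_context.write_dynamic_frame.from_options(
    frame=ddb_frame,
    connection_type="s3",
    connection_options={"path": "s3://my-backup-bucket/ddb-export/"},  # placeholder
    format="parquet",
)

job.commit()
```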

That works fine, but I have the following additional questions for anybody with more experience with this design:

  1. Is there any possibility to perform an incremental data load, i.e. not loading the full DDB table via the Glue job each time?

  2. Does anybody have practical experience with this design in terms of performance? My main concern is that for large DDB tables (hundreds of millions of items) the full data load will A) consume too much read capacity and B) take hours/days to complete.

  3. Are there any other practical design approaches for working with large DDB tables?

AWS
Asked 4 years ago · Viewed 3198 times
2 Answers
Accepted Answer

It is not possible to perform an incremental 'bookmarked' load from a DynamoDB table without data modeling designed for it (e.g. a sharded GSI that allows time-based queries across the entire data set), and even then it would require a custom reader (Glue doesn't support GSI queries).
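To illustrate, here is a minimal sketch of such a custom reader, assuming a hypothetical sharded GSI named gsi_updated whose partition key is a synthetic shard_id and whose sort key is an updated_at timestamp; the index, attribute and table names and the shard count are all illustrative, not something that exists in your table today.

```python
import boto3
from boto3.dynamodb.conditions import Key

NUM_SHARDS = 16  # assumed write-sharding factor of the hypothetical GSI
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-ddb-table")  # placeholder table name


def read_changes_since(bookmark: str):
    """Yield items updated after the given ISO-8601 bookmark, across all shards."""
    for shard in range(NUM_SHARDS):
        kwargs = {
            "IndexName": "gsi_updated",
            "KeyConditionExpression": Key("shard_id").eq(shard)
            & Key("updated_at").gt(bookmark),
        }
        while True:
            resp = table.query(**kwargs)
            yield from resp["Items"]
            # Paginate until this shard is exhausted.
            if "LastEvaluatedKey" not in resp:
                break
            kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```

Each run would persist the latest updated_at value it saw (e.g. as a small bookmark item) and pass it in as the starting point of the next run.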

Using Streams --> Lambda --> Firehose is currently the most 'managed' and cost-effective way to deliver incremental changes from a DynamoDB table to S3.

Reading DynamoDB Streams only has the computational cost of Lambda associated with it, and a Lambda can read hundreds of items in a single invocation. Having Firehose buffer/package/store these changes as compressed/partitioned/queryable data on S3 is simple and cost-effective.
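As an illustration, a minimal sketch of the Lambda in that pipeline, assuming it is subscribed to the table's stream and that a Firehose delivery stream already exists; the stream name is a placeholder.

```python
import json

import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "ddb-changes"  # placeholder delivery stream name


def handler(event, context):
    # A single invocation can carry a large batch of stream records.
    records = [
        {"Data": (json.dumps(r["dynamodb"]) + "\n").encode("utf-8")}
        for r in event["Records"]
    ]
    # Firehose buffers, compresses and partitions these before landing them on S3.
    # PutRecordBatch accepts at most 500 records per call.
    for i in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM, Records=records[i : i + 500]
        )
```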

If you are concerned about cost, it could be worth opening a specreq to have a specialist take a look at the analysis - these configurations are both common and generally cost-effective (the cost is not relative to the size of the table, but rather to the velocity/size of the writes - which will often be more efficient than a custom reader/loader).

Answered 4 years ago

The Export to S3 feature has been enhanced to support incremental data loads: https://aws.amazon.com/blogs/database/introducing-incremental-export-from-amazon-dynamodb-to-amazon-s3/
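For example, a minimal boto3 sketch of requesting such an incremental export - this assumes point-in-time recovery is enabled on the table, and the ARN, bucket and time window are placeholders:

```python
from datetime import datetime, timezone

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:eu-west-1:123456789012:table/my-ddb-table",  # placeholder
    S3Bucket="my-backup-bucket",                                            # placeholder
    ExportFormat="DYNAMODB_JSON",
    ExportType="INCREMENTAL_EXPORT",
    IncrementalExportSpecification={
        # Only changes made within this window are exported.
        "ExportFromTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "ExportToTime": datetime(2024, 1, 2, tzinfo=timezone.utc),
        "ExportViewType": "NEW_AND_OLD_IMAGES",
    },
)
```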


AWS
Answered 7 months ago
