Incremental Data Capture from DynamoDB to S3


I have a use case where I would like to replicate the data from DynamoDB to S3 (for backup and later for the analytical processing). I don't need real-time data updates or any notifications on DDB item changes.

The "regular" design with DDB Streams, Lambda, Kinesis Firehouse and S3 destination is possible, but I am concerned about high cost, complexity and effort for initial setup.

I have implemented the following design: https://aws.amazon.com/blogs/database/simplify-amazon-dynamodb-data-extraction-and-analysis-by-using-aws-glue-and-amazon-athena/

(Downloading data from DDB to S3 via a Glue ETL job; a rough sketch of the job follows.)
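For context, the job is essentially the pattern below - a minimal sketch, with the table name, S3 path and read-throughput percentage as placeholders rather than my real configuration:

```python
# Minimal sketch of the Glue ETL job (placeholders: table name, S3 path, read percent)
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Full scan of the DynamoDB table, capped at 25% of the table's read capacity
ddb_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "my-table",            # placeholder
        "dynamodb.throughput.read.percent": "0.25",
    },
)

# Write the result as Parquet to S3 so Athena can query it
glue_context.write_dynamic_frame.from_options(
    frame=ddb_frame,
    connection_type="s3",
    connection_options={"path": "s3://my-backup-bucket/ddb-export/"},  # placeholder
    format="parquet",
)

job.commit()
```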

That works fine, but I have the following additional questions for anybody with more experience with this design:

  1. Is there any way to perform an incremental data load, i.e. avoid loading the full DDB table via the Glue job each time?

  2. Does anybody have practical experience with this design in terms of performance? My main concern is that for large DDB tables (hundreds of millions of items) the full data load will A) consume too much read capacity and B) take hours or days to complete.

  3. Are there any other practical design approaches for working with large DDB tables?

Asked 4 years ago · 3197 views

2 Answers
Accepted Answer

It is not possible to perform an incremental 'bookmarked' load from a DynamoDB table without data modeling designed for this (i.e. a sharded GSI that allows time-based queries across the entire data set), which would then require a custom reader (Glue doesn't support GSI queries).
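To illustrate what that data modeling implies, here is a rough sketch of such a custom reader - the table name, GSI name, shard count and attribute names are all hypothetical, and your write path would have to maintain the shard/timestamp attributes itself:

```python
# Hypothetical custom reader: the table name, GSI name ("gsi_updated_at"),
# shard count and attributes ("shard", "updated_at") are illustrative only.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # placeholder
NUM_SHARDS = 16  # writers must spread items across shards 0..15 for this to work

def items_changed_between(start_iso, end_iso):
    """Yield items whose 'updated_at' falls in [start_iso, end_iso], shard by shard."""
    for shard in range(NUM_SHARDS):
        kwargs = {
            "IndexName": "gsi_updated_at",
            "KeyConditionExpression": Key("shard").eq(shard)
            & Key("updated_at").between(start_iso, end_iso),
        }
        while True:
            resp = table.query(**kwargs)
            yield from resp["Items"]
            if "LastEvaluatedKey" not in resp:
                break
            kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```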

Using Streams --> Lambda --> Firehose is currently the most 'managed' and cost-effective way to deliver incremental changes from a DynamoDB table to S3.

Reading DynamoDB Streams only has the computational cost of Lambda associated with it, and a single Lambda invocation can read hundreds of items. Having Firehose buffer/package/store these changes as compressed/partitioned/queryable data on S3 is simple and cost effective.
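A minimal sketch of that Lambda, assuming a Firehose delivery stream already exists (the stream name is a placeholder); it simply forwards the DynamoDB Stream records to Firehose in batches:

```python
# Hypothetical Lambda handler for the Streams -> Lambda -> Firehose pattern;
# the delivery stream name is a placeholder.
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "ddb-changes-to-s3"  # placeholder

def handler(event, context):
    # A single invocation can carry hundreds of stream records
    records = [
        {"Data": (json.dumps(record["dynamodb"]) + "\n").encode("utf-8")}
        for record in event["Records"]
    ]
    # Firehose buffers, compresses and partitions these before landing them in S3;
    # PutRecordBatch accepts at most 500 records per call
    for i in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=DELIVERY_STREAM,
            Records=records[i:i + 500],
        )
```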

If you are concerned about cost, it could be worth opening a specialist request (specreq) to have a specialist take a look at the analysis - these configurations are both common and generally cost effective (the cost is not relative to the size of the table, but rather to the velocity/size of the writes - which will often be more efficient than a custom reader/loader).

Answered 4 years ago

The Export to S3 feature has been enhanced to support incremental data loads: https://aws.amazon.com/blogs/database/introducing-incremental-export-from-amazon-dynamodb-to-amazon-s3/
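A minimal sketch of triggering such an incremental export with boto3 - the table ARN, bucket and time window are placeholders, and point-in-time recovery must be enabled on the table:

```python
# Sketch of a daily incremental export; ARN, bucket and window are placeholders,
# and point-in-time recovery must be enabled on the table.
from datetime import datetime, timedelta, timezone
import boto3

dynamodb = boto3.client("dynamodb")
now = datetime.now(timezone.utc)

dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/my-table",  # placeholder
    S3Bucket="my-backup-bucket",                                        # placeholder
    S3Prefix="ddb-incremental/",
    ExportFormat="DYNAMODB_JSON",
    ExportType="INCREMENTAL_EXPORT",
    IncrementalExportSpecification={
        "ExportFromTime": now - timedelta(hours=24),  # start of the change window
        "ExportToTime": now,
        "ExportViewType": "NEW_AND_OLD_IMAGES",
    },
)
```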


Answered 7 months ago
