Incremental Data Capture from DynamoDB to S3


I have a use case where I would like to replicate data from DynamoDB to S3 (for backup and, later, for analytical processing). I don't need real-time data updates or any notifications on DDB item changes.

The "regular" design with DDB Streams, Lambda, Kinesis Firehose and an S3 destination is possible, but I am concerned about the high cost, complexity and effort of the initial setup.

I have implemented the following design: https://aws.amazon.com/blogs/database/simplify-amazon-dynamodb-data-extraction-and-analysis-by-using-aws-glue-and-amazon-athena/

(Download data from DDB via ETL Glue Job to S3)
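For reference, a minimal sketch of the extraction side of that Glue design, assuming a PySpark Glue job (table and bucket names below are placeholders). The `dynamodb.throughput.read.percent` connection option caps what share of the table's read capacity the scan may consume:

```python
def ddb_connection_options(table_name, read_percent="0.25"):
    """Connection options for Glue's DynamoDB reader.

    read_percent caps the share of the table's provisioned RCUs
    that the full-table scan is allowed to use (default 0.5 in Glue).
    """
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percent,
    }

# Inside a Glue job script (glueContext is provided by the Glue runtime):
# frame = glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb",
#     connection_options=ddb_connection_options("my-table"),
# )
# glueContext.write_dynamic_frame.from_options(
#     frame=frame,
#     connection_type="s3",
#     connection_options={"path": "s3://my-backup-bucket/ddb/"},
#     format="parquet",
# )
```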

That works fine, but I have the following additional questions to anybody having more experience with that design:

  1. Is there any possibility to perform an incremental data load, i.e. avoid loading the full DDB table via the Glue job each time?

  2. Does anybody have any practical experience with that design in terms of performance? My main concern is that for large DDB tables (hundreds of millions of items) the full data load will A) consume too much read capacity and B) take hours or days to complete.

  3. Are there any other practical design approaches for working with large DDB tables?
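Concern 2 can be roughed out with simple arithmetic. A hypothetical helper, assuming eventually consistent reads (0.5 RCU per 4 KB) and that the scan is allowed to use the table's full provisioned read capacity:

```python
def full_scan_estimate(item_count, avg_item_kb, read_capacity_units):
    """Rough estimate of RCUs and wall-clock time for a full table scan.

    Assumes eventually consistent reads: 0.5 RCU per 4 KB read.
    """
    total_kb = item_count * avg_item_kb
    total_rcus = (total_kb / 4) * 0.5           # 4 KB units, eventually consistent
    seconds = total_rcus / read_capacity_units  # RCUs are per-second units
    return total_rcus, seconds / 3600           # (RCUs consumed, hours)

# 100 million items of ~1 KB, with 1,000 RCU available for the scan:
rcus, hours = full_scan_estimate(100_000_000, 1, 1000)
# -> 12,500,000 RCUs consumed over roughly 3.5 hours
```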

2 Answers

Accepted Answer

It is not possible to perform an incremental, 'bookmarked' load from a DynamoDB table without data modeling designed for it (i.e. a sharded GSI that allows time-based queries across the entire data set), and that would in turn require a custom reader (Glue doesn't support GSI queries).

Using streams --> lambda --> firehose is currently the most 'managed' and cost effective way to deliver incremental changes from a DynamoDB table to S3.

Reading DynamoDB streams only has the computational cost of Lambda associated with it, and a Lambda can read hundreds of items in a single invocation. Having Firehose buffer/package/store these changes as compressed/partitioned/queryable data on S3 is simple and cost effective.
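A minimal sketch of what that Lambda step can look like: it flattens DynamoDB stream records (which arrive in DynamoDB's typed attribute-value format) into newline-delimited JSON ready for Firehose to buffer. The deserializer below is a simplified stand-in covering only common types; in a real deployment the lines would be sent with boto3's `firehose.put_record_batch` rather than returned:

```python
import json

def _from_ddb(av):
    """Convert a DynamoDB attribute value ({'S': 'x'}, {'N': '1'}, ...)
    to a plain Python value. Minimal sketch: common types only."""
    (t, v), = av.items()
    if t == "S":
        return v
    if t == "N":
        return float(v) if "." in v else int(v)
    if t == "BOOL":
        return v
    if t == "NULL":
        return None
    if t == "L":
        return [_from_ddb(x) for x in v]
    if t == "M":
        return {k: _from_ddb(x) for k, x in v.items()}
    raise ValueError(f"unhandled DynamoDB type {t}")

def handler(event, context):
    """Lambda handler: turn a batch of DynamoDB stream records into
    newline-delimited JSON lines (one line per changed item)."""
    lines = []
    for record in event["Records"]:
        image = record["dynamodb"].get("NewImage", {})
        doc = {k: _from_ddb(v) for k, v in image.items()}
        doc["_event"] = record["eventName"]  # INSERT / MODIFY / REMOVE
        lines.append(json.dumps(doc))
    return "\n".join(lines)
```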

If you are concerned about cost, it could be worth opening a specreq to have a specialist take a look at the analysis; these configurations are both common and generally cost effective (the cost is not relative to the size of the table, but rather to the velocity/size of the writes, which will often be more efficient than a custom reader/loader).

answered 4 years ago

The Export to S3 feature has been enhanced to support incremental data loads: https://aws.amazon.com/blogs/database/introducing-incremental-export-from-amazon-dynamodb-to-amazon-s3/
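The incremental export is driven by the same `export_table_to_point_in_time` API, with an `ExportType` of `INCREMENTAL_EXPORT` and a time window. A sketch, assuming PITR is enabled on the table; the ARN and bucket below are placeholders:

```python
from datetime import datetime, timezone

def incremental_export_params(table_arn, bucket, start, end):
    """Build the request for an incremental DynamoDB export to S3.

    Requires point-in-time recovery (PITR) to be enabled on the table.
    """
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "ExportFormat": "DYNAMODB_JSON",
        "ExportType": "INCREMENTAL_EXPORT",
        "IncrementalExportSpecification": {
            "ExportFromTime": start,
            "ExportToTime": end,
            "ExportViewType": "NEW_AND_OLD_IMAGES",
        },
    }

# import boto3
# ddb = boto3.client("dynamodb")
# resp = ddb.export_table_to_point_in_time(
#     **incremental_export_params(
#         "arn:aws:dynamodb:eu-west-1:123456789012:table/my-table",
#         "my-export-bucket",
#         datetime(2024, 1, 1, tzinfo=timezone.utc),
#         datetime(2024, 1, 2, tzinfo=timezone.utc),
#     )
# )
```

Each run then covers only the changes between `ExportFromTime` and `ExportToTime`, so a scheduled job can walk the window forward instead of re-scanning the table.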


AWS
answered 7 months ago
