Incremental Data Capture from DynamoDB to S3


I have a use case where I would like to replicate data from DynamoDB to S3 (for backup and, later, for analytical processing). I don't need real-time data updates or any notifications on DDB item changes.

The "regular" design with DDB Streams, Lambda, Kinesis Firehouse and S3 destination is possible, but I am concerned about high cost, complexity and effort for initial setup.

I have implemented the following design: https://aws.amazon.com/blogs/database/simplify-amazon-dynamodb-data-extraction-and-analysis-by-using-aws-glue-and-amazon-athena/

(Export data from DDB to S3 via a Glue ETL job.)
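In rough terms, the job follows the pattern below (a minimal PySpark sketch of that design; the table name, S3 path and throughput settings are placeholders, not my actual values):

```python
# Minimal Glue ETL sketch: read a DynamoDB table and write it to S3 as Parquet.
# Table name, bucket path and throughput settings are placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Full-table read via the Glue DynamoDB connector; the throughput percentage
# caps how much of the table's read capacity the job may consume.
ddb_frame = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "my-ddb-table",   # placeholder
        "dynamodb.throughput.read.percent": "0.25",   # stay below ~25% of RCU
        "dynamodb.splits": "8",                       # parallel scan segments
    },
)

# Write compressed, query-friendly Parquet for Athena.
glue_context.write_dynamic_frame.from_options(
    frame=ddb_frame,
    connection_type="s3",
    connection_options={"path": "s3://my-backup-bucket/ddb-export/"},  # placeholder
    format="parquet",
)

job.commit()
```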

That works fine, but I have the following additional questions for anybody with more experience with this design:

  1. Is there any way to perform an incremental data load, i.e. not load the full DDB table via the Glue job each time?

  2. Does anybody have practical experience with this design in terms of performance? My main concern is that, for large DDB tables (hundreds of millions of items), a full data load will A) consume too much read capacity and B) take hours or days to complete.

  3. Are there any other practical design approaches for working with large DDB tables?

AWS
asked 4 years ago · 3195 views
2 Answers
Accepted Answer

It is not possible to perform an incremental, 'bookmarked' load from a DynamoDB table without data modeling designed for it (i.e. a sharded GSI that allows time-based queries across the entire data set), and that would then require a custom reader, because Glue doesn't support GSI queries.
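For illustration only, a custom reader over such a sharded GSI might look like the sketch below (the shard count, index name and attribute names are hypothetical design choices, not anything DynamoDB provides out of the box):

```python
# Hypothetical custom reader for a sharded GSI: each item is assumed to carry
# a synthetic numeric partition key "gsi_shard" (0..N-1) and a sort key
# "updated_at" (ISO-8601 timestamp). All names here are illustrative.
import boto3
from boto3.dynamodb.conditions import Key

NUM_SHARDS = 16  # assumption: the write path spreads items across 16 shards
table = boto3.resource("dynamodb").Table("my-ddb-table")  # placeholder table name

def read_changes_since(bookmark_iso: str):
    """Yield every item updated after the bookmark timestamp, shard by shard."""
    for shard in range(NUM_SHARDS):
        kwargs = {
            "IndexName": "by-shard-and-updated-at",  # hypothetical GSI name
            "KeyConditionExpression": (
                Key("gsi_shard").eq(shard) & Key("updated_at").gt(bookmark_iso)
            ),
        }
        while True:
            page = table.query(**kwargs)
            yield from page["Items"]
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```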

Using Streams --> Lambda --> Firehose is currently the most 'managed' and cost-effective way to deliver incremental changes from a DynamoDB table to S3.

Reading DynamoDB Streams only has the computational cost of Lambda associated with it, and the Lambdas can read hundreds of items in a single invocation. Having Firehose buffer, package and store these changes as compressed/partitioned/queryable data on S3 is simple and cost effective.
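As a rough sketch (the delivery stream name and the record shape are placeholders, and it assumes the stream is configured with new images), the Lambda in the middle can be as small as:

```python
# Streams -> Lambda -> Firehose hop: the Lambda receives a batch of DynamoDB
# stream records and forwards them to a Firehose delivery stream, which
# buffers and lands them on S3.
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "ddb-changes-to-s3"  # placeholder delivery stream name

def handler(event, context):
    # Each stream record carries the change type and the item images
    # (NewImage is present when the stream view type includes new images).
    batch = [
        {"Data": (json.dumps({
            "eventName": r["eventName"],             # INSERT / MODIFY / REMOVE
            "keys": r["dynamodb"].get("Keys"),
            "newImage": r["dynamodb"].get("NewImage"),
        }) + "\n").encode("utf-8")}
        for r in event["Records"]
    ]
    # PutRecordBatch accepts up to 500 records per call; stream batches are
    # typically well under that, so a single call suffices for this sketch.
    firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=batch)
```

Firehose then handles the buffering and writes objects to S3 on a size or time interval, so the Lambda itself stays stateless and cheap.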

If you are concerned about cost, it could be worth opening a specreq to have a specialist take a look at the analysis - these configurations are both common and generally cost effective (the cost is not relative to the size of the table, but rather to the velocity and size of the writes, which will often be more efficient than a custom reader/loader).

answered 4 years ago

The Export to S3 feature has been enhanced to support incremental exports: https://aws.amazon.com/blogs/database/introducing-incremental-export-from-amazon-dynamodb-to-amazon-s3/
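For reference, a rough boto3 sketch of starting an incremental export (the table ARN, bucket and time window are placeholders; point-in-time recovery must be enabled on the table):

```python
# Kick off an incremental export via the ExportTableToPointInTime API.
from datetime import datetime, timezone
import boto3

ddb = boto3.client("dynamodb")

ddb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:eu-west-1:123456789012:table/my-ddb-table",  # placeholder
    S3Bucket="my-backup-bucket",                                            # placeholder
    S3Prefix="ddb-incremental/",
    ExportFormat="DYNAMODB_JSON",
    ExportType="INCREMENTAL_EXPORT",
    IncrementalExportSpecification={
        # Export only the changes made within this window.
        "ExportFromTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "ExportToTime": datetime(2024, 1, 2, tzinfo=timezone.utc),
        "ExportViewType": "NEW_AND_OLD_IMAGES",
    },
)
```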

AWS
answered 7 months ago
