How to perform reliable asynchronous CDC for Amazon Aurora PostgreSQL DB

I’m planning to use Amazon Aurora (RDS) as my service database, and I’ll be versioning every record I store in it. Is there a managed or otherwise reliable way to do asynchronous CDC from the database and store each version in a historical data store (S3) for audit purposes?

I’ve come across approaches using DMS or triggers, but I’d like to avoid DMS since it doesn’t seem like the right fit for this use case, and I don’t want to use triggers because I’d like the CDC to be asynchronous.

In simple terms, I’d like to build something like DynamoDB Streams for my Aurora DB.

1 Answer

Although I haven't done this setup end to end myself, at a high level the approach below should work: Aurora Database Activity Streams with AWS Lambda and S3. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/DBActivityStreams.Overview.html

1/ Enable Database Activity Streams: This built-in Aurora feature captures database activity, including the statements behind inserts, updates, and deletes, in near real time. It automatically creates a Kinesis data stream and pushes this activity data to it.

2/ AWS Lambda Function: Create a Lambda function triggered by the Kinesis data stream. This function will process the activity stream events and extract the relevant data changes.

3/ Data Transformation (Optional): The Lambda function can transform the extracted data if needed to match your desired format for historical storage in S3.

4/ Store Versions in S3: Use the AWS SDK within the Lambda function to write the extracted and potentially transformed data to S3 objects. Each object can represent a specific version of your data.
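
As a rough illustration of steps 2 to 4, here is a minimal Python sketch of a Lambda handler for a Kinesis event source. I have not run it against a real activity stream, so treat it as a starting point only. The bucket name, the key layout, and the `decode_payload` helper are my own assumptions. In particular, activity stream records are encrypted, so `decode_payload` is only a placeholder that assumes plain JSON and would need a real KMS-based decryption step. The return value assumes the event source mapping has `ReportBatchItemFailures` enabled, so that Lambda retries from the first failed record instead of skipping it.

```python
import base64
import json

BUCKET = "my-audit-history-bucket"  # hypothetical bucket name


def decode_payload(raw: bytes) -> dict:
    """Placeholder: assumes the payload is already plain JSON.

    Real activity stream records are encrypted, so a KMS-based
    decryption step would have to replace this function.
    """
    return json.loads(raw)


def build_key(record: dict) -> str:
    """Derive an S3 key from the shard ID and the sequence number."""
    shard_id = record["eventID"].split(":")[0]
    seq = record["kinesis"]["sequenceNumber"]
    return f"history/{shard_id}/{seq}.json"  # hypothetical key layout


def process_batch(event: dict, s3_client, bucket: str = BUCKET) -> dict:
    """Write each record to S3 and stop at the first failure."""
    for record in event["Records"]:
        try:
            # Kinesis record data arrives base64-encoded in the Lambda event
            raw = base64.b64decode(record["kinesis"]["data"])
            change = decode_payload(raw)
            s3_client.put_object(
                Bucket=bucket,
                Key=build_key(record),
                Body=json.dumps(change).encode("utf-8"),
            )
        except Exception:
            # Report the failed record so the batch is retried from here
            seq = record["kinesis"]["sequenceNumber"]
            return {"batchItemFailures": [{"itemIdentifier": seq}]}
    return {"batchItemFailures": []}


def handler(event, context):
    # boto3 is imported lazily so the helpers can be exercised without AWS
    import boto3

    return process_batch(event, boto3.client("s3"))
```

Stopping at the first failed record, rather than continuing with the rest of the batch, is a deliberate choice here: it keeps later events from being written to S3 ahead of an earlier one that failed.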

Alternatively, something similar may work: AWS CDC (Change Data Capture) with Amazon EventBridge and S3: https://aws.amazon.com/blogs/database/capturing-data-changes-in-amazon-aurora-using-aws-lambda/

Hope it helps.

AWS
akad
answered 10 days ago
  • Thank you for your answer.

    For Asynchronous Database Activity Streams, the error handling doesn't seem to be straightforward (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.Overview.html#DBActivityStreams.Overview.sync-mode). It's saying we would receive an RDS event; however, I'm unsure how we can use that to reprocess events. Additionally, failures could potentially disrupt the order of events (I forgot to mention this in my question, but the ordering of events is critical for my use case). Please let me know if you are aware of how we can handle this gracefully.

    Regarding the asynchronous Lambda trigger approach, I really don't want to deal with triggers because they create a lot of unmanageable configuration in the database.
