Plan a DR strategy for Data Ingestion pipeline on DMS with Aurora Source


Hello,

Our data pipeline is a CDC system built on DMS, using logical replication slots on an Aurora PostgreSQL source. DMS replication currently runs in the same region as the source Aurora PostgreSQL cluster, and the DMS tasks copy data into an S3 target.

The source DB is implementing a DR strategy, and I want DMS replication to fail over to region 2 together with the source DB if region 1 fails.

My initial workflow proposal -

  • Have a standby DMS instance in region 2 with its task in the stopped state
  • Keep listening for failover error logs/notifications (SNS, EventBridge) from the source
  • On failover, start the replication task in region 2 in "capture ongoing changes" (CDC) mode
  • My target is an S3 bucket. I'll have two buckets in different regions with bidirectional replication turned on, so that both buckets stay in sync. Eventual consistency is fine since it meets my SLOs.
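The notification and start steps in this workflow could be wired together roughly as follows. This is a minimal sketch, not a tested DR runbook: it assumes the failover arrives as an EventBridge-style RDS event whose `detail` carries `SourceType` and `EventCategories`, and the category values in `FAILOVER_CATEGORIES` should be checked against the RDS event documentation. `is_failover_event` and `start_standby_task` are hypothetical helper names; `dms_client` is expected to behave like a boto3 DMS client created for region 2.

```python
# Assumed category labels for cluster failover events; verify against the RDS event docs.
FAILOVER_CATEGORIES = {"failover", "global-failover"}


def is_failover_event(event: dict) -> bool:
    """Return True when an EventBridge-style RDS event reports a cluster failover."""
    detail = event.get("detail", {})
    categories = set(detail.get("EventCategories", []))
    return detail.get("SourceType") == "CLUSTER" and bool(categories & FAILOVER_CATEGORIES)


def start_standby_task(event: dict, dms_client, task_arn: str, resume: bool = False):
    """Start the stopped standby DMS task if the event is a failover; otherwise do nothing.

    dms_client is any object exposing start_replication_task(**kwargs),
    for example a boto3 DMS client for the secondary region.
    """
    if not is_failover_event(event):
        return None
    # A task that has never run needs "start-replication";
    # a task that was stopped after running can use "resume-processing".
    start_type = "resume-processing" if resume else "start-replication"
    return dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType=start_type,
    )
```

In a Lambda handler this would be called with the incoming event and a DMS client for region 2. Passing the client in as an argument keeps the decision logic easy to test without touching AWS.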

Once the task is running in region 2, it should capture all source activity without any loss, and bucket replication should keep both data stores (S3 buckets) in sync.

In this workflow, I am concerned about: #1 data leaks during the failover process to the secondary region and the failback process to the primary. #2 How does PostgreSQL behave during a region failure with respect to maintaining checkpoints? Does this affect logical replication, since changes are captured in ongoing mode in both regions?

Our sources are a mix of Aurora PostgreSQL clusters and Aurora PostgreSQL global databases.

#3 How does this differ for a global database, where data is replicated to multiple regions through physical replication? Is it an option to run an ongoing DMS task against the secondary, since physical replication already copies the data there?

I understand these are a lot of questions; any feedback is appreciated.

Thank you.

1 Answer

Thanks for contacting AWS.

I would like to answer your questions individually.

#1 data leaks during the failover process to secondary region and failback process to primary.

A. Aurora PostgreSQL-Compatible doesn't support cross-Region Aurora Replicas. However, for Aurora PostgreSQL DB clusters, you can use an Aurora global database. [+] Cross-Region Aurora Replicas : https://docs.aws.amazon.com/prescriptive-guidance/latest/aurora-replication-options/cross-region-aurora-replicas.html

#2 How does the PG behave during region failure in maintaining checkpoints. Does this concern the logical replication since changes are captured in Ongoing mode in both regions.

A. Since you'll be using an Aurora global database, a good practice is to configure the parameter groups of the primary and secondary clusters with the same settings. That way, if the primary Region fails, the new primary cluster in the secondary Region has the same configuration as the old primary.

On the Amazon RDS console, identify the primary DB cluster’s parameter group of the global database. Open the primary DB cluster parameter group and set the global_db_rpo parameter.

Set the global_db_rpo parameter value to the number of seconds you want for the recovery point objective (RPO). Valid values start from 20 seconds.

This approach ensures that Aurora PostgreSQL doesn’t allow transaction commits to complete that would result in a violation of your chosen RPO time.
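If you prefer to set the parameter through the API rather than the console, the request could be built along these lines. This is a sketch under assumptions: `rpo_parameter_request` is a hypothetical helper, the parameter group name is a placeholder, and the `"immediate"` apply method assumes the parameter is dynamic, which should be confirmed for your engine version. The 20-second lower bound is the one stated above.

```python
MIN_RPO_SECONDS = 20  # lower bound for global_db_rpo stated in the answer


def rpo_parameter_request(parameter_group: str, rpo_seconds: int) -> dict:
    """Build keyword arguments for modify_db_cluster_parameter_group that set global_db_rpo."""
    if rpo_seconds < MIN_RPO_SECONDS:
        raise ValueError(
            f"global_db_rpo must be at least {MIN_RPO_SECONDS} seconds, got {rpo_seconds}"
        )
    return {
        "DBClusterParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "global_db_rpo",
                "ParameterValue": str(rpo_seconds),
                # Assumption: the parameter is dynamic and can be applied immediately.
                "ApplyMethod": "immediate",
            }
        ],
    }
```

With boto3 the result would be passed as `rds_client.modify_db_cluster_parameter_group(**rpo_parameter_request("my-primary-cluster-params", 60))`, where the group name is only an example.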

I strongly recommend reading this article on failing over a global cluster: [+] https://aws.amazon.com/blogs/database/cross-region-disaster-recovery-using-amazon-aurora-global-database-for-amazon-aurora-postgresql/

#3. How does it vary with Global since the data is replicated to multiple regions through physical replication. Is it an option to have a ongoing DMS task in secondary since the physical replication copies data to multiple regions.

A. You cannot run an ongoing DMS task against the secondary cluster of a global database, because a logical replication slot cannot be created on a secondary. However, since you keep a task in the secondary region in a stopped state, you can start it once the secondary region has been promoted to primary; the task will then create the replication slot on the new primary.
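After the promotion and task start, you may want to confirm that a logical slot actually exists on the new primary. The sketch below queries the standard `pg_replication_slots` view through any DB-API cursor (for example one from psycopg2, which is an assumption and not something DMS requires). `logical_slots` and `slot_exists` are hypothetical helper names, and the slot name you check for depends on how your DMS endpoint is configured.

```python
SLOT_QUERY = """
SELECT slot_name, plugin, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""

SLOT_COLUMNS = ("slot_name", "plugin", "active", "confirmed_flush_lsn")


def logical_slots(cursor) -> list:
    """Return the logical replication slots visible on the connected node as dicts."""
    cursor.execute(SLOT_QUERY)
    return [dict(zip(SLOT_COLUMNS, row)) for row in cursor.fetchall()]


def slot_exists(cursor, slot_name: str) -> bool:
    """Return True when a logical slot with the given name exists on this node."""
    return any(slot["slot_name"] == slot_name for slot in logical_slots(cursor))
```

The cursor should come from a connection to the writer endpoint of the promoted cluster. An empty result shortly after promotion would suggest the task has not yet created its slot.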

AWS
SUPPORT ENGINEER
answered 8 months ago
