DMS Serverless Migration Errors on scale-up migrating to Aurora Postgres

0

I'm using a Serverless migration (Full load and change data capture) to move from Oracle on-prem to Aurora Postgres. After a bunch of tuning, I've got it close to ready. However, I noticed that immediately after the serverless replication attempts to scale up (about 30 minutes into the migration), the task goes into a "Running with Errors" status.

[Screenshot: replication status showing "Scaling up", then "Running with errors"]

Looking at the Aurora PG logs, I see that it's running into "Duplicate key violates unique key" errors in the target PG DB. I didn't see this error when using a provisioned DMS instance, and there are definitely NOT duplicate values in the source DB. It took some time to find the failure pattern, but it's definitely immediately after scaling.

The target DB is running a db.r6g.2xlarge writer instance, and the CPU was running around 60% utilization prior to the scale event. The migration is configured for 1-16 DCUs, and when it failed, it was in the process of scaling from 2 to 4 DCUs.

For now, I'm going to revert to a FIXED DCU setting to avoid scaling, but I'd REALLY like to use scaling in the future.
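
For reference, pinning the capacity amounts to setting the minimum and maximum DCUs to the same value on the serverless replication config. Below is a minimal boto3 sketch, with a placeholder config ARN; the replication likely needs to be stopped before the config can be modified.

```python
# Minimal sketch: pin a DMS Serverless replication to a fixed capacity by
# setting MinCapacityUnits == MaxCapacityUnits so no scale events occur.
# The replication config ARN is a placeholder -- substitute your own.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

response = dms.modify_replication_config(
    ReplicationConfigArn="arn:aws:dms:us-east-1:123456789012:replication-config:EXAMPLE",
    ComputeConfig={
        "MinCapacityUnits": 4,  # same min and max => effectively fixed DCU
        "MaxCapacityUnits": 4,
    },
)
print(response)
```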

I'll award meaningless hypothetical bonus points if anyone can tell me how to recover from "Running with Errors" without erasing the target DB and starting the migration over again! 😬 I'm definitely concerned about forward migrations in production running into these sorts of errors with no way to recover.

Thanks

  • Hello there. Oracle and Postgres are two different databases. Did you make use of a schema conversion tool?

  • @Phil, yes, sort of. SCT mostly worked, but it kept hanging while applying the changes, so I dumped the converted SQL/DDL instead, which worked out even better since I have a GitHub Action I call to erase my DB and rebuild it from scratch every time DMS craps out on me.
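
As a rough illustration of that kind of reset step (not the exact action referenced in the comment above; psycopg2 with placeholder connection details, schema name, and DDL file path):

```python
# Rough sketch of a "reset the target" script a CI job could call:
# drop and recreate the application schema, then replay the DDL dumped
# from AWS SCT. All names, credentials, and paths are placeholders.
import psycopg2

DDL_FILE = "converted_schema.sql"  # DDL exported from SCT

conn = psycopg2.connect(
    host="my-aurora-writer.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="***",
)
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("DROP SCHEMA IF EXISTS app CASCADE;")
    cur.execute("CREATE SCHEMA app;")
    with open(DDL_FILE) as f:
        cur.execute(f.read())  # replay the converted DDL

conn.close()
```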

2 Answers
0

Hey Brien, it looks like you have already spent quite a bit of time troubleshooting the issue, so let's look at this from a different angle. DMS behaves a little differently when you use serverless rather than a provisioned DMS configuration, so it's worth checking the DMS logs as well. Please enable CloudWatch logging for the DMS migration task if you haven't already, and review all the log entries just before the time of the errors in the Aurora PG logs. This should give some useful information about the root cause of the errors. Check the document below: https://repost.aws/knowledge-center/dms-task-error-status
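
A minimal sketch of pulling those DMS log entries with boto3 for the window just before the Aurora errors; the log group name and time window are placeholders (the actual group created for a serverless replication can be found in the CloudWatch console):

```python
# Sketch: fetch DMS log entries from the few minutes before the Aurora
# "duplicate key" errors. Log group name and times are placeholders.
from datetime import datetime, timezone
import boto3

logs = boto3.client("logs", region_name="us-east-1")

start = int(datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc).timestamp() * 1000)
end = int(datetime(2024, 5, 1, 14, 40, tzinfo=timezone.utc).timestamp() * 1000)

resp = logs.filter_log_events(
    logGroupName="dms-serverless-replication-EXAMPLE",  # placeholder name
    startTime=start,
    endTime=end,
    filterPattern='"]E:"',  # DMS error-level lines usually contain "]E:"
)
for event in resp["events"]:
    print(event["timestamp"], event["message"])
```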

Also, per the AWS document below, duplicate records on the target table are expected while running Full load and CDC. In your case, since a primary key is enabled on the target, DMS is erroring out. Despite these table errors, let the job run (I assume it is not failing outright) and then manually compare and validate the records in the source and target tables. https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Troubleshooting.html#CHAP_Troubleshooting.General.DuplicateRecords
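
If it helps, here is a rough sketch of listing the per-table state DMS reports, so tables in error and their row counts can be compared against the source. It assumes the serverless DescribeReplicationTableStatistics API (on a provisioned instance the equivalent call is describe_table_statistics); the config ARN is a placeholder.

```python
# Sketch: list per-table load state and row counts reported by DMS so tables
# in "Table error" stand out and counts can be checked against the source.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

resp = dms.describe_replication_table_statistics(
    ReplicationConfigArn="arn:aws:dms:us-east-1:123456789012:replication-config:EXAMPLE",
)
for t in resp.get("ReplicationTableStatistics", []):
    print(t.get("SchemaName"), t.get("TableName"), t.get("TableState"),
          "full load rows:", t.get("FullLoadRows"),
          "cdc inserts:", t.get("Inserts"))
```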

Joseph
answered 4 months ago
  • I have done a fair amount of troubleshooting. I had logging on, set to debug; that's how I found the problem. There was nothing fishy in the DMS logs, but I found the corresponding entries in Aurora stating that the client (i.e., the DMS Serverless task) had dropped the connection. After scaling, it tried to send another file, and that's when it hit the duplicate rows, on all the tables that had been loading.

    You're right that it didn't stop. But it did not, and could not, resume on the "Table Error" tables. It finished loading and entered replication mode, but the errored tables were unrecoverable.

  • If you (or anyone) know how to recover a job stuck in "Running with Errors", I'd love to hear the procedure. Nothing I did could get the migration to resume on those tables. DMS just seems to drop "Table Error" tables on the floor and pretend they don't exist: no more rows ever go in, and the options to "Re-Validate" and "Reload" the tables never become available. The ONLY path forward I could find was to destroy and rebuild ALL the underlying tables and restart the entire migration from scratch. A FRIGHTENING proposition for production workloads!
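
One API-level avenue that may be worth exploring for the "Reload" problem, hedged heavily: this assumes the ReloadReplicationTables action is available for serverless replications in your SDK version, and it has not been verified against a task stuck in this state. The idea is to request a reload of just the errored tables even though the console option never appears (ARN, schema, and table names are placeholders):

```python
# Untested sketch: ask DMS to reload specific tables on a serverless
# replication through the API, bypassing the greyed-out console action.
# The config ARN and table identifiers below are placeholders.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

dms.reload_replication_tables(
    ReplicationConfigArn="arn:aws:dms:us-east-1:123456789012:replication-config:EXAMPLE",
    TablesToReload=[
        {"SchemaName": "APP", "TableName": "ORDERS"},
        {"SchemaName": "APP", "TableName": "ORDER_ITEMS"},
    ],
    ReloadOption="data-reload",  # or "validate-only"
)
```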

-1

The "Duplicate key violates unique key" error typically occurs when attempting to insert a record into a PostgreSQL database table using a field value that already exists and is designated as a unique key constraint in the table. In a DMS (Database Migration Service) serverless environment, this issue can arise during data replication if the source and target tables are not in sync, leading to conflicts between primary key or unique key values.

To address this issue, you can take the following steps:

Data Analysis: Identify the specific table and field where the duplicate key violation occurs. This can be done by analyzing the error logs or monitoring the data replication process (a sketch of this step is included at the end of this answer).

Data Consistency: Ensure that the source and target tables are consistent. Verify that no conflicting data exists that could violate unique key constraints.

Conflict Resolution: If there are conflicting data between the source and target tables, you'll need to resolve these conflicts before continuing the replication process. This can involve updating the existing data, changing unique key constraints, or using data transformation techniques.

DMS Configuration: Review the DMS configuration settings to ensure that it aligns with the unique key constraints of your PostgreSQL database. Pay close attention to the replication rules and conflict resolution mechanisms provided by the DMS service.

Error Handling: Implement error handling and logging mechanisms within your DMS setup to capture and track the "Duplicate key violates unique key" errors. This will help in identifying the root cause of the issue and applying appropriate corrective actions.

By following these steps, you can address the "Duplicate key violates unique key" errors in your target PostgreSQL database within a DMS serverless environment and ensure a smooth data replication process.
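
As a sketch of the first step, identifying the table and constraint behind the error against the target PostgreSQL database, with placeholder connection details, constraint name, and key value:

```python
# Sketch: given the constraint name from the Postgres error message, look up
# which table it covers and how it is defined, then check whether the key
# value DMS tried to insert is already present. Names/values are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-aurora-writer.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="appdb", user="postgres", password="***",
)

with conn.cursor() as cur:
    # Which table does the violated constraint belong to, and what does it cover?
    cur.execute(
        """
        SELECT conrelid::regclass AS table_name,
               pg_get_constraintdef(oid) AS definition
        FROM pg_constraint
        WHERE conname = %s;
        """,
        ("orders_pkey",),  # constraint name taken from the error message
    )
    print(cur.fetchone())

    # Is the key value DMS tried to insert already on the target?
    cur.execute("SELECT count(*) FROM app.orders WHERE order_id = %s;", (12345,))
    print(cur.fetchone())

conn.close()
```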

AWS
answered 4 months ago
  • Argggg, this sounds like a ChatGPT answer to my question and doesn't actually address any of the context of my problem.

    There are no duplicate rows in the SOURCE, and I'm importing into an empty, brand new Aurora DB TARGET. I can see the rows throwing duplicate key errors, and I've previously confirmed they're unique in the SOURCE (which makes sense, since the uniqueness constraints are there too!). I'm fully deleting and recreating empty TARGET DBs between each run.

    Also, no imaginary bonus points, as I still don't know how to recover without deleting everything. 😭
