How can I import from Aurora MySQL to Redshift?

I'm aware of the Zero-ETL integration. But the integration fails because I have foreign keys with a CASCADE constraint.

I tried DMS but I have the exact same problem.

Tables in task scope have constraints with cascading referential actions. Please note that DMS does not propagate CASCADE events. Disable constraint during full load and ensure that corresponding cascade constraints are added to the target database for ongoing replication.

So how can I import a database then? Not only that, but I would also need continuous replication here.

  • Zero-ETL is basically just free DMS under the hood; if it doesn't work in DMS, it won't work in Zero-ETL. I'd look into other third-party CDC/ETL solutions that might support cascading references in MySQL, though I'm not sure whether any exist.

1 Answer

Hi Miguel,

Please go through the steps and documentation links below; I hope they help solve your issue.

1. Initial Full Load (Disabling Constraints)

Since DMS and Zero-ETL cannot handle CASCADE constraints directly, you will need to disable constraints during the initial full load. You can follow these steps:

A. Create a Database Schema Without Foreign Keys in Redshift:

  • Export the schema from Aurora MySQL, excluding foreign keys and constraints.
  • Use the mysqldump tool with the --no-data, --skip-add-drop-table, and --skip-comments options to export only the table structures. Note that mysqldump has no option to omit foreign key definitions, so remove the CONSTRAINT ... FOREIGN KEY clauses from the dump manually before using it.

mysqldump --no-data --skip-add-drop-table --skip-comments your_database > schema.sql

Create the corresponding tables in Redshift without foreign key constraints, converting the MySQL column types in the dump to Redshift-compatible types (for example with the AWS Schema Conversion Tool, since Redshift will not accept MySQL DDL as-is).
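
Below is a minimal sketch of what the pre-created Redshift tables could look like, assuming a hypothetical parent/child pair (customers and orders) that has a cascading foreign key in Aurora MySQL; the table names, columns, and distribution keys are placeholders:

-- Hypothetical tables converted by hand from the MySQL dump.
-- MySQL types are mapped to Redshift types (e.g. INT UNSIGNED -> BIGINT,
-- DATETIME -> TIMESTAMP); no FOREIGN KEY ... ON DELETE CASCADE is declared.
CREATE TABLE customers (
    customer_id BIGINT NOT NULL,
    name        VARCHAR(255),
    created_at  TIMESTAMP,
    PRIMARY KEY (customer_id)
)
DISTSTYLE KEY
DISTKEY (customer_id);

CREATE TABLE orders (
    order_id    BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    total       DECIMAL(12,2),
    ordered_at  TIMESTAMP,
    PRIMARY KEY (order_id)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (ordered_at);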

B. Perform Full Load via DMS:

  • Configure an AWS DMS task for full load from Aurora MySQL to Redshift without applying constraints.
  • Set the EnableValidation option to false to skip validation on data migration, and use the table mapping options to only migrate the necessary data.

In the DMS task, use the following setting for the target endpoint:

  • Target Table Preparation Mode: Do nothing (so that DMS does not drop, recreate, or truncate the tables you pre-created from the dump).
  • Include LOB Columns in Full Load: Limited LOB Mode or Full LOB Mode (depending on your data size).

AWS Documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TaskSettings.FullLoad.html

C. Recreate Foreign Key Constraints in Redshift:

  • Once the full load is completed, manually create foreign keys and constraints in Redshift as necessary, ensuring that they align with your referential integrity requirements (a sketch follows this list).
    • Amazon Redshift does not enforce foreign key constraints; they are informational only and used by the query planner, so cascading operations will not behave as they do in Aurora MySQL.
    • Redshift also does not support triggers, so CASCADE DELETE or CASCADE UPDATE behavior has to be reproduced with scheduled SQL or an external process (see section 2.B below).
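
If you want the foreign keys recorded in Redshift for documentation and query planning, they can be added back after the load. Here is a sketch using the same hypothetical customers/orders tables:

-- Informational only: Redshift stores the constraint and the planner may use it,
-- but it is not enforced and no cascading delete or update will ever fire.
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id);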

2. Ongoing Replication for Continuous Sync

After completing the initial full load, you will need to set up ongoing replication using AWS DMS or a custom solution to keep Redshift in sync with Aurora MySQL.

A. Set Up DMS for CDC (Change Data Capture):

  • AWS DMS supports CDC (Change Data Capture), which captures and applies changes from the source to the target.
    • Modify the DMS task to enable ongoing replication for CDC changes:
      • For the Source Endpoint, set the task to capture ongoing changes.
      • For the Target Endpoint (Redshift), ensure that you exclude any DDL operations that could modify the table structure.

B. Handling CASCADE Manually:

  • Since DMS does not handle CASCADE operations, consider writing custom SQL scripts or using AWS Lambda to handle cascading changes by:
    • Polling Aurora MySQL for updates that involve cascading constraints.
    • Applying equivalent changes manually in Redshift using UPDATE or DELETE queries (a sketch follows below).
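
As a rough illustration of what "applying equivalent changes manually" can look like on the Redshift side, here is a sketch that re-applies cascade behavior after the parent-table changes have replicated, again using the hypothetical customers/orders tables and a hypothetical key_changes staging table; in practice this would run from a scheduled job or a Lambda function:

-- Manual equivalent of ON DELETE CASCADE:
-- remove child rows whose parent row no longer exists.
DELETE FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM customers);

-- Manual equivalent of ON UPDATE CASCADE:
-- requires the old->new key mapping to be captured on the Aurora side first
-- and staged into a (hypothetical) key_changes table.
UPDATE orders
SET customer_id = key_changes.new_customer_id
FROM key_changes
WHERE orders.customer_id = key_changes.old_customer_id;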

3. Alternative Approach Using S3 as Staging (Optional)

If direct integration via DMS proves difficult, you can follow an alternative approach where:

  1. Export the data from Aurora MySQL to Amazon S3 using AWS Data Pipeline or AWS Glue.
  2. Use the Amazon Redshift COPY command to load the data from S3 into Redshift (a sample COPY statement follows below).

https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

This approach gives you more flexibility in handling large data loads and constraints, though it introduces a slight complexity in managing continuous replication via periodic S3 exports.
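
For the load side of this approach, a COPY statement along these lines would pull the exported files into Redshift; the bucket, prefix, IAM role, and format options are placeholders you would adjust to match your export job:

-- Hypothetical bucket/prefix and IAM role; the format options must match how
-- the export job (Glue, Data Pipeline, etc.) wrote the files.
COPY orders
FROM 's3://my-export-bucket/aurora/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
GZIP
TIMEFORMAT 'auto'
REGION 'us-east-1';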

4. Automating Cascade Handling with Triggers or AWS Lambda

If you want to simulate CASCADE behavior in Redshift, keep in mind:

  • Redshift does not support triggers, so cascading updates and deletes cannot be automated inside Redshift itself.
  • Instead, use AWS Lambda (or another external scheduler) to track Aurora MySQL changes and apply the equivalent updates and deletes in Redshift, as described in section 2.B above.

References and AWS Documentation:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html#CHAP_Source.MySQL.Specifics.ForeignKey

https://docs.aws.amazon.com/dms/latest/sbs/chap-manageddatabases.mysql2rds.replication.html

https://docs.aws.amazon.com/lambda/latest/dg/services-rds.html

  • I love this answer, like a full freaking book, buried in the middle is one sentence “don’t use cascades in mysql”. Lol, useless
