Amazon Redshift database as a target for AWS Database Migration Service (DMS) with best practices

5 minute read
Content level: Intermediate

This article explains how to use an Amazon Redshift database as a target for AWS Database Migration Service (AWS DMS) when ingesting data from various OLTP source systems into Amazon Redshift, the AWS data warehouse service. It also covers best practices to follow.

You can migrate data to Amazon Redshift databases using AWS Database Migration Service. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With an Amazon Redshift database as a target, you can migrate data from all of the other supported source databases.

AWS DMS supports Amazon Redshift as a target database for migrations from various source databases, including:

  • On-premises databases (Oracle, SQL Server, MySQL, PostgreSQL, etc.)
  • Amazon RDS databases (Oracle, SQL Server, MySQL, PostgreSQL, MariaDB, etc.)
  • Amazon Aurora databases
  • SAP

During a database migration to Amazon Redshift, AWS DMS first moves data to an Amazon S3 bucket. When the files reside in an Amazon S3 bucket, AWS DMS then transfers them to the proper tables in the Amazon Redshift data warehouse. AWS DMS creates the S3 bucket in the same AWS Region as the Amazon Redshift database. The AWS DMS replication instance must be located in that same AWS Region.
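As a sketch, the pieces involved in this flow map onto the parameters of a DMS Redshift target endpoint. The structure below mirrors the `RedshiftSettings` shape used by the boto3 `create_endpoint` call; every identifier (host, bucket, role ARN) is a hypothetical placeholder, not a real resource:

```python
# Sketch of the parameters for a DMS Redshift target endpoint.
# All identifiers (ARNs, names, hosts) are placeholders, not real resources.
redshift_endpoint = {
    "EndpointIdentifier": "redshift-target",   # hypothetical name
    "EndpointType": "target",
    "EngineName": "redshift",
    "RedshiftSettings": {
        "ServerName": "my-cluster.example.us-east-1.redshift.amazonaws.com",
        "Port": 5439,
        "DatabaseName": "dev",
        "Username": "dms_user",
        # DMS stages .csv files in this bucket before copying into Redshift;
        # the bucket is created in the same Region as the Redshift cluster.
        "BucketName": "my-dms-staging-bucket",
        "BucketFolder": "dms-staging",
        # Role that lets DMS write to S3 and Redshift read from it.
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-redshift-access",
    },
}

print(redshift_endpoint["RedshiftSettings"]["BucketName"])
```

In a real migration you would pass this structure (plus credentials) to `create_endpoint`; the point here is only to show where the S3 staging bucket and access role fit in.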

AWS Database Migration Service supports both full load and change processing operations. AWS DMS reads the data from the source database and creates a series of comma-separated value (.csv) files. For full-load operations, AWS DMS creates files for each table and copies them to a separate per-table folder in Amazon S3. After the files are uploaded, AWS DMS issues a COPY command and the data in the files is copied into Amazon Redshift. For change-processing operations, AWS DMS writes the net changes to .csv files, uploads them to Amazon S3, and copies the data to Amazon Redshift.

When migrating data to an Amazon Redshift target, DMS performs the following steps:

  1. Data Extraction: DMS connects to the source database and extracts the data based on the migration task settings.
  2. Data Transformation: If necessary, DMS can perform data transformations such as data type conversions, column filtering, or data validation.
  3. Data Loading: DMS loads the data into the Amazon Redshift target database using an optimized bulk load process. It creates temporary staging tables, performs parallel data loads, and then renames the staging tables to the target tables.
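The extraction and transformation steps are driven by the task's table mappings. A minimal sketch in the DMS table-mapping JSON format (the `sales` schema and `ssn` column are hypothetical) that selects one schema and drops a column during migration:

```python
import json

# DMS table-mapping rules: include every table in a hypothetical "sales"
# schema, and remove a hypothetical "ssn" column during migration.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "drop-ssn",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "sales",
                "table-name": "%",
                "column-name": "ssn",
            },
            "rule-action": "remove-column",
        },
    ]
}

# The serialized JSON is what a DMS task accepts as its table mappings.
print(json.dumps(table_mappings, indent=2))
```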

Some key points to consider when using Amazon Redshift as a DMS target:

  • Schema Conversion: DMS creates the target tables and performs basic schema conversion during the migration. For complex heterogeneous schema objects (views, stored procedures, and so on), use the AWS Schema Conversion Tool (AWS SCT).
  • Data Types: DMS maps source data types to compatible Amazon Redshift data types. If an exact match is not available, it uses the closest possible data type.
  • Parallel Load: DMS can perform parallel data loads into Amazon Redshift to improve migration performance.
  • Target Table Management: DMS can create, recreate, or truncate target tables in Amazon Redshift based on the migration task settings.
  • CDC Support: DMS supports ongoing replication and change data capture (CDC) for Amazon Redshift targets, allowing you to keep the target database synchronized with the source.
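To illustrate the data type mapping point, a few common source-to-Redshift conversions are sketched below. This table is illustrative only, not the complete DMS mapping; consult the DMS documentation for your specific source engine:

```python
# Illustrative source-to-Redshift type mappings (not exhaustive).
# When there is no exact match, DMS picks the closest Redshift type.
type_map = {
    "MySQL INT": "INT4",
    "MySQL DATETIME": "TIMESTAMP",
    "Oracle NUMBER(10,2)": "NUMERIC(10,2)",
    "PostgreSQL TEXT": "VARCHAR(65535)",  # Redshift has no unbounded text type
}

for src, tgt in type_map.items():
    print(f"{src:22s} -> {tgt}")
```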

Overall, AWS DMS simplifies the process of migrating data from various sources to an Amazon Redshift data warehouse, providing a managed service for efficient and secure data migrations.

There are several best practices and configurations that can help optimize performance when migrating large datasets to Amazon Redshift using AWS Database Migration Service (DMS):

  1. Redshift Cluster Configuration:

    • Choose a node type that fits the workload: RA3 nodes (for example, ra3.4xlarge) pair compute with scalable managed storage for large loads, while dense compute (dc2) nodes suit smaller, performance-sensitive datasets.
    • Let the COPY command apply automatic compression encodings during the initial load (the default COMPUPDATE behavior) rather than loading into uncompressed columns.
    • Configure workload management (WLM) so that load operations get enough concurrency slots and memory.
  2. DMS Configuration:

    • Enable parallelization by increasing the MaxFullLoadSubTasks setting in the DMS task settings. This allows DMS to load multiple tables into Redshift in parallel.
    • Use the Redshift endpoint settings (extra connection attributes) such as maxFileSize to control the size of the .csv files that DMS stages in Amazon S3 before loading into Redshift.
    • Increase fileTransferUploadStreams in the Redshift endpoint settings to upload staged files to Amazon S3 with more parallel threads, improving throughput.
  3. Data Distribution:

    • Choose an appropriate distribution key and sort key for your Redshift tables based on your query patterns and data characteristics.
    • DMS does not choose distribution or sort keys for you; pre-create the target tables with the keys you want (or alter them after the full load) rather than relying on the defaults DMS uses when it creates tables.
  4. Staging Data:

    • Use Amazon S3 as a staging area for large datasets before loading into Redshift. DMS can directly load data from S3 into Redshift, leveraging parallel processing.
    • DMS stages the data as .csv files, which the Redshift COPY command loads in parallel. If you stage large datasets in S3 yourself, outside of DMS, a compressed columnar format such as Parquet (which Redshift COPY also supports) reduces storage and speeds up loading.
  5. Network Optimization:

    • Deploy the DMS replication instance in the same AWS Region and VPC as your Redshift cluster to minimize network latency.
    • Use AWS Direct Connect or AWS VPN to establish a dedicated network connection between your on-premises environment and AWS if migrating from an on-premises source.
  6. Monitoring and Tuning:

    • Monitor the DMS task and Redshift cluster performance using Amazon CloudWatch metrics and logs.
    • Tune Redshift workload management (WLM) memory allocation and concurrency based on your workload characteristics and resource utilization.
  7. Incremental Loading:

    • After the initial full load, consider using DMS's change data capture (CDC) feature to perform incremental updates, minimizing the amount of data transferred and loaded into Redshift.
  8. Data Partitioning:

    • Amazon Redshift does not support declarative table partitioning; instead, choose sort keys on the columns you filter by (e.g., date, region) and, for external data, partition Redshift Spectrum tables, to improve query performance and enable more efficient data loading and maintenance.
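Several of the settings above come together in the task and endpoint configuration. A hedged sketch follows; the numeric values are examples to tune for your workload, not recommendations, and the keys mirror the DMS task settings and boto3 `RedshiftSettings` shapes:

```python
import json

# DMS task settings fragment: raise parallelism for the full load.
task_settings = {
    "FullLoadSettings": {
        # Number of tables loaded in parallel (the default is 8).
        "MaxFullLoadSubTasks": 16,
        # Rows transferred per commit during the full load.
        "CommitRate": 10000,
    }
}

# Redshift endpoint tuning (keys follow the boto3 RedshiftSettings shape).
redshift_tuning = {
    "RedshiftSettings": {
        "MaxFileSize": 64000,             # max size (KB) of each staged .csv file
        "FileTransferUploadStreams": 20,  # parallel threads for S3 uploads
    }
}

print(json.dumps({**task_settings, **redshift_tuning}, indent=2))
```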

By following these best practices and optimizing configurations based on your specific data characteristics and workload requirements, you can achieve efficient and high-throughput data migrations from various sources to Amazon Redshift using AWS Database Migration Service.
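For the distribution and sort-key guidance above, pre-creating the target table before the DMS full load is a common approach. A sketch with hypothetical table and column names:

```python
# Hypothetical Redshift DDL: pre-create the target table with explicit
# distribution and sort keys so DMS loads into it, instead of letting
# DMS create a table with default distribution.
ddl = """
CREATE TABLE sales.orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_date   DATE,
    amount       NUMERIC(12,2)
)
DISTKEY (customer_id)   -- co-locate rows commonly joined on customer_id
SORTKEY (order_date);   -- date-range filters can skip unneeded blocks
"""

print(ddl)
```

With the task's target table preparation mode set to keep existing tables (rather than drop and recreate), DMS loads into this table and the keys you defined are preserved.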
