DMS instance selection - any documentation / guidelines?


Background:

We use DMS to export CDC changes from the database to S3 and then ETL the data into an S3 data lake. The flow is:

MySQL -> DMS -> S3 -> Glue -> S3 data lake

The DMS task is of the type: Migrate existing data and replicate ongoing changes

  1. Currently we have around 20 tables of varying row counts and row sizes, ranging from a few hundred rows to a million
  2. The DMS task runs on a dms.t3.medium replication instance
  3. The volume is expected to grow in the future

We have been tasked with adding a few more tables to this process. Adding them to the existing DMS task fails, citing a memory issue.

Questions:

Question 1: Is there a defined guideline on how to select DMS instance sizes, or is it trial and error? I did some research and came across https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.Types.html, but this only gives an overall description, not a detailed instance-selection process.

Can you please point us to the right documentation, if any?

Question 2: Is it good practice to distribute the load across multiple DMS tasks on varied instance types?

For example:

  1. Max 25 tables with rows < 1,000 - dms.t3.medium
  2. Max 2 tables with rows < 1 million - dms.r5.large
  3. Max 2 tables with rows > 1 million - dms.r5.xlarge

asked a year ago · 436 views
1 Answer
Accepted Answer

Hello there,

From the use case described, I see that you are performing DMS with CDC to S3 using a t3.medium replication instance for around 20 tables of varying row counts and sizes. When you tried adding a few more tables to this process, the DMS task failed with a suspected memory issue.

Let me address each of your questions in turn.

Question 1: Is there a defined guideline on how to select DMS instance sizes, or is it trial and error? I did some research and came across https://docs.aws.amazon.com/dms/latest/userguide/CHAP_ReplicationInstance.Types.html, but this only gives an overall description, not a detailed instance-selection process. Can you please point us to the right documentation, if any?

In general, it is suggested to use the C instance class for heterogeneous migrations and the R instance class for homogeneous migrations or memory-intensive workloads.

[+] You can also review the best practices for sizing a replication instance in the documentation below: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.SizingReplicationInstance.html#CHAP_BestPractices.SizingReplicationInstance.BestPractices

As stated in the above documentation, the best approach is to identify whether your workload is memory-intensive or compute-intensive and, based on that, select the suitable instance class (T3/R5/C5, etc.).

To estimate the actual memory requirements for a migration task, AWS DMS roughly uses the following methods.

For Full LOB mode (using single row + update, with a commit rate):

Memory: (# of LOB columns in a table) x (number of tables loading in parallel, default 8) x (LOB chunk size) x (commit rate during full load) = 2 x 8 x 64 KB x 10,000 ≈ 10 GB

Note: You can modify your task to reduce the commit rate during full load. To change this number in the AWS Management Console, open the console, choose Tasks, choose to create or modify a task, and then choose Advanced Settings. Under Tuning Settings, change the Commit rate during full load option.
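If you prefer to script this change rather than use the console, the same setting can be expressed as a task-settings document and applied with the `aws dms modify-replication-task` CLI command. A minimal sketch, using the documented `FullLoadSettings.CommitRate` key and an illustrative value of 1000 (your appropriate value depends on your row sizes):

```python
import json

# Task-settings fragment lowering the full-load commit rate from the
# default of 10000 to an illustrative 1000. Save as settings.json and
# apply it to a *stopped* task with:
#   aws dms modify-replication-task \
#       --replication-task-arn <your-task-arn> \
#       --replication-task-settings file://settings.json
settings = {"FullLoadSettings": {"CommitRate": 1000}}
print(json.dumps(settings, indent=2))
```

A lower commit rate directly shrinks the Full LOB memory estimate above, at the cost of a slower full load.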

For Limited LOB mode (using an array):

Memory: (# of LOB columns in a table) x (number of tables loading in parallel, default 8) x (max LOB size) x (bulk array size) = 2 x 8 x 4,096 KB x 1,000 ≈ 62.5 GB
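As a rough sanity check, the two formulas above can be multiplied out in a few lines. This is only a sketch using the example numbers from the formulas (2 LOB columns, 8 tables in parallel); substitute your own table statistics:

```python
def full_lob_memory_kb(lob_columns, tables_in_parallel, lob_chunk_kb, commit_rate):
    """Full LOB mode: memory scales with the commit rate during full load."""
    return lob_columns * tables_in_parallel * lob_chunk_kb * commit_rate

def limited_lob_memory_kb(lob_columns, tables_in_parallel, max_lob_kb, bulk_array_size):
    """Limited LOB mode: memory scales with max LOB size and bulk array size."""
    return lob_columns * tables_in_parallel * max_lob_kb * bulk_array_size

# Example numbers from the formulas above
full = full_lob_memory_kb(2, 8, 64, 10000)         # 10,240,000 KB
limited = limited_lob_memory_kb(2, 8, 4096, 1000)  # 65,536,000 KB
print(f"Full LOB estimate:    {full / 1024**2:.1f} GB")
print(f"Limited LOB estimate: {limited / 1024**2:.1f} GB")
```

Comparing the result against the memory of the chosen instance class (a dms.t3.medium has 4 GB) makes the out-of-memory failure you observed easier to anticipate.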

[+] Monitoring AWS DMS tasks - AWS Database Migration Service metrics - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Monitoring.html#CHAP_Monitoring.Metrics

[+] https://aws.amazon.com/premiumsupport/knowledge-center/dms-memory-optimization/

Question 2: Is it good practice to distribute the load across multiple DMS tasks on varied instance types? For example:

  1. Max 25 tables with rows < 1,000 - dms.t3.medium
  2. Max 2 tables with rows < 1 million - dms.r5.large
  3. Max 2 tables with rows > 1 million - dms.r5.xlarge

Using multiple tasks for a single migration can improve performance. If you have sets of tables that don't participate in common transactions, you might be able to divide your migration into multiple tasks. Transactional consistency is maintained within a task, so it's important that tables in separate tasks don't participate in common transactions. Also, each task independently reads the transaction stream, so be careful not to put too much stress on the source database.

You can use multiple tasks to create separate streams of replication. By doing this, you can parallelize the reads on the source, the processes on the replication instance, and the writes to the target database.
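As a sketch of what such a split could look like in practice, each task gets its own table-mapping document whose selection rules include only that task's tables. The schema and table names here are placeholders:

```python
import json

def selection_rules(schema, tables):
    """Build a DMS table-mapping document that includes only the given tables."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": str(i + 1),
                "rule-name": str(i + 1),
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
            for i, table in enumerate(tables)
        ]
    }

# One mapping document per task: small tables on a small instance,
# large tables on a larger one (placeholder names)
small_task_mappings = selection_rules("mydb", ["customers", "products"])
large_task_mappings = selection_rules("mydb", ["orders", "order_items"])
print(json.dumps(large_task_mappings, indent=2))
```

Each document is then passed as the table mappings of a separate task, with each task pointed at a replication instance sized for its share of the load. Remember the caveat above: tables that participate in common transactions should stay in the same task.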

[+] Best practices for AWS Database Migration Service - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html

AWS Support Engineer
answered a year ago
  • Thanks for the detailed answer, very helpful.
