I have an AWS Database Migration Service (AWS DMS) task that's in the full-load complete or replication ongoing stage of migration. But, the data in the target database doesn't match the data on the source.
Short description
When you use AWS DMS to migrate data in a heterogeneous environment, you might see data mismatch for the following reasons:
- Incorrect large binary object (LOB) settings
- The source or target data type isn't supported
- Errors or exceptions that occur during change data capture (CDC) replication cause the data manipulation language (DML) transaction to fail at the target database
If you don't use the correct LOB settings, then data is truncated based on the LobMaxSize task setting. So, the target LOB column doesn't contain the exact same data as the source.
During heterogeneous migration, AWS DMS converts the source data type to an internal data type. Then, AWS DMS converts the internal data to the target data type. Because certain source and target data types aren't fully supported by AWS DMS, you might see data mismatch between the source and target.
Resolution
Perform initial checks
If you use a full-load-only task, then make sure that migration is complete and the task is in a stopped state.
If you use a full-load and CDC task or CDC-only task, then check the CDCLatencySource and CDCLatencyTarget Amazon CloudWatch metrics. Confirm that there isn't latency.
Check that there are no other applications connected to the source or target databases because this can cause data manipulation. For example, if you're running a full-load-only task and the source date is modified by another application, then there's data mismatch at the target. Or, if another application writes data to the migrated target table on the target database, then there's data mismatch between the target and source.
Query the awsdms_validation_failures_v1 table on the target
If your goal is data consistency, then turn on validation when you create the AWS DMS task.
Note: If a table has LOB columns and you use Limited LOB mode for migration, then you must set ValidationPartialLobSize to the same value as LobMaxSize.
Troubleshoot when validation is turned on
If validation is turned on, then check the awsdms_Validation_failures_v1 table in your target database. If a record enters the ValidationSuspended or ValidationFailed state during migration, then AWS DMS writes diagnostic information to awsdms_validation_failures_v1. To troubleshoot validation errors, run a command similar to the following:
select * from awsdms_validation_failures_v1 where TASK_NAME = 'ABC123FGJASHKNA345';
To get information about the failure, check the Details column in the output. Use the Key column to compare the record data between your source and target. For more information about how to troubleshoot issues with data validation, see Troubleshooting.
Troubleshoot when validation is turned off
If you haven't turned on validation on your AWS DMS task, then create a validation-only task.
For one-time migration, use the full-load validation-only feature to quickly compare all rows between the source and target.
For ongoing replication, use the CDC validation-only task. A CDC validation-only task validates the existing rows between the source and target tables. The task continues with ongoing changes as they appear, and reports data validation failures.
Check for limitations in the source and target data
After you identify the mismatched data, check the source and target for limitations associated with their data types. For example, when you use PostgreSQL as a source, you can't migrate ENUM data types.
Check for errors in the task log
Check the task log for errors at the time of validation failure. Or, check the control tables to view exceptions that are logged during the data replication stage.
Resolve mismatched data that's caused by truncation
When you use Limited LOB mode, AWS DMS pre-allocates memory in the replication instance. Then, AWS DMS uses the LobMaxSize task setting to load the LOB data in bulk. AWS DMS truncates LOBs that exceed the maximum LOB size, and then issues a warning message to the log file.
Check the log files for warning messages that show that data was truncated, and then check the corresponding LOB column's maximum size. Define a LobMaxSize that's larger than the LOB column size so that data isn't truncated. Use a diagnostic support script to find the tables that have LOB data, and then query the LOB column's size.
If the LOB column's maximum size is more than 100 MB, then use Full LOB or Inline LOB mode to stop LOB column data truncation.
Related information
Validation-only tasks
Setting LOB support for source databases in an AWS DMS task