Why do the source filter conditions in my AWS DMS task no longer work?
I want to learn how to troubleshoot and fix issues when source filter conditions are not working correctly in my AWS Database Migration Service (AWS DMS) tasks.
Resolution
Check if your engine supports the source filtering feature
Most AWS DMS sources support source filtering. However, MongoDB and Amazon DocumentDB (with MongoDB compatibility) don't support the source filtering feature. For more information, see Sources for data migration.
The following limitations might affect source filtering:
- Filters don't calculate columns of right-to-left languages.
- You can't apply filters to large object (LOB) columns.
- You can apply filters only to fixed (immutable) columns that aren't updated after you create them. If you apply source filters to columns that can be updated after creation, then the source filters might not work as expected.
Troubleshoot filters that don't work during full load
Determine the phase that the source filtering doesn't work in.
If the source filtering doesn't work during full load, then take the following actions:
- Confirm that the case sensitivity in the mapping rules matches the source engine.
- When you filter date data types, use the formats that AWS DMS requires (see the example after this list).
- Turn on the debug logging level for SOURCE_UNLOAD, and then reproduce the issue. Capture the query that AWS DMS runs on the source to unload the data.
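In filter conditions, AWS DMS expects dates in YYYY-MM-DD format and timestamps in YYYY-MM-DD HH:MM:SS format. The following snippet is a minimal sketch of a date filter on the ENTRY_DATE column that's used in the example later in this section; the between operator and the date range are illustrative only:

{
  "filter-type": "source",
  "column-name": "ENTRY_DATE",
  "filter-conditions": [
    {
      "filter-operator": "between",
      "start-value": "2022-01-01",
      "end-value": "2022-06-30"
    }
  ]
}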
Example of a filtering issue on a source Oracle table:
CREATE TABLE DMS.FILTERS (
  ID NUMBER(10) NOT NULL,
  ENTRY_DATE DATE,
  CONSTRAINT FILTERS_PK PRIMARY KEY (ID)
);

SQL> SELECT * FROM FILTERS;

        ID ENTRY_DATE
---------- ---------
         1 01-JAN-22
         2 01-JUN-22
         3 01-JAN-21
         4 01-JUN-21
         5 01-JAN-20
         6 01-JUN-20
Create an AWS DMS task with mapping rules that you configure to replicate only rows that have ENTRY_DATE greater than or equal to 01/01/2022.
Example task:
{ "rules": [ { "rule-type": "selection", "rule-id": "893662253", "rule-name": "893662253", "object-locator": { "schema-name": "DMS", "table-name": "FILTERS" }, "rule-action": "include", "filters": [ { "filter-type": "source", "column-name": "ENTRY_DATE", "filter-conditions": [ { "filter-operator": "gte", "value": "01/01/2022" } ] } ] } ] }
When the task runs, no records are replicated, and errors similar to the following appear in the task logs:
01786264: 2022-06-22T10:36:53 [SOURCE_UNLOAD ]E: ORA-01843: not a valid month [1020417] (oracle_endpoint_unload.c:171)
Because debug logs are turned on for SOURCE_UNLOAD, the task logs show the exact query that AWS DMS runs on the source database.
Example log entry:
1786264: 2022-06-22T10:36:53 [SOURCE_UNLOAD ]D: Select statement for UNLOAD is 'SELECT "ID","ENTRY_DATE" FROM "DMS"."FILTERS" WHERE ((("ENTRY_DATE" >= TO_DATE('0000-00-00','YYYY-MM-DD'))))' (oracle_endpoint_utils.c:1979)
In this example, AWS DMS runs the following query on the source database:
SELECT "ID","ENTRY_DATE" FROM "DMS"."FILTERS" WHERE ((("ENTRY_DATE" >= TO_DATE('0000-00-00','YYYY-MM-DD'))));
AWS DMS doesn't recognize the date in the mapping rules because 01/01/2022 doesn't match the date format that AWS DMS expects. As a result, the query uses the invalid date 0000-00-00.
To resolve the issue, modify the mapping rules so that the filter value matches the expected date format:
{ "filter-operator": "gte", "value": "2022-01-01" }
Troubleshoot filters that don't work during change data capture (CDC)
When you filter on fixed columns, filtering issues that occur only during the CDC phase might affect only specific data manipulation language (DML) statements, such as UPDATE or DELETE. Make sure that you turn on sufficient logging for the filtered columns on the source table.
To add the required logging, follow the steps for your source engine: Oracle, PostgreSQL, Microsoft SQL Server, or MySQL.
Oracle
Oracle uses supplemental logging to add additional logs on table columns. If the column that you filter isn't a primary key column, then activate supplemental logging for the column and primary key columns.
The following example replicates a table named TEST.LOGGING that has a primary key ID and a filter on the NAME column. To create a supplemental log group, run the following command:
ALTER TABLE TEST.LOGGING ADD SUPPLEMENTAL LOG GROUP TEST_LOG_GROUP (ID, NAME) ALWAYS;
If supplemental logging was already added on all columns in the table, then you don't need to add more logging.
Example command that adds supplemental logging on all columns:
ALTER TABLE TableName ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
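To confirm which supplemental log groups already exist on a table, you can query the Oracle data dictionary. The following sketch assumes that you have access to the DBA_LOG_GROUPS and DBA_LOG_GROUP_COLUMNS views and uses the example TEST.LOGGING table:

-- List the supplemental log groups that are defined on the table
SELECT log_group_name, log_group_type, always
FROM dba_log_groups
WHERE owner = 'TEST' AND table_name = 'LOGGING';

-- List the columns that each log group covers
SELECT log_group_name, column_name
FROM dba_log_group_columns
WHERE owner = 'TEST' AND table_name = 'LOGGING';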
PostgreSQL
PostgreSQL uses the REPLICA IDENTITY property to configure the logging level for a table. When you set REPLICA IDENTITY to DEFAULT, PostgreSQL records the old values of the primary key columns in the write-ahead logs (WAL). However, when you filter on a non-primary key column, the default logging level might not be sufficient for DELETE statements.
Based on the plugin that the AWS DMS task uses, set REPLICA IDENTITY as follows:
If you use test_decoding, then set REPLICA IDENTITY to FULL:
ALTER TABLE tablename REPLICA IDENTITY FULL;
Note: If you don't set REPLICA IDENTITY to FULL, then DELETE statements log only the primary key columns. AWS DMS can't evaluate the filter on those statements and might send all the deletes to the target table.
If you use pglogical, then set REPLICA IDENTITY to FULL after you add the table to the replication set:
ALTER TABLE tablename REPLICA IDENTITY FULL;
Note: pglogical has a limitation that prevents you from adding a table to a replication set after you set REPLICA IDENTITY to FULL, so set REPLICA IDENTITY to FULL only after you add the table to the replication set. Setting REPLICA IDENTITY to FULL also increases the amount of WAL that's generated on the source database because all columns are logged to the WAL.
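To check the current REPLICA IDENTITY setting for a table, you can query the pg_class catalog. The following is a sketch that assumes a table named tablename in the search path:

-- relreplident values: 'd' = DEFAULT, 'f' = FULL, 'n' = NOTHING, 'i' = USING INDEX
SELECT relname, relreplident
FROM pg_class
WHERE oid = 'tablename'::regclass;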
Microsoft SQL Server
To turn on MS-CDC, check that you meet all of the following AWS DMS CDC logging requirements.
For every table that has a primary key, run the following query:
exec sys.sp_cdc_enable_table
@source_schema = N'schema_name',
@source_name = N'table_name',
@role_name = NULL,
@supports_net_changes = 1
GO
Note: The preceding query is required for cloud-based sources.
For every table that has unique keys but no primary key, run the following query:
exec sys.sp_cdc_enable_table
@source_schema = N'schema_name',
@source_name = N'table_name',
@index_name = N'unique_index_name',
@role_name = NULL,
@supports_net_changes = 1
GO
Note: The preceding query is required for both on-premises and cloud-based sources. If you use MS-Replication for on-premises sources, then you don't need to run the preceding query.
For every table that has no primary or unique keys, run the following query:
exec sys.sp_cdc_enable_table
@source_schema = N'schema_name',
@source_name = N'table_name',
@role_name = NULL
GO
Note: The preceding query is required for both on-premises and cloud-based sources.
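To verify that CDC is turned on, you can query the SQL Server catalog views from the source database. The following sketch uses db_name as a placeholder for your database name:

-- Check that CDC is turned on at the database level
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'db_name';

-- Check which tables are tracked by CDC
SELECT s.name AS schema_name, t.name AS table_name, t.is_tracked_by_cdc
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id;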
MySQL
In MySQL, the binlog_row_image system variable controls the row images in binlogs. For more information, see binlog_row_image on the MySQL website. AWS DMS requires that you set binlog_row_image to FULL and binlog_format to ROW.
When you set binlog_row_image to FULL on the source database, MySQL logs all columns in both the before image and the after image. This setting provides the maximum logging level in the binary logs.
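To check the current values on the source, you can run the following sketch. The SET GLOBAL statements apply to self-managed MySQL; on Amazon RDS for MySQL, change these parameters in a custom parameter group instead:

-- Check the current values
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';

-- For a self-managed source, set the values that AWS DMS requires
SET GLOBAL binlog_format = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';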
Related information
How do I use source filters in my AWS DMS tasks?