AWD DMS Replace Character

0

Hi !

I configured a AWS DMS containing an Oracle as Source database and a S3 Bucket as target.

Problem Statement: My Oracle database contains Horizontal Tab and Line feed characters on the row content. I would like to replace/remove those characters.

I tried to add CharacterSetSettings to Migration Task Settings but it's not supported for S3 as target. Also the rfc4180 property add quotes to the value but it doesn't solve the problem when the consumers try to load the row in a redshift.

Is there another way that I can remove those characters without setting an extra resources/infrastructure?

Source endpoint settings:

{
    "AddSupplementalLogging": true,
    "ExtraArchivedLogDestIds": [],
    "AllowSelectNestedTables": true,
    "DatabaseName": "MyDatabase",
    "UseBFile": true,
    "UseLogminerReader": false,
    "SecretsManagerAccessRoleArn": "arn:aws:iam::0000011111:role/SynchroReplicationInfrastructureSecretsManagerReadWriteRole",
    "SecretsManagerSecretId": "arn:aws:secretsmanager:us-east-1:0000011111:secret:beta/xyzedf"
}

Target endpoint settings:

{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": "\\t",
    "BucketFolder": "MyFolder",
    "BucketName": "MyBucketName",
    "CompressionType": "NONE",
    "DataFormat": "csv",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "TimestampColumnName": "transactionCommitTimestamp",
    "DatePartitionEnabled": false,
    "AddColumnName": false,
    "MaxFileSize": 2000
}
Marcus
asked 2 years ago481 views
2 Answers
0

Have you tried using replace function to modify the contents as per https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html#:~:text=%3A%2050%0A%7D%0A%7D-,Using%20SQLite%20functions%20to%20build%20expressions,-You%20use%20table

replace(x,y,z)

The replace(x,y,z) function returns a string formed by substituting string z for every occurrence of string y in string x.

I hope it helps.

AWS
answered 2 years ago
  • What is the right syntax for the replace expression? I tried this, but didn't work:

        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "rule-action": "include-column",
          "value": null,
          "old-value": null
        }
    
  • "rule-action": "add-column" needed

    which could look like this. It will select table, remove COL_NAME and lastly add new column UPDATED_COL_NAME with expression replace($COL_NAME, '\n','') this should work unless DMS is not supporting escape character \ in that case you can try updating the source table data to remove the \n

    Or create a view which is replacing the \n in column but in that case only full load is supported.

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "remove-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-action": "add-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "value": "UPDATED_COL_NAME",
          "data-type": {
            "type": "string",
            "length": 50
          }
        }
      ]
    }                   
    
  • It worked partially, I couldn't make it work for more than one character replacement. I tried "expression": "replace(replace($COL_NAME, '\n',''), '\t', ' ')" and replace($COL_NAME, '/[\t\n]/g,''). Do you know how can I replace multiple characters in the same string?

0

To solve this issue I added lambda to remove those characters.

I used to have the following flow:

Oracle DB -> AWS DMS -> S3 Bucket

Current:

Oracle DB -> AWS DMS -> S3 Bucket1 -> Lambda -> S3 Bucket2

Marcus
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions