AWD DMS Replace Character

0

Hi !

I configured a AWS DMS containing an Oracle as Source database and a S3 Bucket as target.

Problem Statement: My Oracle database contains Horizontal Tab and Line feed characters on the row content. I would like to replace/remove those characters.

I tried to add CharacterSetSettings to Migration Task Settings but it's not supported for S3 as target. Also the rfc4180 property add quotes to the value but it doesn't solve the problem when the consumers try to load the row in a redshift.

Is there another way that I can remove those characters without setting an extra resources/infrastructure?

Source endpoint settings:

{
    "AddSupplementalLogging": true,
    "ExtraArchivedLogDestIds": [],
    "AllowSelectNestedTables": true,
    "DatabaseName": "MyDatabase",
    "UseBFile": true,
    "UseLogminerReader": false,
    "SecretsManagerAccessRoleArn": "arn:aws:iam::0000011111:role/SynchroReplicationInfrastructureSecretsManagerReadWriteRole",
    "SecretsManagerSecretId": "arn:aws:secretsmanager:us-east-1:0000011111:secret:beta/xyzedf"
}

Target endpoint settings:

{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": "\\t",
    "BucketFolder": "MyFolder",
    "BucketName": "MyBucketName",
    "CompressionType": "NONE",
    "DataFormat": "csv",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "TimestampColumnName": "transactionCommitTimestamp",
    "DatePartitionEnabled": false,
    "AddColumnName": false,
    "MaxFileSize": 2000
}
Marcus
已提问 2 年前499 查看次数
2 回答
0

Have you tried using replace function to modify the contents as per https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html#:~:text=%3A%2050%0A%7D%0A%7D-,Using%20SQLite%20functions%20to%20build%20expressions,-You%20use%20table

replace(x,y,z)

The replace(x,y,z) function returns a string formed by substituting string z for every occurrence of string y in string x.

I hope it helps.

AWS
已回答 2 年前
  • What is the right syntax for the replace expression? I tried this, but didn't work:

        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "rule-action": "include-column",
          "value": null,
          "old-value": null
        }
    
  • "rule-action": "add-column" needed

    which could look like this. It will select table, remove COL_NAME and lastly add new column UPDATED_COL_NAME with expression replace($COL_NAME, '\n','') this should work unless DMS is not supporting escape character \ in that case you can try updating the source table data to remove the \n

    Or create a view which is replacing the \n in column but in that case only full load is supported.

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "remove-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-action": "add-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "value": "UPDATED_COL_NAME",
          "data-type": {
            "type": "string",
            "length": 50
          }
        }
      ]
    }                   
    
  • It worked partially, I couldn't make it work for more than one character replacement. I tried "expression": "replace(replace($COL_NAME, '\n',''), '\t', ' ')" and replace($COL_NAME, '/[\t\n]/g,''). Do you know how can I replace multiple characters in the same string?

0

To solve this issue I added lambda to remove those characters.

I used to have the following flow:

Oracle DB -> AWS DMS -> S3 Bucket

Current:

Oracle DB -> AWS DMS -> S3 Bucket1 -> Lambda -> S3 Bucket2

Marcus
已回答 2 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则