AWD DMS Replace Character

0

Hi !

I configured a AWS DMS containing an Oracle as Source database and a S3 Bucket as target.

Problem Statement: My Oracle database contains Horizontal Tab and Line feed characters on the row content. I would like to replace/remove those characters.

I tried to add CharacterSetSettings to Migration Task Settings but it's not supported for S3 as target. Also the rfc4180 property add quotes to the value but it doesn't solve the problem when the consumers try to load the row in a redshift.

Is there another way that I can remove those characters without setting an extra resources/infrastructure?

Source endpoint settings:

{
    "AddSupplementalLogging": true,
    "ExtraArchivedLogDestIds": [],
    "AllowSelectNestedTables": true,
    "DatabaseName": "MyDatabase",
    "UseBFile": true,
    "UseLogminerReader": false,
    "SecretsManagerAccessRoleArn": "arn:aws:iam::0000011111:role/SynchroReplicationInfrastructureSecretsManagerReadWriteRole",
    "SecretsManagerSecretId": "arn:aws:secretsmanager:us-east-1:0000011111:secret:beta/xyzedf"
}

Target endpoint settings:

{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": "\\t",
    "BucketFolder": "MyFolder",
    "BucketName": "MyBucketName",
    "CompressionType": "NONE",
    "DataFormat": "csv",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "TimestampColumnName": "transactionCommitTimestamp",
    "DatePartitionEnabled": false,
    "AddColumnName": false,
    "MaxFileSize": 2000
}
Marcus
질문됨 2년 전498회 조회
2개 답변
0

Have you tried using replace function to modify the contents as per https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html#:~:text=%3A%2050%0A%7D%0A%7D-,Using%20SQLite%20functions%20to%20build%20expressions,-You%20use%20table

replace(x,y,z)

The replace(x,y,z) function returns a string formed by substituting string z for every occurrence of string y in string x.

I hope it helps.

AWS
답변함 2년 전
  • What is the right syntax for the replace expression? I tried this, but didn't work:

        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "rule-action": "include-column",
          "value": null,
          "old-value": null
        }
    
  • "rule-action": "add-column" needed

    which could look like this. It will select table, remove COL_NAME and lastly add new column UPDATED_COL_NAME with expression replace($COL_NAME, '\n','') this should work unless DMS is not supporting escape character \ in that case you can try updating the source table data to remove the \n

    Or create a view which is replacing the \n in column but in that case only full load is supported.

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "remove-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-action": "add-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "value": "UPDATED_COL_NAME",
          "data-type": {
            "type": "string",
            "length": 50
          }
        }
      ]
    }                   
    
  • It worked partially, I couldn't make it work for more than one character replacement. I tried "expression": "replace(replace($COL_NAME, '\n',''), '\t', ' ')" and replace($COL_NAME, '/[\t\n]/g,''). Do you know how can I replace multiple characters in the same string?

0

To solve this issue I added lambda to remove those characters.

I used to have the following flow:

Oracle DB -> AWS DMS -> S3 Bucket

Current:

Oracle DB -> AWS DMS -> S3 Bucket1 -> Lambda -> S3 Bucket2

Marcus
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인