AWD DMS Replace Character

0

Hi !

I configured a AWS DMS containing an Oracle as Source database and a S3 Bucket as target.

Problem Statement: My Oracle database contains Horizontal Tab and Line feed characters on the row content. I would like to replace/remove those characters.

I tried to add CharacterSetSettings to Migration Task Settings but it's not supported for S3 as target. Also the rfc4180 property add quotes to the value but it doesn't solve the problem when the consumers try to load the row in a redshift.

Is there another way that I can remove those characters without setting an extra resources/infrastructure?

Source endpoint settings:

{
    "AddSupplementalLogging": true,
    "ExtraArchivedLogDestIds": [],
    "AllowSelectNestedTables": true,
    "DatabaseName": "MyDatabase",
    "UseBFile": true,
    "UseLogminerReader": false,
    "SecretsManagerAccessRoleArn": "arn:aws:iam::0000011111:role/SynchroReplicationInfrastructureSecretsManagerReadWriteRole",
    "SecretsManagerSecretId": "arn:aws:secretsmanager:us-east-1:0000011111:secret:beta/xyzedf"
}

Target endpoint settings:

{
    "CsvRowDelimiter": "\\n",
    "CsvDelimiter": "\\t",
    "BucketFolder": "MyFolder",
    "BucketName": "MyBucketName",
    "CompressionType": "NONE",
    "DataFormat": "csv",
    "EnableStatistics": true,
    "IncludeOpForFullLoad": true,
    "TimestampColumnName": "transactionCommitTimestamp",
    "DatePartitionEnabled": false,
    "AddColumnName": false,
    "MaxFileSize": 2000
}
Marcus
feita há 2 anos498 visualizações
2 Respostas
0

Have you tried using replace function to modify the contents as per https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Expressions.html#:~:text=%3A%2050%0A%7D%0A%7D-,Using%20SQLite%20functions%20to%20build%20expressions,-You%20use%20table

replace(x,y,z)

The replace(x,y,z) function returns a string formed by substituting string z for every occurrence of string y in string x.

I hope it helps.

AWS
respondido há 2 anos
  • What is the right syntax for the replace expression? I tried this, but didn't work:

        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "rule-action": "include-column",
          "value": null,
          "old-value": null
        }
    
  • "rule-action": "add-column" needed

    which could look like this. It will select table, remove COL_NAME and lastly add new column UPDATED_COL_NAME with expression replace($COL_NAME, '\n','') this should work unless DMS is not supporting escape character \ in that case you can try updating the source table data to remove the \n

    Or create a view which is replacing the \n in column but in that case only full load is supported.

    {
      "rules": [
        {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "transformation",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "remove-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          }
        },
        {
          "rule-type": "transformation",
          "rule-id": 344,
          "rule-name": "344",
          "rule-action": "add-column",
          "rule-target": "column",
          "object-locator": {
            "schema-name": "SCHEMA_NAME",
            "table-name": "TABLE_NAME",
            "column-name": "COL_NAME"
          },
          "expression": "replace($COL_NAME, '\n','')",
          "value": "UPDATED_COL_NAME",
          "data-type": {
            "type": "string",
            "length": 50
          }
        }
      ]
    }                   
    
  • It worked partially, I couldn't make it work for more than one character replacement. I tried "expression": "replace(replace($COL_NAME, '\n',''), '\t', ' ')" and replace($COL_NAME, '/[\t\n]/g,''). Do you know how can I replace multiple characters in the same string?

0

To solve this issue I added lambda to remove those characters.

I used to have the following flow:

Oracle DB -> AWS DMS -> S3 Bucket

Current:

Oracle DB -> AWS DMS -> S3 Bucket1 -> Lambda -> S3 Bucket2

Marcus
respondido há 2 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas