Skip to content

Collation/character set issues with AWS DMS when migrating from RDS Postgres to RDS MariaDB

0

I'm setting up a "Migrate and replicate" DMS task, with an RDS Postgres (17.4) source, and an RDS MariaDB (10.11.13) target. It's set up to "drop tables on target", so it creates tables in the target as part of the initial load.

I'm wanting to use the utf8mb4 character set on the MariaDB side, as that is what's currently recommended. As such, I have all character_set_* parameters set to utf8mb4, and all collation_* parameters set to utf8mb4_general_ci. If using a different character set or collation on the MariaDB side is better for some reason, I'm open to trying something else, but I would like to understand why the AS-IS setup doesn't work.

The DMS task uses a provisioned replication instance, running engine version 3.6.1. There are no transformation rules in the task, only selection rules that select which schemas/tables to replicate.

Aside from the parameter group for the MariaDB target, no other settings or options related to character set and collation have been set.

The observed behavior is the following:

  1. The newly created tables have a utf8mb4_general_ci collation for columns that have a data type in the source that's not a varchar/text type but maps to one (such as uuid columns in the source, become a varchar(36) in the destination with utf8mb4_general_ci collation).
  2. The newly created tables have a utf16_general_ci collation for all columns that are varchar or text types in the source (this appears to be my main problem, and it's not clear why this collation is being used)
  3. The initial load completes successfully, loading all data properly
  4. Ongoing replication starts up, but always fails when trying to replicate any table that has a varchar/text type field in the source (it works fine for tables that have no varchar/text columns)

When looking at the CloudWatch logs for the replication, I can see that ongoing replication is failing with errors like:

SqlState: HY000 NativeError: 1270 Message: [MySQL][ODBC 8.0(w) Driver][mysqld-5.5.5-10.11.13-MariaDB-log]Illegal mix of collations (utf16_general_ci,IMPLICIT), (binary,IGNORABLE), (utf8mb4_general_ci,IMPLICIT) for operation 'case' [1022502]

There are some other errors in the log that appear to be related:

SqlState: HY000 NativeError: 1193 Message: [MySQL][ODBC 8.0(w) Driver][mysqld-5.5.5-10.11.13-MariaDB-log]Unknown system variable 'default_collation_for_utf8mb4' [1022502] (ar_odbc_stmt.c:4802)

It is also notable that I am using UTF8 on the source side, and the logs are reporting the same:

PostgreSQL database server encoding is 'UTF8' (postgres_endpoint_imp.c:1123)

1 Answer
1

The issue you're experiencing with AWS DMS migration from PostgreSQL to MariaDB is related to character set and collation handling. There are several factors contributing to your problem:

  1. AWS DMS is creating tables with mixed collations - utf8mb4_general_ci for some columns and utf16_general_ci for varchar/text columns, which is causing the "Illegal mix of collations" error during ongoing replication.

  2. The error about "Unknown system variable 'default_collation_for_utf8mb4'" suggests that DMS is trying to use a system variable that doesn't exist in your MariaDB version.

To resolve this issue, you can try the following approaches:

  1. Use the CharsetMapping extra connection attribute in your target MariaDB endpoint settings. This allows you to specify character set mapping explicitly. For example:
CharsetMapping=utf8,65001
CharsetMapping=utf16,65001

This would map both utf8 and utf16 to UTF8 code page (65001).

  1. Consider pre-creating your tables in the target database with the desired utf8mb4 character set and collation, then set your DMS task's target table preparation mode to "Truncate" or "Do nothing" instead of "Drop tables on target". This way, you control the table structure and collation settings.

  2. If you're using AWS DMS engine version 3.6.1, you should have support for utf8mb4 character set (versions prior to 3.1.1 didn't support it), but you might need to explicitly configure it.

  3. Check if there are any specific collation settings in your source PostgreSQL database that might be influencing how DMS interprets the character sets.

During data migration, data goes through multiple encoding and decoding phases as it's read from the source, processed in the DMS replication instance, and loaded into the target. If there's an inappropriate configuration, such as Unicode to Latin character set conversion, it can cause issues during replication.

Before making changes to your production setup, it's recommended to test character set updates on a test instance first. Make sure all database objects are properly converted to the utf8mb4 character set.
Sources
Troubleshooting migration tasks in AWS Database Migration Service - AWS Database Migration Service
Troubleshooting - Amazon Aurora MySQL Migration Handbook
Optimize data validation using AWS DMS validation-only tasks | AWS Database Blog
Update the character set to utf8mb4 in RDS for MySQL | AWS re:Post

answered 3 months ago
EXPERT
reviewed 3 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.