I want to use AWS Database Migration Service (AWS DMS) to migrate my PostgreSQL source database to an Amazon Relational Database Service (Amazon RDS) for PostgreSQL source database. What are best practices I can use to migrate from one PostgreSQL database to another?
Short description
When you migrate homogeneous PostgreSQL databases, consider your engine's native tools, such as pg_dump. Then, perform pg_restore on the target. You can also use logical replication, and the COPY command. To migrate from an Amazon RDS for PostgreSQL database instance to another, take a snapshot, and then restore the snapshot as the target. For more information, see Best practices for migrating PostgreSQL databases to Amazon RDS and Amazon Aurora.
A full load of all data can take a long time because of bandwidth limitations, source or target capacity, and replication engine throughput. Comparatively, change data capture (CDC) replication contains only changes from the source to the target, so these workloads are lighter.
This article covers best practices for full load and CDC tasks when you use AWS DMS to migrate from PostgreSQL to PostgreSQL.
Resolution
Create and determine the current log sequence number
Before you take a backup, get a marker to instruct your DMS task where to begin the migration changes.
On the source PostgreSQL database, run the following query to create the replication slot:
SELECT * FROM
pg_create_logical_replication_slot('REPLICATION_SLOT_NAME','test_decoding')
Note: Replace REPLICATION-SLOT-NAME with the name of your replication slot.
On the source PostgreSQL database, run the following restart_LSN command to get the current log sequence number (LSN):
SELECT restart_LSN FROM pg_replication_slots WHERE slot_name = 'replication_slot_name';
Remove foreign keys and triggers during full load
During full load, check that the target database doesn't include foreign keys and triggers. DMS migrates tables in alphabetical order and doesn't distinguish between parent tables and child tables. DMS might try to migrate child tables first. If DMS migrates the child tables first, then this causes the migration to stop because of a foreign key violation. Either suppress or remove foreign keys on the target during migration.
Important: Never include triggers on a target during migration because they can run several processes that corrupt data on the target. Add triggers after migration completes and the target becomes the production database.
Turn on Full LOB Mode when migrating JSON Data
When you migrate large binary objects (LOB) in the form of JSON, turn on full LOB mode so that DMS doesn't truncate the JSON format. If you use limited LOB mode, then data truncation might occur. Then, DMS fails the table because the JSON is in an invalid format.
Confirm that the primary key field doesn't use the TEXT data type
Check that the primary key field is not TEXT, especially if full LOB mode is turned on. DMS treats the TEXT data type as LOB. During full load, DMS tries to set the primary key to NULL, and then reports a duplicate because multiple text columns contain the same value. DMS reports the error as a duplicate rather than the "NULL not allowed on Primary Key" error.
Allow DMS to create target tables
During full load, allow DMS to create tables on the target. DMS creates the tables with matching fields without DEFAULT values for the columns. Default values on columns can cause unexpected behavior in DMS. For example, SERIAL causes DMS migration to fail because this field auto-generates a value.
Example:
CREATE TABLE COMPANY (
ID SERIAL PRIMARY KEY,
NAME TEXT NOT NULL);
If the target uses the preceding structure, then PostgreSQL internals try to generate the value of the ID column. The source also contains a value to INSERT and causes an issue. Remove DEFAULTS from the target during migration.
Define partitions as source tables in task table mappings
When you migrate partitioned tables, define partitions as source tables in task table mappings, not parent tables. Write ahead logs (WAL) retain partitioned table information. Use parent tables only during full load. Don't use parent tables for the CDC phase. If you define parent tables during the CDC phase, then you might get duplicate errors that affect migration.
When you map the target tables, remap all partitions to the parent. This allows the parent to automatically distribute to its partitions.
Define all facets on the source when you use PGLOGICAL
When you use PGLOGICAL for migration, define the required facet on the source. If you skip one area, then you might see unexpected behavior.
Note: These issues are difficult to troubleshoot, so confirm that you define these areas before you begin migration with PGLOGICAL.
RDS DB instance
If your source is a RDS DB instance, then set the following value in the parameter group:
shared_preload_libraries = pglocical
Run the following command at the database level:
create extension pglogical;
On-premises PostgreSQL DB instance
If your source is an on-premises PostgreSQL DB instance, then set the following value in postgresql.conf:
shared_preload_libraries = pglogical
Run the following command at the database level:
create extension pglogical;
Define all PG plugins both on the source and target
Define all PG plugins on the target that you define on the source. This helps with compatibility and smooth processing of data.
Resolve tasks in the Stop or Error state
When tasks remain in the Stop or Error state for a long time, the replication slot fills up storage.
Delete the replication slots you manually created from the source
After migration completes, delete any replication slots that you created from the source. If the replication slots remain on the source, then they accumulate in size and fill up storage.
Migrate tables with a primary key or unique index
Check that the tables you migrate have a primary key or unique index. If a table doesn't have a primary key, then the UPDATE and DELETE statements might not apply to the table because WAL logs don't log them. For tables without a primary key, use REPLICATE IDENTITY FULL.
Important: REPLICATE IDENTITY FULL generates a large volume of information in the logs and can affect storage and performance.
Related information
Using a PostgreSQL database as an AWS DMS source
Targets for data migration