Why does my AWS DMS task that uses PostgreSQL as the source fail with all of the replication slots in use?

3 minute read
0

I have an AWS Database Migration Service (AWS DMS) task that uses an Amazon Relational Database Service (Amazon RDS) for PostgreSQL DB instance. My task fails, all of the replication slots are in use, and I receive an error message.

Short description

For Amazon RDS for PostgreSQL instances, AWS DMS uses native replication slots to perform the logical replication for change data capture (CDC).

The max_replication_slots parameter controls the number of replication slots that a PostgreSQL instance has. By default, RDS for PostgreSQL instances have five replication slots. If you exceed the maximum number of replication slots, then you see log entries like these:

Messages
[SOURCE_CAPTURE ]E: Failed (retcode -1) to execute statement [1022502] (ar_odbc_stmt.c:2579)
[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 53400 NativeError: 1 Message: ERROR: all replication slots are in use;

To resolve these errors, remove the used replication slots, or increase the value of the max_replication_slots parameter.

Resolution

Remove used replication slots

Used replication slots continue to occupy space. If you run multiple AWS DMS tasks or you have old tasks that run on the same DB instance, then remove the used replication slots.

First, identify the maximum number of replication slots. Then, remove the unused replication slots.

To check the maximum number of replication slots, run the following query. To identify unused replication slots, review the active column:

SELECT * FROM pg_replication_slots;
     slot_name    |    plugin     | slot_type | datoid | database | active |  xmin  | catalog_xmin | restart_lsn
 -----------------+---------------+-----------+--------+----------+--------+--------+--------------+-------------
old_and_used_slot | test_decoding | logical   |  12052 | postgres | f      |        |          684 | 0/16A4408

Note: active: t (true) means that the slot is in use, and active: f (false) means that the slot isn't in use.

To remove an unused replication slot, run this query:

SELECT pg_drop_replication_slot('old_and_used_slot');

Note: Replace old_and_used_slot with the name of your replication slot.

After you remove unused replication slots, restart the task.

Increase the value of the max_replication_slots parameter

Modify the DB parameter in the custom DB parameter group that's attached to the RDS DB instance. Then, increase the value of the max_replication_slots parameter. This is a static parameter, so be sure to reboot the DB instance after you change the parameter value. Then, restart the task.

Related information

Activating change data capture (CDC) using logical replication

Amazon RDS for PostgreSQL

Using a PostgreSQL database as an AWS DMS source

Logical decoding examples on the PostgreSQL website

2 Comments

We seems to be having this problem often. Is there anything that can be setup to automatically drop inactive replication slots based on if/when such inactive slots has reached a certain size?

Ed
replied 7 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied 7 months ago