DMS between two RDS instances: OldestReplicationSlotLag keeps growing

Hello. I have a DMS task (full load + CDC) replicating a single table between two AWS RDS instances (source and target). The table has a primary key, but it is updated very rarely.

Once I start the DMS task, OldestReplicationSlotLag and TransactionLogsDiskUsage on the source DB grow until I restart the DMS task. However, changes to the table do show up on both source and target.

What could be the reason for this issue?

1 Answer

When using DMS for ongoing replication (Change Data Capture or CDC), it relies on database logs to capture changes. For PostgreSQL (assuming your RDS instances are PostgreSQL because you mentioned replication slots), DMS uses replication slots.

Here's a breakdown of the issues you mentioned and potential solutions:

  1. OldestReplicationSlotLag Growing: This indicates that the replication slot used by DMS is not being consumed (advanced) as expected. In PostgreSQL, a replication slot retains WAL (write-ahead log) files until the client reading the slot, in this case DMS, confirms it has consumed them. If DMS isn't advancing the slot, the WAL keeps being retained and the reported lag grows.

  2. TransactionLogsDiskUsage Growing: This is directly related to the above issue. Since the logs aren't being consumed, they take up more and more disk space.
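
To put a number on the retention described above, the gap between the server's current WAL position and a slot's restart_lsn can be computed directly. Below is a minimal Python sketch, assuming the usual PostgreSQL text form of an LSN (two hexadecimal numbers separated by a slash). The function names and the sample LSNs are made up for illustration; the SQL string shows the equivalent check run on the source database and assumes PostgreSQL 10 or later function names.

```python
# Query to run on the source database (PostgreSQL 10+ function names).
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server has to keep for a slot sitting at restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


if __name__ == "__main__":
    # Hypothetical values, e.g. copied from the output of SLOT_LAG_SQL.
    print(retained_wal_bytes("1/00000000", "0/F0000000"))
```

If this number keeps climbing for the DMS slot while the task is running, the slot is not advancing, which matches the two symptoms in the question.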

Here are some potential causes and solutions:

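  1. DMS Isn't Consuming Logs: This is the core issue. Even if the table is updated rarely, the replication slot still has to advance so that old logs are released. If other tables in the same database are busy while the replicated table is idle, the slot may not advance on its own; the DMS PostgreSQL source endpoint has a WAL heartbeat option (HeartbeatEnable, together with HeartbeatSchema and HeartbeatFrequency) that is intended for this situation and is worth checking.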

  2. Network Issues: Ensure there's no network bottleneck or interruptions between DMS and the RDS source instance.

  3. DMS Task Settings: Ensure that your DMS task settings are appropriate. For example, check the values under ChangeProcessingTuning (such as BatchApplyTimeoutMin, BatchApplyTimeoutMax, and CommitTimeout) and whether BatchApplyPreserveTransaction is enabled. Very high timeout values might delay processing of the logs.

  4. Monitor DMS Metrics: Use CloudWatch to monitor DMS task metrics, especially CDCLatencySource and CDCLatencyTarget. Together they show the lag between when a change is made in the source database and when it is applied to the target.

  5. Replication Slot: Ensure that the replication slot used by DMS is active. Sometimes, manual interventions or other issues might invalidate a slot. You can check the status of replication slots using the pg_replication_slots view in PostgreSQL.

  6. Source DB Parameters: Ensure that wal_level is set to logical (on RDS for PostgreSQL this is done by setting rds.logical_replication to 1 in the DB parameter group), that max_replication_slots is appropriately configured, and that max_wal_senders is high enough to accommodate DMS and any other replication processes you might have.

  7. Manual Cleanup: As a temporary measure, you can manually drop the replication slot used by DMS and restart the task, which causes DMS to create a new slot. This doesn't address the underlying issue, and it needs care: changes made between dropping the old slot and creating the new one can be missed, which might leave the target inconsistent.

  8. DMS Version: Ensure your DMS replication instance is running the latest available engine version. AWS occasionally releases updates, and using the latest version can help avoid known bugs.
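
Points 5 and 6 lend themselves to a small automated check. The following is only a sketch: it assumes the rows of pg_replication_slots (with a computed retained_bytes column) and the values of SHOW wal_level, SHOW max_replication_slots and SHOW max_wal_senders have already been fetched with whatever client you use. The function names, the dictionary shapes, and the 1 GiB threshold are illustrative choices, not part of DMS or PostgreSQL.

```python
from typing import Dict, List


def check_source_parameters(params: Dict[str, str], slots_needed: int = 1) -> List[str]:
    """Return a list of problems found in the source DB parameters."""
    problems = []
    if params.get("wal_level") != "logical":
        problems.append("wal_level is not 'logical'")
    if int(params.get("max_replication_slots", "0")) < slots_needed:
        problems.append("max_replication_slots is too low")
    if int(params.get("max_wal_senders", "0")) < slots_needed:
        problems.append("max_wal_senders is too low")
    return problems


def check_slots(slots: List[dict], max_retained_bytes: int = 1 << 30) -> List[str]:
    """Flag slots that are inactive or retain more WAL than the threshold."""
    problems = []
    for slot in slots:
        name = slot["slot_name"]
        if not slot["active"]:
            problems.append(f"slot {name} is inactive")
        if slot["retained_bytes"] > max_retained_bytes:
            problems.append(f"slot {name} retains {slot['retained_bytes']} bytes of WAL")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data standing in for real query results.
    params = {"wal_level": "logical", "max_replication_slots": "10", "max_wal_senders": "10"}
    slots = [{"slot_name": "dms_slot_example", "active": True, "retained_bytes": 5 << 30}]
    for problem in check_source_parameters(params) + check_slots(slots):
        print(problem)
```

An active slot that still retains a large amount of WAL, as in the sample data, points back to cause 1 rather than to a configuration problem.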

Expert
AndyAWS
answered 7 months ago