Skip to content

On-Prem to RDS SQL Server Replication - Can't Get Exclusive Lock Due to Incoming DMS CDC Churn

0

Greetings, we are seeing large TLog growth which appears to be from using the RELY_ON_SQL_SERVER_REPLICATION_AGENT SafeGuardPolicy.

In this case there are on-prem Hang fire jobs that run every 15 minutes and produce update churn that seem to perpetuate into a never-ending transaction from the aws_dms_user while applying these changes.

The documents seem to indicate that pivoting to EXCLUSIVE_AUTOMATIC_TRUNCATION for the SafeGuardPolicy will prevent the dummy transaction from holding a transaction in the active log, thusly, allowing the 4-hour TLog backup operation to truncate the log.

We are also unable to perform general partition Maintenace at the target. We use List Partitioning, whereas each table has a SourceID which represents a specific client such as COKE or PEPSI.

Normal maintenance operations such as TRUNCATE WITH PARTITION and SPLIT RANGE, which complete in milliseconds, require an xlock at the table level. This lock can't be obtained while the DMS process is processing incoming CDC changes as it seems to have an exclusive lock on the table(s) and in cases this lock is never removed.

Two questions:

  1. Will this actually reduce the TLog file sizes if we switch to EXCLUSIVE_AUTOMATIC_TRUNCATION.
  2. Will lock contention at the target also be reduced if we move to EXCLUSIVE_AUTOMATIC_TRUNCATION and/or are there better ways to avoid long running table locks at the target by DMS.
asked a year ago193 views
1 Answer
1
Accepted Answer

Hello,

When the EXCLUSIVE_AUTOMATIC_TRUNCATION setting is used, AWS DMS has full control of the replication agent process that marks log entries as ready for truncation using sp_repldone. Thus, this setting could be used to enforce DMS to truncate logs.

[+] Endpoint settings when using SQL Server as a source for AWS DMS - https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html#CHAP_Source.SQLServer.ConnectionAttrib
[+] MicrosoftSQLServerSettings - https://docs.aws.amazon.com/dms/latest/APIReference/API_MicrosoftSQLServerSettings.html

Additionally, using EXCLUSIVE_AUTOMATIC_TRUNCATION might help in reducing lock contention at the target. You may also consider reducing long-running transactions and high concurrency on the source tables that DMS needs to lock for enabling CDC to avoid long-running table locks at the target by implementing strategies like batching, indexing, and adjusting isolation levels which would provide significant benefits.

[+] Partitioned tables and indexes - https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver16
[+] SET TRANSACTION ISOLATION LEVEL (Transact-SQL) - https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver16

Further, kindly ensure that the source database is not experiencing bottlenecks, such as CPU starvation or maximum usage of storage bandwidth, which can exacerbate locking issues. As per your requirements, you may scale up the instance class, optimize resource-intensive queries, or archive rarely accessed data if necessary.

AWS
SUPPORT ENGINEER
answered a year ago
AWS
EXPERT
reviewed 8 months ago
  • Thanks for the information.

  • Hey Vedanshi, taking your advice, I found the tables that were holding locks have recently seen a spike in on-prem churn, like 100 times the usual amount. This was causing a perpetual backlog. These cache tables were removed from the task's table schema and the tasks were redeployed and operations returned to normal. Thanks for taking time to reply. We will prepare a test in the lower environments using EXCLUSIVE_AUTOMATIC_TRUNCATION, I bet it will lead to less contention as you stated.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.