How do I troubleshoot issues that cause my Aurora read replica to lag and restart?
I want to troubleshoot issues that are causing my Amazon Aurora read replica to lag and restart.
Resolution
Note: The following resolution applies to Aurora clusters in a single AWS Region and Global Database primary clusters, not Global Database secondary clusters.
Measure the AuroraReplicaLag
Aurora replicas connect to the same storage volume as the writer instance. Aurora synchronously writes changes to the shared storage volume but asynchronously replicates changes to the reader instances. For read consistency, Aurora invalidates data that's cached in memory that relates to the change. The data cache includes the buffer pool or page cache.
In some cases, a delay can occur when Aurora propagates changes across the reader instances. The delay appears as an increase in the AuroraReplicaLag metric in Amazon CloudWatch. If the lag grows too large, then the reader instance can restart, and you might receive the following error message:
"Read Replica has fallen behind the master too much. Restarting".
For Amazon Aurora MySQL-Compatible Edition, run the following query on the INFORMATION_SCHEMA.REPLICA_HOST_STATUS table to measure AuroraReplicaLag:
select server_id AS Instance_Identifier, if(session_id = 'MASTER_SESSION_ID', 'writer', 'reader') as Role, replica_lag_in_milliseconds as AuroraReplicaLag from information_schema.replica_host_status;
Example output:
+---------------------+--------+-------------------+
| Instance_Identifier | Role   | AuroraReplicaLag  |
+---------------------+--------+-------------------+
| myamscluster-aza-1  | writer | 0                 |
| myamscluster-azb-1  | reader | 5.150000095367432 |
| myamscluster-aza-2  | reader | 5.033999919891357 |
+---------------------+--------+-------------------+
For Amazon Aurora PostgreSQL-Compatible Edition, run the following query to call the aurora_replica_status() function and filter the results:
select server_id, case when session_id= 'MASTER_SESSION_ID' then 'Writer' else 'Reader' end AS Role, replica_lag_in_msec as AuroraReplicaLag from aurora_replica_status();
Example output:
     server_id      |  role  | aurorareplicalag
--------------------+--------+------------------
 myapgcluster-aza-1 | Reader |           19.641
 myapgcluster-azb-1 | Reader |           19.752
 myapgcluster-aza-2 | Writer |
(3 rows)
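If you collect the query results in a script, you can flag the readers that exceed a lag threshold. The following Python code is a minimal sketch, not an official tool. The lagging_readers function and the threshold value are hypothetical, and the rows are assumed to be tuples that your own database client returns from either of the preceding queries.

```python
from typing import Iterable, List, Optional, Tuple

# Each row mirrors the query output: (server_id, role, lag_ms).
# lag_ms is None when the database returns NULL, as it does for the
# writer in the Aurora PostgreSQL-Compatible example output.
Row = Tuple[str, str, Optional[float]]


def lagging_readers(rows: Iterable[Row], threshold_ms: float) -> List[str]:
    """Return the IDs of reader instances whose lag exceeds threshold_ms."""
    return [
        server_id
        for server_id, role, lag_ms in rows
        if role.lower() == "reader"
        and lag_ms is not None
        and lag_ms > threshold_ms
    ]


# Rows copied from the Aurora PostgreSQL-Compatible example output.
rows = [
    ("myapgcluster-aza-1", "Reader", 19.641),
    ("myapgcluster-azb-1", "Reader", 19.752),
    ("myapgcluster-aza-2", "Writer", None),
]

# The 19.7 ms threshold is an arbitrary value chosen for illustration.
print(lagging_readers(rows, threshold_ms=19.7))  # ['myapgcluster-azb-1']
```

The role comparison ignores case because the two queries return 'reader' and 'Reader' respectively.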
Make sure that all instances in the cluster have the same specification
If a reader instance has a smaller DB instance class than the writer DB instance, then the volume of changes might be too large for the reader to process. In that case, the reader instance can't invalidate its cached data quickly enough to keep up with the writer. To avoid this issue, it's a best practice to keep the same specification for all DB instances in the Aurora cluster.
Monitor sessions with metrics and Enhanced Monitoring
When you run many sessions at the same time, a reader instance can lag. An Aurora replica might apply the necessary changes from the writer slowly because it lacks available resources. To find the cause of the lag, view the CPUUtilization, DBConnections, and NetworkReceiveThroughput metrics. You can also turn on Enhanced Monitoring with a granularity of 1 or 5 seconds to get the resource usage on the reader. Or, turn on Performance Insights and use Database Insights to visualize the reader's workload. For Aurora PostgreSQL-Compatible, you can use the ReadIOPS metric.
Important: Performance Insights will reach its end of life on June 30, 2026. You can upgrade to the Advanced mode of Database Insights before June 30, 2026. If you don't upgrade, then DB clusters that use Performance Insights will default to the Standard mode of Database Insights. Only the Advanced mode of Database Insights will support execution plans and on-demand analysis. If your clusters default to the Standard mode, then you might not be able to use these features on the console. To turn on the Advanced mode, see Turning on the Advanced mode of Database Insights for Amazon RDS and Turning on the Advanced mode of Database Insights for Amazon Aurora.
Use CloudWatch to visualize write activity
A sudden surge of write activity in an already write-heavy production cluster can overload the writer DB instance. The overload can cause reader instances to lag. Use CloudWatch to view the DMLThroughput, DDLThroughput, and Queries metrics to identify sudden bursts. For Aurora PostgreSQL-Compatible, view the WriteThroughput metric.
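If you export the datapoints for one of these metrics, then a simple comparison against the series median can highlight bursts. The following Python code is a rough sketch under stated assumptions: the find_bursts function, the factor of 3, and the sample values are all hypothetical and aren't part of CloudWatch. Tune the factor for your own workload.

```python
from statistics import median
from typing import List, Sequence


def find_bursts(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return indexes of datapoints greater than factor times the median."""
    if not values:
        return []
    baseline = median(values)
    if baseline <= 0:
        # With an idle baseline, a ratio test isn't meaningful.
        return []
    return [i for i, value in enumerate(values) if value > factor * baseline]


# Hypothetical DMLThroughput datapoints, one for each period.
dml_throughput = [120.0, 130.0, 125.0, 900.0, 128.0]
print(find_bursts(dml_throughput))  # [3]
```

The median is used instead of the mean so that a single large burst doesn't raise the baseline that it's compared against.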
Troubleshoot long-running transactions for Aurora MySQL-Compatible
The MySQL InnoDB engine uses multi-version concurrency control (MVCC) by default. So, the engine must track all changes to rows that occur throughout the length of a transaction. After long-running transactions complete, a spike in purge thread activity begins. The sudden purge can cause an Aurora replica to lag because of the volume of the backlog that long-running transactions create.
For Aurora MySQL-Compatible, check the RollbackSegmentHistoryListLength metric in CloudWatch to view the spike in purge. You can run the SHOW ENGINE INNODB STATUS command to view the purge. Or, run the following query:
select NAME AS RollbackSegmentHistoryListLength, COUNT from INFORMATION_SCHEMA.INNODB_METRICS where NAME = 'trx_rseg_history_len';
Example output:
+----------------------------------+-------+
| RollbackSegmentHistoryListLength | COUNT |
+----------------------------------+-------+
| trx_rseg_history_len             |   358 |
+----------------------------------+-------+
1 row in set (0.00 sec)
Set up CloudWatch alarms to monitor RollbackSegmentHistoryListLength to make sure that it doesn't reach a high number. It's a best practice to avoid long-running transactions in relational databases.
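One way to create such an alarm is with the CloudWatch PutMetricAlarm API through boto3. The following Python code is a sketch under stated assumptions, not a recommended configuration. The helper names, alarm name, period, evaluation periods, and threshold are hypothetical placeholders. The code also assumes that the metric is published in the AWS/RDS namespace with the DBInstanceIdentifier dimension, so confirm both in your CloudWatch console before you use it.

```python
from typing import Any, Dict


def history_list_alarm_params(instance_id: str, threshold: float) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        # The alarm name format is a hypothetical convention.
        "AlarmName": f"{instance_id}-rollback-segment-history-list-length",
        # Assumption: the metric is in AWS/RDS, keyed by DB instance.
        "Namespace": "AWS/RDS",
        "MetricName": "RollbackSegmentHistoryListLength",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Maximum",
        # Placeholder timing: three consecutive 5-minute periods.
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(instance_id: str, threshold: float) -> None:
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # Third-party library, imported only when the call is made.

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**history_list_alarm_params(instance_id, threshold))


# The threshold of 100000 is an arbitrary placeholder, not a recommendation.
params = history_list_alarm_params("myamscluster-aza-1", 100000)
print(params["AlarmName"])
```

The parameters are built separately from the API call so that you can review or log them before you create the alarm. To send notifications, you can also pass an AlarmActions list that contains an Amazon SNS topic ARN.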
Prevent brief network interruptions
Although rare, transient networking communication failures between the writer and reader instances or between an instance and the Aurora storage layer can occur. Reader instances can lag or restart because of a brief interruption in networking. The Aurora replica can also lag because a large number of changes overloaded the instance's network bandwidth. To avoid this issue, it's a best practice to modify the size of the DB instance to a size that can handle the number of changes.
Related information
Adding Aurora replicas to a DB cluster
Replication with Amazon Aurora MySQL-Compatible