RDS Aurora MySQL multi-master downtime and best practices


Hi all,

---Context--- A customer is currently using a large RDS Aurora cluster (12 replicas: 6x r5.12xlarge and 6x r5.4xlarge) for their production environment. This cluster is part of a monolith that is proactively (and slowly) being broken down into smaller applications with independent data stores. That work will still take months or years to complete due to competing priorities on their end.

---Challenge--- Over the past few months the customer has performed a few database restarts due to either engine upgrades or different database parameter tuning. The customer would like to evaluate multi-master or any other alternative that mitigates service downtime as much as possible for future upgrades or restarts.


  1. Is multi-master (2 nodes) + 12 additional read replicas an option at all?
  2. If we ever implement a multi-master approach keeping the remaining replicas as readers, how does a database upgrade/restart affect the service? Are all the database nodes rebooted, as happens with a regular single-master cluster?
  3. The customer application is not built for a multi-master active-active approach as they won't be able to handle deadlocks at the application level. Is a multi-master active-passive an option for fail-over?
  4. Do we have any other recommendations or architectures for managing database upgrades/restarts that would help minimize downtime?


1 Answer
Accepted Answer

1), 2)


In a multi-master cluster, all DB instances can perform write operations. The notions of a single read/write primary instance and multiple read-only Aurora Replicas don't apply.

A multi-master cluster has no read replicas; both nodes are read/write. A two-node multi-master cluster with 12 additional read replicas is therefore not an option.

Because binlog replication is not available, replicating to a MySQL instance on EC2 is not possible either. As a result, it is currently difficult to scale out read workloads on multi-master.

3)

Yes, active-passive workloads minimize downtime for write operations. However, if one of the nodes dies, switching over to the other node is the application's responsibility. Cluster endpoints are not used for DML in multi-master.
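As a rough illustration of what "the application's responsibility" can look like, here is a minimal Python sketch of active-passive endpoint selection: try the active node first and fall back to the passive node only when the connection fails. The function name, the exception class, and the hostnames in the usage note are hypothetical, and `connect` stands in for whatever your MySQL driver provides. The exception types worth retrying on depend on the driver, so `retry_on` is an assumption you would adjust.

```python
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when no endpoint in the list accepted a connection."""


def connect_active_passive(
    endpoints: Sequence[str],
    connect: Callable[[str], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[str, T]:
    """Return (host, connection) for the first endpoint that connects.

    `endpoints` is ordered by preference: the active writer first,
    the passive node after it. `connect` is supplied by the caller
    (for example a thin wrapper around a driver's connect call).
    """
    failures: List[Tuple[str, BaseException]] = []
    for host in endpoints:
        try:
            return host, connect(host)
        except retry_on as exc:
            # Remember why this node was skipped, then try the next one.
            failures.append((host, exc))
    raise AllEndpointsFailed(failures)
```

Usage would be along the lines of `connect_active_passive(["node-a.example.internal", "node-b.example.internal"], my_connect)`, where both hostnames are placeholders for the two instance endpoints. Because all writes go to whichever node is returned, the application keeps avoiding active-active write conflicts.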

Check out the other multi-master limitations as well.


4)

How many seconds of downtime are acceptable? You may want to review DB connection management first. There are best practices for DNS caching, smart drivers, etc.
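One piece of connection management that often helps during short restarts is retrying failed connection attempts with a capped exponential backoff, so the application rides through the restart window instead of failing immediately. The sketch below is only an illustration under assumptions: the function name, the default delays, and the retried exception types are hypothetical and should be tuned to your driver and to the downtime you can tolerate.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def retry_with_backoff(
    operation: Callable[[], R],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The wait doubles after each failure (base_delay, 2x, 4x, ...)
    and is capped at `max_delay` seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

Opening a fresh connection on each retry, rather than reusing a cached socket, also means the hostname is resolved again, which matters when DNS records change during a failover.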


answered 5 years ago
reviewed a year ago