[RDS, Aurora MySQL] Frequent "Recovery of the DB instance has started. Recovery time will vary with the amount of data to be recovered."


We have a db.t2.small Aurora MySQL instance with less than 2GiB of data that has been entering recovery frequently - 6 times in the last 3 weeks. From what I can find online, this is due to underlying hardware failures? Shouldn't the database be migrated to good/working hardware after a failure?

How do we prevent and/or reduce the frequency of these occurrences? Is there something we could be doing differently?

Also, our client software on our app server doesn't register a disconnect after this. Does the original host stay up while/after the new one is spun up? I've found anecdotal evidence that the IP might change, with people handling the event out-of-band by forcing a DNS cache refresh via a triggered client software restart in response to the event.

Recovery on: 2022-08-03, 2022-08-02, 2022-07-27, 2022-07-20, 2x 2022-07-19

1 Answer

It looks like you are running a single-instance cluster with no read replicas. If the underlying hardware fails, Aurora tries to bring the instance up on a new host and has to run crash recovery, which can take up to 10 minutes. Your application should connect to the cluster endpoint, which does not change. To avoid this downtime, you can add a replica, which reduces the downtime to a few seconds instead of minutes: in the background, Aurora fails over to the replica and promotes it to the writer instance. The cluster endpoint remains the same.
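On the client side, one way to ride out a recovery or failover is to open a fresh connection to the cluster endpoint on each retry, so the hostname is resolved again instead of reusing a stale connection. The following is only a minimal sketch, not a definitive implementation: `run_with_reconnect`, `connect`, and `operation` are hypothetical names, and the exception types in `retry_on` are an assumption that you would replace with whatever your MySQL driver raises on a lost connection.

```python
import time


def run_with_reconnect(connect, operation, attempts=5, base_delay=0.5,
                       sleep=time.sleep, retry_on=(OSError,)):
    """Run operation(conn) on a fresh connection, retrying with backoff.

    connect   -- hypothetical zero-argument factory that opens a new
                 connection to the cluster endpoint (assumption: your
                 driver resolves the hostname when connecting).
    operation -- callable that takes the connection and does the work.
    retry_on  -- exception types treated as "connection lost"; OSError
                 is a placeholder, real drivers raise their own types.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        conn = None
        try:
            # A new connection per attempt avoids reusing a socket that
            # still points at the old host.
            conn = connect()
            return operation(conn)
        except retry_on as exc:
            last_error = exc
            # Exponential backoff, skipped after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
        finally:
            close = getattr(conn, "close", None)
            if close is not None:
                try:
                    close()
                except Exception:
                    pass  # best-effort cleanup only
    raise last_error
```

With a real driver, `connect` would wrap the driver's connect call with the cluster endpoint as the host. If the application or its runtime caches DNS results for a long time, this retry loop alone may not be enough.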


AWS
answered 2 years ago
EXPERT
reviewed 22 days ago
  • Thank you for your response. Could you elaborate on a few points? The console only lists a reader and a writer endpoint. What is the "cluster endpoint"?
    <somename>-cluster.cluster-ro-<some id>.<region>.rds.amazonaws.com
    <somename>-cluster.cluster-<some id>.<region>.rds.amazonaws.com
    Can you comment on the frequency of recovery events? Bad luck?
