Why did my Amazon RDS DB instance restart, recover, or fail over?
I want to know the root cause of the restart, recovery, or failover of my Amazon Relational Database Service (Amazon RDS) database instance.
Short description
The Amazon RDS DB instance automatically performs a restart under the following conditions:
- The primary Availability Zone is unavailable, or performance bottlenecks and resource contention have caused an excessive workload.
- There's an underlying infrastructure issue with the primary instance. This issue can be a loss of network connectivity to the primary instance, or a compute unit or storage issue on the primary instance.
- The DB instance class type is changed as part of a DB instance vertical scaling activity.
- The underlying host of the DB instance is undergoing OS patching during a specific maintenance window. For more information, see Maintaining a DB instance and Upgrading a DB instance engine version.
- You used the Reboot or Reboot with failover option to initiate a manual reboot of the DB instance.
When the DB instance shows potential issues and fails to respond to Amazon RDS health checks, Amazon RDS automatically takes one of the following actions:
- For a Single-AZ deployment, Amazon RDS initiates a Single-AZ recovery.
- For a Multi-AZ deployment, Amazon RDS initiates a Multi-AZ failover.
Then, Amazon RDS restarts the DB instance so that you can resume database operations as quickly as possible without administrative intervention.
Resolution
To identify the cause of the outage, check the following logs and metrics for your DB instance. Also, make sure to follow best practices.
Amazon RDS event messages
To identify the root cause of an unplanned outage in your instance, view all the Amazon RDS events for the last 24 hours. By default, Amazon RDS records event times in UTC. To store events for a longer time period, send the Amazon RDS events to Amazon CloudWatch Events. When your instance restarts, you see one of the following messages in your Amazon RDS event notifications.
The RDS instance was modified by customer
This RDS event message indicates that an RDS instance modification initiated the failover.
Applying modification to database instance class
This RDS event message indicates that the DB instance class was changed. The impact depends on the deployment type:
- Single-AZ deployments become unavailable for a few minutes during this scaling operation.
- Multi-AZ deployments are unavailable only for the time that it takes the instance to fail over. This duration is usually about 60 seconds. The delay is short because the standby database is upgraded first, and the failover then moves to the newly sized database. Then, your database restarts, and the engine performs recovery to make sure that your database remains in a consistent state.
The user requested a failover of the DB instance
This message indicates that a user used the Reboot or Reboot with failover option to initiate a manual reboot of the DB instance.
The primary host of the RDS Multi-AZ instance is unhealthy
This reason indicates that a transient underlying hardware issue led to the loss of communication to the primary instance. This issue might render the instance unhealthy because the RDS monitoring system couldn't communicate with the RDS instance to perform health checks.
The primary host of the RDS Multi-AZ instance is unreachable due to loss of network connectivity
This reason indicates that a transient network issue that affected the primary host of your Multi-AZ deployment caused the Multi-AZ failover and database instance restart. The internal monitoring system detected this issue and initiated a failover.
The RDS Multi-AZ primary instance is busy and unresponsive, the Multi-AZ instance activation started, or the Multi-AZ instance activation completed
The event log shows these messages under the following situations:
- The primary DB instance is unresponsive.
- A memory shortage caused by excessive memory consumption in the database prevented the Amazon RDS monitoring system from contacting the underlying host. As a proactive measure, the monitoring system restarts the database.
- The DB instance experienced intermittent network issues with the underlying host.
- The instance experienced a heavy database load. In this case, you might notice spikes in the CloudWatch metrics CPUUtilization and DatabaseConnections, and in the IOPS and throughput metrics. You might also notice a drop in FreeableMemory.
Database instance patched
This message indicates that the DB instance underwent a minor version upgrade during a maintenance window. This message occurs because the Auto minor version upgrade setting is turned on for the instance.
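The event messages described above can also be retrieved and sorted programmatically. The following is a minimal sketch, not an official AWS tool. It assumes that `rds_client` is a boto3 RDS client (for example, `boto3.client("rds")`), that the instance identifier is your own, and that matching on substrings of the messages quoted in this article is good enough for a first pass. The cause labels are illustrative, and the sketch reads only the first page of results.

```python
# Substrings of the event messages described in this article, mapped to a
# short cause label. The labels are illustrative, not official AWS wording.
EVENT_CAUSES = [
    ("modified by customer", "instance modification"),
    ("applying modification to database instance class", "instance class change"),
    ("user requested a failover", "manual reboot with failover"),
    ("is unhealthy", "underlying hardware issue"),
    ("loss of network connectivity", "network issue"),
    ("busy and unresponsive", "database load or memory pressure"),
    ("database instance patched", "minor version upgrade"),
]


def classify_event(message):
    """Return a cause label for an RDS event message, or 'unclassified'."""
    lowered = message.lower()
    for needle, cause in EVENT_CAUSES:
        if needle in lowered:
            return cause
    return "unclassified"


def recent_event_causes(rds_client, db_instance_id, minutes=1440):
    """Return (date, message, cause) tuples for recent instance events.

    `minutes=1440` covers the last 24 hours. Pagination is ignored here,
    so only the first page of events is inspected.
    """
    response = rds_client.describe_events(
        SourceIdentifier=db_instance_id,
        SourceType="db-instance",
        Duration=minutes,
    )
    return [
        (event["Date"], event["Message"], classify_event(event["Message"]))
        for event in response["Events"]
    ]
```

Passing the client in as a parameter keeps the classification logic testable without AWS credentials. Messages that don't match any known substring are returned as "unclassified" so that they can be reviewed manually.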
CloudWatch metrics
To check if a database load issue caused the outage, review the CloudWatch metrics for your Amazon RDS instance. Spikes in the following key metrics might indicate an issue in the availability and health status of your RDS instance:
- DatabaseConnections
- CPUUtilization
- FreeableMemory
- WriteIOPS
- ReadIOPS
- ReadThroughput
- WriteThroughput
- DiskQueueDepth
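To review these metrics around the time of an outage, you can pull the datapoints and flag unusual values. The following is a minimal sketch that assumes `cloudwatch_client` is a boto3 CloudWatch client (for example, `boto3.client("cloudwatch")`). The spike rule, which flags any value above a multiple of the median, is an assumption chosen for illustration and not an AWS recommendation. For FreeableMemory, look for drops rather than spikes.

```python
import statistics
from datetime import datetime, timedelta, timezone


def fetch_datapoints(cloudwatch_client, db_instance_id, metric_name,
                     hours=24, period=300):
    """Fetch the per-period maximum of an AWS/RDS metric, oldest first."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,
        Statistics=["Maximum"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


def find_spikes(datapoints, factor=2.0):
    """Return datapoints whose Maximum exceeds `factor` times the median.

    This heuristic is illustrative only; tune or replace it for your workload.
    """
    if not datapoints:
        return []
    median = statistics.median(d["Maximum"] for d in datapoints)
    return [d for d in datapoints if d["Maximum"] > factor * median]
```

For example, `find_spikes(fetch_datapoints(cloudwatch, "mydb", "CPUUtilization"))` would return the periods where CPU usage was more than twice its median over the last 24 hours, where `cloudwatch` and `"mydb"` are placeholders for your own client and instance identifier.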
Enhanced Monitoring
Amazon RDS delivers metrics from Enhanced Monitoring into your Amazon CloudWatch Logs account. This feature provides metrics in real time for the operating system that your DB instance runs on. You can view all the system metrics and process information for your DB instances on the console.
Database Insights
The CloudWatch Database Insights Dashboard contains information related to your database performance that you can use to analyze and troubleshoot performance issues. Use the Database Insights Dashboard to analyze statistics for a query. It's a best practice to use the information on this dashboard to tune the performance of the query and optimize your workload. For more information, see Viewing the Database Instance Dashboard for CloudWatch Database Insights.
Note: To reduce issues, it's a best practice to work with your database administrator to make these changes.
RDS database logs
To troubleshoot the cause of the outage for your DB instance, you can use the Amazon Aurora and RDS console or Amazon RDS API operations to view, download, or monitor database log files. You can also query the database log files that are loaded into database tables. For more information, see Monitoring Amazon RDS log files.
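As an illustration of the API route, the following sketch lists recently written log files and reads one of them in portions. It assumes that `rds_client` is a boto3 RDS client (for example, `boto3.client("rds")`) and that the instance identifier and log file name are your own. Error handling and throttling are omitted.

```python
def recent_log_files(rds_client, db_instance_id, since_epoch_ms):
    """List log files written after `since_epoch_ms`, newest first.

    `since_epoch_ms` is a POSIX timestamp in milliseconds.
    """
    response = rds_client.describe_db_log_files(
        DBInstanceIdentifier=db_instance_id,
        FileLastWritten=since_epoch_ms,
    )
    files = response["DescribeDBLogFiles"]
    return sorted(files, key=lambda f: f["LastWritten"], reverse=True)


def read_log_file(rds_client, db_instance_id, log_file_name):
    """Read a whole log file by following the pagination marker."""
    marker = "0"  # "0" starts at the beginning of the file
    chunks = []
    while True:
        part = rds_client.download_db_log_file_portion(
            DBInstanceIdentifier=db_instance_id,
            LogFileName=log_file_name,
            Marker=marker,
        )
        chunks.append(part.get("LogFileData") or "")
        if not part.get("AdditionalDataPending"):
            break
        marker = part["Marker"]
    return "".join(chunks)
```

To narrow the search to the outage window, pass a `since_epoch_ms` value slightly earlier than the time of the restart event.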
Keep the following best practices in mind when dealing with RDS instance outages:
- To reduce downtime during an outage, turn on Multi-AZ deployment for your instance.
- Adjust the DB instance maintenance window. If RDS needs to apply system changes, then the DB instance is unavailable during this time for only the minimum amount of time required to make the necessary changes. If you don't want your instances to go through automatic minor version upgrades, then turn off this option.
- Allocate sufficient resources to your database to run queries. With Amazon RDS, the amount of resources allocated depends on the instance type. Also, specific queries, such as stored procedures, might consume an unexpectedly large amount of memory. If the instance frequently restarts because of a lack of resources, then scale up your database instance class to keep up with your application demands.
- To avoid instance throttling, configure Amazon CloudWatch alarms on key Amazon RDS metrics that indicate the availability and health status of your RDS instances. For example, you can set a CloudWatch alarm on the FreeableMemory metric so that you receive a notification when memory usage reaches 95%, that is, when freeable memory drops to about 5% of the instance's total memory. For more information, see How can I filter Enhanced Monitoring CloudWatch logs to generate automated custom metrics for Amazon RDS?
- To have Amazon RDS notify you when there's a failover on your RDS instance, subscribe to Amazon RDS event notifications.
- To optimize database performance, make sure that your queries are properly tuned for PostgreSQL and MySQL. Otherwise, you might experience performance issues and extended wait times.
- To troubleshoot load caused by CPU, memory, or other resource constraints, see How do I troubleshoot high CPU utilization for Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible instances? Also, see How do I troubleshoot and resolve high CPU utilization on my Amazon RDS for MySQL or Aurora MySQL-Compatible DB instance?
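As an illustration of the CloudWatch alarm practice above, the following sketch builds the parameters for a low-memory alarm on the FreeableMemory metric. The helper name, the alarm name, the 5% free-memory threshold, and the evaluation settings are all assumptions for illustration. FreeableMemory is reported in bytes, so the threshold is computed from the memory size of your instance class, which you must supply.

```python
def freeable_memory_alarm(db_instance_id, instance_memory_gib,
                          min_free_fraction=0.05, sns_topic_arn=None):
    """Build put_metric_alarm parameters for a low FreeableMemory alarm.

    The alarm fires when average freeable memory stays below
    `min_free_fraction` of total instance memory for three 5-minute periods.
    """
    threshold_bytes = instance_memory_gib * 1024 ** 3 * min_free_fraction
    params = {
        "AlarmName": f"{db_instance_id}-low-freeable-memory",  # hypothetical name
        "Namespace": "AWS/RDS",
        "MetricName": "FreeableMemory",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold_bytes,
        "ComparisonOperator": "LessThanThreshold",
    }
    if sns_topic_arn:
        # Notify this SNS topic when the alarm enters the ALARM state.
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

With a boto3 CloudWatch client, the alarm could then be created with `cloudwatch.put_metric_alarm(**freeable_memory_alarm("mydb", 16))`, where `"mydb"` and `16` are placeholders for your instance identifier and its memory in GiB.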
Related information
What factors affect my downtime or database performance in Amazon RDS?
Why did my Amazon RDS DB instance fail over?
How do I minimize downtime during required Amazon RDS maintenance?