Why did my Amazon Aurora PostgreSQL-Compatible cluster fail over?

I want to know what causes failover events in my Amazon Aurora PostgreSQL-Compatible Edition database (DB) cluster.

Short description

Aurora PostgreSQL-Compatible fails over to a reader instance when one of the following events occurs:

  • The writer instance has an infrastructure issue, such as lost network connectivity to the physical host or to the cluster volume, or a problem with the physical compute resources.
  • The writer instance becomes unreachable because an excessive workload causes performance bottlenecks and resource contention.
  • The writer's DB instance class changes because you scale the instance vertically.
  • The writer's underlying host undergoes software patching, hardware maintenance, or an operating system (OS) update during a scheduled maintenance window. For more information, see Maintaining an Amazon Aurora DB cluster.
  • A user manually starts a failover, for example with the Failover action on the DB instance.
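A manual failover corresponds to the RDS FailoverDBCluster operation. Starting one on purpose is a practical way to test how your application behaves during a failover. The following is a minimal sketch that assumes the AWS SDK for Python (boto3); the cluster and instance identifiers are hypothetical placeholders.

```python
from typing import Optional


def build_failover_request(cluster_id: str, target_instance_id: Optional[str] = None) -> dict:
    """Build keyword arguments for the RDS FailoverDBCluster operation."""
    if not cluster_id:
        raise ValueError("cluster_id is required")
    request = {"DBClusterIdentifier": cluster_id}
    # Optionally name the reader to promote; if omitted, Aurora picks the target itself.
    if target_instance_id:
        request["TargetDBInstanceIdentifier"] = target_instance_id
    return request


def start_manual_failover(cluster_id: str, target_instance_id: Optional[str] = None) -> dict:
    """Start a manual failover. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so that the request builder works without boto3

    rds = boto3.client("rds")
    return rds.failover_db_cluster(**build_failover_request(cluster_id, target_instance_id))
```

For example, `start_manual_failover("my-aurora-cluster", "my-aurora-reader-1")` promotes that reader. Expect open connections to the old writer to drop, so run it outside peak hours.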

Resolution

When the cluster's writer instance fails its health checks, Aurora fails over to one of the reader instances, chosen by failover priority (promotion tier). To identify what caused the failover, check the following logs and metrics for your Aurora PostgreSQL-Compatible cluster.
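Aurora promotes the reader with the highest priority, which is the lowest promotion tier number (tier-0 through tier-15). If several readers share that tier, then Aurora promotes the largest one, and it breaks any remaining tie arbitrarily. The following Python sketch models that ordering so that you can predict which reader becomes the writer. It isn't Aurora's implementation, and it uses vCPU count as a stand-in for instance size, which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    promotion_tier: int  # 0 (highest priority) through 15 (lowest priority)
    vcpus: int  # hypothetical stand-in for "instance size" in this sketch


def pick_failover_target(replicas: List[Replica]) -> Replica:
    """Model the documented order: lowest promotion tier first, then largest size."""
    if not replicas:
        raise ValueError("no reader available to promote")
    for r in replicas:
        if not 0 <= r.promotion_tier <= 15:
            raise ValueError(f"{r.name}: promotion tier must be between 0 and 15")
    # Lower tier wins, then more vCPUs. Aurora breaks a remaining tie arbitrarily;
    # this sketch falls back to the name only so that the result is reproducible.
    return min(replicas, key=lambda r: (r.promotion_tier, -r.vcpus, r.name))
```

For example, with `reader-a` in tier 1 and 16 vCPUs, `reader-b` in tier 0 and 2 vCPUs, and `reader-c` in tier 0 and 8 vCPUs, the sketch picks `reader-c`: tier outranks size, and size decides within a tier.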

Amazon RDS events

To identify the cause of an unplanned outage, view all the Aurora events from the failover period. Amazon RDS keeps events for the past 14 days. To keep events for longer, send the Aurora events to Amazon EventBridge, and then route them to a target that stores them. For more information, see Creating a rule that triggers on an Amazon Aurora event.
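The following Python sketch assumes boto3 and collects every event in the failover window with the RDS `describe_events` operation, rejecting windows older than the 14-day retention period. It also shows one possible EventBridge event pattern for failover events. The pattern's detail-type and field names reflect RDS events as delivered to EventBridge, but compare them with a sample event in the EventBridge console before you rely on them. All identifiers are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# RDS keeps events for 14 days; older events must come from your own archive.
RETENTION = timedelta(days=14)

# Example EventBridge event pattern for Aurora cluster failover events (verify before use).
FAILOVER_EVENT_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Cluster Event"],
    "detail": {"EventCategories": ["failover"]},
}


def build_event_query(source_id: str, start: datetime, end: datetime,
                      source_type: str = "db-cluster",
                      now: Optional[datetime] = None) -> dict:
    """Keyword arguments for rds.describe_events over the failover window.

    All datetimes must be timezone-aware (UTC is simplest).
    """
    now = now or datetime.now(timezone.utc)
    if start >= end:
        raise ValueError("start must be before end")
    if now - start > RETENTION:
        raise ValueError("RDS no longer returns events older than 14 days")
    return {
        "SourceIdentifier": source_id,
        "SourceType": source_type,  # use "db-instance" for the writer's own events
        "StartTime": start,
        "EndTime": end,
    }


def fetch_events(source_id: str, start: datetime, end: datetime,
                 source_type: str = "db-cluster") -> List[Tuple[datetime, str]]:
    """Return (date, message) pairs for every event in the window. Requires boto3."""
    import boto3  # lazy import so that the query builder works without boto3

    paginator = boto3.client("rds").get_paginator("describe_events")
    events: List[Tuple[datetime, str]] = []
    query = build_event_query(source_id, start, end, source_type)
    for page in paginator.paginate(**query):
        events.extend((e["Date"], e["Message"]) for e in page["Events"])
    return events
```

Run it once for the cluster and once with `source_type="db-instance"` for the former writer, because some events, such as reboots, are reported at the instance level.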

CloudWatch metrics

To check whether high DB load caused the failover, use Amazon CloudWatch to view your Aurora DB cluster metrics.

Check the following metrics for spikes, or for sharp drops in the case of FreeableMemory, that show problems with the availability and health of your cluster:

  • DatabaseConnections
  • CPUUtilization
  • FreeableMemory
  • DiskQueueDepth
  • StorageNetworkThroughput
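To pull these metrics for the former writer, you can pass a list of queries to the CloudWatch `get_metric_data` operation. The following Python sketch builds those queries for the `AWS/RDS` namespace with the `DBInstanceIdentifier` dimension. It fetches the minimum of FreeableMemory because a drop matters there, and the maximum of the other metrics. The `find_spikes` helper is a simple z-score heuristic added for illustration; it isn't an AWS feature, and the threshold of 3 standard deviations is an assumption to tune.

```python
from statistics import mean, pstdev
from typing import List

FAILOVER_METRICS = [
    "DatabaseConnections",
    "CPUUtilization",
    "FreeableMemory",
    "DiskQueueDepth",
    "StorageNetworkThroughput",
]


def build_metric_queries(instance_id: str, period: int = 60) -> List[dict]:
    """MetricDataQueries for cloudwatch.get_metric_data, one per metric."""
    queries = []
    for i, name in enumerate(FAILOVER_METRICS):
        # A FreeableMemory problem shows up as a drop, so fetch its minimum.
        stat = "Minimum" if name == "FreeableMemory" else "Maximum"
        queries.append({
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def find_spikes(values: List[float], threshold: float = 3.0) -> List[int]:
    """Indexes of points more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]
```

With boto3, call `boto3.client("cloudwatch").get_metric_data(MetricDataQueries=build_metric_queries("my-writer"), StartTime=..., EndTime=...)` for a window that starts well before the failover, and then run `find_spikes` on each result's `Values` list. The instance identifier is hypothetical.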

Enhanced Monitoring

Use Enhanced Monitoring to view OS metrics in real time. To turn on Enhanced Monitoring for your Amazon Aurora instances, see Setting up and turning on Enhanced Monitoring. For a list of OS metrics that you can view, see OS metrics in Enhanced Monitoring.
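Enhanced Monitoring writes its OS metrics as JSON records to the `RDSOSMetrics` log group in CloudWatch Logs, with one log stream per instance named after the instance's resource ID (DbiResourceId). The following Python sketch summarizes a few indicators from each record. The field names (`cpuUtilization`, `loadAverageMinute`, `memory`) follow the record layout described in OS metrics in Enhanced Monitoring, but confirm them against a record from your own instance. The fetch function assumes boto3.

```python
import json
from typing import Dict, List


def summarize_os_record(raw: str) -> Dict[str, float]:
    """Pull a few OS indicators out of one Enhanced Monitoring JSON record."""
    record = json.loads(raw)
    cpu = record["cpuUtilization"]  # percentages
    mem = record["memory"]  # kilobytes
    return {
        "cpu_total_pct": float(cpu["total"]),
        "cpu_wait_pct": float(cpu["wait"]),
        "load_1min": float(record["loadAverageMinute"]["one"]),
        "mem_free_pct": 100.0 * mem["free"] / mem["total"],
    }


def fetch_os_summaries(resource_id: str, start_ms: int, end_ms: int) -> List[Dict[str, float]]:
    """Summaries for one instance between two epoch-millisecond timestamps."""
    import boto3  # lazy import so that the parser works without boto3

    paginator = boto3.client("logs").get_paginator("filter_log_events")
    summaries = []
    for page in paginator.paginate(logGroupName="RDSOSMetrics",
                                   logStreamNames=[resource_id],
                                   startTime=start_ms, endTime=end_ms):
        summaries.extend(summarize_os_record(e["message"]) for e in page["events"])
    return summaries
```

High `cpu_wait_pct` together with low `mem_free_pct` just before a failover points to resource contention rather than an infrastructure issue.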

Performance Insights

Use Performance Insights to view the DB load on your Aurora PostgreSQL-Compatible cluster. You can filter the load by waits, SQL statements, hosts, or users. For more information, see Analyzing metrics with the Performance Insights dashboard.

Performance Insights shows the queries that contribute the most to the DB load. For example, it can reveal a single query that accounts for 99% of the DB load.
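You can also retrieve the same breakdown programmatically with the Performance Insights `get_resource_metrics` operation. The following Python sketch assumes boto3, requests the `db.load.avg` metric grouped by wait event, SQL statement, host, or user, and ranks each group by its share of the total load. Treating the response series that has no dimensions as the ungrouped total is an assumption, and the sketch ignores pagination (`NextToken`) for brevity.

```python
from datetime import datetime
from typing import Dict, List, Tuple

GROUPS = {"db.wait_event", "db.sql", "db.host", "db.user"}


def build_pi_query(resource_id: str, group: str = "db.wait_event") -> dict:
    """Keyword arguments (minus StartTime and EndTime) for pi.get_resource_metrics."""
    if group not in GROUPS:
        raise ValueError(f"group must be one of {sorted(GROUPS)}")
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # the instance's DbiResourceId, such as "db-..."
        "MetricQueries": [{"Metric": "db.load.avg", "GroupBy": {"Group": group, "Limit": 10}}],
        "PeriodInSeconds": 60,
    }


def load_shares(load_by_dimension: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank dimensions by their percentage of the total DB load, largest first."""
    total = sum(load_by_dimension.values())
    if total <= 0:
        return []
    ranked = sorted(load_by_dimension.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100.0 * load / total) for name, load in ranked]


def fetch_load_shares(resource_id: str, start: datetime, end: datetime,
                      group: str = "db.wait_event") -> List[Tuple[str, float]]:
    """Call Performance Insights and rank the groups. Requires boto3."""
    import boto3  # lazy import so that the helpers above work without boto3

    resp = boto3.client("pi").get_resource_metrics(
        StartTime=start, EndTime=end, **build_pi_query(resource_id, group))
    totals: Dict[str, float] = {}
    for series in resp["MetricList"]:
        dims = series["Key"].get("Dimensions")
        if not dims:
            continue  # skip the ungrouped total series, if present
        name = ", ".join(f"{k}={v}" for k, v in sorted(dims.items()))
        totals[name] = sum(p.get("Value", 0.0) for p in series["DataPoints"])
    return load_shares(totals)
```

A result in which the first entry holds most of the load, such as 99%, matches the single dominant query in the earlier example.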

Performance Insights helps you identify whether the following issues might affect DB cluster performance:

  • I/O operations, such as IO:DataFileRead for disk reads
  • Lock contention, such as Lock:transactionid and Lock:Relation
  • Buffer management issues, such as BufferPin:BufferPin
  • Client communication delays, such as Client:ClientRead and Client:ClientWrite
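To cross-check these wait events from inside the database, you can sample the PostgreSQL `pg_stat_activity` view. The following Python sketch assumes any DB-API 2.0 cursor, for example from psycopg2, that's connected to the writer. It labels each wait as `type:event` to match the Performance Insights naming, although letter case can differ (PostgreSQL reports `relation`, not `Relation`). One sample is a point-in-time snapshot, so take several samples.

```python
from collections import Counter

# Active sessions only: idle sessions always appear to wait on Client:ClientRead.
WAIT_SQL = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE state <> 'idle'
  AND wait_event IS NOT NULL
  AND pid <> pg_backend_pid()
"""


def sample_wait_events(cursor) -> Counter:
    """Count active sessions per wait event, labeled like Performance Insights."""
    cursor.execute(WAIT_SQL)
    return Counter(f"{event_type}:{event}" for event_type, event in cursor.fetchall())
```

For example, a result in which `IO:DataFileRead` dominates suggests that the workload reads far more data than fits in the buffer cache.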

Important: Performance Insights will reach its end of life on June 30, 2026. You can upgrade to the Advanced mode of Database Insights before June 30, 2026. If you don't upgrade, then DB clusters that use Performance Insights will default to the Standard mode of Database Insights. Only the Advanced mode of Database Insights will support execution plans and on-demand analysis, so if your clusters default to the Standard mode, then you might not be able to use these features on the console. To turn on the Advanced mode, see Turning on the Advanced mode of Database Insights for Amazon Relational Database Service (Amazon RDS) and Turning on the Advanced mode of Database Insights for Amazon Aurora.

Aurora DB logs

With an on-premises database, the DB logs reside on the host's file system. Because Aurora doesn't give you access to the underlying host, publish your logs to Amazon CloudWatch Logs instead.

You can also use the Amazon RDS console to view, watch, or download a DB log file.
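The following Python sketch shows two steps: the `modify_db_cluster` arguments that publish the `postgresql` log type to CloudWatch Logs, and a scan of log lines for ERROR, FATAL, and PANIC entries around the failover time. The regular expression assumes the default Aurora PostgreSQL `log_line_prefix` (`%t:%r:%u@%d:[%p]:`), which places the severity right after the process ID. If you changed the prefix, then adjust the pattern.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes the default prefix, for example "...:app@orders:[4242]:FATAL:  message".
SEVERITY_RE = re.compile(r":\[\d+\]:(ERROR|FATAL|PANIC):")


def build_log_export_request(cluster_id: str) -> dict:
    """Keyword arguments for rds.modify_db_cluster that publish PostgreSQL logs."""
    return {
        "DBClusterIdentifier": cluster_id,
        "CloudwatchLogsExportConfiguration": {"EnableLogTypes": ["postgresql"]},
    }


def count_severe_lines(lines: Iterable[str]) -> Counter:
    """Count ERROR, FATAL, and PANIC entries in PostgreSQL log lines."""
    counts = Counter()
    for line in lines:
        match = SEVERITY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

After you publish the logs, they typically appear in a log group named like `/aws/rds/cluster/<cluster-name>/postgresql`. A burst of FATAL or PANIC lines just before the failover is a strong lead. The cluster name is a placeholder.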

Fast failover with Aurora PostgreSQL

To switch operations quickly to the newly promoted writer instance after a failover, configure your application for fast failover.
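On the client side, fast failover usually combines three settings: short connection timeouts, aggressive TCP keepalives, and retries. The following Python sketch builds a libpq connection string that lists several instance endpoints and uses `target_session_attrs=read-write`, so libpq skips hosts that only accept reads and connects to whichever instance is the writer. It also adds a retry helper with exponential backoff. The hostnames, timeout values, and retry counts are hypothetical starting points, not AWS recommendations.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def build_conninfo(hosts: List[str], dbname: str, user: str, port: int = 5432) -> str:
    """libpq connection string that accepts only a read-write (writer) session."""
    params = [
        "host=" + ",".join(hosts),  # libpq tries the hosts in order
        f"port={port}",  # a single port applies to every host
        f"dbname={dbname}",
        f"user={user}",
        "target_session_attrs=read-write",  # skip hosts that accept only reads
        "connect_timeout=3",  # seconds to wait for each host
        "keepalives=1",  # TCP keepalive settings that detect
        "keepalives_idle=1",  # a dead peer within seconds
        "keepalives_interval=1",
        "keepalives_count=5",
    ]
    return " ".join(params)


def connect_with_retry(connect: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With psycopg2, for example: `connect_with_retry(lambda: psycopg2.connect(conninfo), retry_on=(psycopg2.OperationalError,))`. Listing instance endpoints avoids waiting for the cluster endpoint's DNS record to update, but you must update the list when you add or remove readers.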

Fast recovery after failover with cluster cache management for Aurora PostgreSQL-Compatible

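To speed up recovery after a failover, use cluster cache management (CCM) for Aurora PostgreSQL-Compatible. With CCM, a designated reader keeps its buffer cache warm with the pages that the writer uses most, so that if Aurora promotes that reader, it doesn't start with a cold cache.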
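Setting up CCM involves three steps: set the `apg_ccm_enabled` cluster parameter to 1, put the writer in promotion tier 0, and designate a reader of the same instance class, also in tier 0. The following Python sketch builds the parameter change and checks whether a cluster's members meet the tier and instance-class conditions. The member dictionaries assume fields from `describe_db_clusters` (`DBClusterMembers`) combined with `DBInstanceClass` from `describe_db_instances`. Using the `pending-reboot` apply method is a conservative choice for this sketch.

```python
from typing import Dict, List


def ccm_parameter_change(group_name: str, apply_method: str = "pending-reboot") -> dict:
    """Keyword arguments for rds.modify_db_cluster_parameter_group that turn on CCM."""
    return {
        "DBClusterParameterGroupName": group_name,
        "Parameters": [{
            "ParameterName": "apg_ccm_enabled",
            "ParameterValue": "1",
            # pending-reboot is accepted for static and dynamic parameters alike.
            "ApplyMethod": apply_method,
        }],
    }


def ccm_readiness_problems(members: List[Dict]) -> List[str]:
    """Return problems that keep CCM from protecting the writer (empty if ready)."""
    writers = [m for m in members if m["IsClusterWriter"]]
    if len(writers) != 1:
        return ["expected exactly one writer"]
    writer = writers[0]
    problems = []
    if writer["PromotionTier"] != 0:
        problems.append(f"writer {writer['DBInstanceIdentifier']} is not in tier 0")
    designated = [m for m in members
                  if not m["IsClusterWriter"]
                  and m["PromotionTier"] == 0
                  and m["DBInstanceClass"] == writer["DBInstanceClass"]]
    if not designated:
        problems.append("no tier-0 reader with the writer's instance class")
    return problems
```

To fix a reported tier problem, call `modify_db_instance` with `PromotionTier=0` for the affected instance.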

RDS Proxy to improve failover performance

Use Amazon RDS Proxy to keep an open pool of connections to the DB instances. During a failover, RDS Proxy continues to accept connections at the same IP address and automatically directs connections to the new writer instance. When the original writer becomes unavailable, RDS Proxy connects to the new writer without dropping idle application connections.
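The following Python sketch builds the arguments for the two boto3 RDS calls that put a proxy in front of the cluster: `create_db_proxy` and `register_db_proxy_targets`. It assumes that the database credentials are in an AWS Secrets Manager secret and that an IAM role lets the proxy read that secret. All names and ARNs are hypothetical.

```python
from typing import List, Tuple


def build_proxy_requests(proxy_name: str, cluster_id: str, secret_arn: str,
                         role_arn: str, subnet_ids: List[str]) -> Tuple[dict, dict]:
    """Keyword arguments for rds.create_db_proxy and rds.register_db_proxy_targets."""
    create = {
        "DBProxyName": proxy_name,
        "EngineFamily": "POSTGRESQL",
        "Auth": [{"AuthScheme": "SECRETS", "SecretArn": secret_arn, "IAMAuth": "DISABLED"}],
        "RoleArn": role_arn,  # role that lets the proxy read the secret
        "VpcSubnetIds": subnet_ids,
        "RequireTLS": True,
    }
    register = {
        "DBProxyName": proxy_name,
        "TargetGroupName": "default",
        "DBClusterIdentifiers": [cluster_id],  # target the whole cluster, not one instance
    }
    return create, register
```

Pass the first dictionary to `create_db_proxy`. When `describe_db_proxies` reports the proxy as available, pass the second dictionary to `register_db_proxy_targets`, and then point your application at the proxy endpoint instead of the cluster endpoint.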

Related information

High availability for Amazon Aurora

Monitoring metrics in an Amazon Aurora cluster

Amazon RDS event categories and event messages for Aurora

AWS OFFICIAL. Updated 4 months ago.