I use an Amazon Aurora cluster endpoint (writer endpoint) in my application server, but my application connects to the reader instance instead.
Short description
When you try to connect to your Aurora cluster endpoint or writer endpoint, your application might connect to the reader instance instead. This happens when the endpoint and its mapped IP addresses are cached on the client application end.
The Aurora cluster endpoint always points to the current writer instance. When a failover happens, Aurora promotes an Aurora replica to be the new primary (writer) instance and updates the cluster endpoint to point to it. If you use the cluster endpoint, then new read/write connections are automatically directed to the promoted instance.
So, during failover, the IP address that the cluster endpoint resolves to can change, and a cached value might no longer point to the writer instance.
A client trying to connect to a database using a DNS name must resolve that DNS name to an IP address by querying a DNS server. The client then caches the responses. Per protocol, DNS responses specify the Time to Live (TTL), which governs how long the client should cache the record. Aurora DNS zones use a short TTL of five seconds. But many systems implement client caches with different settings, which can make the TTL longer.
If a client tries to connect to the cluster when the DNS record changes haven't been propagated, then the client receives an old address. This causes the client to connect to the previous primary instance, which is now the reader instance.
So, caching the DNS data for an extended time can cause connection failures.
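The caching behavior described above can be sketched as an application-level cache that never keeps a DNS answer longer than a fixed cap. This is a minimal illustration, not the way any particular driver or pool works: the names `TtlDnsCache`, `system_resolve`, and `max_ttl` are hypothetical, and the 5-second default only mirrors the Aurora TTL mentioned earlier. Note that `socket.getaddrinfo` doesn't expose the TTL of the record, so the cap is a fixed value chosen by the application.

```python
import socket
import time


def system_resolve(host):
    """Resolve host to a sorted list of unique IPv4 addresses via the OS resolver."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class TtlDnsCache:
    """Hypothetical cache that never serves an answer older than max_ttl seconds."""

    def __init__(self, max_ttl=5.0, resolve=system_resolve, clock=time.monotonic):
        self.max_ttl = max_ttl    # assumption: mirrors the 5-second TTL of Aurora DNS zones
        self._resolve = resolve   # injectable so the cache can be exercised without a network
        self._clock = clock
        self._entries = {}        # host -> (expires_at, addresses)

    def lookup(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]       # still fresh: serve the cached answer
        addresses = self._resolve(host)   # expired or missing: ask the resolver again
        self._entries[host] = (now + self.max_ttl, addresses)
        return addresses
```

With a short cap, an address that pointed to the old writer is dropped within a few seconds of the failover. A cache with a long or unlimited lifetime keeps returning the old address until the process restarts.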
After failover initiates, the client no longer receives TCP traffic from the database, so it's up to the client to time out. This hard fencing of the original primary database on any failover means that the client sees similar behavior during planned and unplanned failovers.
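Because the client must detect the silence itself, explicit connect and read timeouts matter. The following raw-socket sketch only illustrates the mechanism; in practice you would set the equivalent timeout options of your database driver. The function name and the default timeout values are assumptions for illustration.

```python
import socket


def read_with_timeout(host, port, connect_timeout=3.0, read_timeout=5.0):
    """Return the bytes received, or None if the peer stays silent past read_timeout."""
    # connect_timeout bounds how long the TCP handshake may take.
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        # read_timeout bounds how long recv() may block on a silent peer.
        sock.settimeout(read_timeout)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
```

Without a read timeout, `recv()` can block indefinitely on a connection to an instance that has stopped sending traffic.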
Resolution
Check if you are connecting to the writer instance or an Aurora replica.
To determine if your client is connecting to the writer instance or to an Aurora replica, use the @@innodb_read_only variable:
mysql> select @@innodb_read_only;
A value of 0 means that you are connected to the writer instance. A value of 1 means that you are connected to an Aurora replica (reader instance).
Run this query to determine which server you're connected to, and if that server is a writer or reader:
mysql> select concat("You are connected to '",server_id,"', which is a ",if(SESSION_ID='MASTER_SESSION_ID',"Writer","Reader")) as CONNECTION_STATUS from information_schema.replica_host_status where SERVER_ID in (select @@aurora_server_id);
+-----------------------------------------------------------------+
| CONNECTION_STATUS                                               |
+-----------------------------------------------------------------+
| You are connected to 'aurora-test-instance1', which is a Writer |
+-----------------------------------------------------------------+
1 row in set (0.08 sec)
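The same check can be run from application code. The sketch below assumes a Python DB-API 2.0 style connection (for example, from a MySQL driver). The helper names `connection_role` and `connect_to_writer`, the retry count, and the delay are hypothetical, and `connect` is a caller-supplied function that opens a new connection to the cluster endpoint.

```python
import time


def connection_role(cursor):
    """Return 'writer' or 'reader' for the instance behind a DB-API 2.0 cursor."""
    cursor.execute("SELECT @@innodb_read_only")
    row = cursor.fetchone()
    # 0 means the instance accepts writes; 1 means it is a read-only Aurora replica.
    return "writer" if int(row[0]) == 0 else "reader"


def connect_to_writer(connect, max_attempts=3, retry_delay=1.0):
    """Call connect() until a connection lands on the writer, or give up."""
    for attempt in range(max_attempts):
        conn = connect()
        cursor = conn.cursor()
        try:
            role = connection_role(cursor)
        finally:
            cursor.close()
        if role == "writer":
            return conn
        conn.close()  # a stale address led to a reader: discard the connection
        if attempt < max_attempts - 1:
            time.sleep(retry_delay)  # give cached DNS data time to expire
    raise RuntimeError("no writer connection after %d attempts" % max_attempts)
```

Retrying helps only if the cached DNS data can expire between attempts, so the delay should be at least as long as the effective client-side TTL.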
Troubleshoot multiple reader instances in a cluster
Aurora reader endpoints are DNS CNAME entries. If a cluster has multiple reader instances, then each time that you resolve the reader endpoint, you get an instance IP address that's chosen in round-robin fashion. This is because the reader endpoint covers all Aurora replicas in the cluster and provides DNS-based, round-robin load balancing for new connections.
To spread connections across the reader instances, resolve the endpoint again for each new connection and don't cache the DNS response, so that each resolution can return a different instance IP address. If you resolve the endpoint only once and keep the connection in your pool, then every query on that connection goes to the same instance. If you cache DNS, then you receive the same instance IP address each time that you resolve the endpoint.
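To see how new connections would be distributed, you can resolve the reader endpoint once per planned connection and tally the results. This is a rough sketch: the function names and the default port 3306 are assumptions, and the result depends on whether the operating system or network caches DNS answers.

```python
import socket
from collections import Counter


def resolve_once(endpoint, port=3306):
    """Return the first IPv4 address that the OS resolver gives for endpoint."""
    infos = socket.getaddrinfo(endpoint, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def spread_of_new_connections(endpoint, count, resolve=resolve_once):
    """Resolve endpoint once per planned connection and tally the instance IPs."""
    return Counter(resolve(endpoint) for _ in range(count))
```

If the tally shows a single IP address for a cluster with several readers, a DNS cache somewhere between the application and the resolver is likely pinning the answer.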
Follow best practices
- Make sure that your network and client configurations don't further increase the DNS cache TTL. If you use any form of connection pooling or other multiplexing, you might need to flush or reduce the time-to-live for any cached DNS information. If your client application is caching the DNS data of your DB instances, then set a TTL value of less than 30 seconds.
- Use Amazon Relational Database Service (Amazon RDS) Proxy to manage connections. For more information, see Using Amazon RDS Proxy for Aurora.
- Review the best practices for Using smart drivers.
- Use a TCP-based load balancer such as Elastic Load Balancing or HAProxy.
Related information
Types of Aurora endpoints
DNS caching
Why do I get a read-only error after an Aurora MySQL-Compatible DB cluster fails over?