
Fast Failover using Active-Passive failover policy


I have two on-premises DCs. My primary domain, example.com, is registered with GoDaddy, and its DNS is served from a public hosted zone on AWS. I have added two A records, say A and B: A maps to the public IP of nginx in DC1, and B maps to the public IP of nginx in DC2. Record A has a health check configured against https://<IP>:443/health, which returns 200 OK under normal conditions. Both A records have a TTL of 10 seconds. The health check interval is 10 seconds with retries = 2. In my tests I observed that when my primary nginx goes down, failover takes between 60 and 90 seconds. How can I achieve a faster switchover to the secondary site, within 10-20 seconds?

asked 4 months ago, 71 views
1 Answer

To achieve faster failover (10-20 seconds) in your active-passive configuration with Route 53, you need to optimize several aspects of your current setup:

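  1. Health Check Configuration:
  • Route 53 has no separate retry setting; "retries = 2" is presumably the failure threshold, i.e. the number of consecutive failed checks needed to mark the endpoint unhealthy. With a 10-second interval and a threshold of 2, detection alone can take about 20 seconds (10 seconds × 2), plus a short delay while Route 53 combines the results from health checkers in several regions
  • Your 10-second request interval is already the fastest Route 53 offers (the only other option is 30 seconds), so there is nothing more to gain there
  • Lower the failure threshold from 2 to 1 so that a single failed check marks the primary unhealthy, at the cost of also reacting to transient errors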
  2. DNS TTL Settings:
  • While you've set the TTL to 10 seconds, which is good, be aware that some DNS resolvers might ignore very low TTL values and cache responses longer
  • Ensure your application is configured to respect DNS TTL values and not cache DNS resolutions internally
  3. Failover Configuration:
  • Make sure both records use the failover routing policy in Route 53, with one designated "Primary" and the other "Secondary"
  • The primary record should have the health check associated with it
  • For the secondary record you can optionally associate its own health check; "Evaluate Target Health" applies only to alias records, so it is not an option for plain A records that point to IP addresses, as in your setup
  4. Client-side Considerations:
  • DNS caching on client machines or intermediate DNS servers might be extending your failover time
  • Some clients or applications might cache DNS resolutions longer than the specified TTL
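A rough way to reason about these numbers is to add detection time and cache lifetime. The sketch below is a simplified back-of-the-envelope model, not a formula documented by AWS; `aggregation_lag` and `extra_caching` are hypothetical placeholders for delays you cannot configure and whose size is unknown here.

```python
from dataclasses import dataclass


@dataclass
class FailoverTiming:
    """Back-of-the-envelope worst case for DNS-based failover, in seconds."""
    request_interval: int     # health check interval: 10 (fast) or 30 (standard)
    failure_threshold: int    # consecutive failed checks before "unhealthy"
    record_ttl: int           # TTL on the A records
    aggregation_lag: int = 0  # ASSUMPTION: unknown delay while Route 53 combines checker results
    extra_caching: int = 0    # ASSUMPTION: resolvers/clients holding answers beyond the TTL

    def detection(self) -> int:
        # If the primary dies right after a successful check, the last of the
        # required failed checks lands roughly failure_threshold intervals later.
        return self.request_interval * self.failure_threshold

    def worst_case(self) -> int:
        # A client that cached the old answer just before the switch keeps it for up to one TTL.
        return self.detection() + self.aggregation_lag + self.record_ttl + self.extra_caching


current = FailoverTiming(request_interval=10, failure_threshold=2, record_ttl=10)
tuned = FailoverTiming(request_interval=10, failure_threshold=1, record_ttl=10)
print(current.worst_case())  # 30
print(tuned.worst_case())    # 20
```

With the question's settings the configurable parts add up to roughly 30 seconds, and roughly 20 seconds with a threshold of 1. Since the observed 60-90 seconds is well above that, the remaining time most likely sits in the terms this model cannot control (checker aggregation and caching beyond the TTL), which is where to look next.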

In active-passive failover configurations, Route 53 automatically switches to the secondary resource when the primary becomes unhealthy, and automatically fails back when the primary becomes healthy again. If you need to control failback manually, you can either disable the health check or use weighted routing instead of failover routing.
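The health-check and failover-record settings described above could be expressed as boto3 request parameters roughly as follows. This is a minimal sketch, assuming the boto3 Route 53 client (`create_health_check`, `change_resource_record_sets`); the IPs, hosted zone ID, and helper names are hypothetical. Note that Route 53 does not let you change `RequestInterval` after a health check is created, whereas `FailureThreshold` can be updated later.

```python
import uuid
from typing import Optional


def health_check_request(ip: str, path: str = "/health") -> dict:
    """Keyword arguments for boto3's route53 create_health_check (hypothetical helper)."""
    return {
        "CallerReference": str(uuid.uuid4()),  # must be unique for each create request
        "HealthCheckConfig": {
            "IPAddress": ip,
            "Port": 443,
            "Type": "HTTPS",
            "ResourcePath": path,
            "RequestInterval": 10,  # fast interval (the other option is 30); fixed at creation
            "FailureThreshold": 1,  # one failed check marks the endpoint unhealthy
        },
    }


def failover_change(name: str, ip: str, role: str, ttl: int = 10,
                    health_check_id: Optional[str] = None) -> dict:
    """One UPSERT change for a failover A record; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError(f"unexpected failover role: {role}")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),  # any label unique among records with this name and type
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def change_batch(name: str, primary_ip: str, secondary_ip: str,
                 primary_health_check_id: str) -> dict:
    """ChangeBatch with the health-checked primary and a secondary without a check."""
    return {"Changes": [
        failover_change(name, primary_ip, "PRIMARY", health_check_id=primary_health_check_id),
        failover_change(name, secondary_ip, "SECONDARY"),
    ]}


def apply(zone_id: str, name: str, primary_ip: str, secondary_ip: str) -> None:
    """Create the health check and upsert both records (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builders above work without boto3 installed

    client = boto3.client("route53")
    hc = client.create_health_check(**health_check_request(primary_ip))
    client.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch=change_batch(name, primary_ip, secondary_ip, hc["HealthCheck"]["Id"]),
    )
```

For example, `apply("Z0HYPOTHETICAL", "example.com", "203.0.113.10", "198.51.100.20")` would create the check and both records, using placeholder values. If simple (non-failover) A records with the same name already exist, they would likely need to be removed first, because records sharing a name and type generally cannot mix routing policies.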

Remember that even with the most optimized configuration, there will always be some delay in failover, due to health-check detection time and DNS caching.
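To see how much of that delay remains in a particular environment, it can help to measure failover from a client's point of view. The sketch below polls a hostname and reports how long it takes for the old IP to disappear from the answers. It assumes clients resolve through the OS resolver (`socket.getaddrinfo`), so any local caching such as nscd or systemd-resolved is included in the measurement; `time_until_switch` is a hypothetical helper, and the resolver, clock, and sleep are injectable so it can be tested without a network.

```python
import socket
import time
from typing import Callable, Optional, Set


def system_resolve(host: str) -> Set[str]:
    """IPv4 addresses for host, resolved the way most applications do (via the OS)."""
    infos = socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def time_until_switch(host: str, old_ip: str,
                      resolve: Callable[[str], Set[str]] = system_resolve,
                      poll: float = 1.0, timeout: float = 300.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll until old_ip is gone from the answers; return elapsed seconds, or None on timeout."""
    start = clock()
    while clock() - start < timeout:
        try:
            answers = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            answers = set()  # treat a failed lookup as "not switched yet"
        if answers and old_ip not in answers:
            return clock() - start
        sleep(poll)
    return None
```

For example, stop nginx in DC1 and immediately run `time_until_switch("example.com", "<DC1 public IP>")` on a client machine. Comparing the result with the same measurement against a public resolver can show whether the extra seconds come from Route 53 or from caching along the way.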
Sources
Active-active and active-passive failover - Amazon Route 53
Prevent the automatic failback with a Route 53 health check | AWS re:Post
Disaster recovery options in the cloud - Disaster Recovery of Workloads on AWS: Recovery in the Cloud

answered 4 months ago
EXPERT reviewed 4 months ago
  • Thanks, Gary, for the detailed answer. For my use case above, are there any options on AWS other than active-passive failover using A records?
