Does Route 53 latency-based routing perform failover when one region performance degrades?

Hi,

I have the following deployment:

  1. Region1: API Gateway (apigw1) backed by lambda function (myFunc) exposed via custom domain name (say, api.mydomain.com)
  2. Region2: API Gateway (apigw2) backed by same lambda function implementation (myFunc) exposed via same custom domain name (i.e. api.mydomain.com).
  3. Global: Two "A" records for "api.mydomain.com" in Route 53 Hosted Zone (mydomain.com) with Routing Policy as "Latency" pointing to apigw1 endpoint and apigw2 endpoint in region1 and region2 respectively.

My questions are:

  1. If there is a problem in region1 (e.g. the myFunc Lambda function in region1 suffers performance degradation) that results in significantly higher API response times, will clients connected to the apigw1 endpoint in region1 eventually (say, after 30 minutes, 1 hour, or some other reasonable period) be diverted to apigw2 in region2?
  2. If the answer to the above question is "no", what changes to the routing policy or other components do I need to make so that Route 53 can fail over to a healthy regional endpoint when one region has issues?

Please advise.

2 Answers
Accepted Answer

To address your questions about Route 53 latency-based routing and failover:

  1. Latency-based routing alone does not automatically fail over or divert traffic when performance degrades. Route 53 makes latency routing decisions from latency measurements collected over a period of time between users' networks and AWS Regions; it does not monitor application response time, so slow Lambda function responses do not change the answer it returns. This means that clients sent to the region1 endpoint (apigw1) would not automatically be diverted to region2 (apigw2) if performance degrades in region1.

  2. To ensure failover to a healthy region endpoint in case of issues, you should implement a combination of health checks and failover routing along with your latency-based routing. Here's how you can modify your setup:

    a. Create health checks for each regional endpoint (apigw1 and apigw2).

    b. Instead of using just latency-based routing, implement a combination of failover and latency routing:

    • Create two failover records (primary and secondary) for api.mydomain.com.
    • For each failover record, create latency-based alias records pointing to your regional API Gateway endpoints.
    • Associate the health checks with the corresponding latency records.

    c. Configure the failover records:

    • Set one region as primary and the other as secondary.
    • Route 53 will use the primary region when it's healthy, and automatically switch to the secondary if the primary fails the health check.

    d. Within each failover record, the latency-based routing will direct traffic to the lowest latency endpoint within that group.

This setup allows you to maintain the benefits of latency-based routing while also implementing automatic failover in case of regional issues. Route 53 will detect when an endpoint is unhealthy based on the health checks and stop including it when responding to queries.
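
As a rough illustration of the record side of this, here is a minimal Python sketch that builds the change batch for two latency alias records, each with a health check attached directly to it. It covers only the part of the setup where health checks are associated with the latency records, not the nested failover records. The Region names, API Gateway hostnames, hosted zone IDs and health check IDs are all made-up placeholders, and the helper `latency_alias_change` is hypothetical. The dictionary follows the shape expected by the Route 53 `ChangeResourceRecordSets` API.

```python
def latency_alias_change(name, region, apigw_dns_name, apigw_zone_id, health_check_id):
    """Build one UPSERT for a latency alias record with a health check attached."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            # SetIdentifier must be unique among records sharing name and type.
            "SetIdentifier": f"{name}-{region}",
            # Region is what makes this a latency record.
            "Region": region,
            "AliasTarget": {
                "HostedZoneId": apigw_zone_id,   # zone ID of the API Gateway domain
                "DNSName": apigw_dns_name,       # regional API Gateway target
                "EvaluateTargetHealth": False,   # rely on the explicit health check
            },
            "HealthCheckId": health_check_id,
        },
    }


# All values below are placeholders for illustration only.
change_batch = {
    "Comment": "Latency records with health checks for api.mydomain.com",
    "Changes": [
        latency_alias_change(
            "api.mydomain.com", "us-east-1",
            "d-example1.execute-api.us-east-1.amazonaws.com",
            "ZONEIDREGION1", "hc-id-region1",
        ),
        latency_alias_change(
            "api.mydomain.com", "eu-west-1",
            "d-example2.execute-api.eu-west-1.amazonaws.com",
            "ZONEIDREGION2", "hc-id-region2",
        ),
    ],
}

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone id>", ChangeBatch=change_batch)
```

With records like these, Route 53 answers with the lowest-latency record among those whose health check is passing.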

Remember to configure appropriate health check parameters that accurately reflect the performance and availability of your API Gateway and Lambda function. This ensures that Route 53 can detect issues and initiate failover when necessary.
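
For the health check parameters, a sketch of what the request could look like is below. It builds the arguments for the Route 53 `CreateHealthCheck` API for an HTTPS check. The hostname and the `/prod/health` path are assumptions: the idea is to point each check at a region-specific hostname (not the shared api.mydomain.com name) and at a route that actually invokes myFunc, so the check exercises Lambda and not just API Gateway. The helper name `https_health_check_request` is hypothetical.

```python
import uuid


def https_health_check_request(fqdn, path, interval_s=30, failure_threshold=3):
    """Build keyword arguments for route53.create_health_check (HTTPS endpoint check)."""
    if interval_s not in (10, 30):
        raise ValueError("Route 53 accepts a request interval of 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure threshold must be between 1 and 10")
    return {
        # CallerReference must be unique per create request.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "HTTPS",
            "FullyQualifiedDomainName": fqdn,
            "Port": 443,
            "ResourcePath": path,
            "RequestInterval": interval_s,          # seconds between checks
            "FailureThreshold": failure_threshold,  # consecutive failures before unhealthy
        },
    }


# Placeholder hostname and path for region1; region2 would get its own check.
request = https_health_check_request(
    "example1.execute-api.us-east-1.amazonaws.com", "/prod/health", interval_s=10
)

# With boto3 this could be sent as:
#   boto3.client("route53").create_health_check(**request)
```

HTTP and HTTPS checks only pass when the endpoint returns a 2xx or 3xx status within a short fixed timeout, so a health route that goes through the slow code path will start failing once responses get slow enough. Moderate slowdowns below that timeout would not be caught this way.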

By implementing this combination of failover and latency-based routing with health checks, you can achieve both low-latency routing under normal conditions and automatic failover to a healthy region when issues occur in one region.
Sources
Latency-based routing - Amazon Route 53
Active-active and active-passive failover - Amazon Route 53
Cross-Region DNS-based load balancing and failover - Real-Time Communication on AWS

answered a year ago
EXPERT
reviewed a year ago
  • Thank you, Adeleke. This is very helpful.

  • And how would it work in a scenario where I have two regions and want to provide the best possible performance without implementing sophisticated health checks (let's say latency-based routing on historical data is fine), but also want to keep my application operational during a region outage? This seems to fall under the active-active scenario, which, to my understanding, would support this out of the box.

    The other scenario is one where I want uncompromised performance. Is a failover policy with sophisticated health checks that validate my application's performance the best fit then? And is it possible to create such a health check? CloudWatch can be used to some extent, but it seems somewhat limited to me. What about Global Accelerator? Should it be considered? Are there any other options? This second scenario cannot rely on historical data; it needs to deliver the best possible performance.


Route 53 Latency-Based Routing (LBR) does not inherently fail over when the performance of one region degrades. LBR routes each user to the record whose AWS Region has had the lowest network latency for that user, based on historical measurements, not on the current performance of your endpoints.
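
To make that concrete, here is a deliberately simplified toy model in Python. It is not Route 53's actual algorithm, and the latency numbers are invented. It only shows that the selection depends on a per-Region network latency table plus health check status, and never on how fast myFunc responds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyRecord:
    region: str                     # Region the latency record is tagged with
    target: str                     # endpoint label from the question (apigw1, apigw2)
    healthy: Optional[bool] = None  # None means no health check is attached


def pick_record(records, network_latency_ms):
    """Pick the record a latency policy would answer with in this toy model.

    Records with a failing health check are skipped; records without a
    health check are always treated as healthy. Application response time
    is never consulted.
    """
    candidates = [r for r in records if r.healthy is not False]
    if not candidates:
        # If every record is unhealthy, fall back to considering all of them.
        candidates = list(records)
    return min(candidates, key=lambda r: network_latency_ms[r.region])


# Invented network latencies from one client's point of view.
latency = {"region1": 20, "region2": 90}

# No health checks: region1 keeps winning even if myFunc there is slow.
no_checks = [LatencyRecord("region1", "apigw1"), LatencyRecord("region2", "apigw2")]
print(pick_record(no_checks, latency).target)  # apigw1

# Health check on region1 is failing: traffic moves to region2.
with_checks = [
    LatencyRecord("region1", "apigw1", healthy=False),
    LatencyRecord("region2", "apigw2", healthy=True),
]
print(pick_record(with_checks, latency).target)  # apigw2
```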

Solution: add health checks and use a Failover Routing Policy alongside Latency-Based Routing. Consider monitoring and automation with CloudWatch and Lambda to proactively manage endpoint health.
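
One way the CloudWatch part could look, sketched in Python under some assumptions: a CloudWatch alarm on the API Gateway `Latency` metric, and a Route 53 health check of type `CLOUDWATCH_METRIC` that follows that alarm. The alarm name, API name, threshold and Region are placeholders, the `ApiName` dimension assumes a REST API, and both helper functions are hypothetical. The sketch uses the `Average` statistic because, as far as I know, Route 53 alarm-based health checks do not support extended statistics such as percentiles.

```python
import uuid


def latency_alarm_params(alarm_name, api_name, threshold_ms):
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",  # end-to-end latency in milliseconds
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "Statistic": "Average",
        "Period": 60,             # seconds per datapoint
        "EvaluationPeriods": 5,   # alarm after 5 consecutive breaching periods
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def alarm_health_check_request(alarm_name, alarm_region):
    """Build keyword arguments for route53.create_health_check tied to an alarm."""
    return {
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "CLOUDWATCH_METRIC",
            "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
            # Keep the previous status if the alarm has insufficient data.
            "InsufficientDataHealthStatus": "LastKnownStatus",
        },
    }


# Placeholder names and threshold for region1.
alarm = latency_alarm_params("apigw1-high-latency", "apigw1", threshold_ms=1500)
health_check = alarm_health_check_request("apigw1-high-latency", "us-east-1")

# With boto3 these could be sent as:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)
#   boto3.client("route53").create_health_check(**health_check)
```

The resulting health check ID would then be attached to the region1 record, so that sustained high latency marks that record unhealthy.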

EXPERT
answered a year ago
