Hi,
an Application Load Balancer is a regional resource, meaning that you can only configure Availability Zones in the same region as your load balancer. To achieve greater resiliency, you can use multiple load balancers in multiple regions. However, you also need to think about the resources behind these load balancers and how they can be kept in sync (if that's a requirement for you). If a multi-region solution makes sense for your workload, have a look at AWS Global Accelerator, which lets you distribute traffic across multiple load balancers in one or more AWS Regions (https://docs.aws.amazon.com/global-accelerator/latest/dg/what-is-global-accelerator.html), as well as the whitepaper section on cross-region DNS-based load balancing and failover (https://docs.aws.amazon.com/whitepapers/latest/real-time-communication-on-aws/cross-region-dns-based-load-balancing-and-failover.html).
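To make the multi-region idea concrete, here is a minimal sketch of health-aware routing across regional load balancers: walk the regions in preference order and send traffic to the first one whose health check passes. This is an illustration under stated assumptions, not how Global Accelerator or Route 53 is actually implemented; `RegionalEndpoint`, `pick_endpoint`, and the load balancer names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionalEndpoint:
    region: str    # e.g. "us-east-1" (illustrative)
    lb_name: str   # hypothetical load balancer name
    healthy: bool  # result of the most recent health check


def pick_endpoint(
    endpoints: list[RegionalEndpoint], preferred: list[str]
) -> Optional[RegionalEndpoint]:
    """Return the first healthy endpoint, walking regions in preference order.

    Hypothetical helper: real services also weigh latency, traffic dials, etc.
    """
    by_region = {e.region: e for e in endpoints}
    for region in preferred:
        ep = by_region.get(region)
        if ep is not None and ep.healthy:
            return ep
    return None  # every region is unhealthy: nothing left to route to


# Usage: us-east-1 is failing its health check, so traffic shifts to us-east-2.
endpoints = [
    RegionalEndpoint("us-east-1", "alb-use1", healthy=False),
    RegionalEndpoint("us-east-2", "alb-use2", healthy=True),
]
chosen = pick_endpoint(endpoints, preferred=["us-east-1", "us-east-2"])
print(chosen.lb_name)  # alb-use2
```

Note that this only helps if the backends in the second region can actually serve the request, which is the "kept in sync" problem mentioned above.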
AWS has two layers of fault isolation: Availability Zones (AZs) and regions. AZs are physically separate facilities within a metropolitan area, connected by low-latency network links; together they form a region. No single power outage, truck crash, or backhoe should be able to take an entire region offline. Above that sits the control plane: one control plane per region.
A bad control plane can affect multiple AZs within a region, though AWS has some mitigations against that.
Operating in multiple AZs protects you from many physical failures.
Operating in multiple regions protects you from most control-plane failures.
I believe someone did a study showing that us-east-1 is more likely than other regions to suffer a control-plane failure like this, but any region can.
If you operate in a single region you are always susceptible to a control plane failure.
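The failure-domain reasoning above can be sketched as a small model: a deployment is a set of placements, and it survives a failure if at least one placement is unaffected. The model is deliberately pessimistic and simplified: it assumes a physical failure takes out exactly one AZ and a control-plane failure affects every AZ in its region, whereas real incidents are often partial. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    region: str
    az: str


@dataclass(frozen=True)
class Failure:
    kind: str     # "az" (physical, one AZ) or "control_plane" (whole region)
    region: str
    az: str = ""  # only meaningful when kind == "az"


def affected(p: Placement, f: Failure) -> bool:
    """Does failure f take down placement p? (simplified, pessimistic model)"""
    if p.region != f.region:
        return False  # failures never cross region boundaries in this model
    if f.kind == "control_plane":
        return True   # assume a regional control-plane failure hits every AZ
    return p.az == f.az


def survives(deployment: list[Placement], f: Failure) -> bool:
    """A deployment survives if any of its placements is unaffected."""
    return any(not affected(p, f) for p in deployment)


az_outage = Failure("az", "us-east-1", "us-east-1a")
cp_failure = Failure("control_plane", "us-east-1")

single_az = [Placement("us-east-1", "us-east-1a")]
multi_az = [Placement("us-east-1", "us-east-1a"), Placement("us-east-1", "us-east-1b")]
multi_region = [Placement("us-east-1", "us-east-1a"), Placement("us-east-2", "us-east-2a")]

for name, dep in [("single AZ", single_az), ("multi-AZ", multi_az), ("multi-region", multi_region)]:
    # columns: survives AZ outage, survives control-plane failure
    print(name, survives(dep, az_outage), survives(dep, cp_failure))
```

Only the multi-region deployment survives both failure types, which matches the point that operating in a single region always leaves you exposed to that region's control plane.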
Thank you, that's useful information. Regarding the Lambda failures in last week's outage: is there anything, short of setting up a Route 53 multi-region load balancer, that would have mitigated the problem for us? For example, if we had chosen us-east-1-bos-1 or us-east-1-atl as a zone, would we have had more protection from an outage like that? We were somewhat shocked that, despite our best efforts to design for fault tolerance, we suffered an outage.
I'm less familiar with Local Zones and how their control plane works; my inclination is that using a us-east-1* Local Zone does not insulate you from a us-east-1 failure, but I can't say for sure. Being in us-east-2 would insulate you from a us-east-1 failure, but not from a failure of us-east-2. If you need that level of reliability, you need a global traffic manager (GTM) or Global Accelerator load balancing across two separate regions. Note that you can't rely on being able to change the Route 53 configuration if us-east-1 goes down, but routing changes driven by health-check failures or Route 53 Application Recovery Controller (ARC) routing controls still work.
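The distinction between changing configuration and reacting to health status can be sketched as follows: the failover record is built ahead of time and never edited during an incident; each answer is computed only from the current health status and an on/off switch. This is a hypothetical model in the spirit of health-checked failover records and ARC routing controls, not their actual API; `FailoverRecord`, `resolve`, and the hostnames are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverRecord:
    # Created ahead of time (frozen: nothing here is edited during an incident).
    primary: str    # hypothetical endpoint in us-east-1
    secondary: str  # hypothetical endpoint in us-east-2


def resolve(record: FailoverRecord, primary_healthy: bool,
            routing_control_on: bool = True) -> str:
    """Answer a query using only health status and a routing-control flag.

    routing_control_on models a manual switch: turning it off forces traffic
    to the secondary even if the primary's health check still passes.
    """
    if primary_healthy and routing_control_on:
        return record.primary
    return record.secondary


record = FailoverRecord(primary="app.use1.example.com", secondary="app.use2.example.com")
print(resolve(record, primary_healthy=True))                            # app.use1.example.com
print(resolve(record, primary_healthy=False))                           # app.use2.example.com
print(resolve(record, primary_healthy=True, routing_control_on=False))  # app.use2.example.com
```

The design choice is that failover needs no configuration write at incident time: both targets already exist in the record, so only health status or a switch has to change.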
Thank you, I understand all that. In us-east-1 there are several Availability Zones you can select from, for example us-east-1a, us-east-1b, etc. Apparently all the AZs in us-east-1 were impacted by last week's outage, and the ALBs were not able to compensate. So my question is: if we had selected one of the additional zones, like us-east-1-bos-1, would that have given us better fault tolerance, given last week's region-wide outage in AWS Lambda?
Hi, thanks for the clarification. It's hard to say, because a Local Zone is not fully independent of its parent region. The control plane (as opposed to the data plane) of some services, such as EC2, runs in the parent region (see https://aws.amazon.com/blogs/networking-and-content-delivery/hybrid-inspection-architectures-with-aws-local-zone/).
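One way to picture this dependency is a lookup from each zone to the region whose control plane it relies on. A Local Zone maps to its parent region, so a control-plane failure in us-east-1 affects it just like the regular us-east-1 AZs. The mapping below is illustrative only and simplifies reality (it ignores which specific services run their control plane where); `CONTROL_PLANE_REGION` and `zones_without_control_plane` are hypothetical names.

```python
# Illustrative mapping: the region whose control plane each zone depends on.
CONTROL_PLANE_REGION = {
    "us-east-1a": "us-east-1",
    "us-east-1-bos-1": "us-east-1",  # Local Zone: control plane in the parent region
    "us-east-2a": "us-east-2",
}


def zones_without_control_plane(zones: list[str], failed_regions: set[str]) -> list[str]:
    """Zones that lose control-plane operations (e.g. launching or scaling)."""
    return [z for z in zones if CONTROL_PLANE_REGION[z] in failed_regions]


print(zones_without_control_plane(["us-east-1a", "us-east-1-bos-1", "us-east-2a"], {"us-east-1"}))
# ['us-east-1a', 'us-east-1-bos-1']
```

Under this assumption, choosing us-east-1-bos-1 would not have isolated you from a us-east-1 control-plane failure; only us-east-2a stays unaffected.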