How do I implement disaster recovery or fault tolerance for my ElastiCache for Redis self-designed cluster?


I want to implement disaster recovery or fault tolerance for my Amazon ElastiCache for Redis self-designed cluster.

Resolution

To implement disaster recovery or fault tolerance for your ElastiCache for Redis self-designed cluster, choose one of the following methods based on your use case:

Multi-Availability Zone

If data retention, minimal downtime, and application performance are priorities, then use the Multi-AZ solution. This method offers the following benefits:

Low data loss potential - Multi-AZ provides fault tolerance for failure scenarios that include hardware-related issues.
Low performance impact - Multi-AZ provides the fastest recovery time because failover is automatic. There's no manual procedure to follow after Multi-AZ is turned on.
Low cost - Multi-AZ is the lowest cost option. Use Multi-AZ when you can't risk data loss because of hardware failure, or when you can't afford the downtime that the other options require when you respond to an outage.

For more information, see Minimizing downtime in ElastiCache for Redis with Multi-AZ.
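
As a minimal sketch of how Multi-AZ might be turned on for an existing replication group, the following Python builds the parameters for the boto3 modify_replication_group call. The helper names and the group ID my-redis-group are hypothetical placeholders. The sketch assumes that boto3 is installed, that AWS credentials are configured, and that the replication group already has at least one replica.

```python
def multi_az_params(replication_group_id: str) -> dict:
    """Build keyword arguments for ElastiCache modify_replication_group.

    Multi-AZ is requested together with automatic failover, because
    recovery relies on a replica being promoted without manual steps.
    """
    if not replication_group_id:
        raise ValueError("replication_group_id must not be empty")
    return {
        "ReplicationGroupId": replication_group_id,
        "AutomaticFailoverEnabled": True,
        "MultiAZEnabled": True,
        "ApplyImmediately": True,
    }


def enable_multi_az(replication_group_id: str, region: str) -> dict:
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so multi_az_params works without boto3

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**multi_az_params(replication_group_id))
```

For example, enable_multi_az("my-redis-group", "us-east-1") would send the request for a group with that placeholder ID. Keeping the parameters in a separate function makes them easy to review before any change is applied.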

Cross-Region

Use Global Datastore for Redis to replicate data from a primary ElastiCache for Redis self-designed cluster in one AWS Region to replica clusters in other Regions. Your applications write to the primary cluster and can read from the cross-Region replica clusters. This feature allows low latency reads and disaster recovery across Regions. This method offers the following benefits:

Medium data loss potential - Replication between Regions is asynchronous, so writes that haven't reached the secondary Region when the primary Region fails can be lost.
Low performance impact - If Regional degradation occurs, then a cross-Region replica cluster in the Global Datastore can be promoted to a primary cluster with full capabilities. This promotion occurs in less than one minute and allows your applications to remain available.
Medium to high cost - Global Datastore adds the cost of the clusters that run in the secondary Regions to support disaster recovery across Regions.

For more information, see Replication across AWS Regions using global datastores.
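
The manual promotion described above could be sketched as follows with the boto3 failover_global_replication_group call. The helper names and all IDs in the usage note are hypothetical placeholders, and the Region that the client connects to is an assumption to verify against the ElastiCache documentation. The sketch assumes that boto3 is installed and that AWS credentials are configured.

```python
def failover_params(global_group_id: str, target_region: str, target_group_id: str) -> dict:
    """Build keyword arguments for failover_global_replication_group.

    target_region and target_group_id identify the secondary cluster
    that should become the new primary cluster.
    """
    for name, value in (
        ("global_group_id", global_group_id),
        ("target_region", target_region),
        ("target_group_id", target_group_id),
    ):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {
        "GlobalReplicationGroupId": global_group_id,
        "PrimaryRegion": target_region,
        "PrimaryReplicationGroupId": target_group_id,
    }


def promote_secondary(global_group_id: str, target_region: str,
                      target_group_id: str, client_region: str) -> dict:
    """Send the promotion request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so failover_params works without boto3

    client = boto3.client("elasticache", region_name=client_region)
    return client.failover_global_replication_group(
        **failover_params(global_group_id, target_region, target_group_id)
    )
```

A call such as promote_secondary("example-global-datastore", "us-west-2", "my-redis-group-west", "us-west-2") would request that the placeholder secondary cluster in us-west-2 become the primary cluster.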

Daily automatic backups

Schedule your daily automatic backups for times when you expect low resource use on your cluster. ElastiCache creates a backup of the cluster by writing the data from the cache to a Redis .rdb file. Redis versions 2.8.22 and later can use a forkless backup method that reduces the performance impact of backups.

Note: Redis backup and restore aren't supported on cache.t1.micro nodes for clusters with cluster mode turned off.

High data loss potential - Automatic backups run once a day, so you can lose any data that was written after the most recent backup. Daily automatic backups are retained for up to 35 days.
Medium to high performance impact - Performance is affected when you run multiple file backups throughout the day. To improve performance, turn on RDB snapshots on a designated persistence-only secondary node. Then, turn off both RDB snapshots and the Redis append-only file (AOF) on the primary node and all other secondary nodes.
Low to medium cost - Storage costs increase with the number of backups and the data retention duration.

Note: Before you implement backup and restore, make sure that you review the backup constraints. For more information, see Snapshot and restore and Taking manual backups.
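
As a minimal sketch, the backup schedule and retention period could be set with the boto3 modify_replication_group call. The helper names and the IDs in the usage note are hypothetical placeholders. The optional SnapshottingClusterId parameter names the node that ElastiCache uses as the backup source, and using it to point backups at a replica is an assumption about your topology. The sketch assumes that boto3 is installed and that AWS credentials are configured.

```python
from typing import Optional

MAX_RETENTION_DAYS = 35  # upper bound for automatic backup retention, per the text above


def backup_params(replication_group_id: str, retention_days: int, window_utc: str,
                  snapshotting_cluster_id: Optional[str] = None) -> dict:
    """Build keyword arguments for ElastiCache modify_replication_group.

    window_utc is a daily UTC time range such as "05:00-06:00".
    """
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(f"retention_days must be between 1 and {MAX_RETENTION_DAYS}")
    params = {
        "ReplicationGroupId": replication_group_id,
        "SnapshotRetentionLimit": retention_days,
        "SnapshotWindow": window_utc,
        "ApplyImmediately": True,
    }
    if snapshotting_cluster_id is not None:
        # Assumption: this node is a replica, so backups don't load the primary node.
        params["SnapshottingClusterId"] = snapshotting_cluster_id
    return params


def schedule_backups(region: str, **kwargs) -> dict:
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so backup_params works without boto3

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**backup_params(**kwargs))
```

For example, backup_params("my-redis-group", 7, "05:00-06:00", "my-redis-group-002") describes a seven-day retention period with a backup window that starts at 05:00 UTC. Choose a window that matches the period of lowest use for your own workload.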
