ElastiCache stuck in "modifying" after performing a failover


I am trying to migrate my Redis ElastiCache server out of a specific subnet, so I launched a replica. The replica took several hours to initialize, but it finally did. I then failed over to the replica, but almost 5 hours later the cluster still shows "Modifying." I asked Amazon Q and it said to try rebooting, but the reboot button is disabled. The Redis cluster is still working (it's in a production environment), but I don't know why it's stuck in "modifying". I want to shut down the original primary cluster and would rather not pay for two clusters to be running. Any ideas on why it's stuck?

asked 8 months ago, 1K views
2 Answers

It sounds like your Redis ElastiCache cluster is stuck in the "modifying" state after a failover. Here are some steps you can take to troubleshoot and resolve the issue:

Check Status: Confirm that every node in the cluster is healthy and check which node currently holds the primary role. A node that is unhealthy or still synchronizing can keep the modification from completing.

Review Logs: Look at the ElastiCache events for the cluster and the replication group for error messages or warnings that explain why the modification is stuck.

Check Resource Limits: Make sure your AWS account is not hitting a limit that the modification depends on, such as service quotas or free IP addresses in the target subnets. Resource limits can sometimes cause operations to stall.

Reboot: The console disables the reboot button while the cluster is modifying. You can try the reboot-cache-cluster command from the command-line interface (CLI) or the equivalent AWS SDK call, but the API generally enforces the same state check, so expect it to be rejected with an invalid cluster state error until the cluster is "available" again.

Contact AWS Support: If the issue persists after trying the above steps, reach out to AWS Support. They can see the internal state of your cluster and can help you resolve it.

Cost Considerations: While troubleshooting, keep an eye on your AWS billing so you are not incurring unnecessary costs. Once the cluster is back in the "available" state, you can remove the node you no longer need.

These steps, together with AWS Support if needed, should help you get the cluster out of the "modifying" state and complete the failover.
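The status check described above can be sketched in Python with boto3. This is a minimal sketch, not a definitive implementation: the function names and the `my-redis-group` ID are hypothetical, and the client is passed in as a parameter so the logic can be exercised without AWS access. It relies on `describe_replication_groups`, which reports the group status, any pending modifications, and the members of each node group.

```python
from typing import Any


def summarize_replication_group(client: Any, replication_group_id: str) -> dict:
    """Return the group status, pending changes, and each member's role."""
    response = client.describe_replication_groups(
        ReplicationGroupId=replication_group_id
    )
    group = response["ReplicationGroups"][0]

    members = []
    for node_group in group.get("NodeGroups", []):
        for member in node_group.get("NodeGroupMembers", []):
            members.append(
                {
                    "cache_cluster_id": member.get("CacheClusterId"),
                    # CurrentRole may be absent (for example with cluster mode
                    # enabled), so fall back to "unknown".
                    "role": member.get("CurrentRole", "unknown"),
                    "az": member.get("PreferredAvailabilityZone"),
                }
            )

    return {
        "status": group.get("Status"),
        "pending": group.get("PendingModifiedValues", {}),
        "members": members,
    }


def is_settled(summary: dict) -> bool:
    """True once the group is 'available' and reports no pending changes."""
    return summary["status"] == "available" and not summary["pending"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   summary = summarize_replication_group(
#       boto3.client("elasticache"), "my-redis-group"  # hypothetical ID
#   )
#   print(summary, is_settled(summary))
```

If the summary shows the replica already holding the primary role while the status stays "modifying", that is useful detail to include in a support case.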

EXPERT
answered 8 months ago

Based on the information you've provided, you are migrating your Redis ElastiCache server to a new subnet, and the failover has left the cluster in the "Modifying" state. A replica that took several hours to initialize, followed by a failover that has not completed after almost 5 hours, is concerning. [1]

Here are a few things you can try to resolve this issue:

Check the ElastiCache event logs: The ElastiCache service logs various events related to your clusters. You can check the event logs to see if there are any error messages or indications of what might be causing the delay in the failover process. You can access the event logs through the AWS Management Console, the AWS CLI, or the ElastiCache API.

Verify the subnet configuration: Ensure that the new subnet you're trying to migrate to is properly configured and has the necessary network settings, security groups, and routing tables to support the Redis cluster. Any issues with the subnet configuration could be causing the failover process to get stuck.

Contact AWS Support: If you're unable to resolve the issue on your own, I would recommend reaching out to AWS Support. They have access to more detailed information about your ElastiCache cluster and can investigate the issue further. You can create a support case through the AWS Management Console or the AWS Support Center.

Consider a manual migration: As a last resort, you could migrate your Redis data from the old cluster to a new one manually. This would involve creating a snapshot of the old cluster, creating a new cluster from that snapshot in the target subnet group, and then updating your application to point to the new cluster's endpoint. This process may involve some downtime, and writes made after the snapshot is taken are not carried over, but it could be a viable option if the automated failover continues to be problematic.

It's understandable that you don't want to be paying for two clusters to be running, but it's important to ensure that the migration process is completed successfully to avoid any data loss or service disruptions. I would recommend reaching out to AWS Support as soon as possible to get their assistance in resolving this issue.
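As a rough illustration of the event-log check suggested above, the following Python sketch pages through `describe_events` with a boto3-style client. The function names and the `my-redis-group` ID are hypothetical, and the client is passed in so the paging logic can be tried without AWS access. Events may be recorded against the replication group or against individual cache clusters, so it can be worth querying both source types.

```python
from typing import Any, Iterator


def iter_recent_events(
    client: Any,
    source_id: str,
    source_type: str = "replication-group",
    minutes: int = 720,
) -> Iterator[dict]:
    """Yield ElastiCache events for one source from the last `minutes` minutes."""
    kwargs = {
        "SourceIdentifier": source_id,
        "SourceType": source_type,
        "Duration": minutes,
    }
    while True:
        page = client.describe_events(**kwargs)
        yield from page.get("Events", [])
        marker = page.get("Marker")
        if not marker:
            break
        # A Marker in the response means more results are available.
        kwargs["Marker"] = marker


def format_event(event: dict) -> str:
    """Render one event as a single line for quick scanning."""
    return f"{event.get('Date')} | {event.get('SourceIdentifier')} | {event.get('Message')}"


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("elasticache")
#   for event in iter_recent_events(client, "my-redis-group"):  # hypothetical ID
#       print(format_event(event))
```

Passing `source_type="cache-cluster"` with a node's cluster ID narrows the output to that node.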

Please let me know if you have any other questions!

Sources
[1] Modifying cluster mode - Amazon ElastiCache for Redis. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/modify-cluster-mode.html

AWS
answered 8 months ago
