RDS connections dropping and then self-recovering


"I recently received consecutive health check down alerts via email. One by one, all of my sites went down; they run on different EC2 instances and connect to different RDS instances. The logs on the EC2 instances showed that the DB connections had dropped. After around 5 minutes, everything returned to normal without any manual intervention. All of my EC2 and RDS instances are in a single Region, Ohio (us-east-2). Has anyone experienced similar issues with RDS connections dropping and then self-recovering? I'd appreciate any insights into possible causes or troubleshooting steps!"

2 Answers

RDS connections that drop and then recover on their own can have several causes. Here are some possibilities and troubleshooting steps to consider:

  1. Network connectivity: There might have been a temporary network issue in the Ohio Region affecting multiple EC2 instances and RDS databases. This could explain why all your sites went down in quick succession even though they use different instances and databases.

  2. RDS maintenance or updates: AWS occasionally performs maintenance on RDS instances, which can cause brief interruptions. Check the RDS events for your instances (the Events page in the RDS console, or `aws rds describe-events`) to see whether any maintenance, failover, or restart was recorded during that time.

  3. Resource constraints: If your RDS instances were experiencing high load or resource constraints, it could lead to connection drops. DevOps Guru for RDS, if enabled, can help detect performance issues like high DB load that might degrade service quality.

  4. SSL/TLS certificate issues: Expired SSL/TLS certificates can cause connection problems. This is less likely here, because it affected multiple RDS instances at once and an expired certificate would not clear on its own after a few minutes. Still, ensure your certificates are up to date.

  5. Proxy issues: If you're using RDS Proxy, there could be temporary connection problems between the proxy and the DB instances.
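
For cause 2, here is a minimal Python sketch of how you might list the RDS events recorded around the incident. It assumes boto3 is installed and credentials are configured; the function name `rds_events_near`, the 30-minute window, and the placeholder timestamp are illustrative assumptions, not anything from your setup. `describe_events` is the boto3 RDS client call; as far as I know, RDS only keeps events for a limited period (about 14 days), so run this soon after an incident.

```python
from datetime import datetime, timedelta, timezone


def rds_events_near(rds_client, incident_time, window_minutes=30):
    """Return DB-instance events within +/- window_minutes of incident_time."""
    window = timedelta(minutes=window_minutes)
    kwargs = {
        "SourceType": "db-instance",
        "StartTime": incident_time - window,
        "EndTime": incident_time + window,
    }
    events = []
    while True:
        page = rds_client.describe_events(**kwargs)
        events.extend(page.get("Events", []))
        marker = page.get("Marker")
        if not marker:
            return events
        # More results are available: request the next page.
        kwargs["Marker"] = marker


# Example usage (needs boto3 and AWS credentials; the timestamp is a placeholder):
#   import boto3
#   client = boto3.client("rds", region_name="us-east-2")  # Ohio
#   incident = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
#   for event in rds_events_near(client, incident):
#       print(event["Date"], event["SourceIdentifier"], event["Message"])
```

If this returns failover, restart, or maintenance events for several instances at the same time, that points toward cause 2 rather than a network problem.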

To troubleshoot and prevent future occurrences:

  1. Check the AWS Health Dashboard for any reported issues in the Ohio (us-east-2) Region during that time.

  2. Review your RDS and EC2 CloudWatch metrics (for example DatabaseConnections, CPUUtilization, and FreeableMemory on the RDS side) for any anomalies around the time of the incident.

  3. Enable Performance Insights on your RDS instances if not already done. This will provide more detailed analysis if DevOps Guru detects any issues in the future.

  4. Implement retry logic in your application to handle temporary connection drops gracefully.

  5. Consider using RDS Proxy to manage database connections more efficiently and improve application resilience.

  6. Regularly update your SSL/TLS certificates and ensure they're not near expiration.
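
As an illustration of step 4, here is a minimal, driver-agnostic Python sketch of retry logic with exponential backoff and jitter. The name `call_with_retries` and all defaults are assumptions chosen for the example. `ConnectionError` is a stand-in: in practice you would pass the exception type your database driver raises on a dropped connection (for example `psycopg2.OperationalError` or `pymysql.err.OperationalError`).

```python
import random
import time


def call_with_retries(operation, retry_on=(ConnectionError,), max_attempts=5,
                      base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Run operation(), retrying on retry_on with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `operation` you pass in should open a fresh connection (or take one from a pool that validates connections) on each attempt, rather than reusing the handle that just failed. With these defaults the retries span only a few seconds, so they smooth over brief blips; they will not ride out a 5-minute outage, which still needs health checks and alerting.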

If the issue persists or recurs, you may want to contact AWS support for a more in-depth investigation of your specific setup and the root cause of the connection drops.
Sources
How DevOps Guru for RDS works - Amazon DevOps Guru
Troubleshooting for RDS Proxy - Amazon Relational Database Service
Troubleshoot Amazon RDS expired SSL/TLS certificate connection issues | AWS re:Post
Troubleshooting applications on Amazon RDS - Amazon Relational Database Service

answered a month ago
EXPERT
reviewed a month ago

In terms of what's available from AWS Health, what date did this occur?

The AWS Health Dashboard, in either the public Service health view or your account-specific events view (when logged in to the console), can indicate whether service events affected your resources, for example if EC2, RDS, or another related service posted about an ongoing issue at that time.

https://health.aws.amazon.com/health/status

AWS
EXPERT
answered a month ago
EXPERT
reviewed a month ago
