EC2 Instance Connection Issues - Recurring Data Reception Failures with EC2, Route 53, and Load Balancer Every Few Days

Hello,

I am running a backend server on an Amazon EC2 instance and have set up Route 53, a Load Balancer, and Certificate Manager to manage HTTPS connections for my domain. I am using pm2 to ensure the backend processes continue running even if the console is closed.

However, I am facing a recurring issue where, approximately every 2–7 days, the domain stops receiving data. When I check the port status with the command lsof -i :443, it shows something is still running on the port, but data is not being received through the domain. Even after using pkill node to terminate the process on port 443 and restarting the process, the issue persists.

The only way I have been able to resolve this issue is by stopping the EC2 instance and then restarting it, which changes the public IP address. I then manually update the public IP in the A record of Route 53. Once this is done, the domain starts receiving data again.

I do not understand what is causing this, and I do not know what steps to take next.

1 Answer

This recurring issue you're experiencing with your EC2 instance and associated services is complex and could be caused by several factors. Let's break down the potential causes and solutions:

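  1. EC2 Instance Health: A stop and start reboots the operating system and, in most cases, moves the instance to different host hardware. Since that resolves the issue while restarting Node.js alone does not, the cause may sit below the application: resource exhaustion on the instance (CPU, memory, or disk) or an operating-system or host problem that is not immediately apparent. Keep in mind that the stop and start also changes the public IP address and is followed by a DNS update, so more than one thing changes at once.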

  2. Load Balancer Configuration: Since you're using a Load Balancer, it's important to ensure that the health checks are configured correctly. If the Load Balancer is marking your instance as unhealthy, it could explain why data isn't being received. Check the Load Balancer's health check settings and logs to see if there are any failures.

  3. Security Group and Network ACL Settings: Verify that your security group and network ACL settings are correctly configured to allow traffic on the necessary ports, including port 443 for HTTPS.

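  4. Route 53 and DNS Configuration: Having to update the A record in Route 53 after each restart indicates that the record points directly at the instance's public IP address and that the instance has no Elastic IP. If so, requests to the domain go straight to the instance and bypass the Load Balancer. With a Load Balancer in front, the usual setup is a Route 53 alias record that targets the Load Balancer, which needs no update when the instance restarts. If you intend to reach the instance directly instead, assign it an Elastic IP, which stays the same across a stop and start.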

  5. Application-level Issues: The fact that killing and restarting the Node.js process doesn't resolve the issue suggests that the problem might not be at the application level. However, it's worth reviewing your application logs for any recurring errors or issues that might be causing the service to become unresponsive.

  6. EC2 Instance Type: Depending on your workload, you might be using an instance type that's not suitable for your needs. Consider monitoring your instance's resource utilization and upgrading to a more powerful instance type if necessary.
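
One way to narrow down which of the layers above is failing is to compare what DNS returns for the domain with the instance's address, and to try a TCP connection to each. The following is only a minimal sketch using the Python standard library; the function names and the `diagnose` report keys are made up for illustration, and the domain and IP address you pass in are your own values.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def diagnose(domain: str, instance_public_ip: str, port: int = 443) -> dict:
    """Compare the DNS answer with the instance address and probe both."""
    try:
        dns_ips = resolve_ipv4(domain)
    except socket.gaierror:
        dns_ips = []
    return {
        "dns_ips": dns_ips,
        "dns_points_at_instance": instance_public_ip in dns_ips,
        "domain_reachable": can_connect(domain, port),
        "instance_reachable": can_connect(instance_public_ip, port),
    }


# Example call (hypothetical values, run from a machine outside the VPC):
# print(diagnose("api.example.com", "203.0.113.10"))
```

If `dns_points_at_instance` is true, the domain is probably not going through the Load Balancer at all. If the instance is reachable but the domain is not, DNS or the Load Balancer is the more likely place to look; if neither is reachable while lsof still shows a listener, the security group, network ACL, or the instance's networking deserves a closer look. A successful TCP connection only shows that something accepts connections on the port, not that the application answers requests.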

To resolve this issue:

  1. Implement proper monitoring using Amazon CloudWatch to track your instance's health, resource utilization, and application performance.

  2. Set up Auto Scaling to automatically replace unhealthy instances.

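  3. Point the Route 53 record at the Load Balancer with an alias record, or attach an Elastic IP address to the instance if you reach it directly, so that DNS needs no manual update after a restart.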

  4. Review and optimize your Load Balancer configuration, including health check settings.

  5. Implement more robust error handling and logging in your application to catch and report any issues.

  6. Consider enabling EC2 automatic instance recovery (for example, a CloudWatch alarm with the recover action) so that the instance is recovered if a system status check fails.
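
The automatic recovery in the last step can be set up as a CloudWatch alarm on the `StatusCheckFailed_System` metric whose action is the EC2 recover action. The sketch below assumes the third-party boto3 library and valid AWS credentials; the function names, the alarm name, and the period and threshold values are illustrative choices, not settings taken from your account.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (recover action)."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per evaluation period (illustrative)
        "EvaluationPeriods": 2,  # consecutive failing periods before acting
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id: str, region: str) -> None:
    """Create the alarm in CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder above has no dependency

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recovery_alarm_params(instance_id, region))


# Example call (hypothetical instance ID and region):
# create_recovery_alarm("i-0123456789abcdef0", "us-east-1")
```

This alarm reacts to system status check failures only. If the status checks stay green while the domain stops receiving data, the alarm will not fire, so it complements the monitoring and health checks above rather than replacing them.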

By implementing these measures, you should be able to improve the stability of your setup and reduce or eliminate the need for manual interventions.
Sources
Troubleshoot website connectivity on EC2 Windows instance | AWS re:Post
Troubleshoot status check failures on an unreachable EC2 Linux instance | AWS re:Post
Troubleshoot issues connecting to your Amazon EC2 Linux instance - Amazon Elastic Compute Cloud
Troubleshoot a Classic Load Balancer: Health checks - Elastic Load Balancing

answered 3 months ago