Troubleshooting AWS Load Balancer and EC2 Issues After Storage Optimization and Snapshot Deletion

We made two changes to our server: we optimized storage and deleted older snapshots. A few days later, our Elastic Load Balancers stopped launching new instances. Our AWS engineer made some adjustments, which allowed the load balancer to launch instances again. However, all our APIs are now returning 500 errors. We haven't changed any code; the storage optimization and snapshot deletion were the only changes. We suspect that some settings were inadvertently changed or lost while the load balancer issue was being resolved, causing the API failures. We need help troubleshooting this, as we're unsure which settings might have been affected by the load balancer and instance launch problems.

2 Answers
Accepted Answer

Load balancers don't launch EC2 instances. In your case, it's probably EC2 Auto Scaling that launches new instances and associates them with the target groups of your load balancers. Auto Scaling can also track the health checks made by the load balancer to decide to replace failed instances, but it's still Auto Scaling, not the load balancer, that terminates and launches the EC2 instances.
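Since Auto Scaling is what launches the instances, its activity history is usually the first place to look for the reason a launch failed. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `failed_activities` and `fetch_activities` are made up for illustration.

```python
from typing import Any, Dict, List


def failed_activities(activities: List[Dict[str, Any]]) -> List[str]:
    """Return one summary line per scaling activity that failed or was cancelled."""
    lines = []
    for activity in activities:
        status = activity.get("StatusCode")
        if status in ("Failed", "Cancelled"):
            # StatusMessage normally carries the reason; fall back to Description.
            detail = activity.get("StatusMessage") or activity.get("Description", "")
            lines.append(f"{status}: {detail}")
    return lines


def fetch_activities(asg_name: str, max_records: int = 20) -> List[Dict[str, Any]]:
    """Fetch recent scaling activities for one Auto Scaling group (calls AWS)."""
    import boto3  # imported lazily so failed_activities() works without it

    client = boto3.client("autoscaling")
    response = client.describe_scaling_activities(
        AutoScalingGroupName=asg_name,
        MaxRecords=max_records,
    )
    return response["Activities"]
```

For example, `print("\n".join(failed_activities(fetch_activities("my-asg"))))`, where `my-asg` is a placeholder for your own group name, lists the recent failures together with the reason Auto Scaling recorded for each.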

What exactly did you do with regard to storage optimisation and removing snapshots? Auto Scaling launches EC2 instances from AMIs (Amazon Machine Images), which are typically backed by EBS snapshots. If you removed the AMI and associated snapshots that your Auto Scaling Groups (ASGs) are set to use via EC2 launch templates (or legacy launch configurations), failures to launch instances would be caused by the AMI specified in the template not being available. Similar launch failures would occur if security groups, VPC subnets, or other dependencies of the ASG's launch template are deleted.
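To check whether the deleted snapshots took an AMI with them, you could compare the AMI each Auto Scaling group's launch template points to against the AMIs that still exist. This is a rough sketch, assuming boto3 and configured credentials; the function names are hypothetical, and it only handles groups that use a plain launch template (not launch configurations or mixed instances policies).

```python
from typing import Dict, Set


def missing_amis(referenced: Dict[str, str], existing_ids: Set[str]) -> Dict[str, str]:
    """Map ASG name -> AMI ID for every referenced AMI that no longer exists."""
    return {asg: ami for asg, ami in referenced.items() if ami not in existing_ids}


def collect_referenced_amis() -> Dict[str, str]:
    """Map ASG name -> AMI ID taken from the group's launch template (calls AWS)."""
    import boto3  # imported lazily so missing_amis() works without it

    autoscaling = boto3.client("autoscaling")
    ec2 = boto3.client("ec2")
    referenced = {}
    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            spec = group.get("LaunchTemplate")
            if not spec:
                continue  # launch configuration or mixed instances policy: not handled here
            versions = ec2.describe_launch_template_versions(
                LaunchTemplateId=spec["LaunchTemplateId"],
                Versions=[spec.get("Version", "$Default")],
            )["LaunchTemplateVersions"]
            # ImageId may be absent, or an SSM parameter reference instead of ami-...
            image_id = versions[0]["LaunchTemplateData"].get("ImageId")
            if image_id:
                referenced[group["AutoScalingGroupName"]] = image_id
    return referenced


def existing_ami_ids(image_ids: Set[str]) -> Set[str]:
    """Return the subset of image_ids that EC2 still knows about (calls AWS)."""
    import boto3

    if not image_ids:
        return set()
    ec2 = boto3.client("ec2")
    # Filtering by image-id returns only the matches instead of raising on a missing ID.
    response = ec2.describe_images(
        Filters=[{"Name": "image-id", "Values": sorted(image_ids)}]
    )
    return {image["ImageId"] for image in response["Images"]}
```

A possible way to use it: call `referenced = collect_referenced_amis()`, then `missing_amis(referenced, existing_ami_ids(set(referenced.values())))`. Any group that appears in the result is pointing at an AMI that is gone and needs its launch template updated.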

EXPERT
answered a year ago
EXPERT
reviewed a year ago

Based on the information provided, it seems you're experiencing a complex issue that may involve multiple components of your AWS infrastructure. Let's break down the troubleshooting process:

  1. Instance Launch Failures: Load balancers do not launch instances themselves; the launches come from the Auto Scaling groups attached to them. The initial launch failures could have been caused by the storage optimization and snapshot deletion if they removed the AMIs, or the snapshots backing the AMIs, referenced by your Auto Scaling groups' launch templates or launch configurations.

  2. API 500 Errors: The fact that your APIs are now returning 500 errors after the load balancer issue was resolved suggests that there might be problems with your application servers or their configuration. Here are some steps to troubleshoot:

a) Check Instance Health: Verify that your EC2 instances are healthy and properly registered with the load balancer. Look at the HealthyHostCount metric in CloudWatch to ensure you have healthy instances in each Availability Zone.

b) Examine Application Logs: Review your application logs for any errors that might be causing the 500 responses. The issue could be related to database connections, file permissions, or other application-specific problems.

c) Verify Security Groups and Network ACLs: Ensure that the security groups and network ACLs are correctly configured to allow traffic between the load balancer and your EC2 instances.

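d) Check Instance Resources: Monitor the CPU, memory, and disk usage of your EC2 instances. CloudWatch reports CPU by default, while memory and disk usage require the CloudWatch agent or a check on the instance itself. The storage optimization might have left the instances with less disk space than your application needs.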

e) Review Load Balancer Configuration: Double-check your load balancer settings, including listener rules, target group settings, and health check configurations. Ensure that the health check path is correctly set and that your application is responding properly to these checks.

f) Inspect Application Code and Dependencies: Although you mentioned no code changes, verify that all necessary files and dependencies are present and correctly configured on the new instances.

g) Check Database Connections: If your APIs rely on a database, ensure that the connection strings and credentials are correct and that the database is accessible from your EC2 instances.

h) Verify Environment Variables: Check that any necessary environment variables or configuration files are properly set on the new instances.

i) Examine ELB Access Logs: Enable access logging on your load balancer if it's not already active. This can provide insights into the requests that are resulting in 500 errors.

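j) Monitor CloudWatch Metrics: Compare the load balancer's 5XX metrics to see where the errors originate. For an Application Load Balancer these are HTTPCode_ELB_5XX_Count (generated by the load balancer) and HTTPCode_Target_5XX_Count (returned by your instances); for a Classic Load Balancer they are HTTPCode_ELB_5XX and HTTPCode_Backend_5XX.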
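Steps a) and j) can be scripted. The sketch below assumes an Application Load Balancer (namespace `AWS/ApplicationELB`, metric names ending in `_Count`), boto3, and configured credentials; the function names and the simple comparison in `classify_5xx` are illustrative assumptions rather than an official method.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def unhealthy_targets(descriptions: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (target_id, state, reason) for every target that is not healthy."""
    result = []
    for description in descriptions:
        health = description["TargetHealth"]
        if health["State"] != "healthy":
            result.append(
                (description["Target"]["Id"], health["State"], health.get("Reason", ""))
            )
    return result


def classify_5xx(elb_count: float, target_count: float) -> str:
    """Give a rough hint about where 5XX responses come from."""
    if elb_count == 0 and target_count == 0:
        return "no 5XX responses in this window"
    if target_count >= elb_count:
        return "mostly returned by the application on the targets"
    return "mostly generated by the load balancer"


def fetch_target_health(target_group_arn: str) -> List[Dict[str, Any]]:
    """Fetch target health descriptions for one target group (calls AWS)."""
    import boto3  # imported lazily so the pure helpers work without it

    elbv2 = boto3.client("elbv2")
    response = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return response["TargetHealthDescriptions"]


def sum_metric(load_balancer_dim: str, metric_name: str, hours: int = 1) -> float:
    """Sum an ALB metric over the last `hours` hours (calls AWS)."""
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName=metric_name,
        # The dimension value is the ARN suffix, in the form app/<name>/<id>.
        Dimensions=[{"Name": "LoadBalancer", "Value": load_balancer_dim}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

With your own load balancer dimension value in `lb`, `classify_5xx(sum_metric(lb, "HTTPCode_ELB_5XX_Count"), sum_metric(lb, "HTTPCode_Target_5XX_Count"))` gives a first hint. If the errors are mostly returned by the targets, the application logs and configuration on the new instances (steps b, f, g, and h) are the more likely place to find the cause.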

If after going through these steps you're still unable to identify the root cause, you may need to engage AWS Support for more in-depth troubleshooting. They can help investigate potential issues within the ELB service or provide guidance on any misconfigurations that might have occurred during the resolution of the initial load balancer problem.

Remember to document any changes made during this troubleshooting process, as it will help in both resolving the current issue and preventing similar problems in the future.
Sources
Troubleshoot a Classic Load Balancer: Health checks - Elastic Load Balancing
Troubleshoot a Classic Load Balancer: HTTP errors - Elastic Load Balancing
How to debug 502 errors on ELB | AWS re:Post

answered a year ago
EXPERT
reviewed a year ago
