1 Answer
ALBs generally take a few seconds to propagate changes, though the actual time can vary.
Since you are seeing 504 errors when changing the target group weights from 0 to 100, here are some troubleshooting suggestions:
- Try moving the weights slowly/incrementally and test.
- Make sure both the target groups have healthy targets.
- Check access logs for that time.
From the error, it looks like the problem may be on the application servers. Let me know if this helps with troubleshooting.
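As a rough illustration of the first suggestion, the sketch below builds a sequence of weighted forward actions that shifts traffic in stages rather than in one jump from 0 to 100. The function names, the default step size, and the ARNs passed in are hypothetical; only the `forward` / `ForwardConfig` / `TargetGroups` shape follows the ELBv2 listener action format.

```python
from typing import Dict, List


def weight_steps(step: int = 10) -> List[int]:
    """Return the new target group's weight at each stage, always ending at 100."""
    if not 1 <= step <= 100:
        raise ValueError("step must be between 1 and 100")
    steps = list(range(step, 100, step))
    steps.append(100)
    return steps


def forward_action(old_tg_arn: str, new_tg_arn: str, new_weight: int) -> Dict:
    """Build one weighted forward action (hypothetical helper, not an AWS API)."""
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": old_tg_arn, "Weight": 100 - new_weight},
                {"TargetGroupArn": new_tg_arn, "Weight": new_weight},
            ]
        },
    }


def shift_plan(old_tg_arn: str, new_tg_arn: str, step: int = 10) -> List[Dict]:
    """Return the ordered list of actions for an incremental shift."""
    return [forward_action(old_tg_arn, new_tg_arn, w) for w in weight_steps(step)]
```

Each action could then be applied in turn, for example with boto3's `modify_listener(ListenerArn=..., DefaultActions=[action])`, pausing between steps to watch for 5xx responses before continuing. If the weights live on a listener rule rather than the default action, `modify_rule` would be the call to use instead.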
answered 2 months ago
Thank you for your reply. As a workaround, we're implementing a slower change in the target group weights to see if it helps. We are sure that both target groups have healthy targets and are able to take requests. We are also going to enable ALB logs and compare them with the logs from the instances.
I was able to create a production-like load on the test environment, but I couldn't reproduce the problem: when I change the weights from 0 to 100, all the traffic switches to the new target group after roughly 10 seconds, without any errors. Could the problem be related to some VPC or subnet configuration? In the test environment we recreate everything every time, but the production environment is still using an old VPC/subnet.
We've managed to run some tests in the production environment and noticed this: if the active color is blue, the 504 errors happen when the green target group starts to register the ASG instances, and they last until all the green instances are healthy (this takes almost 2 minutes in our case). All of this happens even though the weight of the green target group is 0 while the blue one is 100. So the errors happen before we change the weights, and changing them afterward does not cause any more errors.
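One way to check which target group the 504s are attributed to is to count them per target group in the ALB access logs. The sketch below is a minimal example that assumes the documented space-delimited ALB access log layout, with `elb_status_code` as the ninth field and `target_group_arn` as the seventeenth; the function name is hypothetical, and the field positions should be verified against your own logs.

```python
import shlex
from collections import Counter
from typing import Iterable

# Zero-based field positions, assumed from the ALB access log format.
ELB_STATUS = 8
TARGET_GROUP_ARN = 16


def count_504_by_target_group(lines: Iterable[str]) -> Counter:
    """Count ALB-level 504 responses per target group ARN."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            # shlex keeps quoted fields (request, user agent) together.
            fields = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(fields) <= TARGET_GROUP_ARN:
            continue  # skip truncated or unexpected lines
        if fields[ELB_STATUS] == "504":
            counts[fields[TARGET_GROUP_ARN]] += 1
    return counts
```

Feeding it the decompressed log lines for the two-minute window would show whether the failing requests were routed to the blue or the green target group.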
Thank you for your reply. This looks like an application issue; the ALB seems to be working as expected.
I see you mention that the 504 errors last for 2 minutes while 100% of the requests are going to the blue target group. To troubleshoot this further:
Please let me know if the above troubleshooting steps help, and feel free to share your results (after removing any sensitive data) so we can dive deeper into it.