Issues and Questions Encountered During Blue/Green Deployment with CodeDeploy

I'm still not very familiar with AWS, so I set up a blue/green zero-downtime deployment environment using CodeDeploy with the help of Gemini. The deployment and traffic shifting seem to work correctly, but I think there's an issue, so I'm here to ask a question.

Auto Scaling Group

First, for the Auto Scaling Group, I set the desired, minimum, and maximum capacity all to 1 and linked a launch template. The permissions granted to the IAM role in this launch template are as follows:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonS3ReadOnlyAccess
  • AWSCodeDeployFullAccess
  • AWSCodeDeployRole

Target Group

After that, I created two Target Groups, service-tg-blue and service-tg-green, with the same settings. The protocol is HTTP on port 80, and the health check path is set to /.

Load Balancer

I created an Application Load Balancer. The listener for HTTP:80 is set to redirect to HTTPS://#{host}:443/#{path}?#{query}. The listener for HTTPS:443 has its default action set to [Forward to target group] with service-tg-blue: 100% and service-tg-green: 0%, and target group stickiness is off. The SSL certificate was issued and linked via AWS ACM.
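For reference, the two listener default actions described above can be written out as plain dictionaries in the shape the ELBv2 API uses for `DefaultActions`. This is only a sketch of the configuration in the post: the ARNs are placeholders, and the redirect status code is an assumption because the post does not say whether 301 or 302 was chosen.

```python
from __future__ import annotations

from typing import Any


def redirect_to_https_action() -> dict[str, Any]:
    """Default action for the HTTP:80 listener: redirect to HTTPS on port 443."""
    return {
        "Type": "redirect",
        "RedirectConfig": {
            "Protocol": "HTTPS",
            "Port": "443",
            "Host": "#{host}",
            "Path": "/#{path}",
            "Query": "#{query}",
            "StatusCode": "HTTP_301",  # assumption: not stated in the post
        },
    }


def weighted_forward_action(weights: dict[str, int]) -> dict[str, Any]:
    """Default action for the HTTPS:443 listener: weighted forward, stickiness off."""
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": arn, "Weight": weight}
                for arn, weight in weights.items()
            ],
            "TargetGroupStickinessConfig": {"Enabled": False},
        },
    }


# Placeholder ARNs (hypothetical); the real values come from the target groups.
BLUE_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/service-tg-blue/EXAMPLE"
GREEN_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/service-tg-green/EXAMPLE"

https_default_action = weighted_forward_action({BLUE_ARN: 100, GREEN_ARN: 0})
```

Writing the weights down this way makes it easy to compare them with what the listener shows after a deployment.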

CodeDeploy

In CodeDeploy, I created a deployment group with the Deployment type set to [Blue/green]. I chose [Copy an Auto Scaling group] and selected the Auto Scaling Group I had created as the starting point, so that new Auto Scaling Groups are created automatically on later deployments. For Deployment settings, I selected [Reroute traffic immediately] and [Terminate the original instances in the deployment group] with a wait time of [5 minutes]. The Deployment configuration is set to CodeDeployDefault.AllAtOnce. For the Load balancer type, I chose [Application Load Balancer or Network Load Balancer] and selected both Target Groups, service-tg-blue and service-tg-green, that I linked to the Load Balancer above. For [IAM Role], the role name is CodeDeployRole-For-Service.

  • The role has the following 3 policies: AutoScalingFullAccess, AWSCodeDeployRole, and a custom policy.
  • A summary of the permissions for the custom policy is as follows:
    Service          | Access level                  | Resource             | Request condition
    EC2              | Limited: List, Write, Tagging | All resources        | None
    EC2 Auto Scaling | Limited: List, Write          | All resources        | None
    ELB              | Full: List, Read, Write       | All resources        | None
    ELB v2           | Limited: Read, Write          | All resources        | None
    IAM              | Limited: Write                | RoleName string like |
    SNS              | Limited: Write                | All resources        | None
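The console choices described in this section roughly correspond to the following deployment group settings, written in the shape of the CodeDeploy `CreateDeploymentGroup` request. This is a sketch for comparison only: the application name, deployment group name, Auto Scaling group name, and account ID are hypothetical placeholders, and the values are inferred from the post, not read from the actual deployment group.

```python
# Settings inferred from the console choices in the post.
deployment_group_settings = {
    "applicationName": "service-app",  # hypothetical name
    "deploymentGroupName": "service-deployment-group",  # hypothetical name
    "serviceRoleArn": "arn:aws:iam::ACCOUNT:role/CodeDeployRole-For-Service",
    "deploymentConfigName": "CodeDeployDefault.AllAtOnce",
    "autoScalingGroups": ["service-asg"],  # hypothetical name
    "deploymentStyle": {
        "deploymentType": "BLUE_GREEN",
        "deploymentOption": "WITH_TRAFFIC_CONTROL",
    },
    "blueGreenDeploymentConfiguration": {
        # [Copy an Auto Scaling group]
        "greenFleetProvisioningOption": {"action": "COPY_AUTO_SCALING_GROUP"},
        # [Reroute traffic immediately]
        "deploymentReadyOption": {"actionOnTimeout": "CONTINUE_DEPLOYMENT"},
        # [Terminate the original instances in the deployment group], 5 minutes
        "terminateBlueInstancesOnDeploymentSuccess": {
            "action": "TERMINATE",
            "terminationWaitTimeInMinutes": 5,
        },
    },
    "loadBalancerInfo": {
        "targetGroupInfoList": [
            {"name": "service-tg-blue"},
            {"name": "service-tg-green"},
        ]
    },
}


def target_group_names(settings: dict) -> list:
    """Return the target group names attached to the deployment group."""
    infos = settings.get("loadBalancerInfo", {}).get("targetGroupInfoList", [])
    return [info["name"] for info in infos]
```

Comparing a dictionary like this with the output of `aws deploy get-deployment-group` is one way to confirm that the console saved the settings as intended.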

Current Problem

The deployment itself completes successfully, and the shift to the new version happens without any downtime. However, after the deployment is complete, both the service-tg-blue and service-tg-green Target Groups show the same instance under their [Registered targets], and the traffic ratio remains fixed at service-tg-blue: 100%, service-tg-green: 0%. Also, despite the setting [Terminate the original instances in the deployment group] -> wait time [5 minutes], it seems that no group is being terminated after the 5 minutes have passed.
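The "same instance in both Target Groups" observation can be checked programmatically. The sketch below assumes two responses in the shape returned by the ELBv2 `DescribeTargetHealth` call (one per target group) and reports the instance IDs that appear in both. The instance ID and the sample responses are made up for illustration.

```python
def registered_ids(target_health_response: dict) -> set:
    """Collect the target IDs from a DescribeTargetHealth-shaped response."""
    descriptions = target_health_response.get("TargetHealthDescriptions", [])
    return {description["Target"]["Id"] for description in descriptions}


def shared_targets(blue_response: dict, green_response: dict) -> list:
    """Return the IDs registered in both target groups, sorted for stable output."""
    return sorted(registered_ids(blue_response) & registered_ids(green_response))


# Hypothetical sample data mirroring the situation described in the post.
sample_blue = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0123456789abcdef0", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
    ]
}
sample_green = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0123456789abcdef0", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
    ]
}

overlap = shared_targets(sample_blue, sample_green)
```

In a real check, the two responses would come from calling `describe_target_health` on an `elbv2` client once for each target group ARN.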

My expected scenario was:

  1. Initially, 100% of the traffic goes to the service-tg-blue environment.
  2. When a new deployment starts, CodeDeploy launches a new instance in service-tg-green and changes the traffic split to service-tg-blue: 0%, service-tg-green: 100%.
  3. When a second deployment starts, CodeDeploy launches a new instance in service-tg-blue and shifts the traffic back to service-tg-blue: 100%, service-tg-green: 0%.

I imagined that service-tg-blue and service-tg-green would alternate, swapping the blue and green roles on each deployment, but it doesn't seem to work that way. In my current situation, traffic appears to be routed only to service-tg-blue every time, and the roles never alternate.
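The expected scenario above can be expressed as a small model. It describes only the alternation that was expected, not what CodeDeploy actually does; the function name is hypothetical, and the target group names are taken from the post.

```python
def expected_weights_after(deployments: int) -> dict:
    """Expected listener weights after a given number of deployments.

    Models the expectation that the live target group alternates on every
    deployment, starting with service-tg-blue before any deployment has run.
    """
    if deployments < 0:
        raise ValueError("deployments must be non-negative")
    blue_is_live = deployments % 2 == 0
    return {
        "service-tg-blue": 100 if blue_is_live else 0,
        "service-tg-green": 0 if blue_is_live else 100,
    }


# Expected weights before any deployment and after the first two.
expected_history = [expected_weights_after(n) for n in range(3)]
```

The observed behavior differs from this model: the weights stay at service-tg-blue: 100%, service-tg-green: 0% after every deployment.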

What I'm curious about

  1. What is currently happening in my setup?
  2. What additional settings should I look into?
  3. Is my current setup problematic, or is it working correctly and this is just a simple misunderstanding due to my lack of knowledge about AWS?
  4. How can I resolve my current issue?

These are the 4 things I'm curious about. Thank you for reading this long post.

1 Answer

Based on your description, you've set up a blue/green deployment with CodeDeploy that copies an Auto Scaling group, but there are some misunderstandings about how the process works with EC2 instances and target groups.

In a CodeDeploy blue/green deployment on EC2, the process doesn't work exactly as you've described in your expected scenario. Here's what's actually happening:

  1. When you perform a blue/green deployment with CodeDeploy, it doesn't alternate between your target groups as separate environments. Instead, both target groups are used simultaneously during the deployment process.

  2. The issue you're seeing where both target groups show the same instance is expected behavior. In a CodeDeploy blue/green deployment, the same EC2 instance can be registered to multiple target groups. This is because the traffic shifting happens at the load balancer level, not by moving instances between target groups.

  3. The traffic ratio remains at 100% to blue and 0% to green after deployment because CodeDeploy modifies these weights during the deployment process, but once the deployment is complete, it sets the production traffic back to the primary target group (in this case, your blue target group).

  4. Regarding the termination of instances, in a CodeDeploy blue/green deployment with EC2/ASG, CodeDeploy creates a new Auto Scaling group for the green environment and terminates the old one after the specified wait time. However, your description suggests that this isn't happening correctly.

To resolve your issues:

  1. Check your CodeDeploy service role permissions. The role needs sufficient permissions to manage Auto Scaling groups, EC2 instances, and load balancers. Your custom policy seems to have most of these permissions, but verify that there are no restrictions preventing instance termination.

  2. Ensure your AppSpec file is correctly configured for blue/green deployments. The AppSpec file tells CodeDeploy how to handle the deployment, including which resources to use and how to route traffic.

  3. Consider using the AWS Console or CLI to check the deployment logs and events to see if there are any errors or warnings during the deployment process, particularly during the termination phase.

  4. If you want to alternate between blue and green environments with each deployment, you might need to modify your deployment process or consider using a different approach, such as creating a custom script that alternates the target group assignments between deployments.
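As a starting point for step 3, here is a minimal sketch that pulls the fields relevant to termination out of a response shaped like the CodeDeploy `GetDeployment` output. The function name, deployment ID, and sample response are hypothetical; a real response would come from `get_deployment` on a `codedeploy` client, or from `aws deploy get-deployment`.

```python
def summarize_deployment(response: dict) -> dict:
    """Extract status, error details, and blue-instance termination settings."""
    info = response.get("deploymentInfo", {})
    error = info.get("errorInformation", {})
    blue_green = info.get("blueGreenDeploymentConfiguration", {})
    termination = blue_green.get("terminateBlueInstancesOnDeploymentSuccess", {})
    return {
        "status": info.get("status"),
        "error_code": error.get("code"),
        "error_message": error.get("message"),
        "terminate_action": termination.get("action"),
        "termination_wait_minutes": termination.get("terminationWaitTimeInMinutes"),
    }


# Hypothetical sample mirroring the settings described in the question.
sample_response = {
    "deploymentInfo": {
        "deploymentId": "d-EXAMPLE",
        "status": "Succeeded",
        "blueGreenDeploymentConfiguration": {
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": 5,
            }
        },
    }
}

summary = summarize_deployment(sample_response)
```

If `terminate_action` is not `TERMINATE`, or an error code is present, that would be the first place to look when the original instances are not terminated after the wait time.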

Your current setup isn't necessarily problematic, but it might not be achieving what you expected. The behavior you're seeing with both target groups showing the same instance is normal for how CodeDeploy works with load balancers and target groups in a blue/green deployment.
Sources
CodePipeline step CodeDeploy using ECS blue/green fails at 'Rerouting production traffic to replacement task set' | AWS re:Post
Troubleshoot Amazon ECS deployment issues - AWS CodeDeploy
Why do new EC2 instances initially show as unhealthy in ALB Target Group during both blue-green deployments and autoscaling? | AWS re:Post
Questions about CodeDeploy and ECS | AWS re:Post
Instance is registering to the incorrect Target Group automatically when Deploying via CodeDeploy | AWS re:Post

answered 10 months ago
