What you're experiencing is not expected behavior for ECS deployments. The downtime you're seeing approximately 15-20 minutes after deployment is related to how your capacity provider and Auto Scaling Group are handling scale-in operations.
The issue appears to be that when your capacity provider scales the EC2 instances back down from 4 to 2 after deployment, the Auto Scaling Group terminates instances that still have running tasks, without draining them first. This is what causes your service downtime.
The key factor here is likely that you have "Capacity provider managed draining" disabled. This feature is designed specifically to prevent the scenario you're experiencing. When enabled, it ensures that:
- Instances with running tasks are placed in the DRAINING state before the Auto Scaling Group terminates them
- Tasks are stopped gracefully and replaced on other instances before the host instance is terminated
I recommend enabling "Capacity provider managed draining", as this should resolve your issue. Note that targeting empty instances first during scale-in, and automatically managing instance scale-in protection (similar to what you're doing manually), is handled by a separate capacity provider setting, managed termination protection. That setting requires managed scaling to be turned on and scale-in protection to be enabled on the Auto Scaling Group.
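As a rough sketch of how this setting could be turned on programmatically, the snippet below builds the request for the ECS `UpdateCapacityProvider` API and sends it through a boto3 ECS client. The capacity provider name `my-capacity-provider` and the helper function names are hypothetical; substitute your own. Enabling managed termination protection is optional here and assumes managed scaling and ASG scale-in protection are already configured.

```python
def build_managed_draining_update(capacity_provider_name,
                                  enable_termination_protection=False):
    """Build keyword arguments for the ECS UpdateCapacityProvider call.

    Hypothetical helper: only the fields being changed are included.
    """
    provider_settings = {"managedDraining": "ENABLED"}
    if enable_termination_protection:
        # Assumption: managed scaling is on and the ASG has scale-in
        # protection enabled, which this setting requires.
        provider_settings["managedTerminationProtection"] = "ENABLED"
    return {
        "name": capacity_provider_name,
        "autoScalingGroupProvider": provider_settings,
    }


def enable_managed_draining(ecs_client, capacity_provider_name,
                            enable_termination_protection=False):
    """Send the update using a boto3 ECS client passed in by the caller."""
    request = build_managed_draining_update(
        capacity_provider_name, enable_termination_protection
    )
    return ecs_client.update_capacity_provider(**request)


# Example usage (requires AWS credentials; name is a placeholder):
#   import boto3
#   ecs = boto3.client("ecs")
#   enable_managed_draining(ecs, "my-capacity-provider")
```

The same change can be made in the console by editing the capacity provider; the code form is mainly useful if you manage several clusters or want the setting applied from a deployment script.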
Additionally, you might want to check:
- Your ASG cooldown periods - ensure they're not too short, which could cause rapid scaling decisions
- The capacity provider's target tracking metrics - ensure they're appropriate for your workload
- Your minimum healthy percent and maximum percent settings - adjust them if you need to further optimize the deployment process
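To see what the minimum healthy percent and maximum percent settings allow during a rolling deployment, the following sketch computes the resulting task-count bounds. It assumes the rounding described in the ECS documentation as I understand it (the lower bound rounds up, the upper bound rounds down); the function name is hypothetical.

```python
def deployment_task_bounds(desired_count, minimum_healthy_percent,
                           maximum_percent):
    """Return (lower, upper) bounds on running tasks during a rolling update.

    Assumption: the minimum healthy bound rounds up and the maximum
    bound rounds down. Integer arithmetic avoids floating-point error.
    """
    # Ceiling division for the minimum number of healthy tasks.
    lower = -(-desired_count * minimum_healthy_percent // 100)
    # Floor division for the maximum number of running tasks.
    upper = desired_count * maximum_percent // 100
    return lower, upper


# With 4 desired tasks, 50% minimum healthy and 200% maximum:
print(deployment_task_bounds(4, 50, 200))  # (2, 8)
```

A wider gap between the two bounds lets ECS start more replacement tasks at once, which in turn can push the capacity provider to launch extra instances during the deployment and scale them in afterwards.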
The behavior you're seeing when manually enabling scale-in protection (no downtime) confirms that the issue is related to instance termination during scale-in rather than the ECS deployment process itself.