Hello there,
I understand that you have questions about best practices and strategy for deploying your ECS tasks with efficient CPU allocation and optimised cost.
================
- [+] To begin with, your workflow seems well planned and conceptually sound. I would still recommend adding a step in which you monitor resource utilisation and application behaviour under a higher number of users. Scale the number of tasks and the CPU/memory values up and down, and observe whether the improvement in latency or results is worth the extra resource cost. This is the only reliable way to find the resources best suited to your workload, because the right values depend on the application and there are no predetermined values or methodology for choosing them. [1]
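Once you have run those load tests, comparing the results can be as simple as picking the cheapest configuration that still meets your latency goal. The sketch below assumes you record, for each test run, the task size, task count, a measured p95 latency and your own hourly cost estimate; the `Trial` type, the `cheapest_meeting_slo` helper and the p95 criterion are hypothetical illustrations, not an AWS API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One load-test run (hypothetical record): task size plus what you measured."""
    cpu_units: int          # task CPU units (1024 = 1 vCPU)
    memory_mib: int         # task memory in MiB
    task_count: int         # number of running tasks during the test
    p95_latency_ms: float   # measured 95th-percentile latency
    hourly_cost: float      # your own cost estimate for this configuration


def cheapest_meeting_slo(trials: Sequence[Trial], max_p95_ms: float) -> Optional[Trial]:
    """Return the cheapest trial whose p95 latency meets the target, or None."""
    acceptable = [t for t in trials if t.p95_latency_ms <= max_p95_ms]
    if not acceptable:
        return None
    # Cheapest first; ties on cost are broken by lower latency.
    return min(acceptable, key=lambda t: (t.hourly_cost, t.p95_latency_ms))
```

If no trial qualifies, the helper returns `None`, which is a signal to test larger task sizes or more tasks rather than to relax the target silently.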
- [+] Furthermore, I would suggest setting the task definition resource values to a fixed, relatively small size. Choose values that keep your resources from being heavily underutilised at any time, but that are not so small that you have to scale the number of tasks too often. Frequent scaling can cause disruptions, because each new task needs time to register with your ELB and each removed task needs time to deregister from it. Striking that balance is the tough part. Beyond that, let Service Auto Scaling handle changes in load by adjusting the number of tasks. [2]
For example, suppose you expect between 1 and 50 concurrent users, and a task needs about 1 vCPU and 2 GB of memory for every 10 concurrent users before it runs out of resources. A good strategy could be to use 1 vCPU and 2 GB of memory as the task size and to scale out the number of tasks when CPU and/or memory utilisation reaches somewhere between 65% and 75%. This ensures that you are not overpaying or over-provisioning, while still leaving enough headroom to serve more concurrent users at short notice.
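The arithmetic behind that example can be sketched as follows. This is an illustration under stated assumptions: the Fargate CPU/memory table is transcribed for Linux tasks from the AWS documentation as I know it and should be checked against the current docs, and `smallest_fargate_size` and `tasks_needed` are hypothetical helpers, not part of any AWS SDK. The task-count estimate also assumes load spreads evenly across tasks.

```python
# Supported Fargate task sizes for Linux containers (assumed; verify in the AWS docs):
# CPU units (1024 = 1 vCPU) -> allowed memory values in MiB.
FARGATE_MEMORY_OPTIONS = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def smallest_fargate_size(min_cpu_units: int, min_memory_mib: int) -> tuple[int, int]:
    """Lowest CPU size (then lowest memory for it) that covers both minimums."""
    for cpu in sorted(FARGATE_MEMORY_OPTIONS):
        if cpu < min_cpu_units:
            continue
        for memory in FARGATE_MEMORY_OPTIONS[cpu]:
            if memory >= min_memory_mib:
                return cpu, memory
    raise ValueError("no single Fargate task size covers these minimums")


def tasks_needed(concurrent_users: int, users_per_task: int, scale_out_pct: int) -> int:
    """Tasks needed so that no task runs above scale_out_pct of its capacity.

    users_per_task: users one task can serve at 100% CPU/memory utilisation.
    scale_out_pct: utilisation (in percent) at which a new task is added.
    Always returns at least 1, i.e. a minimum of one running task.
    """
    if users_per_task <= 0 or not 0 < scale_out_pct <= 100:
        raise ValueError("users_per_task must be > 0 and scale_out_pct in (0, 100]")
    # Integer ceiling division (-(-a // b)) avoids floating-point rounding.
    capacity_x100 = users_per_task * scale_out_pct
    return max(1, -(-concurrent_users * 100 // capacity_x100))


if __name__ == "__main__":
    print(smallest_fargate_size(1024, 2048))  # 1 vCPU with 2 GB is a valid size
    for pct in (65, 70, 75, 100):
        print(f"scale out at {pct}%: {tasks_needed(50, 10, pct)} tasks at peak")
```

With the numbers from the example (up to 50 users, 10 users per task), scaling out at 65% to 75% means planning for a maximum of 7 to 8 tasks instead of the 5 you would need at 100% utilisation; that difference is the headroom the strategy buys.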
- [+] Moreover, there are still limitations. Fargate does not cache container images, so a large image can noticeably increase task startup time (much more so for Windows containers) and cause disruptions. If your images are large, there are several opportunities for improvement (skip this if your images are small):
  - Reduce the size of the Docker image, if possible.
  - Use VPC endpoints for Amazon ECR.
  - Network throughput depends on the CPU allocated to the Fargate task, so more vCPU means more network bandwidth and faster image downloads.
  - Store the Docker images in the same Region as the cluster and tasks.
See the linked documentation on optimising ECS task launch time for more detail. [3]
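Two of the points above (same-Region images and VPC endpoints for ECR) are easy to check from a task definition's image URI. The sketch below assumes private ECR image URIs of the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` in commercial Regions (China Regions use a different domain and are not matched); the helper names are hypothetical. The endpoint list reflects the usual setup for pulling from private ECR through a VPC: interface endpoints for `ecr.api` and `ecr.dkr`, plus an S3 gateway endpoint because image layers are served from S3. Confirm against the ECR VPC endpoint documentation for your setup (for example, log delivery may need further endpoints).

```python
import re
from typing import Optional

# Matches private ECR URIs in commercial Regions and captures the Region.
_ECR_URI = re.compile(r"^\d{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com/")


def ecr_image_region(image_uri: str) -> Optional[str]:
    """Return the Region of a private ECR image URI, or None for other registries."""
    match = _ECR_URI.match(image_uri)
    return match.group(1) if match else None


def ecr_endpoint_services(region: str) -> list[str]:
    """VPC endpoint service names typically needed for private ECR image pulls."""
    return [
        f"com.amazonaws.{region}.ecr.api",  # interface endpoint: ECR API calls
        f"com.amazonaws.{region}.ecr.dkr",  # interface endpoint: Docker registry API
        f"com.amazonaws.{region}.s3",       # gateway endpoint: image layers in S3
    ]


def check_image(image_uri: str, task_region: str) -> list[str]:
    """Return human-readable warnings about where the image is pulled from."""
    warnings = []
    region = ecr_image_region(image_uri)
    if region is None:
        warnings.append("Image is not in a private ECR repository; review where it is pulled from.")
    elif region != task_region:
        warnings.append(f"Image is in {region} but tasks run in {task_region}; pulls are cross-Region.")
    return warnings
```

For example, `check_image("123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:1.0", "us-east-1")` returns one cross-Region warning, while the same call with `"eu-west-1"` returns an empty list. The account ID and repository name here are placeholders.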
- [+] Additionally, consider other factors. For example, if your workload is steady or predictable, EC2 capacity can give better returns than Fargate. Choosing the right scaling metric and target value can also have a significant impact on cost: use Container Insights to analyse resource utilisation, throughput and other metrics, and check whether metrics such as your ALB's "ActiveConnectionCount" are suitable for scaling. [4] I would strongly recommend reading the best practices documentation published by AWS, which gives good insight into how to approach optimising your deployment. [5]
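As a concrete starting point for metric-based scaling, the sketch below builds the request parameters for an ECS service target tracking policy through Application Auto Scaling and applies them with boto3's `register_scalable_target` and `put_scaling_policy`. The cluster and service names, the 70% CPU target, the task limits and the cooldowns are placeholder assumptions to replace with values from your own testing. `ECSServiceAverageCPUUtilization` could be swapped for `ECSServiceAverageMemoryUtilization` or `ALBRequestCountPerTarget` (which also needs a `ResourceLabel`); an ALB metric such as "ActiveConnectionCount" is not a predefined type and would need a `CustomizedMetricSpecification` instead.

```python
def target_tracking_params(cluster: str, service: str, target_pct: float = 70.0,
                           min_tasks: int = 1, max_tasks: int = 8):
    """Build (scalable_target, policy) kwargs for Application Auto Scaling.

    All defaults are illustrative placeholders, not recommendations.
    """
    if not 1 <= min_tasks <= max_tasks:
        raise ValueError("require 1 <= min_tasks <= max_tasks")
    resource_id = f"service/{cluster}/{service}"
    dimension = "ecs:service:DesiredCount"
    scalable_target = {
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }
    policy = {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_pct,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            "ScaleOutCooldown": 60,   # seconds; placeholder
            "ScaleInCooldown": 300,   # seconds; slower scale-in limits churn
        },
    }
    return scalable_target, policy


def apply_scaling(scalable_target: dict, policy: dict) -> None:
    """Register the target and policy (requires AWS credentials and permissions)."""
    import boto3  # imported here so the builder above runs without boto3 installed

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**policy)
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test the policy before anything changes in your account, and the longer scale-in cooldown reflects the earlier point that scaling too often causes disruption while targets register and deregister.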
================
[1] Cost Optimisation Checklist for Amazon ECS and AWS Fargate: https://aws.amazon.com/blogs/containers/cost-optimization-checklist-for-ecs-fargate/
[2] How Amazon ECS manages CPU and memory resources: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
[3] Optimise Amazon ECS task launch time: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-recommendations.html
[4] Optimise Amazon ECS service auto scaling: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-autoscaling.html
[5] Amazon ECS best practices: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-best-practices.html