Hello,
I have an ECS service running an NLP model for inference. The service has the following scaling policies:
ChatSpamV3ScalingPolicy:
  DeletionPolicy: "Delete"
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: !Join [ "-", [ "CPU", "at85", !Ref ChatSpamV3ServiceName ] ]
    PolicyType: "TargetTrackingScaling"
    ScalingTargetId: !Ref ChatSpamV3AppAutoScalingScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: "ECSServiceAverageCPUUtilization"
      ScaleInCooldown: 40
      ScaleOutCooldown: 60
      TargetValue: 40
ChatSpamV3ScalingRAMPolicy:
  DeletionPolicy: "Delete"
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: !Join [ "-", [ "RAM", "at", !Ref ChatSpamV3ServiceName ] ]
    PolicyType: "TargetTrackingScaling"
    ScalingTargetId: !Ref ChatSpamV3AppAutoScalingScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: "ECSServiceAverageMemoryUtilization"
      ScaleInCooldown: 300
      ScaleOutCooldown: 100
      TargetValue: 90
ChatSpamV3InternalTargetGroup:
  Type: "AWS::ElasticLoadBalancingV2::TargetGroup"
  Properties:
    HealthCheckIntervalSeconds: 40
    HealthCheckPath: "/ping"
    Port: 8080
    Protocol: "HTTP"
    HealthCheckPort: "traffic-port"
    HealthCheckProtocol: "HTTP"
    HealthCheckTimeoutSeconds: 15
    UnhealthyThresholdCount: 7
    TargetType: "ip"
    Matcher:
      HttpCode: "200"
    HealthyThresholdCount: 3
    VpcId: "vpc-ddc697ba"
    Name: !Join [ '-', [ 'ml', !Ref ChatSpamV3ServiceName ] ]
    HealthCheckEnabled: true
    TargetGroupAttributes:
      - Key: "stickiness.enabled"
        Value: "false"
      - Key: "deregistration_delay.timeout_seconds"
        Value: "30"
      - Key: "stickiness.type"
        Value: "lb_cookie"
      - Key: "stickiness.lb_cookie.duration_seconds"
        Value: "86400"
      - Key: "slow_start.duration_seconds"
        Value: "0"
      - Key: "load_balancing.algorithm.type"
        Value: "least_outstanding_requests"
As you can see, there are policies on CPU and memory, and I have also experimented with a request-count-per-target policy.
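For completeness, the request-count experiment looked roughly like the sketch below. The policy name, target value, and the resource names in the ResourceLabel are illustrative placeholders (ChatSpamV3LoadBalancer in particular is assumed), not my exact values:

```yaml
ChatSpamV3RequestCountPolicy:
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: !Join [ "-", [ "ReqCount", !Ref ChatSpamV3ServiceName ] ]
    PolicyType: "TargetTrackingScaling"
    ScalingTargetId: !Ref ChatSpamV3AppAutoScalingScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: "ALBRequestCountPerTarget"
        # ResourceLabel must be "<load-balancer-full-name>/<target-group-full-name>"
        ResourceLabel: !Join
          - "/"
          - - !GetAtt ChatSpamV3LoadBalancer.LoadBalancerFullName
            - !GetAtt ChatSpamV3InternalTargetGroup.TargetGroupFullName
      ScaleInCooldown: 300
      ScaleOutCooldown: 60
      TargetValue: 1000  # placeholder; tuned to per-task throughput
```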
Now the issue I face is that occasionally one of the tasks blows up. I am not sure why, but it does not appear to be a bug in the code; it seems load-related (perhaps a memory leak?).
Based on autoscaling I would expect the service to provision and register another task, or to scale out before the task blows up. Instead, the service never scales out, and while the first task crashes, the extra traffic shifted onto the remaining tasks causes them to crash as well.
I end up with 0 healthy tasks while I have spare capacity for up to 20 tasks that never get spun up. Does anyone know why this is the case: why the task that crashes is not replaced soon enough to prevent a cascade of failures, and why the service does not scale out before tasks start failing?
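To put a rough number on the "not replaced soon enough" part: with the health-check settings from the target group above, a dying task can keep receiving traffic for several minutes before the ALB even marks it unhealthy. A back-of-the-envelope worst case (ignoring check-timeout overlap):

```python
# Worst-case time for the ALB to flag a failing task, using the
# values from the target group definition above.
health_check_interval_s = 40  # HealthCheckIntervalSeconds
unhealthy_threshold = 7       # UnhealthyThresholdCount

# Seven consecutive failed checks, 40 seconds apart, must elapse
# before the target transitions to unhealthy.
time_to_unhealthy_s = health_check_interval_s * unhealthy_threshold
print(time_to_unhealthy_s)  # 280 seconds, i.e. over 4.5 minutes
```

So the remaining tasks absorb the failing task's traffic for that whole window, which presumably feeds the cascade.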