Issue with AWS ECS Auto-Scaling and Binpack Task Placement Strategy: Tasks Not Shifting Back After Scale-In

In AWS ECS, I use auto-scaling with a binpack task placement strategy. When the service scales out, new instances are attached to the ECS cluster and tasks are placed on them. After a later scale-in event, however, some tasks remain spread across those instances instead of shifting back onto fewer instances as I expected. How can I resolve this issue?

Asked 7 months ago, viewed 452 times
2 Answers
Accepted Answer

Hi Tharunkumar,

Please try the solution below; I hope it helps resolve your issue.

Implement an ECS Task Rebalancer:

  1. Create a Lambda Function: This function will check the task placement and stop tasks that need to be redistributed.

  2. Invoke Lambda Function Periodically: Use CloudWatch Events to trigger the Lambda function at regular intervals.

  3. CloudFormation Template: Use a CloudFormation template to create the Lambda function and set up the CloudWatch Event rule.

Lambda Function (Python)

This function lists tasks in your ECS cluster, groups them by instance, and stops tasks from under-utilized instances:


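import boto3

ecs_client = boto3.client('ecs')

def lambda_handler(event, context):
    cluster_name = 'your-cluster-name'
    service_name = 'your-service-name'

    # List all tasks of the service (the paginator follows nextToken for us)
    paginator = ecs_client.get_paginator('list_tasks')
    tasks = []
    for page in paginator.paginate(cluster=cluster_name, serviceName=service_name):
        tasks.extend(page['taskArns'])

    # Describe tasks in batches; describe_tasks accepts at most 100 ARNs per call.
    # Looping over batches also avoids calling it with an empty list, which fails.
    tasks_details = []
    for i in range(0, len(tasks), 100):
        batch = tasks[i:i + 100]
        tasks_details.extend(
            ecs_client.describe_tasks(cluster=cluster_name, tasks=batch)['tasks'])

    # Group tasks by container instance (Fargate tasks have no containerInstanceArn)
    tasks_by_instance = {}
    for task in tasks_details:
        instance_arn = task.get('containerInstanceArn')
        if instance_arn:
            tasks_by_instance.setdefault(instance_arn, []).append(task['taskArn'])

    # With fewer than two instances there is nothing to consolidate
    if len(tasks_by_instance) < 2:
        return {'statusCode': 200, 'body': 'Nothing to rebalance'}

    # Example logic: Stop tasks from under-utilized instances
    stopped = 0
    for instance_arn, task_arns in tasks_by_instance.items():
        if len(task_arns) == 1:  # Adjust this threshold based on your binpack strategy
            ecs_client.stop_task(cluster=cluster_name, task=task_arns[0],
                                 reason='Rebalancing for binpack placement')
            stopped += 1

    return {
        'statusCode': 200,
        'body': f'Rebalanced tasks: stopped {stopped}'
    }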
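
The threshold check in the example logic stops the only task on any single-task instance, even when no other instance has room for it. A slightly more careful selection could look like the sketch below. It is only an illustration: the function name `pick_tasks_to_move` and the per-instance limit `max_tasks_per_instance` are assumptions made for this example and are not part of the ECS API or the original answer.

```python
def pick_tasks_to_move(tasks_by_instance, max_tasks_per_instance):
    """Return task ARNs from the least-loaded instance, or [] if they cannot be absorbed.

    tasks_by_instance: dict mapping container instance ARN -> list of task ARNs.
    max_tasks_per_instance: hypothetical capacity limit (an assumption for this sketch).
    """
    if len(tasks_by_instance) < 2:
        return []  # No other instance to consolidate onto.

    # The least-loaded instance is the candidate to empty.
    candidate = min(tasks_by_instance, key=lambda arn: len(tasks_by_instance[arn]))
    to_move = tasks_by_instance[candidate]

    # Count the free task slots on every other instance.
    free_slots = sum(
        max(0, max_tasks_per_instance - len(arns))
        for arn, arns in tasks_by_instance.items()
        if arn != candidate
    )

    # Only move tasks when the remaining instances can hold all of them.
    return list(to_move) if 0 < len(to_move) <= free_slots else []


placement = {
    "instance-a": ["task-1", "task-2", "task-3"],
    "instance-b": ["task-4"],
}
print(pick_tasks_to_move(placement, max_tasks_per_instance=4))
```

The returned ARNs could then be passed to `stop_task` one at a time. Whether the replacement tasks actually land on the fuller instance still depends on the service's placement strategy and on the CPU and memory available there, so treat the slot count as a rough stand-in for real resource checks.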

CloudFormation Template

This template sets up the Lambda function and the CloudWatch Event rule to trigger it periodically:


AWSTemplateFormatVersion: '2010-09-09'
Resources:
  MyLambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: LambdaExecutionPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - ecs:ListTasks
                  - ecs:DescribeTasks
                  - ecs:StopTask
                Resource: '*'

  MyLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: MyRebalanceFunction
      Handler: index.lambda_handler
      Role: !GetAtt MyLambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          
          ecs_client = boto3.client('ecs')

          def lambda_handler(event, context):
              cluster_name = 'your-cluster-name'
              service_name = 'your-service-name'
              
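              # List tasks (a single call returns at most 100 task ARNs)
              tasks = ecs_client.list_tasks(cluster=cluster_name, serviceName=service_name)['taskArns']
              if not tasks:
                  # describe_tasks rejects an empty task list, so stop here
                  return {'statusCode': 200, 'body': 'No tasks found'}
              
              # Describe tasks
              tasks_details = ecs_client.describe_tasks(cluster=cluster_name, tasks=tasks)['tasks']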
              
              # Group tasks by instance
              tasks_by_instance = {}
              for task in tasks_details:
                  instance_id = task['containerInstanceArn']
                  if instance_id not in tasks_by_instance:
                      tasks_by_instance[instance_id] = []
                  tasks_by_instance[instance_id].append(task['taskArn'])
              
              # Example logic: Stop tasks from under-utilized instances
              for instance_id, task_arns in tasks_by_instance.items():
                  if len(task_arns) == 1:  # Adjust this threshold based on your binpack strategy
                      ecs_client.stop_task(cluster=cluster_name, task=task_arns[0])
              
              return {
                  'statusCode': 200,
                  'body': 'Rebalanced tasks'
              }
      Runtime: python3.12
      Timeout: 60

  CloudWatchEventRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(5 minutes)
      Targets:
        - Arn: !GetAtt MyLambdaFunction.Arn
          Id: "TargetFunctionV1"

  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref MyLambdaFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt CloudWatchEventRule.Arn

Please see the following AWS documentation for the services involved:

1. Lambda

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-permission.html

2. AWS IAM

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-role.html

3. Amazon CloudWatch Events

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-events-rule.html

4. Amazon ECS

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ListTasks.html

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_DescribeTasks.html

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_StopTask.html

EXPERT, answered 7 months ago
EXPERT, reviewed 5 months ago
EXPERT, reviewed 7 months ago

Are you using a capacity provider for the ASG? If so, is the Managed Termination Protection feature enabled? It prevents instances from being scaled in as long as any replica tasks are running on them.
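
One way to check this is to read the capacity provider settings with boto3. The sketch below assumes the response shape returned by `describe_capacity_providers`; the helper names `termination_protection_report` and `fetch_report` are made up for this example.

```python
def termination_protection_report(response):
    """Map capacity provider name -> (managedTerminationProtection, managedDraining)."""
    report = {}
    for provider in response.get("capacityProviders", []):
        # Fargate capacity providers have no autoScalingGroupProvider block.
        asg = provider.get("autoScalingGroupProvider", {})
        report[provider["name"]] = (
            asg.get("managedTerminationProtection", "UNKNOWN"),
            asg.get("managedDraining", "UNKNOWN"),
        )
    return report


def fetch_report(provider_names):
    """Call ECS and summarize the named capacity providers (needs AWS credentials)."""
    import boto3  # Imported lazily so the helper above works without AWS access.

    ecs = boto3.client("ecs")
    response = ecs.describe_capacity_providers(capacityProviders=provider_names)
    return termination_protection_report(response)


# Sample response fragment for illustration only; the name is a placeholder.
sample = {
    "capacityProviders": [
        {
            "name": "your-capacity-provider",
            "autoScalingGroupProvider": {
                "managedTerminationProtection": "ENABLED",
                "managedDraining": "DISABLED",
            },
        }
    ]
}
print(termination_protection_report(sample))
```

If the first value is `ENABLED`, instances that still run replica tasks are protected from scale-in, which would match the behaviour described in the question.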

If you want tasks to be killed and replaced on other instances so that they binpack better, disable Managed Termination Protection, enable Managed Draining instead, and set the target value to 100.
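
As a rough sketch, that change could be applied with boto3's `update_capacity_provider`. This assumes that "target value" refers to the managed scaling `targetCapacity` setting, which is my reading rather than something stated above. The helper names `build_update_request` and `apply_update` and the provider name are placeholders.

```python
def build_update_request(provider_name, target_capacity=100):
    """Build keyword arguments for ecs.update_capacity_provider."""
    return {
        "name": provider_name,
        "autoScalingGroupProvider": {
            # Assumption: "target value" means managed scaling targetCapacity.
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            "managedTerminationProtection": "DISABLED",
            "managedDraining": "ENABLED",
        },
    }


def apply_update(provider_name):
    """Send the update to ECS (needs AWS credentials and permissions)."""
    import boto3  # Imported lazily so the request can be built without AWS access.

    ecs = boto3.client("ecs")
    return ecs.update_capacity_provider(**build_update_request(provider_name))


request = build_update_request("your-capacity-provider")
print(request["autoScalingGroupProvider"]["managedDraining"])
```

Calling `apply_update("your-capacity-provider")` would send the request. Instances that already carry scale-in protection in the Auto Scaling group may need it removed separately before they can be terminated.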

AWS EXPERT, answered 7 months ago
  • I did this, but it is not working.
