How do I configure and monitor the Amazon ECS deployment circuit breaker?

I want to automatically roll back my Amazon Elastic Container Service (Amazon ECS) deployment and receive a notification when the deployment fails.

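Short description

To use the Amazon ECS deployment circuit breaker to automatically roll back and monitor deployments, complete the following steps:

  1. Configure the deployment circuit breaker.
  2. Configure Amazon EventBridge to monitor the deployment circuit breaker.
  3. Test a failed deployment scenario.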

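Resolution

**Note:** If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Configure the deployment circuit breaker

Complete the following steps:

  1. Create a JSON file with a task definition that's similar to the following example: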
    {
        "family": "my-task",
        "containerDefinitions": [
            {
                "name": "sample-container",
                "image": "nginx:alpine",
                "essential": true
            }
        ],
        "executionRoleArn": "arn:aws:iam::123456789876:role/ecsTaskExecutionRole",
        "networkMode": "awsvpc",
        "requiresCompatibilities": [
            "FARGATE"
        ],
        "cpu": "256",
        "memory": "512"
    }
    **Note:** Replace 123456789876 with your AWS account ID. If you don't have ecsTaskExecutionRole, then create a task execution role.
  2. To register the task definition, run the following register-task-definition command:
    aws ecs register-task-definition \
        --cli-input-json file://taskdef-success.json
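    **Note:** Replace taskdef-success.json with the name of your task definition JSON file.
  3. To create an Amazon ECS service with the deployment circuit breaker and rollback turned on, run the following create-service command: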
    aws ecs create-service \
         --cluster default \
         --service-name my-sample-service \
         --deployment-controller type=ECS \
         --desired-count 1 \
         --deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}" \
         --task-definition my-task:1 \
         --launch-type FARGATE \
         --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345],assignPublicIp=ENABLED}"
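    **Note:** Replace subnet-12345 with your subnet and sg-12345 with your security group. You must set deployment-controller to type=ECS because the deployment circuit breaker works only with rolling update deployments.
    If you don't have a default cluster, then run the following create-cluster command to create a cluster: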
    aws ecs create-cluster \
        --cluster-name example-cluster
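    **Note:** Replace example-cluster with your cluster name, and use that name instead of default for --cluster in the other commands in this article.
  4. To confirm that the Amazon ECS service reached a steady state, run the following describe-services command. The example uses jq to print only the event messages: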
    aws ecs describe-services \
        --cluster default \
        --services my-sample-service | jq '.services[0].events[] | {message}'
    You receive an output that's similar to the following example:
    {
      "message": "(service my-sample-service) has reached a steady state."
    }
    {
      "message": "(service my-sample-service) (deployment ecs-svc/1234567890123456789) deployment completed."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task 2918eb15dd0f4d42affc2a3a07818abf)."
    }
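
The check in step 4 can also be scripted. The following is a minimal Python sketch, not part of the original procedure, that reads the JSON printed by describe-services and reports whether the circuit breaker is turned on and whether the newest event shows a steady state. The function names circuit_breaker_settings and is_steady are hypothetical, and the sample data is a hand-trimmed stand-in for real output. The sketch assumes that events are listed newest first, as in the example output above.

```python
import json


def circuit_breaker_settings(service):
    """Return (enable, rollback) from a service description.

    Missing keys are treated as False.
    """
    breaker = service.get("deploymentConfiguration", {}).get(
        "deploymentCircuitBreaker", {}
    )
    return bool(breaker.get("enable", False)), bool(breaker.get("rollback", False))


def is_steady(service):
    """Return True when the newest event reports a steady state."""
    events = service.get("events", [])
    return bool(events) and events[0]["message"].endswith(
        "has reached a steady state."
    )


# Hand-trimmed stand-in for `aws ecs describe-services` output (assumption:
# real output contains many more fields than shown here).
sample_output = json.loads("""
{
  "services": [
    {
      "serviceName": "my-sample-service",
      "deploymentConfiguration": {
        "deploymentCircuitBreaker": {"enable": true, "rollback": true}
      },
      "events": [
        {"message": "(service my-sample-service) has reached a steady state."},
        {"message": "(service my-sample-service) (deployment ecs-svc/1234567890123456789) deployment completed."}
      ]
    }
  ]
}
""")

service = sample_output["services"][0]
print("circuit breaker (enable, rollback):", circuit_breaker_settings(service))
print("steady state:", is_steady(service))
```

To try it against a real service, one option is to save the CLI output to a file (for example, by redirecting the describe-services command without the jq filter) and load that file with json.load in place of the sample string.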

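Configure EventBridge to monitor the deployment circuit breaker

Complete the following steps:

  1. To create an Amazon Simple Notification Service (Amazon SNS) topic to use as the EventBridge rule target, run the following create-topic command: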

    aws sns create-topic \
        --name my-topic

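    **Note:** Replace my-topic with your SNS topic name.

  2. To update the topic's access policy to allow the required API calls, including sns:Publish from EventBridge, run the following set-topic-attributes command: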

    aws sns set-topic-attributes \
        --topic-arn arn:aws:sns:eu-west-1:123456789876:my-topic \
        --attribute-name Policy \
        --attribute-value '{
      "Version": "2008-10-17",
      "Id": "my_topic_policy",
      "Statement": [
        {
          "Sid": "my_topic_default",
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": [
            "SNS:GetTopicAttributes",
            "SNS:SetTopicAttributes",
            "SNS:AddPermission",
            "SNS:RemovePermission",
            "SNS:DeleteTopic",
            "SNS:Subscribe",
            "SNS:ListSubscriptionsByTopic",
            "SNS:Publish"
          ],
          "Resource": "arn:aws:sns:eu-west-1:123456789876:my-topic",
          "Condition": {
            "StringEquals": {
              "AWS:SourceOwner": "123456789876"
            }
          }
        },
        {
          "Sid": "my_topic_for_sns_Publish",
          "Effect": "Allow",
          "Principal": {
            "Service": "events.amazonaws.com"
          },
          "Action": "sns:Publish",
          "Resource": "arn:aws:sns:eu-west-1:123456789876:my-topic"
        }
      ]
    }'

    **Note:** Replace eu-west-1 with your AWS Region, 123456789876 with your account ID, and my-topic with your topic name.

  3. To subscribe an email address to the SNS topic, run the following subscribe command:

    aws sns subscribe \
        --topic-arn arn:aws:sns:eu-west-1:123456789876:my-topic \
        --protocol email \
        --notification-endpoint example@example.com

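    **Note:** Replace eu-west-1 with your Region, 123456789876 with your account ID, my-topic with your topic name, and example@example.com with your email address.

  4. In the subscription confirmation email that you receive, choose Confirm subscription.

  5. To create an EventBridge rule for service deployment failure events, run the following put-rule command: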

    aws events put-rule \
      --name "EcsServiceDeploymentFailed" \
      --event-pattern "{\"source\":[\"aws.ecs\"],\"detail-type\":[\"ECS Deployment State Change\"],\"detail\":{\"eventName\":[\"SERVICE_DEPLOYMENT_FAILED\"]}}"

  6. To add the SNS topic as the EventBridge rule target, run the following put-targets command:

    aws events put-targets \
        --rule EcsServiceDeploymentFailed --targets "Id"="1","Arn"="arn:aws:sns:eu-west-1:123456789876:my-topic"

    **Note:** Replace eu-west-1 with your Region, 123456789876 with your account ID, and my-topic with your topic name.
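
To see why the rule from step 5 matches only failed deployments, the following Python sketch applies a simplified version of event pattern matching to two sample events. It is an illustration only: matches is a hypothetical helper that handles just exact-value lists and nested objects, which is all that this pattern uses. EventBridge supports further operators that are not modeled here, and the sample events are trimmed to the fields that the pattern inspects.

```python
import json

# The same pattern that the put-rule command passes to --event-pattern.
PATTERN = json.loads(
    '{"source":["aws.ecs"],'
    '"detail-type":["ECS Deployment State Change"],'
    '"detail":{"eventName":["SERVICE_DEPLOYMENT_FAILED"]}}'
)


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event.

    A list means "the event value must equal one of these values".
    A dict means "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed sample events (assumption: real events carry more fields).
failed_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventType": "ERROR", "eventName": "SERVICE_DEPLOYMENT_FAILED"},
}
completed_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventType": "INFO", "eventName": "SERVICE_DEPLOYMENT_COMPLETED"},
}

print("failed deployment matches:", matches(PATTERN, failed_event))
print("completed deployment matches:", matches(PATTERN, completed_event))
```

For a check against EventBridge itself rather than this approximation, the aws events test-event-pattern command accepts the same pattern together with a full sample event.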

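Test a failed deployment scenario

Complete the following steps:

  1. Create a JSON file with a task definition that contains an incorrect image tag, similar to the following example: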

    {
        "family": "my-task",
        "containerDefinitions": [
            {
                "name": "sample-container",
                "image": "nginx:wrong-image-tag",
                "essential": true
            }
        ],
        "executionRoleArn": "arn:aws:iam::123456789876:role/ecsTaskExecutionRole",
        "networkMode": "awsvpc",
        "requiresCompatibilities": [
            "FARGATE"
        ],
        "cpu": "256",
        "memory": "512"
    }

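    **Note:** Replace sample-container with your container name, nginx:wrong-image-tag with an image tag that doesn't exist, and 123456789876 with your account ID. The incorrect image tag causes the deployment to fail.

  2. To register the task definition, run the following register-task-definition command: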

    aws ecs register-task-definition --cli-input-json file://taskdef-failure.json

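    **Note:** Replace taskdef-failure.json with the name of your task definition JSON file.

  3. To update the service with the new task definition and start a new deployment, run the following update-service command: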

    aws ecs update-service --cluster default --service my-sample-service --task-definition my-task:2

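    **Note:** Replace my-sample-service with your service name and my-task:2 with your task definition family and revision. The new deployment fails because the tasks can't pull the image. After the circuit breaker marks the deployment as failed, the EventBridge rule sends the event to the SNS topic, and you receive an email notification that's similar to the following example: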

    {
        "version": "0",
        "id": "12345abc-2f7c-f86a-e544-a69218eb1446",
        "detail-type": "ECS Deployment State Change",
        "source": "aws.ecs",
        "account": "123456789876",
        "time": "2024-11-19T17:42:41Z",
        "region": "eu-west-1",
        "resources": [
            "arn:aws:ecs:eu-west-1:123456789876:service/default/my-sample-service"
        ],
        "detail": {
            "eventType": "ERROR",
            "eventName": "SERVICE_DEPLOYMENT_FAILED",
            "clusterArn": "arn:aws:ecs:eu-west-1:123456789876:cluster/default",
            "deploymentId": "ecs-svc/9876543210987654321",
            "updatedAt": "2024-11-19T17:42:40.73Z",
            "reason": "ECS deployment circuit breaker: tasks failed to start."
        }
    }

  4. To verify that the Amazon ECS service rolled back, run the following describe-services command:

    aws ecs describe-services \
        --cluster default \
        --services my-sample-service | jq '.services[0].events[] | {message}'

    You receive an output that's similar to the following example:

    {
      "message": "(service my-sample-service) has reached a steady state."
    }
    {
      "message": "(service my-sample-service) (deployment ecs-svc/1234567890123456789) deployment completed."
    }
    {
      "message": "(service my-sample-service) rolling back to deployment ecs-svc/1234567890123456789."
    }
    {
      "message": "(service my-sample-service) (deployment ecs-svc/9876543210987654321) deployment failed: tasks failed to start."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task b808c60616134ec1ac0c656a2bff1ef2)."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task 846c9aebd9224c2b832a38942cae5ea6)."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task 7143d03444574f2db2b567d75df3fe72)."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task 9a6a399770d940a2b442560c02a6a4c0)."
    }
    {
      "message": "(service my-sample-service) has reached a steady state."
    }
    {
      "message": "(service my-sample-service) (deployment ecs-svc/1234567890123456789) deployment completed."
    }
    {
      "message": "(service my-sample-service) has started 1 tasks: (task 2918eb15dd0f4d42affc2a3a07818abf)."
    }
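
The event messages from step 4 can also be scanned programmatically for evidence of a rollback. The following Python sketch is an illustration under stated assumptions rather than an official method: find_rollback is a hypothetical helper, and the two phrases it looks for are taken from the example output above, so the patterns may need adjusting if the service emits different wording.

```python
import re

# Phrases copied from the example output; treat the exact wording as an assumption.
FAILED = re.compile(r"\(deployment (ecs-svc/\d+)\) deployment failed")
ROLLBACK = re.compile(r"rolling back to deployment (ecs-svc/\d+)")


def find_rollback(messages):
    """Return (failed_deployment_id, rollback_target_id).

    Each element is None when no matching message is found. Only the first
    match of each kind is kept, which is the newest one when messages are
    ordered newest first.
    """
    failed = None
    target = None
    for message in messages:
        if failed is None:
            found = FAILED.search(message)
            if found:
                failed = found.group(1)
        if target is None:
            found = ROLLBACK.search(message)
            if found:
                target = found.group(1)
    return failed, target


messages = [
    "(service my-sample-service) has reached a steady state.",
    "(service my-sample-service) (deployment ecs-svc/1234567890123456789) deployment completed.",
    "(service my-sample-service) rolling back to deployment ecs-svc/1234567890123456789.",
    "(service my-sample-service) (deployment ecs-svc/9876543210987654321) deployment failed: tasks failed to start.",
]

failed_id, target_id = find_rollback(messages)
print("failed deployment:", failed_id)
print("rolled back to:", target_id)
```

In the example data, the failed deployment ID differs from the rollback target, and the rollback target is the deployment that later reports "deployment completed", which is the sequence that the describe-services output above shows.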

Related information

How the Amazon ECS deployment circuit breaker detects failures

Announcing Amazon ECS deployment circuit breaker

AWS OFFICIAL Updated 1 year ago