ECS Tasks fail ELB Health Check

0

Hi,

I used the nodejs aws cdk to build an ECS service that runs a dockerized nodejs express app. When I test the docker container and code locally I am able to ping the health check just fine.

After deploying the infrastructure code, all the pieces seem to be there. However, my tasks keep deregistering because they fail the health check. The error that they show is:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east-1:916847193903:targetgroup/AskGen-AskGe-JT3TRNKU8ROF/d048425de709efce)

I can see in the task logs that the requests are being received and are processed correctly, but the tasks continue to fail the health checks.

Any help/insights would be greatly appreciated.

Here is the cdk code:

const vpc = new aws_ec2.Vpc(construct, APIVpc-${env}`, {
    maxAzs: 1 // Default is all AZs in region
  });

  const cluster = new ecs.Cluster(construct, `APIFargateCluster-${env}`, {
    clusterName: `APIFargateCluster-${env}`,
    containerInsights: true,
    vpc
  });

  // Create ECR - This will hold all the docker images
  const repository = new ecr.Repository(construct, `ECRRepo-${env}`, {
    repositoryName: `_ecr_repo_${env}`,
    removalPolicy: RemovalPolicy.DESTROY
  });

  const hostedZone = aws_route53.HostedZone.fromHostedZoneAttributes(
    construct,
    'APIHostedZone',
    {
      hostedZoneId: '....', 
      zoneName: '.....
    }
  );

  const certificate = new aws_certificatemanager.Certificate(construct, `Cert-${env}`, {
    domainName: 'api.com',
    subjectAlternativeNames: ['*.api.com'],
    validation: aws_certificatemanager.CertificateValidation.fromDns(hostedZone) // Records must be added manually,
  });

  const ecrPolicy = new aws_iam.Policy(construct, `ECRPolicy-${env}`, {
    policyName: 'ECRPolicyName',
    statements: [
      new aws_iam.PolicyStatement({
        actions: [
          'ecr:GetAuthorizationToken',
          'ecr:BatchCheckLayerAvailability',
          'ecr:GetDownloadUrlForLayer',
          'ecr:GetRepositoryPolicy',
          'ecr:ListImages',
          'ecr:DescribeRepositories',
          'ecr:DescribeImages',
          'ecr:BatchGetImage',
          'logs:*',
          'secretsmanager:*',
          'sqs:*'
        ],
        resources: ['*'] // You can restrict resources if needed
      })
    ]
  });

  const ecsTaskRole = new aws_iam.Role(construct, `ECSTaskRole-${env}`, {
    roleName: `TaskDefinitionRole-${env}`,
    assumedBy: new aws_iam.ServicePrincipal('ecs-tasks.amazonaws.com')
  });

  ecsTaskRole.attachInlinePolicy(ecrPolicy);

  // Create a load-balanced Fargate service and make it public
  const service = new ecsPatterns.ApplicationLoadBalancedFargateService(
    construct,
    `APIFargateService-${env}`,
    {
      serviceName: `APIService-${env}`,
      cluster: cluster, // Required
      //redirectHTTP: true,
      certificate: certificate,
      cpu: 256, // Default is 256
      desiredCount: 1, // Default is 1
      circuitBreaker: {
        rollback: true
      },
      loadBalancerName: `APILoanBalancer-${env}`,
      domainName:  '....',
      domainZone: hostedZone,
      taskImageOptions: {
        containerName: `ApiContainer-${env}`,
        image: ecs.ContainerImage.fromRegistry(repository.repositoryUri),
        enableLogging: true,
        environment: {
          ...envVariables
        },
        taskRole: ecsTaskRole,
        executionRole: ecsTaskRole
      },
      memoryLimitMiB: 512, // Default is 512
      publicLoadBalancer: true // Default is true,
    }
  );

  service.targetGroup.configureHealthCheck({
    path: '/health-check'
  });

  return service;
1 Answer
1
Accepted Answer

Hi Brennan,

This issue usually seems to happen when the ELB Health Check fails even before the container is up and running. This duration is configured with the property "Health Check Grace Period". [1]

I would suggest you consider increasing the grace period from the default value. In CDK, for Health Check, there is a parameter called "startPeriod". [2]

References:

[1] https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ecs-adds-elb-health-check-grace-period/

[2] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.HealthCheck.html

Please let me know if this resolves the issue.

Thanks,

Atul

profile picture
answered a year ago
profile pictureAWS
EXPERT
reviewed a year ago
  • This was the main issue! Thank you very much. Was able to solve it. I also had to update my Dockerfile so that the startup did not take so long.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions