Deploying an accessible RabbitMQ container to Fargate


I have a Python project that requires a Celery Beat service and I want to use RabbitMQ as the broker. I want to put this all in an ECS Cluster and use Fargate, and hopefully minimize my security risk. I'm using CDK and have the following configuration:

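# Imports assume CDK v2 (aws-cdk-lib); the *_NAME constants are defined elsewhere.
import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecr as ecr,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_elasticloadbalancingv2 as elbv2,
    aws_iam as iam,
)
from constructs import Construct

class Cluster(cdk.NestedStack):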
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        vpc = ec2.Vpc(
            self,
            VPC_NAME,
            max_azs=2,
            enable_dns_hostnames=True,
            enable_dns_support=True,
        )
        sg = ec2.SecurityGroup(
            self,
            SECURITY_GROUP_NAME,
            vpc=vpc,
        )
        sg.connections.allow_from_any_ipv4(
            ec2.Port.tcp(5672),
        )
        cluster = ecs.Cluster(
            self,
            CLUSTER_NAME,
            vpc=vpc,
            cluster_name=CLUSTER_NAME,
            container_insights=True,
        )

        role = iam.Role(
            self,
            ROLE_NAME,
            assumed_by=iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
            managed_policies=[...],
        )

        beat_repository = ecr.Repository(
            self,
            BEAT_IMAGE_NAME,
            repository_name=BEAT_IMAGE_NAME,
        )
        beat_task_definition = ecs.FargateTaskDefinition(
            self,
            BEAT_TASK_DEFINITION_NAME,
            cpu=1024,
            memory_limit_mib=2048,
            family=BEAT_TASK_DEFINITION_NAME,
            task_role=role,
        )
        beat_task_definition.add_container(
            BEAT_CONTAINER_NAME,
            image=ecs.ContainerImage.from_ecr_repository(beat_repository),
            command=...,
            logging=ecs.LogDrivers.aws_logs(stream_prefix="ecs"),
            port_mappings=[ecs.PortMapping(container_port=8000)],
            # health_check=TODO
        )
        beat_service = ecs.FargateService(
            self,
            BEAT_SERVICE_NAME,
            cluster=cluster,
            task_definition=beat_task_definition,
            service_name=BEAT_SERVICE_NAME,
            security_groups=[sg],
        )

        rabbit_repository = ecr.Repository(
            self,
            RABBIT_IMAGE_NAME,
            repository_name=RABBIT_IMAGE_NAME,
        )
        rabbit_task_definition = ecs.FargateTaskDefinition(
            self,
            RABBIT_TASK_DEFINITION_NAME,
            cpu=1024,
            memory_limit_mib=2048,
            family=RABBIT_TASK_DEFINITION_NAME,
            task_role=role,
        )
        rabbit_task_definition.add_container(
            RABBIT_CONTAINER_NAME,
            image=ecs.ContainerImage.from_ecr_repository(rabbit_repository),
            logging=ecs.LogDrivers.aws_logs(stream_prefix="ecs"),
            port_mappings=[ecs.PortMapping(container_port=5672)],
            health_check=ecs.HealthCheck(
                command=["CMD-SHELL", "rabbitmq-diagnostics -q ping || exit 1"],
            ),
        )

        rabbit_service = ecs_patterns.NetworkLoadBalancedFargateService(
            self,
            RABBIT_SERVICE_NAME,
            cluster=cluster,
            task_definition=rabbit_task_definition,
            service_name=RABBIT_SERVICE_NAME,
            listener_port=5672,
            public_load_balancer=True,
            assign_public_ip=True,
        )
        rabbit_service.target_group.configure_health_check(
            protocol=elbv2.Protocol.TCP,
            port="5672",
        )

The Dockerfile for the RabbitMQ container exposes port 5672 and runs rabbitmq-server. I have a few questions:

  • Is there a way for me to access the containers in the rabbit_service from containers in the beat_service WITHOUT exposing my RabbitMQ to the internet? Can I shrink the security group ingress rule?
  • The RabbitMQ service deploys fine and the containers come up in a healthy state, but they are then killed, and in the console I see:
    Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:...
    
    Why would the health checks for the target group fail? As far as I can tell, they are configured to ping TCP:5672.

Please let me know if there are optimizations I can make. Thanks!
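
For context on the second question: a TCP health check on a Network Load Balancer target group passes when a TCP connection to the target port can be established. The sketch below performs an equivalent probe and could be run from another task or host inside the VPC to see whether port 5672 is reachable at all. The function name `tcp_port_open` is made up for illustration, and the host you pass in (for example, the private IP of a RabbitMQ task) is an assumption about your setup.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: it only checks that the TCP handshake completes,
    it does not speak AMQP or verify that RabbitMQ itself is healthy.
    """
    try:
        # create_connection resolves the host and opens the TCP connection;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

If `tcp_port_open(task_ip, 5672)` returns False from inside the VPC while the container's own health check passes, the problem is more likely network filtering (for example, a security group) than RabbitMQ itself.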

  • I'll leave this as a comment because it is not a full answer; it is more guidance, or a direction to follow in your research.

    For the first question, there are two alternatives,
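
One possible direction for both questions, as a sketch rather than a verified fix. In the code from the question, the security group `sg` that opens port 5672 is attached to `beat_service`, which does not receive AMQP traffic, while `NetworkLoadBalancedFargateService` appears to create its own security group for the RabbitMQ tasks with no ingress rule for 5672. That would explain the failing target group health checks. The helper below keeps the load balancer internal and admits port 5672 only from inside the VPC. The function name `add_internal_rabbit_service` and its parameters are hypothetical, and it assumes CDK v2 and a VPC with private subnets that have outbound access (the default for `ec2.Vpc`).

```python
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct

AMQP_PORT = 5672


def add_internal_rabbit_service(
    scope: Construct,
    construct_id: str,
    *,
    vpc: ec2.IVpc,
    cluster: ecs.ICluster,
    task_definition: ecs.FargateTaskDefinition,
) -> ecs_patterns.NetworkLoadBalancedFargateService:
    """Hypothetical helper: RabbitMQ behind an internal NLB, not reachable from the internet."""
    rabbit_service = ecs_patterns.NetworkLoadBalancedFargateService(
        scope,
        construct_id,
        cluster=cluster,
        task_definition=task_definition,
        listener_port=AMQP_PORT,
        public_load_balancer=False,  # internal NLB: only resolvable and reachable inside the VPC
        assign_public_ip=False,      # assumption: tasks run in private subnets
    )

    # Assumption: NLB health checks and forwarded traffic reach the tasks from
    # addresses inside the VPC, so the tasks' security group must admit the AMQP
    # port from the VPC CIDR instead of from any IPv4 address.
    rabbit_service.service.connections.allow_from(
        ec2.Peer.ipv4(vpc.vpc_cidr_block),
        ec2.Port.tcp(AMQP_PORT),
        "AMQP from inside the VPC, including NLB health checks",
    )
    return rabbit_service
```

Inside `Cluster.__init__` this could replace the original `rabbit_service` block, called as `add_internal_rabbit_service(self, RABBIT_SERVICE_NAME, vpc=vpc, cluster=cluster, task_definition=rabbit_task_definition)`. The Celery Beat container would then use `rabbit_service.load_balancer.load_balancer_dns_name` as the broker host, and the `allow_from_any_ipv4` rule on `sg` could probably be dropped, since Beat only makes outbound connections.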
