Deploying an accessible RabbitMQ container to Fargate

I have a Python project that requires a Celery Beat service and I want to use RabbitMQ as the broker. I want to put this all in an ECS Cluster and use Fargate, and hopefully minimize my security risk. I'm using CDK and have the following configuration:

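# Imports assume AWS CDK v2 (aws-cdk-lib); the *_NAME constants are assumed to be defined elsewhere.
import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecr as ecr,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_elasticloadbalancingv2 as elbv2,
    aws_iam as iam,
)
from constructs import Construct


class Cluster(cdk.NestedStack):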
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        vpc = ec2.Vpc(
            self,
            VPC_NAME,
            max_azs=2,
            enable_dns_hostnames=True,
            enable_dns_support=True,
        )
        sg = ec2.SecurityGroup(
            self,
            SECURITY_GROUP_NAME,
            vpc=vpc,
        )
        sg.connections.allow_from_any_ipv4(
            ec2.Port.tcp(5672),
        )
        cluster = ecs.Cluster(
            self,
            CLUSTER_NAME,
            vpc=vpc,
            cluster_name=CLUSTER_NAME,
            container_insights=True,
        )

        role = iam.Role(
            self,
            ROLE_NAME,
            assumed_by=iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
            managed_policies=[...],
        )

        beat_repository = ecr.Repository(
            self,
            BEAT_IMAGE_NAME,
            repository_name=BEAT_IMAGE_NAME,
        )
        beat_task_definition = ecs.FargateTaskDefinition(
            self,
            BEAT_TASK_DEFINITION_NAME,
            cpu=1024,
            memory_limit_mib=2048,
            family=BEAT_TASK_DEFINITION_NAME,
            task_role=role,
        )
        beat_task_definition.add_container(
            BEAT_CONTAINER_NAME,
            image=ecs.ContainerImage.from_ecr_repository(beat_repository),
            command=...,
            logging=ecs.LogDrivers.aws_logs(stream_prefix="ecs"),
            port_mappings=[ecs.PortMapping(container_port=8000)],
            # health_check=TODO
        )
        beat_service = ecs.FargateService(
            self,
            BEAT_SERVICE_NAME,
            cluster=cluster,
            task_definition=beat_task_definition,
            service_name=BEAT_SERVICE_NAME,
            security_groups=[sg],
        )

        rabbit_repository = ecr.Repository(
            self,
            RABBIT_IMAGE_NAME,
            repository_name=RABBIT_IMAGE_NAME,
        )
        rabbit_task_definition = ecs.FargateTaskDefinition(
            self,
            RABBIT_TASK_DEFINITION_NAME,
            cpu=1024,
            memory_limit_mib=2048,
            family=RABBIT_TASK_DEFINITION_NAME,
            task_role=role,
        )
        rabbit_task_definition.add_container(
            RABBIT_CONTAINER_NAME,
            image=ecs.ContainerImage.from_ecr_repository(rabbit_repository),
            logging=ecs.LogDrivers.aws_logs(stream_prefix="ecs"),
            port_mappings=[ecs.PortMapping(container_port=5672)],
            health_check=ecs.HealthCheck(
                command=["CMD-SHELL", "rabbitmq-diagnostics -q ping || exit 1"],
            ),
        )

        rabbit_service = ecs_patterns.NetworkLoadBalancedFargateService(
            self,
            RABBIT_SERVICE_NAME,
            cluster=cluster,
            task_definition=rabbit_task_definition,
            service_name=RABBIT_SERVICE_NAME,
            listener_port=5672,
            public_load_balancer=True,
            assign_public_ip=True,
        )
        rabbit_service.target_group.configure_health_check(
            protocol=elbv2.Protocol.TCP,
            port="5672",
        )

The Dockerfile for the RabbitMQ container exposes port 5672 and runs rabbitmq-server. I have a few questions:

  • Is there a way for me to access the containers in the rabbit_service from containers in the beat_service WITHOUT exposing my RabbitMQ to the internet? Can I shrink the security group ingress rule?
  • The RabbitMQ service deploys fine and the containers come up in a healthy state. However, the containers then get killed, and in the console I see:
    Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:...
    
    Why would the health checks for the target group fail? As far as I can tell, they are configured to ping TCP:5672.
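
For context, a TCP health check of this kind passes only if a TCP connection to the target port can be opened, so the same probe can be reproduced by hand from another host in the VPC. A minimal sketch using only the Python standard library; the helper name `tcp_port_open` and the default timeout are illustrative assumptions, not part of the CDK stack above:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper: it only checks that the TCP handshake completes
    and does not speak AMQP to the broker.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `tcp_port_open(<private IP of the RabbitMQ task>, 5672)` from a task in the same VPC would show whether the port is reachable at the network level. A `False` result would suggest looking at security group or routing rules before suspecting RabbitMQ itself.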

Please let me know if there are optimizations I can make. Thanks!

  • I'll leave this as a comment because it is not a full answer; it is more guidance, or a direction to follow in your research.

    For the first question, there are two alternatives,
