How do I connect Amazon MQ to AWS EMR Serverless?

0

I have an EMR Serverless application and I am submitting a Spark job to it from a Python script. I have packaged the script and all of its dependencies and uploaded them to an S3 bucket. The Spark job runs, and the dependencies (for example, pika) are imported correctly. However, when I try to connect to RabbitMQ via pika, I get the following error:

 Traceback (most recent call last):
  File "/tmp/spark-f27fd95b-3ec7-46b9-93f0-d9411a0e4a52/test_spark.py", line 108, in rabbit_mq_connection
    rabbit_mq_conn= pika.BlockingConnection(parameters)
  File "/home/hadoop/pyspark_deps/lib/python3.11/site-packages/pika/adapters/blocking_connection.py", line 360, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
  File "/home/hadoop/pyspark_deps/lib/python3.11/site-packages/pika/adapters/blocking_connection.py", line 451, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
pika.exceptions.AMQPConnectionError

I have checked the following:

The EMR Serverless execution role has full Amazon MQ access
RabbitMQ is publicly accessible
I am able to connect to the RabbitMQ instance from my Lambda function

Are there any permissions missing?

Tushar
asked a month ago, 445 views
2 Answers
2

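The error pika.exceptions.AMQPConnectionError means that pika could not establish a connection to the RabbitMQ broker. Because the broker is reachable from your Lambda function, the cause is more likely network-related than permission-related. Here are a few potential reasons and solutions: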

  1. Network Connectivity: Ensure that the EMR Serverless application has network connectivity to the RabbitMQ instance. Even though the RabbitMQ instance is publicly accessible, network restrictions or security group rules might prevent the EMR Serverless application from reaching it.
  2. RabbitMQ Broker Configuration: Verify that the RabbitMQ broker is configured to accept connections from the IP address or network range that the EMR Serverless application connects from. You may need to update the broker's configuration to allow those connections.
  3. Networking Mode: In EMR Serverless, the application's network configuration determines which outbound connections are possible. An application without VPC configuration can typically reach only certain AWS service endpoints in the same Region, so reaching a public RabbitMQ endpoint generally requires attaching the application to a VPC whose subnets have a route to the internet (for example, private subnets behind a NAT gateway). Ensure that the networking mode is configured to allow outbound connections to the RabbitMQ instance.
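
To separate a network problem from a credentials or protocol problem, it can help to test plain TCP reachability from inside the Spark driver before calling pika. The following is a minimal sketch using only the Python standard library; the function name can_reach is hypothetical, and the host shown in the usage comment is a placeholder, not a real endpoint.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage inside the Spark job, before pika.BlockingConnection(...):
#   if not can_reach("<your-broker-endpoint>", 5671):
#       raise RuntimeError("RabbitMQ broker is not reachable from this job")
```

Amazon MQ for RabbitMQ brokers accept AMQP over TLS on port 5671. If this check fails inside the job but succeeds from Lambda, the difference is most likely in the network path rather than in IAM permissions.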

You can find more information about configuring the networking mode in the EMR Serverless documentation: Configure networking for an EMR Serverless application
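
As a rough illustration of the networking setup, the sketch below attaches subnets and security groups to an existing application with boto3. The helper names, subnet IDs, and security group IDs are hypothetical. It assumes the application is in a state that allows updates (for example, stopped) and that the chosen subnets have a route to the broker.

```python
def build_network_configuration(subnet_ids, security_group_ids):
    """Build the networkConfiguration dict passed to EMR Serverless."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        "subnetIds": list(subnet_ids),
        "securityGroupIds": list(security_group_ids),
    }


def attach_vpc(application_id, subnet_ids, security_group_ids, region_name=None):
    """Update an EMR Serverless application so its workers run in the given subnets."""
    import boto3  # imported lazily so the helper above can be used without boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    return client.update_application(
        applicationId=application_id,
        networkConfiguration=build_network_configuration(
            subnet_ids, security_group_ids
        ),
    )


# Hypothetical usage (placeholder IDs):
#   attach_vpc("appid", ["subnet-aaaa", "subnet-bbbb"], ["sg-cccc"])
```

The security group attached here also needs an outbound rule that allows traffic to the broker's port.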

AWS
answered a month ago
EXPERT
reviewed a month ago
  • Thanks, your suggestion worked, but now I am facing a strange issue. When I submit a job via my Python script or the AWS CLI, the job is submitted but fails. When I clone the same job from the EMR Studio dashboard, it works.

    This is my script:

    import boto3

    # EMR Serverless client (uses the default credentials and region)
    client = boto3.client('emr-serverless')

    application_id = 'appid'

    job_driver = {
        'sparkSubmit': {
            'entryPoint': 's3://spark.job/script/test_spark.py',  # Path to your Python script on S3
            'entryPointArguments': [id],  # Arguments to your Spark script, if any (id is defined elsewhere)
            'sparkSubmitParameters': (
                "--conf spark.archives=s3://spark.job/script/pyspark_deps.tar.gz#pyspark_deps "
                "--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_deps/bin/python3.11 "
                "--conf spark.executorEnv.PYSPARK_PYTHON=./pyspark_deps/bin/python3.11 "
                "--conf spark.yarn.appMasterEnv.PYTHONPATH=./pyspark_deps/lib/python3.11/site-packages "
                "--conf spark.executorEnv.PYTHONPATH=./pyspark_deps/lib/python3.11/site-packages"
            ),
        }
    }

    configuration_overrides = {
        'applicationConfiguration': []
    }

    # Start the Spark job
    response = client.start_job_run(
        applicationId=application_id,
        executionRoleArn='URL',
        jobDriver=job_driver,
        configurationOverrides=configuration_overrides,
    )
    
0

You can try the following steps:

  • Check Logs: Review the logs and error messages generated when the job fails to see if you can find the root cause.
  • Verify Configurations: Compare the configurations you're using in your script with those of the job that succeeds when cloned from the EMR Studio dashboard, including the spark-submit parameters, the EMR release version, the execution role, and any additional configurations or overrides. If possible, replicate the dashboard job's settings exactly in your Python script.
  • Check Permissions: Verify that the IAM role you're using has the necessary permissions to access the required S3 buckets, resources, and perform the necessary actions.

You can also try submitting a simple job without any additional dependencies to confirm that basic job submission works from your script.
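
One way to carry out the configuration comparison is to fetch both job runs (the failed one from the script and the successful one cloned in EMR Studio) and diff the relevant fields. This is a minimal sketch: fetch_job_run and diff_job_runs are hypothetical helper names, the job run IDs in the usage comment are placeholders, and the compared keys are an assumption about which fields of the get_job_run response are most likely to differ.

```python
# Fields of the jobRun description that are compared (an illustrative choice).
COMPARED_KEYS = ("executionRole", "releaseLabel", "jobDriver", "configurationOverrides")


def fetch_job_run(application_id, job_run_id, region_name=None):
    """Return the jobRun description for one EMR Serverless job run."""
    import boto3  # imported lazily so diff_job_runs can be used without boto3

    client = boto3.client("emr-serverless", region_name=region_name)
    response = client.get_job_run(applicationId=application_id, jobRunId=job_run_id)
    return response["jobRun"]


def diff_job_runs(run_a, run_b, keys=COMPARED_KEYS):
    """Return {key: (value_in_a, value_in_b)} for each compared key that differs."""
    return {
        key: (run_a.get(key), run_b.get(key))
        for key in keys
        if run_a.get(key) != run_b.get(key)
    }


# Hypothetical usage (placeholder job run IDs):
#   failed = fetch_job_run("appid", "<failed-job-run-id>")
#   worked = fetch_job_run("appid", "<successful-job-run-id>")
#   for key, (a, b) in diff_job_runs(failed, worked).items():
#       print(key, "\n  script:", a, "\n  studio:", b)
```

The same jobRun description also includes state and stateDetails fields, which usually give a first hint about why a run failed before you open the full logs.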

AWS
answered a month ago
