AWS Glue MongoDB Atlas Connection Works For Crawler But Not For ETL Job?


Hi community,

I am trying to run an ETL job using AWS Glue. Our data is stored in MongoDB Atlas, inside a VPC. Our AWS VPC is connected to MongoDB Atlas using VPC peering.

To run the ETL job in AWS Glue, I first created a connection using the VPC details and the MongoDB Atlas URI, along with the username and password. The AWS Glue crawlers use this connection to extract the schema into AWS Glue Data Catalog tables, and that works!
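For reference, the same connection can also be defined with the AWS SDK. The sketch below is a minimal example assuming a boto3 Glue client (`create_connection`) and the `MONGODB` connection type; the connection name, URI, subnet, security group, and Availability Zone you pass in are hypothetical placeholders, not values from this post. The `PhysicalConnectionRequirements` block carries the VPC details mentioned above.

```python
from typing import Any


def build_mongodb_connection_input(
    name: str,
    connection_url: str,
    username: str,
    password: str,
    subnet_id: str,
    security_group_ids: list[str],
    availability_zone: str,
) -> dict[str, Any]:
    """Return a ConnectionInput dict for the Glue CreateConnection API."""
    return {
        "Name": name,
        "ConnectionType": "MONGODB",
        "ConnectionProperties": {
            # The Atlas connection URI (hypothetical value supplied by the caller)
            "CONNECTION_URL": connection_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # VPC settings Glue uses to place its network interfaces in the peered VPC
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": security_group_ids,
            "AvailabilityZone": availability_zone,
        },
    }


def create_mongodb_connection(glue_client: Any, **kwargs: Any) -> None:
    """Create the connection with a boto3 Glue client, e.g. boto3.client("glue")."""
    glue_client.create_connection(
        ConnectionInput=build_mongodb_connection_input(**kwargs)
    )
```

Usage, with hypothetical values: `create_mongodb_connection(boto3.client("glue"), name="atlas-conn", connection_url="mongodb+srv://cluster0.example.mongodb.net/main", username="...", password="...", subnet_id="subnet-...", security_group_ids=["sg-..."], availability_zone="us-east-1a")`.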

I then attempt to run the actual ETL job using the following PySpark code:

    # My temp variables (glueContext comes from the standard Glue job boilerplate)
    source_database = "d*********a"
    source_table_name = "main_businesses"
    source_mongodb_db_name = "main"
    source_mongodb_collection = "businesses"

    # Read the crawled MongoDB table through the Glue Data Catalog
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database=source_database,
        table_name=source_table_name,
        additional_options={"database": source_mongodb_db_name, "collection": source_mongodb_collection},
    )

However, the connection times out, and for some reason MongoDB Atlas seems to be blocking the connection from the ETL job. It's as if the ETL job uses the connection differently than the crawler does. Maybe the ETL job is not running inside our AWS VPC that is peered with the MongoDB Atlas VPC (is VPC peering not possible for ETL jobs?).
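One way to tell a network problem apart from an authentication problem is to test raw TCP reachability from inside the job. This is a minimal sketch using only Python's standard `socket` module; port 27017 is MongoDB's default, and any hostname you test is an assumption. Note that the SRV host in a `mongodb+srv://` URI usually does not resolve to an address directly, so test one of the individual cluster node hostnames shown in Atlas. A `False` from inside the job while the crawler succeeds would suggest the job is not running in the peered VPC, rather than a credentials problem.

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses)
        return False
```

Usage inside the job script, with a hypothetical node hostname: `print(can_reach("cluster0-shard-00-00.example.mongodb.net"))`.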

Does anyone have any idea what might be going on or how I can fix this?

Thank you!

  • Have you checked for differences in the IAM policies attached to the ETL job role and the crawler role?

  • Hi Sausborn, I have given the crawler and the Glue job the same role, and therefore the same IAM policies. Yet the crawler is able to connect to the database with that role, but the Glue job fails.
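To double-check this outside the console, a sketch like the following compares the two roles using the `get_crawler` and `get_job` calls of a boto3 Glue client; the crawler and job names are hypothetical. It reduces ARNs to role names because Glue accepts either a role name or a role ARN.

```python
from typing import Any


def role_name(role: str) -> str:
    """Reduce an IAM role ARN or a bare role name to the role name."""
    # arn:aws:iam::123456789012:role/service-role/MyRole -> MyRole
    return role.rsplit("/", 1)[-1]


def same_role(glue_client: Any, crawler_name: str, job_name: str) -> bool:
    """Return True if the Glue crawler and the Glue job use the same IAM role."""
    crawler_role = glue_client.get_crawler(Name=crawler_name)["Crawler"]["Role"]
    job_role = glue_client.get_job(JobName=job_name)["Job"]["Role"]
    return role_name(crawler_role) == role_name(job_role)
```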

1 Answer
Accepted Answer

Are you adding the connection to the Glue job?
It's not enough to reference the catalog table, because the Glue job needs to know the connection's VPC details before it starts.
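The difference the answer points at can be checked directly: a crawler uses a connection through its MongoDB target, whereas for VPC placement a job relies on the connections listed in its own definition. This sketch assumes a boto3 Glue client and hypothetical crawler and job names, and reports connections the crawler uses that the job does not list.

```python
from typing import Any


def crawler_connections(crawler: dict) -> set[str]:
    """Connection names a crawler uses through its MongoDB targets."""
    targets = crawler.get("Targets", {}).get("MongoDBTargets", [])
    return {t["ConnectionName"] for t in targets if "ConnectionName" in t}


def job_connections(job: dict) -> set[str]:
    """Connection names attached to the job itself (the job's connection list)."""
    return set(job.get("Connections", {}).get("Connections", []))


def missing_job_connections(glue_client: Any, crawler_name: str, job_name: str) -> set[str]:
    """Connections the crawler uses that are absent from the job's connection list."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    job = glue_client.get_job(JobName=job_name)["Job"]
    return crawler_connections(crawler) - job_connections(job)
```

An empty result means the job already lists every connection the crawler uses.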

AWS
EXPERT
answered a year ago
  • Hi Gonzalo, I referenced the VPC details (VPC ID, subnet, and security group) when I created the connection. When I open the connection, I can see that both the crawler and the Glue job reference it. The crawler connects successfully, but the Glue job does not.

    Is there another way to reference the connection's VPC details from within the Glue job?

  • Make sure the connection is added to the job's connection list.

  • Thank you :D That helped.
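If the job is defined in code rather than in the console, the equivalent of adding the connection to the job's connection list is the `Connections` parameter of the Glue `create_job` call. This is a minimal sketch of those arguments for a boto3 Glue client; the job name, role, script path, Glue version, and worker settings are hypothetical choices, not taken from this thread.

```python
from typing import Any


def glue_job_definition(
    name: str, role: str, script_location: str, connection_name: str
) -> dict[str, Any]:
    """Keyword arguments for the Glue CreateJob API with a connection attached."""
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # The job's own connection list: Glue uses the listed connection's
        # subnet and security groups to run the job inside the VPC
        "Connections": {"Connections": [connection_name]},
        "GlueVersion": "4.0",  # assumed version
        "WorkerType": "G.1X",  # assumed worker settings
        "NumberOfWorkers": 2,
    }
```

Usage, with hypothetical values: `boto3.client("glue").create_job(**glue_job_definition("mongo-etl", "GlueRole", "s3://example-bucket/scripts/etl.py", "atlas-conn"))`.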
