Glue ETL error: Cannot resolve column name "***" among ()


Hi, I'm relatively new to AWS Glue and I'm having trouble with a Glue ETL error. What makes it strange is that the error occurs only in the Dev environment and not in Test, with the same code and configuration. I also tried calling .printSchema(), but its output doesn't show up in the logs.

Code:

    JoinCheckinPartner_DF = JoinCheckinPartner_node1.toDF()
    ApplyMappingCustomerAccessPass_DF = ApplyMappingCustomerAccessPass.toDF()
    ApplyMappingAccessToken_DF = ApplyMappingAccessToken.toDF()

    JoinCheckinPartnerCustomerAccessPass_DF = JoinCheckinPartner_DF.join(
        ApplyMappingCustomerAccessPass_DF,
        JoinCheckinPartner_DF.customer_access_pass_id == ApplyMappingCustomerAccessPass_DF.id_from_customeraccesspass_table,
        how='left_outer',
    )

Error:

    23/12/13 15:46:36 ERROR ProcessLauncher: Error from Python:Traceback (most recent call last):
      File "/tmp/", line 495, in <module>
        JoinCheckinPartner_DF["customer_access_pass_id"] == ApplyMappingCustomerAccessPass_DF["id_from_customeraccesspass_table"]...

    pyspark.sql.utils.AnalysisException: Cannot resolve column name "customer_access_pass_id" among ()

Glue Job Type: Spark
ETL Language: Python 3
Glue Version: Glue 3.0

1 Answer

I think the issue comes from mixing DynamicFrame and DataFrame (I understand you do that to be able to use a left join, which is not supported on DynamicFrame).
If no data actually reaches that point, the DataFrame you get from toDF() has an empty schema, and the join then fails because the column cannot be resolved. The empty list in "among ()" points to this, and it may also be why only Dev fails if its sources happen to be empty.
You could check that data is present before reaching that point (e.g. count() > 0).
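As a rough sketch of that guard, the helper below checks that the join keys exist before joining. It is a variation on the count() > 0 idea, not the same check: it inspects `df.columns`, which does not trigger a Spark action, and it assumes an empty schema is the only thing that goes wrong. The function names `missing_columns` and `left_join_if_resolvable` are hypothetical, and returning None for the "nothing to join" case is just one possible choice.

```python
def missing_columns(df, required):
    """Return the names in `required` that are not in df.columns."""
    present = set(df.columns)
    return [name for name in required if name not in present]


def left_join_if_resolvable(left_df, right_df, left_key, right_key):
    """Left-outer join two DataFrames, or return None if a key column is missing.

    A DataFrame converted from a DynamicFrame that received no data can have
    an empty schema, in which case left_df[left_key] would raise the
    "Cannot resolve column name" AnalysisException.
    """
    if missing_columns(left_df, [left_key]) or missing_columns(right_df, [right_key]):
        return None  # caller decides: skip the step, log, or fail loudly
    return left_df.join(
        right_df,
        left_df[left_key] == right_df[right_key],
        how="left_outer",
    )


# Usage with the names from the question (not executed here):
# joined = left_join_if_resolvable(
#     JoinCheckinPartner_DF,
#     ApplyMappingCustomerAccessPass_DF,
#     "customer_access_pass_id",
#     "id_from_customeraccesspass_table",
# )
# if joined is None:
#     print("Join skipped: a key column is missing (empty input?)")
```

If you prefer the check on the DynamicFrame itself, calling count() on it before toDF() should work too, at the cost of running a job to count the rows.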

AWS
answered 4 months ago
