I have set up an AWS Glue crawler to read the AWS Cost and Usage Report (CUR) data residing in S3. Yesterday I enabled new cost allocation tags in CUR, and today I can see them when I query the table in Athena. But I can't access the new columns in an AWS Glue ETL job. I am reading the table in AWS Glue ETL as follows:
```python
dyf = glueContext.create_dynamic_frame.from_catalog(database=source_db,
                                                    table_name=source_tbl)
usage_df = dyf.toDF()
usage_df = usage_df.filter(filter_clause)
usage_df.printSchema()  # Schema is not showing the new fields
```
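One way to narrow down where the new tag columns go missing is to compare the table-level column list in the Data Catalog with each partition's column list. A plausible cause (an assumption, not confirmed in this thread) is that the crawler updated the table definition while the existing `year`/`month` partitions kept their older schema, which a catalog-based read may still honor. The sketch below works on the response shapes of boto3's Glue `get_table` and `get_partitions` calls; the helper names `partitions_missing_columns` and `fetch_from_catalog` are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def column_names(storage_descriptor: Dict[str, Any]) -> Set[str]:
    # A Glue StorageDescriptor holds a "Columns" list of {"Name": ..., "Type": ...}.
    return {col["Name"] for col in storage_descriptor.get("Columns", [])}


def partitions_missing_columns(table: Dict[str, Any],
                               partitions: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each partition (its Values joined with '/') to the table columns it lacks."""
    table_cols = column_names(table["StorageDescriptor"])
    result: Dict[str, Set[str]] = {}
    for part in partitions:
        missing = table_cols - column_names(part["StorageDescriptor"])
        if missing:
            result["/".join(part["Values"])] = missing
    return result


def fetch_from_catalog(database: str,
                       table_name: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    # Imported lazily so the pure helpers above can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    partitions: List[Dict[str, Any]] = []
    # get_partitions is paginated; collect every page.
    paginator = glue.get_paginator("get_partitions")
    for page in paginator.paginate(DatabaseName=database, TableName=table_name):
        partitions.extend(page["Partitions"])
    return table, partitions
```

For example, `table, parts = fetch_from_catalog(source_db, source_tbl)` followed by `print(partitions_missing_columns(table, parts))` would list any partitions whose stored schema lacks the new tag columns.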
I tried running `MSCK REPAIR TABLE`, still no luck. The crawler's schema update behavior is set to "Update the table definition in the data catalog", and it's a partitioned table with `year` and `month` as partition columns. Am I missing anything?
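If the partitions do turn out to carry an older schema, one crawler option worth trying (an assumption about the fix, not something this thread confirms) is the console setting "Update all new and existing partitions with metadata from the table". It corresponds to the crawler `Configuration` JSON with `CrawlerOutput.Partitions.AddOrUpdateBehavior` set to `InheritFromTable`. A minimal sketch using boto3's `update_crawler` and `start_crawler`; the function names are hypothetical:

```python
import json


def inherit_partition_schema_config() -> str:
    # The crawler "Configuration" parameter is a JSON string. InheritFromTable
    # tells the crawler to give partitions the table's schema when it runs.
    return json.dumps({
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"},
        },
    })


def apply_and_rerun(crawler_name: str) -> None:
    # Imported lazily so the config builder above works without boto3.
    import boto3

    glue = boto3.client("glue")
    # Note: this replaces the crawler's whole Configuration string, so merge in
    # any settings it already has (for example, grouping options) first.
    glue.update_crawler(Name=crawler_name,
                        Configuration=inherit_partition_schema_config())
    # The setting only takes effect on the next crawl.
    glue.start_crawler(Name=crawler_name)
```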
Thanks a lot. It worked with:

```python
usage_df = spark.table("source_db.source_tbl")
```
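The `spark.table` workaround returns a plain Spark DataFrame. If the rest of the job relies on DynamicFrame transforms, one option is to convert the result back with `DynamicFrame.fromDF`. This is a sketch that assumes it runs inside a Glue job where `awsglue` is available and the job uses the Glue Data Catalog as its metastore; the helpers `qualified_name` and `read_catalog_table` are hypothetical, and `spark` and `glue_context` stand for the job's existing SparkSession and GlueContext.

```python
def qualified_name(database: str, table: str) -> str:
    # Backtick-quote each part so reserved words parse as identifiers;
    # assumes the names themselves contain no backticks.
    return f"`{database}`.`{table}`"


def read_catalog_table(spark, glue_context, database: str, table: str):
    # Imported here so this module can be loaded outside a Glue job.
    from awsglue.dynamicframe import DynamicFrame

    # spark.table resolves the name through the job's metastore
    # (the Glue Data Catalog when the job is configured to use it).
    df = spark.table(qualified_name(database, table))
    # Wrap the DataFrame so DynamicFrame-based transforms can be applied again.
    return DynamicFrame.fromDF(df, glue_context, f"{table}_dyf")
```

In the job above this would be called as `usage_dyf = read_catalog_table(spark, glueContext, source_db, source_tbl)`.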
@Gonzalo Herreros, can you share details on how to fix this so that we can go back to using `create_dynamic_frame.from_catalog` or `create_data_frame.from_catalog`? Or is the expectation that we use only `spark.table` once the schema has been updated?