PySpark DataFrame in Glue notebook: mode=overwrite creates empty files at each path level

I'm writing partitioned Parquet data from a Spark DataFrame with mode=overwrite to update stale partitions. I have this set: spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic'). The data is written correctly, with all partitions set as expected, but I am also getting empty files created at each level of the path, named <path_level>_$folder$. Removing mode=overwrite eliminates this behavior. Is there any way to prevent these zero-size files from being created? Have I misconfigured something?

asked a year ago, 810 views
1 Answer
Those are folder markers. They exist because S3 has no real folders, so an empty "folder" can only be represented by a placeholder object. You can ignore them: they do no harm, and it is very common to have them.
Once the corresponding directory contains any files, the markers can be safely removed.
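
If you do want to clean the markers up, here is a minimal sketch of one way to do it, not an official AWS utility. It assumes that every key ending in _$folder$ is a marker and that boto3 is available. The function names, bucket, and prefix are hypothetical. Following the rule above, it deletes a marker only when its directory already contains at least one other object.

```python
MARKER_SUFFIX = "_$folder$"


def removable_markers(keys):
    """Return marker keys whose directory already holds at least one other object.

    Assumption: any key ending in MARKER_SUFFIX is a folder marker.
    """
    markers = [k for k in keys if k.endswith(MARKER_SUFFIX)]
    data_keys = [k for k in keys if not k.endswith(MARKER_SUFFIX)]
    removable = []
    for marker in markers:
        # "data/year=2023_$folder$" stands for the directory "data/year=2023/".
        directory = marker[: -len(MARKER_SUFFIX)] + "/"
        if any(k.startswith(directory) for k in data_keys):
            removable.append(marker)
    return removable


def delete_folder_markers(s3_client, bucket, prefix=""):
    """List keys under prefix and delete the removable markers.

    s3_client is expected to behave like boto3.client("s3").
    Returns the list of deleted keys.
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    to_delete = removable_markers(keys)
    # delete_objects accepts at most 1000 keys per request.
    for start in range(0, len(to_delete), 1000):
        batch = to_delete[start:start + 1000]
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch]},
        )
    return to_delete


# Hypothetical usage (bucket and prefix are placeholders):
#   import boto3
#   delete_folder_markers(boto3.client("s3"), "my-bucket", "data/")
```

Note that later overwrite jobs may recreate the markers, so this is cleanup rather than prevention.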

AWS
EXPERT
answered a year ago
