PySpark DataFrame in Glue notebook: mode=overwrite creates empty files at each path level

I'm writing partitioned Parquet data from a Spark DataFrame with mode=overwrite to update stale partitions. I have set:

spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')

The data is written correctly and all the partitioning is correct, but I also get an empty file created at each level of the path, named <path_level>_$folder$. Removing mode=overwrite eliminates this strange behavior. Is there any way to prevent these zero-size files from being created? Have I misconfigured something?

Asked 1 year ago, 826 views
1 Answer
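To make the naming pattern in the question concrete, here is a small pure-Python sketch that lists the marker keys one would expect for a partition prefix. It assumes the convention the question describes (one zero-byte object named <path_level>_$folder$ per directory level); the prefix sales/year=2024/month=01/ is a hypothetical example, not taken from the original post.

```python
from typing import List

# Suffix of the zero-byte marker objects described in the question.
MARKER_SUFFIX = "_$folder$"


def expected_folder_markers(key_prefix: str) -> List[str]:
    """Return one marker key per directory level of key_prefix.

    Assumption: a directory level "a/b" is represented by an object
    named "a/b_$folder$", as described in the question.
    """
    parts = [p for p in key_prefix.strip("/").split("/") if p]
    markers = []
    for i in range(1, len(parts) + 1):
        # Join the first i path components and append the marker suffix.
        markers.append("/".join(parts[:i]) + MARKER_SUFFIX)
    return markers


# Hypothetical partitioned table layout.
for key in expected_folder_markers("sales/year=2024/month=01/"):
    print(key)
# sales_$folder$
# sales/year=2024_$folder$
# sales/year=2024/month=01_$folder$
```

A table partitioned on two columns therefore yields three markers per partition path: one for the table root and one for each partition level.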

Those are folder markers. They are used because S3 has no real directories and cannot hold an empty folder without an object. You can simply ignore them: they shouldn't do any harm, and it's very common to see them.
Once the corresponding directory contains any files, the markers can be safely removed.

AWS Expert
Answered 1 year ago
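The answer's suggestion to remove markers once their directory contains files can be sketched as follows. This is a minimal sketch, not an AWS-provided tool: the selection logic is pure Python, and the optional listing and deletion helpers assume boto3 with credentials allowed to list and delete objects in the bucket. The bucket name, prefix, and helper names are hypothetical.

```python
from typing import Iterable, List

MARKER_SUFFIX = "_$folder$"


def removable_markers(keys: Iterable[str]) -> List[str]:
    """Return marker keys whose directory already contains a real object."""
    keys = list(keys)
    markers = [k for k in keys if k.endswith(MARKER_SUFFIX)]
    real_objects = [k for k in keys if not k.endswith(MARKER_SUFFIX)]
    removable = []
    for marker in markers:
        # "a/b_$folder$" stands for the directory "a/b/".
        prefix = marker[: -len(MARKER_SUFFIX)] + "/"
        # Nested files count too: a file under a/b/c/ means a/b/ is non-empty.
        if any(obj.startswith(prefix) for obj in real_objects):
            removable.append(marker)
    return removable


def list_keys(bucket: str, prefix: str) -> List[str]:
    """List every key under prefix (assumes boto3 is installed)."""
    import boto3  # imported lazily so removable_markers works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def delete_markers(bucket: str, markers: List[str]) -> None:
    """Delete the given marker keys (assumes boto3 is installed)."""
    import boto3

    s3 = boto3.client("s3")
    # delete_objects accepts at most 1000 keys per request.
    for start in range(0, len(markers), 1000):
        batch = markers[start:start + 1000]
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
```

A possible usage, with a hypothetical bucket and table prefix: delete_markers("my-bucket", removable_markers(list_keys("my-bucket", "sales/"))). The marker prefix ("sales_$folder$") sits just outside "sales/", so it is only picked up if you list from a parent prefix. Markers for still-empty directories are deliberately kept.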