PySpark DataFrame in Glue notebook: mode=overwrite creates empty files at each path level


I'm writing partitioned Parquet data with a Spark DataFrame, using mode=overwrite to update stale partitions, and I have this set:

spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')

The data is written correctly, with all the partitioning applied as expected, but I'm also getting an empty file named <path_level>_$folder$ created at each level of the path. Removing mode=overwrite eliminates this strange behavior. Is there any way to prevent these zero-byte files from being created? Have I misconfigured something?

Asked 1 year ago · 826 views
1 answer

Those are folder markers. S3 doesn't allow creating empty folders without objects, so a zero-byte placeholder object is written instead. You can safely ignore them; they do no harm, and it's very common to have them.
Once the corresponding directory contains real files, the markers can be safely removed.
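To illustrate which markers are safe to remove, here is a minimal sketch in plain Python. The helper names (`is_folder_marker`, `removable_markers`) are hypothetical, and the key layout is assumed from the question (a zero-byte object named `<dir>_$folder$` next to each directory level); actually deleting the objects would be done separately, e.g. with the AWS CLI or boto3.

```python
MARKER_SUFFIX = "_$folder$"

def is_folder_marker(key: str) -> bool:
    """True for zero-byte folder-marker keys such as 'table/year=2023_$folder$'."""
    return key.endswith(MARKER_SUFFIX)

def removable_markers(keys):
    """Return the marker keys whose directory already contains at least one
    real object (directly or in a subdirectory), so the marker is redundant."""
    real_keys = [k for k in keys if not is_folder_marker(k)]
    removable = []
    for key in keys:
        if is_folder_marker(key):
            # 'table/year=2023_$folder$' marks the directory 'table/year=2023'
            directory = key[: -len(MARKER_SUFFIX)]
            if any(r.startswith(directory + "/") for r in real_keys):
                removable.append(key)
    return removable

keys = [
    "table_$folder$",
    "table/year=2023_$folder$",
    "table/year=2023/part-00000.parquet",
    "table/year=2024_$folder$",  # no data written under this prefix yet
]
print(removable_markers(keys))
# → ['table_$folder$', 'table/year=2023_$folder$']
```

The 'table/year=2024' marker is kept because no real object exists under that prefix yet; removing it would lose the only record that the directory exists.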

AWS
Expert
Answered 1 year ago
