PySpark DataFrame in Glue notebook: mode=overwrite creates empty files at each path level


I'm writing partitioned Parquet data with a Spark DataFrame and mode=overwrite to refresh stale partitions, with the following setting:

spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')

The data and the partitioning are written correctly, but an empty file named <path_level>_$folder$ is also being created at each level of the path. Removing mode=overwrite eliminates this behavior. Is there any way to prevent these zero-size files from being created, or have I misconfigured something?

Asked a year ago · 826 views
1 answer

Those are folder markers. S3 has no real directories, so an empty "folder" can't exist without an object in it; the connector writes a zero-byte <prefix>_$folder$ object as a placeholder. They are harmless and very common, and you can simply ignore them.
Once the corresponding directory contains any real objects, the markers can be safely removed.
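If you do want to clean them up, a small sketch of the "safe to remove once the directory has files" rule. The helper is pure Python over a list of object keys; the boto3 calls at the end are illustrative and the bucket name is hypothetical.

```python
def removable_folder_markers(keys):
    """Given all object keys under a prefix, return the _$folder$ marker
    keys whose corresponding directory contains at least one real object
    (so the marker is no longer needed and can be deleted)."""
    suffix = "_$folder$"
    markers = [k for k in keys if k.endswith(suffix)]
    removable = []
    for marker in markers:
        # "table/dt=2023-01-01_$folder$" marks directory "table/dt=2023-01-01/"
        directory = marker[: -len(suffix)] + "/"
        has_real_object = any(
            k.startswith(directory) and not k.endswith(suffix) for k in keys
        )
        if has_real_object:
            removable.append(marker)
    return removable


keys = [
    "table_$folder$",
    "table/dt=2023-01-01_$folder$",
    "table/dt=2023-01-01/part-0000.parquet",
]
print(removable_folder_markers(keys))
# -> ['table_$folder$', 'table/dt=2023-01-01_$folder$']

# With boto3 (available by default in a Glue environment), deletion
# would look roughly like this -- bucket name is made up:
# import boto3
# s3 = boto3.client("s3")
# for key in removable_folder_markers(keys):
#     s3.delete_object(Bucket="my-bucket", Key=key)
```

Markers for still-empty directories are kept, since deleting those would make the "folder" disappear from listings.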

AWS
EXPERT
Answered a year ago
