PySpark DataFrame in Glue notebook: mode=overwrite creates empty files at each path level

I'm writing partitioned Parquet data from a Spark DataFrame with mode=overwrite to refresh stale partitions. I have this set: spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic'). The data is written correctly and the partitioning is correct, but I also get empty files at each level of the path, named <path_level>_$folder$. Removing mode=overwrite eliminates this behavior. Is there any way to prevent these zero-size files from being created? Have I misconfigured something?

asked a year ago, 826 views
1 Answer

Those are folder markers. They exist because S3 doesn't allow creating empty folders without objects. You can ignore them: they shouldn't do any harm, and it's very common to have them.
Once the corresponding directory contains any files, the markers can be safely removed.
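
If you do want to clean the markers up, a minimal sketch of the selection step might look like the following. It only picks out zero-byte keys ending in _$folder$ whose matching prefix already holds at least one other object, which is the "safe to remove" condition described above. The function name find_folder_markers and the sample keys are hypothetical, and the listing is assumed to be plain (key, size) pairs that you have already fetched from the bucket.

```python
from typing import Iterable, List, Tuple

# Suffix used by the folder markers described in the question.
MARKER_SUFFIX = "_$folder$"


def find_folder_markers(objects: Iterable[Tuple[str, int]]) -> List[str]:
    """Return marker keys whose 'directory' already contains other objects.

    `objects` is an assumed listing of (key, size_in_bytes) pairs.
    """
    objects = list(objects)
    # Every key that is not itself a marker counts as real content.
    data_keys = [key for key, _ in objects if not key.endswith(MARKER_SUFFIX)]

    removable = []
    for key, size in objects:
        # Only zero-byte keys with the marker suffix are candidates.
        if size != 0 or not key.endswith(MARKER_SUFFIX):
            continue
        # "out/year=2023_$folder$" marks the prefix "out/year=2023/".
        prefix = key[: -len(MARKER_SUFFIX)] + "/"
        if any(data_key.startswith(prefix) for data_key in data_keys):
            removable.append(key)
    return removable


# Hypothetical listing for illustration only.
example_listing = [
    ("out_$folder$", 0),
    ("out/year=2023_$folder$", 0),
    ("out/year=2023/part-0000.parquet", 1024),
    ("out/year=2024_$folder$", 0),  # no files under this prefix yet, so it is kept
]
removable_markers = find_folder_markers(example_listing)
```

In practice the listing could come from boto3's list_objects_v2 and the returned keys could be passed to delete_objects, but that wiring is left out here because it depends on your bucket layout and permissions.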

AWS
EXPERT
answered a year ago