1 Respuesta
- Más nuevo
- Más votos
- Más comentarios
0
If writing a plain file is fast, I suspect the performance issue is with "partitionKeys=partition_cols_list", maybe those columns have too much granularity and force writing lots of tiny files. Also the counting on the converted DF might result on double processing.
Since you already have a DataFrame, the DataFrame writer is faster doing table partitioning, you can achieve the same (as long as you are not writing to an S3 Lakeformation location) doing something like this (haven't tested it):
inc_df.writeTo(f'{target_db}.{dataset_dict_obj.get("table_name")}').partitionedBy(partition_cols_list).createOrReplace()
Contenido relevante
- OFICIAL DE AWSActualizada hace 2 años
- OFICIAL DE AWSActualizada hace 3 años
- OFICIAL DE AWSActualizada hace 2 años