1 Answer
If writing a plain file is fast, I suspect the performance issue is with "partitionKeys=partition_cols_list": those columns may be too granular, which forces the job to write lots of tiny files. Also, counting the converted DataFrame might cause the data to be processed twice.
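To make the granularity point concrete, here is a minimal plain-Python sketch (not Spark code) that counts how many distinct partition-key combinations a set of rows produces. Each combination becomes at least one output file, so a large count with few rows per combination suggests tiny files. The function name `partition_stats`, the sample rows, and the column names are all hypothetical and only for illustration.

```python
from collections import Counter


def partition_stats(rows, partition_cols):
    """Return (number of distinct partition-key combinations, average rows per combination)."""
    # Build one tuple of partition values per row and count how often each occurs.
    counts = Counter(tuple(row[c] for c in partition_cols) for row in rows)
    n_partitions = len(counts)
    avg_rows = sum(counts.values()) / n_partitions if n_partitions else 0.0
    return n_partitions, avg_rows


# Hypothetical sample data; the column names are assumptions, not from the question.
sample_rows = [
    {"year": 2024, "month": 1, "event_id": "a1"},
    {"year": 2024, "month": 1, "event_id": "a2"},
    {"year": 2024, "month": 2, "event_id": "a3"},
    {"year": 2024, "month": 2, "event_id": "a4"},
]

# Coarse keys group several rows per partition.
coarse = partition_stats(sample_rows, ["year", "month"])
# Adding a near-unique column yields one partition per row.
fine = partition_stats(sample_rows, ["year", "month", "event_id"])
print(coarse, fine)
```

On the real DataFrame, a comparable check would be something like `inc_df.select(*partition_cols_list).distinct().count()`, keeping in mind that this triggers an extra pass over the data.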
Since you already have a DataFrame, the DataFrame writer is faster at table partitioning. You can achieve the same result (as long as you are not writing to an S3 Lake Formation location) with something like this (I haven't tested it):
inc_df.writeTo(f'{target_db}.{dataset_dict_obj.get("table_name")}').partitionedBy(*partition_cols_list).createOrReplace()