1 Answer
If writing a plain file is fast, I suspect the performance issue is with "partitionKeys=partition_cols_list": if those columns are too granular, the job is forced to write lots of tiny files. Also, counting the converted DataFrame might result in double processing, since the count and the write each trigger a full evaluation of the lineage.
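One way to avoid that double processing (a minimal sketch, assuming inc_df is the DataFrame being counted and then written) is to cache it so both actions share a single evaluation:

    # Cache so count() and the subsequent write reuse the same materialized
    # data instead of recomputing the whole lineage twice.
    inc_df = inc_df.cache()
    row_count = inc_df.count()  # first action runs the job and fills the cache
    # ... perform the write here; it reads from the cached data ...
    inc_df.unpersist()          # release the cache once the write is done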
Since you already have a DataFrame, the DataFrame writer is faster at table partitioning. You can achieve the same result (as long as you are not writing to an S3 Lake Formation location) with something like this (I haven't tested it):
inc_df.writeTo(f'{target_db}.{dataset_dict_obj.get("table_name")}') \
    .partitionedBy(*partition_cols_list).createOrReplace()  # unpack the list: partitionedBy expects individual columns, not a list
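Note that the v2 writeTo(...).createOrReplace() path generally needs a catalog that implements the DataSourceV2 API (e.g., Iceberg); if yours does not, a rough equivalent with the classic v1 writer (same assumed variables, also untested here) would be:

    # v1 DataFrameWriter sketch: overwrite the table, partitioned the same way.
    (inc_df.write
        .mode("overwrite")
        .partitionBy(*partition_cols_list)  # v1 partitionBy also takes individual column names
        .saveAsTable(f'{target_db}.{dataset_dict_obj.get("table_name")}'))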