1 Answer
If writing a plain file is fast, I suspect the performance issue is with `partitionKeys=partition_cols_list`: those columns may be too granular, forcing the job to write lots of tiny files. Also, running a count on the converted DataFrame may result in double processing, since it triggers a full evaluation separate from the write.
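To see why high-granularity partition columns hurt, note that the number of partition directories (and thus small files) is bounded by the product of the distinct values in each column. A minimal sketch in plain Python, using hypothetical distinct-value counts:

```python
from math import prod

# Hypothetical distinct-value counts per partition column
# (not taken from the question; for illustration only).
distinct_counts = {"year": 5, "month": 12, "day": 31, "customer_id": 10_000}

# Upper bound on the number of partition directories the writer may create:
# the product of the distinct counts of all partition columns.
max_partitions = prod(distinct_counts.values())
print(max_partitions)  # -> 18600000
```

Even if only a fraction of those combinations exist in the data, each one becomes a separate directory with at least one file, which is why dropping a high-cardinality column like `customer_id` from the partition keys usually helps far more than tuning the writer.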
Since you already have a DataFrame, the DataFrame writer is faster at table partitioning. As long as you are not writing to an S3 Lake Formation location, you can achieve the same result with something like this (haven't tested it; note that `partitionedBy` takes unpacked `Column` arguments, not a list):

```python
from pyspark.sql.functions import col

(inc_df.writeTo(f'{target_db}.{dataset_dict_obj.get("table_name")}')
    .partitionedBy(*[col(c) for c in partition_cols_list])
    .createOrReplace())
```