Same Glue job runs much faster when using the sample method


I have a CSV file with 5 million rows in S3, and I ran a crawler on it. My Glue job applies custom transformations. My issue is that if I use create_dynamic_frame_from_catalog(), the job runs very slowly, whereas if I use create_sample_dynamic_frame_from_catalog() with the maximum sample limit set to 5 million, it runs much faster. Why is this happening? I want to speed up the job without using the sample method.

1 Answer

Hello. Based on the documentation, which parameters did you configure for the create_sample_dynamic_frame_from_catalog function? num is the one that defines the maximum number of records to be fetched.
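One way to check whether the sample call is really reading all 5 million rows is to count the records returned by both methods. The sketch below is only an illustration, not a confirmed diagnosis: the helper name compare_full_and_sample_reads is hypothetical, glue_context is assumed to be an awsglue GlueContext created in the job, and the database and table names are placeholders.

```python
def compare_full_and_sample_reads(glue_context, database, table_name, num):
    """Read the same catalog table both ways and return (full_count, sample_count).

    glue_context is assumed to be an awsglue.context.GlueContext; this helper
    is hypothetical and only calls methods on the object passed in.
    """
    # Full read: every file behind the catalog table is scanned.
    full = glue_context.create_dynamic_frame_from_catalog(
        database=database,
        table_name=table_name,
    )
    # Sample read: at most `num` records are fetched.
    sample = glue_context.create_sample_dynamic_frame_from_catalog(
        database=database,
        table_name=table_name,
        num=num,
    )
    return full.count(), sample.count()


# Inside a Glue job this might be used as follows (names are placeholders):
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   print(compare_full_and_sample_reads(glue_context, "my_db", "my_table", 5000000))
```

If the second count comes back lower than the first, the sample job is faster simply because it processes less data, and the two runs are not comparable.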

As for overall job performance, there are a couple of strategies, such as changing the WorkerType and NumberOfWorkers parameters on the job.
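As a rough sketch of the WorkerType and NumberOfWorkers change, the job definition can be read and written back through the Glue API. The helper name scale_job and the default values are hypothetical, glue_client is assumed to be a boto3 Glue client, and the list of fields to drop is an assumption that may need adjusting for your job, since update_job replaces the existing definition.

```python
# Fields returned by get_job that are assumed not to belong in JobUpdate,
# plus capacity fields that conflict with WorkerType/NumberOfWorkers.
_DROP_FIELDS = {"Name", "CreatedOn", "LastModifiedOn", "AllocatedCapacity", "MaxCapacity"}


def scale_job(glue_client, job_name, worker_type="G.1X", number_of_workers=10):
    """Set WorkerType and NumberOfWorkers on an existing Glue job.

    glue_client is assumed to be boto3.client("glue"). The current definition
    is copied so that Role, Command, and other settings are preserved.
    """
    job = glue_client.get_job(JobName=job_name)["Job"]
    job_update = {k: v for k, v in job.items() if k not in _DROP_FIELDS}
    job_update["WorkerType"] = worker_type
    job_update["NumberOfWorkers"] = number_of_workers
    glue_client.update_job(JobName=job_name, JobUpdate=job_update)
    return job_update


# Example call (job name is a placeholder):
#   import boto3
#   scale_job(boto3.client("glue"), "my-glue-job", worker_type="G.2X", number_of_workers=20)
```

The same two settings can also be changed in the job details of the Glue console, which avoids rewriting the definition by hand.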

This blog post is also handy: https://aws.amazon.com/blogs/big-data/best-practices-to-scale-apache-spark-jobs-and-partition-data-with-aws-glue/

AWS
answered 2 years ago
AWS Expert Chris_G
reviewed 2 years ago
