Same Glue job runs differently when using the sample method


I have a CSV file with 5 million rows in S3, and I ran a crawler on it. My Glue job applies custom transformations. My issue is that if I read the table with create_dynamic_frame_from_catalog(), the job runs very slowly, whereas if I use create_sample_dynamic_frame_from_catalog() with the maximum sample limit set to 5 million, it runs much faster. Why is this happening? I want to speed up the job without using the sample method.

1 Answer

Hello. Based on the documentation, which parameters did you configure for the create_sample_dynamic_frame_from_catalog function? num is the one that defines the maximum number of records to fetch.
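To make it easier to check what the sample call was actually given, here is a minimal sketch that collects its arguments in one place. The database and table names are placeholders, and the helper functions are hypothetical; only num and sample_options (with the documented maxSamplePartitions and maxSampleFilesPerPartition keys) are taken from the Glue API. The Glue call itself only works inside a Glue job, so it is wrapped in a function that is not invoked here.

```python
def build_sample_read_args(database, table_name, num,
                           max_sample_partitions=None,
                           max_sample_files_per_partition=None):
    """Collect keyword arguments for create_sample_dynamic_frame_from_catalog.

    Hypothetical helper: it only builds a dict, so it can be inspected
    or logged before the job reads any data.
    """
    sample_options = {}
    if max_sample_partitions is not None:
        sample_options["maxSamplePartitions"] = max_sample_partitions
    if max_sample_files_per_partition is not None:
        sample_options["maxSampleFilesPerPartition"] = max_sample_files_per_partition
    return {
        "database": database,
        "table_name": table_name,
        "num": num,  # upper bound on the number of records returned
        "sample_options": sample_options,
    }


def read_sample(glue_context, **read_args):
    """Run the sample read; glue_context is an awsglue GlueContext (Glue job only)."""
    return glue_context.create_sample_dynamic_frame_from_catalog(**read_args)


# Placeholder names; replace with the crawler's database and table.
args = build_sample_read_args("my_database", "my_csv_table", num=5_000_000)
print(args)
```

Inside the job, comparing read_sample(glueContext, **args).count() with the count of the frame returned by create_dynamic_frame_from_catalog() would show whether both reads really return the same number of rows.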

As for overall job performance, there are a couple of strategies, such as changing the WorkerType and NumberOfWorkers parameters on the job.
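As a rough illustration of the WorkerType and NumberOfWorkers suggestion, the sketch below overrides both for a single run through boto3's start_job_run. The job name, the G.1X worker type, and the worker count of 10 are assumptions chosen for the example, not recommendations, and the helper names are hypothetical. boto3 is imported lazily so the argument builder can be tried without AWS credentials.

```python
def build_job_run_args(job_name, worker_type="G.1X", number_of_workers=10):
    """Build the arguments for a Glue job run with overridden capacity.

    The defaults are illustrative assumptions; pick values that fit
    the data volume and budget.
    """
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


def start_job(job_name, **overrides):
    """Start the job run and return its run ID (needs AWS credentials)."""
    import boto3  # imported here so the builder above has no AWS dependency

    glue = boto3.client("glue")
    response = glue.start_job_run(**build_job_run_args(job_name, **overrides))
    return response["JobRunId"]


# "my-glue-job" is a placeholder job name.
run_args = build_job_run_args("my-glue-job", worker_type="G.2X", number_of_workers=20)
print(run_args)
```

Overriding the values per run keeps the job definition unchanged, which makes it easy to compare run times across a few worker configurations before committing to one.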

This blog post is also handy: https://aws.amazon.com/blogs/big-data/best-practices-to-scale-apache-spark-jobs-and-partition-data-with-aws-glue/

AWS
answered 2 years ago
AWS EXPERT Chris_G
reviewed 2 years ago
