Glue Jobs - G.2X worker type - executors not allocating!


Hi, I am using the G.2X worker type. During the data write operation, the Glue job allocates only 5 executors and does not allocate more, which makes the process take very long. How can I resolve this issue?

asked a year ago · 461 views
3 Answers

Hello, AWS Glue has a feature you can take advantage of here: Auto Scaling. It optimizes costs and resource usage by dynamically scaling the number of workers up and down throughout the job run. Please note: Auto Scaling is available for AWS Glue ETL and streaming jobs with AWS Glue version 3.0 or later.

This means that over a job run, when your Spark application requests more executors, more workers are added to the cluster, so the work finishes faster. When an executor has been idle without active computation tasks, the executor and its corresponding worker are removed.

The Auto Scaling option is just below the "Worker type" option on the Job details page: https://docs.aws.amazon.com/glue/latest/dg/auto-scaling.html
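Per the linked page, Auto Scaling can also be enabled outside the console through a job parameter. A sketch of the default-arguments fragment (double-check the exact key against the documentation; note that with Auto Scaling enabled, the configured number of workers acts as the maximum):

```json
{
  "--enable-auto-scaling": "true"
}
```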

AWS
Muzi_M
answered a year ago
  • This is a great explanation of the Auto Scaling feature, and it is worth noting that it may help optimize costs for jobs that, as in this question, cannot scale out and leave executors idle.

    More details are needed on the checks to run to understand why the current job does not scale beyond 5 executors during the write phase.


Hi,

There might be many different reasons why the job is not allocating more executors; most of them reduce to the fact that your job is going to write only a limited number of files, and for that only 5 executors are needed. A couple of examples of why this can happen:

  1. The number of executors depends on the number of workers you defined for the job. G.2X nodes have 1 executor per worker, so you might simply not have enough workers (hypothetical, because you did not specify the job's configuration).
  2. There is data skew and only a limited number of partitions are created, so to write those partitions Spark needs only 5 executors.
  3. There might be a repartition or coalesce operation that limits the number of files to be written out.

If the job is actively repartitioning the data, then you might want to increase the number of partitions; here you will have a tradeoff between faster writes and the size of the output files.
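To make that tradeoff concrete, here is a minimal plain-Python sketch (not a Glue API; `target_file_mb` is an assumed tuning knob, not a Glue setting) that derives a repartition count from the total data size and a desired output file size:

```python
import math

def target_partitions(total_bytes: int, target_file_mb: int = 256) -> int:
    """Rough partition count so each output file lands near target_file_mb.

    More partitions -> more parallel write tasks (faster writes),
    but smaller output files.
    """
    target_file_bytes = target_file_mb * 1024 * 1024
    return max(1, math.ceil(total_bytes / target_file_bytes))

# e.g. 120 GB of data aiming at ~256 MB files:
print(target_partitions(120 * 1024**3))  # 480
```

In a Spark job you would then pass this number to something like `df.repartition(n)` before the write; doubling the target file size halves the write parallelism, and vice versa.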

Depending on the AWS Glue job version (2.0 / 3.0 / 4.0), and thus the Spark version, there are different techniques to salt the data in case of data skew.
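As a generic illustration of salting (a plain-Python sketch not tied to any particular Glue or Spark version; `salt_key` and `n_salts` are names invented here), the idea is to append a random suffix to a hot key so its rows spread over several partitions:

```python
import random

def salt_key(key: str, n_salts: int = 10, rng=random) -> str:
    """Map a hot key to one of n_salts buckets, e.g. 'us-east' -> 'us-east_7'."""
    return f"{key}_{rng.randrange(n_salts)}"

# All rows sharing one hot key now land in up to n_salts partitions.
rng = random.Random(42)
buckets = {salt_key("hot_key", 10, rng) for _ in range(1000)}
print(sorted(buckets))
```

Aggregations then run in two steps: first per salted key, then again after stripping the suffix, trading an extra shuffle for removing the skew.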

Looking at the Spark history server logs may help troubleshoot further why this is happening. Some resources on the Spark history server: the AWS Glue documentation "Monitoring jobs using the Apache Spark web UI" and the AWS Glue workshop "Enabling Spark UI and Monitoring".

Hope this helps.

AWS
EXPERT
answered a year ago
  • We are running the job with 50 G.2X worker nodes. The data is about 1250 partitions, 120 GB in total, roughly 5 billion rows.

    What I observed: the initial data read from the catalog consumes 42/50 IPs from the subnet; however, while writing the data to S3 the executor consumption is only between 5 and 10. We enabled Auto Scaling; the job metrics say the required executors equal the number of partitions (1250), but only 5 to 10 executors are active, and about 40 of the available worker nodes look idle. There is no repartition/coalesce operation when writing the data. We use Glue version 3.0.


Thanks for your inputs. I added salting and added --conf entries in the Glue job parameters, e.g. "--conf": "spark.executor.memory=28g", "--conf": "spark.driver.memory=26g". I enabled continuous logging and found the details below. Why are these failure logs happening?
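One thing worth checking here (an assumption based on how Glue job parameters work, not something visible in the logs): job parameters are a key/value map, so two separate entries with the same "--conf" key collide and only one survives. A commonly used workaround is to chain the settings into a single value:

```json
{
  "--conf": "spark.executor.memory=28g --conf spark.driver.memory=26g"
}
```

Also note that spark.executor.memory=28g is close to the 32 GB a G.2X worker has in total; if the memory overhead no longer fits, executors can fail to launch, which could produce FAILED statuses like those in the logs. AWS generally discourages overriding these memory settings in Glue jobs.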

WARN ExecutorTaskManagement: executor status Success(FAILED) for 48 g-ba0411327eea7bbe2bc63a063ca1de8b7d166064

INFO TaskGroupInterface: getting status for executor task g-35aa087036b77f0b0b89c45368add14850cefcbb

WARN ExecutorTaskManagement: executor status Success(FAILED) for 27 g-35aa087036b77f0b0b89c45368add14850cefcbb

INFO TaskGroupInterface: getting status for executor task g-dcd8e66de97f52d19293f5caa5328ca1c39f2ac2

WARN ExecutorTaskManagement: executor status Success(FAILED) for 33 g-dcd8e66de97f52d19293f5caa5328ca1c39f2ac2

INFO TaskGroupInterface: getting status for executor task g-4fce6339dba7ec32535a30f5a9525005f0c25a28

WARN ExecutorTaskManagement: executor status Success(FAILED) for 12 g-4fce6339dba7ec32535a30f5a9525005f0c25a28

INFO TaskGroupInterface: getting status for executor task g-83a793602062e370813da0bc9c4f7813312dab07

WARN ExecutorTaskManagement: executor status Success(FAILED) for 15 g-83a793602062e370813da0bc9c4f7813312dab07

INFO TaskGroupInterface: getting status for executor task g-9839e7940677dda3c42efc4654452a422639709f

WARN ExecutorTaskManagement: executor status Success(FAILED) for 8 g-9839e7940677dda3c42efc4654452a422639709f

INFO TaskGroupInterface: getting status for executor task g-16e4047fe0b73994937621a3e830b0ceae8c3a07

WARN ExecutorTaskManagement: executor status Success(FAILED) for 42 g-16e4047fe0b73994937621a3e830b0ceae8c3a07

INFO TaskGroupInterface: getting status for executor task g-0bf4442d874990951edc71a0d569cfbc8d3dff5f

WARN ExecutorTaskManagement: executor status Success(FAILED) for 36 g-0bf4442d874990951edc71a0d569cfbc8d3dff5f

INFO TaskGroupInterface: getting status for executor task g-e76afa0b7974c1ecb0870b052af7135fed1878d3

WARN ExecutorTaskManagement: executor status Success(FAILED) for 21 g-e76afa0b7974c1ecb0870b052af7135fed1878d3

INFO TaskGroupInterface: getting status for executor task g-7369001b1d0764133377e477ea5f9fb8e416715d

WARN ExecutorTaskManagement: executor status Success(FAILED) for 18 g-7369001b1d0764133377e477ea5f9fb8e416715d

INFO TaskGroupInterface: getting status for executor task g-c76a64979a1b31f2f3c1dbb752b62914b6468f51

WARN ExecutorTaskManagement: executor status Success(FAILED) for 24 g-c76a64979a1b31f2f3c1dbb752b62914b6468f51

INFO TaskGroupInterface: getting status for executor task g-f76484471d4ca8fa8048e1de3e1924371bb30cdd

WARN ExecutorTaskManagement: executor status Success(FAILED) for 35 g-f76484471d4ca8fa8048e1de3e1924371bb30cdd

INFO TaskGroupInterface: getting status for executor task g-51b7f66b831d9f65237d9b0e55de0e72d6e729bd

WARN ExecutorTaskManagement: executor status Success(FAILED) for 41 g-51b7f66b831d9f65237d9b0e55de0e72d6e729bd

INFO TaskGroupInterface: getting status for executor task g-40da826f314851e71d3b9e812da4982b00114a19

WARN ExecutorTaskManagement: executor status Success(FAILED) for 7 g-40da826f314851e71d3b9e812da4982b00114a19

INFO TaskGroupInterface: getting status for executor task g-fe46afa0be1ebfb5f75a92bae9ec49ad4d860f7f

WARN ExecutorTaskManagement: executor status Success(FAILED) for 17 g-fe46afa0be1ebfb5f75a92bae9ec49ad4d860f7f

INFO TaskGroupInterface: getting status for executor task g-91ed08b48433369880860643b88de2d75a1de509

WARN ExecutorTaskManagement: executor status Success(FAILED) for 44 g-91ed08b48433369880860643b88de2d75a1de509

INFO TaskGroupInterface: getting status for executor task g-e66dcec9a3608c4c2e8feb78db6d0981200a7706

WARN ExecutorTaskManagement: executor status Success(FAILED) for 23 g-e66dcec9a3608c4c2e8feb78db6d0981200a7706

INFO TaskGroupInterface: getting status for executor task g-6c924ff6f6c421447de5d37850da61730ed8636d

WARN ExecutorTaskManagement: executor status Success(FAILED) for 38 g-6c924ff6f6c421447de5d37850da61730ed8636d

INFO TaskGroupInterface: getting status for executor task g-a5158ab1b4b9f4490c2b513e27b69900c99b61ef

WARN ExecutorTaskManagement: executor status Success(FAILED) for 47 g-a5158ab1b4b9f4490c2b513e27b69900c99b61ef

INFO TaskGroupInterface: getting status for executor task g-3a9aa260fccbd23980d17a1b4af790c6d6c960ba

WARN ExecutorTaskManagement: executor status Success(FAILED) for 26 g-3a9aa260fccbd23980d17a1b4af790c6d6c960ba

INFO TaskGroupInterface: getting status for executor task g-3c3272f213f8b2b480069bbda405e86beb0895d7

WARN ExecutorTaskManagement: executor status Success(FAILED) for 11 g-3c3272f213f8b2b480069bbda405e86beb0895d7

INFO TaskGroupInterface: getting status for executor task g-38cbb4ae9009e9ec921540567e74170496986fb2

WARN ExecutorTaskManagement: executor status Success(FAILED) for 32 g-38cbb4ae9009e9ec921540567e74170496986fb2

INFO TaskGroupInterface: getting status for executor task g-ee91ed94060467a290da4eeaaac5e07a7c9bbb04

WARN ExecutorTaskManagement: executor status Success(FAILED) for 14 g-ee91ed94060467a290da4eeaaac5e07a7c9bbb04

INFO TaskGroupInterface: getting status for executor task g-89ae7c53fc8f7ed049a097e6982808b56690004e

WARN ExecutorTaskManagement: executor status Success(FAILED) for 29 g-89ae7c53fc8f7ed049a097e6982808b56690004e

INFO TaskGroupInterface: getting status for executor task g-cf97b33b99d5c80417373fa7f4e06ca39226fe78

WARN ExecutorTaskManagement: executor status Success(FAILED) for 20 g-cf97b33b99d5c80417373fa7f4e06ca39226fe78

INFO TaskGroupInterface: getting status for executor task g-f6e35d669143a6570c44b1d5dfad5cee39c3d5aa

WARN ExecutorTaskManagement: executor status Success(FAILED) for 46 g-f6e35d669143a6570c44b1d5dfad5cee39c3d5aa

answered a year ago
