Glue Metrics / CloudWatch Alert to SNS (glue.driver.aggregate.elapsedTime)


Hello. In AWS Glue, we would like to raise a CloudWatch alarm when the glue.driver.aggregate.elapsedTime metric exceeds a threshold. We have enabled metrics for the job; however, CloudWatch does not appear to populate the metric until after the job has finished running. Is this normal behavior? Is there a way to alert on a long-running Glue job without cancelling it?

2 Answers

Hello,

AWS Glue profiles jobs and sends metrics to CloudWatch every 30 seconds, and the AWS Glue Metrics Dashboard reports them once a minute. Metrics should therefore start to appear about a minute into the run, not only after the whole job has finished. You can use the glue.driver.aggregate.elapsedTime metric to determine how long a job run takes on average, and set up notifications with a CloudWatch alarm and SNS.
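
As a minimal sketch of such an alarm, the Python below builds the arguments for the CloudWatch `put_metric_alarm` call in boto3. The helper names, the alarm name, and the `TreatMissingData` choice are illustrative assumptions. The dimensions (`JobName`, `JobRunId`, `Type`) and the `Sum` statistic follow the Glue metrics documentation as I understand it, where this metric is reported in milliseconds; verify them against the metric as it appears in your own CloudWatch console. The sketch does not settle whether datapoints arrive while the job is still running.

```python
def elapsed_time_alarm_params(job_name, topic_arn, threshold_ms, period_s=60):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Hypothetical helper: the alarm name and defaults are assumptions.
    """
    return {
        "AlarmName": f"{job_name}-elapsed-time",
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.elapsedTime",
        # Dimensions must match the published metric exactly, or the
        # alarm stays in INSUFFICIENT_DATA.
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": float(threshold_ms),  # milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: no datapoints (job not running) should not alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],  # SNS topic to notify
    }


def create_alarm(params):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder has no dependencies

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

For example, `create_alarm(elapsed_time_alarm_params("my_glue_job_name", topic_arn, 30 * 60 * 1000))` would request an alarm at 30 minutes of elapsed time, where `topic_arn` is the ARN of your own SNS topic.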

Apart from this, you can use the Glue job's built-in "Delay notification threshold" option: if the job runs longer than the specified time (in minutes), Glue sends a delay notification via CloudWatch. [1][2]
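
A rough sketch of the same option set through the API rather than the console: boto3's Glue `start_job_run` accepts a `NotificationProperty` with `NotifyDelayAfter` in minutes. The helper names are hypothetical, and the event pattern is an assumption based on the Glue events documentation, which describes "Glue Job Run Status" events for RUNNING, STARTING, and STOPPING runs that exceed the delay threshold. A CloudWatch Events (EventBridge) rule with this pattern could then target an SNS topic.

```python
def delay_notification_property(minutes):
    """Build the NotificationProperty structure for a Glue job run."""
    if minutes < 1:
        raise ValueError("delay threshold must be at least 1 minute")
    return {"NotifyDelayAfter": int(minutes)}


def delayed_run_event_pattern(job_name):
    """Event pattern for delay notifications of one job.

    Assumption: delayed runs are reported as 'Glue Job Run Status'
    events whose state is RUNNING, STARTING, or STOPPING.
    """
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job Run Status"],
        "detail": {
            "jobName": [job_name],
            "state": ["RUNNING", "STARTING", "STOPPING"],
        },
    }


def start_run_with_delay_notification(job_name, minutes):
    """Start a run with a delay threshold; needs boto3 and credentials."""
    import boto3  # imported lazily so the builders have no dependencies

    response = boto3.client("glue").start_job_run(
        JobName=job_name,
        NotificationProperty=delay_notification_property(minutes),
    )
    return response["JobRunId"]
```

Unlike a job timeout, the delay threshold only emits a notification; the run itself keeps going.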

References:

[1] https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html

[2] https://docs.amazonaws.cn/en_us/glue/latest/dg/automating-awsglue-with-cloudwatch-events.html

AWS
answered 2 months ago
  • This is contrary to my experience. We have set up our CloudWatch metric alarms (in Terraform) with the following variations, and none of them trigger. The metrics appear only after job completion.
      namespace = "Glue"
      metric_name = "glue.driver.aggregate.elapsedTime"
      comparison_operator = "GreaterThanThreshold"
      datapoints_to_alarm = "1"
      evaluation_periods = "1"
      period = "60" and "300"
      threshold = "1000"
      statistic = "Sum" and "Maximum"
      dimensions = {
        JobName = "my_glue_job_name" or blank
        Type = "count" or "gauge"
        Value = "ALL" or blank
      }

  • The Glue job has the following configuration:
      create_event_alarms = "true"
      glue_version = "3.0"
      worker_type = "G.2X"
      alarm_actions = [module.sns_code.topic_arn]
      ok_actions = [module.sns_code.topic_arn]
      insufficient_data_actions = [module.sns_code.topic_arn]

    And we are using the GlueContext:
      sc = SparkContext(conf=config)
      glueContext = GlueContext(sc)
      spark = glueContext.spark_session
      job = Job(glueContext)
      job.init(args['JOB_NAME'], args)

    Options:
      "--enable-continuous-cloudwatch-log" = "true"
      "--enable-metrics" = "true"

    Is the issue due to using Glue version 3.0?

  • Thanks for the suggestion on "Delay notification threshold". I will test this functionality today.


Thank you. I tried the "Delay notification threshold" option and verified that it works: I was able to send an email via SNS using it. Much thanks!

answered 2 months ago
