Glue Metrics / CloudWatch Alert to SNS (glue.driver.aggregate.elapsedTime)


Hello, in AWS Glue we would like to set up a CloudWatch alarm that fires when the glue.driver.aggregate.elapsedTime metric exceeds a threshold. We have enabled the metric in Glue; however, it appears that CloudWatch does not populate the metric until after the job has finished running. Is this normal behavior? Is there a way to alert on a long-running Glue job without cancelling the job?

2 Answers

Hello,

AWS Glue profiles jobs and sends metrics to CloudWatch every 30 seconds, and the AWS Glue Metrics Dashboard reports them once a minute. So metrics should start to appear after about one minute, not only after the entire job has run. Hence you can use the glue.driver.aggregate.elapsedTime metric to determine how long a job run takes on average, and enable notifications using a CloudWatch alarm and SNS.
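As a minimal sketch of such an alarm, the Python below builds the arguments for boto3's CloudWatch put_metric_alarm call. The alarm name, the example SNS topic ARN, and the helper names are hypothetical. The dimension set (JobName, JobRunId = "ALL", Type = "count") and the Sum statistic are my understanding of how Glue publishes this metric, which I believe is reported in milliseconds; please verify both against your own account. The alarm can only fire while the job is running if datapoints actually arrive during the run.

```python
import json


def build_elapsed_time_alarm(job_name, threshold_ms, sns_topic_arn, period_seconds=60):
    """Return keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{job_name}-elapsed-time",  # hypothetical naming scheme
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.elapsedTime",
        # Assumed dimension set; an alarm must match the published set exactly.
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods without datapoints are treated as healthy (job not running).
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder job name and topic ARN, for illustration only.
    example = build_elapsed_time_alarm(
        "my_glue_job_name", 1000, "arn:aws:sns:us-east-1:123456789012:glue-alerts"
    )
    print(json.dumps(example, indent=2))
```

Calling create_alarm(example) would then register the alarm with the SNS topic as its action.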

Apart from this, you can use the Glue job's built-in "Delay notification threshold" option: if the job runs longer than the specified time, Glue sends a delay notification via CloudWatch. [1][2]
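For those who prefer to configure this outside the console, here is a rough sketch using boto3. It assumes the console option corresponds to the NotificationProperty / NotifyDelayAfter setting (in minutes) on a job run, and that the delay notification arrives as an event with source "aws.glue" and detail-type "Glue Job Run Status" carrying a jobName field in its detail. The rule name, target Id, and helper names are hypothetical.

```python
import json


def delay_event_pattern(job_name):
    """Event pattern intended to match delay notifications for one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job Run Status"],
        "detail": {"jobName": [job_name]},  # assumed field name in the event detail
    }


def notification_property(delay_minutes):
    """Build the NotificationProperty structure for a job or job run."""
    if delay_minutes < 1:
        raise ValueError("delay_minutes must be at least 1")
    return {"NotifyDelayAfter": int(delay_minutes)}


def wire_up(job_name, delay_minutes, sns_topic_arn, rule_name="glue-delay-rule"):
    """Create the rule, point it at SNS, and start a run; needs boto3 and credentials."""
    import boto3  # imported lazily so the helpers above work without it

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(delay_event_pattern(job_name)),
        State="ENABLED",
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "sns", "Arn": sns_topic_arn}])
    boto3.client("glue").start_job_run(
        JobName=job_name,
        NotificationProperty=notification_property(delay_minutes),
    )
```

Note that the SNS topic's access policy also has to allow the events service to publish to it, which this sketch does not set up.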

References:

[1] https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html

[2] https://docs.amazonaws.cn/en_us/glue/latest/dg/automating-awsglue-with-cloudwatch-events.html

AWS
answered 1 year ago
  • This is contrary to my experience. We have set up our CloudWatch metric alarms (in Terraform) with the following variations, and none of them are triggering. The metrics appear only after job completion.

        namespace           = "Glue"
        metric_name         = "glue.driver.aggregate.elapsedTime"
        comparison_operator = "GreaterThanThreshold"
        datapoints_to_alarm = "1"
        evaluation_periods  = "1"
        period              = "60" and "300"
        threshold           = "1000"
        statistic           = "Sum" and "Maximum"
        dimensions = {
          JobName = "my_glue_job_name" or blank
          Type    = "count" or "gauge"
          Value   = "ALL" or blank
        }

  • The Glue job is configured with:

        create_event_alarms       = "true"
        glue_version              = "3.0"
        worker_type               = "G.2X"
        alarm_actions             = [module.sns_code.topic_arn]
        ok_actions                = [module.sns_code.topic_arn]
        insufficient_data_actions = [module.sns_code.topic_arn]

    And we are using the GlueContext:

        sc = SparkContext(conf=config)
        glueContext = GlueContext(sc)
        spark = glueContext.spark_session
        job = Job(glueContext)
        job.init(args['JOB_NAME'], args)

    Options:

        "--enable-continuous-cloudwatch-log" = "true"
        "--enable-metrics"                   = "true"

    Is the issue due to using Glue version 3.0?

  • Thanks for the suggestion on "Delay notification threshold". I will test this functionality today.
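One thing that may be worth checking for the alarm variations listed in the comments: a CloudWatch alarm only evaluates a metric whose dimension set matches the published one exactly, so a missing or misnamed dimension leaves the alarm without data. The sketch below, with hypothetical helper names, lists the dimension sets that actually exist for the metric via the list_metrics paginator and checks a candidate alarm configuration against them. It takes the CloudWatch client as a parameter, for example boto3.client("cloudwatch").

```python
def published_dimension_sets(cloudwatch, metric_name="glue.driver.aggregate.elapsedTime"):
    """Collect every dimension set under which the metric has been published."""
    paginator = cloudwatch.get_paginator("list_metrics")
    seen = set()
    for page in paginator.paginate(Namespace="Glue", MetricName=metric_name):
        for metric in page["Metrics"]:
            # Store each set as a frozenset of (name, value) pairs so order is ignored.
            seen.add(frozenset((d["Name"], d["Value"]) for d in metric["Dimensions"]))
    return seen


def alarm_matches(alarm_dimensions, published):
    """True if the alarm's dimensions (a dict) equal one published set exactly."""
    return frozenset(alarm_dimensions.items()) in published
```

If alarm_matches returns False for the dimensions used in the Terraform resource, the alarm is looking at a metric that never receives datapoints, regardless of when Glue publishes them.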


Thank you. I tried the "Delay notification threshold" option and was able to verify that it works: I was able to send an email via SNS using it. Many thanks!

answered 1 year ago
