Glue Metrics / CloudWatch Alert to SNS (glue.driver.aggregate.elapsedTime)


Hello. In AWS Glue, we would like to implement a CloudWatch alarm that fires when the glue.driver.aggregate.elapsedTime metric exceeds a threshold. We have enabled metrics for the Glue job; however, CloudWatch does not appear to populate the metric until after the job has finished running. Is this normal behavior? Is there a way to alert on a long-running Glue job without cancelling the job?

2 Answers

Hello,

AWS Glue profiles the job and sends metrics to CloudWatch every 30 seconds, and the AWS Glue Metrics Dashboard reports them once a minute. So metrics should start to appear after about one minute, not only after the entire job has run. Hence you can use the glue.driver.aggregate.elapsedTime metric to determine how long a job run takes on average, and enable notifications using a CloudWatch alarm and SNS.
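As a rough illustration of such an alarm, the sketch below builds the arguments for CloudWatch's PutMetricAlarm call using boto3. The dimension names (JobName, JobRunId, Type) follow the AWS Glue metrics documentation for this metric, which is reported in milliseconds. The alarm name, period, threshold, and topic ARN are hypothetical placeholders. The sketch does not settle when datapoints arrive or whether each datapoint is cumulative, so it should be checked against a real job run.

```python
from typing import Any, Dict


def build_elapsed_time_alarm(
    job_name: str, threshold_ms: float, sns_topic_arn: str
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm name and the period/threshold choices are illustrative
    assumptions, not recommended values.
    """
    return {
        "AlarmName": f"{job_name}-elapsed-time",  # hypothetical naming scheme
        "Namespace": "Glue",
        "MetricName": "glue.driver.aggregate.elapsedTime",  # milliseconds
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create the alarm; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

A call such as `create_alarm(build_elapsed_time_alarm("my_glue_job_name", 1000, topic_arn))` would submit the alarm, where `topic_arn` is the ARN of an existing SNS topic.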

Apart from this, you can use the Glue job's built-in "Delay notification threshold" option: if the job runs longer than the specified time, Glue sends a delay notification via CloudWatch. [1][2]
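The delay notification can also be wired up programmatically. The following is a minimal sketch, assuming boto3: it builds the `NotificationProperty` value that the Glue API accepts (`NotifyDelayAfter`, in minutes) and an EventBridge (CloudWatch Events) pattern for "Glue Job Run Status" events. The rule name, target ID, and the `jobName` filter on the event detail are assumptions for illustration and should be checked against the events your account actually receives.

```python
import json
from typing import Any, Dict


def delay_notification_property(minutes: int) -> Dict[str, int]:
    """Value for the NotificationProperty argument of Glue job/job-run calls."""
    if minutes < 1:
        raise ValueError("NotifyDelayAfter must be at least 1 minute")
    return {"NotifyDelayAfter": minutes}


def job_run_status_pattern(job_name: str) -> str:
    """EventBridge pattern for delay notifications of one job (JSON string).

    Filtering on detail.jobName is an assumption about the event payload.
    """
    pattern: Dict[str, Any] = {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job Run Status"],
        "detail": {"jobName": [job_name]},
    }
    return json.dumps(pattern)


def route_delay_events_to_sns(job_name: str, sns_topic_arn: str) -> None:
    """Create a rule that forwards matching events to SNS (not run here).

    Needs boto3 and AWS credentials; the SNS topic policy must also allow
    events.amazonaws.com to publish to the topic.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    events = boto3.client("events")
    rule_name = f"{job_name}-delay-notification"  # hypothetical rule name
    events.put_rule(Name=rule_name, EventPattern=job_run_status_pattern(job_name))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sns-target", "Arn": sns_topic_arn}],  # hypothetical Id
    )
```

The threshold itself would be passed when starting a run, for example `glue.start_job_run(JobName=name, NotificationProperty=delay_notification_property(30))`, or set once on the job in the console.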

--Reference:

[1] https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html

[2] https://docs.amazonaws.cn/en_us/glue/latest/dg/automating-awsglue-with-cloudwatch-events.html

AWS
answered a year ago
  • This is contrary to my experience. We have set up our CloudWatch metric alarms (in Terraform) with the following variations, and none of them trigger. The metrics appear only after job completion.

        namespace           = "Glue"
        metric_name         = "glue.driver.aggregate.elapsedTime"
        comparison_operator = "GreaterThanThreshold"
        datapoints_to_alarm = "1"
        evaluation_periods  = "1"
        period              = "60" and "300"
        threshold           = "1000"
        statistic           = "Sum" and "Maximum"
        dimensions = {
          JobName = "my_glue_job_name" or blank
          Type    = "count" or "gauge"
          Value   = "ALL" or blank
        }

  • The Glue job is configured with:

        create_event_alarms       = "true"
        glue_version              = "3.0"
        worker_type               = "G.2X"
        alarm_actions             = [module.sns_code.topic_arn]
        ok_actions                = [module.sns_code.topic_arn]
        insufficient_data_actions = [module.sns_code.topic_arn]

    And we are using the GlueContext:

        sc = SparkContext(conf=config)
        glueContext = GlueContext(sc)
        spark = glueContext.spark_session
        job = Job(glueContext)
        job.init(args['JOB_NAME'], args)

    Options:

        "--enable-continuous-cloudwatch-log" = "true"
        "--enable-metrics"                   = "true"

    Is the issue due to using Glue version 3.0?

  • Thanks for the suggestion on "Delay notification threshold". I will test this functionality today.


Thank you. I tried the "Delay notification threshold" option and verified that it works: I was able to send an email via SNS using it. Many thanks!

answered a year ago
