Metric log error - ThroughputMetricsSource: Metric: is already registered by a different accumulator

Hi, when running the same job concurrently, I see the error below in the logs. Is there a way to resolve it?

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

AWS

asked 2 months ago · 210 views

1 Answer

In AWS Glue, this error can occur when multiple Glue jobs are running concurrently and attempting to register the same metric name for their Spark accumulators. Accumulators in Spark are variables that are used to aggregate information across tasks. In the context of AWS Glue, which is built on top of Apache Spark, these accumulators might be used for metrics like tracking the number of records written to an S3 bucket.
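To make the failure mode concrete, here is a toy illustration (plain Python, not Spark's actual metric registry) of why registering the same metric name twice raises an error: the registry enforces unique names, so a second concurrent run colliding on the same name is rejected.

```python
class MetricRegistry:
    """Toy stand-in for a metric registry that enforces unique names.
    Spark's real registry behaves analogously for named metrics."""

    def __init__(self):
        self._metrics = {}

    def register(self, name, accumulator):
        if name in self._metrics:
            # Mirrors the IllegalArgumentException seen in the Glue logs.
            raise ValueError(f"A metric named {name} already exists.")
        self._metrics[name] = accumulator


registry = MetricRegistry()
metric_name = "s3://my-bucket/orchestration/logs.recordsWritten"

registry.register(metric_name, object())  # first run registers fine
try:
    # A second concurrent run attempts to register the same name.
    registry.register(metric_name, object())
except ValueError as e:
    print(e)
```

This is why two concurrent jobs writing to the same S3 path (and therefore deriving the same metric name) trip over each other.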

The error message you're seeing:

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

indicates that the metric named s3://<my-bucket>/orchestration/logs.recordsWritten is being registered more than once, which is not allowed. This can happen when multiple Glue jobs are using the same metric name simultaneously.

To resolve this issue, you need to ensure that each Glue job uses a unique name for its metrics.
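One way to guarantee uniqueness, sketched below with only the standard library, is to derive a run-specific S3 prefix from a timestamp plus a short random suffix, so each concurrent run writes to (and registers metrics for) its own path. The helper name and path layout are illustrative, not a Glue requirement.

```python
import uuid
from datetime import datetime, timezone


def unique_output_path(base_path: str) -> str:
    """Append a UTC timestamp and a short random suffix to base_path so
    each concurrent job run gets its own S3 prefix, which in turn yields
    a unique metric name for that run."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{base_path.rstrip('/')}/{stamp}-{uuid.uuid4().hex[:8]}/"


p1 = unique_output_path("s3://my-bucket/orchestration/logs")
p2 = unique_output_path("s3://my-bucket/orchestration/logs")
assert p1 != p2  # two concurrent runs get distinct prefixes
print(p1)
```

In a Glue script you would pass the resulting path to your sink (for example, the `connection_options` path of `write_dynamic_frame`) instead of a fixed prefix.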

EXPERT
answered 2 months ago
  • Thanks for your response. Is there a way to resolve this using Glue Studio's visual editor instead of scripting? All my job properties, including concurrency, are set in the Job details tab, and the job itself is called from Step Functions.

  • I think you can resolve the issue by modifying your Glue jobs to write output to a unique S3 path for each concurrent run, incorporating dynamic elements such as job run IDs or timestamps into the S3 output paths. However, Glue Studio's visual interface alone does not let you configure such dynamic elements directly; this typically requires scripting or passing dynamic parameters to the job.

    In Glue Studio, you can set job parameters and use them in your job script, but the generation of dynamic elements like timestamps would need to be handled within the script itself, rather than through the visual interface.
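Since the job is started from Step Functions, one option is to have the state machine pass a per-execution value as a job argument (for example, `"Arguments": {"--RUN_SUFFIX.$": "$$.Execution.Name"}` in the Glue `StartJobRun` task) and build the output path from it in the script. The sketch below uses a minimal stand-in for `awsglue.utils.getResolvedOptions` so it runs anywhere; a real Glue script would call `getResolvedOptions` instead, and the `RUN_SUFFIX` parameter name is an assumption, not a Glue built-in.

```python
import sys


def resolve_option(argv, name):
    """Minimal stand-in for awsglue.utils.getResolvedOptions: returns the
    value following --<name> in the argument list."""
    flag = "--" + name
    for i, arg in enumerate(argv):
        if arg == flag and i + 1 < len(argv):
            return argv[i + 1]
    raise KeyError(f"missing required argument {flag}")


# Sample argv standing in for what Glue passes when Step Functions supplies
# the execution name as --RUN_SUFFIX (illustrative value).
argv = ["job.py", "--RUN_SUFFIX", "exec-20240101-abc123"]
run_suffix = resolve_option(argv, "RUN_SUFFIX")

# Scope the output path to this run so concurrent runs never share a prefix.
output_path = f"s3://my-bucket/orchestration/logs/{run_suffix}/"
print(output_path)
```

Because each Step Functions execution name is unique, concurrent runs end up with distinct S3 prefixes and therefore distinct metric names.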
