Glue Throttling Exception


We use Step Functions for our ETL pipeline. The first step kicks off 21 Glue jobs, each taking about 1-3 minutes and consuming 2 DPUs. The Step Function fails with the error below when it tries to run more than 15 Glue jobs in parallel. We use the arn:aws:states:::glue:startJobRun.sync task to invoke the jobs synchronously. Is there a quota I need to request an increase for? Kicking off 21 jobs in parallel seems pretty reasonable.

{ "resourceType": "glue", "resource": "startJobRun.sync", "error": "Glue.AWSGlueException", "cause": "Rate exceeded (Service: AWSGlue; Status Code: 400; Error Code: ThrottlingException; Proxy: null)" }

asked 2 months ago, 277 views
1 Answer

Hi,

You may want to read this Knowledge Center article on minimizing throttling: https://repost.aws/knowledge-center/glue-throttling-rate-exceeded

API throttling is a "normal" situation when you make too many service requests in parallel: you handle those exceptions with the standard technique of exponential backoff, retrying until all requests are accepted. Step Functions is a good mechanism for implementing such a backoff.

See https://en.wikipedia.org/wiki/Exponential_backoff

To be more specific to AWS, this page is interesting: https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
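As a rough illustration of the technique, here is a minimal Python sketch of exponential backoff with full jitter. `ThrottledError`, `backoff_delay`, and `call_with_backoff` are hypothetical names made up for this example, and the base delay, cap, and attempt count are arbitrary assumptions, not AWS recommendations.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error such as Glue's ThrottlingException."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Full-jitter delay: a random value between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(fn, max_attempts=6, base=1.0, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ThrottledError with exponentially growing, jittered waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
```

If you call Glue directly with boto3 rather than through Step Functions, the equivalent check would be to catch `botocore.exceptions.ClientError` and retry only when `e.response["Error"]["Code"]` is `"ThrottlingException"`. The jitter matters here because 21 jobs that are throttled together and retry on the same schedule would otherwise collide again.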

Step Functions mechanisms for implementing such a backoff strategy: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html
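For the task in the question, a `Retry` block on the Glue state might look like the sketch below, written as a Python dict that serializes to Amazon States Language JSON. The resource ARN and the `Glue.AWSGlueException` error name come from the question; the job name `my-etl-job` and all numeric values are assumptions to adjust for your workload, and `retry_waits` is a hypothetical helper that only previews the schedule.

```python
import json

# Hypothetical job name; the resource ARN and error name come from the question.
glue_task_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {"JobName": "my-etl-job"},
    "Retry": [
        {
            "ErrorEquals": ["Glue.AWSGlueException"],
            "IntervalSeconds": 5,      # wait before the first retry
            "BackoffRate": 2.0,        # multiplier applied after each retry
            "MaxAttempts": 6,          # number of retries before the state fails
            "MaxDelaySeconds": 60,     # upper bound on any single wait
            "JitterStrategy": "FULL",  # randomize waits so parallel retries spread out
        }
    ],
    "End": True,
}


def retry_waits(retrier):
    """Upper bound (before jitter) of the wait preceding each retry attempt."""
    return [
        min(retrier["MaxDelaySeconds"],
            retrier["IntervalSeconds"] * retrier["BackoffRate"] ** i)
        for i in range(retrier["MaxAttempts"])
    ]


# JSON form of the state, ready to paste into a state machine definition.
asl_json = json.dumps(glue_task_state, indent=2)
```

Note that `Glue.AWSGlueException` matches every Glue service error, not only throttling, so a genuinely failing request would also be retried up to `MaxAttempts` times before the state fails.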

Best,

Didier

AWS EXPERT
answered 2 months ago
