Glue Job for S3 Event

Question

Hi,

I have created a Glue job to trigger it for S3 event. So I have below design

S3 Bucket ---> SQS ---> Lambda ---> Trigger Glue job from Lambda.

I am facing below error when multiple files are placed/uplaoded in S3 bucket. I understand this can be fixed by setting the "max concurrency" in the glue job parameters. But in production we are not sure file count. Do we have any option to handle the "max concurrency" at job runtime/dynamically ?

An error occurred (ConcurrentRunsExceededException) when calling the StartJobRun operation: Concurrent runs exceeded

Answer

Thanks for the comment. 
Do you expect 1 zip archive to take 15 minutes to extract?

An alternative which you can pass parameters too and I do this is to launch an ECS task and pass in your file location. 
You can use a docker image to extract the file and pass in a variable by using the cmdoverride parameter to pass in the file location.

Happy to share some boto3 python code and docker file config.

This would be very similar to a glue job but runs on an ECS  cluster.

Answer

* A good question as an SA, is what are you trying to do? Whats the glue job doing?
* Could you not process the file directly using lambda with an S3 data event instead of using Glue? Likely it could be more efficient this way too

Glue Job for S3 Event

Relevant content