Glue Job for S3 Event


Hi,

I have created a Glue job that is triggered by S3 events, using the design below:

S3 Bucket ---> SQS ---> Lambda ---> Trigger Glue job from Lambda.

I am facing the error below when multiple files are placed/uploaded in the S3 bucket. I understand this can be mitigated by raising the "max concurrency" setting in the Glue job parameters, but in production we are not sure of the file count. Do we have any option to handle the "max concurrency" at job runtime/dynamically?

An error occurred (ConcurrentRunsExceededException) when calling the StartJobRun operation: Concurrent runs exceeded
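
For reference, the Lambda does roughly the following (a simplified sketch; the job name and argument key are placeholders). One option we are considering is to catch the exception and re-raise it, so the SQS message stays on the queue and is redelivered after the visibility timeout, by which time a concurrency slot may have freed up:

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # One SQS record per S3 notification (batch size 1 assumed).
        for record in event["Records"]:
            try:
                # "my-transfer-job" is a placeholder Glue job name.
                glue.start_job_run(
                    JobName="my-transfer-job",
                    Arguments={"--s3_event": record["body"]},
                )
            except glue.exceptions.ConcurrentRunsExceededException:
                # Failing the invocation makes SQS keep the message and
                # retry it later instead of dropping the file.
                raise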

Bharath
asked 20 days ago · 115 views
2 Answers
  • A good question to ask, as an SA, is: what are you trying to do? What is the Glue job doing?
  • Could you not process the file directly using Lambda with an S3 data event instead of using Glue? It would likely be more efficient that way too.
EXPERT
answered 20 days ago
  • Thank you Gary Mclean for the suggestion.

    We use the Glue job to transfer files from a source S3 bucket to a target S3 bucket (if the files are compressed in the source, they are decompressed while being uploaded to the target bucket).

    We thought of using Lambda for this process, but as you know Lambda has a 15-minute limit, so handling large files could be challenging.


Thanks for the comment. Do you expect a single zip archive to take more than 15 minutes to extract?

An alternative that you can pass parameters to (and this is what I do) is to launch an ECS task and pass in your file location. You can use a Docker image to extract the file, passing the file location in via the container command override.

Happy to share some boto3 Python code and Dockerfile config; a sketch is below.

This would be very similar to a Glue job, but it runs on an ECS cluster.
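
For example, a minimal boto3 sketch of that pattern (the cluster, task definition, container name, and subnet ID are placeholders for your own resources):

    import boto3

    ecs = boto3.client("ecs")

    def launch_extract_task(bucket: str, key: str) -> None:
        # Cluster, task definition, container name and subnet below are
        # placeholders for your own ECS/Fargate resources.
        ecs.run_task(
            cluster="extract-cluster",
            taskDefinition="file-extractor:1",
            launchType="FARGATE",
            networkConfiguration={
                "awsvpcConfiguration": {
                    "subnets": ["subnet-0123456789abcdef0"],
                    "assignPublicIp": "DISABLED",
                }
            },
            overrides={
                "containerOverrides": [
                    {
                        # The command override passes the file location
                        # into the container at launch time.
                        "name": "extractor",
                        "command": ["python", "extract.py",
                                    f"s3://{bucket}/{key}"],
                    }
                ]
            },
        )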

EXPERT
answered 18 days ago
  • Thank you Gary for the input!

    Could you please share the sample code?

    I tried to process a single 40 GB (uncompressed) file using Lambda and got a MemoryError, whereas it is processed successfully in the Glue job:

    obj = s3.get_object(Bucket=bucket_name, Key=key)
    buffer = io.BytesIO(obj["Body"].read())
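
    The MemoryError comes from obj["Body"].read() loading the whole 40 GB object into memory at once. If the goal is to decompress while copying, a streaming approach avoids that; below is a minimal sketch assuming a gzip-compressed source object (the bucket and key arguments are placeholders):

        import gzip

        import boto3

        s3 = boto3.client("s3")

        def copy_decompressed(src_bucket, src_key, dst_bucket, dst_key):
            # The response Body is a file-like streaming object, so gzip
            # can wrap it directly without reading it all into memory.
            body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
            with gzip.GzipFile(fileobj=body) as stream:
                # upload_fileobj streams the decompressed bytes to the
                # target bucket as a multipart upload.
                s3.upload_fileobj(stream, dst_bucket, dst_key)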
