2 Answers
- A good question, as an SA, is: what are you trying to do? What is the Glue job doing?
- Could you not process the file directly with Lambda, triggered by an S3 data event, instead of using Glue? It would likely be more efficient that way too.
Thanks for the comment. Do you expect a single zip archive to take more than 15 minutes to extract?
An alternative, which you can pass parameters to (and which I use myself), is to launch an ECS task and pass in your file location. You can use a Docker image to extract the file, passing the file location in through the command override (the `containerOverrides` parameter) when you run the task.
Happy to share some boto3 Python code and Dockerfile config; a rough sketch of the run_task call follows below.
This would be very similar to a Glue job, but it runs on an ECS cluster.
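A minimal sketch of that pattern, assuming a Fargate launch type and placeholder names for the cluster, task definition, container, and subnet:

```python
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="my-cluster",                  # placeholder cluster name
    taskDefinition="unzip-task:1",         # placeholder task definition
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet
            "assignPublicIp": "ENABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": "extractor",       # container name from the task definition
                # the container's entrypoint reads this argument at startup
                "command": ["s3://source-bucket/path/to/archive.zip"],
            }
        ]
    },
)
```

The container's entrypoint can then read the S3 URI from its arguments, download the archive, extract it, and upload the results, with no 15-minute ceiling.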
Thank you Gary for the input!
Could you please share sample code?
I tried to process a single 40 GB file (uncompressed) using Lambda and got a MemoryError, whereas it processed successfully in a Glue job:

```python
obj = s3.get_object(Bucket=bucketname, Key=key)
buffer = io.BytesIO(obj["Body"].read())
```
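For context, the MemoryError comes from `obj["Body"].read()` loading the entire 40 GB object into RAM at once. A sketch of the streaming alternative, with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="source-bucket", Key="path/to/large-file")

# StreamingBody.iter_chunks() yields the object piece by piece,
# so memory use stays bounded regardless of the object's size.
with open("/tmp/large-file", "wb") as f:
    for chunk in obj["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
        f.write(chunk)
```

Even with streaming, Lambda's ephemeral storage tops out at 10 GB, so a 40 GB file still points toward Glue or ECS.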
Thank you Gary Mclean for the suggestion.
We use the Glue job to transfer files from a source S3 bucket to a target S3 bucket (if the files are compressed in the source, we decompress them while uploading to the target bucket).
We considered using Lambda for this process, but as you know Lambda has a 15-minute limit, so handling large files could be a challenge.
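For the decompress-while-copying step itself, a rough sketch of what such a job could run, assuming gzip-compressed objects and placeholder bucket and key names (zip archives need a seekable source, so that case works differently):

```python
import gzip
import boto3

s3 = boto3.client("s3")

def decompress_copy(src_bucket, src_key, dst_bucket, dst_key):
    # Stream the compressed object and decompress it on the fly,
    # so the whole file never has to fit in memory at once.
    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    with gzip.GzipFile(fileobj=body) as uncompressed:
        # upload_fileobj runs a managed multipart upload from the stream
        s3.upload_fileobj(uncompressed, dst_bucket, dst_key)

decompress_copy("source-bucket", "in/data.csv.gz", "target-bucket", "out/data.csv")
```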