Storing the output of an AWS Batch job in a file

Hi everyone,

Sorry for the basic questions, but I haven't been able to find any answers online yet.

I regularly run time-consuming data-analytics jobs on AWS Batch. Each job is essentially a Docker image containing Python files plus a range of data inputs; I push the image to ECR and then run the job on AWS Batch. However, I haven't found a proper way to retrieve the output of the jobs, which is why I simply "print" the data I need and then read it from the job log in CloudWatch. But that cannot be the proper way. So:

What is the best and simplest way to get AWS Batch to store my outputs in a .txt, .csv, or even pickle file? And how do I then retrieve these files from AWS?

Thanks a lot in advance. And if I am using the wrong service altogether for this task, do also let me know!

1 Answer

Hi JRK - I would recommend saving your output data to S3; then you can download the files at any time in the future. There are multiple storage classes (read: cheaper options), and lifecycle rules let you auto-delete the files after X days.

If you are new to AWS here are some basic steps:

  1. Create an S3 bucket: https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html

  2. If you don't already have an IAM role for your containers, this link shows you how to create one: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html

  2a. If you already have a role for your containers, add permissions to the role so they can access your S3 bucket: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_rw-bucket.html Note that the example in 2a is a bit permissive for many company environments, as it grants * access. You can start with "PutObject"; that's the bare minimum needed to upload data to S3.

  3. The next step is to gather your data and upload it; the details depend on the scripting/programming of your job. With the CLI it could be as simple as "aws s3 cp filename.txt s3://mybucket/folder/" (a Python/boto3 equivalent is sketched after this list). Documentation on uploading to S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html CLI installers: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

  4. Once uploaded, you can browse the data in the console, view it with the CLI on your laptop (use "aws s3 ls s3://mybucket"), or use the SDK if you desire.
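Since your jobs are already in Python, the upload in step 3 and the retrieval in step 4 can also be done with the AWS SDK for Python (boto3) instead of the CLI. Here is a minimal sketch, assuming boto3 is installed in the job image, the job's IAM role has the "PutObject" permission from step 2a (plus "GetObject" for whoever downloads later), and "my-batch-results" is a placeholder for your own bucket name:

```python
import os
import pickle

import boto3

BUCKET = "my-batch-results"  # placeholder bucket name
s3 = boto3.client("s3")

def save_results(results):
    """Inside the Batch job: write the output locally, then upload it (step 3)."""
    local_path = "/tmp/results.pkl"
    with open(local_path, "wb") as f:
        pickle.dump(results, f)
    # AWS Batch sets AWS_BATCH_JOB_ID in the container, so keying the
    # object by job ID gives each run its own path in the bucket.
    job_id = os.environ.get("AWS_BATCH_JOB_ID", "local-test")
    s3.upload_file(local_path, BUCKET, f"outputs/{job_id}/results.pkl")

def fetch_results(job_id):
    """Later, e.g. on your laptop: download and unpickle the output (step 4)."""
    s3.download_file(BUCKET, f"outputs/{job_id}/results.pkl", "results.pkl")
    with open("results.pkl", "rb") as f:
        return pickle.load(f)
```

The same pattern works for .txt or .csv output: write the file however you normally do and pass its path to upload_file.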

Notes:

a) For a small quantity of text data, S3 will be very inexpensive; you might even be in the free tier. Standard storage or Intelligent-Tiering are good options when uploading: https://aws.amazon.com/s3/storage-classes/

b) Lifecycle rules can help you manage your data, say auto-delete after 90 days, or transition to a different (cheaper) storage tier (see the sketch below): https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html
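For note b), the auto-delete rule can be set once in the console or with a short script. Another hedged sketch, using the same placeholder bucket name (the caller needs the "PutLifecycleConfiguration" permission on the bucket):

```python
import boto3

# Expire objects under outputs/ after 90 days. Run this once per
# bucket, not inside every job.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-batch-results",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-outputs",
                "Filter": {"Prefix": "outputs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```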
