How to set up Step Functions to split a DataFrame and process it on EC2


Hi everyone, I'm a student doing research, and part of that research requires me to download over 2 million files and analyse them. I have a Python script that does what I need; however, I would like to split the DataFrame into 256 chunks and run each chunk on a separate EC2 instance to download the files into an S3 bucket. Then, as each file lands in that bucket, I would like a second Python script to run and analyse it. I know this can be done a number of ways, but I'm hoping someone can steer me in the right direction. From what little I have done with AWS, I'm thinking something like Step Functions could help achieve this?
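
For illustration, here is a rough sketch of the 256-way split described above, assuming the DataFrame holds one file URL per row and the chunks are staged in S3 for the workers to pick up; the column name, bucket, and key prefix are hypothetical placeholders, not anything from the question:

```python
# Rough sketch only: the DataFrame layout, bucket name, and prefix are assumptions.
import boto3
import numpy as np
import pandas as pd

def upload_chunks(df: pd.DataFrame, bucket: str, prefix: str, n_chunks: int = 256) -> None:
    """Split df into n_chunks pieces and store each piece as a CSV in S3,
    so each EC2 instance (or other worker) can pick up exactly one chunk."""
    s3 = boto3.client("s3")
    for i, chunk in enumerate(np.array_split(df, n_chunks)):
        s3.put_object(
            Bucket=bucket,
            Key=f"{prefix}/chunk-{i:03d}.csv",
            Body=chunk.to_csv(index=False).encode("utf-8"),
        )
```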

Thanks in advance, Tom

1 Answer

I would look into Lambda functions. You will have two buckets: one for the large files and one for the small files. One function will be triggered by the first bucket; it will read the file, split it into multiple smaller files, and save them in the second bucket. The second function will be triggered by the second bucket and will run the analysis on the small files.

This assumes that a large file fits within a Lambda function's memory and storage limits, and that splitting a large file and analysing a small file each take less than 15 minutes (the Lambda timeout limit).
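
A minimal sketch of the two handlers described above, assuming CSV input and standard S3 event triggers; the bucket name, the 256 chunk count, and the analysis step are placeholders, not part of the answer:

```python
# Rough sketch only: SMALL_FILES_BUCKET, the chunk count, and the analysis body are assumptions.
import io
import boto3
import numpy as np
import pandas as pd

s3 = boto3.client("s3")
SMALL_FILES_BUCKET = "my-small-files-bucket"  # hypothetical second bucket

def split_handler(event, context):
    """Triggered by the first (large files) bucket: split the file into chunks."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))
    for i, chunk in enumerate(np.array_split(df, 256)):
        s3.put_object(
            Bucket=SMALL_FILES_BUCKET,
            Key=f"{key}/chunk-{i:03d}.csv",
            Body=chunk.to_csv(index=False).encode("utf-8"),
        )

def analyze_handler(event, context):
    """Triggered by the second (small files) bucket: analyse one chunk."""
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))
    # ... run the analysis here (placeholder for the second Python script) ...
```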

Uri (AWS Expert)
answered 2 years ago
