I'm building an ML inference function on AWS Lambda. I am currently using PyTorch, PyTorch Geometric, and some saved models. The image size is ~7 GB. I have increased my size limit for Lambda functions to 10 GB.
Using the exact commands provided by ECR, I built and pushed my image to ECR. I also verified that the app.py handler code works and gives the expected output. So everything builds and runs correctly on my local machine when I docker run the image and docker exec into my running container. Great!
I am also able to successfully push to ECR. No complaints.
The problem arises when I try to create a Lambda function that uses that image. Both the repository and the function are in us-west-2, and I think I have all the IAM roles correct (I've done this before for another ECR/Lambda pair that uses TensorFlow). But when I try to create the function, it runs for <30s and then I see a "The provided image is invalid" error.
My questions are:
- Is the problem with my Dockerfile?
- If yes, what are the conditions necessary for a Docker image to build and run successfully locally, but not on lambda?
- Could it be an IAM role issue?
- Is there some reason that lambda doesn't support PyTorch Geometric?
- What does "The provided image is invalid" mean? How can I find out more about the cause of this error?
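To make the last question concrete, here is a minimal sketch of how I could inspect what ECR actually stored for the image, assuming boto3 and credentials are available. The repository name and tag in the usage comment are placeholders, not my real values, and the helper names are made up for illustration. The set of accepted manifest types reflects my understanding of the Lambda documentation (Docker manifest V2 schema 2 and OCI single-image manifests, no multi-architecture indexes) and may be incomplete. I am not claiming this is the cause of the error, only that it is one thing I can check.

```python
# Manifest media types that, as far as I understand, Lambda accepts
# for single-architecture container images (an assumption, not exhaustive).
SUPPORTED_MANIFEST_TYPES = {
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
}


def summarize_image_detail(detail):
    """Extract the Lambda-relevant fields from one ECR `imageDetails` entry."""
    media_type = detail.get("imageManifestMediaType")
    size_bytes = detail.get("imageSizeInBytes", 0)
    return {
        "tags": detail.get("imageTags", []),
        "manifest_media_type": media_type,
        "manifest_type_ok": media_type in SUPPORTED_MANIFEST_TYPES,
        # ECR reports the compressed size; the uncompressed image is larger.
        "compressed_size_gib": round(size_bytes / 2**30, 2),
    }


def inspect_ecr_image(repository, tag, region="us-west-2"):
    """Query ECR for one tagged image and summarize each returned entry."""
    import boto3  # imported lazily so the helper above works without AWS access

    ecr = boto3.client("ecr", region_name=region)
    response = ecr.describe_images(
        repositoryName=repository,
        imageIds=[{"imageTag": tag}],
    )
    return [summarize_image_detail(d) for d in response["imageDetails"]]


# Hypothetical usage (placeholder repository and tag):
# for summary in inspect_ecr_image("my-inference-repo", "latest"):
#     print(summary)
```

If the manifest type turns out to be an image index or manifest list, that would at least tell me the image was pushed as a multi-architecture image rather than a single one.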
My only guess is that torch-scatter sometimes needs to be installed before the other packages. But if that were the problem, I would expect the Docker image to fail to build.
Here's my Dockerfile:
FROM public.ecr.aws/lambda/python:3.8
COPY app.py ${LAMBDA_TASK_ROOT}
COPY . .
RUN python3 -m pip install -r requirements.txt --target ${LAMBDA_TASK_ROOT}
The directory also contains some saved PyTorch Geometric models, PyTorch models, and data files (CSVs, tensors).
Thank you!