Best design choice for a custom serverless ML application


Hi, how would one go about designing a serverless ML application in AWS?

Currently, our project uses the Serverless Framework and Lambda functions to accomplish this. The user makes a POST API call to the backend (BE) and receives a cache ID. The BE makes a caching request to the model (which is packaged as a Docker image in Lambda) and saves the results in S3 under the corresponding cache ID. To get the results back, the client repeats a GET API call with that cache ID until the results are returned. Essentially, we implemented polling.
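The polling step described above could be sketched on the client side roughly as follows. This is a minimal sketch, not the project's actual code: `poll_for_result`, the `fetch` callable (standing in for the GET call with the cache ID), and the timeout and delay defaults are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch: Callable[[str], Optional[bytes]],
    cache_id: str,
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(cache_id) until it returns a result or timeout_s elapses.

    fetch is a hypothetical wrapper around the GET API call; it should
    return None while the result is not yet available in S3.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        result = fetch(cache_id)
        if result is not None:
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"no result for cache ID {cache_id!r} within {timeout_s}s")
        sleep(delay)
        # Back off exponentially so a slow model run does not cause a flood of GETs.
        delay = min(delay * 2, max_delay_s)
```

With a model run of 30-40 seconds or more, backing off between requests keeps the number of GET calls small without adding much latency once the result lands.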

For simplicity, when I use the term "model" I mean packaged code that may transform inputs, make API calls, run an actual model that makes predictions or generates data, and transform outputs.

The problem we are running into right now is that the Lambda function hosting the model can take 30-40 seconds to run, yet the caching Lambda function does not receive the results for 2 to 3 minutes. This suggests to me that the Lambda function hosting the model is running into a cold start problem, as its image is relatively large for Lambda. (I would love to hear your thoughts on this.)

The two solutions I was leaning toward were:

  1. Use AWS Fargate, as it seems to be the serverless equivalent for long-running Lambda functions. This would play well with the Serverless Framework, as it already has a plug-in for Fargate. However, the problem with Fargate is that it does not seem possible to pass long-form information the way you can to the Lambda event handler. ECS.Client.run_task() doesn't have a payload option other than the overrides parameter, which, as far as I can tell, only lets you change environment variables, and those are limited in size. For reference, we are passing file size information. To overcome this, it seems I would have to combine the caching function (which essentially makes GET and PUT requests to S3) and the model into one, and that would make debugging harder, since keeping the two separate makes debugging a lot simpler.
  2. Use SageMaker, as it is built for long-term machine learning development. However, my main concern with SageMaker is that it doesn't seem to be meant for serverless applications and does not currently work with the Serverless Framework. Also, I have tried following the steps from here and here to build a custom model hosted in ECR and deployed using SageMaker; however, whenever I make calls to it I get an EndpointConnectionError: Could not connect to the endpoint URL: " even though it says the endpoint is IN_SERVICE. (Help here would also be great.)
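For the Fargate option, one workaround that keeps the caching function and the model separate is to write the large payload to S3 first and pass only its bucket and key through the environment overrides of `run_task`. The sketch below assumes that approach; it is not something the question confirms. The helper names, the `PAYLOAD_BUCKET` and `PAYLOAD_KEY` variable names, and the choice of `assignPublicIp="DISABLED"` are hypothetical, while `put_object` and `run_task` with `overrides` and `networkConfiguration` are the standard boto3 calls. The `s3` and `ecs` arguments are expected to be `boto3.client("s3")` and `boto3.client("ecs")`.

```python
from typing import Any, Dict, List


def build_overrides(container_name: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build the run_task overrides that point the container at the staged payload."""
    return {
        "containerOverrides": [
            {
                "name": container_name,
                # Hypothetical variable names; the container reads them at startup.
                "environment": [
                    {"name": "PAYLOAD_BUCKET", "value": bucket},
                    {"name": "PAYLOAD_KEY", "value": key},
                ],
            }
        ]
    }


def stage_payload_and_run(
    s3,
    ecs,
    *,
    payload: bytes,
    bucket: str,
    key: str,
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: List[str],
) -> Dict[str, Any]:
    """Upload the payload to S3, then start a Fargate task that knows where to find it."""
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    return ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        # Assumption: private subnets with a route to S3 and ECR.
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "DISABLED"}
        },
        overrides=build_overrides(container_name, bucket, key),
    )
```

Inside the container, the model code would read the two environment variables and fetch the object from S3, so only a short pointer ever travels through the size-limited overrides.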

Also, having this work with the Serverless Framework would be nice because we run both a production and a development live service for easier debugging. This fits the Serverless Framework well, as you can easily specify stages.

Thank you for reading this far. Any information, links, or posts would be extremely helpful. Also, this is my first post so please be nice!

1 Answer
Accepted Answer


SageMaker endpoints are well suited to what you are trying to achieve in a serverless architecture: you can call such endpoints from a Lambda function triggered by API Gateway.
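A Lambda function in front of a SageMaker endpoint might look roughly like the sketch below. It assumes an API Gateway proxy event with a JSON string in `body` and an endpoint that accepts `application/json`; `make_handler` and the `ENDPOINT_NAME` environment variable are hypothetical names. `invoke_endpoint` on the `sagemaker-runtime` client is the standard boto3 call.

```python
from typing import Any, Callable, Dict


def make_handler(runtime_client, endpoint_name: str) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Return a Lambda handler that forwards the request body to a SageMaker endpoint."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # API Gateway proxy integration delivers the request body as a string.
        body = (event.get("body") or "{}").encode("utf-8")
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",  # assumption: the container accepts JSON
            Body=body,
        )
        # The response body is a stream; read it fully before returning.
        prediction = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": prediction}

    return handler


# In the deployed function (hypothetical wiring):
#   import os, boto3
#   handler = make_handler(boto3.client("sagemaker-runtime"), os.environ["ENDPOINT_NAME"])
```

The client resolves the endpoint in its own configured region, so the Lambda function and the endpoint need to be in the same region unless `region_name` is passed explicitly when creating the client.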

This post will give you full details about such an architecture:



AWS
answered 8 months ago
  • Thank you for your advice!
