SageMaker hourly inference


I want to use multiple (possibly hundreds of) SageMaker RCF (Random Cut Forest) models for regular inference at 1-hour intervals.

The solution for real-time inference is to create a multi-model endpoint, which can house many models, and then call this endpoint on demand. I don't think this would be suitable for our use case, since the endpoint would be unused for 59 minutes and then overloaded with requests for one minute.
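For context, calling a multi-model endpoint means passing a `TargetModel` parameter naming which artifact to run; SageMaker loads that `.tar.gz` lazily on first use. A minimal sketch of building such a request (the endpoint name and model key below are hypothetical):

```python
def build_invoke_request(endpoint_name: str, model_key: str, payload: str) -> dict:
    """Build kwargs for the sagemaker-runtime invoke_endpoint call against a
    multi-model endpoint. TargetModel names the .tar.gz artifact under the
    endpoint's S3 model prefix."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": model_key,      # e.g. "sensor-42.tar.gz" (hypothetical)
        "ContentType": "text/csv",
        "Body": payload,
    }

# Actual call (requires AWS credentials and a deployed endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**build_invoke_request(
#       "rcf-mme", "sensor-42.tar.gz", "1.0,2.0,3.0\n"))
```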

The other solution I have seen is batch transform but my understanding is that this is better suited for infrequent and large batch jobs.

The ideal solution would be to train and output the RCF model artifacts (i.e. .tar.gz files), then pull and load the models on demand (within my own Lambda or batch process) without using endpoints at all. However, I can't find a way to run inference with the AWS RCF model without deploying an endpoint first.
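The first half of that idea (pulling the artifact) is straightforward: the artifact is an ordinary gzipped tarball in S3, so after downloading it (e.g. with boto3's `s3.download_file`) it can be unpacked locally. A sketch of the extraction step; note that loading the built-in RCF model files outside of SageMaker is not officially supported, which is exactly the gap described above:

```python
import pathlib
import tarfile

def extract_artifact(tar_path: str, dest: str) -> list:
    """Unpack a SageMaker model artifact (.tar.gz) into dest and return the
    sorted list of extracted file names. Assumes the tarball was already
    downloaded from S3."""
    dest_dir = pathlib.Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir())
```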

If anyone could suggest either a way around using endpoints or a way to set up and call these endpoints in an efficient way for hourly inference that would be greatly appreciated.

Many thanks

shaunak
asked 8 months ago · 163 views
1 Answer

Hi,

given your use case, yes, batch transform jobs are the way to go: you accumulate your input, start the model, run the inferences, and stop the model. Since your inferences are infrequent, it's important to stop the compute when you're done with the current set of inferences to remain as frugal and cost-efficient as possible.
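That accumulate/start/run/stop flow maps directly onto the `create_transform_job` API: the job provisions instances, processes everything under an S3 input prefix, writes results to an output prefix, and terminates on its own, so there is no always-on endpoint to pay for. A sketch of building the request (job and model names, instance type, and S3 URIs are placeholders):

```python
def build_transform_job(model_name: str, job_name: str,
                        input_s3: str, output_s3: str) -> dict:
    """Build kwargs for the SageMaker create_transform_job call: batch
    inference over all objects under input_s3, results written to output_s3,
    instances released automatically when the job finishes."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": "ml.m5.large",  # placeholder
                               "InstanceCount": 1},
    }

# Actual call (requires AWS credentials and an existing SageMaker model):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_transform_job(**build_transform_job(
#       "rcf-sensor-42", "rcf-hourly-2024-01-01-00",
#       "s3://my-bucket/input/", "s3://my-bucket/output/"))
```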

Question: to further reduce your costs, can you infer less frequently than every hour? Say, four times a day?
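Whatever cadence you settle on, one common way to drive it is an EventBridge schedule rule that triggers a Lambda, which in turn starts the transform job. A sketch of the rule parameters (the rule name is a placeholder; four runs per day corresponds to `rate(6 hours)`):

```python
def build_schedule_rule(rule_name: str, hours_between: int) -> dict:
    """Build kwargs for the EventBridge put_rule call: a rate expression
    that fires every N hours."""
    unit = "hour" if hours_between == 1 else "hours"
    return {
        "Name": rule_name,
        "ScheduleExpression": f"rate({hours_between} {unit})",
        "State": "ENABLED",
    }

# Actual call (requires AWS credentials; target wiring via put_targets omitted):
#   import boto3
#   events = boto3.client("events")
#   events.put_rule(**build_schedule_rule("rcf-batch-schedule", 6))
```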

To achieve this level of efficiency, you should develop a fully automated MLOps pipeline: see https://github.com/aws-samples/amazon-sagemaker-safe-deployment-pipeline for a full example with code.


Best,

Didier

AWS EXPERT
answered 8 months ago
  • Hi Didier, thank you for your answer! Documentation doesn't seem to be readily available for our particular use case: as I understand it, batch transform doesn't inherently support multi-model endpoints. What do you think would be a workaround for this? Best
