SageMaker hourly inference


I want to use multiple (possibly hundreds of) SageMaker RCF (Random Cut Forest) models for regular inference at 1-hour intervals.

The solution for real-time inference is to create a multi-model endpoint, which can house many models, and then call this endpoint on demand. I don't think this would be suitable for our use case, since the endpoint would be unused for 59 minutes and then overloaded with requests for one minute.
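For context, calling a multi-model endpoint means passing a `TargetModel` parameter naming which artifact to run; SageMaker loads that `.tar.gz` lazily on first use. A minimal sketch of building such a request (the endpoint name and model key below are hypothetical):

```python
def build_invoke_request(endpoint_name: str, model_key: str, payload: str) -> dict:
    """Build kwargs for the sagemaker-runtime invoke_endpoint call against a
    multi-model endpoint. TargetModel names the .tar.gz artifact under the
    endpoint's S3 model prefix."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": model_key,      # e.g. "sensor-42.tar.gz" (hypothetical)
        "ContentType": "text/csv",
        "Body": payload,
    }

# Actual call (requires AWS credentials and a deployed endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**build_invoke_request(
#       "rcf-mme", "sensor-42.tar.gz", "1.0,2.0,3.0\n"))
```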

The other solution I have seen is batch transform but my understanding is that this is better suited for infrequent and large batch jobs.

The ideal solution would be to train and output the RCF model artifacts (i.e. .tar.gz files), then pull and load the models on demand (within my own Lambda or batch process) without using endpoints at all. However, I can't find a way to run inference with the AWS RCF model without deploying an endpoint first.
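The first half of that idea (pulling the artifact) is straightforward: the artifact is an ordinary gzipped tarball in S3, so after downloading it (e.g. with boto3's `s3.download_file`) it can be unpacked locally. A sketch of the extraction step; note that loading the built-in RCF model files outside of SageMaker is not officially supported, which is exactly the gap described above:

```python
import pathlib
import tarfile

def extract_artifact(tar_path: str, dest: str) -> list:
    """Unpack a SageMaker model artifact (.tar.gz) into dest and return the
    sorted list of extracted file names. Assumes the tarball was already
    downloaded from S3."""
    dest_dir = pathlib.Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir())
```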

If anyone could suggest either a way around using endpoints or a way to set up and call these endpoints in an efficient way for hourly inference that would be greatly appreciated.

Many thanks

shaunak
asked 8 months ago · 163 views
1 Answer

Hi,

given your use case, yes, batch transform jobs are the way to go: you accumulate your input, start the model, run the inferences, and stop the model. Since your inferences are infrequent, it's important to stop the compute when you're done with the current set of inferences to remain as frugal and cost-efficient as possible.
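That accumulate/start/run/stop flow maps directly onto the `create_transform_job` API: the job provisions instances, processes everything under an S3 input prefix, writes results to an output prefix, and terminates on its own, so there is no always-on endpoint to pay for. A sketch of building the request (job and model names, instance type, and S3 URIs are placeholders):

```python
def build_transform_job(model_name: str, job_name: str,
                        input_s3: str, output_s3: str) -> dict:
    """Build kwargs for the SageMaker create_transform_job call: batch
    inference over all objects under input_s3, results written to output_s3,
    instances released automatically when the job finishes."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": "ml.m5.large",  # placeholder
                               "InstanceCount": 1},
    }

# Actual call (requires AWS credentials and an existing SageMaker model):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_transform_job(**build_transform_job(
#       "rcf-sensor-42", "rcf-hourly-2024-01-01-00",
#       "s3://my-bucket/input/", "s3://my-bucket/output/"))
```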

Question: to further reduce your costs, can you infer less frequently than every hour? Say, four times a day?
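Whatever cadence you settle on, one common way to drive it is an EventBridge schedule rule that triggers a Lambda, which in turn starts the transform job. A sketch of the rule parameters (the rule name is a placeholder; four runs per day corresponds to `rate(6 hours)`):

```python
def build_schedule_rule(rule_name: str, hours_between: int) -> dict:
    """Build kwargs for the EventBridge put_rule call: a rate expression
    that fires every N hours."""
    unit = "hour" if hours_between == 1 else "hours"
    return {
        "Name": rule_name,
        "ScheduleExpression": f"rate({hours_between} {unit})",
        "State": "ENABLED",
    }

# Actual call (requires AWS credentials; target wiring via put_targets omitted):
#   import boto3
#   events = boto3.client("events")
#   events.put_rule(**build_schedule_rule("rcf-batch-schedule", 6))
```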

To achieve this level of efficiency, you should develop a fully automated MLOps pipeline: see https://github.com/aws-samples/amazon-sagemaker-safe-deployment-pipeline for a full example with code.


Best,

Didier

AWS EXPERT
answered 8 months ago
  • Hi Didier, thank you for your answer! Documentation doesn't seem to be readily available for our particular use case: as I understand it, batch transform doesn't inherently support multi-model endpoints. What do you think would be a workaround for this? Best
