SageMaker hourly inference


I want to use multiple (possibly hundreds of) SageMaker Random Cut Forest (RCF) models for regular inference at 1-hour intervals.

The solution for real-time inference is to create a multi-model endpoint, which can house many models, and then call this endpoint on demand. I don't think this would be suitable for our use case, since the endpoint would sit unused for 59 minutes and then be overloaded with requests for 1 minute.
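
For reference, this is roughly what a call to such a multi-model endpoint would look like (the endpoint name, artifact name, and payload below are placeholders, not our actual setup):

```python
# Sketch: invoking one model hosted on a SageMaker multi-model endpoint.
# "rcf-mme" and "series-042.tar.gz" are hypothetical names for illustration.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="rcf-mme",            # assumed multi-model endpoint name
    TargetModel="series-042.tar.gz",   # which artifact under the MME S3 prefix to load
    ContentType="text/csv",
    Body="1.5,2.3,0.7",                # one CSV record of feature values
)
print(response["Body"].read())
```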

The other solution I have seen is batch transform, but my understanding is that this is better suited to infrequent, large batch jobs.

The ideal solution would be for me to train and output the RCF model artifacts (i.e. the .tar.gz files), then pull them and run inference on demand (within my own Lambda or batch process) without having to use endpoints. However, I can't find a way to run inference with the AWS RCF model without deploying an endpoint first.

If anyone could suggest either a way around using endpoints, or an efficient way to set up and call these endpoints for hourly inference, that would be greatly appreciated.

Many thanks

shaunak
Asked 9 months ago · 175 views
1 Answer

Hi,

Given your use case, yes, batch transform jobs are the way to go: you accumulate your input, start the model, run the inferences, and stop the model. Since your inferences are infrequent, it's important to stop the compute when you're done with the current set of inferences, to stay as frugal and cost-efficient as possible.
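
As a minimal sketch of that hourly step with boto3 (the model name, S3 URIs, and instance type below are placeholders, and it assumes a SageMaker model was already created from your .tar.gz artifact):

```python
# Sketch: start a batch transform job over this hour's accumulated input.
# Instances are provisioned for the job and released when it finishes,
# so you only pay for the minutes the inferences actually run.
import time
import boto3

sm = boto3.client("sagemaker")

job_name = f"rcf-transform-{int(time.time())}"  # transform job names must be unique

sm.create_transform_job(
    TransformJobName=job_name,
    ModelName="my-rcf-model",                   # model created from your artifact (placeholder)
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/hourly-input/",  # this hour's accumulated records
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",                    # one inference record per CSV line
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/hourly-output/"},
    TransformResources={"InstanceType": "ml.m5.large", "InstanceCount": 1},
)
```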

Question: to further reduce your costs, can you infer less frequently than every hour? Say, 4 times a day?

To achieve this level of efficiency, you should develop a fully automated MLOps pipeline: see https://github.com/aws-samples/amazon-sagemaker-safe-deployment-pipeline for a full example with code.
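
For the scheduling piece of such a pipeline, one option is an EventBridge rule that fires hourly and invokes a Lambda which starts the transform job. A minimal sketch (the rule name and Lambda ARN are placeholders; the Lambda also needs a resource policy allowing EventBridge to invoke it):

```python
# Sketch: create an EventBridge rule that triggers a Lambda every hour.
import boto3

events = boto3.client("events")

events.put_rule(
    Name="hourly-rcf-inference",
    ScheduleExpression="rate(1 hour)",
    State="ENABLED",
)
events.put_targets(
    Rule="hourly-rcf-inference",
    Targets=[{
        "Id": "start-transform-lambda",
        # Placeholder ARN of a Lambda that calls create_transform_job as above
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-rcf-transform",
    }],
)
```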

Best,

Didier

AWS
EXPERT
Answered 9 months ago
  • Hi Didier, thank you for your answer! Documentation doesn't seem to be readily available for our particular use case: as per my understanding, batch transform doesn't inherently support multi-model endpoints. What do you think would be a workaround for this? Best
