SageMaker hourly inference

I want to use multiple (possibly hundreds of) SageMaker RCF (Random Cut Forest) models for regular inference at one-hour intervals.

The solution for real-time inference is to create a multi-model endpoint, which can host many models, and then call this endpoint on demand. I don't think this would suit our use case, since the endpoint would sit unused for 59 minutes and then be overloaded with requests for 1 minute.

The other solution I have seen is batch transform, but my understanding is that it is better suited to infrequent, large batch jobs.

The ideal solution would be for me to train and output the RCF model artifacts (i.e. .tar.gz files) and then pull and create the models on demand (within my own Lambda or batch process) without having to use endpoints. However, I can't find a way to run inference with the AWS RCF model without deploying endpoints first.

If anyone could suggest either a way around using endpoints or a way to set up and call these endpoints in an efficient way for hourly inference that would be greatly appreciated.

Many thanks

shaunak
asked 9 months ago · 175 views
1 Answer

Hi,

Given your use case, yes, batch transform jobs are the way to go: you accumulate your input, start the model, run the inferences, and stop the model. Since your inferences are infrequent, it's important to stop the engine when you're done with the current set of inferences to stay as frugal and cost-efficient as possible.
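As a rough illustration of the batch transform approach described above, here is a minimal sketch that builds the request for SageMaker's `CreateTransformJob` API and submits it with boto3. The model name, S3 URIs, instance type and helper function names are all placeholders chosen for this example, and it assumes a SageMaker Model has already been created from the trained RCF artifact and that the input is CSV.

```python
from datetime import datetime, timezone


def build_transform_job_request(model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.large", now=None):
    """Build keyword arguments for the SageMaker CreateTransformJob API.

    Assumption: model_name refers to a SageMaker Model that already exists.
    All names, URIs and the instance type are illustrative placeholders.
    """
    now = now or datetime.now(timezone.utc)
    # Job names must be unique within an account and region, so a UTC
    # timestamp is appended (keep model_name short: the name is length-limited).
    job_name = f"{model_name}-{now:%Y%m%d-%H%M%S}"
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": 1},
    }


def start_transform_job(request):
    """Submit the job. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3
    return boto3.client("sagemaker").create_transform_job(**request)
```

A transform job provisions its instances, processes the input, and releases the instances when it finishes, so there is no endpoint left running between runs. Each job references a single model, so with many models one option (an assumption, not something confirmed in this thread) is to loop over the model names and start one job per model, subject to account quotas.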

Question: to further reduce your costs, can you run inference less often than every hour, say four times a day?
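To make the two cadences concrete (hourly versus four times a day), here is a small sketch that builds an Amazon EventBridge `rate(...)` schedule expression and the arguments for the `put_rule` API. Using EventBridge as the scheduler is an assumption for this example rather than something stated in the thread, and the rule name and helper function names are hypothetical.

```python
def rate_expression(runs_per_day):
    """Return an EventBridge rate expression for evenly spaced runs.

    Only cadences that divide 24 hours evenly are supported in this sketch.
    """
    if runs_per_day <= 0 or 24 % runs_per_day != 0:
        raise ValueError("runs_per_day must be a positive divisor of 24")
    hours = 24 // runs_per_day
    # EventBridge expects a singular unit when the value is 1.
    unit = "hour" if hours == 1 else "hours"
    return f"rate({hours} {unit})"


def build_rule_request(rule_name, runs_per_day):
    """Build keyword arguments for the EventBridge put_rule API."""
    return {
        "Name": rule_name,  # hypothetical rule name
        "ScheduleExpression": rate_expression(runs_per_day),
        "State": "ENABLED",
    }


def create_rule(request):
    """Create or update the rule. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3
    return boto3.client("events").put_rule(**request)
```

For example, `rate_expression(24)` gives the hourly schedule and `rate_expression(4)` gives one run every six hours. The rule would still need a target (such as a Lambda function that starts the inference job), which is left out here.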

Reaching this level of efficiency means developing a fully automated MLOps pipeline: see https://github.com/aws-samples/amazon-sagemaker-safe-deployment-pipeline for a full example with code.

Best,

Didier

AWS Expert
answered 9 months ago
  • Hi Didier, thank you for your answer! There doesn't seem to be much documentation for our particular use case; as I understand it, batch transform doesn't inherently support multi-model endpoints. What do you think a workaround would be? Best
