Hi thanks for reaching out! Let me try to help you with this.
Inferentia[1] is AWS's purpose-built accelerator for inference, and you can use Inferentia-based instances with SageMaker together with model auto scaling[2]. But, as you said, you would need at least one instance running at all times, unless you delete and recreate the endpoint on demand, which would take more than 10 seconds to bring it back when you need it (more likely a few minutes).
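For reference, attaching target-tracking auto scaling to an existing SageMaker endpoint looks roughly like the sketch below. The endpoint and variant names are hypothetical placeholders, and the target value and cooldowns are just example numbers to tune for your traffic:

```python
# Hypothetical names -- replace with your own endpoint/variant.
ENDPOINT_NAME = "my-inf1-endpoint"
VARIANT_NAME = "AllTraffic"


def scaling_policy_config(target_invocations_per_instance=70):
    """Target-tracking config keyed to the per-instance invocations metric."""
    return {
        "TargetValue": float(target_invocations_per_instance),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # seconds to wait before scaling in
        "ScaleOutCooldown": 60,  # seconds to wait before scaling out again
    }


def register_autoscaling(min_capacity=1, max_capacity=4):
    """Register the endpoint variant with Application Auto Scaling."""
    import boto3  # imported here so the config helper above needs no AWS deps

    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_capacity,  # note: at least one instance stays up
        MaxCapacity=max_capacity,
    )
    client.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=scaling_policy_config(),
    )
```

Note the `MinCapacity=1`: this is exactly the "at least one instance running at all times" constraint, since auto scaling for real-time endpoints cannot scale the variant to zero.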
If you use API Gateway + Lambda to call an asynchronous SageMaker endpoint, it would also take minutes for the job to complete, and you would need to return the results to the customer through another channel, which is not what you are looking for.
So I see the following options for your use case:

- If you have a custom model (and really need a custom model), you can go with SageMaker + Inferentia, pick the most cost-effective instance size, and use auto scaling to scale out if needed (but you would need at least one instance running at all times).
- If you can use a foundation model (which can be fine-tuned), have a look at Amazon Bedrock[3], a managed service where you pay as you go for inference/API requests and which supports several foundation models.
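One way to choose between these two options is a back-of-the-envelope break-even calculation. The prices below are made-up placeholders, not real AWS prices (check the current SageMaker and Bedrock pricing pages); the point is only the shape of the comparison:

```python
# Break-even: always-on endpoint vs. pay-per-request service.
# All prices are PLACEHOLDER assumptions, not real AWS prices.

HOURLY_INSTANCE_PRICE = 0.25  # assumed $/hour for one always-on inference instance
PRICE_PER_REQUEST = 0.002     # assumed $/request on a pay-as-you-go service
HOURS_PER_MONTH = 730         # average hours in a month


def monthly_instance_cost() -> float:
    """Fixed cost of keeping one instance up all month."""
    return HOURLY_INSTANCE_PRICE * HOURS_PER_MONTH


def monthly_pay_per_use_cost(requests_per_month: int) -> float:
    """Variable cost when you only pay per request."""
    return PRICE_PER_REQUEST * requests_per_month


def break_even_requests() -> float:
    """Requests/month above which the always-on instance becomes cheaper."""
    return monthly_instance_cost() / PRICE_PER_REQUEST


if __name__ == "__main__":
    print(f"Always-on: ${monthly_instance_cost():.2f}/month")
    print(f"Break-even at about {break_even_requests():,.0f} requests/month")
```

With these placeholder numbers, the always-on instance only pays off above roughly 91,000 requests per month; below that, a pay-as-you-go service like Bedrock is the cheaper option.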
I hope this helps.
[1] - https://aws.amazon.com/machine-learning/inferentia/
[2] - https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
Hi Steve, thank you very much for your answer!
It's sobering to hear that the approach I imagined is not possible.
Today I read about warm pools, where instances can be kept pre-initialized in a stopped state so that they start faster. Would this option be conceivable for my use case, or is it not suitable for asynchronous endpoints?
The most unfavorable thing is that I really don't have many requests in the beginning, so it is simply not worth hosting a GPU the whole time. A serverless solution would be nice, but I depend on the GPU.
Best regards