I have checked the code in the GitHub repo [1] and compared it against the one you provided. In your code, you create a model and then deploy that model to an endpoint; afterwards, you try to add another model to the endpoint as an inference component. In the GitHub repo, however, they launch an endpoint without a model (an empty endpoint), and only then create the model and add it to the endpoint as an inference component.
The reason you were getting the error is that when you deployed your endpoint, you had already attached a model to it, and that model occupies all of the CPUs/GPUs on your instance, leaving no room for another model.
You need to deploy an empty endpoint as shown in the GitHub repo [1] and then add the models as inference components, specifying how much CPU/GPU each model may occupy on the instance.
You can use the provided GitHub repo [1] as a guideline, along with this blog [2].
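The flow above can be sketched with boto3. This is a minimal sketch, not the repo's exact code: the endpoint/model names, the role ARN, and the resource numbers are placeholder assumptions, and the requests are built as plain dicts with the actual API calls left commented out so nothing runs against a real account.

```python
# Endpoint config with *no* model attached: the production variant only pins
# the instance type and count, so models are supplied later as inference
# components. All names and the role ARN below are placeholders.
endpoint_config_request = {
    "EndpointConfigName": "my-empty-endpoint-config",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}

# Each inference component reserves an explicit slice of the instance's
# CPU/GPU/memory, which is what leaves room for further models on the
# same endpoint. The numbers here are illustrative.
inference_component_request = {
    "InferenceComponentName": "my-first-model-component",
    "EndpointName": "my-empty-endpoint",
    "VariantName": "AllTraffic",
    "Specification": {
        "ModelName": "my-model",  # a model created beforehand via create_model
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "NumberOfCpuCoresRequired": 2,
            "MinMemoryRequiredInMb": 4096,
        },
    },
    "RuntimeConfig": {"CopyCount": 1},
}

# With AWS credentials configured, the actual calls would be:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**endpoint_config_request)
# sm.create_endpoint(EndpointName="my-empty-endpoint",
#                    EndpointConfigName="my-empty-endpoint-config")
# sm.create_inference_component(**inference_component_request)
```

A second model is added the same way: another `create_inference_component` call against the same endpoint, as long as its requested CPU/GPU/memory still fits on the instance.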
Thanks! This was indeed the problem :)
I don't know who's in charge of documenting this, but adding this info to the CLI documentation page would be quite helpful too. It currently makes no mention of the distinction between inference-component-enabled endpoints and regular ones.