This question is mostly for educational purposes, but the current SageMaker documentation does not describe whether these things are allowed or not.
Let's suppose I have:
- an `XGBoost_model_1` (which needs an `XGBoost` container)
- a `KMeans_model_1` and a `KMeans_model_2` (both of which require a `KMeans` container)
1. Here's the first question. Can I do the following:
- create a `Model` with `InferenceExecutionConfig.Mode=Direct` and specify two containers (`XGBoost`, plus `KMeans` with `Mode: MultiModel`)
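To make the idea concrete, here is a minimal sketch of the request I have in mind, written as the keyword arguments for boto3's `create_model`. The model name, role ARN, image URIs, and S3 locations are hypothetical placeholders, and I am assuming the `XGBoost` container runs in `SingleModel` mode. Whether SageMaker accepts this combination is exactly what I am asking.

```python
def build_create_model_request(role_arn, xgb_image, kmeans_image,
                               xgb_model_url, kmeans_models_prefix):
    """Build kwargs for the SageMaker client's create_model call.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "ModelName": "xgb-and-kmeans",  # hypothetical name
        "ExecutionRoleArn": role_arn,
        # Direct mode: the client picks the container per request.
        "InferenceExecutionConfig": {"Mode": "Direct"},
        "Containers": [
            {
                "ContainerHostname": "XGBoost",
                "Image": xgb_image,
                "Mode": "SingleModel",  # assumption: one XGBoost model
                "ModelDataUrl": xgb_model_url,
            },
            {
                "ContainerHostname": "KMeans",
                "Image": kmeans_image,
                "Mode": "MultiModel",  # serves KMeans_model_1 and KMeans_model_2
                "ModelDataUrl": kmeans_models_prefix,  # S3 prefix holding both artifacts
            },
        ],
    }


request = build_create_model_request(
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # hypothetical
    xgb_image="<xgboost-image-uri>",                         # hypothetical
    kmeans_image="<kmeans-image-uri>",                       # hypothetical
    xgb_model_url="s3://example-bucket/xgboost/XGBoost_model_1.tar.gz",
    kmeans_models_prefix="s3://example-bucket/kmeans/",
)

# With AWS credentials configured, the request would be sent like this:
# import boto3
# boto3.client("sagemaker").create_model(**request)
```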
That would enable the client:
- to call `invoke_endpoint(TargetContainer="XGBoost")` to access `XGBoost_model_1`
- to call `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_1")` to access `KMeans_model_1`
- to call `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2")` to access `KMeans_model_2`
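In boto3 terms, the three calls above might look roughly like the sketch below. Note that the actual `invoke_endpoint` parameter is `TargetContainerHostname`; `TargetContainer` above is my shorthand. The endpoint name, payload, and the `.tar.gz` artifact names passed as `TargetModel` are assumptions made for illustration.

```python
def build_invoke_request(endpoint_name, payload, container=None, model=None):
    """Build kwargs for the sagemaker-runtime client's invoke_endpoint call."""
    request = {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": payload,
    }
    if container is not None:
        # Selects a container on a multi-container endpoint (Direct mode).
        request["TargetContainerHostname"] = container
    if model is not None:
        # Selects a model artifact inside a multi-model container.
        request["TargetModel"] = model
    return request


ENDPOINT = "my-endpoint"  # hypothetical endpoint name
PAYLOAD = "1.0,2.0,3.0"   # hypothetical CSV row

calls = [
    build_invoke_request(ENDPOINT, PAYLOAD, container="XGBoost"),
    build_invoke_request(ENDPOINT, PAYLOAD, container="KMeans",
                         model="KMeans_model_1.tar.gz"),
    build_invoke_request(ENDPOINT, PAYLOAD, container="KMeans",
                         model="KMeans_model_2.tar.gz"),
]

# With AWS credentials configured, each request would be sent like this:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# responses = [runtime.invoke_endpoint(**kwargs) for kwargs in calls]
```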
I don't see a straight answer in the documentation on whether combining multi-model containers with a multi-container endpoint is possible.
2. The second question: how does the above idea work with `ProductionVariants`? Can I create something like this:
- `Variant1` with `XGBoost` serving `XGBoost_model_1`, having a weight of 0.5
- `Variant2` with a multi-container model having both `XGBoost` and `KMeans` (with a `MultiModel` setup), having a weight of 0.5
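As a sketch, the endpoint configuration I am picturing would be passed to boto3's `create_endpoint_config` roughly as follows. The config name, the two model names, and the instance type and count are hypothetical; each `ModelName` is assumed to refer to a `Model` created beforehand.

```python
def build_endpoint_config_request(xgb_only_model, multi_container_model):
    """Build kwargs for the SageMaker client's create_endpoint_config call."""
    def variant(name, model_name, weight):
        return {
            "VariantName": name,
            "ModelName": model_name,
            "InitialInstanceCount": 1,       # hypothetical sizing
            "InstanceType": "ml.m5.large",   # hypothetical sizing
            "InitialVariantWeight": weight,
        }

    return {
        "EndpointConfigName": "xgb-vs-multi-container",  # hypothetical name
        "ProductionVariants": [
            # Variant1: a plain single-container XGBoost model.
            variant("Variant1", xgb_only_model, 0.5),
            # Variant2: the multi-container model (XGBoost + multi-model KMeans).
            variant("Variant2", multi_container_model, 0.5),
        ],
    }


config = build_endpoint_config_request(
    xgb_only_model="xgb-only",                # hypothetical Model name
    multi_container_model="xgb-and-kmeans",   # hypothetical Model name
)

# With AWS credentials configured, the request would be sent like this:
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**config)
```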
So that the client could:
- call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_1")` to access `KMeans_model_1`
- call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_2")` to access `KMeans_model_2`
- call `invoke_endpoint(TargetVariant="Variant1")` to access `XGBoost_model_1`
- call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="XGBoost")` to access `XGBoost_model_1`
Is that combination even possible?
If so, what happens when the client calls `invoke_endpoint` without specifying the variant? For example:
- would `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2")` fail 50% of the time (if it hits the right variant it works just fine, but if it hits the wrong one it would most likely result in a 400/500 error such as "incorrect payload")?
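The 50% figure comes from the way traffic is split between variants: each variant's share is its weight divided by the sum of all weights. A small sketch of that arithmetic follows; it says nothing about what SageMaker actually does with a request that lands on the "wrong" variant, which is the open question.

```python
def traffic_shares(weights):
    """Return each variant's expected share of untargeted requests.

    share = variant weight / sum of all variant weights
    """
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_shares({"Variant1": 0.5, "Variant2": 0.5})

# A request without TargetVariant that only Variant2 can serve would be
# routed to Variant1 this fraction of the time.
misrouted_fraction = shares["Variant1"]
print(shares, misrouted_fraction)
```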