It looks like you're using either the HuggingFace or PyTorch framework containers to deploy your model.
Tarball structure
In either case (as documented here for PyTorch, here for HF), your inference.py code should be located in a code/ subfolder of your model tarball, but it sounded from the question like you might have it in the root?
If you're preparing your tarball by hand, also check that it extracts straight into the current directory (.) and doesn't, for example, create a new subfolder (leaving you with model/model.pth) when you unzip it. For example, I've sometimes created them with a command like tar -czf ../model.tar.gz . from inside the folder where I've prepared my artifacts.
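If you'd rather script this than run tar by hand, here's a minimal sketch using Python's standard tarfile module. The function names are just ones I've made up for illustration; the idea is only that every member is added relative to the artifact folder, so inference.py lands under code/ and nothing gets wrapped in an extra top-level directory.

```python
import tarfile
from pathlib import Path


def build_model_tarball(artifact_dir, output_path):
    """Pack the *contents* of artifact_dir into a .tar.gz, with no wrapper folder.

    Keep output_path outside artifact_dir (like ../model.tar.gz), otherwise an
    old tarball from a previous run would get packed into the new one.
    """
    artifact_dir = Path(artifact_dir)
    # Collect paths first so the listing is fixed before the archive is written.
    members = sorted(artifact_dir.rglob("*"))
    with tarfile.open(output_path, "w:gz") as tar:
        for path in members:
            # arcname is relative to artifact_dir, e.g. "code/inference.py"
            arcname = path.relative_to(artifact_dir).as_posix()
            tar.add(path, arcname=arcname, recursive=False)
    return output_path


def list_tarball(tar_path):
    """Return the member names so the layout can be checked before uploading."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Calling list_tarball() on the result should show code/inference.py next to your model files, with no leading folder name on any entry.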
Model format
Since you're already providing a custom model_fn, you don't need to go to the effort of converting to a model.pth if you don't want to. For HuggingFace models I find it's easier to just use, for example, Trainer.save_model() to write to the target folder, and then at inference time you can directly load it:
model = DistilBertForSequenceClassification.from_pretrained(model_dir)
As shown in the linked example, I'd probably save the tokenizer in your tarball too, to avoid any hidden external dependencies.
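Putting those two points together, a model_fn along these lines is roughly what I have in mind. Treat it as a sketch under assumptions: it assumes the tokenizer files were saved into the same folder as the model, and returning a dict is just one convenient choice (whatever model_fn returns is what your predict_fn receives).

```python
def model_fn(model_dir):
    """Load model and tokenizer from the extracted model.tar.gz contents."""
    # Imported inside the function only so this sketch can be read and imported
    # without transformers installed; in a real inference.py these imports
    # would normally sit at the top of the file.
    from transformers import AutoTokenizer, DistilBertForSequenceClassification

    # Both calls read only from model_dir, so nothing is fetched from the Hub.
    model = DistilBertForSequenceClassification.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model.eval()  # inference only, so turn off dropout
    return {"model": model, "tokenizer": tokenizer}
```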
Debugging
Your endpoint's CloudWatch logs are usually the best place to look for what's going wrong with deployments like this, but I know the default configuration can be a bit sparse...
I'd suggest setting env={"PYTHONUNBUFFERED": "1"} when you create your HuggingFaceModel, to disable Python output buffering and ensure that logs from a crashing thread/process actually get written to CloudWatch before the thread/process dies. If you're going directly through the shortcut estimator.deploy(), you'll need to change your code to create a Model first to be able to specify this parameter.
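As a rough sketch of creating the Model explicitly, something like the following should work with the SageMaker Python SDK. The framework versions, the default instance type and the helper names are my assumptions for illustration, so swap in whatever matches your training job.

```python
def build_model_kwargs(model_data, role):
    """Keyword arguments for HuggingFaceModel, including the unbuffered-logs env var."""
    return {
        "model_data": model_data,        # S3 URI of your model.tar.gz
        "role": role,                    # SageMaker execution role ARN
        "transformers_version": "4.26",  # assumption: pick your own versions
        "pytorch_version": "1.13",       # assumption
        "py_version": "py39",            # assumption
        "env": {"PYTHONUNBUFFERED": "1"},
    }


def deploy(model_data, role, instance_type="ml.m5.xlarge"):
    """Create the SDK model object and deploy it; instance_type is an assumed default."""
    # Imported here so the helper above can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(**build_model_kwargs(model_data, role))
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

The env dict is passed through to the container, so any other environment variables you need can go in the same place.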
Deploying a SageMaker endpoint involves 3 API-side steps that the SDK makes a little non-obvious: creating a "Model", an "Endpoint Configuration", and an "Endpoint". To make matters more confusing, creating an SDK object such as HuggingFaceModel doesn't actually create a SageMaker Model yet, because it doesn't have all the information (the container URI is inferred from the instance type, which the SDK only collects when you try to create a Transformer or Predictor). When re-trying different configurations, be careful to check (e.g. in the AWS Console for SageMaker) that your previous model and endpoint configuration are actually getting deleted and not re-used; otherwise you might feel like you're trying different things but keep seeing the same result.
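If you'd rather clean up from code than click through the Console, here's a minimal sketch that removes all three resources for one endpoint. It takes a boto3 SageMaker client as an argument; the function name is hypothetical, and it assumes a plain endpoint whose production variants each name a model.

```python
def delete_endpoint_stack(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and model(s) behind it."""
    # Look up the names first, because they can't be queried once deleted.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return {"endpoint_config": config_name, "models": model_names}
```

You'd call it as delete_endpoint_stack(boto3.client("sagemaker"), "my-endpoint"), where "my-endpoint" is a placeholder name.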
Finally, you'll want to avoid waiting several minutes for an endpoint to deploy every time you want to check that a new configuration works. I'd recommend verifying that your inference.py functions work as you'd expect against an extracted local folder containing the contents of your model.tar.gz. To test the whole stack in an environment where Docker is available, you can use instance_type='local' (SageMaker Local Mode). If you're working in notebooks in SageMaker Studio, note that Local Mode didn't use to be supported, but now is! You just have to enable it and install Docker on the Studio instance.
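For the quick local check of inference.py, a sketch like this is usually enough. It extracts the tarball to a temporary folder and imports code/inference.py from there using only the standard library; the helper name is made up, and you should only point it at tarballs you built yourself, since it extracts them without any filtering.

```python
import importlib.util
import tarfile
import tempfile
from pathlib import Path


def load_inference_module(tar_path, extract_dir=None):
    """Extract model.tar.gz and import its code/inference.py as a module.

    Returns (module, model_dir) so you can call module.model_fn(model_dir).
    """
    extract_dir = Path(extract_dir or tempfile.mkdtemp())
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(extract_dir)

    script = extract_dir / "code" / "inference.py"
    if not script.is_file():
        # This is the same layout problem that would break the real endpoint.
        raise FileNotFoundError(f"expected code/inference.py inside {tar_path}")

    spec = importlib.util.spec_from_file_location("inference", script)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module, str(extract_dir)
```

From there, something like module, model_dir = load_inference_module("model.tar.gz") followed by module.model_fn(model_dir) exercises your loading code in seconds instead of minutes.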