Quick first steps to find out whether AWS Inferentia or AWS Trainium is an option for your model.
The AWS Neuron SDK (used on AWS Inferentia and AWS Trainium instances) supports specific model architectures. You can find lists of the models known to be supported here:
Neuron SDK documentation
Hugging Face Optimum Neuron documentation
However, many other models will also run because they share the same underlying architecture as one of the supported models.
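A quick way to see which architecture a Hugging Face model uses is to look at the `model_type` field in the `config.json` file in its repository. The sketch below compares that field against a set of architectures. The set `ILLUSTRATIVE_SUPPORTED_MODEL_TYPES` and the function `shares_supported_architecture` are hypothetical names used only for illustration. The set is a placeholder and is not the official support list, so use the Neuron SDK and Optimum Neuron documentation for the real list. A match is only a first hint: model size, configuration options, and individual operators can still affect whether a model compiles and runs on Neuron.

```python
import json

# Hypothetical placeholder set for illustration only; this is NOT the official
# Neuron support list. Check the Neuron SDK and Optimum Neuron documentation.
ILLUSTRATIVE_SUPPORTED_MODEL_TYPES = {"llama", "mistral"}


def shares_supported_architecture(config_json: str, supported_types: set[str]) -> bool:
    """Return True if the config's model_type is in the given set of architectures."""
    config = json.loads(config_json)
    # Hugging Face Transformers configs record the architecture family in "model_type".
    model_type = str(config.get("model_type", "")).lower()
    return model_type in supported_types


# Example: a fine-tuned model published under its own name that reuses the Llama architecture.
example_config = json.dumps({
    "architectures": ["LlamaForCausalLM"],
    "model_type": "llama",
    "hidden_size": 4096,
})

print(shares_supported_architecture(example_config, ILLUSTRATIVE_SUPPORTED_MODEL_TYPES))
```

A config with no `model_type`, or with an architecture outside the set, returns `False`. That does not mean the model cannot run. It only means you need to check further, as described below.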
Hugging Face does a great job of tracking model attributes and knows which models will work on Neuron. If a model should work, Hugging Face gives you instructions for deploying it on Amazon SageMaker using Inferentia or Trainium. If a model works on Inferentia or Trainium through SageMaker, it will also work on Amazon EC2, Amazon EKS, and Amazon ECS.
You can find these instructions under the Deploy dropdown in the upper-right corner of the model card. Click Amazon SageMaker, then click AWS Inferentia and Trainium (screenshots below). If you see instructions, you should be good to go! If instead you see “The model is not yet cached on Hugging Face. If you are interested in it, please request support or try to compile the model yourself using Optimum Neuron.”, then Hugging Face doesn’t know for sure whether the model will work.
If you see the not-cached message, if your model has no Deploy option, or if your model isn’t on Hugging Face, you may still be able to run it! You can click the “Request Cache” button on Hugging Face, research your model’s architecture in the Neuron SDK documentation, post questions here on re:Post, or reach out to your AWS account team.