How to find the optimal model size for Large Language Models to optimize effectiveness and cost

3 minute read
Content level: Intermediate

Do we always need the largest model for our use case? Larger models typically come with higher costs, whether for hosting the model or in per-token pricing. This article discusses a strategy you can adopt to find the smallest possible model size for your requirements.


Large language models (LLMs) come in a variety of sizes, with larger models requiring more powerful hardware and incurring higher costs. However, not every task requires the largest model to be solved effectively. By choosing the smallest model that meets our requirements, we can optimize our costs and resources.

Benefits of using smaller models

  • Efficiency: Smaller models are more efficient to train and deploy. This can save money on hardware costs and reduce the time it takes to get the model up and running.
  • Accuracy: Smaller models can be just as accurate as larger models, especially for tasks that do not require a lot of complexity.
  • Deployment: Smaller models can be deployed on less powerful hardware, such as mobile devices and embedded systems. This makes them ideal for applications that need to be portable or have limited resources.

How to choose the right sized model

Step 1. PoC with the larger models

Start with a larger model for the Proof of Concept (PoC). This helps you determine early on whether the task can be accomplished by an LLM at all. With a larger model, you often do not need to carefully engineer your prompts to reach your goal, which saves a lot of time spent experimenting to find out whether you can achieve the desired results.

Step 2. Gradually reduce the model size

Keeping everything else the same, swap in a smaller model and evaluate the results. If you consistently get the same results as before, continue swapping in smaller models until the results start to degrade. Once they do, there are two things you can optimize to retain the quality of the results.
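This step can be sketched as a simple loop: walk the model lineup from largest to smallest, score each one on the same test cases, and keep the smallest model that still meets your quality bar. The model names, the `invoke` function, and the 95% threshold below are illustrative assumptions, not real endpoints or recommended values.

```python
# Sketch of Step 2: evaluate progressively smaller models until quality degrades.
# invoke(model, prompt) stands in for a real LLM call and is a hypothetical hook.

def evaluate(model_name, test_cases, invoke):
    """Return the fraction of test cases the model answers correctly."""
    correct = sum(1 for prompt, expected in test_cases
                  if invoke(model_name, prompt) == expected)
    return correct / len(test_cases)

def smallest_acceptable_model(models_largest_first, test_cases, invoke, threshold=0.95):
    """Walk from largest to smallest; return the last model that still meets the bar."""
    best = None
    for model in models_largest_first:
        if evaluate(model, test_cases, invoke) >= threshold:
            best = model          # this smaller model is still good enough
        else:
            break                 # quality degraded: stop shrinking
    return best

# Example with a stubbed invoke() standing in for a real LLM call:
def fake_invoke(model, prompt):
    # Pretend the "tiny" model gets the arithmetic question wrong.
    answers = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    if model == "tiny" and "2+2" in prompt:
        return "5"
    return answers[prompt]

cases = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
print(smallest_acceptable_model(["large", "medium", "tiny"], cases, fake_invoke))
# → medium
```

In practice the test cases would be a held-out set of real prompts and expected outputs for your task, and `invoke` would call your model endpoint.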

  • Writing detailed and concise prompts. This involves crafting high-quality prompts or questions: a tailored prompt better guides the LLM to generate output specific to your task. As an example of the impact of a good prompt, suppose an initial prompt fails to get the LLM to provide the total cost for a party of 20 people. By adding 'for the party' at the end of the last sentence, the LLM is able to make the correct calculation. To improve your prompts, use complete sentences, avoid jargon, and be as specific as possible.

  • Few-shot prompting. LLMs are able to learn from the prompt during inference. This technique involves providing examples within the prompt to guide the LLM to complete the task. For example, without guiding examples, the LLM may not be able to accurately calculate the final price of the milk; after a guiding example is added to the prompt, it can. By adding relevant, varied, and possibly multiple examples to the prompt, the LLM is able to learn from the examples and generalize to new situations.

By following the steps outlined in this article, you can choose the right model size for your tasks and optimize effectiveness and cost.

AWS
published a year ago