How to find the optimal model size for Large Language Models to optimize effectiveness and cost

3 minute read
Content level: Intermediate

Do we always need the largest model for our use case? Larger models typically come with higher costs, whether for hosting the model or in per-token pricing. This article discusses a strategy you can adopt to find the smallest possible model size for your requirements.


Large language models (LLMs) come in a variety of sizes, with larger models requiring more powerful hardware and incurring higher costs. However, not every task requires the largest model to be solved effectively. By choosing the smallest model that meets our requirements, we can optimize our costs and resources.

Benefits of using smaller models

  • Efficiency: Smaller models are more efficient to train and deploy. This can save money on hardware costs and reduce the time it takes to get the model up and running.
  • Accuracy: Smaller models can be just as accurate as larger models, especially for tasks that do not require a lot of complexity.
  • Deployment: Smaller models can be deployed on less powerful hardware, such as mobile devices and embedded systems. This makes them ideal for applications that need to be portable or have limited resources.

How to choose the right sized model

Step 1. PoC with the larger models

Start with a larger model for the Proof of Concept (PoC). This helps you determine early on whether the task can be accomplished by an LLM at all. With a larger model, you often do not need to carefully engineer your prompts to reach your goal, which saves a lot of time spent experimenting to find out whether you can achieve the desired results.

Step 2. Gradually reduce the model size

Keeping everything else the same, swap in a smaller model and evaluate the results. If you consistently get the same results as before, continue swapping in smaller models until the results start to degrade. Once they do, there are two things you can optimize to retain the quality of the results.
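This step can be sketched as a simple loop: walk the model lineup from largest to smallest, score each one on the same test cases, and keep the smallest model that still meets your quality bar. The model names, the `invoke` function, and the 95% threshold below are illustrative assumptions, not real endpoints or recommended values.

```python
# Sketch of Step 2: evaluate progressively smaller models until quality degrades.
# invoke(model, prompt) stands in for a real LLM call and is a hypothetical hook.

def evaluate(model_name, test_cases, invoke):
    """Return the fraction of test cases the model answers correctly."""
    correct = sum(1 for prompt, expected in test_cases
                  if invoke(model_name, prompt) == expected)
    return correct / len(test_cases)

def smallest_acceptable_model(models_largest_first, test_cases, invoke, threshold=0.95):
    """Walk from largest to smallest; return the last model that still meets the bar."""
    best = None
    for model in models_largest_first:
        if evaluate(model, test_cases, invoke) >= threshold:
            best = model          # this smaller model is still good enough
        else:
            break                 # quality degraded: stop shrinking
    return best

# Example with a stubbed invoke() standing in for a real LLM call:
def fake_invoke(model, prompt):
    # Pretend the "tiny" model gets the arithmetic question wrong.
    answers = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    if model == "tiny" and "2+2" in prompt:
        return "5"
    return answers[prompt]

cases = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
print(smallest_acceptable_model(["large", "medium", "tiny"], cases, fake_invoke))
# → medium
```

In practice the test cases would be a held-out set of real prompts and expected outputs for your task, and `invoke` would call your model endpoint.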

  • Writing detailed and concise prompts. This involves crafting high-quality prompts or questions: a tailored prompt better guides the LLM to generate output specific to your task. As an example of the impact of a good prompt, suppose an initial prompt fails to get the LLM to provide the total cost for a party of 20 people. By adding 'for the party' at the end of the last sentence, the LLM is able to make the correct calculation. To improve your prompts, use complete sentences, avoid jargon, and be as specific as possible.

  • Few-shot prompting. LLMs are able to learn from the prompt during inference. This technique involves providing examples within the prompt to guide the LLM to complete the task. For example, without guiding examples, the LLM may not be able to accurately calculate the final price of the milk; after a guiding example is added to the prompt, it can. By adding relevant, varied, and possibly multiple examples to the prompt, the LLM is able to learn from the examples and generalize to new situations.

By following the steps outlined in this article, you can choose the right model size for your tasks and optimize effectiveness and cost.

AWS
published a year ago