The arrival of cloud services marked a major shift in the evolution of computing. It provided an environment of the scale needed for experimentation and research, particularly in AI and machine learning. Amid this surge came a significant breakthrough in processing and learning from large volumes of digital data: the emergence of Large Language Models (LLMs). This was made possible not only by the shift in compute but also by advances in architectures and algorithms. One of the major turning points was the introduction of the transformer architecture in the paper “Attention Is All You Need,” which revolutionised the processing capabilities of language models.

Since then, many LLM implementations have appeared on different platforms, such as OpenAI’s ChatGPT. Even before this, companies like Microsoft and Google had been working on various language models and making important incremental progress. Microsoft built the Turing models, and Google worked on the PaLM and LaMDA models (the predecessors of Gemini).

These LLMs have grown in both the number of parameters they contain and the size of the datasets they consume. GPT-3, for example, has around 175 billion parameters. Training such models and getting them ready can take anywhere from days to months, which incurs huge cost. However, there are localised scenarios that involve smaller datasets, where a model with far fewer parameters can still give acceptable results. This is where Small Language Models (SLMs) come in, e.g. Phi-2 (Microsoft, 2.7B), Gemini Nano (Google, 1.8B), Llama-2 variants (Meta, 7B and 13B), Zephyr (7B), etc.
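To put these parameter counts in perspective, the short Python sketch below turns them into a rough estimate of the memory needed just to hold each model's weights. It is a back-of-the-envelope illustration, not a sizing tool: the parameter counts are the ones quoted above, while the 2 bytes per parameter (16-bit weights) and the function name `weight_memory_gb` are assumptions made for this example. Real deployments also need memory for activations and caches, and quantised models need less.

```python
# Parameter counts as quoted in the article, in billions.
PARAMS_BILLIONS = {
    "GPT-3": 175.0,
    "Llama-2 13B": 13.0,
    "Llama-2 7B": 7.0,
    "Zephyr": 7.0,
    "Phi-2": 2.7,
    "Gemini Nano": 1.8,
}


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes).

    Assumption for illustration: every parameter is stored in
    `bytes_per_param` bytes (2 bytes corresponds to 16-bit weights).
    One billion parameters at one byte each is one GB, so the
    estimate is simply a product.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        print(f"{name:12s} {size:6.1f}B parameters  ~{weight_memory_gb(size):6.1f} GB of weights")
```

Under these assumptions, the weights of the largest model listed run to hundreds of gigabytes, while those of the SLMs come to somewhere between a few gigabytes and a few tens of gigabytes, which is part of why they are candidates for running locally.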

Consider a general-purpose chatbot (like ChatGPT) versus a company’s internal chatbot that answers queries on internal policies such as leave. The former needs an LLM, whereas the latter can quite possibly be served well by an SLM.

SLMs gain momentum for targeted performance and reduced costs

As Generative AI gains traction in enterprises, companies are targeting domain-specific use cases. They are also wary of comprehensive regulatory policies, such as the AI Act in Europe, which are expected to guide AI project deployments with a focus on ethical considerations, for example restrictions on real-time biometric analysis in public spaces. The growing use of synthetic data to democratise access raises concerns about transparency and quality control, highlighting the importance of refining data generation techniques as the technology becomes more widespread. Because SLMs are trained on smaller datasets, they offer better scope for control and transparency, which would make it easier to meet compliance requirements. LLMs, while powerful, may not be necessary for every scenario, given their limitations in customisation for specific domains. SLMs, with far fewer parameters (typically a few billion), offer faster fine-tuning and better cost-effectiveness. Some also speak of “tiny” models, which typically have parameters in the millions rather than billions.

The growing scope of SLMs in reshaping the landscape of AI applications

Recent discourse, including remarks by prominent figures such as Satya Nadella during his recent visit to India, has underscored the relevance of SLMs. Microsoft, for instance, has positioned itself as a provider of SLM solutions, echoing a broader industry trend. Hardware manufacturers like Lenovo and Intel are also aligning their offerings to cater to the needs of SLMs, recognising their growing importance.

As organisations strategically integrate AI into their operations, the choice between large and small language models becomes crucial, influenced by factors like data privacy, efficiency, and use case requirements. Small language models can operate locally, which addresses data privacy concerns, and they provide cost-effective solutions with acceptable accuracy.

In conclusion, the rise of small language models represents a nuanced approach to AI deployment, one that requires organisations to evaluate model size and computational needs strategically in order to optimise performance and meet specific use case requirements.

The author is the CTO, Nagarro