The open-source solutions organisation Swecha has launched a crowd-sourcing model to build a Large Language Model (such as ChatGPT or Gemini). In partnership with the International Institute of Information Technology (IIIT-Hyderabad) and Ozonotel, Swecha is building a network of one lakh summer engineering interns to build a Telugu LLM and a culture portal.

Building an LLM depends on the enormous volume of data on which it is trained. A GenAI solution is as good or bad as the quantum of data it is fed.

Only resource-rich companies with deep pockets would be able to build such LLMs. To address this challenge, Swecha has launched a Summer of AI internship programme in association with IIIT-H and cloud communication solutions firm Ozonotel. The programme would equip students with job-ready AI skills and develop a Telugu language-centric LLM.

“This is an important initiative as there is no Indian-language and India-centric LLM available yet. Most Indian languages are considered low-resource languages, making it challenging to develop LLMs for them. A significant amount of foundational knowledge needs to be compiled and digitised to create the necessary digital data for these languages,” Y Kiran Chandra, Founder of Swecha, told businessline.

“This presents an opportunity to create a large pool of trained AI engineers, extending well beyond the small group of researchers and developers specialised in deep models,” he said.

The internship programme targets second-year engineering students. Students interview people in villages and towns to gather information on culture, folk tales, folk songs, local food and places of historical and cultural significance.

“The idea is to collect speech samples and generate texts of such conversations to build an LLM. This can be replicated in other languages and regions,” he said.

“This is going to be a very interesting and useful initiative. It will help build AI talent on a large scale. We are building modalities to build templates to collect relevant information,” Ramesh Loganathan, Professor at the IIIT-H, said.