For building AI in local Indian languages, ShareChat is looking at tie-ups with IITs and other premier engineering institutions.
Debdoot Mukherjee, Vice-President, AI, ShareChat, told BusinessLine that while building AI in English has an existing framework to work on, when it comes to Indian languages, many aspects are evolving. Take for instance data from local languages. “It is entirely unstructured and, hence, the framework for developing an AI model has no ready reference point. This is unlike Mandarin, which has a reference framework,” he said. ShareChat has already built out Natural Language Processing (NLP) models which involve video, speech and text in languages such as English and Hindi. “Now, an Indianised version of this model is what we are looking at,” said Mukherjee.
As an example, ShareChat has rolled out Hindi ‘Shaayari’ generation. Here, a user keys a thought into a box and a rough structure of a ‘Shaayari’ shows up. It is similar to Google's language translation, which generates fluent sentences the way a translator service would. Another example is taking a piece of Indian music and associating it with a certain visual. The work is around how to build algorithms for this.
It is in such areas that ShareChat is looking to collaborate with institutions such as IITs and IIITs to take the industry-academia partnership into new tech frontiers. This development also comes at a time when apps such as ShareChat are seeing a surge in users, many of whom are non-English speakers. The issue also assumes significance as AI capabilities across the globe are being built in English and other languages; in India it is a different story. “India never had its own social media nor any other products that address geographic and demographic diversity,” stated Mukherjee.
ShareChat has exceeded 160 million MAUs (monthly active users), with the average daily time spent by users on the platform at 31 minutes. What this means is that millions of bytes of content are generated in many local languages. Also, with 700 million people in India who have not yet adopted the internet set to enter the fray, a locally developed AI model takes centre stage, as information in the form of videos, text and audio is increasingly being used to generate misinformation and ‘fake news’.
Advancements in the field of language processing, such as generative pre-trained transformers (GPT), will also not help in the buildout of Indian AI. So far, efforts have been under way, but at a small scale for a country of India’s size. Recently, IIT-Madras developed AI models and data sets to process texts in 11 Indian regional languages. This was taken up jointly with AI4Bharat, a platform for building AI solutions for problems of relevance to India. Similarly, Logically, a social enterprise using AI to tackle misinformation, and Indraprastha Institute of Information Technology-Delhi (IIIT-D) have launched a two-year research partnership that will explore the provenance, motivations and psychology of misinformation shared online.