One of the biggest challenges that call centres in India face is callers mixing more than one language in a single conversation. Speech recognition solutions are available, but they are mostly language-specific, which makes such code-mixed speech difficult to identify, transcribe and analyse.

Researchers at the International Institute of Information Technology (IIIT-H) have found a workable solution to this problem.

They have come up with a common Automatic Speech Recognition (ASR) system for six Indian languages. The multi-lingual system will hit the market by August 2021, according to Anil Kumar Vuppala, a Professor at IIIT-H who led the research team.


The use cases are many. E-commerce firms, for one, can let customers place orders speaking in the language of their choice, or mixing words from different languages. The platform can even pick up "Tenglish" or "Hinglish", speech in which English is freely mixed with local languages.

“Despite the different orthographies (different systems of spellings), Indic languages have a big advantage. Dravidian and Indo-Aryan languages share a common phonetic space. This made it easy to build a common speech recognition system for various Indian languages,” he said.
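The shared phonetic space can be pictured with a small sketch. The mapping below is purely illustrative (the phone labels and grapheme choices are the author's, not IIIT-H's actual inventory): letters from the Telugu, Tamil and Gujarati scripts that denote the same sound are collapsed onto one shared phone label, which is what lets a single acoustic model serve several languages.

```python
# Illustrative only: graphemes from three different Indic scripts that
# represent the same sound are mapped to one shared phone label.
SHARED_PHONES = {
    # Telugu, Tamil and Gujarati letters for the sound /ka/
    "క": "ka", "க": "ka", "ક": "ka",
    # Telugu, Tamil and Gujarati letters for the sound /ma/
    "మ": "ma", "ம": "ma", "મ": "ma",
}

def to_common_phones(text):
    """Map each known grapheme to its shared phone label; pass others through."""
    return [SHARED_PHONES.get(ch, ch) for ch in text]

print(to_common_phones("కమ"))  # Telugu input  -> ['ka', 'ma']
print(to_common_phones("கம"))  # Tamil input   -> ['ka', 'ma']
```

Though the two inputs are written in different scripts, both reduce to the same phone sequence, so one recogniser can cover them.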


“We train the neural network much like a child is taught the alphabet. We split the word the way it is pronounced and transcribe it into the corresponding acoustic sounds, which are then given a phonetic representation,” Ganesh S. Mirishkar, a research scholar, said.

“Mapping is done between the waveform and the matching transcript. This is fed to the neural network, which is essentially a mathematical operation that trains the system,” he said.
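The pipeline described in the two quotes above can be sketched in miniature. Everything here is a toy stand-in for the real system: the waveform is random noise, the features are simple frame energies and zero-crossing rates rather than real acoustic features, and the "network" is a single randomly initialised linear layer with a softmax instead of a trained model. The point is only the shape of the mapping: waveform in, per-frame phone probabilities out.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_signal(wave, frame_len=160):
    """Cut a 1-D waveform into fixed-length frames (remainder dropped)."""
    n = len(wave) // frame_len
    return wave[: n * frame_len].reshape(n, frame_len)

def features(frames):
    """Toy acoustic features: per-frame log energy and zero-crossing rate."""
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-8)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([energy, zcr], axis=1)  # shape (n_frames, 2)

def phone_probs(feats, n_phones=4):
    """One linear layer plus softmax: the 'mathematical operation' of the quote."""
    w = rng.normal(size=(feats.shape[1], n_phones))  # untrained random weights
    logits = feats @ w
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

wave = rng.normal(size=1600)  # 0.1 s of fake audio at 16 kHz
probs = phone_probs(features(frame_signal(wave)))
print(probs.shape)  # (10, 4): 10 frames, 4 phone classes
```

In a real system the weights are learned by comparing these per-frame predictions against the phonetic transcript, which is the training step the quote refers to.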

As of now, the IIIT-H team has built a framework for three languages — Telugu, Tamil and Gujarati — in a single block.

The team is working on a multi-lingual model with nine languages — Hindi, Marathi, Urdu, Bengali, Tamil, Telugu, Kannada, Malayalam and Gujarati.

Data pile-up

To build a credible speech recognition system, the team requires thousands of hours of properly labelled voice data. “To make up for the scarcity of data, we decided to combine languages. With 60 hours of Tamil and Telugu data and only 10 hours of Kannada data, one can still build a good Kannada ASR,” Anil Kumar said.
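The pooling idea in the quote can be sketched as follows. The file names and corpus sizes below are made up for illustration; the point is that, because the languages share a phone set, the corpora can simply be concatenated into one training set, so the low-resource Kannada model trains on far more speech than Kannada alone provides.

```python
# Hypothetical corpora: each entry is an (audio_file, phone_transcript) pair.
# Sizes are illustrative, loosely echoing the 60 h / 10 h figures in the quote.
tamil_telugu = [("tt_%04d.wav" % i, "...") for i in range(6000)]  # high-resource
kannada = [("kn_%04d.wav" % i, "...") for i in range(1000)]       # low-resource

# Shared phone space means the data can be pooled directly into one
# training set for a Kannada-capable multi-lingual ASR model.
pooled = tamil_telugu + kannada
print(len(pooled))  # 7000 utterances instead of 1000
```

The Kannada recogniser then benefits from acoustic patterns learned on the much larger Tamil and Telugu portions of the pooled set.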

The team said the system will have several commercial use cases in regions such as India, Europe, South-East Asia and Africa, where multi-lingual vernacular voice search offers significant opportunities.

Collecting voice data

Meanwhile, Ozonetel, a cloud communications provider, said IIIT-H has selected it to collect 2,000 hours of Telugu speech from Telangana and Andhra Pradesh.

Once this project is complete, users will be able to converse with their personal digital assistants in Telugu, in addition to Hindi and English.

As the voice telephony partner, Ozonetel will provide its KOOKOO platform as the interface to collect data from volunteers. “Through the KOOKOO platform, volunteers can access speech links and record conversations,” Chaitanya Chokkareddy, Chief Innovation Officer of Ozonetel, said.