Opinion

Digital India’s linguistics challenge

Girish Nath Jha | Updated on January 23, 2018 Published on April 27, 2015

The language gap: Can smartphones bridge this?

In order to dismantle information hierarchies, both voice and data should be easily available in multiple languages



A recent research report by the Telecom Regulatory Authority of India (TRAI) indicated that there are 935.4 million mobile connections in India, of which nearly 548 million are in urban India and the rest in rural areas.

According to I-Cube 2014, India had about 159 million internet users as of October 2014. Of these, 119 million were in urban India and the remaining 40 million in rural India. There has been growth of 45 per cent since October 2013.

So, while India is expected to have 500 million internet users by 2017, much of that growth depends on bringing non-English-speaking and non-internet-literate audiences on to the World Wide Web.

Making e-governance effective

English content on the web has reached saturation, and hence there is growing demand for content in Indian languages. The government’s digital initiatives will be non-starters unless they deliver in the 22 major and over a hundred minor Indian languages.

All e-governance content needs to be delivered not only as text but also as speech, because that is the only way to ensure social and digital inclusion in rural India, where illiteracy is still a harsh reality. The TDIL (Technology Development for Indian Languages) programme of the Union Government is a push towards localising e-governance content in 22 Indian languages.

Role of technology

People can, and want to, communicate with machines such as smartphones and other mobile devices in their own language. In India, smartphone growth has been phenomenal, bringing the best user experience at affordable prices.

With this comes a demand for smart language-processing applications. That means using tools such as machine translation, information retrieval, speech-enabled search and multilingual content creation.

Computational linguistics is a multidisciplinary field of study that primarily involves computer science and linguistics. We develop programs that will enable more and more Indians to come online and get past barriers to internet access such as language and literacy.

For example, an English-Hindi machine translation system will enable Hindi speakers in India to search the web in their own language. Even if there is not enough content in the native language, the machine can translate the search string into English, search the English content and display the results back in the native language. Computational linguistics also comes with challenges.

For instance, natural languages are very diverse and dynamic, and they keep changing at a fast pace. They can also be ambiguous: one word, phrase or sentence may have different meanings depending on the context. Computer applications therefore have to be updated constantly.
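The search flow and the ambiguity problem described above can be sketched together in a few lines of Python. This is only a toy illustration under loud assumptions: the three-word Hindi lexicon, the three English "documents" and every function name here are hypothetical, and a real system would use a trained machine translation model and a proper search index rather than word lists.

```python
# Toy Hindi-to-English lexicon (hypothetical data, for illustration only).
HI_TO_EN = {
    "सोना": ["gold", "sleep"],  # ambiguous: the noun "gold" or the verb "to sleep"
    "दाम": ["price"],
    "आज": ["today"],
}

# Reverse lexicon used to display English results back in Hindi.
EN_TO_HI = {en: hi for hi, senses in HI_TO_EN.items() for en in senses}

# Toy English document collection (hypothetical).
DOCUMENTS = [
    "gold price today",
    "how much sleep do adults need",
    "weather today",
]


def translate_query(query):
    """Return the candidate English senses for each known word in the query."""
    return [HI_TO_EN[word] for word in query.split() if word in HI_TO_EN]


def search(candidates):
    """Rank documents by how many query words they match in any sense."""
    scored = []
    for doc in DOCUMENTS:
        words = set(doc.split())
        score = sum(1 for senses in candidates if words & set(senses))
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep document order
    return [doc for _, doc in scored]


def translate_back(doc):
    """Replace every English word the lexicon knows with its Hindi source word."""
    return " ".join(EN_TO_HI.get(word, word) for word in doc.split())


def cross_lingual_search(query):
    """Translate the query, search in English, pair each hit with a Hindi rendering."""
    hits = search(translate_query(query))
    return [(doc, translate_back(doc)) for doc in hits]
```

Searching for the single word "सोना" returns both the gold document and the sleep document, because the word list cannot tell the two senses apart. Adding "दाम" to the query pushes the gold document to the top, which is the kind of context a real translation system has to learn to exploit.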

Beyond the general need for smart apps with artificial intelligence, India, with its many languages, needs this field much more than other countries do.

The scope of this field is vast and could apply to various sectors like digital education, online health services and other areas of development that come under the Digital India initiative.

The game changer for internet connectivity, in terms of numbers, is the prospect of more companies investing in voice-enabled technologies and locally appropriate web content.

Private players

It’s not only the government sector that needs to develop this field; internet companies too are looking at the potential of such technology. For instance, Microsoft has succeeded in building English-Urdu machine translation for its Bing search engine.

The biggest strength of the popular keyboard application SwiftKey is contextual prediction and adaptation to the user’s own language: it learns from each user’s personal writing style.

The company has been investing resources in making its application more relevant for speakers of Indian languages.

The SwiftKey app has found success on the Google Play store, and the company has seen growing traction across the Indian market, not least because of its extensive Indian-language coverage (15 languages to date) and a multilingual feature that lets users type in up to three languages at once.

Computational linguistics has helped in building predictive keyboards in several difficult native languages like Sanskrit, Santhali and Sindhi.
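To give a sense of what "learning from a user’s writing" can mean at its simplest, here is a minimal sketch of a next-word predictor based on word-pair counts (a bigram model). It is an assumption-laden illustration and not a description of how SwiftKey or any other commercial keyboard actually works; the class name, method names and sample sentences are all hypothetical.

```python
from collections import Counter, defaultdict


class NextWordPredictor:
    """Toy bigram model: suggests the words that most often followed the previous word."""

    def __init__(self):
        # Maps each word to a count of the words seen immediately after it.
        self.following = defaultdict(Counter)

    def learn(self, text):
        """Update the counts from one piece of the user's own writing."""
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            self.following[prev][nxt] += 1

    def suggest(self, prev_word, k=3):
        """Return up to k of the most frequent followers of prev_word."""
        counts = self.following.get(prev_word, Counter())
        return [word for word, _ in counts.most_common(k)]


# Usage with hypothetical sample sentences typed by one user.
predictor = NextWordPredictor()
predictor.learn("मैं घर जा रहा हूँ")
predictor.learn("मैं घर पर हूँ")
predictor.learn("मैं स्कूल जा रहा हूँ")
top_suggestion = predictor.suggest("मैं", k=1)
```

Because the counts come only from text the user has typed, the suggestions drift towards that user’s vocabulary in whichever language they write. Splitting on whitespace is itself a simplification that will not suit every script or language.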

For simple things like browsing, navigating, filtering and processing information on the web, we need to build software that can get at the contents of a document in any language.

This will ensure a level playing field for access to information on the internet. The Digital India initiative is a dream, but technologies like these can make it a reality.

The writer, an Associate Professor in Computational Linguistics at JNU, is also a technology consultant
