e-Aksharayan, an optical character recognition (OCR) engine for seven Indian languages, was launched here on Monday.
The engine was developed by Technology Development for Indian Languages (TDIL), under the Ministry of Electronics and Information Technology (MeitY), together with 14 academic institutions, including the Indian Institute of Technology, Delhi.
What does it do?
Users can upload a scanned copy of a document in any of the seven languages: Tamil, Malayalam, Hindi, Kannada, Bangla, Assamese and Punjabi. The accepted file formats are BMP, PNG and TIF, at a resolution of 300 DPI. The tool converts images of printed documents into editable text with up to 90-95 per cent recognition accuracy.
Work on the OCR engine began in 2007-08 and took its final shape only in 2015-16. The tool then went through several more iterations before it was launched, explained an official who has been involved in creating the platform since its inception in 2007.
Much ancient Indian literature is available only as scanned images, which often lack clarity and remain inaccessible to the majority of the population. Since the tool converts scanned text into a searchable format, it will help popularise Indian languages among the next generation, according to the official.
“There is a need for more such tools for different purposes such as translation and speech recognition if we are serious about creating Indian language ecosystem,” Ajay Prakash Sawhney, Secretary, MeitY, said.
At a time when people increasingly use videos for learning, most video content is in English, which over 70 per cent of the population cannot understand. “We could have a simple tool that will provide subtitles in Indian language real time while watching a video,” Sawhney explained.