Voice recognition technology is finding its way into a wide range of devices and appliances.

Anyone who was glued to the telly during the 90s will remember the endearing little show called ‘Small Wonder’. The protagonist, Vicki (an acronym for ‘Voice Input Child Identicant’), was a robot who lived the life of a small girl: talking, playing and interacting with everyone just as a human being would. What enabled her to do so was the ability to understand whatever people around her spoke. Although we still might be a couple of decades away from turning that bit of robotic fiction into mainstream reality, voice recognition technology has already taken off and how!

The tech-savvy would by now be familiar with the voice-activated version of Google Search, where you literally ask Google to look something up for you. The same goes for Apple’s voice-based digital assistant, Siri. However, voice-activated technology is still at a relatively nascent stage.

Stumbling blocks

Even with really popular voice recognition systems such as Apple’s Siri, many users found that it had trouble understanding their commands unless they “faked” an American accent.

Adapting to variations in accent still remains one of the major stumbling blocks in voice recognition technology.

“The single most important factor in providing robust recognition of varying accents is incorporating a large corpus of training data (audio recordings of actual users) spanning a broad range of accents in the creation of the speech recognition language models,” says Sunny Rao, Managing Director, Nuance Communications.

Hence, while your smartphone might soon be able to understand that you are asking if the “weather”, and not “Heather”, is hot, it’ll be some time before you can go colloquial with it. “How’s it hanging, bro?” will still be as confusing to devices as it is now.
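To see why the training data Sunny Rao describes matters, consider how a language model helps a recogniser choose between acoustically similar words like “weather” and “Heather”. The toy sketch below (not Nuance’s actual method; the corpus and function names are made up for illustration) scores candidate words by how often each follows the preceding word in training text:

```python
from collections import Counter

# Tiny hypothetical training corpus; a real system learns from
# millions of utterances across many accents.
corpus = [
    "is the weather hot today",
    "the weather is hot",
    "what is the weather like",
    "heather is my friend",
]

# Count bigrams (pairs of adjacent words) in the corpus.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams[(a, b)] += 1

def pick_candidate(previous_word, candidates):
    """Return the candidate most likely to follow previous_word."""
    return max(candidates, key=lambda w: bigrams[(previous_word, w)])

print(pick_candidate("the", ["weather", "heather"]))  # -> weather
```

Because “the weather” appears in the corpus while “the Heather” does not, the model resolves the ambiguity in favour of “weather”, even though the two words sound nearly identical.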


One of the more unexplored platforms where voice recognition would find a very interesting fit is in home automation. Right now, there are sensors that switch off the lights as you walk away from a room or switch on the telly as you recline on your La-Z-Boy. The next step is when you can gain more control over your home appliances through voice recognition.

"We've actually developed a platform to voice-enable all your household devices and appliances. The technology exists today. Over the next few years, we will be talking to our TVs, security systems, lighting, thermostats, washers and dryers and more," says Vanessa Rose from iSpeech.

Enterprise solutions

Voice recognition seems to have gained more ground in enterprise solutions than in personal/mobile applications as of now. Take, for example, Nuance Communications’ Dragon Medical Practice Edition. It’s a medical speech recognition solution that lets clinicians dictate medical notes directly into any electronic health record (EHR) system. Turning to voice tech in the medical field helps health personnel reduce medical transcription costs and improve the comprehensiveness of patient medical records. Additionally, Dragon Medical can save clinicians 40-60 minutes per day on documentation, according to the company.

“Dragon’s legal and desktop suite of products has also been deployed in the High Courts at Mumbai, Delhi, Karnataka, Andhra Pradesh and Kerala, as well as in Income Tax, RBI and NABARD offices, to name a few,” says Sunny.

On the side

Apart from fine-tuning voice recognition capabilities, software companies are also working on challenges beyond the obvious. Nuance, for example, has invested heavily in ensuring successful interactions across a variety of conditions: noisy environments such as train stations and traffic; cross-talk and side speech; differences in microphone quality and placement, both wired and Bluetooth; and a variety of phone signals (wired, CDMA, GSM, VoIP, etc.).

In doing so, Sunny hopes that users can expect natural language capabilities to expand to allow free-form, user-driven dialogue, where they can interact with systems in a truly conversational manner. “These systems will leverage a broad set of information about the user, including their geo-location, past interactions and communication preferences, to not only enable a more intelligent dialogue that is contextually relevant to that user and their current situation, but to also predict user needs and proactively provide support and make recommendations,” adds Sunny.

If this happens, I am going to second Vanessa’s belief: “Over time, talking to our computers will become just as natural as touching them.”


(This article was published on August 9, 2012)