Business Daily from THE HINDU group of publications
Monday, May 07, 2007

Beginning to be heard

Ravikanth Nandula

In the corridor leading from the conference hall to the lobby of a hotel in Cannes, France, last week, Daniel Valcour was speaking to his secretary. The conversation went like this:

"You have three messages", the secretary started off.

"Read first message." Daniel, curt.

"Message from Craig Mc...", the lady begins to oblige.

"Next message."

"Message from...."

"Give me caller ID..."

"001418..."

"Add it to my phonebook. Next message."

"Message from..."

"Give me the weather in New York." The lady is cut short yet again.

Even as the lady was gathering up the weather forecast, Daniel was at it again.

"Search the Web for Indian mobile market growth."

This time she takes a few moments to fire up the search engine, in this case Google, key in the words and come out with the information.

One could be excused for thinking that Daniel is one ungentlemanly boss. But the dame doesn't mind. She is, after all, an automated voice in his mobile phone. And Daniel was giving a demonstration of his gadget, which is enabled for speech recognition.

Sitting on a tiny footprint of about 100k, the application (and the lady in question) was taking his speech and sending it to a server stationed a continent away, where a speech recognition engine processed his voice signals and sent the results back to his mobile phone. All in a matter of a few seconds.

Not once touching the keypad, Daniel had opened the menus, listened to his messages, searched the Web and dictated a couple of text messages and a lengthy e-mail that were duly typed out by the lady even as he was speaking. For good measure, he had booked himself a hotel room in San Francisco, filling in the reservation form entirely by voice. Done with the day's work, he relaxed with his choice of music, telling the music player on his phone "Play Shakira, `Hips Don't Lie'."

The hip phone that Daniel used isn't lying, either. The big story here is that speech recognition technology is going mainstream. Hitherto limited to desktops and networks, speech is about to make its big entry into the world of portable gadgets.

"Everything is beginning to talk and listen to you. Fasten your seat belts," Peter Hauser, an Industry veteran and Senior VP & GM, Nuance Communications International, delivers a mock warning.

"We're actually looking at the tip of an iceberg as far as the amount of power you're going to have at your fingertips with your mobile phone. The advancements in speech recognition technology mean there's a rapid and easy access to content, search or actually interacting with a customer care system. Speaking and dictating is a more natural way of getting things done," Vlad Sejnoha, Vice-President and Chief Scientist at Nuance Communications, says.

As an idea, speech recognition has been the subject of science fiction books and films for over half a century. Landmark films such as 2001: A Space Odyssey and the popular TV serial Star Trek have used the `wow' factor of machines understanding and speaking like humans. As a technology, it's no newcomer to the IT world, either. Even a decade ago, companies such as IBM, Dragon Systems and Lernout & Hauspie were shipping out products and applications for desktops and enterprises. As far as the consumer is concerned, the real turning points that took speech recognition out of the realm of science fantasy first, and then out of that of a distant technological fact, came around the turn of the century.

One of the pivots on which the possibility of embedding speech recognition in personal devices turns is the progress of hardware. Over the last decade, we have seen the processing power and memory capacity of these tiny devices grow exponentially. The PDAs and high-end mobile phones that come out today have more computing power and memory than a commonly available PC from a decade ago, making it possible to run intricate, processor-intensive and memory-hungry applications such as speech recognition.

Cumulatively, our acceptance and embrace of technology itself, be it the Internet, MP3 players, iPods, DVDs, game consoles, WiFi, Bluetooth connections, etc, has reached a point where we take these devices for granted. Speech, industry people feel, has the potential of becoming the primary interface to operate and use these products and services.

Second, over the last decade, there has been rapid progress on the core technology front. Speech recognition software that shipped five years ago needed the user to `train' the application. One basically read out a number of pages of text (prepared by the application) into the computer's microphone. The application then matched this input against its data and found a `pattern' in the user's speech through which it could recognise further inputs. This used to take agonising hours, putting the consumer off the product. Version 9 of Dragon NaturallySpeaking, Nuance's flagship desktop speech recognition application released a year ago, did away with the training. It can be used right out of the box, and Nuance claims an accuracy level of 99 per cent for it (reviewers put it between 85 and 90 per cent). Achieving a speed of up to 160 words a minute, it has 16 million users worldwide.

Nuance Speech Recogniser v9, the company's core speech recognition engine released recently, claims an error reduction rate of 27 per cent over its previous version. Available in 44 different languages and dialects, it has an Indian-English version, too.

Juha Iso-Sipila, Head of technology area at Nokia, which is developing its own core technologies in speech, says, "The age of subscriber-dependent usage of speech technology on mobile phones is over. The phones to come will use subscriber-independent natural dialling interfaces." What he means is that the day is not far off when you will control your phone entirely by voice. Without giving a timeframe, he hints that a phone that can take dictation in Hindi and send it out as an SMS in Devanagari script may be in the works.

Scientists approach speech recognition by creating mathematical models that explain how different aspects of language work. The two principal models are the acoustic model and the linguistic model. Acoustic models are mathematical expressions of how speech sounds: for example, how `how' sounds as against `who'. The second, the linguistic model, tries to predict the likelihood of one sequence of words in a given utterance as against another sequence of words. When one `speaks' to the machine, the recogniser converts the sound waves (using techniques such as spectral analysis) into a representation which can then be interpreted by the models. The `search' component in the recogniser then matches the pattern with those that are already in the database and builds a hypothesis of what you could have said.
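
How the two models and the `search' component work together can be shown with a toy sketch in Python. It is only an illustration and not how Nuance's or any other commercial engine is built: the words, the probabilities, the `recognise' function and the penalty for unseen word pairs are all made-up assumptions. The acoustic table says how well each candidate word matches the sound heard in each position, the word-pair table stands in for the linguistic model, and the search simply tries every combination and keeps the best-scoring one.

```python
import math
from itertools import product

# Hypothetical acoustic scores: for each word position in the utterance,
# how well each candidate word matches the sound that was heard.
ACOUSTIC = [
    {"how": 0.6, "who": 0.4},
    {"is": 1.0},
    {"the": 1.0},
    {"whether": 0.55, "weather": 0.45},  # the sound alone slightly favours "whether"
]

# Hypothetical linguistic model: probability of a word given the previous word.
# "<s>" marks the start of the utterance.
BIGRAM = {
    ("<s>", "how"): 0.5,
    ("<s>", "who"): 0.5,
    ("how", "is"): 0.4,
    ("who", "is"): 0.4,
    ("is", "the"): 0.5,
    ("the", "weather"): 0.3,
}
UNSEEN = 1e-4  # assumed small probability for word pairs the model has never seen


def recognise(acoustic=ACOUSTIC, bigram=BIGRAM, unseen=UNSEEN):
    """Brute-force search: score every word sequence, keep the most likely one."""
    best_words, best_score = None, -math.inf
    for words in product(*acoustic):  # one candidate word per position
        score, prev = 0.0, "<s>"
        for slot, word in zip(acoustic, words):
            score += math.log(slot[word])                        # acoustic model
            score += math.log(bigram.get((prev, word), unseen))  # linguistic model
            prev = word
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words, best_score


words, score = recognise()
print(" ".join(words))  # prints: how is the weather
```

On sound alone the last word would have come out as `whether', but the linguistic model finds `the weather' a far likelier pair than `the whether' and tips the hypothesis the right way. Trying every combination is practical only for a toy like this one; a real recogniser has to search a vastly larger space and needs much more efficient methods.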

"Speech technology is actually kind of stupid. It does all it's calculations and says, `Okay, this is what the speaker could have said. Now let's try to match this against the input'. But our job is to make it smart and we're making it smarter everyday," Vlad Senjoha explains. "We've made vast progress on core algorithms on which we can model various processes that go into the voice recognition chain. There has been an increase in the number of sophisticated statistical and database techniques. Also in terms of magnitude, we have an increasing amount of data to train the models."

The third, and equally important, factor that has contributed to the present robustness of speech recognition comes from within the industry. Over the last decade there has been a coming together of various players in the field. Through mergers and buy-outs, and a bankruptcy thrown in, the players pooled their specialisations to create speech recognition software that keeps getting better by the day. Nuance Recogniser v9 has six different core speech recognisers that have gone into its making.

Some drawbacks

But technologies like these are not without their drawbacks. What the industry terms `environmental issues' is one of them. Simply put, this refers to the recogniser's performance under noisy conditions such as outdoors. Not `intelligent' enough to differentiate between the sound made by the speaker and the noise made by a passing lorry, the recogniser may throw up results that are unpredictable, to say the least. The desktop versions of these applications had a similar issue: results would vary if you `trained' the software in a room without a fan running and used it in another room with one running. The advancement to `no-training-required' versions of the applications solved this particular problem. From the hardware side, noise-cancelling headsets, based on military technologies developed for the warfront, are making their appearance in the market and may become a boon for the users of speech technology.

"Our job is far from over. We don't consider it done. There are many details to be worked on and improved. Though we have features like predictive punctuation, we have to go towards a time when the software becomes intelligent enough to punctuate well on its own (right now, the user has to say words such as `comma', `Fullstop', `space' etc, wherever needed). Automatic and intelligent capitalisation, automatic formatting of documents and intelligent and predictive insertion of numerals (`9' than `nine' where needed) are some of the things that we're working on," Senjoha says.

At the same conference, another demonstration was held. Dubbed `The Amazing Race Part Deux', it featured a teenage SMSing champion from the US, Eli Tirosh, against Nuance's speech dictation software loaded on to a mobile phone. The challenge was to send a lengthy text message fast and accurately, and the unforgiving machine beat Tirosh and her twinkling thumbs 16 seconds to 24 seconds. To the left of the stage, however, more interesting things were going on. One of Britain's top racing drivers, Perry 'The Stig' McCarthy (Top Gear Show on BBC, anybody?) was given the simple task of completing one lap of Monaco's F1 racing circuit on a gaming console. As he was driving, he was sent an SMS to which he was asked to reply ('unlawful, but don't we all do it?') and, once done, to select a particular song from his iPod to play. Needless to say, Perry failed the multi-tasking test, crashing 27 times. Nuance's young executive completed all the tasks, speaking to the phone, without a mishap. What was shown was a glimpse of the hands-free, eyes-free mobile usage of tomorrow.

Come to think of it, Indian consumers have been using speech technology for a while now, though in a limited way. The interactive voice systems found in banking, hospitality and travel sectors are an example. Already deployed widely in medical transcription and legal outsourcing businesses, the idea is slowly spreading its wings. In fact, it's already active in the mobile arena; when one picks up the mobile phone and says `cricket' to get the latest score, it's speech recognition technology in action. The old man in the advertisement who records the song from the film Julie (disturbing the young couple in the car, remember?) is using it, too. As this technology gathers steam it may just foray out from the `emotional-value-added-services' space that it occupies now. The possibilities for m-commerce and enterprise-level value added services, apart from entertainment and personal productivity products, are immense. Top guns in IT services such as Wipro are already working on it. "There are more phones on this planet now than there were people one hundred years ago. The number of text messages sent from mobile phones in 2006 is one trillion. And it's more natural to speak than type," Peter Hauser says. "Figure it out."

rkanth@thehindu.co.in


Copyright © 2007, The Hindu Business Line. Republication or redissemination of the contents of this screen are expressly prohibited without the written consent of The Hindu Business Line