Business Daily from THE HINDU group of publications
Monday, Nov 09, 2009

eWorld - Information Technology
Simplify man-machine interaction

‘Multimodal systems’ enable your computer to carry out your commands with less fuss.




Prasenjit Dey

Why can’t we interact with a computer as naturally as we interact with a fellow human being? Why do we have to learn to type, click, scroll or understand icons, error messages and keyboard shortcuts?

If a computer is an advanced machine, it should be able to understand our way of communication instead of demanding that we change ourselves!

These are some of the questions computer scientists are asking as information technology becomes an integral part of our daily personal and business lives.

As computers become indispensable, it is important to make sure that no one is disadvantaged by the fact that they do not know how to interact with a computer.

This can only happen when interaction with computers becomes as natural as interacting with another human being. For this to become possible, computers must understand the means by which humans interact, such as speech, gestures and facial expressions.

Anthropological studies have shown that in human-to-human interaction, speech, gestures and facial expressions are produced as part of a single process and should be regarded as such.

Multimodal systems

In human-computer interaction (HCI), systems that attempt to understand and/or produce several natural modalities, such as speech, hand gestures and facial expressions, are referred to as “multimodal” systems. Enabling a unified understanding of these different aspects of human interaction can lead to the widespread usage and acceptance of computing systems in the future.

Multimodal systems can help surmount the barriers of computer-illiteracy and the complexity of computing interfaces in both developing and technologically advanced societies, and thus bridge the digital divide and bring computing to one and all.

Since the invention of computers, the keyboard and mouse have remained the de facto mechanisms for interacting with them. In the early years, the use of computers was limited to a group of educated, trained people who used them for productivity and data crunching, among other tasks. However, times have changed and computers touch our everyday lives in ways never imagined before.

The tasks performed on the computer are much more varied and very different from what was done before. These include entertainment, communication, information seeking and e-governance. These tasks are mostly social, i.e. done with friends and family members, and are momentary and transactional in nature. Though they involve very little data entry, a lot of short interactions happen to manipulate the information or media.

This new paradigm of usage calls for a new paradigm for interaction with the computer. The computer should easily integrate in people’s everyday use and cease to be a barrier to the real task at hand.

All these years, however, the computer has failed to disappear as a barrier to the real task at hand and, hence, its use has remained confined to a select, educated and technology-savvy group of people. It has failed to deliver the promise of a digital revolution to the masses, especially in the developing world where illiteracy levels are high.

In this context, systems that understand the spoken language represent a ray of hope for bridging the digital divide. However, these systems are yet to overcome challenges in aspects such as language, accents, pronunciation and noise in the environment.

There is also the problem of user acceptance, as some people do not like talking to a machine. Adding to the challenges are user expectations: users assume a speaking computer to be very intelligent and are later disappointed that every small statement needs to be spelled out in detail.

Multimodal systems are the next step in this direction and have the potential to overcome many of the above challenges. Human-to-human interaction is very rich, and involves either the selective or the simultaneous use of modalities such as speech, hand gestures and facial expressions.

Advantages

From the standpoint of HCI, multimodal systems offer advantages of higher task efficiency by way of the parallel use of modalities to convey different parts of the input command. For instance, you could encircle an area on a map and say ‘Udupi restaurants in this area’ instead of verbally explaining everything.
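The map example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any actual system: the `Place` records, the `circle_from_stroke` and `fuse` helpers and the sample coordinates are all hypothetical, and the spoken query is assumed to have already been recognised as a cuisine keyword.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Place:
    """A hypothetical map entry with a cuisine tag and map coordinates."""
    name: str
    cuisine: str
    x: float
    y: float


def circle_from_stroke(points):
    """Approximate a roughly circular pen stroke by its centroid and mean radius."""
    cx = sum(px for px, _ in points) / len(points)
    cy = sum(py for _, py in points) / len(points)
    radius = sum(hypot(px - cx, py - cy) for px, py in points) / len(points)
    return cx, cy, radius


def fuse(spoken_cuisine, stroke, places):
    """Combine the spoken part (what) with the gesture part (where)."""
    cx, cy, radius = circle_from_stroke(stroke)
    return [
        p.name
        for p in places
        if p.cuisine == spoken_cuisine and hypot(p.x - cx, p.y - cy) <= radius
    ]


# Hypothetical sample data: a stroke around the origin and three places.
STROKE = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]
PLACES = [
    Place("A", "udupi", 0.2, 0.3),      # right cuisine, inside the circle
    Place("B", "udupi", 3.0, 3.0),      # right cuisine, outside the circle
    Place("C", "chettinad", 0.1, 0.1),  # inside the circle, wrong cuisine
]

matches = fuse("udupi", STROKE, PLACES)
```

Neither input is sufficient on its own here: speech supplies the kind of restaurant, and the gesture supplies the area that would otherwise have to be described in words.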

Multimodal systems also lead to lower error rates, because information from one modality can be used to disambiguate information from another. For instance, touching a photo on your computer or mobile screen excludes the possibility of speech commands such as ‘play’ and ‘rewind’, which are more relevant for music players.
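One way to picture this kind of disambiguation is as a filter over the speech recogniser's candidate commands. The sketch below is a hypothetical illustration: the `ALLOWED` table, the `disambiguate` function and the confidence scores are invented for the example, and a real recogniser would supply its own ranked hypotheses.

```python
# Hypothetical mapping from the kind of object touched to the commands
# that make sense for it.
ALLOWED = {
    "photo": {"zoom", "rotate", "share"},
    "song": {"play", "rewind", "pause"},
}


def disambiguate(hypotheses, touched_type):
    """Pick the best-scoring speech hypothesis that fits the touched object.

    `hypotheses` is a list of (command, confidence) pairs. Returns None
    when no hypothesis is valid for the touched object.
    """
    allowed = ALLOWED.get(touched_type, set())
    candidates = [(cmd, score) for cmd, score in hypotheses if cmd in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


# Assumed recogniser output: two commands with close scores.
HYPOTHESES = [("rewind", 0.50), ("rotate", 0.45)]

on_photo = disambiguate(HYPOTHESES, "photo")
on_song = disambiguate(HYPOTHESES, "song")
```

With speech alone, the slightly higher-scoring ‘rewind’ would win. Once the touch shows that the target is a photo, ‘rewind’ is ruled out and ‘rotate’ is chosen instead.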

Multimodal interfaces also allow higher user satisfaction because of the availability of a choice of alternative modes to fall back on when one modality malfunctions, the comfort level of using some modalities against others, and the overall satisfaction resulting from better performance of the system.

Interest in multimodal systems has also been driven by the ubiquity of inexpensive sensors such as touch screens, Web-cameras, microphones, and accelerometers in most personal devices such as smart phones, laptops and media players.

These devices, with better processing capabilities, can process large volumes of images, video and speech in real time for better analysis. In addition, recognition algorithms have now come of age for various modalities such as speech, gestures and facial expressions.

Though multimodal systems seem promising, the layman is bound to face many challenges in using them. Thus, this is and will remain an active area of research in computer science.

The performance of the recognition engines of the different modalities remains a challenging problem. Without reasonable accuracy in the recognition process, users can get frustrated trying to make the computer understand what they are actually trying to say.

As we try to make interactions more natural with the computer, our interactions with other humans interfere with the interactions with the computer. The challenge is to determine unobtrusively whether the user is interacting with another human or the computer in the same environment.

There is also the problem of informing users about the speech or gesture commands that the computer can actually understand. Moreover, one has to be cautious: multimodal systems are not a panacea for all problems in human-computer interaction. There is a need to understand user behaviour and needs, and to come up with a ‘usable’ and not just a ‘cool’ application!

Multimodal systems would have crossed the chasm of acceptance when people do not even feel they are using them. The computer for them would have disappeared, and they would be getting more daily work done with computers without even knowing about it.

The innate naturalness of interaction with multimodal computing systems and their potential robustness can bring down the barriers of human-computer interaction.

This can greatly benefit people on both sides of the digital divide, both in the emerging and the developed markets. This is the one factor that can, perhaps, bring computing to one and all, all over the world.

The author is Research Scientist at HP Labs India.


Copyright © 2009, The Hindu Business Line. Republication or redissemination of the contents of this screen are expressly prohibited without the written consent of The Hindu Business Line