Business Daily from THE HINDU group of publications
Monday, Nov 19, 2007

‘Speech expands reach’

Meet the man looking to reach more people, more markets, and more profit — all through speech.






C. Mohan Ram

D. Murali

In the world according to C. Mohan Ram, you ‘just talk’.

Simply talk, “Summa Pesunga”, “Bas Baath Keejiye” – that’s what he wants to promote through LatticeBridge Infotech Pvt Ltd (LB), Chennai, a company he founded in 2002.

“Speech Expands Reach,” declares Ram. He talks about reaching more people, more markets, and more profit, all through speech, while recently interacting with eWorld over lunch.

“We are also working on something like simple ‘tags’ with only a speaker, a mic and connectivity to a central server (with a ‘personal assistant’ for everyone) to do connections, transactions and searches for information, etc., simply by speaking through this simple device,” says Ram, describing how newer devices, such as a minimal mobile, can help overcome the shortcomings of low PC penetration.

“If we are able to do this at, say, Rs 200/set, then we have solved the communication problem of this country.”

He smiles with the happiness of someone who holds the key to big puzzles, and his enthusiasm is infectious. And we carry on the dialogue over e-mail…

Excerpts from the interview.

What was the first project? What work did it involve?

Our first project was with the Railways. When I was doing background research, I found that the Railways contributed a lot to the acceptance of interactive voice response (IVR) systems based on dual-tone multi-frequency (DTMF) signalling in the country, by adopting this technology very early. This later paved the way for mass deployments in other sectors such as banking.

Hence, I approached the Railways about deploying this technology, as people normally need to know the train number to operate the IVR and end up making multiple calls to reach the “manual enquiry counter”. ASR (automatic speech recognition) could solve this by giving the status of trains, recognising them by name. The Railways agreed to test the technology but were not willing to pay. In fact, we paid a licence fee to them to get data. I justified it as an investment from the marketing budget and carried out the first-ever ASR deployment in India, in English and Tamil.

The ‘133 system’ for the Chennai division of the Southern Railway became a benchmark for passenger services in IR (the Indian Railways), and we had the opportunity to deploy an ASR-based system in Patna (which belongs to the Railway Minister’s territory), serving the whole of Bihar in Hindi and English. The ‘133 system’ helped us as a real marketing tool to win customers such as SBI, IOCL, CONCOR and so on.

What is your revenue model?


LB has multiple revenue models to suit everyone. A customer can own the system (SBI, Etisalat), rent a system (a hosted model, as with CONCOR), or pay per transaction (Tata, AirTel, Vodafone).

Normally we offer a transaction-based solution to telcos (telecom companies), with a minimum volume commitment.

In the hosted and pay-per-transaction models, the customer does not incur any CAPEX (capital expenditure) but spends only on OPEX (operational expenditure), and hence it is easy to experiment.

Can cost savings be quantified?

Yes. The alternative to ASR is traditional IVR (DTMF-based): press 1 for xxx, press 2 for yyyy, etc. It is menu-driven and time-consuming, so the average call duration is higher. With ASR, navigation is simpler and the average call duration comes down. More calls are, therefore, handled with the existing infrastructure.

Also, agent call costs can be saved, because ASR-based automation makes it possible to offer expanded services that would normally have required human agents located in a branch office.

As you may know, the cost of a human agent averages about Rs 3-6 per call. If the call is automated with speech, a direct saving of about 60-70 per cent of this cost is possible.

For example, one of the leading telcos, in the Delhi circle, was paying Rs 6/call (and the agents handled about 80,000 calls per day). When we implemented ASR, calls to agents dropped by 35 per cent in the first month. The telco paid approximately Rs 1.20/call on ASR, which means a direct saving of Rs 1,34,400/day.

The automated share has increased month on month, with only 15 per cent of calls going to agents now. Thus, huge savings are possible, as is a very quick ROI (return on investment).
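The Delhi circle figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and its parameters are hypothetical, amounts are held in paise to avoid rounding error, and the second calculation assumes that call volume and per-call rates stayed the same once only 15 per cent of calls went to agents, which the interview does not state.

```python
def daily_saving_rupees(calls_per_day, agent_paise, asr_paise, automated_pct):
    """Rupees saved per day when automated_pct of agent calls move to ASR.

    Hypothetical helper: costs are per call, in paise, so the sums stay exact.
    """
    automated_calls = calls_per_day * automated_pct // 100
    saving_per_call_paise = agent_paise - asr_paise
    return automated_calls * saving_per_call_paise / 100


# First month, as quoted: 80,000 calls/day, Rs 6.00 per agent call,
# Rs 1.20 per ASR call, 35 per cent of calls automated.
first_month = daily_saving_rupees(80_000, 600, 120, 35)
print(first_month)  # 134400.0, i.e. Rs 1,34,400/day

# Assumption (not from the interview): same volume and rates,
# with 85 per cent of calls automated (15 per cent still reach agents).
later = daily_saving_rupees(80_000, 600, 120, 85)
print(later)  # 326400.0
```

The first result matches the Rs 1,34,400/day quoted: 28,000 automated calls, each costing Rs 4.80 less than an agent call.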

In what sectors do your products work?

LB’s business is divided into two major business lines, viz. customer interactions with self-service, and enterprise efficiency solutions with speech.

For customer interactions, LB is currently focusing on four major segments — viz. travel and logistics, BFSI (banking, financial services and insurance), telecom, and media and entertainment.

Most of the success stories are currently in the travel and telco segments.

The enterprise segment is relatively new, and speech products for enterprises are at the beta and pilot stage. I expect this segment to contribute about 25-30 per cent of business next year.

A success story of a customer abroad?

For a telco in West Asia, we implemented “international operator assisted calls” – with ISD (international subscriber dialling) code based on ASR. The investment by the customer is about $2,50,000 in all: for hardware, software, speech and TTS (text-to-speech) RT (run-time) licences and application.

Prior to implementing the ASR, calls were answered by agents.

The cost per call handled by an agent (as given by the customer) is about 80 cents. Calls averaged 6,000 a day (even now the number is approximately the same).

In the first three months, about 60 per cent of calls were completed with the ASR system, which means the payback happened in 86 days.

A quarter-million-dollar investment recovered in less than a quarter!

The system went live on January 4 and the customer recovered the investment before the quarter ended on March 31.
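The payback claim can be reproduced from the numbers given in the answer. The sketch below is a rough check under stated assumptions: the helper name is hypothetical, every ASR-completed call is taken to save the full 80 cents agent cost, and no per-call running cost is counted for the ASR system, since the interview quotes only the upfront investment.

```python
def payback_days(investment_dollars, calls_per_day, automated_pct, agent_cost_cents):
    """Days needed for avoided agent costs to cover the upfront investment.

    Hypothetical helper: assumes each automated call saves the full agent
    cost and that the ASR system has no per-call cost of its own.
    """
    automated_calls = calls_per_day * automated_pct // 100
    daily_saving_cents = automated_calls * agent_cost_cents
    return investment_dollars * 100 / daily_saving_cents


# Figures from the interview: $2,50,000 invested, 6,000 calls/day,
# 60 per cent completed by ASR, 80 cents per agent-handled call.
days = payback_days(250_000, 6_000, 60, 80)
print(f"{days:.1f}")  # 86.8
```

Under these assumptions the saving is $2,880 a day and the investment is recovered a little under 87 days after go-live, in line with the 86 days and the "before the quarter ended" timeline quoted.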

The system is in Arabic and English and has been a great reference for us to generate additional business opportunities in the region. We now have ADNOC, Etisalat, Etihad Airways and the Abu Dhabi Tourism Authority as our customers abroad.

What are the emerging applications? (HR or human resources, GIS or geographic information system, education, solutions for the physically challenged, speech biometrics, home, professional productivity tools such as ‘Journo’, music on demand?) Does speech technology permit the integration of other cutting edge technologies? How?

There is no limit to the application of speech technology. With changing times (mobile penetration), the most appropriate technology for mobiles is ASR. Speech is becoming a convergence tool across many ‘day-to-day’ affairs of the common man. The next wave is location-based services (LBS) combined with ADA (automated directory assistance) or Yellow Pages. This market is predicted to be worth about $10 billion by 2010.

Speech is now replacing keypads and keyboards (it is predicted that, in future, mobiles would have only a mic and speaker with an optional display, and keypads would vanish). LB would slowly but steadily make use of all relevant ASR applications to improve productivity and bring enterprise efficiency, resulting in multi-fold growth for our customers.

Also, I see application of speech biometrics across multiple areas in customer interactions over phone as well as in internal applications such as the IT-helpdesk using the solution to provide various services efficiently.

LB would be concentrating on helping the physically challenged, as I am personally committed to doing this.

Any interesting insights (about human languages) that you have gathered over the years you and your team have worked on speech technology? (Wonder if speech systems work in non-human environments too!)

Handling “Madras Tamil” is the funniest experience. For most of our applications, we write a “grammar” for recognition. This completely fails with typical “Madras bashai”, as it is not Tamil in many ways. This made us adopt “mixed” language (Tanglish) for many applications. As for non-human environments, we have not tested the application there yet.

Where does ASR (automatic speech recognition) fail? Or, where is it not apt?

In a very noisy environment – say in a loom factory where noise is more than signal – ASR would fail. In India, we have the habit of honking even when not necessary and in such a place (say, the roads), ASR does not work very well. It does not require soundproof rooms, but relatively quiet places would give excellent results.

Again, when a person has a “grievance” and he/she wants to talk to a person, someone they believe to be trustworthy, it is better to have a human reassuring that the grievance would be redressed (at least an assurance). Putting an automated system in such a situation would only lead to further aggravation of the grievance. In some such cases, ASR is not the best fit. But, in more than 90 per cent of the situations, ASR would be more convenient and simple.

On the IP (intellectual property) creation at LB, and also the IIT advantage. And the skills you find most relevant to your work.

Currently we use a COTS (commercial off-the-shelf) product from Nuance and do the “regionalisation” and “localisation” to suit an application and territory. But to realise the dream of “bridging the digital divide”, with a target of 700 million people using ASR for transactions and information, we cannot be dependent on a COTS product; it would be cost-prohibitive.

LB is, therefore, working with IIT-Madras to develop our own engines for ASR and TTS in about 3-5 years’ time.

Working with IIT-Madras has been great, as they are not only helping in R&D (research and development) but have also invested in the company, with the aim of capitalising on deploying the technology, when ready, to the rural masses to bridge the digital divide.

Towards this, we have already started work and have gone through a couple of iterations of R&D plus testing over the past 18 months. We also have commissioned a full-fledged ‘language resources’ (LR) team that would build a corpus of transcribed data for a new acoustic model and a language model for various Indian languages. A team comprising computational linguists is working on this already, which would be the key IP of the company in the future.

Our company has filed three patents so far for very cutting-edge applications of speech in the common man’s life. These would add to the IP of the company in future, help in valuation and help in capitalising on huge business opportunities. Two more patents are under processing.

Do you think speech recognition systems can help serve larger interests too: such as – better governance (by facilitating democracy?), education, ensuring the right fit (of students in study choices and employees in jobs), and counselling (therapeutic?)

This is a nice question — there is nothing that can stop these applications. I see that many initiatives for e-Governance come with self-service kiosks, without realising that more than 70 per cent of the population cannot read the screen. The purpose of sharing the info through self-service kiosk is to cut the middleman. But, in some places you can find the villagers paying a ‘smart guy’ money to find info using “kiosk”!

We are thinking “big”, but would act once we have our own engine to proliferate into larger apps. Some initiatives related to the education segment, employment, etc. are already being worked on with telcos.

We are also betting big on “speech biometrics”, which can securely identify a person at the other end of the phone and help in various transactions, including financial, trading and so forth.

You would see LB very visible in fuelling/contributing to the growth of India to be a “superpower”.

On the origins: why speech as a passion?

I am a trained geophysicist from IIT-Roorkee, one among the best institutes for geosciences in the world. We worked on “signal processing”, mainly to understand all the signals given by the Earth (electrical, magnetic, seismic, acoustic, etc.). But somehow I knew I would not fit in a Government/PSU (public sector undertaking) job. GSI (Geological Survey of India), ONGC (Oil and Natural Gas Corporation), CGWB (Central Ground Water Board), and OIL (Oil India Ltd) were the mainstream recruiters at that time.

I searched for an alternative in the computer field, as we were used to writing a lot of long programs to process signals to find oil, minerals, water and so forth. In 1986, it was easy to get a job in the then emerging IT (information technology) field.

Thus I have been in IT for more than 21 years…I started as a technical specialist in X-Windows and moved to sales and marketing, to handling very large projects for my companies, which took me to various parts of the world.

My passion for speech probably comes from my signal processing background. I was amazed to find the possibilities for socio-economic and cultural, as well as technical, applications of speech.

I have personally given the first four years of my professional life (after M.Tech. – from 1986 to 1990) to rural upliftment/betterment when I worked on the application of technologies like GIS and image processing to help resolve local issues in Bahraich (most backward district of Eastern Uttar Pradesh), Deogarh and Dumka in Bihar, Bundelkhand in Madhya Pradesh etc. working closely with people, on income generation at the local level.

As a result, I could understand the local issues in rural India and firmly believed that technology and communication improvements are required to make the country develop into a superpower. Which explains why I always kept searching for relevant appropriate technologies, wherever I travelled across the globe (in work-related assignments, as I do even now).

One such finding, while on a trip to the US (in early 2001), was an ‘automatic speech recognition’ application (called ‘Hey Anitha’). I realised that my search for the appropriate technology was over and that ASR is an ideal solution to “bridge the digital divide”, and relevant for a country like India. Then I began doing a lot of research to convince myself that the technology was mature enough to make the best use of.

When I decided to be on my own, I knew CRM (customer relationship management) is going to be the biggest growth area as product differentiation would vanish over time and only “innovative customer interactions management” would be a key determining factor for growth.

Hence, with conviction, we ploughed in resources and took some risks on this technology, a “reasonably new” one worldwide (the second wave had just started in 2001). God has been kind to support me with the right people, right customers, right investors and right use of tech (so far) for our growth (we grew 376 per cent on topline, from Rs 2.2 crore to Rs 8.29 crore, last year and also declared a maiden dividend to investors).

dmurali@thehindu.co.in


Copyright © 2007, The Hindu Business Line. Republication or redissemination of the contents of this screen are expressly prohibited without the written consent of The Hindu Business Line