Last month, Uniphore, a unicorn incubated at the IIT-Madras Incubation Cell, announced it had taken over a French company, Hexagone. The acquisition, says Uniphore, will add muscle to its offering, which falls under a growing branch of science — conversational artificial intelligence.

It has been a while since software plugged into human voice — translating speech into text — and things have since moved far ahead. For instance, Uniphore uses automated speech recognition and natural language processing (NLP) to help clients understand the emotions of their customers when the latter reach out to their contact centres to complain or seek help in resolving an issue.

So, how does it work?

Have you ever had your mother or partner ask over the phone why your voice sounds low, and check whether you’re feeling fine or ill? What did they base their question on? Voice!

Over the past several years, computer science and computational linguistics have converged to convert speech to text, a process also called speech recognition, and this is helping companies sense the stress, delight or anger in customers’ voices. Conversational AI is proving good enough to evaluate customer sentiment.

NLP is used for automated speech recognition. Virtual assistant applications such as Siri or Alexa are ready examples of services that use speech recognition.

When it comes to speech recognition used by companies to understand customer sentiment, especially by analysing call records at contact centres, Uniphore says its conversational artificial intelligence (AI) helps predict intent from voice, sentiment, tonal cues and emotion in customer conversations. Such technology helps drive self-service before the customer’s call needs to be transferred to contact centre agents for complex queries. This also helps companies gain insights into agent performance, Uniphore says.

Conversational AI is driven by three core technologies, says Uniphore: NLP, AI, and machine learning (ML).

NLP software analyses natural human language and speech, interpreting contextual nuances and extracting relevant information. Together with natural language understanding (NLU), NLP allows humans to have conversations with AI.

AI uses the data analysed by NLP to predict patterns of communication. Conversational AI allows machines to communicate back and forth with humans, generating relevant automated responses based on the speaker’s intent and other contextual insights.
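To make the idea of responding to a speaker's intent concrete, here is a minimal Python sketch. It is a rule-based toy, not Uniphore's method: the intents, keywords, canned replies and the function names `detect_intent` and `respond` are all assumptions made for illustration. A production system would use trained models rather than keyword matching.

```python
import re

# Hypothetical intents and trigger keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "refund", "payment"},
    "technical_support": {"error", "broken", "crash", "slow"},
}

# Hypothetical canned replies for each intent.
RESPONSES = {
    "billing": "I can help with your bill. Could you share the invoice number?",
    "technical_support": "Sorry about the trouble. Which device are you using?",
    "unknown": "Let me transfer you to an agent.",
}


def detect_intent(transcript: str) -> str:
    """Return the intent whose keywords overlap most with the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best_intent, best_hits = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


def respond(transcript: str) -> str:
    """Pick the canned reply that matches the detected intent."""
    return RESPONSES[detect_intent(transcript)]


print(respond("I was charged twice and want a refund"))
print(respond("Hello there"))
```

When no keyword matches, the sketch falls back to handing the caller to a human agent, which mirrors the self-service-first flow described earlier in the article.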

ML enables AI-based systems to ‘learn’ and improve from experience without being explicitly programmed. “A subset of machine learning, deep learning, allows conversational AI models to cluster and categorise extracted data to make highly accurate predictions. Deep learning models run on neural networks. Many voice AI-based virtual assistants use deep learning models to mimic natural human speech,” according to the Uniphore website.

These three together power many of the tools and experiences we take for granted today, such as search engines, email spam filters, language translation software, and grammar analysis.

Uniphore says the technology from Hexagone will “enhance its capability to fuse all data derived from computer vision, natural language processing, knowledge AI, and voice and tonal analysis to pick up behavioural and emotional cues”.

In a video announcing the acquisition, Hexagone chief Camille Srour explains that in every conversation, even the sound of frustration, such as the “aargh” from an angry customer, or a laugh is picked up and analysed by this technology.

He points out that without such technology, companies tend to pick voice calls at random at the end of every year and arrive at marketing or sales strategies based on insights gained from them. This, he says, is not the ideal way. After all, if only three customers among thousands of end-users want a certain feature, the company would be misled if its random sample happened to include only those three customers.
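The arithmetic behind this sampling argument can be sketched in a few lines of Python. The figure of 5,000 end-users is an assumption standing in for "thousands", and the call log and the function name `feature_request_share` are hypothetical.

```python
def feature_request_share(calls):
    """Fraction of calls in which the customer asked for the feature."""
    return sum(calls) / len(calls)


# Hypothetical call log: True means the caller asked for the feature.
# Three requesters among 5,000 end-users (the 5,000 figure is an assumption).
all_calls = [True] * 3 + [False] * 4997

# An unlucky manual sample that happens to contain only the three requesters.
unlucky_sample = all_calls[:3]

share_all = feature_request_share(all_calls)
share_sample = feature_request_share(unlucky_sample)

print(f"Share across all calls: {share_all:.2%}")
print(f"Share in the unlucky sample: {share_sample:.2%}")
```

Analysing every call puts demand for the feature at well under one per cent, while the three-call sample suggests that every customer wants it.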