Behold GPT-4. While ChatGPT continues to fascinate society, OpenAI has already unveiled its successor, and it is hard to imagine any other generative AI capturing the same level of public interest.

Well, generative AIs are often termed “human-like”. But will they ever reach the limits of human reasoning? It’s important to note that ChatGPT and its ilk are “a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question,” as summarised by Noam Chomsky, Ian Roberts, and Jeffrey Watumull in a fascinating recent piece in the New York Times. In contrast, the human mind “seeks not to infer brute correlations among data points but to create explanations,” these authors wrote.

GPT-4 passed a simulated bar exam with a score around the top 10 per cent of test takers, whereas GPT-3.5 scored around the bottom 10 per cent, indicating a clear gain in capability. However, when asked “Son of an actor, this American guitarist and rock singer released many songs and albums and toured with his band. His name is ‘Elvis’ what?”, GPT-4 chose “Elvis Presley”, although Presley was not the son of an actor (the intended answer, Elvis Perkins, is the son of the actor Anthony Perkins). Thus, GPT-4 can still miss subtle details.

Yet there is a more serious issue. A generative AI makes up information when it doesn’t know the exact answer, an issue widely known as “hallucination.” As OpenAI acknowledged, like earlier GPT models, GPT-4 also “hallucinates” facts and makes reasoning errors, although it scores “40 per cent higher” than GPT-3.5 on tests intended to measure hallucination. Back in 2020, while giving GPT-3 a Turing test, a US tech entrepreneur named Kevin Lacker found that GPT-3 happily answered absurd questions without realising they made no sense. “How many rainbows does it take to jump from Hawaii to seventeen?” was one of Lacker’s “nonsense” queries. GPT-3’s response, “two,” received huge attention. This indicates that while a remarkable AI can write like a human, it still lacks common sense in its understanding of how the world works, physically and socially.

Would a newer model do better? Predictably, yes: in an upgraded model, most of the known bugs are expected to be rectified. To test this, I deliberately posed a similar query to ChatGPT: “How many lightnings does it take to jump from Dhaka to nineteen?” ChatGPT replied: “It is not possible to jump from Dhaka to 19 using lightning. Lightnings are electrical discharges in the atmosphere and are not a means of transportation.” Smart enough! ChatGPT, however, failed to mention that “19” is not a valid destination either. Would GPT-4 correct this? I’m not sure. Yet it’s conceivable that a bigger model, with more parameters, more training data, and more learning time, would perform better.

Yet “ChatGPT and similar programs,” according to Noam Chomsky and his co-authors, “are incapable of distinguishing the possible from the impossible.” A tree, an apple, gravity, and the ground are all physical concepts that an AI does not truly understand, even though, in most cases, it would explain with spectacular accuracy how an apple falls to the ground due to gravity. But the AI’s lack of comprehension of the real world would remain. And when the exact answer is unknown, it would continue to assign possibilities to the impossible, without explanations.

The writer is Professor of Statistics, Indian Statistical Institute, Kolkata