Google’s new large language model, PaLM 2, introduced at the company’s annual developer conference last week, uses almost five times as much text data for training as its predecessor, the original PaLM, to perform advanced coding, math, and creative writing tasks, a report by CNBC has revealed. The tech giant earlier said that the model is trained in 100 languages and performs a wide range of tasks.

Google has said that PaLM 2 is smaller than previous LLMs.

The tech giant claimed that PaLM 2’s wide-ranging dataset includes scientific papers and web pages that contain mathematical expressions. According to reports, the first version of PaLM (Pathways Language Model) in 2022 was trained on 780 billion tokens and had 540 billion parameters. PaLM 2 is trained on 3.6 trillion tokens and has 340 billion parameters, an internal document viewed by CNBC revealed. Tokens, strings of words, are the units of text used to train LLMs.

LaMDA, a conversational LLM that Google introduced two years ago, was trained on 1.5 trillion tokens. The technology was touted in February alongside Bard AI.

At Google I/O, the company announced over 25 new products and features powered by the model, including the expansion of Bard AI to new languages. “It is being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API,” Google said.

Google said earlier in a blog post that its language model has improved multilingual, reasoning, and coding capabilities. The tech giant improved the model’s ability to understand, generate, and translate nuanced text, including idioms, poems, and riddles, across a wide variety of languages.

Meanwhile, Google recently opened access to its Bard AI chatbot to more than 180 countries. Earlier this month, the company opened its chatbot Bard AI for all users with Workspace accounts.