As AI models grow larger and more complex, the Graphics Processing Units (GPUs) powering them are running hotter than ever, sometimes to the point of overheating or even “melting”. Following the recent wave of ChatGPT-driven image generation to “Ghibli-fy” pictures, even OpenAI CEO Sam Altman said GPUs were overheating at alarming rates, sparking concerns about electronic waste and environmental impact.
With data centres struggling to dissipate heat fast enough, experts warn that simply scaling up hardware is no longer a viable long-term solution. Instead, the industry must innovate by developing more efficient algorithms, smarter cooling systems, and sustainable operational practices to keep AI’s growth from burning out its infrastructure.
Environmental concerns: AI’s growing energy footprint
According to Jaspreet Bindra, co-founder of AI&Beyond, GPUs are power-hungry: a single high-end GPU can consume 300–700W, and AI data centres often house thousands of them. Training a large AI model like GPT-4o or Gemini 2 can consume hundreds of MWh, equivalent to powering a small US town for years.
“As AI adoption grows, AI-related electricity use could rival that of small countries with predictions exceeding Japan’s power requirement in a few years. Carbon footprint depends heavily on the energy source (coal vs renewables). For fossil-fuelled grids, the impact is significant,” he explained.
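A back-of-envelope calculation makes these figures concrete. The sketch below uses only the numbers quoted above (300–700W per GPU, “thousands” of GPUs per data centre); the fleet size and the 500W midpoint are illustrative assumptions, not figures from any specific facility.

```python
# Back-of-envelope estimate of an AI data centre's GPU power draw,
# using the figures quoted in the article. All numbers are illustrative.

GPU_POWER_W = 500      # midpoint of the 300-700 W range per high-end GPU
NUM_GPUS = 10_000      # "thousands" of GPUs; assumed fleet size
HOURS_PER_DAY = 24

# Instantaneous draw of the GPU fleet alone, in megawatts
fleet_mw = GPU_POWER_W * NUM_GPUS / 1e6

# Energy consumed per day at full load, in megawatt-hours
daily_mwh = fleet_mw * HOURS_PER_DAY

print(f"Fleet draw: {fleet_mw:.1f} MW")
print(f"Daily energy: {daily_mwh:.0f} MWh")
```

At these assumed numbers the GPUs alone draw several megawatts continuously, before counting cooling and other overheads, which is why training runs are measured in hundreds of MWh.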
Rahul Mahajan, the CTO of Nagarro, a digital engineering company, attributed this “melting” to the immense computational load required by today’s large AI models.
Substantial heat
“Running these complex calculations, especially continuously at peak capacity, as often happens during training or large-scale inference, generates substantial heat. If the cooling infrastructure — whether in a data centre or a workstation — can’t dissipate this heat fast enough, the chip’s temperature rises, leading to performance throttling or even shutdown.”
Although literal melting is rare thanks to built-in safeguards, and claims of “GPU melt” are exaggerated, the narrative serves as a reminder of the physical limits being reached. It also highlights the unsustainable thermal and energy demands of simply throwing more hardware at the problem.
The community is now tasked with developing more efficient algorithms and sustainable operational practices, rather than relying on brute-force compute. “Overheating itself, however, remains a very real operational challenge in demanding AI applications,” Mahajan said.
Other causes include inadequate cooling (broken fans, poor thermal paste, airflow problems), manufacturing defects such as soldering issues and faulty power delivery, and poor system design in some custom-built PCs or data centres, Bindra said. In addition, sustained high workloads over extended periods, such as the heavy user load during the recent image-generation trend, can also contribute.
However, most modern GPUs have thermal throttling to prevent overheating. Melting incidents may make headlines when design flaws surface, but they are not everyday occurrences, he said.
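The thermal throttling both experts describe can be sketched as a simple control loop: when the die temperature nears its limit, firmware steps the clock down; as it cools, the clock recovers. The thresholds and clock values below are illustrative assumptions, not figures for any real GPU.

```python
# Toy sketch of GPU thermal-throttling logic. Thresholds and clock
# speeds are illustrative, not taken from any real hardware.

THROTTLE_TEMP_C = 90    # start reducing clocks above this temperature
SHUTDOWN_TEMP_C = 105   # emergency shutdown above this temperature
MAX_CLOCK_MHZ = 2000
MIN_CLOCK_MHZ = 800
STEP_MHZ = 100

def next_clock(temp_c: float, clock_mhz: int) -> int:
    """Return the clock speed for the next control interval."""
    if temp_c >= SHUTDOWN_TEMP_C:
        return 0                                      # shut down to protect the silicon
    if temp_c >= THROTTLE_TEMP_C:
        return max(MIN_CLOCK_MHZ, clock_mhz - STEP_MHZ)   # throttle down
    return min(MAX_CLOCK_MHZ, clock_mhz + STEP_MHZ)       # recover toward full speed

print(next_clock(95.0, 2000))   # hot: clock steps down
print(next_clock(70.0, 1900))   # cool: clock steps back up
```

Real GPU firmware uses far more sophisticated schemes (multiple sensors, voltage scaling, hysteresis), but the principle is the same: trade performance for safety before temperatures become destructive.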
Mahajan elaborated on the environmental costs of the raw materials that go into GPUs. Producing these chips requires mining and processing materials like silicon, copper, and gold, as well as rare earth elements. These processes involve environmental disruption, including habitat damage, water usage, pollution from tailings, and significant energy consumption for extraction and refinement, adding to the carbon burden even before the GPU is powered on.
The road ahead: Rethinking AI’s infrastructure
Rohit Pandharkar, Technology Consulting Partner, EY India, said, “The most practical and achievable way to optimise the environmental impact of GPU and Gen AI usage is to optimise algorithmic processes and data centre architecture. Using techniques like distillation, quantisation and model pruning in LLM usage, a smaller GPU and environmental footprint can be achieved. In the area of data centre sustainability, multi-tenant GPUs can be used in shared workloads to reduce idle GPU time and further reduce total GPU footprint.”
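Of the techniques Pandharkar lists, quantisation is the easiest to illustrate: storing model weights as 8-bit integers instead of 32-bit floats cuts memory and bandwidth roughly fourfold, which in turn reduces energy per inference. The pure-Python toy below shows the core idea of symmetric post-training quantisation; it is a sketch of the principle, not a production scheme.

```python
# Toy symmetric post-training quantisation: map float weights onto
# signed 8-bit integers with one shared scale factor.

def quantise(weights, num_bits=8):
    """Map floats to symmetric signed integers with a shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights from the integers."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.81]
q, scale = quantise(weights)
approx = dequantise(q, scale)
print(q)        # small integers in [-127, 127], storable in one byte each
print(approx)   # close to the originals, within one quantisation step
```

The compute saving comes from the fact that integer arithmetic at lower bit-widths needs less silicon area, less memory traffic, and less power per operation, which is exactly the footprint reduction the quote describes.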
The Nagarro CTO noted that manufacturers are continually innovating through improved chip architectures, transitioning to smaller, more efficient manufacturing processes, optimising software and drivers, and developing specialised AI accelerators such as tensor cores designed to perform specific AI tasks using less power.
Advanced cooling technologies also play a role, often delivering better performance-per-watt with each generation. However, these are gradual enhancements within the current framework and are struggling to keep pace with the exponential rise in model complexity and deployment scale.
Vast datasets
“We are nearing the limits of the current approach. Reductions in the environmental footprint will likely require big breakthroughs, potentially involving fundamentally different AI architectures moving beyond simply mimicking patterns in vast datasets, or exploring entirely new computational methods, maybe even quantum computing down the line, to achieve intelligence more efficiently.”
Alternatives such as dedicated AI chips and quantum computing could eventually replace GPUs, though it is too early to say, especially for quantum, Bindra observed. AI chips, ASICs like Google’s TPU, are more efficient for specific AI tasks, while quantum computing is not a near-term replacement but could outperform GPUs once it becomes scalable. For now, however, GPUs will continue to dominate most AI and graphics workloads.