How Heat Waves and AI Challenges Are Piling Pressure on Data Centers

Rising heat waves are placing strain on data centers around the world. Explore how AI exacerbates the issue and offers solutions for resilience against outages.

Nathan Eddy, Contributor

July 24, 2024

Hot in here: Heat waves can result in cooling system failures and data center outages. Image: Alamy

At a Glance

  • Record-setting heat waves have caused data center outages and highlighted the importance of maintaining optimal temperatures.
  • Power-hungry AI increases data center cooling challenges, but it can also enhance thermal management and optimize systems.
  • Data centers are adopting new technologies and transitioning to more efficient power systems to cope with increased demands.

Keeping a data center within its optimal temperature range is crucial to its efficient operation. However, there is a serious – and growing – risk of outages as the US and other nations around the world enter a period of extreme heat.

Heat waves can cause data center components to overheat and fail, leading operators to shut down servers to prevent damage, resulting in downtime and potential outages.

In July 2022, for example, record-setting heat in London topped 104 degrees Fahrenheit (40 degrees Celsius), causing cooling system failures that knocked Google and Oracle data centers offline. Two months later, scorching heat knocked out Twitter’s Sacramento region data centers. 

Peter Mattis, CTO and co-founder of Cockroach Labs, noted that sensitive electronic equipment, and the individual components within hardware such as servers, storage devices, and networking gear, have a defined operating temperature range within which they run optimally.

The recommended temperature range for a data center, which can run from as low as 65 to as high as 95 degrees Fahrenheit (roughly 18 to 35 degrees Celsius), plays a key role in preventing overheating and potential damage to equipment.

This range is determined by the rated operating temperature range of the specific hardware and the conditions under which that hardware can operate.
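Because every device in a room shares the same air, the usable range for the space is effectively where all of the individual hardware ratings overlap. The Python sketch below illustrates that idea. The device names and their rated ranges are hypothetical values chosen for illustration, not vendor specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hardware with a hypothetical rated operating range, in degrees Fahrenheit."""
    name: str
    min_f: float
    max_f: float


def f_to_c(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def allowable_range(devices: list[Device]) -> tuple[float, float]:
    """The room must satisfy every device at once, so the usable range is the
    intersection of the individual ranges: the highest minimum and the lowest maximum."""
    low = max(d.min_f for d in devices)
    high = min(d.max_f for d in devices)
    if low > high:
        raise ValueError("device ratings do not overlap")
    return low, high


def out_of_range(readings: dict[str, float], low: float, high: float) -> list[str]:
    """Return the names of sensors whose readings fall outside [low, high]."""
    return [name for name, temp in readings.items() if not low <= temp <= high]


# Hypothetical ratings for three kinds of equipment sharing one room.
devices = [
    Device("server", 65, 95),
    Device("storage", 60, 90),
    Device("switch", 50, 104),
]
low, high = allowable_range(devices)
print(f"usable range: {low}-{high} F ({f_to_c(low):.1f}-{f_to_c(high):.1f} C)")
print("alerts:", out_of_range({"rack-a": 78.0, "rack-b": 93.5}, low, high))
```

Here the storage gear's lower ceiling sets the upper bound for the whole room, which is why one device class with a narrower rating can constrain an entire facility.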


“This is going to be a recurring problem and an increasing problem as we have more and more of these heat waves – you have a heat wave combined with a power outage and boom, your data centers are offline,” Mattis said.

Mike Mattera, director of corporate sustainability at Akamai, explained that fluctuating temperatures are always a consideration for data center operations, but expected ranges in weather are not a predominant issue.

“We’ve solved for that,” he said. “Conversely, extreme temperatures, especially heat, place enormous strain on the electricity grid and the potential increase in the use of the local domestic water system depending on the cooling system.”

When a heat wave hits, power and water usage will increase depending on the system and the cooling technology type, translating to additional strain on the local market.

Mattera noted this is an especially pertinent problem in areas where electricity and water resources are more finite, including Texas and Arizona.

Ensuring Continuity During Heat Waves

Mattera explained that, with the extreme heat now being seen across the globe, many people are involved in ensuring data centers can continue to operate.

The key stakeholders who ensure continuity during a heat wave are the site facility managers and, more broadly, the facility team, including electricians, mechanical engineers, and heating, ventilation, and air conditioning (HVAC) professionals.


“That team needs to ensure critical systems are up and running and that uninterruptible power is available on site if or when an issue arises,” he said.

He cautioned that even a slight power drop could disrupt components like pumps, fans, and compressors, preventing the system from cooling and conditioning air.

In addition, data center cooling relies on a vast network of control systems that require a steady flow of electricity to run the system's various components and keep conditioned air flowing optimally into the data center space.

Zachary Smith, community board member of the Sustainable and Scalable Infrastructure Alliance (SSIA), said data center operators and the mechanical teams that support these facilities plan for a range of natural disasters and resource limitations.

He added that data center operators then work closely with their customers to meet published or agreed-upon service level agreements (SLAs).

“They may also have contingency plans with their customers if resources or natural disasters require shutting down or limiting certain services,” he said.

From his perspective, the biggest focus over the past several years has been on efficiency – using the power, cooling, and water resources as effectively as possible and reducing waste throughout the facility.


This has been achieved by raising data center operating temperatures, improving monitoring solutions and intelligent building management systems, and advancing power distribution and conditioning.

Increasingly, data center operators are implementing liquid cooling technologies that can improve the efficiency of their facilities even more, while in many cases moving to closed-loop, ‘waterless’ cooling designs at the facility or IT equipment level.

“All of this helps the data center to be more efficient and operate under increasingly challenging conditions,” Smith said.

Krishna Subramanian, president and COO of Komprise, said energy-efficient infrastructure and more effective cooling designs such as liquid cooling are two techniques currently being considered.

“Another effective but less explored strategy for efficient data center power management is to reduce the amount of actively managed data,” she said.

Since data consumes 30% or more of a data center's resources, and since roughly 80% of that data is cold (rarely accessed), efficient data management can help lighten a burden that accounts for about a third of a data center's load, without requiring any overhaul of the infrastructure.
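Taking the quoted figures at face value, the share of a data center's total resources tied up in cold data is roughly the product of the two percentages. The short Python sketch below works through that arithmetic. It assumes, as a simplification, that resource use scales linearly with data volume and that all cold data could be moved off actively managed infrastructure, so the result is an upper bound rather than a forecast.

```python
def cold_data_share(data_share: float, cold_fraction: float) -> float:
    """Upper bound on the fraction of total data center resources tied up in cold data.

    data_share: fraction of total resources consumed by data (e.g. 0.30)
    cold_fraction: fraction of that data that is cold (e.g. 0.80)
    """
    if not (0 <= data_share <= 1 and 0 <= cold_fraction <= 1):
        raise ValueError("both shares must be between 0 and 1")
    return data_share * cold_fraction


share = cold_data_share(0.30, 0.80)
print(f"cold data ties up about {share:.0%} of total resources")
```

With data at 30% of resources, cold data comes to about 24% of the total; because the source says data consumes "30% or more," a larger data share would raise that figure proportionally.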

“As the frequency of heat waves rises, coupled with the greater heat output of higher density AI processors, the problem is compounding on two fronts,” Subramanian said.


AI Complicates Challenges, Offers Solutions

The continued rise of AI will add to these challenges, but it may also help solve the problem of keeping data centers running at acceptable operating temperatures.

AI is power-hungry, and more AI processing increases data centers' heat output and power consumption, exacerbating the problem.

“On one hand, AI workloads for model training and inference with denser hardware configurations require a lot of computing power and energy,” Smith said. “Servers powering AI models and applications generate a lot of heat that must be dissipated and cooled.”

As a result, much of the current innovation in cooling and power efficiency is happening at the rack level.

This includes moving from air-cooled data centers to liquid and immersion cooling at the rack level, and moving rack power distribution from 12V to 48V, which lowers current and therefore the resistive losses that end up as waste heat.
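The benefit of a higher distribution voltage follows from basic circuit arithmetic. For the same power, current falls in proportion to voltage (P = V × I), and resistive loss in conductors scales with the square of current (I² × R). Quadrupling the voltage from 12V to 48V therefore cuts those losses, and the heat they produce, by a factor of 16 for the same conductors. The Python sketch below assumes a hypothetical 20 kW rack and a hypothetical 0.1 milliohm distribution path, and it ignores power-conversion losses.

```python
def conduction_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive (I^2 * R) loss in the distribution path for a given load.

    The current follows from P = V * I, so I = P / V.
    """
    current_a = power_w / voltage_v
    return current_a ** 2 * resistance_ohm


RACK_POWER_W = 20_000          # hypothetical rack load
PATH_RESISTANCE_OHM = 0.0001   # hypothetical busbar/cable resistance (0.1 milliohm)

loss_12 = conduction_loss_w(RACK_POWER_W, 12, PATH_RESISTANCE_OHM)
loss_48 = conduction_loss_w(RACK_POWER_W, 48, PATH_RESISTANCE_OHM)

print(f"12 V: {loss_12:.0f} W lost as heat; 48 V: {loss_48:.0f} W lost as heat")
print(f"reduction factor: {loss_12 / loss_48:.0f}x")
```

The absolute wattages depend entirely on the assumed resistance, but the 16x ratio holds for any load and any conductor, as long as both cases use the same conductors.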


Mattera said the complex computations involved in training these models require more resource-intensive hardware, increasing the overall power needed for the models to run optimally.

“This increased resource utilization and power generation translate into more heat within a data center, which strains the cooling systems,” he explained.

Furthermore, the dynamic nature of AI algorithms and models can cause spikes in power demand and heat generation, which a traditional cooling system might struggle to keep up with.

“Given the huge investments in centralized data center buildouts over the past year to support the voracious appetite for LLMs, I expect we’ll see increased strain on the grid,” he said.

Smith noted that while the rise of AI workloads is creating more challenges for keeping data centers at optimal operating temperatures, AI can also be an antidote to the problem.

This can include using AI to optimize thermal management, such as demand flow for liquid cooling or airflow and predictive maintenance for cooling systems.
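What "demand flow" means for liquid cooling can be illustrated with a simple heat balance: the coolant mass flow needed to remove a heat load P with a supply-to-return temperature rise ΔT is ṁ = P / (c_p × ΔT). The Python sketch below swaps in this plain physics calculation rather than any AI model. An AI-driven system of the kind Smith describes would sit on top of it, for example by predicting upcoming load and adjusting pump speed ahead of time. Water properties are approximate, and the rack loads and the 10 K temperature rise are hypothetical.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of water
WATER_DENSITY_KG_PER_L = 1.0   # approximate; ignores temperature dependence


def required_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Coolant flow (litres per minute) needed to carry away heat_load_w watts
    with a temperature rise of delta_t_k, from P = m_dot * c_p * delta_T."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_load_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    litres_per_s = mass_flow_kg_s / WATER_DENSITY_KG_PER_L
    return litres_per_s * 60


# A hypothetical rack whose heat load swings as AI jobs start and stop.
for load_kw in (10, 40, 80):
    flow = required_flow_lpm(load_kw * 1000, delta_t_k=10)
    print(f"{load_kw:>3} kW -> {flow:.1f} L/min")
```

Because the required flow scales linearly with load, a system sized for the peak wastes pump energy most of the time, while one sized for the average can fall behind during spikes; matching flow to demand addresses both.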

“With the increase in heatwaves, AI can also be used to power systems for real-time weather and longer-term environmental patterns that allow for automatic adjustments in energy consumption and cooling systems based on external factors,” he said.

About the Author

Nathan Eddy

Contributor

Nathan Eddy is a freelance writer for ITProToday and covers various IT trends and topics across a wide variety of industries. A graduate of Northwestern University’s Medill School of Journalism, he is also a documentary filmmaker specializing in architecture and urban planning. He currently lives in Berlin, Germany.
