Top Data Center Outage Trends and Strategies for Reducing Risk

Uptime Institute’s latest report on data center outages reveals what seems to cause the most outages – and how companies can reduce their risk.

Christopher Tozzi, Technology Analyst

July 18, 2024

Data center outages are on the decline, and investment in on-site backup systems is the main reason. That's the one-line takeaway from the latest Uptime Institute study of data center outages.

Keep reading for a deeper dive into data center outage trends this year, as well as an analysis of what they mean for data center resilience and recovery planning.

The key findings from the Uptime report include the following:

  • The number of outages per facility has decreased compared to earlier Uptime Institute reports. (In absolute numbers, outages have increased, but that's because there are more data centers than there were in the past.)

  • Fifty-five percent of organizations reported having experienced a data center outage in the past three years.

  • However, only 27% of organizations that experienced an outage identified it as "significant," "serious" or "severe."

  • This means that overall, fewer than 15% of businesses have been subject to a notable outage within the past three years.

  • The failure of power and cooling systems was the most common cause of data center outages, accounting for about 71% of all outages.

  • Human error contributed to about half of notable data center outages, and the most common type of error was staff failing to follow procedures.

  • Cyberattacks were negligible as a cause of data center outages, accounting for a mere 1% of all such events. (It's important to note that the study examined the causes of outages affecting data center facilities as a whole, not disruptions to individual workloads. If it had done the latter, cyberattacks would probably have factored in much more prominently.)
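The "fewer than 15%" figure in the list above comes from multiplying two of the survey shares. Here is a minimal sketch of that arithmetic in Python. It assumes the 27% is a share of the organizations that reported an outage, as the article's wording suggests. The variable names are illustrative and do not come from the Uptime report.

```python
# Shares as quoted in the article (Uptime Institute survey figures).
share_with_any_outage = 0.55   # organizations reporting an outage in the past three years
share_rated_notable = 0.27     # of those, outages rated significant, serious or severe

# Assumption: the 27% applies only to organizations that had an outage,
# so the overall share with a notable outage is the product of the two.
share_with_notable_outage = share_with_any_outage * share_rated_notable

print(f"{share_with_notable_outage:.2%}")  # prints 14.85%
```

The result is just under 15%, which matches the rounded figure in the findings.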

Why Data Center Outages are Declining

The main reason why data center outages are declining in frequency, according to the Uptime research, is that companies have invested in redundancy systems for their facilities. More than one-third of respondents reported having increased power and cooling system redundancy.

The Uptime Institute cites this data to suggest that building redundancy into each data center – as opposed to constructing multiple data centers and distributing workloads across them – is the best way to improve overall uptime. It says this trend flies in the face of "expectations that multi-site approaches will undermine expensive, physical site redundancy strategies."

That said, a statistician (which I am not, although I once took an “Introduction to Statistics” course in college) might take issue with the implication that a correlation between higher rates of system redundancy and lower outage frequencies translates to causation. It's not actually crystal-clear that this is the case, and the Uptime research doesn't elaborate on this point.

Nor does it detail how investments in multi-site strategies have changed in recent years. It's plausible that the average number of sites has also increased, which could be a factor in lower outage rates.

Still, the undeniable fact is that more companies are investing in redundancy, and there is at least a correlative relationship between this trend and decreased outages.

Strategies for Reducing Data Center Outages

On the whole, the report suggests that the following are winning strategies for increasing data center availability and reducing the risk of outages as of 2024:

  • Invest in redundant power and cooling systems (keeping in mind the caveats discussed in the preceding section).

  • Deploy advanced resiliency solutions, such as software that automatically moves network traffic and workloads during an outage. Uptime says this approach "can reduce outage risks and their associated impact over time," although it notes that there may be a temporary increase in outages because it might take time for companies to learn the intricacies of the new software.

  • Don't focus on cybersecurity as a key strategy for preventing data center outages. Protecting individual workloads is certainly important, but the data shows that cyberattacks very rarely cause entire data centers to fail.

  • Invest in training for data center technicians, automate processes, or both, to reduce the risk of outages caused by human error.

Conclusion

No single survey of data center outage trends can reveal everything that businesses should do to increase uptime. But the Uptime Institute's data is among the most recent and detailed information available about what seems to cause outages and how companies can reduce their risks, and the takeaways are clear: Overall outage rates are declining, plausibly because of increased investment in redundancy – although human error remains a major threat.

About the Author

Christopher Tozzi

Technology Analyst, Fixate.IO

Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.
