IT Leaders Brace for Major Outages in 2025, PagerDuty Study Reveals

A PagerDuty study finds that 88% of global executives anticipate major IT disruptions in 2025, highlighting widespread concerns about operational resilience.

Sean Michael Kerner, Contributor

December 17, 2024


IT operations should prepare now for what could be a challenging year in 2025.

A study by PagerDuty reveals that technology and business leaders are bracing for significant IT disruptions in 2025. The study, conducted by Wakefield Research, surveyed 1,000 senior executives across four major markets. It paints a sobering picture of organizational preparedness.

The findings come amid increasing complexity in digital operations and following a series of high-profile global outages in 2024 that exposed vulnerabilities in many organizations' incident response capabilities.

Key findings from the global study include the following:

  • 88% of executives anticipate a major IT incident within the next 12 months.

  • 44% resorted to manual processes during recent disruptions.

  • 39% experienced significant impacts on organizational decision-making.

  • 37% reported direct revenue losses or inability to process transactions.

For Eric Johnson, the CIO of PagerDuty, the biggest surprise was the revelation that 86% of executives had been prioritizing security at the expense of readiness for service disruptions.

"It's still important for organizations to protect their data and systems from external threats, but they also can't neglect the risks associated with downtime or outages caused by service disruptions — many of which aren't security-related," Johnson told ITPro Today. "The report highlights how critical it is for organizations to prioritize preventing service disruptions to protect against revenue and reputational harm."


Why Technical Debt and Complexity Are Barriers to Resilience

The report identifies multiple barriers that organizations need to overcome to improve resilience. Among the top issues are technical debt and complexity.


Johnson noted that as organizations keep piling new apps on top of old systems, the many interdependencies between applications create vulnerabilities that make service disruptions inevitable.

"Organizations face mounting technical infrastructure challenges, with nearly half of the exec respondents pointing out that outdated tech and a lack of real-time data tools are major weak spots," Johnson said.

How to Balance Security and Service Disruption Readiness

What's clear is that organizations need to strike a balance between security and operational resilience.

In Johnson's view, the right balance is one where the organization can trust that its data and infrastructure are protected, but operations teams are not so inundated that they can't work preventively, because that is where neglect creeps in.


"Ideally, teams should have the bandwidth to focus on building proactive measures and mitigate the risk of operational failures," he said.

There are several things that organizations can do to help establish the right balance:

Automation. AI and automation are valuable tools here because they let engineers automate low-level, repetitive actions and minimize toil. Johnson noted that the time saved allows engineers to focus instead on high-priority work that helps them anticipate incidents and develop more efficient strategies for quick remediation.

Monitoring and alerting. To prepare for service disruptions, organizations should establish more robust monitoring and alerting systems that let them catch risks before they escalate into bigger incidents.

Clear communications. Clearer communication plans and better cross-department coordination help teams align on a response strategy and restore service more efficiently.

Testing. Regular stress testing and drills, particularly those that simulate system failures, are also key to building operational resilience and refining recovery processes.


About the Author

Sean Michael Kerner

Contributor

Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He consults to industry and media organizations on technology issues.

https://www.linkedin.com/in/seanmkerner/
