Data Center Disaster Recovery: Essential Measures for Business Continuity

Data center disaster recovery is vital for business continuity. Learn essential strategies to protect against outages, human error, and cyber-attacks.

Alyse Burnside, Contributor

July 29, 2024


From texting and streaming services to critical government, education, and healthcare applications, data centers enable daily life as we’ve come to know it. With the world relying on data centers more than ever, it is crucial to ensure these facilities remain secure and operational. As such, digital infrastructure organizations must develop strong data center disaster recovery plans.

What Is Data Center Disaster Recovery?

Operators have made progress in preventing downtime, both at the construction stage and, once a facility is operational, through backups and secondary power sources. Even so, data centers remain vulnerable to unforeseen circumstances, including natural disasters, human error, and cyber-attacks.

Although it’s impossible to prevent every disaster, organizations should do everything they can to prepare for the worst. Data center disaster recovery is that preparation: the plans and measures that let an organization restore its data center services after a disruptive event. The best way to ensure that a data center is ready for the unexpected is to develop a strong disaster recovery plan.

Types of Data Center Disasters

Power Outages

Power outages are a leading cause of data center downtime and systems failure, and they can result in significant losses in both revenue and customer confidence. Businesses are increasingly turning to hybrid providers and cloud services to ensure their data is backed up by redundant systems and to limit the number of customers affected by a potential outage.

Human Error

To err is human and therefore inevitable, but of all the disasters data center operators can expect, human error is a risk that the right preventive measures can significantly reduce. According to Uptime Institute’s 2022 Outage Analysis Report, human error accounts for around two-thirds of all outages.

“Nearly 40% of organizations have suffered a major outage caused by human error over the past three years,” the organization said. “Of these incidents, 85% stem from staff failing to follow procedures or from flaws in the processes and procedures themselves.”

Examples of human error include accidentally disconnecting power sources, overloading circuits, or unsafe structural design.

Cyber-Attacks

While power outages, structural damage, and human error cause many data center disasters, cyber-attacks, including ransomware, are also high on the list of threats to data centers and can be just as expensive. According to AFCOM’s 2023 State of the Data Center report, two-thirds of global organizations suffered a cyber-attack in 2022, and businesses were disrupted for an average of five days as a result.

Why Data Centers Need a Disaster Recovery Plan

In the face of numerous operational risks, a disaster recovery plan is arguably the single most important step in preparing for a data center emergency.

A real-world incident illustrates this well: On October 15, 2022, a fire broke out at an SK C&C data center in Pangyo, South Korea, that hosted servers for two major South Korean tech companies, Kakao Corporation and Naver Corporation. While Naver was able to get its servers up and running relatively quickly, Kakao’s servers were down for hours, causing widespread disruption for users who suddenly could not use its messaging platform, payment apps, or rideshare services.

Importantly, although Kakao did have a disaster management protocol in place, that protocol did not account for the power outage at the time of the fire, which slowed service restoration. Learning from this incident, Kakao set up a recurrence prevention committee to keep a similar event from happening again.

Data shows that businesses increasingly understand the importance of disaster planning. According to Forrester’s "State of Disaster Recovery Preparedness in 2024" report, nearly 90% of organizations have some form of disaster recovery plan. At the same time, however, the majority of respondents (70%) allocate very little of their budget (0%-10%) to disaster recovery planning. One issue is that disaster recovery planning is largely the responsibility of IT workers, with little direct reporting to C-suite executives.

“Disaster recovery programs have limited C-suite visibility, with only 41% of disaster recovery program heads reporting to a C-level executive,” Forrester said. “Though in this year’s survey, we saw an equal number of respondents report that the head of disaster recovery reports two levels down from the C-suite – a big jump from the 26% reported in our last survey. Moving the role up in the organization strengthens alignment with overall business needs and increases access to resources for ensuring technology resilience for critical business.”

Future-Proof Data Center Construction

While there’s no way to prevent a natural disaster, data center developers are designing facilities that are considerably more resistant to extreme weather, fire, and geographic demands.

Each data center must be designed with the specific geography of its location in mind. Greg Metcalf, senior director of design at Equinix, explains how the operator’s Miami facility is built to withstand “extreme weather conditions” including a Category 5 hurricane. “This facility has 17-inch-thick walls and is strategically located 14 feet above sea level, which is a significant elevation in a city like Miami,” Metcalf told Data Center Knowledge.

With facilities located in ‘Tornado Alley’ in the US Midwest, Tonaquint Data Centers developed “tornado-resistant” data centers for its Oklahoma campus, using engineering analyses to design a facility that could withstand wind speeds of up to 310 mph, the highest wind speed recorded in Oklahoma. Terry Morrison, CTO of Tonaquint Data Centers, explained which considerations factored into the design.

“We studied optimal building materials, construction techniques, and facility layouts to survive F5 tornado forces, including wind and flying debris, while adhering to IBC 2003 specifications,” he said. Engineers also helped design unique louver systems capable of operating in hurricane-force winds.

“We engineered redundant power and cooling systems to keep operating through severe storms. Structural analyses validated the bespoke building materials, construction methods, and layout to survive extreme winds and uplifts. All support equipment, including generators and otherwise, are internal to the data center, meaning the interior equipment is protected and able to operate in tornado conditions.”

Developing a Data Center Disaster Recovery Plan

Determine Your Data Center's Mission-Critical Services

When developing a disaster recovery plan, it’s crucial to understand which services are mission-critical, since those are the services that must be restored first. Some businesses approach disaster recovery through resilience and reliability practices that help an organization recover from outages, such as keeping off-site backups and, in some cases, secondary infrastructure to fail over to.
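As a starting point for identifying mission-critical services, the Python sketch below models a hypothetical service inventory. It orders services for recovery with mission-critical ones first and flags any mission-critical service that lacks an off-site backup or a failover site on secondary infrastructure. The `Service` fields, service names, and site name are assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    mission_critical: bool
    offsite_backup: bool
    failover_site: Optional[str] = None  # secondary infrastructure, if any


def recovery_order(services: list[Service]) -> list[str]:
    """Return service names with mission-critical services first.

    sorted() is stable, so services keep their inventory order
    within each group.
    """
    ordered = sorted(services, key=lambda s: not s.mission_critical)
    return [s.name for s in ordered]


def coverage_gaps(services: list[Service]) -> list[str]:
    """Flag mission-critical services missing an off-site backup or failover."""
    gaps = []
    for s in services:
        if not s.mission_critical:
            continue
        if not s.offsite_backup:
            gaps.append(f"{s.name}: no off-site backup")
        if s.failover_site is None:
            gaps.append(f"{s.name}: no failover site")
    return gaps


inventory = [
    Service("internal-wiki", mission_critical=False, offsite_backup=True),
    Service("payments", mission_critical=True, offsite_backup=True,
            failover_site="secondary-dc"),
    Service("messaging", mission_critical=True, offsite_backup=False),
]

print(recovery_order(inventory))
# ['payments', 'messaging', 'internal-wiki']
print(coverage_gaps(inventory))
# ['messaging: no off-site backup', 'messaging: no failover site']
```

Keeping the inventory as plain data means the same list can drive both the recovery order and a gap report, so a missing failover target shows up during planning rather than during an outage.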

Consider the Costs

It is also important to consider not only the cost of downtime or structural damage, but also whom your data center’s services affect and what a data center disaster might mean for the local community. Morrison of Tonaquint Data Centers suggests that disaster recovery program heads include local officials when developing an incident response or disaster recovery plan.

“Data center disasters can disrupt local community services, like government functions, utilities, healthcare, and internet access,” he told Data Center Knowledge. “Disaster recovery plans should account for the direct and indirect impacts on citizens’ lives and provide contingency plans to enable basic community functionality during an outage. Disaster recovery plans should consider providing alternate community ‘access points’ during disasters like WiFi-connected disaster recovery centers where citizens can file claims and connect with loved ones. Operators should coordinate with local officials on disaster recovery planning.”
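Morrison’s point is that a cost estimate should not stop at dollars. The Python sketch below pairs a simple direct-cost calculation (hours of downtime multiplied by revenue lost per hour, plus repair costs) with fields that keep affected customers and community services in view. Every figure and field name in it is a hypothetical planning input, not a benchmark from the article.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    # All values are hypothetical planning estimates.
    hours_down: float
    revenue_lost_per_hour: float
    repair_cost: float             # structural or equipment damage
    customers_affected: int
    community_services: list[str]  # local services that depend on the site


def direct_cost(s: OutageScenario) -> float:
    """Downtime losses plus repair costs for one scenario."""
    return s.hours_down * s.revenue_lost_per_hour + s.repair_cost


flood = OutageScenario(
    hours_down=12,
    revenue_lost_per_hour=50_000,
    repair_cost=250_000,
    customers_affected=4_000,
    community_services=["municipal billing", "clinic records"],
)

print(f"Direct cost: ${direct_cost(flood):,.0f}")           # Direct cost: $850,000
print(f"Customers affected: {flood.customers_affected:,}")  # Customers affected: 4,000
print("Community services at risk:", ", ".join(flood.community_services))
```

The calculation deliberately leaves out harder-to-price losses such as customer confidence; listing affected community services next to the dollar figure keeps them part of the conversation with local officials.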

Implement Security Best Practices

In terms of cybersecurity, as attackers become more sophisticated in their methods, data center IT must enhance security practices with regular backups, endpoint protection, frequent penetration testing, and continual workforce training.

Backing up data is one of the key challenges in disaster recovery. Data center operators might opt for SaaS-based backups, which reduce the need for on-premises server management. Because SaaS data is hosted online, it is accessible from anywhere, allowing operations to continue if a facility becomes inaccessible. “[SaaS-based backups] provide inherent disaster recovery since SaaS data is stored remotely, providing redundancy. SaaS providers manage the underlying infrastructure and disaster recovery, reducing the burden on organizations,” Morrison says.
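The advice about regular backups and remotely stored copies can be turned into a routine check. The Python sketch below assumes a hypothetical list of backup records, each tagged with the system it covers, when it was taken, and whether it is stored remotely. It flags systems whose newest backup is older than a chosen maximum age and systems with no remote copy at all. The system names, the 24-hour window, and the record format are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BackupRecord:
    system: str
    taken_at: datetime
    remote: bool  # True if stored off-site (e.g., with a SaaS provider)


def backup_problems(
    records: list[BackupRecord],
    systems: list[str],
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """List systems whose newest backup is stale or that lack any remote copy."""
    problems = []
    for system in systems:
        mine = [r for r in records if r.system == system]
        if not mine:
            problems.append(f"{system}: no backups found")
            continue
        newest = max(r.taken_at for r in mine)
        if now - newest > max_age:
            problems.append(f"{system}: newest backup is stale")
        if not any(r.remote for r in mine):
            problems.append(f"{system}: no remote copy")
    return problems


now = datetime(2024, 7, 29, 12, 0, tzinfo=timezone.utc)
records = [
    BackupRecord("billing-db", now - timedelta(hours=6), remote=True),
    BackupRecord("file-share", now - timedelta(days=3), remote=False),
]

print(backup_problems(records, ["billing-db", "file-share", "ticketing"],
                      max_age=timedelta(hours=24), now=now))
# ['file-share: newest backup is stale', 'file-share: no remote copy',
#  'ticketing: no backups found']
```

This version only asks whether any remote copy exists; a stricter policy might also require the newest remote copy to fall within the age limit.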

Develop Your Disaster Recovery Plan

Data center disaster recovery plans should be tailored to an organization’s specific needs, but the SANS Institute offers some general guidelines organizations must consider when designing a disaster recovery plan for data centers.

Key elements of a data center disaster recovery plan

Once a comprehensive plan is developed, organizations must ensure all key data center employees are aware of the protocol for declaring an emergency. In addition, organizations must perform frequent testing of their incident response and disaster recovery plan, which might include running simulations of disaster scenarios. 
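One way to make the protocol for declaring an emergency unambiguous is to write the criteria down as code that every key employee can read and that a tabletop simulation can exercise. The Python sketch below assumes three hypothetical criteria (critical services down, expected outage length, and facility access), hypothetical thresholds, and a hypothetical call list; none of these come from the article or from SANS guidance.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical facts a shift lead would assess during an incident.
    critical_services_down: int
    expected_outage_hours: float
    facility_accessible: bool


# Hypothetical thresholds; each organization would set its own.
MAX_CRITICAL_SERVICES_DOWN = 0
MAX_OUTAGE_HOURS = 2.0

# Hypothetical call list paged once an emergency is declared.
CALL_LIST = ["facilities", "network", "security", "executive sponsor"]


def should_declare(incident: Incident) -> bool:
    """Declare an emergency if any single criterion is met."""
    return (
        incident.critical_services_down > MAX_CRITICAL_SERVICES_DOWN
        or incident.expected_outage_hours > MAX_OUTAGE_HOURS
        or not incident.facility_accessible
    )


def teams_to_page(incident: Incident) -> list[str]:
    """Return the call list if an emergency is declared, else nothing."""
    return list(CALL_LIST) if should_declare(incident) else []


# Tabletop simulation: each scenario is paired with the decision
# the team agreed it should produce.
scenarios = {
    "flooded hall": (Incident(3, 8.0, False), True),
    "single failed disk": (Incident(0, 0.5, True), False),
    "long utility outage": (Incident(0, 6.0, True), True),
}

for name, (incident, expected) in scenarios.items():
    result = should_declare(incident)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{name}: declare={result} ({status})")
```

Running simulated scenarios against agreed outcomes turns a disagreement about when to declare into a visible mismatch during practice rather than an argument during a real event.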

At this year’s Data Center World (DCW) expo, Jose Pelicano, technical program manager at Cloudflare, underlined the importance of having a disaster recovery plan. He offered a real-world example in which a flood hit a Cloudflare data center.

“Everything was down,” he said during DCW. “Everybody started calling the IT department in charge of the data center. Immediately the next day, the management decided we need to avoid this situation [from happening] again.”

In addition to creating a disaster recovery facility where critical services could be shifted in the event of a widespread outage, Pelicano said Cloudflare placed renewed focus on its incident response procedures.

“Why are procedures important?” he said. “When you have a disaster situation, you don’t want to start thinking about what you need to do. [The] disaster may happen during business hours, it may happen on the weekend, or it may happen on Christmas Day or Thanksgiving.”

Given the unpredictable nature of outages, Pelicano said a list of easy-to-follow procedures will make it clear what each team needs to do in case of a disaster situation. Importantly, teams also need to rehearse these procedures so they are well prepared for any situation.
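Pelicano’s point that each team should know exactly what to do can be captured in a simple data structure. The Python sketch below assumes a hypothetical runbook that maps each disaster scenario to each team’s ordered steps, and includes a check that reports any team left without steps for a scenario. Scenario names, team names, and steps are invented for illustration; the “Shift traffic to the recovery site” step echoes the disaster recovery facility Cloudflare describes, but it is not Cloudflare’s actual procedure.

```python
# Hypothetical runbook: scenario -> team -> ordered steps.
RUNBOOK: dict[str, dict[str, list[str]]] = {
    "flood": {
        "facilities": ["Cut power to affected halls", "Contact the utility"],
        "network": ["Shift traffic to the recovery site"],
        "support": ["Post a status update", "Brief on-call managers"],
    },
    "fire": {
        "facilities": ["Confirm evacuation", "Coordinate with the fire service"],
        "network": ["Shift traffic to the recovery site"],
    },
}

TEAMS = {"facilities", "network", "support"}


def steps_for(scenario: str, team: str) -> list[str]:
    """Return a team's steps for a scenario (empty if none are defined)."""
    return RUNBOOK.get(scenario, {}).get(team, [])


def missing_assignments(
    runbook: dict[str, dict[str, list[str]]], teams: set[str]
) -> dict[str, set[str]]:
    """For each scenario, return the teams that have no steps assigned."""
    gaps = {}
    for scenario, by_team in runbook.items():
        missing = {team for team in teams if not by_team.get(team)}
        if missing:
            gaps[scenario] = missing
    return gaps


print(steps_for("flood", "network"))        # ['Shift traffic to the recovery site']
print(missing_assignments(RUNBOOK, TEAMS))  # {'fire': {'support'}}
```

Here the check reveals that the support team has no steps for a fire, the kind of gap that is better found at a desk than during an outage.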

“You need to practice. You need to test [the incident response plan] with some regularity because otherwise, you may discover that if you don’t test the procedure… you may find out that something is not working,” he said.
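Regular testing is about discovering, before a real disaster, which steps no longer work. The Python sketch below runs a hypothetical drill: each procedure step is paired with a check function, and the drill reports every step whose check returns False or raises an exception. The check functions are stand-ins that simulate outcomes; in practice they would exercise real systems, such as restoring a sample backup.

```python
from typing import Callable

# A drill step pairs a procedure description with a check that it works.
Step = tuple[str, Callable[[], bool]]


def run_drill(steps: list[Step]) -> list[str]:
    """Run every step's check and return the descriptions of failed steps."""
    failed = []
    for description, check in steps:
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(description)
    return failed


# Stand-in checks that simulate outcomes (hypothetical).
def restore_from_backup() -> bool:
    return True


def reach_recovery_site() -> bool:
    raise ConnectionError("recovery site unreachable")  # simulated fault


def page_on_call() -> bool:
    return True


drill = [
    ("Restore sample backup", restore_from_backup),
    ("Reach recovery site", reach_recovery_site),
    ("Page on-call staff", page_on_call),
]

print(run_drill(drill))  # ['Reach recovery site']
```

The drill keeps going after a failure so a single run surfaces every broken step, not just the first one.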

About the Author

Alyse Burnside

Contributor, ITPro Today

Alyse Burnside is a writer and editor living in Brooklyn. She is working on a collection of personal essays about queerness, visibility, and the hyperreal. She's especially interested in writing about cybersecurity, AI, machine learning, VR, AR, and ER. 

alyseburnside.com
