Stress Test: The Data Center Industry and the Pandemic
Enterprise data center teams, data center providers, and cloud platforms meet the biggest crisis of their lifetimes.
It has not been an easy month for the data center industry.
In-house corporate data center teams rushed to build out their ability to support more remote workers than ever before. Commercial data center providers tightened access to their facilities around the world and dusted off their pandemic contingency binders – if they had them. Some data centers had to be disinfected after having coronavirus-positive visitors.
Data center operators scrambled to decide and formalize which of their staff were essential to keep onsite and which to send home, and to make sure that those sent home included people who could take over for those remaining onsite should the latter get sick.
Data center providers have put most construction projects on hold. Facebook has suspended data center construction in Huntsville, Alabama, and in Clonee, Ireland.
As work and school shift to bedrooms, living rooms, and kitchen tables, people are leaning on internet services more than they ever have. Besides trying to keep their own families’ lives in balance, infrastructure teams responsible for making sure those services stay online have been busy expanding bandwidth on their networks and computing muscle in their data centers.
Microsoft Azure reported a massive surge in use of its cloud services in some regions over the last week, which caused problems when customers were trying to spin up some cloud compute resources. The company, however, said it hadn’t seen any “significant” service disruptions. It is now “expediting the addition of significant new capacity that will be available in the weeks ahead.”
The Internet Is Doing OK, Mostly
The internet as a whole has been handling record spikes in traffic – and shifts in when people access the internet and from where – without major meltdowns. However, ThousandEyes, which monitors global network health, has noted an upward trend in network outages in various parts of the world.
People in charge of infrastructure for several popular online services said the problems have been mainly at the “last mile” – the networks that deliver internet content to end users in homes and offices. Some of those networks aren’t architected to support the levels of traffic they are now seeing.
That’s the reason big content services like Netflix and YouTube have been reducing video bitrate in some parts of the world. They want to relieve some of the strain on the last-mile ISPs whose networks weren’t ready for all their customers to get online at the same time, John Graham-Cumming, CTO of Cloudflare, operator of one of the world’s largest content delivery and DDoS protection networks, told us.
Data Center Access Restrictions
Equinix, the world’s largest data center operator by revenue, cut off nearly all customer and vendor access to its facilities in France, Germany, Italy, and Spain last week, allowing people in only for “critical and essential work.” The company said it would extend the same measures to its UK data centers starting March 31.
Equinix has not done this in New York and Santa Clara, California, both major data center markets heavily hit by the pandemic. A company spokesperson did not respond to a request for explanation in time for publication.
Among the big providers, Equinix’s measures in Europe have been the most drastic. Its largest rival, Digital Realty Trust, has reduced critical onsite staff to a minimum in areas of the world where US health regulators have said the virus was spreading aggressively (Level 1 or higher). The company has been taking visitors’ temperatures before granting access to its facilities in Singapore and Hong Kong.
CyrusOne, which operates data centers in the US and Germany, continues to provide customer access to all its facilities, “subject to certain pre-screening questions,” Anrea Munoz, CyrusOne’s VP of operations and customer success, told DCK via email.
Its tenants in Germany are hyperscale cloud platform operators, which use their own onsite teams, and “suspending all customer access has the potential to compromise their ability to maintain critical services,” she said when asked why CyrusOne hasn’t taken the same measures Equinix has in that country.
Ramping Up Remote Work
The most frequently cited bottleneck the crisis has created for enterprise IT teams has been enabling a larger remote workforce than ever before. Many companies’ networks hadn’t been set up to give most of their employees private, secure access to corporate networks from their homes.
One large US transportation company, for example, could support VPN access to its network for only a single-digit percentage of its workforce (in the thousands) before the crisis hit, a senior IT and data center leader at the company who asked not to be named told DCK. But the company’s management decided to have everyone work from home at the beginning of March, and the team in the company’s two data centers undertook “emergency deployments to increase capacity and capability for remote workers.”
While social-distancing requirements caused some segments that drive business for the company to grind to a halt (automotive manufacturing, for instance), other segments have been busy “with spot business,” where things like medical supplies need to be moved on an emergency basis, he said.
The VPN capacity ramp had to be quick, but the team had been mostly prepared for it. Conversations about readying the infrastructure to respond to the crisis had been taking place in February, and a “war room” had been established. Still, “we all underestimated the velocity of this thing coming up,” the data center leader said. “Who knew it would come this quickly?”
The team increased VPN capacity about 10-fold in a “very short period of time,” he said. The company had a bring-your-own-device (BYOD) strategy for its remote workforce. As of last Friday, after ramping up, about half of the company’s employees could access the company’s network from their own machines at home through the VPN, he said.
They also expanded capacity for supporting remote application access via the Citrix thin client and created dashboards for business executives who now wanted visibility into the infrastructure. “A lot of dashboards were suddenly requested to track things” that were tracked before but not displayed in a way executive-level management could easily understand.
In the company’s data centers, the team has been following advice like the list of recommended pandemic-related steps by the Uptime Institute. “I hope we’ve done all of them,” he said.
Among others, those steps include blocking vendor access for preventive maintenance, allowing them in only to fix problems; physically separating people who had been sharing office space; reducing the number of staff per shift down to the most essential personnel; and rotating essential personnel (“three days on, two days off”) to ensure there’s backup available if a person with essential skills falls sick.
Essential Staff
Fred Dickerman, a senior VP at the Uptime Institute, said he and his colleagues had been in touch with their clients, including both enterprise data center operators and commercial data center providers. The exercise of sorting through staff to identify who was essential to keep onsite and who wasn’t had, predictably, been common across the industry in recent weeks, he said.
“We’re seeing a lot of evaluation of who’s essential, who’s not essential,” Dickerman told DCK. “The approach has been almost universally: OK, identify some of the essential people and send them home, so they become the reserve, or split it up into teams and make sure the teams don’t ever cross-contaminate.”
In many cases, senior-level facility and site managers are asked to work from home, he said. While they are essential, they have the knowledge and experience to step in for multiple types of lower-level employees, such as line engineers, technicians, or operations staff.
Dickerman said he expected that one of the broad industry changes to come out of this crisis will be organizations formally documenting which staff are essential to keep onsite during an epidemic and which are their backups who should stay at home. Teams know who those people are today, but it’s rarely documented, he said.
Can Data Centers Run Unmanned?
On March 23, Digital Realty Trust notified customers at three of its data centers in New York and New Jersey that the facilities had been visited by individuals who had tested positive for the coronavirus, Marc Musgrove, a company spokesperson, told us. (This was first reported by DCD.)
One of the individuals entered the big New York City carrier hotels at 60 Hudson and 32 Avenue of the Americas on March 16, Musgrove confirmed. Another individual visited Digital Realty’s data center at 2 Peekay Dr. in Clifton, New Jersey, two days later.
In response, the operator had common areas in the affected facilities cleaned and disinfected, the spokesman said.
The company issued an update Monday afternoon Pacific time, revealing more known instances of potential virus exposure in its facilities. A person who last worked at one of its facilities in Elk Grove, Illinois (outside Chicago), on March 24 has tested COVID-19-positive, the company said. Customers were notified on March 27. Another person who last worked at a Digital Realty data center in Piscataway, New Jersey, on March 24 has tested positive. Customers were notified on March 28. A person who last worked at a Digital Realty facility in Atlanta on March 24 has tested positive. Customers were notified on March 30.
The company said that in all three cases it "completed a full disinfection procedure for all common areas."
In an FAQ on its website, Digital Realty said that if a person who visited a facility tested positive, it would evacuate the facility for 24 hours and conduct “a full building disinfection” in common areas and any areas operated by Digital Realty. It’s unclear whether the facilities in New York, New Jersey, Chicago, and Atlanta were evacuated.
If it does come to evacuation – of any data center – whether the facility can stay online without staff for the evacuation’s duration depends to a great extent on its design, Uptime’s Dickerman said. A facility with enough infrastructure redundancy to meet Uptime’s Tier III or IV standards could run for 24 hours unmanaged, he said. But “it is a little bit like closing your eyes [and] driving your car on a straight stretch of highway… You don’t want to do it for too long.”
A facility with a lower level of redundancy would be too risky to leave fully unmanned.
Also playing a role here is the organization’s philosophy about remote data center management tools and security. There are two general camps, Dickerman explained. In one camp are organizations that have remote network operations centers and remote facilities monitoring, and that give technicians remote access through private network connections.
In the second camp are organizations that treat their data centers like “fortresses.” They don’t do any remote monitoring and don’t allow any remote connections. In some cases, when a vendor must access a particular piece of equipment remotely, they’ll have someone onsite physically plug into the network and unplug once the work is done.
While both approaches are valid, organizations in the first group are finding themselves better prepared for the current crisis than the ones in the second, Dickerman said.
Big Unknowns
Because planning and preparedness are in the data center industry’s DNA, Dickerman said, citing a phrase by one of his colleagues, data center operators have mostly taken all the necessary steps to minimize exposure to the virus while ensuring they can keep their facilities running.
But there are some big unknowns today that are concerning. One is the uncertainty about when the crisis is going to subside, he said. Should operators make plans to operate under the current conditions for the 18 months that development of a COVID-19 vaccine is expected to take?
Also unknown are plans governments could be making behind the scenes to escalate measures to fight the spread of the virus. They may not want to reveal those plans now “because they don’t want to scare the bejesus out of people,” Dickerman said.
Most operators he’s been in touch with are making plans to get through spring and well into the summer, he said. “Most authorities are saying there should be a shift in the environment by that time, but nobody knows that [for sure]… and people are thinking, OK, what if it does last a year?”