How to Prevent DRUPS-Related Data Center Outages
DRUPS systems were implicated in several high-profile outages recently. If you're deploying DRUPS, here's what to watch for.
May 17, 2017
Diesel rotary uninterruptible power supply (DRUPS) systems were implicated in power disruptions over the past three years that affected Amazon Web Services in Sydney; Sovereign House in London, a former Telecity facility now owned by Digital Realty Trust; and the Singapore Stock Exchange. The disruption at Amazon was caused by what the company called “an unusually long voltage sag.”
There are several steps data center operators can take to minimize the risk of DRUPS-related reliability issues.
How a DRUPS Works
“DRUPS systems use kinetic energy generated by a flywheel. Its momentum generates enough energy to deliver 15 to 20 seconds of ride-through before the diesel generator comes on,” Peter Panfil, VP of global power for Vertiv, says. Because nearly 99 percent of power disruptions last less than 10 seconds, the flywheel’s kinetic energy generally is sufficient to ride through the fluctuation without requiring the diesel generator. As a disruption approaches the ride-through threshold, however, the genset activates, and frequent on/off cycling causes wear.
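The ride-through decision described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's control logic: the 15-second figure is the low end of the range Panfil quotes, while the 5-second start margin and the names `RideThroughPolicy`, `respond_to_disruption` and `count_genset_starts` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RideThroughPolicy:
    # Low end of the 15-20 s flywheel ride-through quoted in the article.
    ride_through_s: float = 15.0
    # Hypothetical margin: start the genset this long before the flywheel runs out.
    start_margin_s: float = 5.0

    @property
    def genset_start_threshold_s(self) -> float:
        """Disruption length at which the genset is told to start."""
        return self.ride_through_s - self.start_margin_s


def respond_to_disruption(
    duration_s: float, policy: RideThroughPolicy = RideThroughPolicy()
) -> str:
    """Return which energy source carries the load for one utility disruption."""
    if duration_s < policy.genset_start_threshold_s:
        return "flywheel_only"
    return "genset_started"


def count_genset_starts(
    durations_s, policy: RideThroughPolicy = RideThroughPolicy()
) -> int:
    """Count disruptions long enough to start the genset (a rough proxy for cycling wear)."""
    return sum(
        respond_to_disruption(d, policy) == "genset_started" for d in durations_s
    )
```

With these assumed defaults, a 3-second sag is handled by the flywheel alone, while a 12-second disruption starts the genset. Feeding a site's own disruption log into `count_genset_starts` gives a rough sense of how often the engine would cycle.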
Unlike battery UPS systems, data center DRUPS are line-interactive – they draw power directly from the utility to keep their flywheels spinning. Consequently, they don’t have power conditioners, so any fluctuations in utility power are passed on to the DRUPS.
“A single DRUPS failure isn’t a problem with 2N redundancy,” Jacob Ackerman, CTO of SkyLink Data Centers in Florida, says. SkyLink prefers battery-based UPS systems to DRUPS. The decision, he says, was based on the desire to maximize cross-over time in case generators have issues.
That said, Ackerman recommends running multiple DRUPS units in parallel rather than in isolated redundant mode. With that configuration you can avoid an outage even if one generator fails. “DRUPS systems are a UPS for the entire facility, including chillers and air handlers. So, if your DRUPS goes down, everything goes down.”
Issues to Watch For
Synchronization-related failures can occur when the voltage and frequency of the utility and the bypass path don’t match. DRUPS is line-interactive, so it’s synchronized initially. The challenge is in syncing back to utility power. “If the utility wobbles, my UPS has to wobble with it,” Panfil explains. Following the fluctuations mechanically can be challenging. A battery system eliminates that need with a double power conversion (AC to DC to AC), which conditions the power. Synchronization issues reportedly caused the outage at the Sovereign House colocation data center.
Voltage sag can cause DRUPS units to back-feed and trip. When a mechanical system is used to compensate for momentary reductions in voltage, it should have a slight lag to ensure the power generated by the DRUPS goes to the data center rather than back-feeding to the utility.
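To make the matching requirement concrete, here is a minimal Python sketch of a sync check that compares two sources before a transfer is permitted. The tolerance values and the names `Source`, `SyncLimits` and `in_sync` are hypothetical, chosen only for illustration. The phase-angle comparison is an addition beyond the voltage and frequency mentioned above, reflecting what sync-check devices commonly compare. Real limits come from the equipment vendor and the site's protection settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    voltage_v: float
    frequency_hz: float
    phase_deg: float


@dataclass(frozen=True)
class SyncLimits:
    # All three limits are illustrative placeholders, not recommended settings.
    max_voltage_diff_pct: float = 5.0
    max_freq_diff_hz: float = 0.1
    max_phase_diff_deg: float = 10.0


def phase_difference_deg(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two phase angles, in the range 0 to 180 degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def in_sync(utility: Source, drups: Source, limits: SyncLimits = SyncLimits()) -> bool:
    """True only if voltage, frequency and phase all fall within the limits."""
    voltage_diff_pct = (
        abs(utility.voltage_v - drups.voltage_v) / utility.voltage_v * 100.0
    )
    freq_diff_hz = abs(utility.frequency_hz - drups.frequency_hz)
    phase_diff = phase_difference_deg(utility.phase_deg, drups.phase_deg)
    return (
        voltage_diff_pct <= limits.max_voltage_diff_pct
        and freq_diff_hz <= limits.max_freq_diff_hz
        and phase_diff <= limits.max_phase_diff_deg
    )
```

The check is deliberately all-or-nothing: a transfer is allowed only when every quantity is within its limit, which mirrors the point that a mismatch in any one of them can cause trouble when syncing back to utility power.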
In Amazon’s Sydney outage, the breakers that isolated the data center DRUPS from utility power didn’t open quickly enough, which caused the DRUPS power to feed back to the power grid. Amazon fixed the problem by adding more breakers and conducting regular system tests on unoccupied hosts within AWS.
Bad fuel could be a factor in any diesel genset failure. Diesel fuel doesn’t last indefinitely. After six to 12 months it may be contaminated by bacteria, water and solid particulate. Fuel also may gel.
Cisco keeps 96,000 gallons of diesel on hand at its Allen, Texas, data center – enough to run at full load for four days. The facility’s staff refresh the fuel every three to four months and store it in environmentally controlled areas to protect it from temperature variations, Sidney Morgan, Cisco Distinguished Engineer, says.
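The Cisco figures imply a full-load burn rate that is easy to work out, and the same arithmetic shows how quickly runtime shrinks if some of the stored fuel is unusable. The short Python sketch below uses only the two numbers quoted above. The 10 percent unusable-fuel scenario and the function names are assumptions for illustration, not data from Cisco.

```python
FUEL_ON_HAND_GAL = 96_000    # from the article
FULL_LOAD_RUNTIME_DAYS = 4   # from the article


def implied_burn_rate_gal_per_hour(fuel_gal: float, runtime_days: float) -> float:
    """Average hourly consumption implied by a fuel quantity and a runtime."""
    return fuel_gal / (runtime_days * 24)


def runtime_hours(usable_fuel_gal: float, burn_gal_per_hour: float) -> float:
    """Hours of operation available from the usable fuel at a given burn rate."""
    return usable_fuel_gal / burn_gal_per_hour


# 96,000 gal over 96 hours works out to 1,000 gal per hour at full load.
burn_rate = implied_burn_rate_gal_per_hour(FUEL_ON_HAND_GAL, FULL_LOAD_RUNTIME_DAYS)

# Hypothetical scenario: 10 percent of the stored fuel turns out to be unusable.
degraded_runtime_h = runtime_hours(FUEL_ON_HAND_GAL * 0.9, burn_rate)
```

In that hypothetical case the four-day reserve drops to roughly 86 hours, which is one way to see why regular fuel refreshes matter.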
Maintenance and mechanical issues – like a genset starter failure – can derail any generator. DRUPS systems, however, sometimes can be started using the flywheel momentum. This is like popping the clutch in a truck for a rolling start.
The maintenance schedule should include inspecting data center DRUPS units:
Weekly, to check coolant fluid levels and winding and bearing temperatures
Monthly, to assess wear on carbon brushes and to test cross-over capabilities
Yearly, to change oil, check the control circuit frequency and clean the unit
At five years, to replace bearings and inspect internal components
“It’s just like maintaining your car,” Morgan says.
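One way to keep such a schedule from slipping is to encode it as data and check it against the dates each inspection was last completed. The Python sketch below mirrors the intervals and tasks listed above. The day counts used for a month and a year, and the names `SCHEDULE` and `due_tasks`, are simplifying assumptions for illustration rather than part of any maintenance standard.

```python
from datetime import date, timedelta

# Interval lengths are approximations (30-day month, 365-day year).
SCHEDULE = {
    "weekly": (
        timedelta(days=7),
        ["check coolant fluid levels", "check winding and bearing temperatures"],
    ),
    "monthly": (
        timedelta(days=30),
        ["assess wear on carbon brushes", "test cross-over capabilities"],
    ),
    "yearly": (
        timedelta(days=365),
        ["change oil", "check control circuit frequency", "clean the unit"],
    ),
    "five_yearly": (
        timedelta(days=5 * 365),
        ["replace bearings", "inspect internal components"],
    ),
}


def due_tasks(last_done: dict, today: date) -> list:
    """Return every task whose interval has elapsed since it was last completed.

    `last_done` maps each schedule name to the date that inspection last ran.
    """
    due = []
    for name, (interval, tasks) in SCHEDULE.items():
        if today - last_done[name] >= interval:
            due.extend(tasks)
    return due
```

For example, if every inspection was last completed on May 1, calling `due_tasks` for May 10 returns only the two weekly checks, while calling it for June 1 adds the monthly ones.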
Human error also can trigger critical load interruptions. “People are generally the weakest link,” Ackerman says. Although load switching occurs automatically, company policy and documentation govern activities during outages. “You can build a fully redundant system, but if someone flips a couple of switches during testing and maintenance or during a failure, it can cause an outage.” One data center experienced an outage because the switch that would move power from the generator back to the utility failed, and the data center lacked the protective arc suits its personnel needed before they could enter the area and throw the switch manually.
Practice your procedures for outages. “It’s not easy, but run a ‘pull the plug’ test on an isolated system to ensure you can cross over and cross back without issues,” Panfil says. “Systems often don’t operate the way you expect.”
DRUPS design affects reliability. Several designs are on the market, including models with dynamic speed control, in-line mechanically coupled storage, and solid-state energy transfer for power storage. When evaluating data center DRUPS units, also consider ease of maintenance.
Cisco’s Allen data center has used an integrated DRUPS system since going online in 2011. “Because the DRUPS was designed as a self-contained unit in which electromagnetic clutches connect the flywheel to the diesel generator, the unit engages within five seconds. This system (which uses eight DRUPS units to generate 15 MW of power) has been active more than six years with no failures,” Morgan says.
Limited lifespan is a concern with any system. “If inheriting a DRUPS, know how old the bearings are, the start cycle count, and run hours,” Panfil says. “Also look at the mechanical locking. Understand what the system already has gone through.”
DRUPS systems have been implicated in notable outages. Usually, however, those outages were caused by related issues rather than by the DRUPS units themselves. So, if you use data center DRUPS, keep them well-maintained, check their fuel, and test your cross-over procedures to ensure everything works as expected.
CORRECTION: A previous version of this article erroneously referred to a DRUPS-related data center outage at Global Switch. According to a Global Switch representative who reached out to DCK after the article was posted, the incident in 2016 was related to switchgear, not DRUPS, and did not take the entire site down, affecting only a part of it. The article has been corrected accordingly. We regret the error.