The Dunk Tank Reconsidered: How OCP May Yet Immerse Your Servers
Rising data center power density and fluid innovation are fueling new momentum for immersion cooling.
It’s still a scary thought: dunking a high-performance server into a vat of fluid, turning the power on, and hoping sparks don’t fly. But maybe we should be equally scared by something else: continuing to build data centers with lifespans well into the 2030s that are singularly dependent upon airflow for cooling. How sustainable is an industry that spends more to power rotating fans than solid-state processors?
“An immersion-cooled environment has the server completely dunked inside the dielectric fluid,” Jimil Shah, an application development engineer with 3M, said while speaking at Facebook’s Open Compute Project Virtual Summit 2020 in May. “We can significantly increase the power density with an immersion-cooled environment as compared to an air-cooling method.”
The value proposition of immersion cooling, as described by vendors, has oscillated over the years between enabling high density, improving energy efficiency, and prolonging useful life of existing hardware. (You could have your server investment last a few years longer by plunging the servers into a kind of sustained stasis.) The life extension argument is now out the window. We’re no longer dunking existing servers. Instead, there’s now almost the reverse assertion: As long as newer processors continue to get hotter, they simply cannot continue to be designed the way they are today. In other words, now that power density is going up, we’re back to the density argument.
“We’ve discovered that the server must be re-architected,” Mark Shaw, general manager for advanced hardware at Microsoft, said. “The racks, the power, the servers, and server components must all be changed.”
Cold Bath
Cooling fluid was the through-line of an uninterrupted day-and-a-half of OCP Summit sessions. In the three years since we painted an almost uncomfortably icky picture of dunking servers in tanks of mineral oil, immersion cooling has progressed from a carnival sideshow act to a genuine science.
From the wealth of information we received during the event, three factors stand out as having so rapidly improved the outlook for immersion cooling:
Chassis designed for immersion. We’re no longer just dunking servers and throwing the switch. Fluid flow is a different beast from airflow, especially since solid-state memory is now commonplace. As a result, the chassis standardization process under development within OCP enables a new and unusual form factor that’s the length of some refrigerator doors, has the thickness of a candy bar, and when fully populated weighs up to 75 pounds.
Manufactured dielectric fluids. It’s not mineral oil anymore. Processed hydrocarbon and fluorochemical fluids, with lower boiling points than H2O, produce vapor bubbles that are little containment vessels for heat. Not only can new systems operate comfortably in a tank whose contents are literally under a rolling boil, but the new fluids they use evaporate and re-condense so quickly that an extracted chassis emerges from the fluid both cool and dry.
A robotic chassis extracting arm by Asperitas
Better automated monitoring and maintenance. Airflow is a fickle thing. By contrast, fluid baths effectively create microclimates for servers, where temperature sensors are far more reliable and accurate, and operating conditions are much easier to stabilize. A maintenance trolley with a robotic armature can extract a specific server tray from a tank and place it on an illuminated flat bed like an operating table.
The New Racks
Immersed systems engineers are now coalescing around a likely industry standard based around the emerging Open Rack v3, which would allow for long, wide shelves as thin as 22 mm, as well as support for a 54V busbar (similar to the one shown below) as an alternative to the power shelf (a long-time OCP staple). For now, this design exists only in diagrams.
OCP 2020 immersion busbar
“Given the volume of an Open Rack v3 to be about 1.2 cubic meters, we can easily have enough fluid capacity to support hundreds of kilowatts,” Microsoft’s Shaw said. “The question is really whether or not we can actually get enough servers into the rack to meet the cooling capability.”
One firm, Amsterdam-based Asperitas, is open-sourcing its work on an optimized immersion chassis using 15-, 19-, or 21-inch width and 1 or 2 OU (“open unit”) height. Some engineers have taken to calling each chassis – a drawer designed to be slid in vertically and locked into place – a cassette.
“The basic property of this chassis is the fact that it’s optimized for liquid flow,” Asperitas founder Rolf Brink explained. “Most server chassis are designed for air and do not allow as much flow through all the hole patterns and gaps which liquid requires. It is also optimized for servicing. It is completely flexible regarding IT equipment design. It can facilitate virtually any kind of IT combination, and it’s suitable for working with C13 power delivery or busbars.”
“The position of the components in an immersive server is directly related to the heat flow,” Michael Helezen, a research engineer with the Strasbourg-based high-performance server maker 2CRSI, said. Helezen and Brink both suggested the long, flat Open Rack-based chassis be divided into three temperature zones.
One new design for single-phase systems is being presented to OCP by engineers from Asperitas, Intel, 3M, 2CRSI, and Flextronics. In this proposal, the lowest portion of the submerged chassis, called T0, would be reserved for the most heat-intensive components (primarily GPUs and power supply units). CPUs, which run slightly cooler than GPUs, may populate the middle zone, called T1, while components that can tolerate temperatures up to 18°C warmer may populate T2.
A small portion of the chassis, or cassette, resides above the coolant, although no active processing components are intended to be stationed there. “Components with minimal thermal properties can be placed there, as long as they have a very high temperature tolerance,” Brink said.
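The zoning rule described above can be sketched in a few lines of Python. This is only an illustration of the proposal as summarized here, not code from the OCP submission: the names `Zone`, `ZONE_BY_COMPONENT`, `assign_zone`, and `layout` are hypothetical, and sending every unlisted component to T2 is an assumption made for the sake of the example.

```python
from enum import Enum


class Zone(Enum):
    """Temperature zones of a submerged chassis, listed bottom to top."""
    T0 = "bottom"  # most heat-intensive parts
    T1 = "middle"  # CPUs
    T2 = "top"     # parts that tolerate warmer fluid


# Placement as summarized in the article. The table itself is hypothetical.
ZONE_BY_COMPONENT = {
    "gpu": Zone.T0,
    "psu": Zone.T0,
    "cpu": Zone.T1,
}


def assign_zone(component: str) -> Zone:
    """Return the zone for a component type.

    Assumption: anything not listed in ZONE_BY_COMPONENT goes to T2.
    """
    return ZONE_BY_COMPONENT.get(component.lower(), Zone.T2)


def layout(components):
    """Group component names by zone, keyed bottom to top."""
    grouped = {zone: [] for zone in Zone}
    for name in components:
        grouped[assign_zone(name)].append(name)
    return grouped


if __name__ == "__main__":
    for zone, parts in layout(["cpu", "gpu", "psu", "ssd"]).items():
        print(zone.name, parts)
```

A real placement tool would also have to account for the dry section above the coolant and for each part's rated temperature tolerance, neither of which is modeled here.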
The New Fluids
Easily the most captivating part of these sessions, if not the conference as a whole, was watching a boiling liquid cool a computer system. Sadly, this being a virtual conference, we only saw glimpses.
Set aside everything you think you know about cooling, except for this: heat rises. Since heat is the thing you don’t want in a server, the entire design of the system should enable unwanted heat to be carried up and out. Sometimes this can be done naturally, as in the case of a single-phase system, where rising hot fluid is replaced with cooler, denser fluid, creating a natural roll. In other scenarios, such as two-phase systems, the fluid is allowed to boil. New, manufactured fluids boil at lower temperatures than water. Boiling converts fluid to vapor bubbles, which carry heat away with them. The vapor is piped out to a cool surface, where it re-condenses into liquid and, like rain, is readmitted to the tank — a process the engineers call “rewetting.” “The rewetting makes sure that the liquid is there to replace the one that just changed phase, into gas or a bubble,” Brink said.
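To see why boiling moves heat so effectively, compare the two standard heat-balance relations. A single-phase fluid carries heat away as a temperature rise (mass flow times specific heat times temperature change), while a boiling fluid carries it as latent heat of vaporization (mass flow times latent heat). The sketch below rearranges both to estimate the flow needed for a given load. The function names are hypothetical, and every number in the usage lines is an illustrative placeholder, not a measured property of any fluid named in this article.

```python
def single_phase_flow(load_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) needed to absorb load_w as a temperature rise.

    Rearranged from Q = m_dot * cp * delta_T.
    """
    if cp_j_per_kg_k <= 0 or delta_t_k <= 0:
        raise ValueError("specific heat and temperature rise must be positive")
    return load_w / (cp_j_per_kg_k * delta_t_k)


def two_phase_flow(load_w: float, latent_heat_j_per_kg: float) -> float:
    """Mass of fluid (kg/s) that must vaporize to absorb load_w.

    Rearranged from Q = m_dot * h_fg. Simplification: ignores the heat
    needed to bring the fluid up to its boiling point.
    """
    if latent_heat_j_per_kg <= 0:
        raise ValueError("latent heat must be positive")
    return load_w / latent_heat_j_per_kg


if __name__ == "__main__":
    # Placeholder inputs chosen only to make the arithmetic easy to follow.
    load = 1000.0           # watts of heat to remove
    cp = 2000.0             # J/(kg*K), assumed
    delta_t = 10.0          # K of allowed fluid temperature rise, assumed
    latent_heat = 100000.0  # J/kg, assumed

    print("single-phase kg/s:", single_phase_flow(load, cp, delta_t))
    print("two-phase kg/s:", two_phase_flow(load, latent_heat))
```

With these placeholder inputs the boiling case needs less fluid in motion for the same load, which is the intuition behind the vapor bubbles acting as carriers for heat. Actual ratios depend entirely on the properties of the fluid in question.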
There are now two types of immersion coolants available in the market:
Hydrocarbon: often a petroleum product that easily changes state from a liquid to a gas when heated;
Fluorocarbon: a manufactured product that bonds carbon with fluorine, of the class generally used today in liquid- and oil-resistant coatings such as Scotchgard, and fire retardants such as Novec (both of which are in 3M’s wheelhouse).
Asperitas has been partnering with Shell on production of a synthetic hydrocarbon fluid, which they premiered during the virtual conference. (Yes, you read right: Shell presented at a tech conference.) Called S5 X, this fluid would be produced exclusively for immersion cooling.
“The liquid is very safe to work with, and has very high purity properties,” Brink said. “It meets all of the purity requirements of the European Union and US pharmaceutical organizations. It has a very low volatility, so it’s a very stable liquid. It’s non-halogenated, food-grade, and free from any kind of allergens. The same base oil is also used as medicinal liquids.”
3M’s Shah warned, though, that hydrocarbon fluids do have the side effect of moisturizing the components they immerse, which can change their physical properties. As multiple engineers noted, ordinary mineral oils that were used in early-stage immersion projects caused sockets to expand and cards to slip out, and decomposed the plasticizers and coatings that made wires flexible.
“In electronics, there are so many hydrocarbon polymer materials being used,” said Shah. Dissolved particulates in the fluid become contaminants that eventually clog filters and disturb regular flow, he told attendees. “So the electronics and the supporting hardware can introduce multiple types of contaminants to the oil, and can have an effect on the properties of the oil, as well as performance.” That’s not to say that manufactured oils are somehow unsafe or unsuitable for use in immersion, but they will require some type of regular maintenance and filtration.
Mineral oils, Shah declared, “should be avoided. Only synthetic liquids should be used. Synthetic liquids can minimize most of the common malfunctioning related to immersion cooling.”
Perhaps most astonishingly, these new fluids are unlike ordinary oils and lubricants. Just seconds after a cassette is removed from its bath, for instance, those tiny fluid drops that you’d expect to find deposited throughout the components (and get all over your hands and clothes) will vaporize and re-condense into the pool, leaving the chassis both clean and nearly bone dry.
The New Setup
Removing the airflow fan from the CPU and dunking the bare chip into mineral oil, it turned out, was a bad idea. For the fluid flow design to work effectively, the surface area of the chip assembly exposed to the fluid must be increased.
OCP 2020 immersion Asperitas Xeon heatsink
The way to do that in the new single-phase scheme is to attach a passive heatsink with copper fins that steer the fluid flow direction like rudders on a boat. Optimally, these fins should be parallel to the flow. 2CRSI’s Helezen suggested that, for a properly certified and optimized thermal design, the size of the fins should be determined by the viscosity and thermal properties of the fluid.
OCP 2020 immersion zones
Power supply units and any other parts that may be shaped more like bricks than plates should be placed in the bottom zone of the chassis, the T0. Yet even here their gratings should be opened to allow for more flow. In chassis that are thick enough to allow for them, PCI Express cards should be installed vertically, just like the chassis containing them. One problem engineers are still working on, they admitted, is finding the proper pitch for DIMM sockets, which are typically tipped at an angle. In a perfect world, one engineer said, a socket would make it possible for a DIMM to slip in almost parallel with the substrate, like in a laptop PC. But that might not be an option yet.
Although it may be a good idea to populate the chassis as densely as possible, while still allowing for unobstructed fluid flow, it was suggested there should be plenty of margin between the topmost component of the chassis in the T2 zone and the surface of the fluid, as well as between the lowest component in T0 and the bottom of the tank.
Warmup Phase
What will determine immersion cooling’s progress from this point forward will be demand. At the moment, with cloud data centers being minimally staffed and many academic facilities shut down completely, such demand will not be immediately forthcoming. There’s a big fork in the road ahead, put there by circumstances beyond most anyone’s control, and we won’t know for a while whether hyperscalers will heed Yogi Berra’s advice and take it.
But if economic progress resumes its regular pace sometime in the near future, there may yet be time for immersion cooling to register sustained power and cost savings. Maybe dunk tanks won’t become fixtures of all data centers within our lifetimes, but at least for HPC, dunk tanks are looking more and more inevitable. Hyperscalers may want to prepare for scaling down.