Efficient cooling for high-performance computing

Liquid cooling for high-performance computing

Created by Matt Gellert

For many years now, predictive AI has been an integral part of many data centre control and monitoring systems. It helps operators increase energy efficiency and detect impending failures at an early stage. Today, generative AI has moved from emerging technology to the dominant driver of new data centre design. Frontier model training, large-scale inference and the agentic and assistant features now embedded across mainstream productivity, search and developer tools are pushing both training and inference into a phase of sustained, large-scale infrastructure expansion.

The data and storage capacities required would have been considered exceptional only a few years ago, and the same trajectory is visible in science, medicine and applications such as autonomous driving. GPU and accelerator-based servers used for AI workloads multiply their performance and energy requirements many times over with each new generation. The result is very high power densities in the rack, well past the practical limits of the air cooling systems traditionally used in data centres. Liquid cooling is therefore the obvious choice for cooling AI servers efficiently and reliably under this additional load.

The most common method of cooling a data centre separates the room into hot and cold aisles, and enclosing those aisles so that the two airstreams cannot mix is known as containment. Cold air is supplied directly into the cold aisles; in most new data centres it is delivered straight into the space rather than blown up through a raised floor, an approach that has become far less common in modern designs. The servers take in cold air at the front, the air absorbs their heat, and it is blown back out into the hot aisles at the back of the rack. From there, the air is conveyed through ducts into the air conditioning units and cooled once more.

Alternatively, server racks can be supplied with cold air by a rack-based cooling system. In-row cooling units positioned to the sides of the rack deliver cold air to the servers and draw in the heated air at the back, to cool it again. However, when airflow is conveyed through IT equipment it will generally not reach all components uniformly. This effect is especially pronounced in room air cooling, whereas in-row side cooling systems such as the STULZ CyberRow have a much lower risk of hot-spot formation. Traditional IT applications sit at relatively low densities, often around 10 kW or less per rack, and air cooling handles these comfortably. In practice, around 50 kW per rack is the upper limit for air cooling, and rear-door heat exchangers can help a rack approach that ceiling rather than push significantly beyond it. These figures remain more than adequate for many traditional IT applications, but rapidly become a limiting factor where GPU-dense AI systems are concerned.

Liquid cooling: Energy efficient cooling also at high power densities

How far hot and cold aisle containment can be relaxed depends on the type of liquid cooling. With immersion cooling, most of the heat transfer takes place in a closed system without an intermediate medium, so hot and cold aisles are largely unnecessary. With direct-to-chip, which is the more prevalent approach, only the hottest components are liquid-cooled and a meaningful proportion of air cooling is still required, so hot and cold aisle management continues to apply. Additional air cooling is needed for certain components such as power supply units and any parts not in direct contact with the cooling fluid. There still needs to be sufficient space between the racks or tanks to allow for maintenance work or to replace equipment. Because it requires less room, liquid cooling is also well suited to edge locations with little space and frequently changing ambient temperatures.

Liquid can absorb far more heat per unit volume than air, which means rack power density can be increased significantly. With liquid cooling, around 120 kW per rack is now standard for current-generation AI platforms (such as the NVIDIA GB200 NVL72 reference design), and 250 to 300 kW racks are increasingly common in production AI deployments. The next wave of GPU platforms is being designed for rack densities approaching 600 kW, and hyperscale roadmaps already point toward megawatt-class racks within this decade. In practical use, this places significant additional demands on the electricity infrastructure and hydraulics.
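The difference between the two media can be made concrete with the sensible-heat relation Q = density x volumetric flow x specific heat x temperature rise. The sketch below is a rough illustration only: it uses textbook properties for air and plain water (a PG25 mix has a somewhat lower heat capacity), an assumed 10 K temperature rise, and a function name chosen for this example rather than taken from any sizing tool.

```python
# Approximate fluid properties near room temperature (textbook values).
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 998.0, "cp": 4180.0}  # kg/m^3, J/(kg*K)


def volumetric_flow_m3_per_s(heat_w, fluid, delta_t_k):
    """Flow needed to carry heat_w watts at a temperature rise of delta_t_k."""
    return heat_w / (fluid["density"] * fluid["cp"] * delta_t_k)


rack_heat_w = 120_000  # 120 kW rack, the density quoted in the text
delta_t_k = 10.0       # assumed temperature rise of the cooling medium

air_flow = volumetric_flow_m3_per_s(rack_heat_w, AIR, delta_t_k)
water_flow = volumetric_flow_m3_per_s(rack_heat_w, WATER, delta_t_k)

print(f"air:   {air_flow:.1f} m^3/s")
print(f"water: {water_flow * 1000:.1f} L/s")
print(f"ratio: {air_flow / water_flow:.0f}x")
```

Under these assumptions the rack would need roughly 10 cubic metres of air per second, compared with about 3 litres of water per second, which is why air becomes impractical at these densities.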

Where waste heat recovery is concerned, liquid cooling has clear advantages over pure air cooling, because a higher temperature level can be reached, making direct connection to a transfer heat exchanger easier.

Versions of Liquid Cooling

Currently, two main versions of liquid cooling are available, which differ in design and efficiency. In one version, the parts to be cooled come directly into contact with the cooling liquid (immersion cooling); in the other, the components are equipped with a heat sink, or cold plate, through which the cooling liquid flows (direct-to-chip liquid cooling). It is important to differentiate the two, because they are very different in practice and direct-to-chip is by far the more popular at this stage. Immersion remains comparatively niche.

Direct-to-chip is the easier conversion at facility level, because the rack form factor and the layout of the room are largely preserved. The servers themselves, however, are generally purpose-built liquid-cooled models with integrated cold plates rather than air-cooled units converted after the fact; retrofitting cold plates onto existing air-cooled servers is rare in practice. At the rack, a manifold distributes coolant to the individual servers and connects to the Technology Cooling System (TCS) pipe network. A coolant distribution unit (CDU) sits on the other end of the TCS. The CDU contains an internal heat exchanger that transfers heat from the TCS into the Facility Water System (FWS), the building's water loop. The pipework required for this can be routed through the existing raised floor where one is present.

Direct-to-chip systems typically run a water-based coolant rather than pure water or an expensive dielectric fluid. The most common choice is a propylene glycol and water mix known as PG25, which contains roughly 25 percent propylene glycol. The main trade-off is the risk of coolant escaping in the event of a leak. This risk is managed through leak detection and careful rack-level design rather than by switching to a dielectric fluid, which remains uncommon in direct-to-chip systems.

When immersion cooling is used, the cost of converting air-cooled systems is relatively high. Existing servers usually need to be replaced by ones specifically developed for immersion, operated in trays or tanks of dielectric fluid, and existing racks can no longer be used. As well as absolutely uniform heat dissipation, the liquid also ensures that the motherboards no longer take in any dust and therefore no longer need cleaning. If a dielectric fluid is used, a leak has no impact on the operational reliability of the IT system.

Circulation with or without pumps: 1-phase and 2-phase liquid cooling

In 1-phase liquid cooling, the coolant remains liquid throughout the loop and is normally circulated by pumps. In 2-phase liquid cooling, the liquid changes phase because of the temperature differences across the system. As it absorbs heat, the dielectric fluid reaches its boiling point, a temperature determined by its specification, becomes gaseous and rises. A condenser in the upper part of the tank is cooled from outside by a water circuit; when the vapour reaches the condenser it cools, becomes liquid again and runs back down to absorb more heat.
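The heat carried away by boiling can be estimated from the fluid's latent heat of vaporisation: at steady state, the vapour mass flow equals the heat load divided by the latent heat. The sketch below is illustrative only. The latent heat of 100 kJ/kg is an assumed placeholder, and the real value has to come from the datasheet of the chosen fluid; the tank load and function name are likewise hypothetical.

```python
def vapour_mass_flow_kg_per_s(heat_load_kw, latent_heat_kj_per_kg):
    """Mass of fluid that must boil off each second to absorb heat_load_kw."""
    if latent_heat_kj_per_kg <= 0:
        raise ValueError("latent heat must be positive")
    # kW divided by kJ/kg gives kg/s.
    return heat_load_kw / latent_heat_kj_per_kg


ASSUMED_LATENT_HEAT_KJ_PER_KG = 100.0  # placeholder, not a datasheet value
tank_heat_load_kw = 100.0              # hypothetical tank load

flow = vapour_mass_flow_kg_per_s(tank_heat_load_kw, ASSUMED_LATENT_HEAT_KJ_PER_KG)
print(f"{flow:.2f} kg/s of vapour must be condensed at steady state")
```

The condenser and its water circuit have to be sized to condense that same mass flow, otherwise pressure and temperature in the tank will rise.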

The advantage of the 2-phase version is that circulation inside the tank manages entirely without pumps, so fewer moving parts are required. On the other hand, the higher global warming potential (GWP) of these fluids has to be taken into consideration, and ongoing PFAS regulation in Europe and the US is restricting the availability of several dielectric fluids historically used in 2-phase systems. This has slowed broad adoption of 2-phase immersion in favour of single-phase alternatives, and any 2-phase deployment should be evaluated carefully against the regulatory outlook for the chosen fluid.

Conclusion

Rising heat loads per rack are demanding new methods of data centre air conditioning. Once power densities exceed roughly 50 kW per rack, with rear-door heat exchangers only helping a rack reach that ceiling rather than move beyond it, there is currently no realistic alternative to liquid cooling. If the direct-to-chip version is used, the facility and rack layout can generally be preserved, although the servers are typically purpose-built liquid-cooled models and additional components such as CDUs will need to be procured.

If a completely new high-performance data centre is being built, immersion cooling should also be included in the planning and both options compared in detail, bearing in mind that direct-to-chip is the more widely adopted approach today. Whichever version is chosen, it is worth bearing in mind that a residual proportion of air cooling will still be required, typically 20 to 30 percent for direct-to-chip and 5 to 10 percent for immersion.
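To see what these residual shares mean for the air-side plant, the percentages can be applied to a given rack or tank load. This is a minimal sketch: the ranges are the ones quoted above, while the 120 kW load and the function name are chosen purely for illustration.

```python
# Residual air-cooled share per technology, as (low, high) fractions.
RESIDUAL_AIR_SHARE = {
    "direct-to-chip": (0.20, 0.30),
    "immersion": (0.05, 0.10),
}


def residual_air_load_kw(load_kw, technology):
    """Return the (low, high) heat load in kW that still has to be air-cooled."""
    low, high = RESIDUAL_AIR_SHARE[technology]
    return load_kw * low, load_kw * high


for technology in RESIDUAL_AIR_SHARE:
    low_kw, high_kw = residual_air_load_kw(120, technology)
    print(f"{technology}: {low_kw:.0f} to {high_kw:.0f} kW still air-cooled")
```

For a 120 kW direct-to-chip rack this gives 24 to 36 kW of air-cooled load, which is already more than a typical traditional IT rack dissipates in total.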

FAQs

At what rack power density do I actually need liquid cooling?

Air cooling is practical up to about 50 kW per rack, which is its effective ceiling, and typical traditional IT loads sit well below that at around 10 kW or less per rack. Rear-door heat exchangers can help a rack reach the top of the air-cooling band but do not extend it much further. Above that point, liquid cooling becomes the only realistic option. Most current-generation AI platforms already exceed this threshold, which is why GPU-dense deployments are designed liquid-first rather than retrofitted later.
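As a rough aid, the thresholds in this answer can be written as a simple lookup. The function name and band labels below are illustrative, and the cut-offs are the approximate figures quoted here rather than hard engineering limits.

```python
TRADITIONAL_IT_KW = 10       # typical traditional IT density per rack
AIR_COOLING_CEILING_KW = 50  # approximate practical ceiling for air cooling


def cooling_band(rack_kw):
    """Map a rack power density in kW to a rule-of-thumb cooling approach."""
    if rack_kw < 0:
        raise ValueError("rack_kw must not be negative")
    if rack_kw <= TRADITIONAL_IT_KW:
        return "air cooling (traditional IT range)"
    if rack_kw <= AIR_COOLING_CEILING_KW:
        return "air cooling near its limit"
    return "liquid cooling"


for rack_kw in (8, 35, 120):
    print(f"{rack_kw} kW per rack: {cooling_band(rack_kw)}")
```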

What is the difference between direct-to-chip and immersion cooling?

In direct-to-chip cooling, a coolant flows through a cold plate mounted on the CPU, GPU or other hot components, while the rest of the server is still air-cooled. In immersion cooling, the entire server is submerged in a non-conductive dielectric fluid that absorbs heat directly from every component. Direct-to-chip is by far the more common approach today because it keeps the standard server form factor and room layout; immersion delivers more uniform heat dissipation but generally requires immersion-specific servers and tanks, and remains comparatively niche.

Can I retrofit my existing data centre for liquid cooling?

At facility level, yes, particularly with direct-to-chip, because the rack and room layout are largely preserved. The servers themselves are usually purpose-built liquid-cooled models rather than converted air-cooled units, as retrofitting cold plates onto existing servers is rare. The rack gets a manifold that connects to the Technology Cooling System (TCS), and a coolant distribution unit (CDU) bridges the TCS and the building's Facility Water System (FWS) through an internal heat exchanger. Pipework can often be routed through the existing raised floor where one is present. Immersion retrofits are more involved because the rack and server form factor changes, so existing hardware typically needs to be replaced.

Is liquid cooling more energy efficient than air cooling?

At high densities, yes, by a clear margin. Liquid carries far more heat per unit volume than air, so far less energy is spent moving the cooling medium. Liquid cooling also reaches a higher return temperature, which makes waste heat recovery genuinely viable, for example feeding a district heating loop or a process heat exchanger. At low densities the gap narrows, but for AI and HPC workloads liquid is consistently the more efficient choice.

What is a CDU and why do I need one?

A CDU (coolant distribution unit) is the interface between the rack-level coolant loop and the building's water loop. It contains an internal heat exchanger that transfers heat from the Technology Cooling System (TCS), which serves the IT equipment, into the Facility Water System (FWS), and it manages flow rate, pressure and temperature to keep conditions at the chip within specification. Any direct-to-chip deployment of meaningful scale will include one or more CDUs, sized to the IT load.
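A first-pass CDU count can be estimated by dividing the liquid-cooled part of the IT load by the capacity of one unit and adding spare units for redundancy. The sketch below is not a vendor sizing method: the 600 kW unit capacity, the 75 percent liquid share, the single spare unit and the function name are all assumptions made for illustration.

```python
import math


def cdu_count(it_load_kw, liquid_share, cdu_capacity_kw, spare_units=1):
    """Number of CDUs for the liquid-cooled part of the load, plus spares."""
    if not 0 < liquid_share <= 1:
        raise ValueError("liquid_share must be in (0, 1]")
    if cdu_capacity_kw <= 0:
        raise ValueError("cdu_capacity_kw must be positive")
    liquid_load_kw = it_load_kw * liquid_share
    return math.ceil(liquid_load_kw / cdu_capacity_kw) + spare_units


# Hypothetical hall: 16 racks at 120 kW, 75 percent of the heat removed by
# liquid, CDUs rated at an assumed 600 kW each, one spare unit.
units = cdu_count(it_load_kw=16 * 120, liquid_share=0.75, cdu_capacity_kw=600)
print(units)  # 3 duty units plus 1 spare
```

A real design would also check flow rate, pressure drop and supply temperature against the server vendor's requirements, since heat capacity alone does not guarantee in-specification conditions at the chip.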

Is two-phase immersion cooling still viable given PFAS regulation?

It is still technically viable, but availability of dielectric fluids has tightened. Several of the fluids historically used in two-phase immersion are affected by PFAS restrictions in Europe and the US, which has slowed adoption in favour of single-phase alternatives. Any two-phase deployment now needs to be evaluated against the regulatory outlook for the specific fluid, as well as the higher GWP of these fluids in general. For most new AI projects, single-phase liquid cooling is the lower-risk path.