WATERCOOLED HIGH PERFORMANCE COMPUTING SYSTEMS
AQUILA WHITE PAPER: HIGH DENSITY RACK COOLING CHOICES
by Phil Hughes
Reprinted with permission of Clustered Systems Company, Inc.

THIS WHITE PAPER DISCUSSES THE OPTIONS FOR SUPPORTING RACKS WITH HIGH POWER DENSITY, COVERING BOTH THE VARIOUS COOLING SYSTEMS AND WORKING FLUIDS. IT CONCLUDES WITH AN ANALYSIS OF THE COSTS OF BOTH AIR MEDIATED AND CONTACT COOLING SYSTEMS.

PREAMBLE

Demands for more compute power are outstripping the rate at which server capability increases. Older data centers were designed for 40 watts per square foot, and most have no room to expand. The only way to pack in more horsepower is to increase power density; at some sites this has reached 400 W per square foot, with further increases needed. Many data center operators complain that they cannot buy more power for their site, or that it is prohibitively costly. The capability to expand cooling is similarly limited. Just to make things more interesting, there is political and economic pressure to reduce energy consumption.

While considerable progress is being made in increasing the performance per watt of servers, it is insufficient to restrain overall power consumption growth. Power and cooling systems also need to improve. The biggest return on investment can often be had by upgrading cooling to handle high power densities more efficiently. This paper discusses a few of the air cooling methods and compares them with Aquarius, Aquila's liquid cooled OCP rack with cooling by Clustered Systems.

STEALTH COOLING?

According to a report on site surveys from Lawrence Berkeley Lab, PUE (power usage effectiveness, the ratio of total load to IT load) varied between 1.1 and 2.5. Of the non-IT component, cooling accounts for over 90%. Even this measurement is open to suspicion, as server fans are counted as part of the IT load, not cooling. This hidden fan load can vary from around 5% at idle in cool air to a far larger fraction when the server is under load and the air is hot.
A reported PUE on a hot day of, say, 1.05 may really be over 1.2 when the fans are going flat out to keep the server running. If the hidden fan load is a fraction f of the reported IT load, subtracting it from the denominator (IT load) gives:

Real PUE = Stated PUE / (1 − f)

so a stated PUE of 1.05 becomes a real PUE of about 1.2. A chiller based system would be less efficient still, adding 22% or more to the UPS load.

DISCUSSION

Most systems that handle high power density use air as the heat transport medium, so server fans are a necessary evil. This paper briefly reviews those systems that can be installed in an existing data center, given sufficient room for additional piping and wiring. Table 1 (next page) presents a summary of some of the various options.

AQUILA 8401 Washington Place NE Albuquerque, NM 811 TEL (505)92155 EMAIL support@aquilagroup.com
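The correction above can be sketched in a few lines. The 12.5% fan fraction is an illustrative assumption, chosen because it reproduces the 1.05 → 1.2 adjustment; it is not a figure from the survey.

```python
def real_pue(stated_pue: float, fan_fraction: float) -> float:
    """Recompute PUE with server fan power moved out of the IT load.

    fan_fraction: fan power as a fraction of the reported IT load,
    which (wrongly) includes the fans themselves.
    """
    # Subtracting the fan load from the denominator (IT load):
    return stated_pue / (1.0 - fan_fraction)

# A stated PUE of 1.05 with fans drawing 12.5% of the reported IT load:
print(round(real_pue(1.05, 0.125), 2))  # → 1.2
```

At idle, with fans near 5% of load, the distortion is small; it is under full load on a hot day that the stated figure becomes seriously misleading.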
WATER OR REFRIGERANT?

WATER COOLED DOORS

Heat is extracted from the air passing through the system by warming water passing through the door's coils: for each degree Fahrenheit that one pound of water warms, one BTU is absorbed (specific heat). However, as the water warms, the temperature differential between air and water drops, diminishing the effective cooling. To overcome this problem, very cold water has to be used; in some cases its temperature is below the dew point, causing condensation. While this can be accommodated, it is an extra installation expense.

Chilled water can be connected to cooling units either under floor or overhead with hard or flexible tubing. Heat can be removed from the water directly by a primary system, which could be either a standalone remote mechanical refrigeration unit or part of the building system. A better solution, which avoids the entry of contaminants from the primary system into the unit cooling loop, is to insert a CDU (coolant distribution unit, a heat exchanger and pump assembly) between the primary and secondary (unit) loops.

REFRIGERANT

Heat is extracted from the air by refrigerant passing through the door's coils. Instead of specific heat, the latent heat of evaporation is used: a refrigerant can capture on the order of 100 times more heat per unit of weight than water. As the evaporative process is isothermal, warmer coolant can be used, avoiding the condensation problems all too common with water based systems. Heat is removed from the refrigerant using a CDU (coolant distribution unit, a heat exchanger and pump assembly) connected to a primary cooling water system. Due to the higher efficiency of refrigerant, the chiller cooling the primary water can be eliminated, or at least significantly curtailed, and replaced by a water tower, adiabatic cooler or dry cooler. Refrigerant can be brought to cooling units with hard or flexible overhead tubing.
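The sensible versus latent heat trade-off can be made concrete with a rough flow calculation. The figures below are illustrative assumptions, not values from this paper: water's specific heat of 4.186 kJ/(kg·K), a 5 K allowable water temperature rise, and a latent heat of vaporization of about 217 kJ/kg (typical of R134a).

```python
# Coolant mass flow needed to remove the same heat load using
# sensible heat (water) versus latent heat (a pumped refrigerant).

C_WATER = 4.186     # kJ/(kg*K), specific heat of water
H_FG = 217.0        # kJ/kg, latent heat of vaporization (approx. R134a)

def water_flow_kg_s(load_kw: float, delta_t_k: float) -> float:
    # Sensible heat: q = m * c * dT  ->  m = q / (c * dT)
    return load_kw / (C_WATER * delta_t_k)

def refrigerant_flow_kg_s(load_kw: float) -> float:
    # Latent heat (isothermal evaporation): q = m * h_fg  ->  m = q / h_fg
    return load_kw / H_FG

load = 30.0  # kW, one high-density rack (assumed)
print(f"water:       {water_flow_kg_s(load, 5.0):.2f} kg/s")
print(f"refrigerant: {refrigerant_flow_kg_s(load):.2f} kg/s")
```

With a smaller allowable temperature rise the required water flow grows proportionally, which is where the large per-kilogram advantage of evaporation comes from.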
Refrigerant piping in a server room does require that most joints be soldered or brazed to ensure a leaktight system. However, given the extra work needed to protect against condensation drips in water based systems, costs are a wash. In the event of a leak, refrigerant evaporates immediately, while water can drip onto and damage sensitive electronics.

TABLE 1: COOLING OPTIONS

                             Passive Rear   Passive Rear   In Row          Active Rear   Aquarius Fixed
                             Door, Water    Door, Refrig.  Cooling, Water  Door, Water   Cold Plate
Rack cooling capability, kW  15             0              0
Cooling efficiency           Low            Medium-High    Medium          Medium        High
Eliminates hot spots         Fan dependent  Fan dependent  Fan dependent
Additional cooling required
CRAC required
Aisle containment required
Additional floor space       Minimum        Minimum        50%             Minimum       None
Isothermal
Redundant pumps & control
Chiller required
PASSIVE REAR DOOR

A water cooled passive rear door was originally developed by IBM for its own servers. These doors worked quite well at the 5 kW rack power levels then current, and the design was later licensed to other companies. By bringing the cooling source closer to the heat source, energy savings can be realized both by reducing the energy used by air circulation fans and by reducing the mixing of cool and return air. The latter allows warmer coolant to be used, in some cases eliminating the need for chiller operation (Bell, 2010).

While these doors are quite effective at relatively low rack power levels, recirculation becomes a problem as fan speeds increase to maintain the servers' internal temperature. The speed-up causes the back-to-front differential pressure across the rack to increase; in one study recirculation was shown to grow from a small percentage at 5 kW to over 45% at higher rack powers (Khanakari, 2008). The intake air to servers at the bottom of the rack exceeded the maximum limit as then defined by ASHRAE. Counterintuitively, decreasing the recirculation rate by adding blocking plates can increase the pressure at the server outlet, which decreases fan efficiency and thus air flow rate, impeding cooling. Cables can also partially block air flow, creating a further impediment. It should also be noted that passive rear door coolers cannot be used to condition the data center space.

ACTIVE REAR DOOR COOLERS

These systems are similar to passive coolers but add fans, which can eliminate some of the drawbacks encountered with passive doors. It is claimed that they can handle up to 50 kW when added to a single rack. The pressure between the server outlet and the door is reduced, which cuts down hot air recirculation and improves the efficiency of the servers' internal fans. It is necessary that the door fans be synchronized with the servers'.
If too slow, they act as an impediment to air flow; if too fast, they waste energy. The additional fans also increase power draw and create another layer of devices to be regularly serviced and repaired. The increased cooling efficiency may mean that warmer water can be used, possibly eliminating humidification and dehumidification issues. Another benefit may be extended use of economizer modes, which cuts chiller energy expense.

INROW COOLERS

Inrow coolers are modular enclosures of the same height and depth as the server racks, interspersed between the racks as density demands to provide increased cooling in their vicinity. They function best in conjunction with aisle containment systems that force the cooled air to pass through the server racks from a cold aisle to a hot aisle. For optimum efficiency, therefore, inrow coolers require modular aisle containment systems, adding cost. While moderately effective, inrow coolers cannot be used to cool a specific rack or racks because they are not directly connected to any rack: they cannot direct 100% of their airflow or cooling capacity. Note also that the capacity limit applies per inrow cooler, not per rack. Inrow coolers are connected to a central chilled water system via flexible hydraulic hoses. Pumped refrigerant inrow cooling units are also available; see the discussion of water and refrigerant based cooling above for the differences.

CONDUCTION COOLING

Server conduction cooling is a recent development for high density rack cooling. It totally eliminates air as the heat transport mechanism from the higher power chips to the cooling fluid. Instead, heat risers (in their simplest form, blocks of highly conductive material, e.g. aluminum) are placed atop the components. The blocks bring the heat up to a single plane, usually set at the top of the DIMMs.
A layer of highly compliant and conductive thermal interface material (TIM) is placed on top, and a flexible cold plate is pressed down on top of that. This mechanism provides a direct conductive path from chip junction to the coolant flowing through the cold plate.
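To first order, the junction-to-coolant path just described behaves like thermal resistances in series. The resistance values below are hypothetical placeholders for illustration, not Clustered Systems data.

```python
# Conductive path modeled as series thermal resistances:
# chip junction -> heat riser -> TIM -> cold plate wall -> coolant.

def junction_temp(power_w, coolant_c, resistances_k_per_w):
    """Junction temperature for a given chip power and coolant
    temperature, with the path modeled as series resistances (K/W)."""
    return coolant_c + power_w * sum(resistances_k_per_w)

path = [
    0.05,  # riser block (aluminum), K/W -- assumed
    0.10,  # compliant TIM layer, K/W -- assumed
    0.02,  # cold plate wall + convection to coolant, K/W -- assumed
]

# A 150 W processor over 30 C coolant:
print(round(junction_temp(150, 30.0, path), 1))  # → 55.5
```

The isothermal coolant is what makes this budget predictable: every chip under the plate sees roughly the same coolant temperature, rather than the progressively hotter air seen downstream in an air cooled chassis.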
CONDUCTION COOLING (CONTINUED)

The cold plates are 0.08 in. (2 mm) thick but, using water, can each absorb well over 1 kW of heat in the format used in the OCP rack application. A module is 12 OU tall and holds the server trays; the cold plates are integrated with the module. After a server tray is inserted (hanging below the cold plate), it is raised up to press the cold plate onto the components. With chassis in a rack, it can cool and power up to 100 kW. With today's servers, a typical load would be about 500 kW.

HOW MUCH?

Overall, there is relatively little difference in equipment capital cost among the various forms of close-in air cooling. All share the same infrastructure: chiller, economizer (possibly), CDU, air-liquid heat exchanger (with or without fans) and possibly a CRAC for humidification and dehumidification. Passive rear door solutions cost about $5,000 or more per instance installed, about $50,000 per megawatt.

TABLE 2: BUILDING COST COMPARISON

                          Air w/ CRAH    Air w/ rear door   Aquarius
W/sq ft, data room        150            4                  2400
Required sq ft                           2                  41
Cost per sq ft            250            250                80
Mechanical                1              44                 0
Built area                8000           284                41
kW per cabinet            5                                 0
Number of cabinets        0              50                 1
DC construction           $2,000,000     $9,000
Electrical system         $1,000,000     $910,000           $540,8
Static discharge prot.    $100,000       $4,800             $,250
Cooling: chiller/cooler   $1,400         $1,402             $,400
Cooling: CRAH/HX          $19,500        $412,89
Cooling: CDU                             $21,52
Fire suppression          $10,000        $55,80             $8,
Physical security         $1,            $58,000            $10,41
Cabinets                  $250,000       $2,500             $8,
Rear doors                               $2,9
Sub total                 $4,040,5       $,04,890           $1,505,900
Contingency
Architect & engineering   %              %                  %
Project mgr/consultant    5%
Totals                    $5,091,114     $,89,081           $1,,92
HOW MUCH? (CONTINUED)

The conduction cooling system consists of a rack with integrated cold plates (a solid-to-liquid heat exchanger), a CDU and a non-chiller-based heat disposal system; the latter could be a dry cooler, adiabatic cooler or cooling tower, depending on location. Table 2 (previous page) gives estimated build costs per megawatt for a data center using conventional air cooled racks, air cooled racks with passive rear doors, and Clustered Systems conduction cooled racks. There are also energy savings, shown in Table 3 (below).

TABLE 3: ANNUAL ENERGY COST COMPARISON

                                      Air w/ CRAH   Air w/ rear door   Aquarius
DC floor area, sq ft                                2,                 41
Data center IT load, kW               950           950                1,000
DC internal cooling (server fans)     50            50
Data centre cooling load (UPS)        1,000         1,000              1,000
Chiller load                          2             2
Electric room cooling load            140           122
CRAH fan power                        1
Cooling load, lighting & skin         80            28                 5
Back of house skin load               10            2
Chilled water pump
Refrigerant pump                      0             0
Pump cooling load
Condenser water load
Ventilation latent heat load
Ventilation sensible load             2
Cooling tower                                                          2
Chiller heat rejection to ambient
Back of house lighting                1             0.2
Total                                 1             148                1,080
True PUE                              1.8           1.2                1.08
Cost of power, $/kWh                  $0.10         $0.10              $0.10
Annual cost                           $1,50,        $1,2,250           $945,84
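The annual cost line of Table 3 is simply load × hours × tariff. As a check, here is the Aquarius column's 1,080 kW facility total at $0.10/kWh, assuming continuous year-round operation:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; continuous operation assumed

def annual_energy_cost(total_load_kw: float, usd_per_kwh: float) -> float:
    # Energy (kWh) = power (kW) * hours; cost = energy * tariff
    return total_load_kw * HOURS_PER_YEAR * usd_per_kwh

# Aquarius column of Table 3: 1,080 kW total facility load at $0.10/kWh
print(f"${annual_energy_cost(1080, 0.10):,.0f}")  # → $946,080
```

The same function reproduces the other columns from their facility totals, so the comparison reduces to the True PUE row: every point of PUE is paid for every hour of the year.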
WHAT DO I GET FOR MY MONEY?

Servers with fans are specified differently from those without. At idle, the fans (8 per server) each draw only a few watts while the server delivers 50% of rated capacity. Under full load, at 100% of capacity, they draw about 8 times as much, roughly 150 watts in total. Assuming a motherboard power rating of 500 W, the server nameplate in an air cooled system must therefore specify the motherboard plus the fan load. The contact cooled server, however, is still specified at only 500 watts, eliminating unnecessary capital costs for the same number of servers or permitting more servers on the same infrastructure.

TABLE 4: SERVER COST & PERFORMANCE

                                Air w/ CRAH   Air w/ rear door   Aquarius
Server power (spec), watts      625           625                500
Number of servers               1,600         1,600              2,000
Cost of server                  $2,000        $2,000             $2,000
Total initial cost              $3,200,000    $3,200,000         $4,000,000
Failure rate per year           5%            5%                 1%
Service life, years             3             3                  3
Total cost                      $3,680,000    $3,680,000         $4,120,000
Per year                        $1,226,667    $1,226,667         $1,373,333
Cost per server/year            $1,91.2       $1,49.4            $1,255.01
Gigaflops/server, peak sust.    500           500                550
Total teraflops/MW              800           800                1,100

Table 4 shows the amortization computation for each DC component. Per megawatt you can cool 1,600 air cooled servers or 2,000 contact cooled servers. Further, as turbo mode can be sustained indefinitely with liquid cooling, peak sustained processing power increases by 10%. All these factors combine to yield a 37.5% increase in peak sustained teraflops per megawatt, at a markedly lower cost than that of equivalent conventionally cooled servers.
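The idle-to-full-load jump in fan power quoted above is what the fan affinity laws predict: fan power scales with the cube of speed, so doubling fan speed multiplies fan power by eight. The 18.75 W idle figure below is back-computed from the roughly 150 W full-load total and is an assumption, not a measured value.

```python
def fan_power(p_ref_w: float, speed_ratio: float) -> float:
    """Fan power at a new speed, per the affinity law P ∝ N^3."""
    return p_ref_w * speed_ratio ** 3

idle_w = 18.75  # assumed total for 8 fans at half speed
full_w = fan_power(idle_w, 2.0)  # doubling fan speed under full load
print(full_w)  # → 150.0
```

This cubic relationship is why the hidden fan load discussed earlier grows so sharply on hot days: a modest rise in required airflow costs a disproportionate amount of power.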
TABLE 5: ANNUAL AMORTIZATION PER SERVER COMPARISON

                       Years   Air w/ CRAH   Air w/ rear door   Aquarius
Building               9       $51,282       $1,84              $855
Design & mgmt          15      $29,1         $5,4               $52,012
Facility               15      $119,1        $1,95              $42,1
Cabinets                       $41,          $10,41             $18,889
Rear doors                                   $2,9
Servers                3       $1,226,667    $1,226,667         $1,373,333
Total amortization             $1,2,124      $1,919,018         $1,0,04
Energy cost                    $1,50,        $1,2,250           $945,84
TCO                            $3,238,844    $,221,28           $2,55,489
Per server                     $2,024.28     $2,01.29           $1,2.4
Per teraflop                   $4,048.55     $4,02.59           $2,21.5