Pramod Mandagere, Prof. David Du, Sandeep Uttamchandani (IBM Almaden)
Motivation
Background
Our Research Agenda
  Modeling Thermal Behavior
  Static Workload Provisioning
  Dynamic Workload Provisioning
    Improving data center efficiency by fixing existing inefficiencies/hotspots
  Layout Planning
Current Status
Total US data center power consumption is about 1.5% of total US electricity consumption
Total cost of data center electricity consumption: ~$5B in 2007 ($7.5B by 2011)
Issue: although data centers account for a very small percentage of overall consumption, their concentrated nature/density leads to supply issues (concentrated demand on power grids)
Data Center Power Consumption
  50% Heating, Ventilation & Air Conditioning (HVAC)
  20-35% Servers
  10-25% Storage
  5% Networking
Different Types of Data Centers
  Compute Centric (e.g., HPC): 35% Servers, 10% Storage, 5% Networking
  Data Centric (e.g., Enterprise): 20% Servers, 25% Storage, 5% Networking
  Average Case: 25% Servers, 20% Storage, 5% Networking
Almost all energy consumed by IT equipment is released as heat
Heat Extraction Process
  Fans draw cold air in through the vents at the front of servers (inlets)
  As the cold air passes through the server, it absorbs heat and exits the system at a higher temperature
  Q: heat generated, a function of system load
Inlet temperatures should be kept below 25 °C for safe operation (thermal redlining): failure rates increase non-linearly above this threshold
Computer Room Air Conditioning units (CRACs) extract the heat generated by devices and supply cold air to the data center
Higher supply temperature -> higher CRAC efficiency (Coefficient of Performance, COP = Q / W) [HP Usenix 07]
  Q: amount of heat
  W: work done in removing/extracting Q units of heat
Device inlet temperatures are a function of supply temperature
  A 25 °C supply temperature does not mean a 25 °C server inlet temperature
  Higher supply temperature -> higher inlet temperature (ideal setting: the highest supply temperature that keeps the maximum inlet temperature < 25 °C)
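As a worked illustration of why CRAC efficiency matters, the sketch below applies the definition COP = Q / W to get the cooling work W for a given heat load Q. The 100 kW load and the two COP values are hypothetical numbers chosen for the arithmetic, not measurements from this work.

```python
def cooling_power_kw(heat_kw: float, cop: float) -> float:
    """Work W (kW) a CRAC needs to remove heat_kw of heat, from COP = Q / W."""
    if cop <= 0:
        raise ValueError("COP must be positive")
    return heat_kw / cop


# Hypothetical IT load; nearly all of it is released as heat.
it_heat_kw = 100.0

# Hypothetical COP values: a higher supply temperature gives a higher COP.
for label, cop in [("lower supply temperature", 2.0),
                   ("higher supply temperature", 4.0)]:
    work = cooling_power_kw(it_heat_kw, cop)
    print(f"{label}: COP={cop} -> cooling power {work:.1f} kW")
```

Under these assumed values, doubling the COP halves the cooling work for the same heat load, which is the motivation for running at the highest supply temperature that still respects the inlet threshold.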
Heat Recirculation or Hot Gas Bypass
  Hot air generated by servers/storage does not all travel back to the CRAC for extraction; a portion of it recirculates into the cold aisle
Cause
  Natural recirculation around the ends of aisles, over the tops of racks, or through unused open spaces in racks, in combination with the flow rates of the supplied cold air
Effect
  With the supply temperature set to a given point, inlet temperatures at the various servers tend to be higher than the supply temperature
Factors that affect heat recirculation (HR)
  Data center layout/dimensions
  Workload distribution
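To make the effect of recirculation concrete, here is a minimal sketch that assumes a simple linear model: the rise of each inlet above the supply temperature is a weighted sum of the power drawn by every device. The coefficient matrix `recirc` and the power values are hypothetical, and the linear form itself is an assumption used for illustration, not the model developed in this work.

```python
REDLINE_C = 25.0  # inlet threshold quoted in the slides


def inlet_rises(recirc, power_kw):
    """Rise above supply at each inlet: rise_i = sum_j recirc[i][j] * power_kw[j]."""
    return [sum(d * p for d, p in zip(row, power_kw)) for row in recirc]


def max_safe_supply(recirc, power_kw, redline=REDLINE_C):
    """Upper bound on supply temperature; supply must stay strictly below it."""
    return redline - max(inlet_rises(recirc, power_kw))


# Hypothetical coefficients in degrees C per kW for three servers.
# The third server (for example at a rack top) sees the most recirculated heat.
recirc = [
    [0.5, 0.2, 0.0],
    [0.2, 0.5, 0.2],
    [0.5, 1.0, 2.0],
]

even_load = [2.0, 2.0, 2.0]     # same total power, spread evenly
shifted_load = [3.0, 3.0, 0.0]  # same total power, hot-spot server idle

print(max_safe_supply(recirc, even_load))
print(max_safe_supply(recirc, shifted_load))
```

With these made-up numbers the same total load permits a higher supply temperature once work is moved away from the server most exposed to recirculation, which is how workload distribution enters the picture.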
[Figure: Typical raised-floor-based layout (top view and profile view)]
Impact of Heat Recirculation
  Increases with height: temperatures at rack tops are higher than at rack bottoms
  [Figure panels: height 3 ft, height 6 ft]
Impact of Heat Recirculation
  Smaller at the middle of rows/aisles
  Increases towards row/aisle ends
  [Figure panels: row ends, row middle]
Objective
  Predict the temperature profile of the data center: inlet temperatures of all servers and storage devices as a function of the workload on all systems, for a fixed layout and cooling system
Given
  Power usage of all equipment
  Physical location of all equipment
  Physical dimensions and layout of the data center (fixed)
Our Proposed Solution
  Use supervised machine learning techniques to build regression-based predictors
  Support Vector Machines
Limitations of Related Work: (ASU) modeling-based approach and (HP Labs) neural-net-based approach
  Does not account for the on/off nature of servers/fans (ASU)
  Does not provide any means of understanding/verifying learnt functions (HP Labs)
  Parameter space has to be reduced for reasonable learning time (HP Labs)
  Assumes homogeneous equipment (flow-to-power ratio)
Our approach uses SVM-based predictors
  Incorporates both flow and power profiles of servers
  Scalable & verifiable
  Support vectors: provide a means of understanding HR characteristics
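The idea of a regression-based SVM predictor can be sketched with a small linear support vector regressor trained in the primal by subgradient descent on the epsilon-insensitive loss. This is a dependency-free stand-in chosen for illustration; the slides do not say which SVM formulation, kernel, or library was used. The two features (a server's own load and its neighbour's load, both in [0, 1]) and the synthetic temperature formula are hypothetical.

```python
def train_linear_svr(xs, ys, eps=0.1, lam=1e-3, lr=0.05, epochs=3000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, |y - (w.x + b)| - eps))."""
    n, dim = len(xs), len(xs[0])
    w = [0.0] * dim
    b = sum(ys) / n  # start the intercept at the mean target
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            r = y - (b + sum(wj * xj for wj, xj in zip(w, x)))
            if abs(r) > eps:  # only points outside the eps-tube contribute
                s = 1.0 if r > 0 else -1.0
                for j in range(dim):
                    grad_w[j] -= s * x[j] / n
                grad_b -= s / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, x))


# Synthetic training data (hypothetical): inlet temperature in degrees C as a
# function of (own load, neighbour load); real data would come from CFD runs
# or sensors.
levels = [i / 5 for i in range(6)]
xs = [(own, nb) for own in levels for nb in levels]
ys = [18.0 + 5.0 * own + 2.0 * nb for own, nb in xs]

model = train_linear_svr(xs, ys)
worst = max(abs(predict(model, x) - y) for x, y in zip(xs, ys))
print(f"weights={model[0]}, intercept={model[1]:.2f}, worst error={worst:.2f} C")
```

In this linear form the learnt weights can be read directly as the sensitivity of an inlet to each load, and the training points that end up on or outside the tube play the role of support vectors. A kernelised SVR would be the natural next step if the relationship is non-linear.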
Objective
  Determine the relationship between supply temperatures of cooling units and server inlet temperatures
Given
  Power usage of all equipment
  Physical location of all equipment
  Physical dimensions and layout of the data center (fixed)
Our Proposed Solution
  Profiling-based approach to determine zone ownership
  Vary the supply temperature of each CRAC one at a time and observe the corresponding change in server inlet temperatures
  Difficult to model: highly dependent on/coupled with server workloads
  Zones of ownership are easier to determine and often adequate
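The profiling procedure above can be sketched as follows: raise one CRAC's supply temperature at a time, record how much each inlet moves, and assign each server to the CRAC it responds to most. The `read_inlets` callback, the server and CRAC names, and the `fake_room` stand-in are all hypothetical; in practice the readings would come from sensors or a CFD simulation.

```python
def profile_zones(read_inlets, supplies, step=1.0):
    """Bump each CRAC's supply by `step` degrees in turn; return inlet sensitivities."""
    baseline = read_inlets(supplies)
    sensitivity = {server: {} for server in baseline}
    for crac in supplies:
        bumped = dict(supplies)
        bumped[crac] += step
        after = read_inlets(bumped)
        for server in baseline:
            sensitivity[server][crac] = (after[server] - baseline[server]) / step
    return sensitivity


def zone_owners(sensitivity):
    """Assign each server to the CRAC whose supply change moves its inlet the most."""
    return {server: max(row, key=row.get) for server, row in sensitivity.items()}


# Hypothetical stand-in for the real room: each inlet is a fixed mix of the
# CRAC supply temperatures plus a constant recirculation offset.
MIX = {
    "rack1-bottom": {"CRAC-A": 0.9, "CRAC-B": 0.1},
    "rack1-top": {"CRAC-A": 0.6, "CRAC-B": 0.3},
    "rack7-bottom": {"CRAC-A": 0.2, "CRAC-B": 0.8},
}


def fake_room(supplies):
    return {
        server: 3.0 + sum(share * supplies[crac] for crac, share in mix.items())
        for server, mix in MIX.items()
    }


owners = zone_owners(profile_zones(fake_room, {"CRAC-A": 18.0, "CRAC-B": 18.0}))
print(owners)
```

Because the sensitivities depend on the workload that is running during profiling, the ownership map from such a sweep is only valid for comparable load conditions.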
Objective
  Minimize data center power consumption for a given workload set
Assumptions
  Provisioning from scratch
  No deadlines
  All at once
Constraints
  Server inlet temperatures < threshold (25 °C)
  CRAC loads < maximum capacity (100 kW each)
Given
  Physical layout information
Our Proposed Solution: Discrete Non-linear Optimization
Objective function: minimize overall power consumption
  Minimize power consumption of servers + storage
  Minimize peak inlet temperature of devices (zones)
Basic constraint: inlet temperature < threshold
Advanced constraint: connectivity + policy based
Solution technique: genetic algorithms
  Workload distribution as the population string
Challenges
  Trade-off between workload/platform efficiency and cooling system efficiency
  Impact of modeling accuracy on the optimization process
  Incorporating constraints
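A minimal sketch of the genetic-algorithm formulation, with the workload distribution encoded as the population string (one server index per job). Everything numeric here is hypothetical: the server power and thermal figures, the 20 °C supply temperature, the penalty weights, and the GA settings. Only server power and the basic inlet constraint are modelled; storage, CRAC capacity, and connectivity/policy constraints are left out.

```python
import random

# Hypothetical servers: (idle_kw, kw_per_job, inlet_rise_per_job in degrees C).
SERVERS = [
    (0.10, 0.20, 1.0),   # most efficient, but sits in a hot spot
    (0.10, 0.25, 0.5),
    (0.10, 0.30, 0.25),
    (0.10, 0.35, 0.25),
]
SUPPLY_C, REDLINE_C, N_JOBS = 20.0, 25.0, 8


def evaluate(assign):
    """Return (power_kw, worst_inlet_c); servers without jobs are assumed off."""
    counts = [assign.count(s) for s in range(len(SERVERS))]
    power = sum(idle + per * c for (idle, per, _), c in zip(SERVERS, counts) if c)
    worst = max(SUPPLY_C + rise * c for (_, _, rise), c in zip(SERVERS, counts))
    return power, worst


def cost(assign):
    """Power plus a penalty that makes any redlining placement worse than any safe one."""
    power, worst = evaluate(assign)
    penalty = 0.0 if worst < REDLINE_C else 10.0 + 10.0 * (worst - REDLINE_C)
    return power + penalty


def genetic_search(seed=0, pop_size=30, generations=60, mutation_rate=0.1):
    rng = random.Random(seed)
    n = len(SERVERS)
    round_robin = [j % n for j in range(N_JOBS)]  # known-feasible seed individual
    pop = [round_robin] + [[rng.randrange(n) for _ in range(N_JOBS)]
                           for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [min(pop, key=cost)]  # elitism: the best string always survives
        while len(children) < pop_size:
            a = min(rng.sample(pop, 3), key=cost)  # tournament selection
            b = min(rng.sample(pop, 3), key=cost)
            cut = rng.randrange(1, N_JOBS)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return min(pop, key=cost)


best = genetic_search()
print(best, evaluate(best))
```

Handling the constraint as a penalty is one simple way of incorporating it; the trade-off named under Challenges shows up here because the most power-efficient server is also the one that heats up fastest.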
Objective
  Minimize data center power consumption for a given workload on a running system, i.e. reprovision an existing system to minimize inefficiencies
Assumptions
  System with already-running workloads and a pre-specified system state
Constraints
  Server inlet temperatures < threshold (25 °C)
  CRAC loads < maximum capacity (100 kW each)
Given
  Physical layout information
Challenges
  Minimizing the impact of reprovisioning on real-time performance
  Trade-off between the cost of repositioning one or more existing workloads and the long-run performance/power gain from reprovisioning
  Accounting for the power cost of reprovisioning
  Accounting for constraints
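The trade-off between the one-off cost of moving a workload and the long-run gain can be illustrated with a simple break-even calculation. This is a sketch that considers energy only; the function names and the example numbers are hypothetical, and the performance impact of the move is not modelled.

```python
def breakeven_hours(saving_kw: float, migration_kwh: float) -> float:
    """Hours the new placement must run before saved energy repays the move."""
    if saving_kw <= 0:
        return float("inf")  # no saving: the move never pays for itself
    return migration_kwh / saving_kw


def should_reprovision(saving_kw: float, migration_kwh: float,
                       expected_runtime_h: float) -> bool:
    """Reprovision only if the workload is expected to outlive the break-even point."""
    return expected_runtime_h > breakeven_hours(saving_kw, migration_kwh)


# Hypothetical example: the move saves 0.5 kW and costs 2 kWh to carry out.
print(breakeven_hours(0.5, 2.0))
print(should_reprovision(0.5, 2.0, expected_runtime_h=10.0))
print(should_reprovision(0.5, 2.0, expected_runtime_h=1.0))
```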
Objective
  Derive best practices for floor planning using the system models
  Temperature profile as a function of data center dimensions
    Raised floor depth
    Ceiling height
    Row width
    CRAC placement
Constraints
  Prevent thermal redlining
Given
  Thermal characteristics of devices
  Performance characteristics of devices
Layout Planning
  Designed simulations in the Flovent CFD simulator
  Impact of data center characteristics on thermal profile
Thermal Modeling as a Function of Workload
  Collected training data for supervised learning
  Implemented SVM-based regression learners
Thermal Modeling as a Function of Variable Cooling Units
  Designed simulations in Flovent
  Impact of cooling system variation on thermal profile
  Determined zone-based ownership
Raised Floor Depth (m)    0.15   0.3   0.45   0.6
# of Servers > 25 °C      37     28    25     6
Ceiling Height            2.2   2.4   2.6   2.8   3   3.2   3.4   3.6   3.8
# of Servers > 25 °C      6     3     4     6     4   2     2     3     2
Layout                    EEWW   NSEW   NNSS
# of Servers > 25 °C      4      15     6
Room Size*                4 ft   6 ft   8 ft
# of Servers > 25 °C      4      23     30
*Room size: 4 ft = 2 floor tiles at any point between racks and walls
[Figure: Effect of a 1° change in CRAC supply temperature (panels: CRAC A, CRAC B; rack bottom and rack top)]
[Figure: Effect of a 1° change in CRAC supply temperature (panels: CRAC C, CRAC D; rack bottom and rack top)]