FOM 2.4 The Performance Indicator: Assessing & Visualizing Data Center Cooling Performance
Mark Seymour, Future Facilities, speaking on behalf of The Green Grid
AFCOM DCW, 12th Sept. 2016
The Green Grid
Mission: Drive accountable, effective, resource-efficient, end-to-end ICT ecosystems
Membership: Non-profit, open industry consortium of end-users, policy-makers, technology providers, facility architects, and utility companies
- Holistic business approach
- Establish metrics, strategies, and best practices
- Drive understanding of risk
- Proactively engage governments to influence effective policy
- Provide frameworks for organizations to realize operational efficiency and maturity
www.thegreengrid.org
Data Center World Certified Vendor Neutral
Each presenter is required to certify that their presentation will be vendor-neutral. As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor-neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.
The Performance Indicator: Assessing & Visualizing Data Center Cooling Performance
Assessing performance is critical to data center operational planning. Historically, the only metric available has been one for energy efficiency (PUE). Building on PUE, The Green Grid has introduced the "Performance Indicator". It presents three key cooling performance metrics to ensure that, in the bid to save energy, the assessed facility maintains its ability to protect equipment during all modes of operation over its entire life. This session presents the need for the metrics, the logic behind them (they can be based on measurement alone or on measurement and simulation), and their definitions. It also discusses the importance of business input to targets and the relevance to the specific business type. Example case study data is used to illustrate the use of the metrics.
3 Key Things You Have Learned During this Session
1. Why the business should be interested in building on PUE by using the Performance Indicator for the ongoing assessment and improvement of your data center cooling.
2. The three metrics are easy to calculate, and you can start on the 2 additional metrics with just some simple temperature measurements and a calculator.
3. You can look into the future by adding simulation. This allows you to avoid future problems rather than react to them.
Power Usage Effectiveness (PUE)
- TGG created PUE (power usage effectiveness) to understand the performance of the data center from an energy point of view
- PUE has allowed people to improve the infrastructure energy efficiency of their DCs
- However, changes to PUE do not consider the impact on overall performance
- What about failure/resilience?
Raising Supply Air Temperature
[Figure labels: Supply Temperature, PUE, IT Conditions]
Failure or Scheduled Maintenance Scenario 10
Performance Indicator (PI)
TGG decided to understand performance in a broader way, so it created the PI: a method for assessing and visualizing data center cooling performance in terms of balancing the risk of downtime against the waste of over-engineering.
[Figure labels: Risk, Waste]
Evaluating the Linked Metrics
- The use of PUE alone has resulted in improving energy efficiency
- There has been no measure of performance with any other metric
[Figure label: Energy Efficiency (PUE)]
Evaluating the Linked Metrics: Cooling Effectiveness
- How effectively are we cooling our IT equipment?
- Devices poorly cooled in normal continuous operation
[Figure labels: Cooling Effectiveness, Energy Efficiency]
Evaluating the Linked Metrics: Resilience
- How resilient is the IT when full cooling is unavailable due to maintenance or failure?
- Devices at risk of overheating in a short-term redundant failure or maintenance scenario
[Figure labels: Resilience, Cooling Effectiveness, Energy Efficiency]
Evaluating the Linked Metrics
- The Performance Indicator visualizes the performance of all metrics
- The PI shows the balance and interaction between each and, more importantly, the impact a change to one will have on the other two
[Figure labels: Resilience, Cooling Effectiveness, Energy Efficiency]
The Business Target Performance
- The business can set targets for the performance it requires: a target range
- The business can see how changes affect the different metrics and decide on priorities
- KPIs can now translate to a visual indication of the DC performance
[Figure labels: Resilience, Cooling Effectiveness, Energy Efficiency]
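As a rough illustration of checking metrics against a business target range, the sketch below compares three percentages with lower and upper bounds. The `TargetRange` class, the `assess` function, and every number are hypothetical and chosen only for illustration; none of them comes from The Green Grid.

```python
from dataclasses import dataclass


@dataclass
class TargetRange:
    """Range (in percent) that the business accepts for one metric."""
    low: float
    high: float = 100.0

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def assess(metrics, targets):
    """Map each metric name to True if its value lies inside its target range."""
    return {name: targets[name].contains(value) for name, value in metrics.items()}


# Hypothetical target ranges set by the business (illustration only).
targets = {
    "Energy Efficiency": TargetRange(low=80.0),
    "Cooling Effectiveness": TargetRange(low=90.0),
    "Resilience": TargetRange(low=95.0),
}

# Hypothetical assessed values, in percent.
metrics = {
    "Energy Efficiency": 85.0,
    "Cooling Effectiveness": 88.0,
    "Resilience": 97.0,
}

result = assess(metrics, targets)
for name, ok in result.items():
    print(f"{name}: {'within target' if ok else 'outside target'}")
```

In this made-up case only cooling effectiveness falls outside its range, which is the kind of imbalance the triangle is meant to make visible.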
The Business Target Performance
The KPIs/metrics are known as:
- Energy Efficiency: PUE Ratio (PUEr)
- Reduced Risk: IT Thermal Resilience
- Cooling Effectiveness: IT Thermal Conformance
[Figure labels: IT Thermal Resilience, IT Thermal Conformance, PUEr(X)]
As a Business, which triangle is your best fit?
[Figure: three triangles, each with axes IT Thermal Resilience, IT Thermal Conformance, PUEr(X)]
Enterprise DC business drivers:
- No risk of downtime
- Efficiency sacrificed
Hyperscale DC business drivers:
- High efficiency
- Virtualized IT allows for failure
Colo DC business drivers:
- Competitive edge on PUEr
- Sacrificing conformance allows user flexibility but maintains SLAs
Explaining the Metrics
There are three metrics to consider:
- PUE ratio (PUEr): How effectively is the facility operating in relation to its target efficiency?
- IT Thermal Conformance: How much of the IT equipment is well cooled (receives air at appropriate inlet temperatures) during normal operation?
- IT Thermal Resilience: Is any equipment at risk of overheating in the case of cooling failure or planned maintenance?
[Figure legend: Current Load, Target Range]
PUE
- Not all data centers are created equal. In a legacy site, striving for a low PUE may not be possible or may be excessively costly.
- A poor PUE will dominate the visualisation and could lead to an excessive focus on energy efficiency
[Figure label: PUE]
Explaining the Metrics: PUEr(X)
Based on PUE* and Energy Efficiency Ratings
The PUEr equation:
$$\mathrm{PUEr}(X) = \frac{\mathrm{PUE}_{\mathrm{ref}}(X)}{\mathrm{PUE}_{\mathrm{actual}}}$$
where PUE_ref(X), the reference or intended PUE, is the lowest PUE in Rating X, and PUE_actual is the measured iPUE.
The ratio represents the deviation from the desired operation.
[Table: PUE ranges for the Energy Efficiency Ratings]
* PUE was originally defined by The Green Grid and is now documented in ISO 30134 Part 2.
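A minimal sketch of the PUEr arithmetic, expressed as a percentage. The function name `pue_ratio` and the example values are assumptions for illustration. The reference PUE for a given rating must be taken from the Energy Efficiency Ratings table, which is not reproduced here.

```python
def pue_ratio(pue_ref: float, pue_actual: float) -> float:
    """Return PUEr in percent: reference PUE divided by measured PUE."""
    # PUE is total facility energy over IT energy, so it cannot be below 1.0.
    if pue_ref < 1.0 or pue_actual < 1.0:
        raise ValueError("PUE values cannot be below 1.0")
    return 100.0 * pue_ref / pue_actual


# Hypothetical values: reference PUE of 1.5 for the chosen rating, measured PUE of 2.0.
example = pue_ratio(pue_ref=1.5, pue_actual=2.0)
print(f"PUEr = {example:.0f}%")
```

A measured PUE equal to the reference gives 100%. The further the measured value rises above the reference, the lower the ratio.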
Explaining the Metrics: PUEr(X)
Plotting PUE relative to the target agreed by the business shows whether you are meeting the agreed performance without skewing the metric.
[Figure: PUE axis scaled from 60% to 100%]
The Metrics: IT Thermal Conformance
- How safe is it to house IT during normal operation, now and in the future?
- IT Thermal Conformance is the % of IT load operating within the ASHRAE recommended range (shown green in the Temperature Compliance figure: < 27 °C, to avoid increased risk from overheating)
- Temperatures can be measured using discrete sensors or IT-reported inlet temperatures
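The conformance calculation can be sketched as a load-weighted percentage. This is an illustrative reading, not an official implementation: the function name, the `(load, temperature)` input format, and the sample readings are assumptions. It checks only the 27 °C upper limit quoted above and ignores any lower limit.

```python
RECOMMENDED_MAX_C = 27.0  # upper limit of the recommended range as quoted above


def thermal_conformance(readings, max_inlet_c=RECOMMENDED_MAX_C):
    """Percent of IT load whose inlet temperature is below max_inlet_c.

    readings: iterable of (it_load_kw, inlet_temp_c) pairs, one per rack or device,
    taken during normal operation.
    """
    readings = list(readings)
    total_load = sum(load for load, _ in readings)
    if total_load <= 0:
        raise ValueError("total IT load must be positive")
    conforming_load = sum(load for load, temp in readings if temp < max_inlet_c)
    return 100.0 * conforming_load / total_load


# Hypothetical readings: (IT load in kW, inlet temperature in deg C).
sample = [(4.0, 22.0), (3.0, 25.5), (2.0, 28.0), (1.0, 24.0)]
conformance = thermal_conformance(sample)
print(f"IT Thermal Conformance = {conformance:.0f}%")
```

Weighting by load rather than counting racks means a hot, heavily loaded rack lowers the score more than a hot, lightly loaded one.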
The Metrics: IT Thermal Resilience
- How safe is it to house IT during a designed-for failure scenario or maintenance operation?
- IT Thermal Resilience is the % of IT load operating within the ASHRAE allowable range (< 32 °C, not red in the Temperature Compliance figure) when redundant cooling units are off-line
- Temperatures can be measured using discrete sensors or IT-reported inlet temperatures when redundant units are off-line for maintenance, or simulated otherwise
- The Green Grid does not recommend turning off cooling units with the sole intent of calculating resilience
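A sketch of the resilience calculation under the same load-weighted reading, which also lists the equipment at risk. The function name, the dictionary input format, and the rack names and readings are hypothetical. The temperatures are assumed to come from a maintenance window or a simulation with redundant cooling units off-line.

```python
ALLOWABLE_MAX_C = 32.0  # upper limit of the allowable range as quoted above


def thermal_resilience(readings, max_inlet_c=ALLOWABLE_MAX_C):
    """Return (percent of IT load below max_inlet_c, sorted names at risk).

    readings: dict mapping a rack or device name to (it_load_kw, inlet_temp_c),
    captured or simulated with redundant cooling units off-line.
    """
    total_load = sum(load for load, _ in readings.values())
    if total_load <= 0:
        raise ValueError("total IT load must be positive")
    safe_load = sum(load for load, temp in readings.values() if temp < max_inlet_c)
    at_risk = sorted(name for name, (_, temp) in readings.items() if temp >= max_inlet_c)
    return 100.0 * safe_load / total_load, at_risk


# Hypothetical failure-scenario readings: name -> (IT load in kW, inlet temp in deg C).
scenario = {
    "rack-A1": (5.0, 26.0),
    "rack-A2": (5.0, 33.5),
    "rack-B1": (6.0, 30.0),
    "rack-B2": (4.0, 31.0),
}
resilience, at_risk = thermal_resilience(scenario)
print(f"IT Thermal Resilience = {resilience:.0f}%, at risk: {at_risk}")
```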
Planning the Future
The three metrics can also be used to evaluate future plans using simulation for:
- New deployments
- Projected full load performance
- Designed-for failure or maintenance scenarios
Safe capacity for the business is the lower of IT Thermal Resilience and IT Thermal Conformance, with the simulation model loaded with additional IT to match the design load.
[Figure legend: Target Range, Current Load, Future Load]
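The safe-capacity rule above amounts to taking the minimum of the two thermal metrics from a simulation at design load. In the sketch below the function name and all numbers are hypothetical, and converting the percentage into kW is an added assumption for illustration rather than part of the definition.

```python
def safe_capacity_pct(conformance_pct: float, resilience_pct: float) -> float:
    """Safe capacity as the lower of the two thermal metrics at design load."""
    return min(conformance_pct, resilience_pct)


# Hypothetical simulation results with the model loaded to design load.
conformance_at_design = 90.0  # percent
resilience_at_design = 85.0   # percent
design_load_kw = 1000.0       # hypothetical design IT load

safe_pct = safe_capacity_pct(conformance_at_design, resilience_at_design)
# Assumption: expressing the percentage as an IT load in kW.
safe_kw = design_load_kw * safe_pct / 100.0
print(f"Safe capacity: {safe_pct:.0f}% of design load ({safe_kw:.0f} kW)")
```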
Looking into the Future
- Using simulation, the PI allows you to look at where your DC is heading and the impact of changes on all 3 metrics.
- Higher loads often benefit PUE, but is the commonly negative impact on cooling effectiveness acceptable?
DC Capacity
- Fill your DC to full design load in simulation to show how much capacity you can safely house.
- This DC will only reach 80% of IT capacity before IT is at risk during a failure/maintenance scenario.
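One way to read a capacity figure off a set of simulations is to step the load up and stop at the first step where resilience drops below the requirement. This is a sketch under assumptions: the function name, the stopping rule, the 100% requirement, and the simulated results are all hypothetical.

```python
def max_safe_load(results, required_resilience_pct=100.0):
    """Return the highest load step that meets the requirement, or None.

    results: iterable of (load_pct_of_design, resilience_pct) from simulation runs.
    Load steps are checked in ascending order and the search stops at the first
    step that falls short, so a later passing step is not counted as safe.
    """
    safe = None
    for load_pct, resilience_pct in sorted(results):
        if resilience_pct < required_resilience_pct:
            break
        safe = load_pct
    return safe


# Hypothetical simulated failure-scenario results: (load % of design, resilience %).
simulated = [(50, 100.0), (60, 100.0), (70, 100.0), (80, 100.0), (90, 96.0), (100, 85.0)]
limit = max_safe_load(simulated)
print(f"Highest safe load step: {limit}% of design IT load")
```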
Levels of Assessment
- PI can be assessed using measurement of inlet temperature for the installed configuration
- Future planning can take advantage of PI if simulation is available

Level of Assessment | Current State | Future States
Level 1 | Rack Level (Measured) | N/A
Level 2 | IT Level (Measured) | N/A
Level 3 | Rack or IT Level (Simulated or Measured) | Rack Level
Level 4 | IT Level (Simulated or Measured) | IT Level
Case Study PI
This site has qualified and experienced FM and IT teams, the latest DCIM toolset, and best practices to maintain a well-run facility.
- 2x data halls @ 10,000 ft² each
- Commissioned in 2008
- Tier 4 mission critical
- 400 cabinets with 6,000 assets per hall
- 45% of design IT load
- Live power and temperature monitoring
- Asset management (nlyte) software
Case Study PI
The assessment highlighted:
- IT Thermal Conformance of 87%
- IT Thermal Resilience of 97%
- PUEr(D) of 70%
Future State: Fill to Design Load
- Design load: 5 kW
- Operational load: 3.7 kW
Case Study PI: The Future
When the facility was simulated with full design IT load, the PUE rating improved dramatically as the cooling and power infrastructure became significantly more efficient. However, IT Thermal Resilience suffered the most, with 15% of IT operating outside the ASHRAE allowable temperature range.
Assessment Improvement: Rack Delivery Issues
The assessment found multiple internal rack issues. This example was caused by IT equipment with inlets and exhausts on the same side.
Assessment Improvement: Cabinet Configuration
Bespoke ducts/blanking were proposed to deliver cool air directly to the inlets.
Assessment Improvement: Air Delivery Issues
- Low flow rates through the perforated tiles near the edges of the room are caused by the high speed of the air leaving the CRAC units. Air bypasses the first tiles and even drags room air back into the void.
- Underfloor baffles and modifications to the perforated tile configuration were recommended.
[Figure: plane of airflow velocity]
Floor Grille Choice: Improvements
Installing a different floor grille in the edge locations improved conditions for the IT.
[Figure labels: Old Grille, New Grille]
Case Study PI
After the recommendations were implemented in the live facility, the improvements to the Performance Indicator could be demonstrated.
Summarising the Performance Indicator
The Performance Indicator (PI) may be used to:
1. Visualize the balance between risk and efficiency
2. Assess a facility's performance in relation to the company's target range
3. Track a facility's progress over time, as and when changes to the facility and IT are implemented
With simulation in addition to measurement:
4. Assess performance effects of changes before actual implementations, from IT deployments to the installation of containment
5. Compare alternative configuration options
[Figure legend: Target Range, Current Load, Future Load]
Thank you www.thegreengrid.org 40