Computer system energy management. Charles Lefurgy

Similar documents
Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI

8205-E6C ENERGY STAR Power and Performance Data Sheet

8233-E8B 3x6-core ENERGY STAR Power and Performance Data Sheet

POWER7+ TM IBM IBM Corporation

Active Management of Timing Guardband to Save Energy in POWER7

POWER7: IBM's Next Generation Server Processor

Systems and Technology Group. IBM Technology and Solutions Jan Janick IBM Vice President Modular Systems and Storage Development

Power and Cooling Innovations in Dell PowerEdge Servers

POWER7: IBM's Next Generation Server Processor

ClearPath Forward Libra 8480 / 8490

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Peeling the Power Onion

ibm IBM EnergyScale for POWER7 Processor-Based Systems March 2013

Power Technology For a Smarter Future

Hardware/Software T e T chniques for for DRAM DRAM Thermal Management

Power Control in Virtualized Data Centers

The End of Redundancy. Alan Wood Sun Microsystems May 8, 2009

Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node

Power your planet. Optimizing the Enterprise Data Center POWER7 Powers a Smarter Infrastructure

Bridging High Performance and Low Power in the era of Big Data and Heterogeneous Computing

Measuring and Managing Power Consump2on

ECE 571 Advanced Microprocessor-Based Design Lecture 24

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

Dell Dynamic Power Mode: An Introduction to Power Limits

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

ECE 172 Digital Systems. Chapter 15 Turbo Boost Technology. Herbert G. Mayer, PSU Status 8/13/2018

Quad-core Press Briefing First Quarter Update

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Post Silicon Electrical Validation

IBM Power Systems HPC Cluster

IBM's POWER5 Micro Processor Design and Methodology

HP Power Capping and HP Dynamic Power Capping for ProLiant servers

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Power 7. Dan Christiani Kyle Wieschowski

A Simple Model for Estimating Power Consumption of a Multicore Server System

Making Performance Understandable: Towards a Standard for Performance Counters on Manycore Architectures

Looking ahead with IBM i. 10+ year roadmap

Dominick Lovicott Enterprise Thermal Engineering. One Dell Way One Dell Way Round Rock, Texas

Intel SSD Data center evolution

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Cisco MCS 7828-I5 Unified Communications Manager Business Edition 5000 Appliance

HPE ProLiant DL580 Gen10 Server

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

HP ProLiant DL385p Gen8 Server

Charles Lefurgy IBM Research, Austin

Z8 G4 Site Prep Guide Table of contents

Intel Select Solutions for Professional Visualization with Advantech Servers & Appliances

Energy Efficient Servers

IBM System p5 570 POWER5+ processor and memory features offer new options

Meet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief

ANNEXES. to the. COMMISSION REGULATION (EU) No /.. of XXX

Towards Energy-Efficient Reactive Thermal Management in Instrumented Datacenters

Dynamic Power Capping TCO and Best Practices White Paper

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

IBM SYSTEM POWER7. PowerVM. Jan Kristian Nielsen Erik Rex IBM Corporation

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

POWER9 Announcement. Martin Bušek IBM Server Solution Sales Specialist

INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE

Towards Power Management for FreeBSD

THERMAL BENCHMARK AND POWER BENCHMARK SOFTWARE

CAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director

AMD Opteron 4200 Series Processor

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS

Towards Energy-Proportional Datacenter Memory with Mobile DRAM

HPE ProLiant DL360 Gen P 16GB-R P408i-a 8SFF 500W PS Performance Server (P06453-B21)

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

RUBIK: FAST ANALYTICAL POWER MANAGEMENT

Increasing Energy Efficiency through Modular Infrastructure

IT Level Power Provisioning Business Continuity and Efficiency at NTT

Data Sheet Fujitsu Server PRIMERGY CX400 M1 Compact and Easy

Performance Analysis in the Real World of Online Services

Sun Fire X4170 M2 Server Frequently Asked Questions

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

Transforming Utility Grid Operations with the Internet of Things

p5 520 server Robust entry system designed for the on demand world Highlights

April Power Systems Roadshow

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

IBM POWER7 Systems Express Blades Quick Reference Guide October 2012

IBM System x3850 M2 servers feature hypervisor capability

HES-7 ASIC Prototyping

IBM _` p5 570 servers

Power Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

Quotations invited. 2. The supplied hardware should have 5 years comprehensive onsite warranty (24 x 7 call logging) from OEM directly.

Best Practices for Setting BIOS Parameters for Performance

Low-Power Processor Solutions for Always-on Devices

Jae Wook Lee. SIC R&D Lab. LG Electronics

IBM System i Model 515 offers new levels of price performance

An Approach for Adaptive DRAM Temperature and Power Management

Power Management in Cisco UCS

What Computer Architects Need to Know About Memory Throttling

A task migration algorithm for power management on heterogeneous multicore Manman Peng1, a, Wen Luo1, b

Power Systems with POWER8 Scale-out Technical Sales Skills V1

ECE 486/586. Computer Architecture. Lecture # 2

Open Innovation with Power8

IBM POWER7 Systems Express Blades Quick Reference Guide April 2012

Understanding Reduced-Voltage Operation in Modern DRAM Devices

Cisco UCS C200 M2 High-Density Rack-Mount Server

Optimizing Apache Spark with Memory1. July Page 1 of 14

Power Aware Architecture Design for Multicore SoCs

Transcription:

28 July 2011 Computer system energy management Charles Lefurgy

Outline A short history of server power management POWER7 EnergyScale AMESTER power measurement tool Challenges ahead 2

A brief history of server power management in IBM 2005: Power measurement in servers. Industry first 2006: IBM PDU+ (power cord-based power measurement) 2007: Power capping in servers. Industry first 2008: POWER6 with DVFS dynamic DRAM power management 2010: POWER7 uses DDR3 self-refresh mode POWER7 with Turbo mode Partition-aware power capping 2011: Partition-aware DVFS Measure Improve reliability Save energy Improve performance Support virtualization 3

POWER7 energy management Source: M. Floyd, B. Brock, M. Ware, K. Rajamani, A. Drake, C. Lefurgy and L. Pesantez, Harnessing the Adaptive Energy Management Features of the POWER7 chip, Hot Chips 22, 2010. 4

IBM POWER 750 Express Expansion card slots 4 Processor cards 4 redundant removable fans 2 redundant power supplies Tape drive bay airflow Processor card POWER7 processor 8 removable disks DIMM buffers DDR3 DIMM slots 5 Connector

Address variability in hardware and operating environment Complex environment Installed component count, ambient temperature, component variability, etc. How to guarantee power management constraints across all possibilities? Feedback-driven control Capability to adapt to environment, workload, varying user requirements Regulate to desired constraints even with imperfect information Models Estimate unmeasured quantities Predict impact of actuators Power management controller Guarantee constraints Find energy-efficiency settings Sense Real-time monitoring Power, temperature, performance, Actuate Set performance state (e.g. frequency, voltage) Set low-power modes (e.g. DRAM power-down) Set fan speeds 6

EnergyScale Cooperative hardware and software solution for power management EnergyScale firmware runs on dedicated microcontroller DVFS, power capping, fan control, etc. POWER7 microprocessor has hardware accelerators for power management Sensor gathering, thermal sensor conversion, power proxy calculation, etc. Goals Increase performance Reduce power at same performance level IBM Systems Director Policy management Power & performance measurement Set policy Sense POWER7 System EnergyScale Microcontroller Sense Sense Sense & Control Sense Control Memory Power Supply POWER7 Processor Control I/O Devices 7

POWER7 sensors POWER7 microarchitecture activity & event counters Processor core, memory hierarchy, and main memory access Provide performance, utilization, and activity measurements Used to direct power/performance tradeoff decisions & techniques POWER7 Digital Thermal Sensors 44 on-chip sense points POWER7 Critical Path Monitor Detects circuit timing margin System sensors Fan speed Power by voltage domain Temperature by component Physical Locations of Thermal Sensors Core 0 MemCtrl 0 Off-chip Interconnect Core 1 Core 2 Core 3 SMP Coherence Fabric MemCtrl 1 Core 4 Core 5 Core 6 Core 7 Off-Module Interconnect 8

Performance control Per-core frequency control Digital PLL (DPLL) clock source supports -50% to +10% of nominal frequency 25Mhz resolution Automated fast frequency slew in excess of 50Mhz per us Supports energy optimization in partitioned system configurations Relative power (8 cores) 1.2 1 0.8 0.6 0.4 Benefits of Per-Core Frequency Scaling 30% reduction for run-frequency scaling Each partition can run under different energy-savings policy Less-utilized partitions can run at lower frequencies Heavily utilized partitions maintain peak performance EnergyScale Dynamic Power Savings algorithm looks for workload slack in time and across cores 0.2 0 Increasing Core Voltage 8 Cores Run Fmax @ V 20% reduction for nap-frequency scaling 1 Core Runs Fmax; 7 Cores Nap Fmax 1 Core Runs Fmax, 7 Cores Run Fmin 1 Core Runs Fmax, 7 Cores Nap Fmin Note: highest frequency core determines the required voltage 9

Dynamic power savings SPECPower_ssj2008 running on IBM Power 750 Express system** SPS (Static Power Save): fixed, low-power operating point = improved score almost 25% DPS (Dynamic Power Savings): DVFS with Turbo mode = improved score almost 50% Source: Heather Hanson, IBM research 10 * Results shown on our prototype system, should not be construed as committed capability for a shipping IBM Server. * SPEC and the benchmark name SPECpower_ssj are trademarks of the Standard Performance Evaluation Corporation

Power capping controller Caps when redundant power supply fails or customer sets power cap target Every 8 ms, measure system power and adjust processor voltage and frequency to meet power cap Partition-aware: take down frequency of Dynamic Power Saving partitions first. Precision measurement desired Measurement error translates to lost performance More guardband in power cap target Opportunity for improvement On-line modeling Relationship of frequency to power changes over time Power (W) Frequency (GHz) 1500 1400 1300 1200 1100 1000 900 4000 3800 3600 3400 3200 3000 System power Set cap to 1kW Settles in 16 ms 1000 1100 1200 1300 1400 Time (ms) Processor frequency 1000 1100 1200 1300 1400 Small undershoot Lefurgy et Time al., ICAC07 (ms) 11

Summary POWER7 builds upon initial POWER6 EnergyScale features by including automated on-chip functions and accelerators to assist the off-chip microcontroller firmware POWER7 energy management features combined with new energy-saving algorithms show a 50% improvement in SPECpower score over baseline operation Customers can select the best EnergyScale policy to match their needs, relying on the system to balance power consumption and performance accordingly 12 * Results shown on our prototype system, should not be construed as committed capability for a shipping IBM Server. * SPEC and the benchmark name SPECpower_ssj are trademarks of the Standard Performance 2011 Evaluation IBM Corporation Corporation

Measuring power in POWER7 13

AMESTER: Automated Measurement of Systems for Energy and Temperature Reporting A research tool for detailed monitoring and control of power consumption on a single IBM server Non-intrusive, remote measurement Interacts with server firmware Scriptable for rapid prototyping Related product: IBM Systems Director Active Energy Manager Monitors entire data centers 14

How it works AMESTER runs on a laptop or server (Windows/Linux) Connects to remote system to measure EnergyScale microcontroller Firmware for power management Implements AMESTER command set Out-of-band data collection (no OS support required) POWER7 system AMESTER Internet EnergyScale Performance Temperature P7 Service Processor Power measuring circuits 15

Basic functions included in AMESTER Sensor data collection Whole system power measurement Component power on POWER: CPU, DIMMs, fan, etc. CPU temperature, CPU frequency, CPU utilization, voltage, instructions per second, etc. Histograms High-resolution tracing 1ms for sensors Power capping Scripting Tcl command line Job management library to run workloads remotely Data library to collect and graph user data 16

Insights Visualization is key to rapid prototyping and problem solving Understand how power capping controller reacts to workload changes Correlation of power with other metrics Study DVFS algorithm High-resolution collection of core utilization, clock frequency, and performance Example debugging: small blips in an otherwise steady-state behavior 17

Available for academic collaborations Current collaborations National Center for Supercomputing Applications Barcelona Supercomputing Center Forschungszentrum Jülich Contact Charles Lefurgy (lefurgy@us.ibm.com) 18

Challenges 19

May you live in interesting times Power management is a first class design consideration (from circuits to full systems) Dark silicon power limitations predicted to severely impact multi-core chip performance H. Esmaeilzadeh,E. Blem, R. St Amant, K. Sankaralingam, and D. Burger, Dark Silicon and the End of Multicore Scaling, ISCA 2011. W. Huang, K. Rajamani, M. R. Stan, and K. Skadron. "Scaling with Design Constraints Predicting the Future of Big Chips." IEEE Micro special issue on Big Chips, July/Aug. 2011. Power and thermal management becomes more important for future technology nodes Opportunities: Virtualization of power management (give end-user greater role) Guardband reduction (save energy, increase performance) Combining power measurement with other measurements for insights (save energy) On-line modeling to improve all of the above 20

Measuring virtual machine power Clouds are managed on a virtual machine partition basis Traditional platform power management is insufficient Measurement: Today: Power measured at the power supply How to bill each VM user for energy cost? Provide server owner with energy meter for each VM Billing Provisioning Insight POWER7 server AIX Linux VM 1 VM 2 Hypervisor Use modeling to cover gaps in power measurement Core-level power models based on power proxies Allocate energy according to VM-to-hardware mapping Per-socket measurements provide ability to learn and correct on-line models Unanswered questions Fairness: How to pay for fan power? Splitting the bill is not fair for low-power VM. Fairness: If a VM on core A, heats up core B, should B pay for the extra leakage power? Infrastructure: How to charge for remote device power (network storage)? 21

Processor Core Power Proxy Estimate per-core chiplet active power For each functional unit, pick subset of activities Weight each activity counter relative to power it consumes Sum weighted counter, clock grid power, and constant offset Chiplet Active Power = (Wi * Ai ) + K*f + C +/-10% error for 90% of samples ISU L 2 L 3 NCU CORE IFU Core Activity D F U FXU LSU VSU & FPU 5 events 4 events Power Proxy = Activity Sense point Processor Core Chiplet 22

Guardband reduction The voltage used on a microprocessor is conservative to provide a safety cushion in case workload spikes causing noise or voltage droops Microprocessor operating points failure Guardband is the difference between the operating voltage and the voltage at which the microprocessor fails Concern: Energy-efficiency is reduced to guarantee reliability. Frequency Voltage guardband Selected for reliability 23

Timing margin sensor Direct measurement of remaining timing margin with Critical Path Monitor (CPM) ISU CORE D F U FXU VSU & FPU CPM output IFU L 2 LSU 5 CPM 5-bit output per CPM 11111 = large margin 11110 = some margin 11100 = ideal margin 11000 = margin too small 10000 = not enough margin NCU L 3 POWER7 core chiplet 5 DPLL 24

Undervolting solution Protect Protect Optimize Optimize Timing Timing margin margin controller controller responds responds to to changing changing operating operating conditions conditions by by setting setting highest, highest, safe safe frequency frequency that that avoids avoids timing timing failures. failures. Performance Performance controller controller adjusts adjusts voltage voltage to to meet meet desired desired clock clock frequency frequency target. target. ISU CORE D F U FXU VSU & FPU Workload, temperature, voltage, and frequency influence CPM output Clock frequency target IFU LSU L 2 5 CPM 5-bit output per CPM CPM +/- freq Voltage Adjust regulator voltage - 25 NCU L 3 POWER7 core chiplet 5 DPLL DPLL POWER7 Real frequency Microcontroller

Undervolting results Prototype POWER 750 Express server Run SPEC CPU 2006 workloads 20% power reduction of POWER7 processor + memory buffer 18% power reduction in system power Fan power is reduced by 50% No change in performance Normalized System DC Power Fan Processor + memory buffer DIMM Other 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% 100% (product) 50% 0% Timing Guardband 26

Guardband reduction to save energy in POWER7 prototype server Click box to play video Undervolting Video 27

Using power measurement with other metrics TAPO: Thermal-aware power optimization Minimize total of server fan power and microprocessor leakage power Reduces server power 5% at peak Turbo performance in P7 server prototype No performance loss N ormalized System Power 1.05 1.00 0.95 0.90 0.85 0.80 Frequency Frequency drop drop from from overheating overheating Optimize Optimize default default thermal thermal setting setting 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 Fan speed (RPM) 5.4% 5.4% power power reduction reduction Source: W. Huang, M. Allen-Ware, J. Carter, E. Elnozahy, H. Hamann, T. Keller, C. Lefurgy, J. Li, K. Rajamani, and J. Rubio, TAPO: Thermal-Aware Power Optimization Techniques for Servers and Data Centers, IGCC, 2011. 28

Summary Server power management has made significant progress in just a few years Extending for virtual machines Extending for reliability Power measurement is the basis for server power management Many improvements yet to come Wider coverage of components Shorter timescale measurements and correlation to other metrics Self-tuning on-line modeling Measurement improvements lead to guardband reduction Improve performance, save energy, lower cost, improve reliability 29

End 30