Managing Hardware Power Saving Modes for High Performance Computing
Second International Green Computing Conference 2011, Orlando
Timo Minartz, Michael Knobloch, Thomas Ludwig, Bernd Mohr
timo.minartz@informatik.uni-hamburg.de
Scientific Computing, Department of Informatics, University of Hamburg
26-07-2011
Motivation
High Performance Computing
Performance and efficiency of calculation units keep increasing
But: the growing need for computing power increases the size of installations
High operational costs for large-scale high performance installations
The high carbon footprint of installations should be reduced for environmental and political reasons
1 Introduction
2 Hardware Power Saving Modes in HPC
3 Power Mode Control System
4 Evaluation
5 Conclusion and Future Work
Introduction (1)
High Performance Computing (HPC)
Important tool in the natural sciences to analyze scientific questions in silico
Modeling and simulation instead of performing time-consuming and error-prone experiments
Models range from weather systems to protein folding to nanotechnology
Leads to new observations and understanding of phenomena
Introduction (2)
Top500 list (http://www.top500.org)
Since 1993 the Top500 list gathers information about the achieved performance of supercomputers
Exponential growth in computing performance can be observed
Current (June 2011) rank #1: K computer by Fujitsu
Peak performance: 8773.63 TFlop/s
Power consumption: 9898.56 kW
Green500 list (http://www.green500.org)
Ranking based on energy-efficiency
Energy-efficiency is defined as Flop/s per Watt
K computer rank (June 2011): #6
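The Flop/s-per-Watt metric can be checked with a quick back-of-the-envelope calculation. This is a sketch using the peak numbers from this slide; note that the actual Green500 ranking is based on sustained Linpack performance, not the peak value used here.

```python
# Energy-efficiency as defined by the Green500: Flop/s per Watt.
# K computer figures taken from the slide (June 2011 Top500 list).
peak_flops = 8773.63e12   # 8773.63 TFlop/s
power_watt = 9898.56e3    # 9898.56 kW

efficiency = peak_flops / power_watt   # Flop/s per Watt
print(f"{efficiency / 1e6:.1f} MFlop/s per Watt")
```

Even with peak numbers this lands well below 1 GFlop/s per Watt, which illustrates why a 20 MW exascale machine needs a large efficiency improvement.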
Exascale computing
Roadmap
"Practical power limit" of 20 MW (U.S. Department of Energy)
Energy-efficiency must be increased from different viewpoints:
The data center itself, including cooling facilities etc.
The hardware running the scientific applications
The scientific applications themselves
Hardware power saving modes
Adaptation of mechanisms from mobile devices
Idle / power saving modes of hardware
Dynamic Voltage and Frequency Scaling (DVFS) of processors
Adapt frequency and disable ports of network devices
Spin down and flush caches of hard disks
HPC-related problems
Synchronization problems
Increased OS jitter
Possible performance loss
Idle power saving potential
[Chart: idle power consumption in Watt of an AMD Opteron node and an Intel Xeon node for the configurations CPU Max Freq, CPU Min Freq, CPU Min Freq + NIC 100 Mbit, and CPU Min Freq + NIC 100 Mbit + Disk sleep]
Opteron: up to 11 % power savings
Xeon: up to 18 % power savings
CPU power saving potential for a Xeon node
[Chart: power consumption in Watt under load and idle for processor frequencies from 1600 to 2801 MHz]
30 % power savings
Interesting in busy-waiting or memory-bound phases
When to switch which component...
Performance/usage prediction
Utilization-based approach
Instructions Per Second (IPS)
Other performance counters (e.g. memory bandwidth)
Knowledge about future hardware use
Application developer
Compiler
Libraries
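The utilization-based approach above can be sketched in a few lines: estimate IPS from two successive hardware-counter readings and pick a processor frequency accordingly. The IPS threshold and the example frequencies are illustrative assumptions, not values from the paper.

```python
# Sketch of a utilization-based frequency decision from an IPS estimate.
# low_ips and the frequency table are hypothetical tuning parameters.

def choose_frequency(instr_delta: int, seconds: float,
                     low_ips: float = 1e8,
                     freqs=(1_600_000, 2_800_000)) -> int:
    """Return a target frequency in kHz from an IPS estimate.

    A low IPS (e.g. in busy-waiting or memory-bound phases) suggests
    the core can run slower without hurting time-to-solution.
    """
    ips = instr_delta / seconds
    return freqs[0] if ips < low_ips else freqs[-1]

# Few retired instructions per second -> lowest P-state; many -> highest.
print(choose_frequency(5_000_000, 1.0))
print(choose_frequency(3_000_000_000, 1.0))
```

A real implementation would read the retired-instruction counter periodically (e.g. via perf events) and could combine it with memory-bandwidth counters as the slide suggests.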
Daemon architecture
[Figure: daemon architecture]
Daemon design
Server daemon
Makes decisions about hardware power states
Processor: reduce frequency (P-states) using cpufreq
Network card: reduce speed / switch duplex mode using ethtool
Hard disk: reduce OS access and enforce standby modes using hdparm
Client library
Linked to the (MPI) application
Forwards the desired device power state via sockets to the server
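A minimal sketch of how the server daemon could map a requested device power state onto the userspace tools named on this slide. The commands are only assembled here, not executed; the state names, frequencies, `eth0`, and `/dev/sda` are illustrative placeholders, not taken from the paper.

```python
# Map (device, state) -> command line for cpufreq-set, ethtool, hdparm.
# A real daemon would run these via subprocess with root privileges.

def power_command(device: str, state: str) -> list[str]:
    if device == "cpu":
        # P-state change via the cpufreq interface (cpufrequtils)
        khz = {"min": "1600000", "max": "2801000"}[state]
        return ["cpufreq-set", "-f", khz]
    if device == "nic":
        # reduce link speed / set duplex mode via ethtool
        mbit = {"slow": "100", "fast": "1000"}[state]
        return ["ethtool", "-s", "eth0", "speed", mbit, "duplex", "full"]
    if device == "disk":
        # put the disk into standby via hdparm (state is implicit here)
        return ["hdparm", "-y", "/dev/sda"]
    raise ValueError(f"unknown device: {device}")

print(" ".join(power_command("cpu", "min")))
```

The client library would send (device, state) pairs over a socket; the daemon serializes the requests of all clients on a node before touching the hardware.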
Test MPI applications
partdiff-par
PDE solver
Computation-intensive phases, communication-intensive phases and I/O phases
PEPC
Pretty Efficient Parallel Coulomb-solver
Computation-intensive phases and communication-intensive phases
Hardware details
3 LMG 450 power meters (4 channels each, up to 20 samples per second)
5 AMD Opteron 6168 nodes (dual socket, 24 cores per node)
5 Intel Xeon X5560 nodes (dual socket, 8 cores per node, SMT disabled)
Experimental setup
Application test setups
Instrumented: state switching dependent on phase
CPU Max Freq: processor frequency set to maximum
CPU Min Freq: processor frequency set to minimum
CPU Ondemand: processor governor set to ondemand
Results
Time-to-Solution (TTS): total time for a test setup
Energy-to-Solution (ETS): total energy for a test setup
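Energy-to-Solution is obtained by integrating the sampled power draw over the run time. A minimal sketch using the trapezoidal rule, assuming equidistant samples as delivered by the LMG 450 power meter; the sample values below are made up for illustration.

```python
# Integrate power samples (Watt) taken every dt_s seconds -> Joules.

def energy_to_solution(samples_watt, dt_s):
    e = 0.0
    for a, b in zip(samples_watt, samples_watt[1:]):
        e += 0.5 * (a + b) * dt_s   # trapezoid between two samples
    return e

power = [200.0, 220.0, 210.0, 190.0]   # W, one sample every 0.05 s
tts = (len(power) - 1) * 0.05          # Time-to-Solution in seconds
ets = energy_to_solution(power, 0.05)  # Energy-to-Solution in Joules
print(tts, ets)
```

At 20 samples per second (dt = 0.05 s) the trapezoidal approximation is more than accurate enough for run times of hundreds of seconds.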
TTS and ETS for partdiff-par on Xeon nodes
[Chart: Time-to-Solution in seconds and Energy-to-Solution in kJ for the Instrumented, CPU Min Freq, CPU Max Freq and CPU Ondemand setups]
5 % savings in Energy-to-Solution
Time-to-Solution increase of about 4 %
TTS and ETS for partdiff-par on Opteron nodes
[Chart: Time-to-Solution in seconds and Energy-to-Solution in kJ for the Instrumented, CPU Min Freq, CPU Max Freq and CPU Ondemand setups]
8 % energy savings
Runtime increase of 9 %
TTS and ETS for PEPC on Opteron nodes
[Chart: Time-to-Solution in seconds and Energy-to-Solution in kJ for the Instrumented, CPU Max Freq and CPU Min Freq setups]
7 % energy savings, 4 % runtime increase compared to CPU Max Freq
4 % energy savings, 2 % runtime increase
Conclusions
Power consumption of idle nodes can be reduced by 11 % (Opteron) and 18 % (Xeon) respectively
Power consumption can be decreased by more than 30 % in phases of unnecessarily high utilization (e.g. busy-waiting)
Device power states are controlled from userspace by introducing a hardware management daemon
Energy-to-Solution is reduced by up to 8 % with a Time-to-Solution increase of about 9 % for the presented applications
Future work
Identify energy saving possibilities (Scalasca enhancement)
Benchmark to measure power consumption in various states
Extended measurements with a larger set of applications
CPU power saving potential (C-states enabled)
[Chart: power consumption in Watt under load and idle, with C-states on and off, for processor frequencies from 1600 to 2801 MHz]