Managing Hardware Power Saving Modes for High Performance Computing

Managing Hardware Power Saving Modes for High Performance Computing
Second International Green Computing Conference 2011, Orlando
Timo Minartz, Michael Knobloch, Thomas Ludwig, Bernd Mohr
timo.minartz@informatik.uni-hamburg.de
Scientific Computing, Department of Informatics, University of Hamburg
26 July 2011

Motivation
- High Performance Computing: increasing performance and efficiency of calculation units
- But: the increasing need for computing power increases the size of installations
- High operational costs for large-scale high performance installations
- The high carbon footprint of these installations should be reduced for environmental and political reasons

Outline
1 Introduction
2 Hardware Power Saving Modes in HPC
3 Power Mode Control System
4 Evaluation
5 Conclusion and Future Work

Introduction (1)
- High Performance Computing (HPC) is an important tool in the natural sciences to analyze scientific questions in silico
- Modeling and simulation instead of performing time-consuming and error-prone experiments
- Models range from weather systems to protein folding to nanotechnology
- Leads to new observations and understanding of phenomena

Introduction (2)
Top500 list (http://www.top500.org)
- Since 1993, the Top500 list has gathered information about the achieved performance of supercomputers
- Exponential growth in computing performance can be observed
- Current (June 2011) rank #1: K computer by Fujitsu
  - Peak performance: 8773.63 TFlop/s
  - Power consumption: 9898.56 kW
Green500 list (http://www.green500.org)
- Ranking based on energy efficiency, defined as Flop/s per Watt
- K computer rank (June 2011): #6
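As a back-of-the-envelope check of the Green500 metric, dividing the Top500 peak-performance and power figures quoted above gives (note that the official Green500 number is based on sustained LINPACK performance rather than peak, and is therefore somewhat lower):

\[ \frac{8773.63 \times 10^{12}\ \text{Flop/s}}{9898.56 \times 10^{3}\ \text{W}} \approx 886\ \text{MFlop/s per Watt} \]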

Exascale computing
- Roadmap sets a "practical power limit" of 20 MW (U.S. Department of Energy)
- Energy efficiency must be increased from different viewpoints:
  - The data center itself, including cooling facilities etc.
  - The hardware running the scientific applications
  - The scientific applications themselves

Hardware power saving modes
- Adaptation of mechanisms from mobile devices: idle / power saving modes of the hardware
  - Dynamic Voltage and Frequency Scaling (DVFS) of processors (sketch below)
  - Adapting the frequency and disabling ports of network devices
  - Spinning down hard disks and flushing their caches
- HPC-related problems:
  - Synchronization problems
  - Increased OS jitter
  - Possible performance loss
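As a concrete illustration of processor DVFS from userspace, the following minimal sketch writes a target frequency through the Linux cpufreq sysfs interface. It assumes root privileges and the "userspace" governor being active, and keeps error handling minimal:

```c
#include <stdio.h>

/* Minimal sketch: pin a CPU core to a given frequency (in kHz) via the
 * Linux cpufreq sysfs interface. Requires the "userspace" governor and
 * root privileges. */
static int set_cpu_khz(unsigned int cpu, unsigned long khz)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed", cpu);
    f = fopen(path, "w");
    if (!f)
        return -1;   /* no cpufreq support, wrong governor, or no permission */
    fprintf(f, "%lu\n", khz);
    fclose(f);
    return 0;
}

int main(void)
{
    /* e.g. drop core 0 to 1.6 GHz during a communication phase */
    return set_cpu_khz(0, 1600000) ? 1 : 0;
}
```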

Idle power saving potential
[Figure: idle power draw (Watt) of AMD Opteron and Intel Xeon nodes for the configurations CPU Max Freq, CPU Min Freq, CPU Min Freq + NIC 100 Mbit, and CPU Min Freq + NIC 100 Mbit + disk sleep]
- Opteron: up to 11 % power savings
- Xeon: up to 18 % power savings

CPU power saving potential for a Xeon node
[Figure: power draw (Watt) under load and idle across CPU frequencies from 1600 MHz to 2801 MHz]
- 30 % power savings
- Interesting in phases of busy-waiting or memory-bound execution

When to switch which component...
Performance/usage prediction:
- Utilization-based approach: Instructions Per Second (IPS) from performance counters (sketch below), or other counters such as memory bandwidth
- Knowledge about future hardware use, provided by the application developer, the compiler, or libraries
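On Linux, one way to obtain the IPS signal for the utilization-based approach is the perf_event interface. The following minimal sketch (an illustration, not the authors' implementation) counts the retired instructions of the calling process over a one-second interval:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/* Minimal sketch: estimate instructions per second for the calling
 * process using the Linux perf_event interface (glibc provides no
 * wrapper, so the raw syscall is used). */
int main(void)
{
    struct perf_event_attr attr;
    uint64_t count;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;

    /* count instructions of this process on any CPU */
    fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    sleep(1);                                 /* sampling interval */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    read(fd, &count, sizeof(count));
    printf("~%llu instructions/s\n", (unsigned long long)count);
    close(fd);
    return 0;
}
```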

Daemon architecture
[Figure: architecture of the power mode control daemon]

Daemon design
- Server daemon: makes the decisions about hardware power states
  - Processor: reduce frequency (P-states) using cpufreq
  - Network card: reduce speed / switch duplex mode using ethtool
  - Hard disk: reduce OS access and enforce standby modes using hdparm
- Client library: linked to the (MPI) application, forwards the desired device power state to the server via sockets (hypothetical sketch below)
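The slides do not specify the wire protocol between the client library and the server daemon. As a purely hypothetical sketch, a client could forward a desired device power state over a loopback TCP socket as follows; the port number and the "device:state" message format are invented for illustration:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical client-side sketch: forward a desired power state for a
 * device to the local server daemon. The port number and the
 * "device:state" text format are illustrative assumptions. */
static int request_power_state(const char *device, const char *state)
{
    struct sockaddr_in addr;
    char msg[64];
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(7777);                  /* assumed daemon port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    snprintf(msg, sizeof(msg), "%s:%s\n", device, state);
    write(fd, msg, strlen(msg));
    close(fd);
    return 0;
}

int main(void)
{
    /* e.g. before a communication phase: slow the CPU, keep the NIC fast */
    request_power_state("cpu0", "min_freq");
    request_power_state("eth0", "full_speed");
    return 0;
}
```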

Test MPI applications
- partdiff-par: PDE solver with computation-intensive, communication-intensive, and I/O phases
- PEPC: Pretty Efficient Parallel Coulomb solver with computation-intensive and communication-intensive phases

Hardware details
- 3 LMG450 power meters: 4 channels each, up to 20 samples per second
- 5 AMD Opteron 6168 nodes: dual socket, 24 cores per node
- 5 Intel Xeon X5560 nodes: dual socket, 8 cores per node, SMT disabled

Experimental setup
Application test setups:
- Instrumented: state switching dependent on the application phase (usage sketch below)
- CPU Max Freq: processor frequency set to maximum
- CPU Min Freq: processor frequency set to minimum
- CPU Ondemand: processor governor set to ondemand
Results reported as:
- Time-to-Solution (TTS): total time for the test setup
- Energy-to-Solution (ETS): total energy for the test setup
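To make the "Instrumented" setup concrete, the following hypothetical sketch shows how an MPI application could annotate its phases. The phase_begin/phase_end functions and phase names are invented stand-ins for the client library's API; here they only print, where the real library would forward the state request to the daemon:

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical sketch of a phase-instrumented MPI application. The
 * phase API below is invented for illustration. */
enum phase { PHASE_COMPUTE, PHASE_COMM };

static void phase_begin(enum phase p) { printf("phase %d begins\n", p); }
static void phase_end(enum phase p)   { printf("phase %d ends\n", p); }

int main(int argc, char **argv)
{
    double local = 0.0, global;
    MPI_Init(&argc, &argv);

    phase_begin(PHASE_COMPUTE);          /* daemon: CPU at maximum frequency */
    for (long i = 0; i < 100000000L; i++)
        local += 1e-9;                   /* stand-in for the solver kernel */
    phase_end(PHASE_COMPUTE);

    phase_begin(PHASE_COMM);             /* daemon: CPU may run at minimum frequency */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    phase_end(PHASE_COMM);

    MPI_Finalize();
    return 0;
}
```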

TTS and ETS for partdiff-par on Xeon nodes
[Figure: Time-to-Solution (seconds) and Energy-to-Solution (kJ) for the Instrumented, CPU Min Freq, CPU Max Freq, and CPU Ondemand configurations]
- 5 % savings in Energy-to-Solution
- Time-to-Solution increase of about 4 %

TTS and ETS for partdiff-par on Opteron nodes
[Figure: Time-to-Solution (seconds) and Energy-to-Solution (kJ) for the Instrumented, CPU Min Freq, CPU Max Freq, and CPU Ondemand configurations]
- 8 % energy savings
- Runtime increase of 9 %

TTS and ETS for PEPC on Opteron nodes
[Figure: Time-to-Solution (seconds) and Energy-to-Solution (kJ) for the Instrumented, CPU Max Freq, and CPU Min Freq configurations]
- 7 % energy savings at a 4 % runtime increase in one comparison; 4 % energy savings at a 2 % runtime increase in the other

Conclusions
- Power consumption of idle nodes can be reduced by 11 % (Opteron) and 18 % (Xeon), respectively
- Power consumption can be decreased by more than 30 % in phases of unnecessarily high utilization (e.g. busy-waiting)
- Device power states can be controlled from userspace through the introduced hardware management daemon
- For the presented applications, Energy-to-Solution is reduced by up to 8 % with a Time-to-Solution increase of about 9 %

Future work
- Identify energy saving opportunities (Scalasca enhancement)
- Benchmark to measure power consumption in the various states
- Extended measurements with a larger set of applications

CPU power saving potential (C-states enabled)
[Figure: power draw (Watt) of a Xeon node for load and idle, each with C-states enabled and disabled, across CPU frequencies from 1600 MHz to 2801 MHz]