Impact of Manufacturing Variability in Power Constrained Supercomputing. Koji Inoue. Kyushu University



2 Trends of Supercomputing
Next worldwide target: 1 ExaFLOPS (10^18 floating-point operations per second), about 30X the current 33.9 PetaFLOPS, while system power (MW) can grow only about 1.7X.
We need to improve power efficiency!
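The two ratios imply how much power efficiency (FLOPS per watt) has to improve. A quick check, using only the figures on the slide and treating 1.7X as the allowed growth in system power:

```python
# Figures from the slide; 1.7X is read as the allowed growth in system power.
target_flops = 1e18      # 1 ExaFLOPS
current_flops = 33.9e15  # 33.9 PetaFLOPS
power_growth = 1.7       # allowed growth in system power

perf_growth = target_flops / current_flops    # about 29.5X ("30X" on the slide)
efficiency_gain = perf_growth / power_growth  # required FLOPS/W improvement

print(f"performance growth: {perf_growth:.1f}X")
print(f"required efficiency gain: {efficiency_gain:.1f}X")
```

In other words, performance per watt must improve by roughly 17X.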

3 Overprovisioned Systems
Under-provisioned (conventional)
- HW design: ensures the PEAK system power does NOT exceed the limit.
- SW design: tries to maximize the activity of HW components.
Over-provisioned
- HW design: allows HW to be installed without considering the power limit, and provides power-performance knobs.
- SW design: tunes the knobs to maximize performance for the workload, and ensures the ACTUAL system power does NOT exceed the limit.
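A toy calculation shows why over-provisioning pays off. All node power figures below are hypothetical, not from the slides: conventional sizing must fit every node's peak power under the limit, while an over-provisioned system installs more nodes and relies on software knobs (per-node caps) to keep the actual total under the limit.

```python
# Hypothetical numbers for illustration only.
SYSTEM_LIMIT_W = 100_000  # site power limit
PEAK_NODE_W = 400         # worst-case (peak) node power
CAP_NODE_W = 250          # per-node cap enforced through software knobs


def nodes_under_provisioned(limit_w: int, peak_w: int) -> int:
    """Conventional sizing: the PEAK power of every node must fit the limit."""
    return limit_w // peak_w


def nodes_over_provisioned(limit_w: int, cap_w: int) -> int:
    """Over-provisioned sizing: install more nodes and cap each one."""
    return limit_w // cap_w


def within_limit(node_powers_w: list[int], limit_w: int) -> bool:
    """Software must ensure the ACTUAL total power stays under the limit."""
    return sum(node_powers_w) <= limit_w


n_under = nodes_under_provisioned(SYSTEM_LIMIT_W, PEAK_NODE_W)
n_over = nodes_over_provisioned(SYSTEM_LIMIT_W, CAP_NODE_W)
print(n_under, n_over, within_limit([CAP_NODE_W] * n_over, SYSTEM_LIMIT_W))
```

The catch is the last check: the over-provisioned system is only safe while the caps hold, which is why the SW side has to enforce the actual power.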

4 Experimental Setup
Benchmarks (blue = EP type; red = with communication and synchronization):
- HPC Challenge: star DGEMM, star STREAM (Triad)
- NPB: BT, SP, EP
- Magneto-Hydro-Dynamics (MHD) simulation: a typical stencil application that simulates space plasma; calculation and communication phases alternate
- Fiber benchmark suite: mvmc-mini (mvmc), a variational Monte Carlo simulation for strongly correlated electron systems
Systems (processor; power interface):
- Cab (LLNL): Intel E… Sandy Bridge, 1,…; RAPL
- BG/Q Vulcan (LLNL): IBM PowerPC A2, 24,… (compute); EMON
- Teller (SNL): AMD A… K Piledriver; PI
- HA8K (Kyushu Univ.): Intel E5-2697v2 Ivy Bridge; RAPL

5 Terminology
CPU: processor chip (including cores, cache, memory controller (MC), etc.)
Module: a pair of a CPU and the DRAMs directly connected to it
[Figure: two modules, each a CPU (cores, cache, MC) connected to its own memory modules.]

6 Impact on CPU Frequency
[Figure (star DGEMM): left, module (CPU+DRAM) power [W], split into CPU and DRAM power, across module IDs, with a 30% spread marked; right, CPU power [W] versus CPU clock frequency [GHz], with no power constraint and with a uniform CPU power cap.]
Power variation is translated into CPU frequency variation when a uniform power constraint is applied!
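To see how a uniform cap turns power variation into frequency variation, here is a minimal sketch under a hypothetical linear power model, P_i(f) = s_i * (STATIC_W + DYN_W_PER_GHZ * f), where s_i is a per-module variability factor (1.0 = nominal). The model, its constants, and the frequency range are all assumptions for illustration.

```python
# Hypothetical linear power model: P_i(f) = s_i * (STATIC_W + DYN_W_PER_GHZ * f).
# s_i captures manufacturing variability (1.0 = nominal module).
STATIC_W = 20.0
DYN_W_PER_GHZ = 40.0
F_MIN, F_MAX = 1.2, 2.7  # GHz, illustrative DVFS range


def freq_under_cap(cap_w: float, s: float) -> float:
    """Highest frequency a module with factor s can sustain under cap_w,
    clipped to the DVFS range."""
    f = (cap_w / s - STATIC_W) / DYN_W_PER_GHZ
    return min(F_MAX, max(F_MIN, f))


factors = [0.85, 1.0, 1.15]  # efficient, nominal, power-hungry module
cap = 100.0                  # the same (uniform) cap for every module
freqs = [freq_under_cap(cap, s) for s in factors]
print([round(f, 2) for f in freqs])
```

With an identical cap, the power-hungry module ends up at the lowest frequency, which is the effect the slide describes.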

7 Impact on Application Performance
[Figure (star DGEMM): module (CPU+DRAM) power [W] versus normalized execution time, with no power constraint and with uniform power constraints Cm = 110, 100, 90, 80, and 70 W, where Cm is the target average power constraint per module; a 64% variation is marked at Cm = 70 W.]
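Frequency variation becomes execution-time variation. A minimal sketch, assuming compute-bound work whose time scales as 1/frequency: each module gets its own normalized time, and for applications with communication and synchronization (the red ones on slide 4) a barrier after each step makes the whole job run at the pace of the slowest module. The frequencies and reference frequency are hypothetical.

```python
def per_module_times(freqs_ghz: list[float], f_ref_ghz: float) -> list[float]:
    """Normalized execution time of each module (1.0 = running at f_ref_ghz),
    assuming compute-bound work whose time scales as 1/frequency."""
    return [f_ref_ghz / f for f in freqs_ghz]


def job_time(freqs_ghz: list[float], f_ref_ghz: float) -> float:
    """With a barrier after every step, the job runs at the pace of the slowest module."""
    return max(per_module_times(freqs_ghz, f_ref_ghz))


freqs = [2.44, 2.0, 1.67]  # hypothetical frequencies under a uniform cap
times = per_module_times(freqs, 2.7)
print([round(t, 2) for t in times], round(job_time(freqs, 2.7), 2))
```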

8 Problem and Goal
Problem: power-constrained supercomputing will be applied to future HPC systems, and manufacturing variability leads to performance variation under a power constraint.
Our goal: mitigate the impact of manufacturing variability on the performance of HPC applications under a power constraint!

9 Concept: Variation-Aware Power Budgeting
[Figure: power and performance (= CPU frequency) per module in three cases: without a power constraint (power varies across modules), with a power constraint (conventional), and with a power constraint (proposed), which mitigates the variability under the same total power budget.]
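The concept can be illustrated with the same hypothetical linear power model used for the uniform-cap case (constants and variability factors are assumptions, and DVFS limits are ignored for brevity). A uniform split gives every module the same power, so frequencies differ; a variation-aware split picks one common frequency so that the allocations add up to exactly the same total budget.

```python
# Hypothetical linear model: P_i(f) = s_i * (STATIC_W + DYN_W_PER_GHZ * f).
STATIC_W = 20.0
DYN_W_PER_GHZ = 40.0


def power(s: float, f: float) -> float:
    """Module power at frequency f (GHz) for variability factor s."""
    return s * (STATIC_W + DYN_W_PER_GHZ * f)


def freq(s: float, p: float) -> float:
    """Inverse of power(): frequency a module reaches with power p."""
    return (p / s - STATIC_W) / DYN_W_PER_GHZ


def uniform_budget(factors: list[float], total_w: float) -> list[float]:
    """Conventional: split the budget evenly."""
    return [total_w / len(factors)] * len(factors)


def variation_aware_budget(factors: list[float], total_w: float) -> list[float]:
    """Give every module the power it needs to reach one common frequency,
    chosen so the allocations sum exactly to total_w."""
    f_common = (total_w / sum(factors) - STATIC_W) / DYN_W_PER_GHZ
    return [power(s, f_common) for s in factors]


factors = [0.85, 1.0, 1.15]
total = 300.0
for name, alloc in [("uniform", uniform_budget(factors, total)),
                    ("variation-aware", variation_aware_budget(factors, total))]:
    fs = [freq(s, p) for s, p in zip(factors, alloc)]
    print(name, [round(p, 1) for p in alloc], [round(f, 2) for f in fs])
```

The closed form for the common frequency relies on the model being linear; with a measured, non-linear model one would search for that frequency numerically instead.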


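10 Variation-Aware Power Budgeting Strategy
Inputs: HPC application source code, application input data, and the application-level power constraint
1. Source code analysis inserts PMMDs into the HPC application.
2. Test runs of the application with PMMDs on a single module, using the application input data, yield an application-specific power profile (measured).
3. A power-performance model for all modules is derived from that profile (predicted).
4. The variation-aware power budgeting algorithm combines the model, the module allocation (scheduler), and the application-level power constraint into module-level power allocations (output), which are applied in the final application runs.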


11–13 Power Model Calibration
Application-independent Power Variation Table (PVT), obtained once at system installation (normalized power per module ID): module 1 = 1.0, module k = 1.2, module N = 0.8.
Test run on a single module: the measured power on module k is 120W.
Estimated application-specific power consumption: 120W / 1.2 = 100W is the application-dependent average power, so the power on module N is (120W / 1.2) × 0.8 = 80W.
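The calibration step is a single scaling rule, P_n = (P_k / PVT[k]) × PVT[n]. A minimal sketch using the slide's example values; the function name `calibrate` and the string module labels are hypothetical.

```python
# Application-independent Power Variation Table (PVT): normalized power per module,
# obtained once at system installation. Values follow the slide's example.
PVT = {"1": 1.0, "k": 1.2, "N": 0.8}


def calibrate(pvt: dict[str, float], measured_module: str, measured_w: float) -> dict[str, float]:
    """Scale one single-module test-run measurement to every module:
    P_n = (P_k / PVT[k]) * PVT[n]."""
    app_avg_w = measured_w / pvt[measured_module]  # application-dependent average power
    return {m: app_avg_w * factor for m, factor in pvt.items()}


est = calibrate(PVT, "k", 120.0)
print({m: round(p, 1) for m, p in est.items()})
```

Only one test run per application is needed; the per-module variation comes entirely from the PVT.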

14 Options for Power Setting
Two options for power settings:
- Power Capping (Pc), using RAPL: the power constraint is guaranteed; performance equivalence across modules is not.
- Frequency Selection (Fs), using CPUFreq libs: performance equivalence is guaranteed; the power constraint is not.
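The trade-off can be sketched with a toy model. All power tables, caps, and the budget below are hypothetical, and frequencies are discrete here, whereas RAPL capping throttles continuously in hardware. Fs picks one common frequency from predicted power, so performance is equal but the actual total can overshoot if the prediction is low; Pc bounds each module's actual power, so the total holds but modules may run at different speeds.

```python
# Hypothetical per-module power [W] at each available CPU frequency [GHz].
FREQS = [1.6, 2.0, 2.4]
PREDICTED = {"m0": [70, 85, 100], "m1": [80, 100, 120]}  # from the power model
ACTUAL = {"m0": [72, 88, 104], "m1": [81, 101, 122]}     # what the hardware really draws


def frequency_selection(budget_w: float):
    """Fs: index of the highest common frequency whose PREDICTED total fits the budget
    (assumes predicted power grows with frequency). All modules share the frequency,
    so performance is equal by construction."""
    best = None
    for i in range(len(FREQS)):
        if sum(p[i] for p in PREDICTED.values()) <= budget_w:
            best = i
    return best


def power_capping(caps_w: dict[str, float]) -> dict[str, float]:
    """Pc: each module runs at the highest frequency whose ACTUAL power fits its cap,
    so the total stays within sum(caps_w), but speeds may differ.
    Assumes the lowest frequency always fits the cap."""
    return {m: max(f for f, p in zip(FREQS, ACTUAL[m]) if p <= cap)
            for m, cap in caps_w.items()}


budget = 220
i = frequency_selection(budget)
actual_total = sum(p[i] for p in ACTUAL.values())
print("Fs:", FREQS[i], "GHz, actual total", actual_total, "W, within budget:", actual_total <= budget)
print("Pc:", power_capping({"m0": 105, "m1": 110}))
```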

15 Tested Power Budgeting Methods
Method | Application specific | Variation aware | Power-performance model | Power setting
Naive | No | No | – | Power cap
Pc | Yes | No | Calibration | Power cap
VaPc | Yes | Yes | Calibration | Power cap
VaFs | Yes | Yes | Calibration | Freq. sel.
VaPcOr | Yes | Yes | Oracle | Power cap
VaFsOr | Yes | Yes | Oracle | Freq. sel.
Va = Variation-Aware, Pc = Power Capping, Fs = Frequency Selection, Or = Oracle (observed power data are used)

16 Speedup Ratios Normalized to Naive (star DGEMM on 1,920 modules)
[Figure: speedup ratios of the methods over Naive, and module (CPU+DRAM) power [W] versus normalized execution time before and after variation-aware budgeting, with no power constraint and with application-level power constraints Cs = 211.2, 192.0, 172.8, 153.6, and 134.4 kW. The marked variation shrinks from 64% before to 12% after.]
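The application-level constraints Cs correspond to round per-module budgets. Dividing each Cs by the 1,920 modules gives 110 W down to 70 W, the same values as the module-level constraints Cm on slide 7. Watts are kept as integers so the division is exact:

```python
# Application-level power constraints Cs from the slide, in watts, for 1,920 modules.
MODULES = 1920
CS_W = [211_200, 192_000, 172_800, 153_600, 134_400]

per_module_w = [cs / MODULES for cs in CS_W]
print(per_module_w)  # [110.0, 100.0, 90.0, 80.0, 70.0]
```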

17 Speedup Ratios Normalized to Naive (all results on 1,920 modules)
[Figure: speedup ratios of the methods over Naive for all benchmarks.]
5.4X speedup at maximum (NPB-BT); 1.8X speedup on average

18 Conclusions
- Power-constrained computing is becoming mainstream!
- Manufacturing variability causes serious performance issues!
- Optimize power resource allocation!

19 Acknowledgements
This research was supported by JST CREST. Special thanks to Dr. Yuichi Inadomi, Prof. Masaaki Kondo, Dr. Tapasya Patki, Dr. Martin Schulz, and all other members of this project.
