A Framework for Modeling GPUs Power Consumption


1 A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January 21,

2 Outline: 1 Motivation, 2 Power-Simulation Framework: GPUSimPow, 3 Case study, 4 Validation, 5 Results, 6 Summary

3 Motivation Power is an increasingly important problem for GPUs Where does the power go? Evaluate architectural and programming optimizations We need GPU power simulation 3

5 Power-Simulation Framework: GPUSimPow [Diagram: GPGPU code and GPU configuration are the inputs to GPUSimPow; GPGPU-Sim (cycle-accurate GPU architectural simulation) passes activity information to GPGPUPow (GPU internal chip model), which produces power and area results] Extracted activity factors from the GPU performance simulator. Used the McPAT framework as the base for GPU power modeling. Added missing power models for GPU components to McPAT. Integrated the performance simulator and the power model. 5

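6 Power equation

\[ P_i = \underbrace{\alpha_i C_i V_{dd}^2 f_{clk}}_{\text{Dynamic}} + \underbrace{V_{dd} I_{leakage}}_{\text{Static}} \qquad (1) \]

\[ P_{total} = \sum_{i=1}^{n} P_i \qquad (2) \]

where n = number of components, \alpha_i = activity factor, C_i = capacitance, V_{dd} = supply voltage, f_{clk} = clock frequency, and I_{leakage} = leakage current.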
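As a minimal sketch of how equations (1) and (2) combine, the Python below sums the dynamic and static power over a list of components. The `Component` class, its field names, and all numeric values are hypothetical placeholders chosen for illustration; they are not taken from GPUSimPow or McPAT.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical per-component parameters for equation (1)."""
    name: str
    activity: float       # alpha_i, dimensionless activity factor
    capacitance_f: float  # C_i in farads
    leakage_a: float      # I_leakage in amperes


def component_power(c: Component, vdd: float, f_clk: float) -> tuple[float, float]:
    """Return (dynamic, static) power in watts for one component, per equation (1)."""
    dynamic = c.activity * c.capacitance_f * vdd ** 2 * f_clk
    static = vdd * c.leakage_a
    return dynamic, static


def total_power(components: list[Component], vdd: float, f_clk: float) -> float:
    """Sum P_i over all components, per equation (2)."""
    return sum(sum(component_power(c, vdd, f_clk)) for c in components)


# Illustrative values only (assumptions, not measured data).
example_components = [
    Component("register_file", activity=0.5, capacitance_f=2e-9, leakage_a=0.1),
    Component("execution_units", activity=0.25, capacitance_f=4e-9, leakage_a=0.2),
]
example_total_w = total_power(example_components, vdd=1.0, f_clk=1e9)
```

In a simulator-driven flow, the activity factors would come from the performance simulator rather than being fixed constants as they are here.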

7 Performance Simulator: GPGPU-Sim GPGPU-Sim Simulator Cycle-Level accurate Architectures modeled GPU micro-architectures based on NVIDIA GeForce 8x, 9x, and Fermi series Supported Languages CUDA, OpenCL We extended GPGPU-Sim to get activity factors Current version of GPUSimPow uses 40 activity factors 7

8 Performance Simulator: GPGPU-Sim GPU architecture modeled by GPGPU-Sim [Diagram: clusters of three SMs each, together with memory controllers] 8

9 GPGPUPow: Adding GPU components Check if the component already exists in McPAT. If it does, reuse the component, e.g., data cache, PCIe controller. If it is not modeled in McPAT (e.g., warp control unit, shared memory): use CACTI basic structures for regular components, e.g., caches; for irregular components, use measurement-based power models (e.g., FU power model) or well-published numbers (e.g., papers and patents). 9

12 Case study: Shared memory analytical power model [Diagram: conflict checker, address crossbar, banks 0 to n, data crossbar] Models used: McPAT flip-flops; CACTI crossbar for the address and data crossbars; CACTI SRAM for the banks. 11
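One way to read this analytical model is that each sub-block contributes a per-access energy, and the dynamic power follows from the access counts over a given runtime. The sketch below illustrates that idea only; the sub-block keys, the per-access energies, and the access counts are hypothetical and are not values produced by CACTI, McPAT, or GPUSimPow.

```python
def shared_memory_dynamic_power(
    access_counts: dict[str, int],
    energy_per_access_j: dict[str, float],
    runtime_s: float,
) -> float:
    """Dynamic power in watts: sum(count * energy per access) / runtime."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    total_energy_j = sum(
        count * energy_per_access_j[block] for block, count in access_counts.items()
    )
    return total_energy_j / runtime_s


# Hypothetical per-access energies (joules) for each sub-block of the diagram.
example_energy_j = {
    "conflict_checker": 1e-12,
    "address_xbar": 2e-12,
    "bank": 5e-12,
    "data_xbar": 2e-12,
}
# Hypothetical activity counts, as a performance simulator might report them.
example_counts = {
    "conflict_checker": 1000,
    "address_xbar": 1000,
    "bank": 2000,
    "data_xbar": 1000,
}
example_power_w = shared_memory_dynamic_power(example_counts, example_energy_j, runtime_s=1e-6)
```

Keeping the energies and the counts in separate dictionaries mirrors the split between the chip model (energies) and the performance simulator (activity).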

13 Case study: Functional units empirical power model Measurement based power model Microbenchmark to stress functional units at different activity levels Measure power differences between different activity levels Calculate energy per operation 12

14 Case study: FUs empirical power model [Figure: power [W] over time [s] for the Slot 12V and Slot 3.3V rails] 12 SMs running at 1.34 GHz use … W on the GT 240. Energy per operation is … W / (… GHz) = 37.88 pJ 13
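The arithmetic behind an energy-per-operation figure can be sketched as the measured power difference divided by the operation rate. The function below is an illustration under stated assumptions: the parameter names and the example inputs are hypothetical, and they do not reproduce the values that are missing from the slide.

```python
def energy_per_operation(
    power_delta_w: float,
    active_units: int,
    clock_hz: float,
    ops_per_unit_per_cycle: float = 1.0,
) -> float:
    """Energy per operation in joules.

    power_delta_w: power difference between two activity levels (watts).
    active_units: number of functional units kept busy by the microbenchmark.
    clock_hz: clock frequency of those units.
    """
    ops_per_second = active_units * clock_hz * ops_per_unit_per_cycle
    if ops_per_second <= 0:
        raise ValueError("operation rate must be positive")
    return power_delta_w / ops_per_second


# Hypothetical inputs: 2 W of extra power, 100 busy units at 1 GHz.
example_energy_j = energy_per_operation(power_delta_w=2.0, active_units=100, clock_hz=1e9)
example_energy_pj = example_energy_j * 1e12
```

Using the power difference between activity levels, rather than absolute power, removes the contribution of everything that does not scale with functional-unit activity.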

16 Validation: Measurement Testbed Key features: direct measurement of GPU power consumption; uses a special PCIe riser card; high sampling speed (31.5 kHz). 15
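To illustrate how sampled rail measurements could be turned into power and energy figures, the sketch below multiplies voltage by current per rail and integrates over time at the 31.5 kHz rate. The sample layout (12 V and 3.3 V slot rails as voltage/current tuples) and the example readings are assumptions made for illustration; the slides do not describe the testbed's data format.

```python
SAMPLE_RATE_HZ = 31_500  # sampling speed stated on the slide

# Hypothetical sample layout: (v_12v, i_12v, v_3v3, i_3v3) in volts and amperes.
Sample = tuple[float, float, float, float]


def power_trace(samples: list[Sample]) -> list[float]:
    """Instantaneous power in watts per sample, summed over both rails."""
    return [v12 * i12 + v3 * i3 for v12, i12, v3, i3 in samples]


def average_power(trace: list[float]) -> float:
    """Mean power in watts over the trace."""
    if not trace:
        raise ValueError("trace is empty")
    return sum(trace) / len(trace)


def energy(trace: list[float], sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Energy in joules, treating each sample as constant for one sample period."""
    return sum(trace) / sample_rate_hz


# One second of constant, made-up readings.
example_samples = [(12.0, 2.0, 3.3, 1.0)] * SAMPLE_RATE_HZ
example_trace = power_trace(example_samples)
```

A high sampling rate matters here because short kernels would otherwise contribute only a handful of samples to the average.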

18 Results: Experimental Setup
Key features of the GPU architectures
Feature              GT240      GTX580
#Cores
#Threads per core
#FUs per core        8          32
Uncore clock         550 MHz    882 MHz
#Warps in-flight
L2-$ size                       768 KByte
Process node         40 nm      40 nm
Benchmarks: NVIDIA CUDA SDK 3.1, Rodinia

19 Simulated vs Measured Power for GT240 [Figure: simulated power (W) and measured power (W), split into dynamic and static parts, with average error (%)]

20 Simulated vs Measured Power for GTX580 [Figure: simulated power (W) and measured power (W), split into dynamic and static parts, with average error (%)]

21 Power profiling: Blackscholes power breakdown on GT240 [Table columns: Static [W], Dynamic [W], Percent] Top: power breakdown on the entire GT240 (rows: Overall, Cores, NoC, Memory Controller, PCIe Controller). Bottom: power breakdown on a single SM (rows: Overall, Base Power, WCU, Register File, Execution Units, LDSTU, Undiff. Core). 20

27 Summary GPU power simulation framework: analytical and measurement-based; detailed, flexible, and accurate power model; design space exploration. Publicly available: projekte/gpusimpow_simulator 26

28 27 Backup slides

29 Related work Hong et al.: empirical GPU power model; our GPU power model uses both empirical and analytical models. Hong et al. measure whole-PC power at a 2 Hz sampling rate and subtract the PC idle power; our model is validated using GPU-only power measurements with a 31.5 kHz sampling rate. Ma et al.: statistical power model based on 5 performance counters; it requires an existing GPU for the performance counters, whereas our model is simulator-based and can be used without a GPU. The statistical model lacks a detailed power breakdown; our power simulator provides a detailed breakdown of the power consumption. 28

30 Extending McPAT GPU components added to McPAT Warp control unit (Warp status table, Instruction buffers, Reconvergence stacks, Scoreboarding logic, Instruction decoder logic, Schedulers) GPU style register file Execution units (INT, FP32, SFU) Load-store unit (Coalescer, Bank conflict checker, AGU array, Per-core constant cache slice, Shared memory, L2 cache) GDDR Components reused from McPAT Memory controller PCIe-controller Execution units (INT, FP32, SFU) NoC 29

31 Empirically modeled cluster power [Figure: total power [W] over time [s]] 12 kernel launches with an increasing number of thread blocks; 4 TPCs and 3 cores per TPC for GT240. 30

32 Experimental Setup Key features of the GPU architectures Summary of experimental setup 31

33 Static Power and Area for GT240 and GTX580 [Table: simulated vs. real static power [W] and area [mm²] for GT240 and GTX580]

34 Runtime accuracy for GT240 Simulated runtimes normalized to measured runtimes. Simulated time is reported by GPGPU-Sim; measured time is reported by the NVIDIA profiler on real hardware. 33
