Program-Driven Fine-Grained Power Management for the Reconfigurable Mesh



1 Program-Driven Fine-Grained Power Management for the Reconfigurable Mesh
  Heiner Giefers, Marco Platzner
  Computer Engineering Group, University of Paderborn
  {hgiefers,

2 Outline
  1. Introduction
  2. Methods and extensions for power management
  3. Power model and measurements
  4. Evaluation
  5. Conclusion

3 Reconfigurable Mesh Principles
  - Massively parallel programming model
  - Tiled architecture
    - Processing elements (PE) tightly connected to local switch elements (SE)
  - Connection autonomy
    - 15 possible patterns
  - Cores execute in lock step
    1. Switch configuration
    2. Communication
    3. (Constant time) computation
  - Thrifty interconnect
    - Low-latency
    - Lightweight
    - Low-power
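The figure of 15 connection patterns can be checked with a short enumeration. This is a minimal sketch, assuming each switch element has four ports (named N, E, S, W here for illustration) and that a pattern is any grouping of those ports into internally connected sets; neither the port names nor the function name come from the slides.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty, disjoint blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


# Assumed port names of one switch element (illustrative only).
PORTS = ["N", "E", "S", "W"]
patterns = list(set_partitions(PORTS))
print(len(patterns))  # 15
```

Under this assumption the count is the fourth Bell number, which matches the 15 patterns named on the slide.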

4 Reconfigurable Mesh Implementation on FPGA
  - FPGA prototype
    - 16x16 R-Mesh (Picoblaze PEs)
    - Resource utilization on XCV4LX: slices (71%), 272 BRAMs (80%)
    - 100 MHz
  - R-Mesh as a co-processor [FPL 07]
  - Application mapping [SAMOS 08]
  - Fault-tolerance and scalability [ARCS 09]
  - ARMLang language and compiler [RAW 09]
  [Figure labels: Communication Channels, Configurable Switch]

5 Some Reconfigurable Mesh Algorithms
  Problem                                      Mesh Size   Time
  EXOR of n bits                               2n x 3
  Prefix-And of n 1-bit numbers                1 x n
  Maximum of n (log n)-bit numbers             n x n
  Addition of n k-bit numbers                  n x nk
  Multiplication of two n-bit numbers          n x n
  Division of two n-bit numbers                n x n
  Sort of n O(log n) bit numbers               n x n
  Convex Hull of n points                      n x n
  Smallest enclosing rectangle of n points     n x n
  Triangulation of n planar points             n² x n
  All-pairs nearest neighbors of n points      n x n
  Two-set dominance counting of n points       n x n
  Connected components of an n x n image       n x n       O(log n)
  Categories: logic, arithmetic, sorting and selection; graph and image algorithms
  It's all about runtime complexity

6 Power Management for the Reconfigurable Mesh
  - Power management has not yet been investigated for the reconfigurable mesh
    - Power has never been a research topic because the reconfigurable mesh was mainly a theoretical model
  - The reconfigurable mesh has a high potential for power management!


8 Two Strategies for Power Management
  - Sleep-while-unused
    - A significant portion of PEs might not be used during certain algorithm steps
    - Stall PEs when not used
  - Sleep-while-waiting
    - In the theoretical model, any communication phase takes constant time
    - In practical implementations, a communication phase takes a certain waiting time
    - Stall PEs during communication

9 Extensions for Power Management (1)
  - ARMLang includes SLEEP and WAKEUP instructions
    - When a core executes the SLEEP instruction, it is stalled immediately
    - When a core executes the WAKEUP instruction, all cores are signaled to return from sleep mode
  [Figure labels: sleep ctrl, SLEEP, WAKEUP]
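The SLEEP/WAKEUP semantics described on this slide can be mimicked with a toy model. This is only a sketch: the class `SleepController` and its methods are hypothetical names, and the rule that a stalled core cannot issue instructions is an assumption rather than something the slides state.

```python
class SleepController:
    """Toy model of a shared sleep control for `n` cores (hypothetical)."""

    def __init__(self, n):
        self.stalled = [False] * n

    def sleep(self, core):
        # SLEEP: only the executing core is stalled, and immediately.
        if self.stalled[core]:
            raise RuntimeError("a stalled core cannot execute SLEEP")
        self.stalled[core] = True

    def wakeup(self, core):
        # WAKEUP: issued by an awake core; every core leaves sleep mode.
        if self.stalled[core]:
            raise RuntimeError("a stalled core cannot execute WAKEUP")
        self.stalled = [False] * len(self.stalled)


ctrl = SleepController(5)
for core in range(4):
    ctrl.sleep(core)              # cores 0..3 go to sleep
asleep_before = sum(ctrl.stalled)
ctrl.wakeup(4)                    # core 4 stayed awake and wakes everyone
asleep_after = sum(ctrl.stalled)
print(asleep_before, asleep_after)  # 4 0
```

The asymmetry is the point of the sketch: SLEEP affects one core, while a single WAKEUP from any awake core releases all of them.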

10 Extensions for Power Management (2)
  - Hardware extensions for the sleep mode
    - Standard Picoblaze PE lacks stalling capabilities
    - Factor the stall signal into the write enable of the PC and register file
    - Overhead is one single slice (or ~1%)
  - Picoblaze consumes ~40% less power when stalled (compared to executing NOPs)
  [Figure labels: Program Counter (PC), CALL/RETURN Stack]
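The idea of folding the stall signal into a write enable can be illustrated with a one-register behavioral model. This is a sketch under assumptions, not the actual Picoblaze logic: the function `next_pc` and its parameters are hypothetical, and the real design applies the same gating to the register file as well.

```python
def next_pc(pc, pc_write_enable, stall, target):
    """Return the program counter value after one clock edge.

    The stall signal masks the write enable, so a stalled core
    simply holds its current PC instead of advancing.
    """
    effective_write_enable = pc_write_enable and not stall
    return target if effective_write_enable else pc


pc = 0x10
running = next_pc(pc, pc_write_enable=True, stall=False, target=pc + 1)
stalled = next_pc(pc, pc_write_enable=True, stall=True, target=pc + 1)
print(hex(running), hex(stalled))  # 0x11 0x10
```

Because a held register does not toggle, the core stops fetching new instructions while stalled, which is consistent with the small area overhead the slide reports for this extension.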


12 Power Model for the Reconfigurable Mesh
  - Goal: Find a power model which delivers good power estimates for arbitrary reconfigurable mesh programs
  - A node can operate in one of two power modes: $P^+$ and $P^-$
  - $T_i^+$ is the time node $i$ spends in high power mode $P^+$
  - Runtime: $T_m = T_i^+ + T_i^-$
  - Dynamic consumption:
    $$P_m = \frac{1}{T_m} \sum_{i=0}^{N-1} \left( T_i^+ P^+ + \left( T_m - T_i^+ \right) P^- \right)$$
  - Find good approximations for $P^+$ and $P^-$
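The two-mode model can be evaluated directly once the per-node high-power times are known. The following is a minimal sketch, assuming the dynamic power is the time-weighted sum over all nodes of the high-power and low-power contributions; the function name, argument names, and the numbers in the example are illustrative and not taken from the slides.

```python
def mean_dynamic_power(t_high, t_total, p_high, p_low):
    """Time-averaged dynamic power of a mesh under a two-mode model.

    t_high  -- per-node time spent in the high power mode
    t_total -- total runtime (same unit as t_high)
    p_high  -- per-node power in the high power mode
    p_low   -- per-node power in the low power (stalled) mode
    """
    if t_total <= 0:
        raise ValueError("t_total must be positive")
    energy = 0.0
    for t in t_high:
        if not 0 <= t <= t_total:
            raise ValueError("each high-power time must lie in [0, t_total]")
        # The node spends the rest of the runtime in the low power mode.
        energy += t * p_high + (t_total - t) * p_low
    return energy / t_total


# Hypothetical numbers: four nodes, runtime 10, powers in arbitrary units.
estimate = mean_dynamic_power([10, 5, 5, 0], 10, p_high=2.0, p_low=1.0)
print(estimate)  # 6.0
```

With every node always active the estimate reduces to N times the high-power value, and with every node stalled to N times the low-power value, which bounds what power management can save in this model.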

13 Power Estimation for FPGA Circuits
  - Power estimation tools do a good job, but reliable estimations are time-consuming
    - Detailed knowledge about switching activities is needed
  - Power estimation with Xilinx Power Analyzer (XPA) for a 16-core reconfigurable mesh bin-tree addition
    - 200 µs simulation (Modelsim) on a 2.4 GHz Core 2 Duo with 8 GB RAM takes 33 min and produces 6.2 GB of data
    - Power estimation with Xilinx XPA takes an additional 43 min
  Estimation Modality                          Total Power
  Static estimation (no simulation data)       445 mW
  Complete execution time                      974 mW
  Reconfigurable mesh kernel (with I/O)        840 mW
  Reconfigurable mesh kernel (w/o I/O)         754 mW
  - Stimuli selection is time-consuming

14 Measuring Environment
  - Measurements on an XUP board (xc2vp30)
  - Board allows for separate supplies for the 1.5 V core and the 2.5/3.3 V general-purpose and I/O rails
  - Execute kernel inside of an infinite loop
  - Compare same program on different architectures (varying #nodes)
  - Estimation/measurement deviation of ~10%

15 Measurement Results
  - Every measurement includes static power
    - Measure a configured but not clocked FPGA
    - Static power of the device: 21 mW (14 mA)
  - Dynamic power of a stalled Picoblaze PE: [value missing] mW
    - Power reduction of 39.4% - 43%
  - Dynamic power of a stalled Microblaze host system: 135 mW
  Program pattern           Power per node
  Column broadcasts         [value missing] mW
  Neighbor communication    [value missing] mW
  Maximum load              [value missing] mW
  Stalled node              [value missing] mW
  Table annotations: $P^+$, $P^-$


17 Evaluation: Sleep-while-unused
  - Case study: binary tree simulation algorithm
    - Algorithm for NxN R-Mesh takes 2log(N) steps
    - After each step half of the active nodes remain unused
  - Slightly higher deviation with power management
    - Power model does not include the power overhead caused by the sleep/wake-up mechanism
                           Power Model [mW]   Measurements [mW]   Deviation
  w/o Power Management     [value missing]    [value missing]     [value missing]%
  with Power Management    [value missing]    [value missing]     3.6%
  Power Savings            22.17%             19.68%
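The halving behavior of the binary tree case study gives a feel for how much of the mesh is idle. This is a rough sketch, assuming N is a power of two, all N x N nodes are active in the first step, every step takes the same time, and exactly half of the active nodes drop out after each step; the function names are hypothetical.

```python
def active_nodes_per_step(n):
    """Active node count at the start of each of the 2*log2(n) steps."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    steps = 2 * (n.bit_length() - 1)  # 2 * log2(n) for powers of two
    return [(n * n) >> step for step in range(steps)]


def average_active_fraction(n):
    """Fraction of node-steps in which a node is actually used."""
    counts = active_nodes_per_step(n)
    return sum(counts) / (len(counts) * n * n)


print(active_nodes_per_step(4))    # [16, 8, 4, 2]
print(average_active_fraction(4))  # 0.46875
```

Under these assumptions the unused share grows with the mesh size, which is why stalling unused PEs is attractive for this kind of algorithm. The figures are idealized node counts and say nothing about the measured savings reported on the slide.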

18 Evaluation: Sleep-while-waiting
  - Stall reader and unused PEs during communication phases
  Active waiting:
    WHERE PID%WIDTH==0 DO
      WRITE(a)
    ELSEWHERE
      WAIT(WIDTH)
    END;
    READ(a);
  Stalled waiting:
    WHERE PID%WIDTH==0 DO
      WRITE(a);
      WAIT(WIDTH-1)
    ELSEWHERE
      SLEEP
    END;
    WAKEUP;
    READ(a);
  - Measured several communication patterns (4x4 R-Mesh)
    - Active waiting: ~568 mW
    - Stalled waiting: 331 mW to 429 mW
    - Power reduction: 26% to 44%
  - Applied to a 16x16 R-Mesh sparse matrix multiplication, the sleep-while-waiting method reduces overall energy dissipation by 15.8% to [value missing]%
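A back-of-the-envelope estimate shows why the stalled-waiting variant helps. This is a sketch only, assuming PIDs are numbered 0 to WIDTH*HEIGHT-1, that the writers (PID%WIDTH==0) stay fully active for the whole communication phase, that all other PEs are stalled for its entire duration, and that a stalled PE draws about 40% less dynamic power, as stated earlier in the slides. It ignores static power, the switches, and any sleep/wake-up overhead, so it is not expected to reproduce the measured values. The function name and parameters are hypothetical.

```python
def stalled_waiting_power_ratio(width, height, stall_saving=0.4):
    """PE power during stalled waiting, relative to active waiting (= 1.0)."""
    nodes = width * height
    # Writers match the WHERE clause and keep running.
    writers = sum(1 for pid in range(nodes) if pid % width == 0)
    sleepers = nodes - writers
    # Each sleeper draws (1 - stall_saving) of an active PE's power.
    return (writers + sleepers * (1.0 - stall_saving)) / nodes


ratio = stalled_waiting_power_ratio(4, 4)
print(f"estimated PE power reduction: {1.0 - ratio:.0%}")
```

The estimate depends strongly on how many PEs act as writers in a given pattern, which may be one reason the slide reports a range of reductions rather than a single figure.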

19 Summary and Outlook
  - First study on power management for the reconfigurable mesh
    - Sleep-while-unused
    - Sleep-while-waiting
    - Significant energy savings
    - Power model delivers good power estimates
  - Future Work
    - Automated power management by the ARMLang compiler
    - Comparisons of reconfigurable mesh inspired interconnects to classical circuit- and packet-switched NoCs are of interest

20 Thanks for your Attention! Questions?
