An FPGA Architecture Supporting Dynamically-Controlled Power Gating
Altera Corporation, March 16th, 2012
Assem Bsoul and Steve Wilton, {absoul, stevew}@ece.ubc.ca
System-on-Chip Research Group, Department of Electrical and Computer Engineering
University of British Columbia, Vancouver, B.C., Canada
What this talk is about
- An FPGA architecture supporting dynamic power gating:
  - Turn off regions at run-time to save power, with on-chip control
  - ASIC designers do this regularly
- Challenges for an FPGA:
  - We don't know about the application
  - Routing for signals
  - Handling wakeup in a programmable way
[Figure: system-on-chip with 2 islands; label: control]
Motivation for a New FPGA Architecture
- FPGA leakage power is a major component of total power
  - Almost the same amount as dynamic power at 28 nm
  - Cyclone V: static 32%, dynamic 42%, I/O 26%*
- High-end FPGAs are power hungry
  - Entering an era where we can't turn it all on at once!
- Large power leads to heat issues: reliability, cooling cost
- Mobile hand-held applications
- Many applications have regions with long idle periods
*Altera WP-01158-2.0: Meeting the Low Power Imperative at 28nm
Some Relevant Work
Statically-Controlled Power Gating
- Available FPGA power gating is statically controlled
  - Unused FPGA blocks are turned off at configuration time
- Useful if the amount of unused resources in the FPGA is large!
- However, designers select the smallest chip that fits their design
- Frequently idle blocks still dissipate leakage!
[Figure: statically gated logic cluster; labels: VDD, SRAM, Logic Cluster]
Commercial FPGAs
- Programmable Power Technology (Stratix III, IV, V)
  - Changes Vth of LABs according to their location on the critical path
  - Not runtime-controlled!
- Power gating for embedded blocks
  - Shut down unused I/O, memories, PLLs, transceivers, etc.
  - Supported in both Altera and Xilinx devices
  - Done at configuration time
- Flash*Freeze in IGLOO and ProASIC3L from Microsemi
  - Allows sleep mode for the whole chip
We propose an FPGA architecture where regions of the chip can be turned on/off at runtime to control their power consumption
Outline
- Power gating architecture
- Complications: wakeup current
- System-level evaluation
- Current work
- Other issues
Power gating architecture
Our FPGA Architecture: Big Picture
- Divide the FPGA device into power-controlled regions
- Support a dynamically-controlled sleep mode
- Use the general-purpose routing fabric for control signals
[Figure labels: power control signal, power domain, power gating region]
Architecture Details
Logic Cluster
[Figure: LC power switch between VDD and the logic cluster; labels: VDD, 0, 1, 2, PG_CNTL1]
Routing Channel
- A routing channel is shared between the two neighboring logic clusters
[Figure: two logic clusters (inputs In1 to In4) surrounded by routing channels; labels: connection box, track isolation buffer, routing channel]
Routing Channel
- Turn off a routing channel only when both neighbors are off.
[Figure: routing channel power switch on VDD; labels: 0, 1, 2, PG_CNTL1, PG_CNTL2]
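
The gating rule for a shared routing channel can be sketched as a small truth function. This is only an illustration: the `GateMode` names, the 0/1/2 encoding, and the polarity of the control signal are assumptions, since the slides show a three-way selector but do not spell out what each input means.

```python
from enum import Enum


class GateMode(Enum):
    """Hypothetical per-cluster configuration; the 0/1/2 encoding is assumed."""
    ALWAYS_ON = 0   # never gated
    ALWAYS_OFF = 1  # gated at configuration time (static gating)
    DYNAMIC = 2     # follows a run-time control signal such as PG_CNTL1


def cluster_is_off(mode: GateMode, pg_cntl: bool) -> bool:
    """Return True if the logic cluster is powered down.

    Assumes pg_cntl == True requests sleep (polarity is not given in the slides).
    """
    if mode is GateMode.ALWAYS_ON:
        return False
    if mode is GateMode.ALWAYS_OFF:
        return True
    return bool(pg_cntl)


def routing_channel_is_off(neighbor_a_off: bool, neighbor_b_off: bool) -> bool:
    """A shared routing channel may sleep only when both neighbors are off."""
    return neighbor_a_off and neighbor_b_off
```

For example, a channel between a sleeping dynamic cluster and an awake one stays powered, so signals for the awake cluster can still be routed.
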
Routing Channel
- We don't gate switch blocks right now: this is the focus of current work!
Control Signals
- Use the general-purpose routing fabric to route control signals.
- Re-use free input pins.
  - Input pins are expensive!
Area Overhead: Region Power Gating
- Architecture granularity
  - Share a sleep transistor among a region of tiles.
  - Control signal comes from bordering routing channels.
- Interesting tradeoff:
  - Area vs. CAD difficulty
[Figure: power gating region of size 2x2, surrounded by routing channels (RC); control arrives from bordering connection blocks]
Evaluation
- The Good: potential leakage reduction
- The Bad: area, delay, and leakage power overhead
Experimental Setup
- Sweep architecture parameters: N (cluster), W (channel), R (region)
- Other architecture parameters (N, I, W, Fcout, ...) are inputs to netlist generation
- Three architectures: ungated, static-gating (SG), dynamic-gating (DG)
- Flow:
  1. Create an HSPICE netlist for the power gating architecture (automatic LC buffer sizing)
  2. Run HSPICE to measure performance (45 nm PTM)
  3. Meets performance constraints (10%)? If no, resize sleep transistors and repeat
  4. If yes, collect results (leakage power and area)
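
The resize-until-it-meets-timing loop in this flow can be sketched as follows. This is a minimal sketch, not the actual scripts: `delay_overhead` stands in for an HSPICE run, `toy_delay_overhead` is a made-up model, and growing the transistor geometrically is an assumed search strategy.

```python
from typing import Callable


def size_sleep_transistor(
    delay_overhead: Callable[[float], float],
    max_overhead: float = 0.10,
    start_width: float = 1.0,
    growth: float = 1.25,
    max_width: float = 1e4,
) -> float:
    """Grow the sleep transistor until the delay overhead meets the constraint.

    delay_overhead(width) is a placeholder for a circuit simulation that
    returns the fractional slowdown relative to the ungated design.
    """
    width = start_width
    while delay_overhead(width) > max_overhead:
        width *= growth  # a wider switch has less resistance, hence less delay
        if width > max_width:
            raise ValueError("no width meets the delay constraint")
    return width


def toy_delay_overhead(width: float) -> float:
    """Made-up stand-in model: overhead inversely proportional to width."""
    return 0.5 / width
```

Calling `size_sleep_transistor(toy_delay_overhead)` returns the first width in the geometric sequence whose modeled overhead is at most 10%; a larger width would also pass but would cost more area.
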
Granularity Results: Leakage Power
- Compared to ungated, static-gating and dynamic-gating reduce leakage in sleep mode by more than 44%.
- Dynamic-gating has 0.8% more leakage than static-gating (R=4).
- Leakage reduction increases with increasing R.
Granularity Results: Area Overhead
- Area overhead compared to ungated (R=4): dynamic-gating 0.75%, static-gating 0.57%.
- Area overhead decreases as R increases.
Delay overhead is 10% by design
- We chose sleep transistors with a delay impact of no more than 10%
- Tradeoff: delay overhead vs. area overhead
Complications: wakeup current
What is the inrush current problem?
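Inrush Current in Power Gating
- Inrush current causes voltage droop on power grid lines, which can lead to malfunction of the design
[Figure: power-gated functional block with a SLEEP switch (OFF to ON) on VDD; plots of current (mA) showing the inrush current and of the power grid voltage showing the voltage droop]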
In power-gated ASIC designs, how is the inrush current problem solved?
Related Work: ASIC Domain
- Daisy chaining:
  - A chain of parallel power switches instead of one large switch
  - Delay the wakeup of each stage to limit inrush current
[Figure: chain of power switches between VDD and virtual VDD, driven by the power control signal through delay elements T1 ... Tn-1]
How is the problem different in our FPGA?
Inrush Current in ASICs vs. FPGAs
- In ASICs:
  - Power gating is well known
  - The application is known at fabrication time
  - Inrush current requirements are known at design time
- In FPGAs:
  - The application is not known at fabrication time
  - Sizes and locations of power-gated blocks are not known
  - Inrush current requirements are not known
- The solution needs to be configurable!
How serious is the problem in our FPGA?
FPGA Power Grid Modeling
- A model of the FPGA power grid to evaluate the effect of inrush current:
  - Multiple metal layers that represent the power grid
  - FPGA tiles modeled as a grid of current sources
  - Current values obtained from power analysis of MCNC benchmarks (max average current per tile: Imax_tile 400 µA)
Effect of Inrush Current
- Voltage drop due to inrush current in our FPGA architecture.
- Voltage drop on the power grid is larger than 100 mV!
[Figure: voltage drop (mV) across power gating granularities; annotation: finer granularity]
What are the possible solutions for our power-gated FPGA?
A Possible FPGA Solution
- Ask the designer to take care of it!!
- How? Create a power controller that activates multiple signals in sequence to wake up one functional block:
  - Requires user experience and knowledge.
  - Complicates the design process.
  - May result in a power controller with large power consumption.
The Proposed Architecture
- A configurable architecture to limit inrush current.
- Has two levels:
  1) Fixed intra-region level: ensures we can turn on individual regions safely.
  2) Configurable inter-region level: safely sequences the wakeup of a power-gated app.
Fixed Intra-region Architecture
Intra-region Architecture
- Use parallel sleep transistors (STs) and sequence wakeup using fixed delay elements.
- Sizes of STs and delay elements are based on the maximum allowed current.
- A tradeoff exists between the area overhead of the intra-region level and the power grid metal area.
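
Why staggering the sleep transistors helps can be seen with a toy model. The sketch below assumes each stage draws a rectangular current pulse of height `i_stage` for `pulse_len` time units after it turns on; real inrush waveforms are not rectangular, and the function name and parameters are illustrative only.

```python
def peak_inrush(n_stages: int, i_stage: float, pulse_len: float, delay: float) -> float:
    """Peak total current when stage k turns on at time k * delay.

    Toy model: every stage contributes a rectangular pulse of height
    i_stage lasting pulse_len; the peak is the largest overlap of pulses.
    """
    events = []
    for k in range(n_stages):
        start = k * delay
        events.append((start, +1))              # stage starts drawing current
        events.append((start + pulse_len, -1))  # stage has finished charging
    events.sort()  # at equal times, an end (-1) sorts before a start (+1)
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak * i_stage
```

With four stages and no delay, all four pulses overlap; once the delay reaches the pulse length, only one stage draws current at a time. Longer delays cost wakeup time, which is one side of the sizing tradeoff.
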
Configurable Inter-region Architecture
Inter-region Architecture
- The intra-region level should solve the problem (theoretically!)
- However, in practice more current might be drawn due to:
  - Switching of unrelated signals in routing.
  - Switching of inputs to logic clusters that are partially turned on.
  - Which is application specific!
- Instead of turning on a complete power-gated application at once, turn on one power gating region at a time.
Inter-region Architecture
- FPGA chip with power gating regions.
- A power-gated app. is mapped to one or more regions.
[Figure: FPGA chip divided into power gating regions, with one application mapped onto several of them]
Inter-region Architecture
- During wakeup: turn on one region at a time.
- A delay is inserted before the turn-on of the next region.
[Animation: the regions of the app turn on one by one, after 1 x T, 2 x T, 3 x T, and 4 x T]
Inter-region Architecture
- We need to insert delays between the turn-on of regions.
- However, we don't know:
  - the shape, size, and location of a power-gated app. on the FPGA.
  - how many delays are required before turning on a region.
- Use programmable delay elements (PDEs).
Inter-region Architecture
- PDE size (number of inputs) determines the largest power-gated app. that can be turned on using one control signal.
[Figure: power gating region with sleep transistors ST1 ... STn between VDD and virtual VDD, sequenced by delay elements T1 ... Tn-1; SLEEP arrives from bordering routing channels through the inter-region programmable delay element (PDE) and a MUX]
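
One way to picture PDE configuration is as assigning each region of an app a delay tap. The sketch below is an assumption-heavy illustration: the function names are hypothetical, the rule that tap k wakes at (k + 1) x T is inferred from the 1 x T to 4 x T wakeup sequence, and splitting an app across several control signals when it exceeds the PDE size is an assumed fallback, not something the slides describe.

```python
def assign_pde_taps(regions, pde_size):
    """Assign (region, control_signal, tap) for an ordered list of regions.

    Hypothetical scheme: a single control signal can sequence at most
    pde_size regions, so larger apps spill onto additional signals.
    """
    if pde_size < 1:
        raise ValueError("pde_size must be at least 1")
    plan = []
    for k, region in enumerate(regions):
        signal = k // pde_size  # which control signal drives this region
        tap = k % pde_size      # which PDE input the region's MUX selects
        plan.append((region, signal, tap))
    return plan


def wake_time(tap, t_delay):
    """Time after the control signal changes at which a region turns on.

    Assumes tap 0 corresponds to 1 x T, tap 1 to 2 x T, and so on.
    """
    return (tap + 1) * t_delay
```

Under these assumptions, an app with five regions and a PDE size of 2 would need three control signals, which the power controller would then have to sequence itself.
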
Inter-region Architecture
[Figures: example mappings with regions labeled by their control signal (C1, C2)]
Results: Inrush Current Handling
- Settings:
  - Region size = 4x4
  - Typical power grid (not oversized)
  - Worst-case temperature
  - PDE size (# of regions controlled using one signal): 50
- Leakage power savings (compared to ungated):
  - With inrush handling: 91%
  - Without inrush handling: 97%
- Area overhead (compared to ungated): less than 2.2%
System-level evaluation
Potential Leakage Energy Reduction
- Use a model that relates the following to the energy savings of the architecture:
  - Number and size of idle regions in the application
  - Proportion of the time idle regions can be turned off
  - Size of the power controller
  - Potential slowdown of the application
Potential Leakage Energy Reduction
- Leakage energy reduction (%) compared to ungated.
[Figure: leakage reduction (%) vs. fraction of idle times and fraction of idle regions]
Conditions to Achieve Energy Savings
- To achieve savings: Esleep < Eidle
- Use a model to find the minimum idle time required to achieve energy savings when in sleep mode.
- For clma (775 tiles, PDE size 49): Tidle 900 ns.
  - Equivalent to 450 cycles on a 500 MHz clock!
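
The break-even condition and the cycle conversion can be worked through with a simple energy model. The model here is an assumption, not the one used in the talk: it takes Eidle = Pidle x T and Esleep = Psleep x T + Eoverhead, where Eoverhead lumps together wakeup and controller energy. The numbers in the usage note are made up.

```python
def min_idle_time_ns(e_overhead_pj: float, p_idle_mw: float, p_sleep_mw: float) -> float:
    """Smallest idle time (ns) beyond which sleeping saves energy.

    Assumed model: E_sleep < E_idle  <=>  T > E_overhead / (P_idle - P_sleep).
    Units: pJ / mW = ns.
    """
    saved_power = p_idle_mw - p_sleep_mw
    if saved_power <= 0:
        raise ValueError("sleep mode must leak less than idle mode")
    return e_overhead_pj / saved_power


def ns_to_cycles(t_ns: float, f_mhz: float) -> float:
    """Convert a time in ns to clock cycles at f_mhz (1 MHz = 1e-3 cycles/ns)."""
    return t_ns * f_mhz / 1000.0
```

`ns_to_cycles(900, 500)` gives the 450 cycles quoted for clma. As a purely hypothetical case, 100 pJ of overhead with 1.0 mW idle leakage and 0.5 mW sleep leakage would break even at 200 ns.
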
Current work
Switch Boxes Power Gating
- SBs dissipate more than 50% of static power in an FPGA tile!
- Challenges in power gating SBs:
  - They are used to route SLEEP signals
  - Unrelated signals could be routed through SBs in a power domain
[Figure label: SLEEP]
Switch Boxes Power Gating
- Solution: partial power gating for SBs
  - Allows off mode for SBs inside a power domain
  - Allows partial off mode for shared SBs
[Figure label: SLEEP]
Other issues
Mapping Applications to the Architecture
- Mapping applications (high level):
  - Automatic detection of power domains in the app
  - Automatic generation of the power controller
- What about verification?
Evaluation and Benchmarks
- What we have used so far:
  - HSPICE circuit models and simulations
  - Architecture parameter sweep
  - Application model to find potential savings, min Tidle, etc.
- What we really need:
  - Real applications with power domains
  - A more advanced power model
  - A way to map applications to the architecture
Summary
- Summary of the dynamically-controlled power gating FPGA:
  1. Enables turning off unused logic at config time (saves static power)
  2. Enables turning off idle logic at runtime (saves static power)
  3. Control could come from off or on chip (from soft logic)
  4. Programmable inrush current handling
  5. Routing architecture is not changed (until now)
- Challenges:
  1. Area overhead: small, though
  2. Performance degradation: we assume 10% for now
  3. Mapping applications to the architecture: automated?