Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors

Size: px

Start display at page:

Download "Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors"

Laurence Nicholson
5 years ago
Views:

Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors Brian Zimmer 1,

Stevo Bailey 1, Milovan Blagojevic 1,2, Pi-Feng Chiu 1, Hanh-Phuc Le 1, Po-Hung Chen

Philippe Flatresse 2, Elad Alon 1, Krste Asanovic 1, and Borivoje Nikolic 1 1

1 Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors Brian Zimmer 1, Yunsup Lee 1, Alberto Puggelli 1, Jaehwa Kwak 1, Ruzica Jevtic 1, Ben Keller 1, Stevo Bailey 1, Milovan Blagojevic 1,2, Pi-Feng Chiu 1, Hanh-Phuc Le 1, Po-Hung Chen 1, Nicholas Sutardja 1, Rimas Avizienis 1, Andrew Waterman 1, Brian Richards 1, Philippe Flatresse 2, Elad Alon 1, Krste Asanovic 1, and Borivoje Nikolic 1 1 University of California, Berkeley, USA 2 STMicroelectronics, Crolles, France 6/30/2015, RISC-V Workshop

2 Motivation Dynamic voltage and frequency scaling (DVFS) maximizes energy efficiency Fixed voltage Performance Requirement DVFS voltage Off-chip conversion Slow mode transitions Limited voltage domains Costly -chip components Time On-chip conversion Fast transitions Many domains No -chip components 2

3 Project Goals Energyefficient Fine-grained DVFS High conversion efficiency low-cost Entirely on-chip converter Low area overhead processor Runs Linux Realistic digital load 3

4 Integrated Voltage Regulators Method Linear regulator (eg. Toprak-Deniz ISSCC2014) Buck converter (eg. Kurd ISSCC2014) Efficiency (0.5put) <50%?%-90% Off-chip components Area overhead Interleaved switchedcapacitor (eg. Kim ISSCC2015 1, Clerc ISSCC ) Simultaneous switched-capacitor with adaptive clock (Proposed) 40%-65% 1 65%-82% 2 85% 4

5 V high V ideal V low Proposed Solution Switch all converters simultaneously to avoid charge sharing Traditional Interleaving 1.8V 1.0V Charge sharing losses 2Ghz ref 1.0V To -chip oscilloscope To -chip oscilloscope Simultaneous Switching V high V ideal FSM V low toggle 1.8V 1.0V 1.8V 1.8V V 1.0V switched-capacitor 24 switched-capacitor 24 switched-capacitor 24 switched-capacitor DC-DC unit DC-DC cells unit DC-DC cells Shared unit DC-DC Func. cells Shared Units unit Func. cells Shared int Units int Fu (0.19 mm 2 64-bit Integer Multiplier ) (0.19 mm 2 64-bit Integer Multiplier ) (0.19 mm 2 64-bit Integer ) (0.19 mm 2 ) toggle + toggle V FSM ref out + C RISC-V scalar RISC-V corescalar RISC-V core Vec sc Scalar RF Scalar FPU RF Scalar Vecto FPU RF (16KB Vec int custom 8 C Single-Precision Single-Precision FMA Single-Precisi FMA Double-Precision Double-Precision FMA Double-Precis FMAVector V ref FSM toggle 16KB Scalar 16KB 32KB Scalar 16KB Shared 32K Sca DC-DC controller DC-DC controller DC-DC Inst. controller Cache DC-DC Inst. controller Cache Data Cache Inst. Cac Dat (Custom 8T (Custom (Custom 8T (Custom 8T (C8 SRAM Macros) SRAM Macros) SRAM SRAM Macros) Mac SRA 2Ghz 2Ghz 2Ghz clk Arbiter ref ref clk ref clk clk Async. FIFO/Level Async. shifters FIFO/Level Async. shifters FIFO/Le As 1.0V Adaptive clock 1.0V Adaptive clock 1.0V Adaptive clock Adaptive clock between domains between domains betwee generator generator generator generator int + To -chip oscillosco To -chi V ref FSM No charge sharing! Clock frequency adapts to track the voltage ripple Digital IO pads Digital to wire-bonded IO pads Digital to chip-on wire-bo IO p To/from -chip To/from FPGA -chip FSB To/from FPGA and -chip DRAM FSB To/from FPGA and -chip DRAM FSB FPG and int + V ref 5

6 Chip Architecture CORE (1.19 mm 2 ) 6

7 Floorplan ST 28nm FDSOI Die area: 2.37mm 2 Core area: 1.19mm 2 Converter area: 0.19mm 2 (16%) MOS+MOM (to M3) density: 11fF/µm 2 Core 7

8 Reconfigurable SC Converters Four output voltages, single unit cell Mode 1.8V 1/2 Mode 2/3 Mode 1/2 Mode (~) (~0.9V) (~0.67V) (~0.5V) Simple lower bound control V ref 1.8V 1.8V + FSM 2GHz comparator toggle Simultaneous switching V ref toggle V i V lo 8

9 Self-Adjusting Clock Generator Block Diagram DLL 2Ghz input, 16 phases Tunable Replica Path Controller Select closest edge Replica tracks critical path with voltage ripple Controller quantizes clock edge (Tunable Replica Path) Adaptive system clock 9

10 Test setup Host: Zedboard (ARM+FPGA, network accessible) Motherboard: Programmable supplies Daughterboard: Chip-on-board 10

11 Vout Measurements Fast converter mode transitions enable extremely fine-grain DVFS algorithms 11

12 GFLOPS/W Double-precision matrix-multiplication used as energy-efficiency metric GFLOPS/W inverse of energy/operation 28nm FDSOI ers high energy efficiency and 4 converter modes cover wide operating range 12

13 Conclusion Energy-efficient 28nm processor featuring: 1. Fine-grained and wide-range DVFS 20ns transitions to 0.45V 2. Entirely on-chip voltage conversion Simultaneous switched-capacitor 3. High efficiency 80% across wide voltage range 4. Extreme energy efficiency 26 GFLOPS/W with on-chip conversion 13

Acknowledgements Funding: BWRC members, ASPIRE members, DARPA PERFECT Award Number HR0011-12-2-0016, Intel ARO, AMD, GRC, Marie Curie

14 Acknowledgements Funding: BWRC members, ASPIRE members, DARPA PERFECT Award Number HR , Intel ARO, AMD, GRC, Marie Curie FP7, NSF GRFP, NVIDIA Fellowship Fabrication donation by STMicroelectronics. Tom Burd, James Dunn, Olivier Thomas, and Andrei Vladimirescu 14

Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking

Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking Yunsup Lee, Brian Zimmer, Andrew Waterman, Alberto Puggelli, Jaehwa Kwak,