RISC-V Rocket Chip SoC Generator in Chisel. Yunsup Lee UC Berkeley

Size: px
Start display at page:

Download "RISC-V Rocket Chip SoC Generator in Chisel. Yunsup Lee UC Berkeley"

Transcription

1 RISC-V Rocket Chip SoC Generator in Chisel Yunsup Lee UC Berkeley

2 What is the Rocket Chip SoC Generator?! Parameterized SoC generator written in Chisel! Generates Tiles - (Rocket) Core + Private Caches! Generates Uncore (Outer Memory System) - Coherence Agent - Shared Caches - DMA Engines - Memory Controllers! Glues all the pieces together 2

3 Rocket Chip SoC Generator Rocket Core FPU L1 Inst ROCC Accel. Tile L1 Data Rocket Core FPU L1 Inst L1 Network Coherence Manager ROCC Accel. Tile L1 Data HTIF! Generates n Tiles - (Rocket) Core - RoCC Accelerator - L1 I$ - L1 D$! Generates HTIF - Host DMA Engine! Generates Uncore - L1 Crossbar - Coherence Manager - Exports MemIO Interface TileLink / MemIO Converter 3

4 Why SoC Generators?! Helps tune the design under different performance, power, area constraints, and diverse technology nodes! Parameters include: - number of cores - instantiation of floating-point units, vector units - cache sizes, associativity, number of TLB entries, cache-coherence protocol - number of floating-point pipeline stages - width of off-chip I/O, and more 4

5 Why Chisel?! RTL generator written in Chisel - HDL embedded in Scala! Full power of Scala for writing generators - object-oriented programming - functional programming C++ code Chisel Program Scala/JVM FPGA Verilog ASIC Verilog C++ Compiler Software Simulator FPGA Tools FPGA Emulation ASIC Tools GDS Layout 5

6 Rocket Scalar Core PC IF ID EX MEM WB ITLB Int.RF DTLB PC Gen. I$ Inst. Int.EX D$ Commit Access Decode Access bypass paths omitted for simplicity Rocket Pipeline To RoCC to Hwacha Accelerator FP.RF FP.EX1 FP.EX2 FP.EX3-64-bit 5-stage single-issue in-order pipeline - Design minimizes impact of long clock-to-output delays of compiler-generated RAMs - 64-entry BTB, 256-entry BHT, 2-entry RAS - MMU supports page-based virtual memory - IEEE compliant FPU - Supports SP, DP FMA with hw support for subnormals 6

7 ARM Cortex-A5 vs. RISC-V Rocket Category ARM Cortex-A5 RISC-V Rocket ISA 32-bit ARM v7 64-bit RISC-V v2 Architecture Single-Issue In-Order Single-Issue In-Order 5-stage Performance 1.57 DMIPS/MHz 1.72 DMIPS/MHz Process TSMC 40GPLUS TSMC 40GPLUS Area w/o Caches 0.27 mm mm 2 Area with 16K Caches 0.53 mm mm 2 Area Efficiency 2.96 DMIPS/MHz/mm DMIPS/MHz/mm 2 Frequency >1GHz >1GHz Dynamic Power <0.08 mw/mhz mw/mhz - PPA reporting conditions - 85% utilization, use Dhrystone for benchmark, frequency/power at TT 0.9V 25C, all regular VT transistors - 10% higher in DMIPS/MHz, 49% more area-efficient 7

8 HTIF: Host-Target Interface! UC Berkeley specific block mainly used to emulate devices for simple test chips - Emulates system calls, console, block devices, frame buffer, network devices - No need for this block once the SoC has actual devices on the target machine! Consider it as a host DMA engine! A port for for host system to read/write - Core CSRs (control and status registers) - Target Memory 8

9 Important Interfaces in the Rocket Chip Rocket Core FPU L1 Inst client mngr client mngr ROCCIO TileLinkIO TileLinkIO ROCC Accel. Tile L1 Data Rocket Core FPU L1 Inst L1 Network Coherence Manager TileLink / MemIO Converter ROCC Accel. Tile L1 Data HTIF client client client client TileLinkIO TileLinkIO MemIO HTIFIO arb arb HostIO TileLink TileLink! ROCCIO - Interface between Rocket/Accelerator! HTIFIO - Read/Write CSRs! TileLinkIO - Coherence Fabric! MemIO - Simple AXI-like memory interface! HostIO - Host Interface to HTIF 9

10 Probe Grant Probe Grant TileLinkIO Client Client Cache Cache Acquire Release Finish Acquire Release Finish Manager - TileLinkIO consists of Acquire, Probe, Release, Grant, Finish 10

11 Probe Grant Grant UncachedTileLinkIO Client Client Cache Acquire Release Finish Acquire Finish Manager - UncachedTileLinkIO consists of Acquire, Grant, Finish - Convertors for TileLinkIO/UncachedTileLinkIO in uncore library 11

12 MemIO Master MemReqCmd MemReqCmd.valid MemReqCmd.ready Slave Decoupled(MemData) Decoupled(MemResp) - MemReqCmd consists of addr, rw (write=true), tag - MemData consists of 128 bit data payload - MemResp consists of 128 bit data payload, tag - Decoupled(interface) means an interface with ready/ valid signals 12

13 ROCCIO Rocket Decoupled(Cmd) Decoupled(Resp) CacheIO busy IRQ supervisor bit UncachedTileLinkIO PTWIO exception ROCC Accel.! Rocket sends coprocessor instruction via the Cmd interface! Accelerator responds through Resp interface! Accelerator sends memory requests to L1D$ via CacheIO! busy bit for fences! IRQ, S, exception bit used for virtualization! UncachedTileLinkIO for instruction cache on accelerator! PTWIO for page-table walker ports on accelerator 13

14 HTIFIO HTIF reset core_id Decoupled(CSRReq) Decoupled(CSRResp) Decoupled(IPIReq) Decoupled(IPIResp) Tile - reset signal and core_id routed from HTIF (historical reasons nothing technical) - CSR Read/Write requests go through CSRReq/CSRResp - IPI Requests go through IPIReq/IPIResp - HTIFIO likely to be modified in the near future 14

15 Rocket Chip C++ Emulator Setup Verilog Simulator pthread Rocket Chip HostIO RISC-V Frontend Server pthread MemIO DRAMSim2 15

16 ZYNQ FPGA Rocket Chip FPGA Setup Rocket Chip MemIO HostIO HostIO/AXI Convertor MemIO/AXIHP Convertor AXI HP AXI AXI Master AXI HP Slave ARM Processing System DDR3 DRAM RISC-V Frontend Server 16

17 Rocket Chip Berkeley Test Chip Setup Test Chip Rocket Chip MemIO MemIO Serializer HostIO MemSerialIO ZYNQ FPGA HostIO/AXI Convertor MemIO/AXIHP Convertor MemIO MemIO Deserializer AXI HP AXI AXI Master AXI HP Slave ARM Processing System RISC-V Frontend Server DDR3 DRAM 17

18 Rocket Chip SoC Setup Interrupts Rocket Chip TiileLinkIO Uncached TiileLinkIO Devices MemIO MemIO LPDDR3 Memory Controller LPDDR3 DRAM 18

19 Rocket Core FPU L1 Inst client mngr ROCCIO TileLinkIO Who should use the Rocket Chip Generator? People who would like to develop ROCC Accel. Tile L1 Data Rocket Core FPU L1 Inst L1 Network Coherence Manager ROCC Accel. Tile L1 Data HTIF client client client client TileLinkIO TileLinkIO HTIFIO arb HostIO TileLink! A RISC-V SoC - Look into Chisel parameters! New Accelerators - Drop in at ROCCIO level! Own RISC-V Core - Drop in at TileLinkIO level or MemIO level! Own Device - Drop in at TileLinkIO or UncachedTileLinkIO client arb mngr TileLinkIO TileLink / MemIO Converter TileLink MemIO 19

20 New Features: L2$ with Directory Bits Rocket Core FPU L1 Inst client L2Cache mngr client ROCC Accel. Tile L1 Data Rocket Core FPU L1 Inst L1 Network ROCC Accel. Tile L1 Data HTIF client client client client L2CacheCoherence L2Cache ManagerL2Cache L2Cache mngr mngr mngr mngr arb client client client client TileLink! Shared L2$ with multiple banks! Each L2$ will act as a coherence manager with directory bits (snoop filter)! These caches can be composed to build outer-level caches such as an L3$ mngr TileLink / MemIO Converter TileLink 20

21 New Features: ROCC interfaces with L2$ Rocket Core FPU L1 Inst ROCC Accel. Tile L1 Data Rocket Core FPU L1 Inst L1 Data ROCC Accel. Tile HTIF! ROCC talks directly to the L2$ to address more data client client client client client L2Cache mngr L1 Network arb L2CacheCoherence L2Cache ManagerL2Cache L2Cache mngr mngr mngr mngr TileLink client mngr client client client client TileLink / MemIO Converter TileLink 21

22 New Features on the Deck! Dual-issue Rocket Core! Hwacha Vector Unit (checkout hwacha.org)! Dump MemIO and use AXI 22

A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators"

A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W ISC-V Processor with Vector Accelerators" Yunsup Lee 1, Andrew Waterman 1, imas Avizienis 1,! Henry Cook 1, Chen Sun 1,2,! Vladimir Stojanovic 1,2, Krste Asanovic

More information

Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking

Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking Yunsup Lee, Brian Zimmer, Andrew Waterman, Alberto Puggelli, Jaehwa Kwak,

More information

RISC- V Rocket Chip Tutorial. Colin Schmidt UC Berkeley

RISC- V Rocket Chip Tutorial. Colin Schmidt UC Berkeley RISC- V Rocket Chip Tutorial Colin Schmidt UC Berkeley colins@eecs.berkeley.edu 2 Outline What can Rocket Chip do? How do I change what Rocket Chip generates? - What are chisel parameters and how do they

More information

RISC V - Architecture and Interfaces

RISC V - Architecture and Interfaces RISC V - Architecture and Interfaces The RocketChip Moritz Nöltner-Augustin Institut für Technische Informatik Lehrstuhl für Rechnerarchitektur Universität Heidelberg February 6, 2017 Table of Contents

More information

Untethered lowrisc, Memory Mapped IO and TileLink/AXI. Wei Song 27/07/2015

Untethered lowrisc, Memory Mapped IO and TileLink/AXI. Wei Song 27/07/2015 Untethered lowrisc, Memory Mapped IO and /AXI Wei Song 27/07/2015 Time Line expected Nov. 2014 Apr. 2015 Now Oct. 2015 -Chip release from Berkeley First lowrisc release. Memeory Mapped IO. Untethered lowrisc

More information

Diplomatic Design Patterns

Diplomatic Design Patterns Diplomatic Design Patterns Henry Cook Wesley Terpstra Yunsup Lee A TileLink Case Study 10/14/2017 Agenda Rocket-Chip Ecosystem Diplomacy TileLink Design Patterns DRYing out Parameterization Generation

More information

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, Krste Asanović

More information

CS 152 Laboratory Exercise 5

CS 152 Laboratory Exercise 5 CS 152 Laboratory Exercise 5 Professor: John Wawrzynek TA: Martin Maas Department of Electrical Engineering & Computer Science University of California, Berkeley November 19, 2016 1 Introduction and goals

More information

Energy-Efficient RISC-V Processors in 28nm FDSOI

Energy-Efficient RISC-V Processors in 28nm FDSOI Energy-Efficient RISC-V Processors in 28nm FDSOI Borivoje Nikolić Department of Electrical Engineering and Computer Sciences University of California, Berkeley bora@eecs.berkeley.edu 26 September 2017

More information

KeyStone II. CorePac Overview

KeyStone II. CorePac Overview KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

A Retrospective on Par Lab Architecture Research

A Retrospective on Par Lab Architecture Research A Retrospective on Par Lab Architecture Research Par Lab SIGARCH Krste Asanovic, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, Sarah Bird, Alex Bishara, Chris Celio, Henry Cook, Ben Keller, Yunsup

More information

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces Zvonimir Z. Bandic, Sr. Director Robert Golla, Sr. Fellow Dejan Vucinic,

More information

CS 152 Laboratory Exercise 5 (Version B)

CS 152 Laboratory Exercise 5 (Version B) CS 152 Laboratory Exercise 5 (Version B) Professor: Krste Asanovic TA: Yunsup Lee Department of Electrical Engineering & Computer Science University of California, Berkeley April 9, 2012 1 Introduction

More information

SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips

SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips Yunsup Lee Co-Founder and CTO High Upfront Cost Has Killed Innovation Our industry needs a fundamental change Total SoC Development Cost Design

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Free Chips Project: a nonprofit for hosting opensource RISC-V implementations, tools, code. Yunsup Lee SiFive

Free Chips Project: a nonprofit for hosting opensource RISC-V implementations, tools, code. Yunsup Lee SiFive Free Chips Project: a nonprofit for hosting opensource RISC-V implementations, tools, code Yunsup Lee SiFive SiFive Open Source We Open-Sourced the Freedom E310 Chip! 3 We Open-Sourced the Freedom E310

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

Hwacha V4: Decoupled Data Parallel Custom Extension. Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley

Hwacha V4: Decoupled Data Parallel Custom Extension. Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley Hwacha V4: Decoupled Data Parallel Custom Extension Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley Introduction Data-parallel, custom ISA extension to RISC-V with a configurable ASIC-focused implementation

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanovic CARRV 2017 10/14/2017 Evaluation Methodologies For Computer

More information

Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay

Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay Hardware and Architectural Support for Security and Privacy (HASP 18), June 2, 2018, Los Angeles, CA, USA Ming Ming Wong Jawad Haj-Yahya Anupam Chattopadhyay Computing and Engineering (SCSE) Nanyang Technological

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증

MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증 MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증 이웅재부장 Application Engineering Group 2014 The MathWorks, Inc. 1 Agenda Introduction ZYNQ Design Process Model-Based Design Workflow Prototyping and Verification Processor

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

RISC-V Core IP Products

RISC-V Core IP Products RISC-V Core IP Products An Introduction to SiFive RISC-V Core IP Drew Barbier September 2017 drew@sifive.com SiFive RISC-V Core IP Products This presentation is targeted at embedded designers who want

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Copyright 2014 Xilinx

Copyright 2014 Xilinx IP Integrator and Embedded System Design Flow Zynq Vivado 2014.2 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able

More information

OPENSPARC T1 OVERVIEW

OPENSPARC T1 OVERVIEW Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share

More information

Next Generation Multi-Purpose Microprocessor

Next Generation Multi-Purpose Microprocessor Next Generation Multi-Purpose Microprocessor Presentation at MPSA, 4 th of November 2009 www.aeroflex.com/gaisler OUTLINE NGMP key requirements Development schedule Architectural Overview LEON4FT features

More information

«Real Time Embedded systems» Multi Masters Systems

«Real Time Embedded systems» Multi Masters Systems «Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can

More information

Building Custom SoCs with RocketChip. Howard Mao

Building Custom SoCs with RocketChip. Howard Mao Building Custom SoCs with RocketChip Howard Mao What is RocketChip? A library of reusable SoC components A highly parameterizable SoC generator Memory protocol converters Arbiters and Crossbar generators

More information

Agile Hardware Design: Building Chips with Small Teams

Agile Hardware Design: Building Chips with Small Teams 2017 SiFive. All Rights Reserved. Agile Hardware Design: Building Chips with Small Teams Yunsup Lee ASPIRE Graduate 2016 Co-Founder and CTO 2 2017 SiFive. All Rights Reserved. World s First Single-Chip

More information

Managing On-Chip Memory Hierarchies

Managing On-Chip Memory Hierarchies Managing On-Chip Memory Hierarchies Sagar Karandikar, Albert Magyar, and Howard Mao Department of Electrical Engineering and Computer Sciences University of California, Berkeley {sagark, magyar, zhemao}@eecs.berkeley.edu

More information

Introducing the Latest SiFive RISC-V Core IP Series

Introducing the Latest SiFive RISC-V Core IP Series Introducing the Latest SiFive RISC-V Core IP Series Drew Barbier DAC, June 2018 1 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance

More information

CS 152 Laboratory Exercise 5

CS 152 Laboratory Exercise 5 CS 152 Laboratory Exercise 5 Professor: Krste Asanovic TA: Christopher Celio Department of Electrical Engineering & Computer Science University of California, Berkeley April 11, 2012 1 Introduction and

More information

Custom Silicon for all

Custom Silicon for all Custom Silicon for all Because Moore s Law only ends once Who is SiFive? Best-in-class team with technology depth and breadth Founders & Execs Key Leaders & Team Yunsup Lee CTO Krste Asanovic Chief Architect

More information

Jack Kang ( 剛至堅 ) VP Product June 2018

Jack Kang ( 剛至堅 ) VP Product June 2018 Jack Kang ( 剛至堅 ) VP Product June 2018 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance 64-bit Application Cores High Performance

More information

CS 152 Laboratory Exercise 5 (Version C)

CS 152 Laboratory Exercise 5 (Version C) CS 152 Laboratory Exercise 5 (Version C) Professor: Krste Asanovic TA: Howard Mao Department of Electrical Engineering & Computer Science University of California, Berkeley April 9, 2018 1 Introduction

More information

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier

More information

The Hwacha Microarchitecture Manual, Version 3.8.1

The Hwacha Microarchitecture Manual, Version 3.8.1 The Hwacha Microarchitecture Manual, Version 3.8.1 Yunsup Lee Albert Ou Colin Schmidt Sagar Karandikar Howard Mao Krste Asanović Electrical Engineering and Computer Sciences University of California at

More information

From Gates to Compilers: Putting it All Together

From Gates to Compilers: Putting it All Together From Gates to Compilers: Putting it All Together CS250 Laboratory 4 (Version 111814) Written by Colin Schmidt Adapted from Ben Keller Overview In this lab, you will continue to build upon the Sha3 accelerator,

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

Your First Accelerator: Matrix Sum

Your First Accelerator: Matrix Sum Your First Accelerator: Matrix Sum CS250 Laboratory 3 (Version 100913) Written by Ben Keller Overview In this lab, you will write your first coprocessor, a matrix sum accelerator. Matrices are common data

More information

SoC FPGAs. Your User-Customizable System on Chip Altera Corporation Public

SoC FPGAs. Your User-Customizable System on Chip Altera Corporation Public SoC FPGAs Your User-Customizable System on Chip Embedded Developers Needs Low High Increase system performance Reduce system power Reduce board size Reduce system cost 2 Providing the Best of Both Worlds

More information

RISC-V, Rocket, and RoCC Spring 2017 James Mar2n

RISC-V, Rocket, and RoCC Spring 2017 James Mar2n RISC-V, Rocket, and RoCC Spring 2017 James Mar2n What s new in Lab 2: In lab 1, you built a SHA3 unit that operates in isola2on We would like Sha3Accel to act as an accelerator for a processor Lab 2 introduces

More information

RA3 - Cortex-A15 implementation

RA3 - Cortex-A15 implementation Formation Cortex-A15 implementation: This course covers Cortex-A15 high-end ARM CPU - Processeurs ARM: ARM Cores RA3 - Cortex-A15 implementation This course covers Cortex-A15 high-end ARM CPU OBJECTIVES

More information

BERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010

BERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010 RAMP Gold Wrap Krste Asanovic RAMP Wrap Stanford, CA August 25, 2010 RAMP Gold Team Graduate Students Zhangxi Tan Andrew Waterman Rimas Avizienis Yunsup Lee Henry Cook Sarah Bird Faculty Krste Asanovic

More information

Bifrost - The GPU architecture for next five billion

Bifrost - The GPU architecture for next five billion Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor

More information

Qsys and IP Core Integration

Qsys and IP Core Integration Qsys and IP Core Integration Stephen A. Edwards (after David Lariviere) Columbia University Spring 2016 IP Cores Altera s IP Core Integration Tools Connecting IP Cores IP Cores Cyclone V SoC: A Mix of

More information

Lecture 5: Computing Platforms. Asbjørn Djupdal ARM Norway, IDI NTNU 2013 TDT

Lecture 5: Computing Platforms. Asbjørn Djupdal ARM Norway, IDI NTNU 2013 TDT 1 Lecture 5: Computing Platforms Asbjørn Djupdal ARM Norway, IDI NTNU 2013 2 Lecture overview Bus based systems Timing diagrams Bus protocols Various busses Basic I/O devices RAM Custom logic FPGA Debug

More information

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013 Getting the Most out of Advanced ARM IP ARM Technology Symposia November 2013 Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block are now Sub-Systems Cortex

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

From Gates to Compilers: Putting it All Together

From Gates to Compilers: Putting it All Together From Gates to Compilers: Putting it All Together CS250 Laboratory 4 (Version 102413) Written by Ben Keller Overview In this lab, you will build upon the matrix sum accelerator you wrote in Lab 3. You will

More information

Yet Another Implementation of CoRAM Memory

Yet Another Implementation of CoRAM Memory Dec 7, 2013 CARL2013@Davis, CA Py Yet Another Implementation of Memory Architecture for Modern FPGA-based Computing Shinya Takamaeda-Yamazaki, Kenji Kise, James C. Hoe * Tokyo Institute of Technology JSPS

More information

Hardware-Software Co-Design and Prototyping on SoC FPGAs Puneet Kumar Prateek Sikka Application Engineering Team

Hardware-Software Co-Design and Prototyping on SoC FPGAs Puneet Kumar Prateek Sikka Application Engineering Team Hardware-Software Co-Design and Prototyping on SoC FPGAs Puneet Kumar Prateek Sikka Application Engineering Team 2015 The MathWorks, Inc. 1 Agenda Integrated Hardware / Software Top down Workflow for SoC

More information

Design AXI Master IP using Vivado HLS tool

Design AXI Master IP using Vivado HLS tool W H I T E P A P E R Venkatesh W VLSI Design Engineer and Srikanth Reddy Sr.VLSI Design Engineer Design AXI Master IP using Vivado HLS tool Abstract Vivado HLS (High-Level Synthesis) tool converts C, C++

More information

ASIC Design of Shared Vector Accelerators for Multicore Processors

ASIC Design of Shared Vector Accelerators for Multicore Processors 26 th International Symposium on Computer Architecture and High Performance Computing 2014 ASIC Design of Shared Vector Accelerators for Multicore Processors Spiridon F. Beldianu & Sotirios G. Ziavras

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Introduction to Embedded System Design using Zynq

Introduction to Embedded System Design using Zynq Introduction to Embedded System Design using Zynq Zynq Vivado 2015.2 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able

More information

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture.

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture. Embedded Processors 5. ARM 기반모니터프로그램사용 DE1-SoC 보드 (IntelFPGA) 2 Application Processors Development of the ARM Architecture v4 v5 v6 v7 Halfword and signed halfword / byte support System mode Thumb instruction

More information

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Bifrost GPU architecture and the ARM Mali-G71 GPU The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

RISECREEK: From RISC-V Spec to 22FFL Silicon

RISECREEK: From RISC-V Spec to 22FFL Silicon RISECREEK: From RISC-V Spec to 22FFL Silicon Vinod Ganesan, Gopinathan Muthuswamy Group CSE Dept RISE Lab IIT Madras Amudhan, Bharath, Alagu - HCL Tech Konala Varma, Ang Boon Chong, Harish Villuri - Intel

More information

SDSoC: Session 1

SDSoC: Session 1 SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the

More information

Superscalar Processor

Superscalar Processor Superscalar Processor Design Superscalar Architecture Virendra Singh Indian Institute of Science Bangalore virendra@computer.orgorg Lecture 20 SE-273: Processor Design Superscalar Pipelines IF ID RD ALU

More information

Getting to Work with OpenPiton. Princeton University. OpenPit

Getting to Work with OpenPiton. Princeton University.   OpenPit Getting to Work with OpenPiton Princeton University http://openpiton.org OpenPit Princeton Parallel Research Group Redesigning the Data Center of the Future Chip Architecture Operating Systems and Runtimes

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Contents of this presentation: Some words about the ARM company

Contents of this presentation: Some words about the ARM company The architecture of the ARM cores Contents of this presentation: Some words about the ARM company The ARM's Core Families and their benefits Explanation of the ARM architecture Architecture details, features

More information

Effective System Design with ARM System IP

Effective System Design with ARM System IP Effective System Design with ARM System IP Mentor Technical Forum 2009 Serge Poublan Product Marketing Manager ARM 1 Higher level of integration WiFi Platform OS Graphic 13 days standby Bluetooth MP3 Camera

More information

Bluespec s RISC-V Factory

Bluespec s RISC-V Factory Bluespec s RISC-V Factory (components and development/verification environment) Rishiyur S. Nikhil with Charlie Hauck, Darius Rad, Julie Schwartz, Niraj Sharma, Joe Stoy 3 rd RISC-V Workshop January 6,

More information

Integrated Workflow to Implement Embedded Software and FPGA Designs on the Xilinx Zynq Platform Puneet Kumar Senior Team Lead - SPC

Integrated Workflow to Implement Embedded Software and FPGA Designs on the Xilinx Zynq Platform Puneet Kumar Senior Team Lead - SPC Integrated Workflow to Implement Embedded Software and FPGA Designs on the Xilinx Zynq Platform Puneet Kumar Senior Team Lead - SPC 2012 The MathWorks, Inc. 1 Agenda Integrated Hardware / Software Top

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

An overview of standard cell based digital VLSI design

An overview of standard cell based digital VLSI design An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased

More information

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc. Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc. Design Goals Designed for compute-dense, transaction oriented systems (webservers,

More information

EECS150 - Digital Design Lecture 13 - Accelerators. Recap and Outline

EECS150 - Digital Design Lecture 13 - Accelerators. Recap and Outline EECS150 - Digital Design Lecture 13 - Accelerators Oct. 10, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)

More information

C152 Laboratory Exercise 4 (Version B)

C152 Laboratory Exercise 4 (Version B) C152 Laboratory Exercise 4 (Version B) Professor: Krste Asanovic TA: Yunsup Lee Department of Electrical Engineering & Computer Science University of California, Berkeley March 19, 2012 1 Introduction

More information

08 - Address Generator Unit (AGU)

08 - Address Generator Unit (AGU) October 2, 2014 Todays lecture Memory subsystem Address Generator Unit (AGU) Schedule change A new lecture has been entered into the schedule (to compensate for the lost lecture last week) Memory subsystem

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Final Review Shuai Wang Department of Computer Science and Technology Nanjing University Computer Architecture Computer architecture, like other architecture, is the art

More information

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware. Department of Computer Science, Institute for System Architecture, Operating Systems Group Real-Time Systems '08 / '09 Hardware Marcus Völp Outlook Hardware is Source of Unpredictability Caches Pipeline

More information

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye Negotiating the Maze Getting the most out of memory systems today and tomorrow Robert Kaye 1 System on Chip Memory Systems Systems use external memory Large address space Low cost-per-bit Large interface

More information

The ARM Cortex-A9 Processors

The ARM Cortex-A9 Processors The ARM Cortex-A9 Processors This whitepaper describes the details of the latest high performance processor design within the common ARM Cortex applications profile ARM Cortex-A9 MPCore processor: A multicore

More information

Keystone Architecture Inter-core Data Exchange

Keystone Architecture Inter-core Data Exchange Application Report Lit. Number November 2011 Keystone Architecture Inter-core Data Exchange Brighton Feng Vincent Han Communication Infrastructure ABSTRACT This application note introduces various methods

More information

Xtensa 7 Configurable Processor Core

Xtensa 7 Configurable Processor Core FEATURES 32-bit synthesizable RISC architecture with 5-stage pipeline, 16/24-bit instruction encoding with modeless switching Designer-configurable processor options (MMU/MPU, local memory types and sizes,

More information

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces

Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation

More information

Design of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models

Design of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models Design of Parallel and High-Performance Computing Fall 2017 Lecture: Cache Coherence & Memory Models Motivational video: https://www.youtube.com/watch?v=zjybff6pqeq Instructor: Torsten Hoefler & Markus

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory

Announcement. ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining. Edward Suh Computer Systems Laboratory ECE475/ECE4420 Computer Architecture L4: Advanced Issues in Pipelining Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcement Lab1 is released Start early we only have limited computing

More information

RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi

RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi Power Matters. TM RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi chowdhary.musunuri@microsemi.com RIC217 1 Agenda A brief introduction

More information

Midterm Exam. Solutions

Midterm Exam. Solutions Midterm Exam Solutions Problem 1 List at least 3 advantages of implementing selected portions of a complex design in software Software vs. Hardware Trade-offs Improve Performance Improve Energy Efficiency

More information

Power, Performance and Area Implementation Analysis.

Power, Performance and Area Implementation Analysis. ARM Cortex -R Series: Power, Performance and Area Implementation Analysis. Authors: Neil Werdmuller and Jatin Mistry, September 2014. Summary: Power, Performance and Area (PPA) implementation analysis

More information

Experiences Using the RISC- V Ecosystem to Design an Accelerator- Centric SoC in TSMC 16nm

Experiences Using the RISC- V Ecosystem to Design an Accelerator- Centric SoC in TSMC 16nm Experiences Using the RISC- V Ecosystem to Design an Accelerator- Centric SoC in TSMC 16nm Tutu Ajayi 2, Khalid Al- Hawaj 1, Aporva Amarnath 2, Steve Dai 1, Scott Davidson 4, Paul Gao 4, Gai Liu 1, Anuj

More information

Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm

Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm Tutu Ajayi 2, Khalid Al-Hawaj 1, Aporva Amarnath 2, Steve Dai 1, Scott Davidson 4, Paul Gao 4, Gai Liu 1, Anuj Rao

More information

Niagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems

Niagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems Niagara-2: A Highly Threaded Server-on-a-Chip Greg Grohoski Distinguished Engineer Sun Microsystems August 22, 2006 Authors Jama Barreh Jeff Brooks Robert Golla Greg Grohoski Rick Hetherington Paul Jordan

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

NS115 System Emulation Based on Cadence Palladium XP

NS115 System Emulation Based on Cadence Palladium XP NS115 System Emulation Based on Cadence Palladium XP wangpeng 新岸线 NUFRONT Agenda Background and Challenges Porting ASIC to Palladium XP Software Environment Co Verification and Power Analysis Summary Background

More information

RISC-V Updates Krste Asanović krste@berkeley.edu http://www.riscv.org 3 rd RISC-V Workshop Oracle, Redwood Shores, CA January 5, 2016 Agenda UC Berkeley updates RISC-V transition out of Berkeley Outstanding

More information

All About the Cell Processor

All About the Cell Processor All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,

More information