Improving the Performance Per Area Factor of RISC-V Based Multi-Core Systems

Tobias Strauch
R&D, EDAptix, Munich, Germany
tobias@edaptix.com
4th RISC-V Workshop at MIT, Boston, USA
July 13, 2016

Agenda
C-Slow Retiming
System Hyper Pipelining
Basic Idea
Performance Balancing
Deep Pipelining
Extended Pipelining
Performance per Area Factor
microrisc Project
minirisc Project
Miscellaneous

C-Slow Retiming

Known since the 1960s, barrel processors
Leiserson [1]
Berkeley summer class in 2001
CSR on FPGAs (LUT level) by Weaver et al. [2]
Millions of engineers
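The idea behind C-slow retiming can be sketched in a few lines. In the general technique (Leiserson and Saxe [1]), every register is replaced by a chain of C registers, so that C independent data streams share one copy of the combinational logic, each owning every C-th clock cycle; the extra registers are then retimed into the logic to shorten the critical path. The following is a minimal functional sketch only, assuming a 16-bit accumulator as the example circuit. The names `run_single` and `run_c_slowed` are illustrative and not taken from the talk, and the retiming step itself is not modeled.

```python
from collections import deque

def run_single(inputs):
    """Original circuit: one accumulator register, one data stream."""
    acc = 0
    out = []
    for x in inputs:
        acc = (acc + x) & 0xFFFF  # adder feeding a single 16-bit register
        out.append(acc)
    return out

def run_c_slowed(streams):
    """C-slowed circuit: the register becomes a chain of C registers.

    The single adder is shared; stream t owns every C-th clock cycle.
    All streams are assumed to have the same length.
    """
    c = len(streams)
    regs = deque([0] * c)              # C registers in the feedback loop
    outs = [[] for _ in range(c)]
    for cycle in range(c * len(streams[0])):
        t = cycle % c                  # stream that owns this cycle
        x = streams[t][cycle // c]
        state = regs.popleft()         # value leaving the register chain
        state = (state + x) & 0xFFFF   # shared combinational logic
        regs.append(state)             # comes back around after C cycles
        outs[t].append(state)
    return outs

streams = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
print(run_c_slowed(streams))
```

Each stream sees exactly the results it would get from its own private copy of the circuit, which is why the approach is closely related to barrel processors.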

C-Slow Retiming (Personal Work)
Diploma Thesis in 1998 on CSR of FSM
LSI Logic (Milpitas, CA), RISC Processor, CSR on gate level
2001: Timing estimation on RTL, RTL code modification,...
2010: Automatically apply CSR on RTL
2010: AVR Core (VHDL) on opencores.org
2010: OpenRISC Core (Verilog) on opencores.org
Papers on CSR on RTL, CSR in safety critical designs,...

System Hyper Pipelining

Based on CSR
Replaces original registers with memories (D).
Adds thread stalling and therefore thread bypassing features.
Cycle accurate performance balancing
Late read / early write [3]
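A rough behavioral sketch of the two points above: the register state lives in a small memory with one word per thread, and a stalled thread is bypassed so that its cycle goes to another thread. This is an illustration under simplifying assumptions, not the actual SHP implementation. The accumulator datapath, the round-robin selection, and the names `run_shp` and `is_stalled` are hypothetical, and pipeline depth (which would limit back-to-back issue of the same thread) is ignored.

```python
def run_shp(streams, is_stalled, max_cycles=1000):
    """Model one shared datapath whose register is replaced by a per-thread memory.

    streams:    list of input lists, one per thread
    is_stalled: function (cycle, thread) -> bool, hypothetical stall source
    Returns (per-thread outputs, list of the thread served in each cycle).
    """
    n = len(streams)
    mem = [0] * n                      # memory replacing the single register
    pos = [0] * n                      # next input index per thread
    outs = [[] for _ in range(n)]
    trace = []
    cycle, nxt = 0, 0
    while any(pos[t] < len(streams[t]) for t in range(n)) and cycle < max_cycles:
        chosen = None
        for k in range(n):             # round-robin search, skipping stalled threads
            t = (nxt + k) % n
            if pos[t] < len(streams[t]) and not is_stalled(cycle, t):
                chosen = t
                break
        if chosen is not None:
            mem[chosen] = (mem[chosen] + streams[chosen][pos[chosen]]) & 0xFFFF
            outs[chosen].append(mem[chosen])
            pos[chosen] += 1
            nxt = (chosen + 1) % n
        trace.append(chosen)           # None marks an idle cycle
        cycle += 1
    return outs, trace

# Thread 1 is stalled for the first two cycles, so thread 0 takes its slots.
outs, trace = run_shp([[1, 2], [10, 20]], lambda cycle, t: t == 1 and cycle < 2)
print(outs, trace)
```

In this model a stall costs no throughput as long as some other thread is ready, which is the effect the slide attributes to thread bypassing.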

SHP, Performance Balancing
Predictive runtime of specific threads
Sync. start/execution, Dynamic Length Instruction Words

SHP, Deep Pipelining, T0 in Beast Mode
Detect instruction dependency, LSB RISC-V ISA
1) Instruction LSB-sniffing
2) Enhanced PM read
3) Consider stall/flush signal
No back-to-back, turned on/off on the fly
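One possible reading of "LSB-sniffing" is that the low bits of a RISC-V instruction are enough to classify it early, because the base ISA places the major opcode in bits 6:0 and the register fields at fixed positions (rd in 11:7, rs1 in 19:15, rs2 in 24:20). The sketch below uses those fixed fields to decide whether the next instruction of a thread depends on the previous one. It is an assumption about how such a check could look, not the design from the talk. Only a subset of RV32I opcodes is handled, anything else is conservatively treated as dependent, and the helper names are hypothetical.

```python
# Major opcodes (bits 6:0) of the RV32I base ISA used in this sketch.
OP, OP_IMM, LOAD, STORE, BRANCH = 0x33, 0x13, 0x03, 0x23, 0x63
LUI, AUIPC, JAL, JALR = 0x37, 0x17, 0x6F, 0x67

WRITES_RD = {OP, OP_IMM, LOAD, LUI, AUIPC, JAL, JALR}
READS_RS1 = {OP, OP_IMM, LOAD, STORE, BRANCH, JALR}
READS_RS2 = {OP, STORE, BRANCH}
CONTROL = {BRANCH, JAL, JALR}
KNOWN = WRITES_RD | READS_RS1 | READS_RS2

def fields(instr):
    """Return (opcode, rd, rs1, rs2) from the fixed bit positions."""
    return instr & 0x7F, (instr >> 7) & 0x1F, (instr >> 15) & 0x1F, (instr >> 20) & 0x1F

def depends(prev, nxt):
    """True if nxt must not be issued directly behind prev (sketch only)."""
    p_op, p_rd, _, _ = fields(prev)
    n_op, _, n_rs1, n_rs2 = fields(nxt)
    if p_op not in KNOWN or n_op not in KNOWN:
        return True                    # unknown opcode: be conservative
    if p_op in CONTROL:
        return True                    # next PC is not known yet
    if p_op not in WRITES_RD or p_rd == 0:
        return False                   # nothing written, or write to x0
    return (n_op in READS_RS1 and n_rs1 == p_rd) or \
           (n_op in READS_RS2 and n_rs2 == p_rd)

def enc_add(rd, rs1, rs2):
    """Encode add rd, rs1, rs2 (R-type, funct3 = 0, funct7 = 0)."""
    return (rs2 << 20) | (rs1 << 15) | (rd << 7) | OP

def enc_addi(rd, rs1, imm):
    """Encode addi rd, rs1, imm (I-type, funct3 = 0)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | OP_IMM

print(depends(enc_addi(1, 0, 5), enc_add(2, 1, 1)))
print(depends(enc_addi(1, 0, 5), enc_add(3, 4, 5)))
```

A check of this kind would let a single thread issue in consecutive cycles only when no dependency is found, and fall back to interleaving otherwise.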

SHP, Extended Pipelining, Frequency-Over-Scaling
Fewer registers per path than necessary
Multi-cycle path, for example in the datapath section
Re-execution of thread with valid multi-cycle path

SHP, Performance per Area Factor
CHStone

SHP, System Level
Performance (frequency) over area curve
System level performance improvements!!!
microrisc (Vscale), microcontroller, virtual peripherals,...
minirisc (lowRISC, Rocket), SoC, OS, FPU, accelerators,...

microrisc
Vscale @80MHz async DDR3, SHP-ed @250MHz synchronous
Based on RV32IM subset (no FENCE and no SI, RV32B)
Individual threads handle interrupts (less stack activity).
Main focus on virtual peripherals
TC with event handler (complex timer)
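The two clock figures on this slide allow a small worked example. The sketch below takes 80 MHz and 250 MHz from the slide; everything else is an assumption made for illustration: the thread count, the area ratio, the definition of the performance per area factor as throughput divided by area, and the premise that the SHP-ed core retires one instruction per cycle across all threads.

```python
BASE_MHZ = 80.0    # original Vscale clock, from the slide
SHP_MHZ = 250.0    # SHP-ed clock, from the slide

def per_thread_mhz(system_mhz, threads):
    """Effective clock seen by one thread when threads take equal turns."""
    return system_mhz / threads

def relative_ppa_gain(speedup, area_ratio):
    """Change in performance per area, with performance/area as the metric.

    speedup:    aggregate throughput relative to the original core
    area_ratio: area of the SHP-ed core relative to the original core
    """
    return speedup / area_ratio

speedup = SHP_MHZ / BASE_MHZ
print(speedup)                           # aggregate clock ratio
print(per_thread_mhz(SHP_MHZ, 4))        # 4 threads is a hypothetical count
print(relative_ppa_gain(speedup, 1.5))   # 1.5x area is a hypothetical figure
```

The point of the arithmetic is that the aggregate gain only turns into a performance per area gain if the added thread memories and control logic cost less area than the clock ratio buys.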

microrisc
UARTs
PWMs
...
Standard toolchain compatible (C++,...)
Demo: How to run timing critical software peripherals based on a highly dynamic SHP-ed core.

lowRISC / University of Cambridge
Diagram: Wei Song, lowRISC / University of Cambridge
Wei: personal view

minirisc (based on lowRISC)
Minions part of Rocket core, heterogeneous multi-core
SHP impact on performance per area and SoC architecture
Arbiter/multiplexer: blocking
Multilayer: complex
SHP: time sliced usage at higher speed

Miscellaneous
Simple programming model
Fork-join operations, OpenMP
Estimated ASIC numbers in paper [3]
Altera Hyper Pipelining technology looks promising
More on-FPGA memory (memory wall)
Power consumption (work ongoing)
Source code of projects in PDVL (VHDL, Verilog)
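The fork-join model mentioned above can be illustrated with a short, generic sketch: work is split into chunks, each chunk is handed to its own thread (fork), and the partial results are collected once all threads finish (join). Here operating system threads from the Python standard library stand in for the hardware threads of an SHP-ed core. That mapping, and the names `fork_join` and `parallel_sum`, are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join(func, chunks, workers=4):
    """Run func on every chunk in parallel and return the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(func, c) for c in chunks]   # fork
        return [f.result() for f in futures]               # join

def parallel_sum(data, workers=4):
    """Sum a list by splitting it into at most `workers` chunks."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return sum(fork_join(sum, chunks, workers))

print(parallel_sum(list(range(100))))
```

An OpenMP `parallel for` with a reduction expresses the same pattern in C or C++, which fits the slide's point that the programming model stays simple.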

References
[1] C. Leiserson and J. Saxe, Retiming Synchronous Circuitry, Algorithmica, vol. 6, no. 1, pp. 5-35, 1991.
[2] N. Weaver, Y. Markovskiy, Y. Patel and J. Wawrzynek, Post-Placement C-slow Retiming for the Xilinx Virtex FPGA, FPGA 2003, February 23-25, 2003, Monterey, CA, USA.
[3] T. Strauch, The Effects of System Hyper Pipelining on Three Computational Benchmarks Using FPGAs, 11th International Symposium on Applied Reconfigurable Computing, ARC 2015, 13-17 April 2015, Bochum, Germany, pp. 280-290.

You made it!!! Thank you
@arduissimo
www.cloudx.cc
tobias@cloudx.cc
Call for cooperation: SHP-ed CPU in an ASIC technology