Neural Networks as Function Primitives

Similar documents
Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators

Towards General-Purpose Neural Network Computing

Free Chips Project: a nonprofit for hosting opensource RISC-V implementations, tools, code. Yunsup Lee SiFive

BOSTON UNIVERSITY COLLEGE OF ENGINEERING. Dissertation NEURAL NETWORK COMPUTING USING ON-CHIP ACCELERATORS SCHUYLER ELDRIDGE

Strober: Fast and Accurate Sample-Based Energy Simulation Framework for Arbitrary RTL

Phone: (510) Homepage:

Simulating Multi-Core RISC-V Systems in gem5

Chisel: Constructing Hardware In a Scala Embedded Language

CS152 Laboratory Exercise 2 (Version 1.0.2)

Raven: A CHISEL designed 28nm RISC-V Vector Processor with Integrated Switched- Capacitor DC-DC Converters & Adaptive Clocking

Workshop on Open Source Hardware Development Tools and RISC-V

Space-Time Adventures on Novena Introducing: Balboa

A Retrospective on Par Lab Architecture Research

Raven3: 28nm RISC-V Vector Processor with On-Chip DC/DC Convertors

Industrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

An NVMe-based Offload Engine for Storage Acceleration Sean Gibb, Eideticom Stephen Bates, Raithlin

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

CS 152 Laboratory Exercise 5 (Version C)

RISC V - Architecture and Interfaces

Dr. Yassine Hariri CMC Microsystems

A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators"

Grid Computing. Grid Computing 2

Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim

The lowrisc project Alex Bradbury

Bridging Analog Neuromorphic and Digital von Neumann Computing

Major Information Session ECE: Computer Engineering

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018

From Gates to Compilers: Putting it All Together

Chisel to Chisel 3.0.0

Q: Are power supply attacks in scope for SSITH? A: The hacker team will not have physical access to the power supply.

Git Introduction CS 400. February 11, 2018

OpenPrefetch. (in-progress)

Tessellation: Space-Time Partitioning in a Manycore Client OS

A Hardware Accelerator for Computing an Exact Dot Product. Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović

Q: Are power supply attacks in scope for SSITH? A: The hacker team will not have physical access to the power supply.

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation

Wu Zhiwen.

RISC-V. Palmer Dabbelt, SiFive COPYRIGHT 2018 SIFIVE. ALL RIGHTS RESERVED.

Big Data Meets High-Performance Reconfigurable Computing

A New Era of Silicon Prototyping in Computer Architecture Research

A Streaming Multi-Threaded Model

Implementation of Direct Segments on a RISC-V Processor

Capriccio : Scalable Threads for Internet Services

Real-Time Dynamic Energy Management on MPSoCs

CS 152 Laboratory Exercise 5 (Version B)

HeMPS Platform v7.3. Marcelo Ruaro, Eduardo Wachter, Guilherme Madalozzo, Guilherme Castilhos, André del Mestre Fernando G. Moraes

Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters

Hardware Support for Real-Time Embedded Multiprocessor System-on-a-Chip Memory Management

Enable AI on Mobile Devices

Catbook Workshop 1: Client Side JS. Danny Tang

Embedded Systems. Embedded Programmer. Duration: 2 weeks Rs Language and Tools. Embedded System Introduction. Embedded C programming

Higher Level Programming Abstractions for FPGAs using OpenCL

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design

GreenDroid: An Architecture for the Dark Silicon Age

Re-architecting Virtualization in Heterogeneous Multicore Systems

FULL STACK FLEX PROGRAM

RISC-V Rocket Chip SoC Generator in Chisel. Yunsup Lee UC Berkeley

INET

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

«Real Time Embedded systems» Multi Masters Systems

Cyber Security School

EbbRT: A Framework for Building Per-Application Library Operating Systems

Produced by. Mobile Application Development. Eamonn de Leastar

Deep Learning Accelerators

Outline. Debug Working Group Intro Why does RISC-V need an External Debug Spec? Overview of the Debug Spec v. 013 Future tasks of Debug Working Group

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm

Vinnie Saini Cloud Solution Architect Big Data & AI

CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8)

Tomasz Włostowski Beams Department Controls Group Hardware and Timing Section. Developing hard real-time systems using FPGAs and soft CPU cores

CNS Update. José F. Martínez. M 3 Architecture Research Group

Energy-Efficient RISC-V Processors in 28nm FDSOI

XPU A Programmable FPGA Accelerator for Diverse Workloads

Building Custom SoCs with RocketChip. Howard Mao

VERIFICATION OF RISC-V PROCESSOR USING UVM TESTBENCH

Master Course in Computer Science Orientation day

A Time-Composable Operating System for the Patmos Processor

CS 152 Computer Architecture and Engineering

Development of Complex KNX Devices

Dynamic Memory Management for Real-Time Multiprocessor System-on-a-Chip

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

While waiting for the lecture to begin, please complete. the initial course questionnaire.

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

SDA: Software-Defined Accelerator for general-purpose big data analysis system

Architecture Design Space Exploration Using RISC-V

Bringing up cycle-accurate models of RISC-V cores.


An 80-core GRVI Phalanx Overlay on PYNQ-Z1:

Grassroots ASPLOS. can we still rethink the hardware/software interface in processors? Raphael kena Poss University of Amsterdam, the Netherlands

Technical Committee Update

Dynamic Dataflow. Seminar on embedded systems

ECEN 449 Microprocessor System Design. Hardware-Software Communication. Texas A&M University

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017

Advanced Parameterization Manual

Welcome to BCB/EEOB546X! Computational Skills for Biological Data. Instructors: Matt Hufford Tracy Heath Dennis Lavrov

CSX600 Runtime Software. User Guide

GPUfs: Integrating a file system with GPUs

DESIGN OF A SOFT-CORE PROCESSOR BASED ON OPENCORES WITH ENHANCED PROCESSING FOR EMBEDDED APPLICATIONS

Transcription:

Neural Networks as Function Primitives Software/Hardware Support with X-FILES/DANA Schuyler Eldridge 1 Tommy Unger 2 Marcia Sahaya Louis 1 Amos Waterland 3 Margo Seltzer 3 Jonathan Appavoo 2 Ajay Joshi 1 1 Boston University Department of Electrical and Computer Engineering 2 Boston University Department of Computer Science 3 Harvard University School of Engineering and Applied Sciences Boston Area Architecture Workshop 16 BARC 16 1/8

Neural Networks as Function Primitives Motivation Neural networks and machine learning are everywhere (again) Broad use in high tech and big data, e.g., Google s Tensorflow [1] Enable automatic parallelization, e.g., ASC [3] Provide a means for approximate computing, e.g., NPU [2] Our vision Neural networks are a new functional primitive useful at various scales of computation [4] With that in mind, we ve developed software and hardware for the use of accelerator-backed neural network computation [1] Google TensorFlow, https://github.com/tensorflow/tensorflow [2] H. Esmaeilzadeh, A. Sampson et al., Neural acceleration for general-purpose approximate programs, in Proc. MICRO, 2012. [3] A. Waterland, E. Angelino, R. P. Adams, J. Appavoo, and M. Seltzer, Asc: Automatically scalable computation, in Proc. ASPLOS, 2014. [4] S. Eldridge, A. Waterland et al., Towards general-purpose neural network computing, in Proc. PACT, 2015. BARC 16 2/8

Our Contributions Towards this Vision X-FILES: Software/Hardware Extensions Extensions for the Integration of Machine Learning in Everyday Systems A defined user and supervisor interface for neural networks This includes supervisor architectural state (hardware) DANA: An Example Multi-Transaction Accelerator Dynamically Allocated Neural Network Accelerator An accelerator aligning with our multi transaction vision Neural Network Transactions A transaction encapsulates a request by a process to compute the output of a specific neural network for a provided input BARC 16 3/8

Or: A Drop in Accelerator for a RISC-V Microprocessor What does that mean? 1 Grab a Rocket Chip RISC-V Microprocessor [1] 2 Build a RISC-V toolchain 3 Grab a copy of our X-FILES/DANA accelerator [2] Implemented in Chisel [3] 4 Build an FPGA configuration for Rocket + X-FILES/DANA 5 User processes can safely throw transactions at X-FILES hardware With support for feedforward and learning computation [1] Rocket Chip git repository, UC Berkeley, Online: github.com/ucb-bar/rocket-chip [2] X-FILES/DANA git repository, Boston University, Online (soon!): github.com/bu-icsg/xfiles-dana [3] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis et al., Chisel: Constructing hardware in a scala embedded language, in Proc. DAC, 2012, pp. 1216 1225. BARC 16 4/8

X-FILES Software Components Supervisor API Establishes sets of processes that can access neural network hardware Defines the neural networks that processes are allowed to access More details on the poster! User API Works at the level of transactions A complete request for access to neural network resources, communication of inputs, processing, and communication of outputs Initiating a new transaction Writing data Reading data BARC 16 5/8

X-FILES/DANA Hardware Components Rocket Core L1$ ASID ASID-NNID Table Walker Local Storage Num ASIDs ASID-NNID Table Pointer Transaction Table ASID TID NNID State X-Files Hardware Arbiter PE RR Arbiter PE Table State Configuration Cache ASID NNID State Control Cache Memory DANA Accelerator Figure: X-FILES/DANA hardware architecture X-FILES Hardware Arbiter maintaining transaction state DANA to move transactions towards completion With support for feedforward or learning computation BARC 16 6/8

Open Source Plans Remaining Items Linux kernel integration Support for asynchronous data transfer Open Source Availability Should be ready by the end of February On GitHub: github.com/bu-icsg/xfiles-dana BARC 16 7/8

Acknowledgments This work was supported by the following: A NASA Space Technology Research Fellowship An NSF Graduate Research Fellowship NSF CAREER awards A Google Faculty Research Award BARC 16 8/8