High-Throughput and High-Accuracy Classification with Convolutional Ternary Neural Networks. Frédéric Pétrot, Adrien Prost-Boucle, Alban Bourge

1 High-Throughput and High-Accuracy Classification with Convolutional Ternary Neural Networks
Frédéric Pétrot, Adrien Prost-Boucle, Alban Bourge
International Workshop on Highly Efficient Neural Processing, October 4th, 2018

2 CNN Models: Accuracy, Operations and Parameters
Source: A. Canziani, E. Culurciello, A. Paszke, An Analysis of Deep Neural Network Models for Practical Applications, 2017
Pétrot, Prost-Boucle, Bourge, HENP 18, October 4th

3 CNN Models: Accuracy, Operations and Parameters
Challenges in Embedded Neural Networks
- Limit the number of parameters (weight values)
- Limit the number of bits of weights and activations
- Integrate many memory cuts with processing elements
- Integrate computation into the memory itself

7 CNN Models: Accuracy, Operations and Parameters
In other words,... (K. Usher, The Dwarf in the Dirt, Bones, 2009)
Let's use ternary {-1, 0, 1} weights and activations on FPGA
- FPGA: great digital PIM (processing in memory)
- Hardwiring ANN too risky: new and better ANN every other day
Ternarization: classification error rates on NN-64 (%), Float vs. Ternary, on CIFAR-10, SVHN and GTSRB (table values not preserved in this transcription)
Source: H. Alemdar et al., Ternary neural networks for resource-efficient AI applications, IJCNN 17

8 Why Ternary Convolutional Neural Networks?
Objectives
- Energy-efficient inference for AI tasks
- Without sacrificing too much accuracy (valid at a point in time: learning methods improve continuously)
Solution
- Ternarize weights and activations to {-1, 0, 1} (a)
- Sweet spot between resource usage and accuracy
(a) "Perhaps the prettiest number system of all is the balanced ternary notation." Donald Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms.

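9 Training Ternary Neural Networks: Teacher-Student Approach
Teacher vs. student
- Student parameters: {-1, 0, 1}
- Neuron input: any
- Activation function: any function with range (-1, 1) followed by stochastic firing for the teacher, 2-threshold step for the student
- Neuron output: {-1, 0, 1} for both teacher and student
Teacher
\[
\rho = \tanh(y_i), \qquad
n_i = \begin{cases}
-1 & \text{with prob. } -\rho \text{ if } \rho < 0 \\
1 & \text{with prob. } \rho \text{ if } \rho > 0 \\
0 & \text{otherwise}
\end{cases}
\]
Student
\[
n_i = \begin{cases}
-1 & \text{if } y_i < b_{lo} \\
1 & \text{if } y_i > b_{hi} \\
0 & \text{otherwise}
\end{cases}
\]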
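The two firing rules on this slide can be sketched in a few lines of Python. This is only an illustration of the formulas, not the authors' training code; the function names `teacher_fire` and `student_fire` and the threshold values used in the example are assumptions.

```python
import math
import random


def teacher_fire(y, rng=random):
    """Stochastic ternary firing: rho = tanh(y), fire sign(rho) with probability |rho|."""
    rho = math.tanh(y)
    if rng.random() < abs(rho):
        return 1 if rho > 0 else -1
    return 0


def student_fire(y, b_lo, b_hi):
    """Deterministic two-threshold step: -1 below b_lo, +1 above b_hi, 0 in between."""
    if y < b_lo:
        return -1
    if y > b_hi:
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only for the demonstration.
    for y in (-2.0, -0.2, 0.0, 0.4, 3.0):
        print(y, teacher_fire(y), student_fire(y, b_lo=-1.0, b_hi=1.0))
```

The teacher output is random for moderate inputs, while the student output depends only on where the input falls relative to the two thresholds, which is what makes the student cheap to implement in hardware.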

10 Ternary Neural Networks: Teacher-Student Individual Training

11 Experiments with Ternary Networks
Multiple networks
- VGG-like networks
- Two geometries: NN-64 and NN-128
- Multiple acceleration factors inside the network (ranging from 1 to 256)
- Trade-off between area, throughput and energy
Automation (kind of...)
- Handmade generic hardware building blocks (VHDL)
- Automatically generated networks
- Customizable home-made tools (old-school C and Tcl)

12 Overview
- 12 theoretical layers
- 29 physical layers (+30 glue FIFOs)
- NN-64: 1930 neurons, 3.5 M parameters
- NN-128: 3850 neurons, 14 M parameters
- Goal of ternary: have parameters fit in FPGA distributed memories
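To see why ternary weights help parameters fit on chip, a rough storage estimate can be computed from the parameter counts above. This is a back-of-the-envelope sketch: the 32-bit float baseline and the 2-bit-per-trit encoding are assumptions made for the comparison, and `weight_storage_mbit` is a hypothetical helper.

```python
def weight_storage_mbit(n_params, bits_per_weight):
    """Raw weight storage in megabits, ignoring any memory-block granularity."""
    return n_params * bits_per_weight / 1e6


if __name__ == "__main__":
    networks = {"NN-64": 3.5e6, "NN-128": 14e6}  # parameter counts from the slide
    for name, n_params in networks.items():
        as_float = weight_storage_mbit(n_params, 32)  # assumed float32 baseline
        as_trits = weight_storage_mbit(n_params, 2)   # assumed naive 2-bit trit encoding
        print(f"{name}: {as_float:.0f} Mb as float32, {as_trits:.0f} Mb as 2-bit trits")
```

Under these assumptions the ternary encoding needs 16 times less storage than 32-bit floats.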

13 Overview

14 FPGA Design for Ternary Convolutional Neural Networks
Neurons and Parallelism
Max NN-64 acceleration factor that fits on a VC709: 128

15 Acceleration factors
- Base implementation: at most 1 activation transferred between layers in 1 cycle
- But some layers take more time than others to compute, which causes stalls
- Use parallelism at layer level: transfer several activations in 1 cycle in/out of bottleneck layers
Table: parallelism per layer (in/out) for each NN size and acceleration factor, with columns NL1, NL2, MPL1, NL3, NL4, MPL2, NL5, NL6, MPL3, NL7, NL8, NL (cell values garbled in this transcription)
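The balancing idea can be illustrated with a toy pipeline model: the frame interval is set by the slowest layer, so parallelism is added only where it is needed. This is a simplified sketch and not the authors' tool; the per-layer cycle counts are invented, and restricting parallelism to powers of two is an assumption.

```python
import math


def pipeline_interval(cycles_per_layer, parallelism):
    """Cycles between two frames: the slowest layer after applying its parallelism."""
    return max(math.ceil(c / p) for c, p in zip(cycles_per_layer, parallelism))


def balance(cycles_per_layer, target_interval):
    """Smallest power-of-two parallelism per layer that meets the target interval."""
    result = []
    for cycles in cycles_per_layer:
        p = 1
        while math.ceil(cycles / p) > target_interval:
            p *= 2
        result.append(p)
    return result


if __name__ == "__main__":
    cycles = [4096, 16384, 1024, 8192]       # hypothetical per-layer costs
    base = [1] * len(cycles)
    print(pipeline_interval(cycles, base))   # the second layer is the bottleneck
    tuned = balance(cycles, target_interval=1024)
    print(tuned, pipeline_interval(cycles, tuned))
```

Layers that already meet the target keep a parallelism of 1, so extra hardware is spent only on bottleneck layers.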

16 Squeezing High-Efficiency TCNN in FPGA: Adder Trees
Ternary adders
- Sum of trits vs. sum of bits
- With $(x, y) \in \{-1, 0, 1\}^2$, $x + y \in \{-2, -1, 0, 1, 2\}$
LUT savings with optimized ternary adder tree
- Columns: number of inputs, generic 2-bit radix-2 adder tree (LUT), optimized ternary adder (LUT), savings
- Only the savings column survives in this transcription: 33.3%, 57.1%, 52.3%, 51.1%, 50.3%, 51.6%, 51.8%, 51.0%
Overall LUT savings when using optimized ternary adder trees, by acceleration factor
- First NN row: (first value lost), 1.38%, 4.61%, 10.9%, 17.3%, 24.1%, 32.0%
- Second NN row: (first value lost), 1.71%, 5.63%, 12.9%, 19.6%, 25.7%
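A small software model shows why a sum of trits differs from a sum of bits: two trits already span five values, so the result width grows with the range [-n, n]. The sketch below models a pairwise adder tree and the two's-complement width of its result; it does not reproduce the optimized LUT mapping from the slides, and all function names are assumptions.

```python
def signed_bits(n_trits):
    """Two's-complement bits needed for a sum of n trits, which lies in [-n, n]."""
    return n_trits.bit_length() + 1


def adder_tree(trits):
    """Reduce a list of trits pairwise, level by level, as a hardware adder tree would."""
    level = list(trits)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def sum_via_counts(trits):
    """Equivalent formulation: number of +1 inputs minus number of -1 inputs."""
    return sum(t == 1 for t in trits) - sum(t == -1 for t in trits)


if __name__ == "__main__":
    values = [1, -1, 0, 1, 1]
    print(adder_tree(values), sum_via_counts(values), signed_bits(len(values)))
```

The counting form makes the link to bit-level popcounts explicit, which is one common way to think about ternary accumulation.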

17 Squeezing High-Efficiency TCNN in FPGA: Weight Compression
Trits encoding
- Naïve encoding: 2 bits to encode 1 trit, which is suboptimal
- Ex.: 3 trits encoded on 6 bits, while the $3^3 = 27$ combinations are encodable on 5 bits
- Optimal number of bits for $T$ trits: $b = \lceil \log_2(3^T) \rceil = \lceil T \log_2(3) \rceil$
- Minimal number of bits per trit: $b/T \to \log_2(3) \approx 1.585$ bits
- Maximum saving: $1 - \log_2(3)/2 \approx 20.75\%$ (Shannon limit)
Interesting cases
- 3 trits / 5 bits: 16 % saving
- 5 trits / 8 bits: 20 % saving
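The 5-trits-in-8-bits case can be illustrated with a plain base-3 packing, since $3^5 = 243 \le 256$. This is a minimal sketch of the encoding idea only; the actual hardware decoder used by the authors is not described in the transcription, and `pack_trits`, `unpack_trits`, `bits_needed` and `saving` are hypothetical names.

```python
def bits_needed(n_trits):
    """Smallest number of bits able to encode all 3**n_trits combinations."""
    return (3 ** n_trits - 1).bit_length()


def pack_trits(trits):
    """Encode trits in {-1, 0, 1} as one integer, using base-3 digits 0..2."""
    value = 0
    for position, trit in enumerate(trits):
        value += (trit + 1) * 3 ** position
    return value


def unpack_trits(value, n_trits):
    """Inverse of pack_trits."""
    trits = []
    for _ in range(n_trits):
        value, digit = divmod(value, 3)
        trits.append(digit - 1)
    return trits


def saving(n_trits):
    """Relative saving compared with the naive 2 bits per trit."""
    return 1 - bits_needed(n_trits) / (2 * n_trits)


if __name__ == "__main__":
    weights = [1, -1, 0, 0, 1]
    code = pack_trits(weights)
    print(code, bits_needed(5), unpack_trits(code, 5))
    print(f"3t5b saving: {saving(3):.1%}, 5t8b saving: {saving(5):.1%}")
```

Decoding in software uses divisions by 3; in an FPGA a small lookup table per packed word is a more natural fit, which is where the BRAM-versus-logic trade-off comes from.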

18 Squeezing High-Efficiency TCNN in FPGA: Weight Compression
Compression of weights

19 Squeezing High-Efficiency TCNN in FPGA: Weight Compression
Trading off BRAM vs. logic: resource breakdown and power analysis
[Plots: NN-64 with compression 3t5b; NN-64 with compression 5t8b. y-axis: % of change with respect to the same degree of parallelism without compression]

20 Measured Results for NN-64 on Xilinx … MHz (VC709)
Resource usage (most absolute counts are lost in this transcription)
Acc. factor | LUT (logic) | LUTRAM     | BRAM 18k     | FF
…           | … (39.4%)   | … (21.47%) | 1410 (48.0%) | … (37.1%)
256*        | … (69.9%)   | … (60.9%)  | 2920 (96.7%) | … (74.0%)
NN-64 with parallelism degree 128
- Uses half of the FPGA resources, reaches a max throughput of 60.2k fps (32 x 32)
- (LUT+B)RAM throughput of 18.7 Tb/s (290 Gb/s for FC layers only (a))
- End-to-end latency including PCIe + RIFFA: 135 µs
- Max performance: 18.7 T(T)OP/s (9.33 T(T)MAC/s)
- Power at max performance: 11.5 W (idle FPGA: 2 W)
- Peak efficiency of 5226 fps per Watt, i.e. 1.62 T(T)OP/s/W or 810 G(T)MAC/s/W
(a) VC709 DRAM throughput: 204 Gb/s
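The efficiency figures follow from the throughput and power numbers by simple division, which the sketch below reproduces. The inputs are the rounded values from the slide, so the results may differ slightly from the reported ones (for instance, the computed fps per watt comes out a little above the reported 5226, presumably because of rounding in the published inputs). The helper name `efficiency` is an assumption.

```python
def efficiency(fps, tera_ops, tera_macs, power_w):
    """Derive per-watt and per-frame figures from throughput and power."""
    return {
        "fps_per_w": fps / power_w,
        "tops_per_w": tera_ops / power_w,
        "gmacs_per_w": tera_macs * 1e3 / power_w,
        "ops_per_frame": tera_ops * 1e12 / fps,
    }


if __name__ == "__main__":
    # Rounded values taken from the slide.
    figures = efficiency(fps=60.2e3, tera_ops=18.7, tera_macs=9.33, power_w=11.5)
    for name, value in figures.items():
        print(f"{name}: {value:,.2f}")
```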

21 Take away
- FPGAs are a very good fit for ANN if weights fit in internal memory
- Extreme quantization needed
- Huge weight access throughput possible
- FPGAs are reconfigurable! Who would be mad enough to hardwire a given ANN architecture anyway?
- ASICs follow a very different (but equally useful) architectural path
- Creative low-level optimizations help squeeze in high-efficiency networks (NN-64)
- High throughput: up to 60.2k fps
- Low latency: … k fps
- High power efficiency: 1.62 T(T)OP/s/W or … k fps

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Revolutionizing the Datacenter

Revolutionizing the Datacenter Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5

More information

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (XIR & NTNU), Nick Fraser (XIR & USydney), Giulio Gambardella (XIR), Michaela Blott (XIR), Philip Leong (USydney),

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (NTNU & Xilinx Research Labs Ireland) in collaboration with N Fraser, G Gambardella, M Blott, P Leong, M Jahre and

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

arxiv: v2 [cs.ar] 15 May 2018

arxiv: v2 [cs.ar] 15 May 2018 [DL] A Survey of FPGA Based Neural Network Accelerator arxiv:1712.08934v2 [cs.ar] 15 May 2018 KAIYUAN GUO, SHULIN ZENG, JINCHENG YU, YU WANG AND HUAZHONG YANG, Tsinghua University, China Recent researches

More information

Profiling the Performance of Binarized Neural Networks. Daniel Lerner, Jared Pierce, Blake Wetherton, Jialiang Zhang

Profiling the Performance of Binarized Neural Networks. Daniel Lerner, Jared Pierce, Blake Wetherton, Jialiang Zhang Profiling the Performance of Binarized Neural Networks Daniel Lerner, Jared Pierce, Blake Wetherton, Jialiang Zhang 1 Outline Project Significance Prior Work Research Objectives Hypotheses Testing Framework

More information

direct hardware mapping of cnns on fpga-based smart cameras

direct hardware mapping of cnns on fpga-based smart cameras direct hardware mapping of cnns on fpga-based smart cameras Workshop on Architecture of Smart Cameras Kamel ABDELOUAHAB, Francois BERRY, Maxime PELCAT, Jocelyn SEROT, Jean-Charles QUINTON Cordoba, June

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Junzhong Shen, You Huang, Zelong Wang, Yuran Qiao, Mei Wen, Chunyuan Zhang National University of Defense Technology,

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System

Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System Xiaofan Zhang1, Mohamed El Hadedy1, Wen-mei Hwu1, Nam Sung Kim1, Jinjun Xiong2, Deming Chen1 1 University of Illinois Urbana-Champaign

More information

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,

Index. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning, Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A

More information

DeepLearning on FPGAs

DeepLearning on FPGAs DeepLearning on FPGAs Introduction to FPGAs Sebastian Buschjäger Technische Universität Dortmund - Fakultät Informatik - Lehrstuhl 8 October 24, 2017 1 Recap: Convolution Observation 1 Even smaller images

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Lab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm

Lab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Lab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm 1 Introduction

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy Optimize Deep Convolutional Neural Network with Ternarized Weights and High Zhezhi He Department of ECE University of Central Florida Elliot.he@knights.ucf.edu Boqing Gong Tencent AI Lab Bellevue, WA 98004

More information

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy Optimize Deep Convolutional Neural Network with Ternarized Weights and High Zhezhi He, Boqing Gong, and Deliang Fan Department of Electrical and Computer Engineering, University of Central Florida, Orlando,

More information

arxiv: v1 [cs.cv] 11 Feb 2018

arxiv: v1 [cs.cv] 11 Feb 2018 arxiv:8.8v [cs.cv] Feb 8 - Partitioning of Deep Neural Networks with Feature Space Encoding for Resource-Constrained Internet-of-Things Platforms ABSTRACT Jong Hwan Ko, Taesik Na, Mohammad Faisal Amir,

More information

arxiv: v1 [cs.lg] 17 Jan 2019

arxiv: v1 [cs.lg] 17 Jan 2019 CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar Department of Electrical and Computer Engineering, University of California

More information

A Lightweight YOLOv2:

A Lightweight YOLOv2: FPGA2018 @Monterey A Lightweight YOLOv2: A Binarized CNN with a Parallel Support Vector Regression for an FPGA Hiroki Nakahara, Haruyoshi Yonekawa, Tomoya Fujii, Shimpei Sato Tokyo Institute of Technology,

More information

Neural Computer Architectures

Neural Computer Architectures Neural Computer Architectures 5kk73 Embedded Computer Architecture By: Maurice Peemen Date: Convergence of different domains Neurobiology Applications 1 Constraints Machine Learning Technology Innovations

More information

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks

Recurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and

More information

An FPGA Based Adaptive Viterbi Decoder

An FPGA Based Adaptive Viterbi Decoder An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Deep Learning with CPU, GPU, and FPGA Technologies Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

More information

THE NVIDIA DEEP LEARNING ACCELERATOR

THE NVIDIA DEEP LEARNING ACCELERATOR THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural

More information

Lecture 12: Model Serving. CSE599W: Spring 2018

Lecture 12: Model Serving. CSE599W: Spring 2018 Lecture 12: Model Serving CSE599W: Spring 2018 Deep Learning Applications That drink will get you to 2800 calories for today I last saw your keys in the store room Remind Tom of the party You re on page

More information

arxiv: v3 [cs.lg] 27 Mar 2018

arxiv: v3 [cs.lg] 27 Mar 2018 ReBNet: Residual Binarized Neural Network Mohammad Ghasemzadeh, Mohammad amragh, and Farinaz Koushanfar Department of Electrical and Computer Engineering, University of California an Diego {mghasemzadeh,

More information

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations, and Hardware Implications Jongsoo Park Facebook AI System SW/HW Co-design Team Sep-21 2018 Team Introduction

More information

Implementation of 14 bits floating point numbers of calculating units for neural network hardware development

Implementation of 14 bits floating point numbers of calculating units for neural network hardware development International Conference on Recent Trends in Physics 206 (ICRTP206) Journal of Physics: Conference Series 755 (206) 000 doi:0.088/742-6596/755//000 Implementation of 4 bits floating numbers of calculating

More information

Utilizing SDSoC to Port Convolutional Neural Network to a Space-grade FPGA

Utilizing SDSoC to Port Convolutional Neural Network to a Space-grade FPGA Utilizing SDSoC to Port Convolutional Neural Network to a Space-grade FPGA Josh Anderson joshua.anderson@swri.org Southwest Research Institute 1 Objective Compress MASPEX instrument data Produces ~80MB

More information

Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.

Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al. Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.) Andreas Kurth 2017-12-05 1 In short: The situation Image credit:

More information

Semantic Image Search. Alex Egg

Semantic Image Search. Alex Egg Semantic Image Search Alex Egg Inspiration Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing

More information

Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network

Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network Switched by Input: Power Efficient Structure for RRAMbased Convolutional Neural Network Lixue Xia, Tianqi Tang, Wenqin Huangfu, Ming Cheng, Xiling Yin, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E., Tsinghua

More information

SpaceWire Technologies deliver multi-gigabit data rates for on-board Spacecraft. SpaceTech Expo Gregor Cranston Business Development Manager

SpaceWire Technologies deliver multi-gigabit data rates for on-board Spacecraft. SpaceTech Expo Gregor Cranston Business Development Manager SpaceWire Technologies deliver multi-gigabit data rates for on-board Spacecraft SpaceTech Expo 2013 Gregor Cranston Business Development Manager 1 Introducing SpaceFibre A very high-speed serial data-link

More information

RFNoC Neural-Network Library using Vivado HLS (rfnoc-hls-neuralnet) EJ Kreinar Team E to the J Omega

RFNoC Neural-Network Library using Vivado HLS (rfnoc-hls-neuralnet) EJ Kreinar Team E to the J Omega RFNoC Neural-Network Library using Vivado HLS (rfnoc-hls-neuralnet) EJ Kreinar Team E to the J Omega Overview An RFNoC out-of-tree module that can be used to simulate, synthesize, and run a neural network

More information

Fast Flexible FPGA-Tuned Networks-on-Chip

Fast Flexible FPGA-Tuned Networks-on-Chip This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

Xilinx ML Suite Overview

Xilinx ML Suite Overview Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Binary Convolutional Neural Network on RRAM

Binary Convolutional Neural Network on RRAM Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua

More information

A New Era of Hardware Microservices in the Cloud. Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017

A New Era of Hardware Microservices in the Cloud. Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017 A New Era of Hardware Microservices in the Cloud Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017 Moore s Law Dennard Scaling has been dead for a decade Moore s La is o er

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

Tracking Acceleration with FPGAs. Future Tracking, CMS Week 4/12/17 Sioni Summers

Tracking Acceleration with FPGAs. Future Tracking, CMS Week 4/12/17 Sioni Summers Tracking Acceleration with FPGAs Future Tracking, CMS Week 4/12/17 Sioni Summers Contents Introduction FPGAs & 'DataFlow Engines' for computing Device architecture Maxeler HLT Tracking Acceleration 2 Introduction

More information

Exploring Automatically Generated Platforms in High Performance FPGAs

Exploring Automatically Generated Platforms in High Performance FPGAs Exploring Automatically Generated Platforms in High Performance FPGAs Panagiotis Skrimponis b, Georgios Zindros a, Ioannis Parnassos a, Muhsen Owaida b, Nikolaos Bellas a, and Paolo Ienne b a Electrical

More information

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis

Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Resolve: Generation of High Performance Sorting Architectures from High Level Synthesis Janarbek Matai, Dustin Richmond, Dajung Lee, Zac Blair, Qiongzhi Wu, Amin Abazari, Ryan Kastner Department of Computer

More information

PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory

PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory Scalable and Energy-Efficient Architecture Lab (SEAL) PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in -based Main Memory Ping Chi *, Shuangchen Li *, Tao Zhang, Cong

More information

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael

More information

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018 Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks

More information

How to Estimate the Energy Consumption of Deep Neural Networks

How to Estimate the Energy Consumption of Deep Neural Networks How to Estimate the Energy Consumption of Deep Neural Networks Tien-Ju Yang, Yu-Hsin Chen, Joel Emer, Vivienne Sze MIT 1 Problem of DNNs Recognition Smart Drone AI Computation DNN 15k 300k OP/Px DPM 0.1k

More information

efpga for Neural Network based Image Recognition

efpga for Neural Network based Image Recognition efpga for Neural Network based Image Recognition June 26, 2018 Yoan Dupret Managing Director Menta yoan.dupret@menta-efpga.com Copyright @ 2018 Menta S.A.S. Menta Overview 17 employees 11 years of R&D

More information

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray GRVI Phalanx A Massively Parallel RISC-V FPGA Accelerator Accelerator Jan Gray jan@fpga.org Introduction FPGA accelerators are hot MSR Catapult. Intel += Altera. OpenPOWER + Xilinx FPGAs as computers Massively

More information

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands MSc THESIS. Exploring Convolutional Neural Networks on the

Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands   MSc THESIS. Exploring Convolutional Neural Networks on the Computer Engineering Mekelweg 4, 2628 CD Delft The Netherlands http://ce.et.tudelft.nl/ 2018 MSc THESIS Exploring Convolutional Neural Networks on the ρ-vex architecture Jonathan Tetteroo Abstract As machine

More information

DEEP LEARNING ACCELERATOR UNIT WITH HIGH EFFICIENCY ON FPGA

DEEP LEARNING ACCELERATOR UNIT WITH HIGH EFFICIENCY ON FPGA DEEP LEARNING ACCELERATOR UNIT WITH HIGH EFFICIENCY ON FPGA J.Jayalakshmi 1, S.Ali Asgar 2, V.Thrimurthulu 3 1 M.tech Student, Department of ECE, Chadalawada Ramanamma Engineering College, Tirupati Email

More information

Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization

Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization Mohd Amiruddin Zainol Department of Electrical and Electronic Engineering, University

More information

M.Tech Student, Department of ECE, S.V. College of Engineering, Tirupati, India

M.Tech Student, Department of ECE, S.V. College of Engineering, Tirupati, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 High Performance Scalable Deep Learning Accelerator

More information
