Neural Computer Architectures
1 Neural Computer Architectures 5kk73 Embedded Computer Architecture By: Maurice Peemen Date:
2 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
3 Biological Neural Networks 2
4 Biological Neural Networks: presynaptic neuron, postsynaptic neuron, cell body, synapses
5 Perceptron Model (1957): feed-forward processing; the weights are tuned by learning; it cannot solve non-linearly separable problems (shown in 1969). Computation: p = b + Σ_{k=1}^{K-1} x[k]·w[k], y = φ(p), where b is the bias and φ is a step or sigmoid activation.
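A minimal sketch (not from the slides) of the perceptron computation above; the AND weights are an illustrative choice:

```python
import math

def perceptron(x, w, b, activation="step"):
    """Weighted sum of the inputs plus a bias, followed by a non-linearity."""
    p = b + sum(xk * wk for xk, wk in zip(x, w))
    if activation == "step":
        return 1.0 if p >= 0 else 0.0
    return 1.0 / (1.0 + math.exp(-p))  # sigmoid alternative

# A single perceptron can realize linearly separable functions such as AND,
# but no choice of w and b realizes XOR (the 1969 critique).
w_and, b_and = [1.0, 1.0], -1.5
outputs = [perceptron([a, c], w_and, b_and) for a, c in [(0, 0), (0, 1), (1, 0), (1, 1)]]
assert outputs == [0.0, 0.0, 0.0, 1.0]
```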
6 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
7 Multi-Layer Perceptron (1979): an input layer, a hidden layer, and an output layer; training is done by error back-propagation against a target output.
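A minimal sketch (not from the slides) of error back-propagation for a one-hidden-layer perceptron, trained on XOR, the classic non-linearly-separable case; the hidden size, learning rate, and seed are illustrative choices:

```python
import math, random

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

random.seed(0)
H = 4                                                   # hidden units (illustrative)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR

def forward(x):
    h = [sigmoid(b1[j] + sum(W1[j][k] * x[k] for k in range(2))) for j in range(H)]
    y = sigmoid(b2 + sum(W2[j] * h[j] for j in range(H)))
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = loss()
lr = 1.0
for _ in range(2000):
    for x, t in data:
        h, y = forward(x)
        dy = 2 * (y - t) * y * (1 - y)                  # output-layer error term
        for j in range(H):
            dh = dy * W2[j] * h[j] * (1 - h[j])         # error propagated back to hidden unit j
            W2[j] -= lr * dy * h[j]
            for k in range(2):
                W1[j][k] -= lr * dh * x[k]
            b1[j] -= lr * dh
        b2 -= lr * dy

assert loss() < before                                  # training reduced the error
```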
8 The Hype Curve of Neural Networks: level of interest over time: the Perceptron (1957), the non-linear separability critique (1969), the Multi-Layer Perceptron revival, the SVM (1998), and today's resurgence.
9 Deep Big Neural Networks: deep, big neural networks outperform SVMs; ANNs are now state-of-the-art classifiers again. 5 layers, 1000s of nodes, connection constraints. H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," ICML 2007.
10 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
11 Classification: Face detection 10
12 Intelligent Vision Applications: an emerging field of research with applications in many domains, e.g. security, industrial, medical, automotive.
14 Intelligent Vision Applications (medical example): detected states such as "old man", "breathing", "heart beat", "no action".
17 Classical recognition systems are stupid: the design is based on knowledge of the task, a carefully tuned pipeline of algorithms that gets really complex for real-world problems, and the design must be redone if the task changes. Typical pipeline stages: light correction, histogram stretch, colour thresholding, edge detection, corner detection, shape recognition, Hough transform, matching, neural networks.
18 Train a Neural Network for the task: focus on data instead of algorithm complexity; pre-process the data to generate more examples; use a test set to verify generalization. Classes: 30, 50, 60, 70, 80, 90, and 100 km/h, plus background. Background images are hard to suppress, so random background image patches are added as training examples.
19 Biologically inspired object recognition: the Convolutional Neural Network, a deep and big neural network. Input 32x32 → C1 feature maps 28x28 (5x5 convolution) → S1 feature maps 14x14 (2x2 subsampling) → C2 feature maps 10x10 (5x5 convolution) → S2 feature maps 5x5 (2x2 subsampling) → neuron layers n1 (5x5 convolution, 100 neurons) and n2 (1x1 convolution) → output sign. The convolution and subsampling stages perform feature extraction; the final layers perform classification.
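The feature-map sizes in this pipeline follow directly from valid convolution and non-overlapping subsampling; a short sketch checking the arithmetic:

```python
def conv_out(n, k):
    """Valid k x k convolution: the output shrinks by k - 1."""
    return n - k + 1

def subsample_out(n, s):
    """Non-overlapping s x s subsampling halves (for s=2) each dimension."""
    return n // s

n = 32                                    # input 32x32
n = conv_out(n, 5);      assert n == 28   # C1 feature maps
n = subsample_out(n, 2); assert n == 14   # S1 feature maps
n = conv_out(n, 5);      assert n == 10   # C2 feature maps
n = subsample_out(n, 2); assert n == 5    # S2 feature maps
n = conv_out(n, 5);      assert n == 1    # n1: a 5x5 convolution reduces S2 to 1x1
```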
20 Detection and Recognition Application 19
24 Speed Sign Detection and Recognition 23
25 Advantage of flexibility: extend an existing trained network; add new road signs and restart the training; a new weight file is new functionality; send the new weight file to users (~100 KB).
27 Major road detection 26
28 What can these NNs further do: classification, approximation, optimization, clustering.
29 Function Approximation: stock market prediction (Black-Scholes).
30 Placement Optimization: chip routing (the Canneal workload); minimize wire length with a Hopfield Neural Network.
31 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
32 Technology Constraints 31 Dark Silicon Defect tolerance
33 Dark Silicon: what to do with chips that are too hot? Reduce the clock frequency; go multi-core. If the chip is still too hot? Turn parts of the chip off! This generates dark silicon.
34 Energy Efficiency: supercomputer (K computer, Fujitsu): 8.2 billion Megaflops at 9.9 million watts, ~800 Megaflops/watt. iPad: 2.5 watts, ~68 Megaflops/watt. Human brain: 2.2 billion Megaops at 20 watts, ~110 Teraops/watt.
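The efficiency ratios on this slide follow from simple division; a sketch checking the arithmetic (treating ops and flops as comparable, as the slide implicitly does):

```python
# K computer (Fujitsu): 8.2 billion Megaflops at 9.9 million watts
k_mflops_per_watt = 8.2e9 / 9.9e6
assert 800 <= k_mflops_per_watt <= 850        # ~828, quoted on the slide as ~800

# Human brain estimate: 2.2 billion Megaops at 20 watts
brain_teraops_per_watt = (2.2e9 * 1e6) / 20 / 1e12   # Megaops -> ops, then per watt
assert abs(brain_teraops_per_watt - 110) < 1e-6

# The brain comes out roughly 130,000x more efficient per operation
ratio = (brain_teraops_per_watt * 1e6) / k_mflops_per_watt
assert 1.0e5 < ratio < 1.5e5
```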
35 Toward Heterogeneous Systems: efficient accelerators, multi-purpose ASICs. The ANN is a candidate: flexible functionality, state-of-the-art results, parallelism.
36 Developing ANN Accelerators: the kernel is y_i = Sigmoid(b_i + Σ_k x_k · w_ik), or in pseudocode:

    for i = 1:N
        Y[i] = Bias[i]
        for k = 1:K
            Y[i] += X[k] * W[i][k]
        Y[i] = Sigmoid(Y[i])
37 Time-Multiplexed Accelerator: the same loop nest mapped onto a single time-shared datapath. Per neuron: load the bias, perform the MACCs, approximate the sigmoid φ(x) = 1/(1 + exp(-a·x)). The bias load reuses the MACC path by feeding a constant input X = 1 whose weight is the bias, W[i][1] = Bias[i].
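A behavioural sketch (not the slide's RTL) of the time-multiplexed datapath, including the bias-as-weight trick: a constant input 1 is prepended so the pipeline needs only MACCs plus an activation:

```python
import math

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

def layer_direct(X, W, Bias):
    """Reference: y_i = sigmoid(b_i + sum_k x_k * w_ik)."""
    return [sigmoid(Bias[i] + sum(X[k] * W[i][k] for k in range(len(X))))
            for i in range(len(W))]

def layer_time_multiplexed(X, W, Bias):
    """Fold the bias into the MACC stream: prepend constant input 1 whose
    weight is the bias, so the datapath is only MACCs + sigmoid."""
    Xe = [1.0] + X
    We = [[Bias[i]] + W[i] for i in range(len(W))]
    Y = []
    for i in range(len(We)):          # one MACC unit, time-multiplexed over neurons
        acc = 0.0
        for k in range(len(Xe)):
            acc += Xe[k] * We[i][k]   # multiply-accumulate
        Y.append(sigmoid(acc))
    return Y

X = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, 0.3], [-0.4, 0.5, -0.6]]
Bias = [0.05, -0.1]
a = layer_direct(X, W, Bias)
b = layer_time_multiplexed(X, W, Bias)
assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
```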
38 Analog: the Intel ETANN, Electrically Trainable Analog Neural Network. Analog Gilbert-multiplier circuits; sum differential currents from the synapses and convert to a voltage; weights stored as electrical charge on floating gates; analog sigmoid activation function.
39 Digital Implementation: multiply-accumulate datapath plus a sigmoid-function look-up table; within each table segment, use the linear approximation φ(x) ≈ a_i·x + b_i.
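A sketch of such a piecewise-linear sigmoid LUT; the range, segment count, and slope a = 1 are illustrative choices, not the slide's parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Build a small table of (a_i, b_i) line segments over [-8, 8].
LO, HI, SEGS = -8.0, 8.0, 32
step = (HI - LO) / SEGS
table = []
for i in range(SEGS):
    x0, x1 = LO + i * step, LO + (i + 1) * step
    a = (sigmoid(x1) - sigmoid(x0)) / step   # segment slope a_i
    b = sigmoid(x0) - a * x0                 # segment intercept b_i
    table.append((a, b))

def sigmoid_pwl(x):
    """Look up the segment and evaluate a_i * x + b_i; clamp outside range."""
    if x <= LO:
        return 0.0
    if x >= HI:
        return 1.0
    a, b = table[min(int((x - LO) / step), SEGS - 1)]
    return a * x + b

# With 32 segments, the worst-case error over [-10, 10] is already tiny.
worst = max(abs(sigmoid_pwl(x / 100) - sigmoid(x / 100)) for x in range(-1000, 1001))
assert worst < 0.01
```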
40 SIMD design Adaptive Solutions N
41 Conversion to vector operations: y_i[n] = b_i + Σ_k x_k[n]·w_ik, which vectorizes per sample to y[n] = b + x[n]·W and over a whole batch to Y = B + X·W.
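A sketch of the equivalence: the scalar triple loop and the batch vector-matrix form compute the same Y (data values are illustrative):

```python
def loop_form(X, W, B):
    """y_i[n] = b_i + sum_k x_k[n] * w_ik, computed scalar by scalar."""
    N, I, K = len(X), len(B), len(W[0])
    Y = [[0.0] * I for _ in range(N)]
    for n in range(N):
        for i in range(I):
            acc = B[i]
            for k in range(K):
                acc += X[n][k] * W[i][k]
            Y[n][i] = acc
    return Y

def batch_form(X, W, B):
    """Y = B + X * W^T: every row y[n] is one vector-matrix product."""
    return [[b + sum(xk * wk for xk, wk in zip(x, w)) for w, b in zip(W, B)]
            for x in X]

X = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]]    # 3 input samples
W = [[0.1, -0.2], [0.3, 0.4]]                # weights of 2 neurons
B = [0.5, -0.5]
Y1, Y2 = loop_form(X, W, B), batch_form(X, W, B)
assert all(abs(p - q) < 1e-12 for r1, r2 in zip(Y1, Y2) for p, q in zip(r1, r2))
```

Once the layer is one matrix product, the accelerator only has to be a fast, regular matrix multiplier, which is exactly what the systolic designs on the next slides implement.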
42 Systolic Matrix Multiplication 41 Siemens MA16 High efficiency Low flexibility
43 An example state-of-the-art accelerator 42
44 Systolic 2D Convolution 43
45 Convolutional Neural Network: data reuse. Input 32x32 → C1 feature maps 28x28 (5x5 convolution) → S1 feature maps 14x14 (2x2 subsampling) → C2 feature maps 10x10 (5x5 convolution) → S2 feature maps 5x5 (2x2 subsampling) → neuron layers n1 and n2 (5x5 and 1x1 convolutions) → output sign.
46 Reduce Memory Accesses: configurable number of input maps, configurable number of output maps.
47 Is it worth the effort? Even more important is the energy efficiency.
48 More Flexibility and Better Memory Behaviour? 47
49 The performance bottleneck: huge data transfer requirements (3.4 billion per layer); exploit data reuse with local memories. [Figure: energy for data transfer [J] versus total on-chip cache size [words], with separate DRAM, cache, and total curves.]
50 Accelerator Template: FPGA prototyping platform Xilinx Virtex 6; designed with Vivado High-Level Synthesis (HLS). [Datapath: in/out controllers move in_img, weight, bias, and out_img over FSLs to/from DDR; an array of MACC units (multiply, accumulate, add bias) feeds a saturating select and an activation LUT.]
51 Programmable Buffers: image and coefficient storage built from BRAMs (line buffers X0..X3, a weight BRAM, and an out-img BRAM behind a sigmoid LUT), connected to the input/output FSLs through a demux, a rotate mux, and programmable read/write address-select logic.
53 Programmable Buffers: the address-select logic reads overlapping sliding windows out of the line buffers, e.g. (x00, x01, x02, x03), (x01, x02, x03, x04), (x02, x03, x04, x05), ..., (x04, x05, x06, x07), then continues with the next row (x10, x11, x12, x13); because neighbouring windows share pixels, each pixel is written into the buffer only once but consumed many times.
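A behavioural sketch of the sliding-window read-out (the buffer hardware itself is not modelled): each pixel enters the line buffer once but is consumed by up to K windows:

```python
def sliding_windows(image, K):
    """Each K-wide window per image row, in the order the programmable
    address-select logic reads them out of the line buffers."""
    wins = []
    for row in image:
        for c in range(len(row) - K + 1):
            wins.append(tuple(row[c:c + K]))
    return wins

# Label pixels like the slide: x<row><col>
image = [[f"x{r}{c}" for c in range(8)] for r in range(2)]
wins = sliding_windows(image, 4)
assert wins[0] == ("x00", "x01", "x02", "x03")
assert wins[1] == ("x01", "x02", "x03", "x04")
assert wins[5] == ("x10", "x11", "x12", "x13")   # read-out wraps to the next row

# 16 pixels fetched once from off-chip, but 40 values fed to the MACCs:
assert len(image) * len(image[0]) == 16
assert sum(len(w) for w in wins) == 40
```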
54 Flexible Reuse Buffers: input 720x1280; layer 1 computes 6x358x feature maps with a 6x6 convolution and 2x2 subsample. [Buffer diagram: input FSLs demuxed into line-buffer BRAMs X0..X3 and weight BRAMs, with programmable address-select and rotate logic.]
57 Flexible Reuse Buffers: for vertical windows the buffers deliver, e.g., (x04, x14, x24, x34), (x14, x24, x34, x44), (x24, x34, x44, x54), (x34, x44, x54, x64), (x44, x54, x64, x74), producing outputs y00, y10, y20, y30; the rotate mux realigns the line buffers so rows already on chip are reused.
58 Support for Subsampling: with subsampling, the address-select logic delivers strided windows, e.g. (x00, x20, x40, x60), (x10, x30, x50, x70), (x20, x40, x60, x80), (x30, x50, x70, x90), (x40, x60, x80, xa0), (x01, x21, x41, x61), skipping rows and columns according to the subsample factor.
60 What would be the best compute order? Small memories have low energy per access, plus an area and latency advantage; big memories can exploit more data reuse.
61 Improve by locality-driven synthesis: loop transformations (interchange, tiling) reduce the reuse distance, but open a huge design space. Use a framework with reuse detection and cost models that model the utilized reuse and the required buffer size, and optimize for buffer size.
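A sketch of why tiling helps: for a small matrix multiplication we generate the access trace of the untiled and a tiled loop nest and measure the average reuse distance (distinct addresses between consecutive uses of the same element); the sizes and tile order are illustrative choices, not the framework's actual models:

```python
def mm_trace_untiled(N):
    trace = []
    for i in range(N):
        for j in range(N):
            for k in range(N):
                trace += [("A", i, k), ("B", k, j), ("C", i, j)]
    return trace

def mm_trace_tiled(N, T):
    trace = []
    for jj in range(0, N, T):              # loop tiling: T x T x T blocks
        for kk in range(0, N, T):
            for ii in range(0, N, T):
                for i in range(ii, ii + T):
                    for j in range(jj, jj + T):
                        for k in range(kk, kk + T):
                            trace += [("A", i, k), ("B", k, j), ("C", i, j)]
    return trace

def avg_reuse_distance(trace, array):
    """Average number of distinct addresses between consecutive uses of the
    same element of `array` (smaller = better locality = smaller buffer)."""
    last, dists = {}, []
    for pos, addr in enumerate(trace):
        if addr[0] == array:
            if addr in last:
                dists.append(len(set(trace[last[addr] + 1:pos])))
            last[addr] = pos
    return sum(dists) / len(dists)

d_untiled = avg_reuse_distance(mm_trace_untiled(8), "B")
d_tiled = avg_reuse_distance(mm_trace_tiled(8, 4), "B")
assert d_tiled < d_untiled     # tiling shortens the reuse distance of B
```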
62 Compared to manually optimized order 61 Up to 13x resource reduction Up to 11x performance increase
63 Memory bandwidth requirements? 62 Data layout transformation Bandwidth up to 150 MB/s Better than an optimized Intel implementation
64 What do we achieve? A flexible architecture template, HLS vision cores, and iteration-reordering models to minimize data transfer; small but flexible accelerators, up to 13x smaller and up to 11x faster; 4.5 watt (XPower Analyzer) plus 0.5 watt for external RAM.
65 Beyond Energy: Defect-Tolerant Accelerators? A growing number of defects affects the design of micro-architectures. Homogeneous architectures use core redundancy and switch off the defective cores; how about heterogeneous designs? A little story: defect-tolerant accelerators.
66 Defect-Tolerant ANNs: memory decoder; spatially unfolding a network; power reduction and memory bandwidth; time-multiplexing.
67 Hardware ANN Robustness: an ANN with 90 inputs and 10 outputs. Olivier Temam, "A Defect-Tolerant Accelerator for Emerging High-Performance Applications," ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2012.
68 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
69 Beyond ANNs: Biological NNs. Understand the mind by simulating the brain: model perception, model memory, etc. Understand brain diseases: Parkinson's, Alzheimer's, etc. Software simulators such as Emergent and NEURON, compared by neuron count, synapse count, and update rate (Hz).
70 Can computers do the same? Blue Brain Project (IBM/EPFL): molecular level, 10^4 neurons on 10^3 cores. SpiNNaker: integrate-and-fire, 10^9 neurons on 10^4 ARM9 cores.
71 SpiNNaker Chip Architecture 70
72 SpiNNaker interconnect: a connection hierarchy; group neurons to reduce inter-chip communication; 128 MB SDRAM; small packets; routing tables.
73 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
74 Size: digital CMOS technology is available and implements useful accelerators, but is not dense enough for the largest bio-inspired networks; analog gives a much denser implementation. Recall the biological neuron.
75 Analog Spiking Neurons: Kirchhoff's law, capacitive integration, leakage; ~14 transistors.
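A behavioural sketch of what such a circuit computes, as a leaky integrate-and-fire neuron (the time constants and thresholds are illustrative, not the circuit's values):

```python
def simulate_lif(I, steps, dt=0.1, tau=10.0, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire: capacitive integration with leakage.
    dv/dt = (-v + I) / tau; emit a spike and reset when v crosses v_th."""
    v, spikes = v_reset, 0
    for _ in range(steps):
        v += dt * (-v + I) / tau     # leak toward 0 while integrating input current
        if v >= v_th:
            spikes += 1
            v = v_reset
    return spikes

# A strong input drives the membrane over threshold and produces spikes;
# a weak input leaks away and never reaches threshold.
assert simulate_lif(I=2.0, steps=1000) > 0
assert simulate_lif(I=0.5, steps=1000) == 0
```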
76 Architecture: the FACETS project: integrate-and-fire neurons on a wafer, 60 million synapses; most area is used for synapses, storing the connection strengths; 2-D interconnect.
77 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
78 Synapses as Memristors, Intel (2012): the memristor can be used as a switch, and also for analog storage of the memristance.
79 Beyond Silicon: the Infineon NeuroChip (2003) directly uses biological networks, but is difficult to connect to other devices.
80 Convergence of different domains: Neurobiology, Machine Learning, Applications, Technology Constraints, Neuromorphic Innovations
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.
More informationComputer Architectures for Deep Learning. Ethan Dell and Daniyal Iqbal
Computer Architectures for Deep Learning Ethan Dell and Daniyal Iqbal Agenda Introduction to Deep Learning Challenges Architectural Solutions Hardware Architectures CPUs GPUs Accelerators FPGAs SOCs ASICs
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationCOMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization
More informationAdvanced Synthesis Techniques
Advanced Synthesis Techniques Reminder From Last Year Use UltraFast Design Methodology for Vivado www.xilinx.com/ultrafast Recommendations for Rapid Closure HDL: use HDL Language Templates & DRC Constraints:
More informationOUTLINE Introduction Power Components Dynamic Power Optimization Conclusions
OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism
More informationINTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)
INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS) Bill Jason P. Tomas Dept. of Electrical and Computer Engineering University of Nevada Las Vegas FIELD PROGRAMMABLE ARRAYS Dominant digital design
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationA VARIETY OF ICS ARE POSSIBLE DESIGNING FPGAS & ASICS. APPLICATIONS MAY USE STANDARD ICs or FPGAs/ASICs FAB FOUNDRIES COST BILLIONS
architecture behavior of control is if left_paddle then n_state
More informationStacked Silicon Interconnect Technology (SSIT)
Stacked Silicon Interconnect Technology (SSIT) Suresh Ramalingam Xilinx Inc. MEPTEC, January 12, 2011 Agenda Background and Motivation Stacked Silicon Interconnect Technology Summary Background and Motivation
More informationOptimizing CNN-based Object Detection Algorithms on Embedded FPGA Platforms
Optimizing CNN-based Object Detection Algorithms on Embedded FPGA Platforms Ruizhe Zhao 1, Xinyu Niu 1, Yajie Wu 2, Wayne Luk 1, and Qiang Liu 3 1 Imperial College London {ruizhe.zhao15,niu.xinyu10,w.luk}@imperial.ac.uk
More informationVLSI Design Automation
VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,
More informationDeep (1) Matthieu Cord LIP6 / UPMC Paris 6
Deep (1) Matthieu Cord LIP6 / UPMC Paris 6 Syllabus 1. Whole traditional (old) visual recognition pipeline 2. Introduction to Neural Nets 3. Deep Nets for image classification To do : Voir la leçon inaugurale
More informationDeep Learning with Tensorflow AlexNet
Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification
More informationYuki Osada Andrew Cannon
Yuki Osada Andrew Cannon 1 Humans are an intelligent species One feature is the ability to learn The ability to learn comes down to the brain The brain learns from experience Research shows that the brain
More informationMulti-Core Microprocessor Chips: Motivation & Challenges
Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005
More informationCOMPUTATIONAL INTELLIGENCE
COMPUTATIONAL INTELLIGENCE Fundamentals Adrian Horzyk Preface Before we can proceed to discuss specific complex methods we have to introduce basic concepts, principles, and models of computational intelligence
More information11/14/2010 Intelligent Systems and Soft Computing 1
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationReduce Your System Power Consumption with Altera FPGAs Altera Corporation Public
Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary
More informationPower Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas
Power Solutions for Leading-Edge FPGAs Vaughn Betz & Paul Ekas Agenda 90 nm Power Overview Stratix II : Power Optimization Without Sacrificing Performance Technical Features & Competitive Results Dynamic
More informationIntroduction to Neural Networks
Introduction to Neural Networks Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationCAD for VLSI. Debdeep Mukhopadhyay IIT Madras
CAD for VLSI Debdeep Mukhopadhyay IIT Madras Tentative Syllabus Overall perspective of VLSI Design MOS switch and CMOS, MOS based logic design, the CMOS logic styles, Pass Transistors Introduction to Verilog
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationBridging Analog Neuromorphic and Digital von Neumann Computing
Bridging Analog Neuromorphic and Digital von Neumann Computing Amir Yazdanbakhsh, Bradley Thwaites Advisors: Hadi Esmaeilzadeh and Doug Burger Qualcomm Mentors: Manu Rastogiand Girish Varatkar Alternative
More informationCluster-based approach eases clock tree synthesis
Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network
More informationRUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch
RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,
More information