Introduction to Neural Networks
1 ECE 5775 (Fall 17): High-Level Digital Design Automation
Introduction to Neural Networks
Ritchie Zhao, Zhiru Zhang
School of Electrical and Computer Engineering
2 Rise of the Machines
Neural networks have revolutionized the world:
- Self-driving vehicles
- Advanced image recognition
- Game AI

3 Rise of the Machines
Neural networks have revolutionized research.
4 A Brief History
1950s: First artificial neural networks created, based on biological structures in the human visual cortex
1980s-2000s: NNs considered inferior to other, simpler algorithms (e.g. SVM, logistic regression)
Mid-2000s: NN research considered dead; machine learning conferences outright reject most NN papers
Late 2000s-2012: NNs begin winning large-scale image and document classification contests, beating other methods
2012-Now: NNs prove themselves in many industrial applications (web search, translation, image analysis)
5 Classification Problems
We'll discuss neural networks for solving supervised classification problems.
- Given a training set consisting of labeled inputs (e.g. images labeled Cat or Bird)
- Learn a function which maps inputs to labels
- This function is used to predict the labels of new inputs (the test set)
6 A Simple Classification Problem
Inputs: pairs of numbers (x1, x2)
Labels: 0 or 1
This is a binary decision problem.
Real-life analogue: Label = Raining or Not Raining, x1 = relative humidity, x2 = cloud coverage
(Table: training set of (x1, x2, Label) rows)
7 Visualizing the Data
(Figure: plot of the training-set data points, x1 vs. x2, colored by label)

8 Decision Function
In this case, the data points can be classified with a linear decision boundary.
(Figure: the same plot with a linear decision boundary separating the two classes)
9 The Perceptron
The perceptron is the simplest possible neural network, containing only one neuron.
It is described by the following equation:
y = σ(w · x + b)
where w = weights, b = bias, σ = activation function
10 Visualizing the Perceptron
A 2-input, 1-output perceptron:
(Figure: inputs x1 and x2, scaled by weights w1 and w2, are summed with bias b and passed through σ to produce y)
Output y: probability that the label should be 1
The linear function Wx + b of the input x = (x1, x2) is followed by a non-linear function that squashes the output to the interval (0, 1).
11 Activation Function
The activation function σ is non-linear. Three common choices:
- Unit Step: 0 for x < 0, 1 for x ≥ 0. Discontinuous at the origin, zero derivative everywhere else. The first perceptrons used it.
- Sigmoid: 1 / (1 + e^(-x)). Continuous and differentiable everywhere. Early neural networks used it.
- ReLU: max(0, x). Continuous but non-differentiable at the origin. Modern deep networks use it.
12 The Perceptron Decision Boundary
Let's plot the 2-input perceptron (sigmoid activation): y = σ(w1·x1 + w2·x2 + b)
(Plots for (w1, w2) = (10, 10), (5, 20), (20, 5))
The ratio of the weights changes the direction of the decision boundary.

13 The Perceptron Decision Boundary
y = σ(w1·x1 + w2·x2 + b)
(Plots for (w1, w2) = (10, 10), (5, 5), (2, 2))
The magnitude of the weights changes the steepness of the decision boundary.

14 The Perceptron Decision Boundary
y = σ(w1·x1 + w2·x2 + b)
(Plots for b = 0, b = 5, b = -5)
The bias moves the boundary away from the origin.
15 Finding the Weights and Bias
With the right weights and bias we can create any (linear) decision boundary we want!
But how do we compute the weights and bias?
Solution: backpropagation training
16 Backpropagation
Let's take the 2-input perceptron and initialize the weights and bias randomly.
(Figure: perceptron with inputs x1, x2, randomly initialized weights w1, w2, and bias b)
Inspired by course slides from L. Fei-Fei, J. Johnson, S. Yeung, CS231n at Stanford University

17 Backpropagation
Send one input pair through and calculate the output.
Input: (x1, x2) with x1 = 0.7   Output: y = 0.95   Target: 0

18 Backpropagation Training
How do we change {w1, w2, b} to move y towards the target?
Answer: partial derivatives ∂y/∂w1, ∂y/∂w2, ∂y/∂b
19 Backpropagation
We can calculate the derivatives function-by-function, starting from the output.
y = σ(s), so
∂y/∂y = 1.0
∂y/∂s = e^(-s) / (1 + e^(-s))² = y(1 - y)

20 Backpropagation
Propagate partial derivatives backwards.
s = w1·x1 + w2·x2 + b, so
∂s/∂b = 1   ∂s/∂w1 = x1   ∂s/∂w2 = x2

21 Backpropagation
What we've done is apply the chain rule of derivatives:
∂y/∂w1 = (∂y/∂s) · (∂s/∂w1), with ∂s/∂w1 = x1 = 0.7
22 Backpropagation
In general, any operator f in a neural network defines a forward pass and a backward pass.
Forward: compute f(x, y)
Backward: ∂E/∂x = (∂E/∂f) · (∂f/∂x) and ∂E/∂y = (∂E/∂f) · (∂f/∂y)
Backpropagation allows us to build complex networks and still be able to update the weights.
23 Weight Update
We now know how to compute ∂y/∂w1, ∂y/∂w2, ∂y/∂b.
Parameter update rule (for minimizing y): w ← w - η · ∂y/∂w, where η is the learning rate or step size.
This is optimization by gradient descent: we move in the direction of steepest descent, i.e. against the gradient.
24 Practical Differences
For simplicity we minimized y directly, since the label t = 0. In practice we use a loss function L(y, t):
- Square loss: L(y, t) = (y - t)²
- Cross entropy: L(y, t) = -t·ln(y) - (1 - t)·ln(1 - y)
For simplicity we dealt with one training sample. In practice we process multiple training samples (a minibatch) and sum the gradients before updating the weights.
25 Deep Neural Network
The perceptron's linear decision boundary is not very useful for complex problems.
A deep neural network (DNN) consists of many layers, each containing multiple neurons. This network is fully connected.
(Figure: input layer, hidden layers, output layer)
26 What Happens in a DNN?
Each hidden unit learns a hidden feature. Later features depend on earlier features in the network, so they get more complex.
(Figure: simple features in early layers, more complex features in later layers)

27 Deep Neural Network
A deep neural network creates highly non-linear decision boundaries.
(Figure: decision boundaries learned with 2, 3, and 4 hidden units)
28 Convolutional Neural Network
A convolutional neural network (CNN) is a neural network specifically built to analyze images, and is the state of the art in image classification.
Key difference: the layers at the front are convolutional layers; fully-connected layers follow at the end.

29 The Convolutional Layer
Fully-connected layer:
1. An output neuron is connected to each input neuron.
Convolutional layer:
1. Inputs and outputs are vectors of 2D feature maps.
2. An output pixel is connected to its neighboring region on each input feature map.
3. All pixels on an output feature map use the same weights.

30 The Convolutional Layer
The conv layer moves a weight filter across the input feature map, computing a dot product at each location.
The weight filter detects local features in the input map.
31 Intuition on CNNs
Convolutional layers detect visual features. This shows features in the input image which cause high activity, across 3 different conv layers in a CNN.
Image credit: H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks," CACM, Oct. 2011
32 Accelerating CNNs on FPGA
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
Chen Zhang 1, Peng Li 2, Guangyu Sun 1,3, Yijin Guan 1, Bingjun Xiao 2, Jason Cong 2,3,1
1 Center for Energy-Efficient Computing and Applications, Peking University
2 Computer Science Department, University of California, Los Angeles
3 PKU/UCLA Joint Research Institute in Science and Engineering
FPGA'15, Feb. 2015
33 Convolutional Layer Code
(Figure: N input feature maps, M×N K×K filters, M output feature maps of size R×C)

1 for (row = 0; row < R; row++) {
2   for (col = 0; col < C; col++) {          // loops over output pixels
3     for (to = 0; to < M; to++) {           // loop over output feature maps
4       for (ti = 0; ti < N; ti++) {         // loop over input feature maps
5         for (i = 0; i < K; i++) {
6           for (j = 0; j < K; j++) {        // loops over filter pixels
              output_fm[to][row][col] +=
                  weights[to][ti][i][j] * input_fm[ti][s*row+i][s*col+j];
            }}}}}}
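The six-loop nest above can be made self-contained by fixing small, hypothetical layer dimensions. This sketch uses unit stride (S = 1) and zero-initializes the output maps, which the slide leaves implicit; the sizes are toy values chosen only for illustration:

```c
#include <string.h>

/* Minimal instance of the six-loop convolutional layer:
 * M output maps, N input maps, R x C output pixels, K x K filters, stride S. */
enum { M = 2, N = 3, R = 4, C = 4, K = 3, S = 1 };

void conv_layer(float output_fm[M][R][C],
                float input_fm[N][R + K - 1][C + K - 1],
                float weights[M][N][K][K]) {
    memset(output_fm, 0, sizeof(float) * M * R * C);
    for (int row = 0; row < R; row++)          /* loops over output pixels */
      for (int col = 0; col < C; col++)
        for (int to = 0; to < M; to++)         /* loop over output maps    */
          for (int ti = 0; ti < N; ti++)       /* loop over input maps     */
            for (int i = 0; i < K; i++)        /* loops over filter pixels */
              for (int j = 0; j < K; j++)
                output_fm[to][row][col] +=
                    weights[to][ti][i][j] *
                    input_fm[ti][S * row + i][S * col + j];
}
```

With all-ones inputs and weights, every output pixel accumulates N·K·K = 27 products, which makes the nest easy to sanity-check.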
34 Challenges for FPGA
We can't just unroll all the loops due to limited FPGA resources; we must choose the right pragmas and code transformations.
- Limited compute and routing resources (the PEs). Solution: loop pipelining
- Limited on-chip storage (the buffers). Solution: loop tiling
- Limited off-chip bandwidth (the external memory bus). Solution: data reuse, compute-memory balance
(Figure: computation resources PE-1 .. PE-n, interconnect, on-chip memory buffers, off-chip bus to external memory)
35 Loop Tiling
(The untiled six-loop nest from slide 33 again, for reference: loops 1-2 iterate over the pixels in an output map, loops 3-4 over the feature maps, and loops 5-6 over the filter pixels.)
36 Loop Tiling
(Figure: the output feature maps are partitioned into tiles of Tr × Tc pixels)

1 for (row = 0; row < R; row += Tr) {
2   for (col = 0; col < C; col += Tc) {                  // loops over tiles in the output map
3     for (to = 0; to < M; to++) {
4       for (ti = 0; ti < N; ti++) {
5         for (trr = row; trr < min(row+Tr, R); trr++) {   // loops over pixels
6           for (tcc = col; tcc < min(col+Tc, C); tcc++) { // in each tile
7             for (i = 0; i < K; i++) {
8               for (j = 0; j < K; j++) {
                  output_fm[to][trr][tcc] +=
                      weights[to][ti][i][j] * input_fm[ti][s*trr+i][s*tcc+j];
                }}}}}}}}
37 Code with Loop Tiling
Software portion (maximize memory reuse):
1 for (row = 0; row < R; row += Tr) {
2   for (col = 0; col < C; col += Tc) {
3     for (to = 0; to < M; to += Tm) {
4       for (ti = 0; ti < N; ti += Tn) {
          // software: write output feature map
          // software: write input feature map + filters
FPGA portion (maximize computational performance):
5         for (trr = row; trr < min(row+Tr, R); trr++) {
6           for (tcc = col; tcc < min(col+Tc, C); tcc++) {
7             for (too = to; too < min(to+Tm, M); too++) {
8               for (tii = ti; tii < min(ti+Tn, N); tii++) {
9                 for (i = 0; i < K; i++) {
10                  for (j = 0; j < K; j++) {
                      output_fm[too][trr][tcc] +=
                          weights[too][tii][i][j] * input_fm[tii][s*trr+i][s*tcc+j];
                    }}}}}}
          // software: read output feature map
        }}}}
38 Optimizing for On-Chip Performance
Reorder the two filter loops (i, j) to the top level of the FPGA portion, pipeline the loop over tile pixels, and fully unroll the too and tii loops:

for (i = 0; i < K; i++) {
  for (j = 0; j < K; j++) {
    for (trr = row; trr < min(row+Tr, R); trr++) {
      for (tcc = col; tcc < min(col+Tc, C); tcc++) {
        #pragma HLS pipeline
        for (too = to; too < min(to+Tm, M); too++) {
          #pragma HLS unroll
          for (tii = ti; tii < min(ti+Tn, N); tii++) {
            #pragma HLS unroll
            output_fm[too][trr][tcc] +=
                weights[too][tii][i][j] * input_fm[tii][s*trr+i][s*tcc+j];
          }}}}}}

Generated hardware: a Tm × Tn array of multipliers with accumulation.
Performance is determined by the tile factors Tm and Tn; the amount of data transfer is determined by Tr and Tc.
39 Design Space Complexity
Challenge: the available optimizations present a huge space of possible designs.
- Optimal loop order? Tile size for each loop?
- Implementing and testing each design by hand will be slow and error-prone
- Some designs exceed the on-chip resources
Solution: performance modeling + automated design space exploration
40 Performance Modeling
We calculate the following design metrics:
- Total number of operations (FLOP): depends on the CNN model parameters
- Total external memory access (Byte): depends on the CNN weight and activation size
- Total execution time (s): depends on the hardware architecture (e.g. tile factors Tm and Tn)
Ignore resource constraints for now.
Execution Time = Number of Cycles × Clock Period
Number of Cycles ≈ ⌈M/Tm⌉ × ⌈N/Tn⌉ × R × C × K²
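The cycle-count estimate is easy to turn into a helper. A sketch, assuming one multiply-accumulate per PE per cycle and ceiling division over the tile factors (the function name is illustrative):

```c
/* Estimated cycle count for the tiled accelerator: with a Tm x Tn array of
 * parallel MACs, each (Tm, Tn) pair of feature-map tiles needs R*C*K*K
 * cycles, and there are ceil(M/Tm) * ceil(N/Tn) such pairs. */
long estimate_cycles(long M, long N, long R, long C, long K,
                     long Tm, long Tn) {
    long m_tiles = (M + Tm - 1) / Tm;   /* ceil(M / Tm) */
    long n_tiles = (N + Tn - 1) / Tn;   /* ceil(N / Tn) */
    return m_tiles * n_tiles * R * C * K * K;
}
```

For example, a layer with M = N = 64, R = C = 32, K = 3 fully tiled at Tm = Tn = 64 needs one pass of 32·32·9 = 9216 cycles; halving both tile factors quadruples the count.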
41 Performance Modeling
Computational Throughput (GFLOP/s) = Total number of operations / Total execution time
Computation-to-Communication (CTC) Ratio (FLOP/Byte) = Total number of operations / Total external memory access
Required Bandwidth (GByte/s) = Computational Throughput / CTC Ratio
(Figure: same architecture diagram of PEs, on-chip buffers, and external memory)
42 Roofline Method
(Figure: a design plotted with Computational Throughput (GFLOP/s) on the y-axis and CTC Ratio (FLOP/Byte) on the x-axis; the slope of the line from the origin to the design is its required bandwidth in GByte/s)

43 Roofline Method
(Figure: the roofline. The computational roof is the maximum throughput the on-chip resources allow; the bandwidth roof is the device memory bandwidth. A design whose theoretical throughput lies above the bandwidth roof exceeds the device bandwidth: it is bandwidth-bottlenecked, and its real throughput falls down onto the roofline.)
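The attainable throughput under the roofline is the smaller of the two roofs. A small illustrative helper (the units follow the slides; the function itself is an assumption, not from the source):

```c
/* Roofline model: attainable throughput (GFLOP/s) is capped both by the
 * computational roof and by (memory bandwidth * CTC ratio), the
 * bandwidth-bound throughput for this design's data-reuse pattern. */
double attainable_gflops(double comp_roof_gflops,
                         double bandwidth_gbytes,
                         double ctc_flop_per_byte) {
    double bw_bound = bandwidth_gbytes * ctc_flop_per_byte;
    return bw_bound < comp_roof_gflops ? bw_bound : comp_roof_gflops;
}
```

A design with CTC = 10 FLOP/Byte on a 4 GByte/s device is bandwidth-bound at 40 GFLOP/s even if the compute roof is 100 GFLOP/s; raising the CTC ratio past 25 makes it compute-bound instead.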
44 Design Space Exploration
(Figure: candidate designs plotted against the bandwidth roof and the computational roof, with two designs A and B highlighted on the roofline)
45 Experimental Evaluation
The accelerator design is implemented with Vivado HLS on the VC707 board containing a Virtex-7 FPGA.

Hardware resource utilization:
Resource:    DSP   BRAM   LUT   FF
Utilization: 80%   50%    61%   34%

Comparison of theoretical and real performance (Layer 2):
Theoretical 5.11 ms, Measured 5.25 ms, Difference ~5%
46 Experimental Results
(Figure: speedup and energy bar charts comparing 1 thread -O3, 16 threads -O3, and the FPGA accelerator)
CPU: Xeon E (32nm), 16 cores, 2.2 GHz, gcc -O3 + OpenMP 3.0
FPGA: Virtex7-485t (28nm), 448 PEs, 100 MHz, Vivado + Vivado HLS
47 FIN
More informationCaffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. X, NO. X, DEC 2016 1 Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks
More informationNVIDIA FOR DEEP LEARNING. Bill Veenhuis
NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA
More informationConvolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech
Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:
More informationSDA: Software-Defined Accelerator for Large- Scale DNN Systems
SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A
More informationFaster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training
More informationDeep Learning. Architecture Design for. Sargur N. Srihari
Architecture Design for Deep Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation
More informationEnergy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
Energy-Efficient CNN Implementation on a Deeply Pipelined Cluster Chen Zhang 1, Di Wu, Jiayu Sun 1, Guangyu Sun 1,3, Guojie Luo 1,3, and Jason Cong 1,,3 1 Center for Energy-Efficient Computing and Applications,
More informationTwo FPGA-DNN Projects: 1. Low Latency Multi-Layer Perceptrons using FPGAs 2. Acceleration of CNN Training on FPGA-based Clusters
Two FPGA-DNN Projects: 1. Low Latency Multi-Layer Perceptrons using FPGAs 2. Acceleration of CNN Training on FPGA-based Clusters *Argonne National Lab +BU & USTC Presented by Martin Herbordt Work by Ahmed
More informationA Dendrogram. Bioinformatics (Lec 17)
A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and
More informationComputer Architectures for Deep Learning. Ethan Dell and Daniyal Iqbal
Computer Architectures for Deep Learning Ethan Dell and Daniyal Iqbal Agenda Introduction to Deep Learning Challenges Architectural Solutions Hardware Architectures CPUs GPUs Accelerators FPGAs SOCs ASICs
More informationSupervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)
Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationKnowledge Discovery and Data Mining. Neural Nets. A simple NN as a Mathematical Formula. Notes. Lecture 13 - Neural Nets. Tom Kelsey.
Knowledge Discovery and Data Mining Lecture 13 - Neural Nets Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-13-NN
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 13 - Neural Nets Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-13-NN
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationCOMP 551 Applied Machine Learning Lecture 16: Deep Learning
COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all
More informationDeep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?
Data Mining Deep Learning Deep Learning provided breakthrough results in speech recognition and image classification. Why? Because Speech recognition and image classification are two basic examples of
More informationObject Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal
Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn
More informationSu et al. Shape Descriptors - III
Su et al. Shape Descriptors - III Siddhartha Chaudhuri http://www.cse.iitb.ac.in/~cs749 Funkhouser; Feng, Liu, Gong Recap Global A shape descriptor is a set of numbers that describes a shape in a way that
More informationNeural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Example Learning Problem Example Learning Problem Celebrity Faces in the Wild Machine Learning Pipeline Raw data Feature extract. Feature computation Inference: prediction,
More informationDEEP LEARNING ACCELERATOR UNIT WITH HIGH EFFICIENCY ON FPGA
DEEP LEARNING ACCELERATOR UNIT WITH HIGH EFFICIENCY ON FPGA J.Jayalakshmi 1, S.Ali Asgar 2, V.Thrimurthulu 3 1 M.tech Student, Department of ECE, Chadalawada Ramanamma Engineering College, Tirupati Email
More informationM.Tech Student, Department of ECE, S.V. College of Engineering, Tirupati, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 High Performance Scalable Deep Learning Accelerator
More informationElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer
More informationConvolutional Neural Networks
Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
More informationDEEP NEURAL NETWORKS FOR OBJECT DETECTION
DEEP NEURAL NETWORKS FOR OBJECT DETECTION Sergey Nikolenko Steklov Institute of Mathematics at St. Petersburg October 21, 2017, St. Petersburg, Russia Outline Bird s eye overview of deep learning Convolutional
More informationAttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)
AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015) Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony Paek, In So Kweon. State-of-the-art frameworks for object
More informationDeep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies
More informationCombined Weak Classifiers
Combined Weak Classifiers Chuanyi Ji and Sheng Ma Department of Electrical, Computer and System Engineering Rensselaer Polytechnic Institute, Troy, NY 12180 chuanyi@ecse.rpi.edu, shengm@ecse.rpi.edu Abstract
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationCOMPUTATIONAL INTELLIGENCE
COMPUTATIONAL INTELLIGENCE Fundamentals Adrian Horzyk Preface Before we can proceed to discuss specific complex methods we have to introduce basic concepts, principles, and models of computational intelligence
More informationEnergy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package
High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction
More information
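The perceptron equation above, y = σ(Σᵢ wᵢxᵢ + b), can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture materials; the weights, bias, and input values below are hypothetical placeholders chosen only to show the computation.

```python
import math

def perceptron(x, w, b):
    """Single neuron: linear combination w·x + b followed by a sigmoid.

    Returns a value in (0, 1), interpreted as the probability
    that the label should be 1.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear part: Wx + b
    return 1.0 / (1.0 + math.exp(-z))             # sigmoid squashes to (0, 1)

# Hypothetical 2-input example, e.g. x = (relative humidity, cloud coverage)
y = perceptron([0.8, 0.3], w=[2.0, -1.0], b=0.5)
```

With zero weights and zero bias the linear part is 0, so the sigmoid outputs exactly 0.5 — the neuron is maximally uncertain, which matches the probabilistic reading of the output.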