PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
1 PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
Presented by Nils Weller
Hardware Acceleration for Data Processing Seminar, Fall 2017
2 PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
Purpose:
- Processing-in-Memory (PIM) architecture to accelerate Convolutional Neural Networks (CNNs)
- Based on novel resistive memory (ReRAM) technology
- Incremental improvement on prior works
3 Background: CNNs
4 Background: CNNs
Goal: Classify image contents
Not shown: Nonlinear activation function after convolution
5 Background: CNNs
Goal: Classify image contents
Main layer type: Convolution
6-8 Convolution operation
- The filter matrix is slid over the input image; each value of the output feature map is the dot product of the filter with the image patch under it
- Traditional: fixed filter, e.g. vertical Sobel
- CNNs: learned weights for the kernel
Image: Burger, W. (2016): Digital Image Processing. An Algorithmic Introduction Using Java.
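A minimal sketch of the convolution operation in plain Python, assuming a "valid" sliding window without padding and without flipping the kernel (the convention CNN frameworks usually follow). The function name `conv2d_valid`, the toy image, and the particular Sobel variant shown (one common form that responds to vertical edges) are illustrative choices, not taken from the slides.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output value is a dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # Dot product of the kernel with the patch at (r, c).
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Fixed, hand-designed filter (a CNN would learn these nine weights instead).
sobel_vertical = [[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]]

# Toy 4x5 image with a dark left part and a bright right part.
image = [[0, 0, 0, 1, 1] for _ in range(4)]

feature_map = conv2d_valid(image, sobel_vertical)
print(feature_map)
```

The feature map is zero where the patch is uniform and non-zero where the patch straddles the edge.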
9-10 Background: CNNs
Goal: Classify image contents
Two phases:
1. Training
2. Testing (forward pass only, i.e. the first half of a training step)
11-13 Background: CNNs
Phase 1: Training
- Process image (label: boat)
- True value (label): dog (0), cat (0), boat (1), bird (0)
- Compute error E(output) between output and true value
- Backpropagate error, gradient descent method:
- Calculate error contribution for layers
- Update weights to reduce error
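The training step can be sketched for the simplest possible case: a single linear layer with a squared-error loss against the one-hot label. This is an assumption made for illustration; a real CNN has several layers and backpropagates the error through all of them. All names and numbers below are hypothetical.

```python
def forward(weights, x):
    """One linear layer: one output score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def squared_error(output, target):
    """E(output) = 0.5 * sum of squared differences to the label."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def gradient_step(weights, x, target, lr):
    """Gradient descent: dE/dw[i][j] = (output[i] - target[i]) * x[j]."""
    output = forward(weights, x)
    delta = [o - t for o, t in zip(output, target)]
    return [[w - lr * d * xi for w, xi in zip(row, x)]
            for row, d in zip(weights, delta)]


classes = ["dog", "cat", "boat", "bird"]
target = [0.0, 0.0, 1.0, 0.0]          # label: boat
x = [0.5, -1.0, 2.0]                   # toy input features
weights = [[0.1, 0.2, -0.1],
           [0.0, 0.3, 0.1],
           [-0.2, 0.1, 0.0],
           [0.05, -0.1, 0.2]]

error_before = squared_error(forward(weights, x), target)
new_weights = gradient_step(weights, x, target, lr=0.1)
error_after = squared_error(forward(new_weights, x), target)
print(error_before, error_after)
```

One update with a small enough learning rate lowers the error on this image; repeating this over many images is the training phase.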
14 Background: CNNs
Phase 1: Training...
15 Background: CNNs
Summary:
- Large amounts of data
- Acceleration desirable
- Particularly for training
- Simple core operations (matrix/dot product)
- Opportunities for parallelization (single- or multi-image)
- Non-trivial training process
- Error computations
- Dependencies on intermediate results
16 Background: Resistive RAM (ReRAM)
17-19 Background: Resistive RAM (ReRAM)
1971: Theory of Fourth Fundamental Circuit Element (Leon Chua)
- Electrical network theory: resistor, capacitor, inductor, memristor
- Memristor = Memory + Resistance:
- Passive element
- Resistance depends on charge passed through it
- Enabling inherent computational capabilities: no separate processing units
2008: Strukov et al. (HP Labs): The missing memristor found. In: Nature
- Discovery in molecular electronics:
- Memristor-like behavior through metal-oxide structures
- Enabled through flow of oxygen atoms
Since then:
- Resistive memory designs and prototypes
- Research in Processing-in-Memory with resistive memories
Image: Wikipedia
20-23 Background: Resistive RAM (ReRAM)
Hu et al. (2016): Dot-Product Engine for Neuromorphic Computing: Programming 1T1M Crossbar to Accelerate Matrix-Vector Multiplication
- Accumulation of currents on each column (Kirchhoff's law)
- Conductance of memristors acts as weight
- Parallel processing!
- Figure labels: conductance matrix, feedback resistance
Naive:
- Assumes linear memristor conductance
- Ignores circuit parasitics
More things to consider, but the basic idea is sound
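The crossbar idea can be written down as a small numerical sketch. It makes the same naive assumptions the slide lists (linear conductances, no parasitics) and additionally assumes an ideal inverting stage that turns each column current into a voltage through the feedback resistance. Function names and values are hypothetical.

```python
def crossbar_currents(voltages, conductances):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def to_output_voltages(currents, feedback_resistance):
    """Assumed ideal inverting transimpedance stage: V_out = -R_f * I."""
    return [-feedback_resistance * i for i in currents]


voltages = [1.0, 0.5]                  # input vector applied as row voltages (V)
conductances = [[1e-3, 2e-3],          # weight matrix stored as conductances (S)
                [4e-3, 0.0]]

currents = crossbar_currents(voltages, conductances)
outputs = to_output_voltages(currents, feedback_resistance=1000.0)
print(currents, outputs)
```

Every column sums its currents at the same time, which is where the parallelism of the matrix-vector product comes from.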
24 ReRAM-based PIM architecture
25-28 ReRAM-based PIM architecture
Building a complete ReRAM system from building blocks:
- HW structures for real CNN processing
- Programmable for different CNNs
- Process real benchmarks
No training support
- claim: pipeline design not suitable for training due to stalls
- claim: ADC/DAC overhead could be improved
- doesn't do CNNs
29-30 Side note
Full CNN processing introduces further practical issues:
1. Computations are analog, so errors will occur
- Empirical results: NNs are resilient to errors
2. Some CNN layers cannot be computed with ReRAM
- AlexNet, 2012
- 2015: CNNs without LCN (local contrast normalization) shown to work just as well
31 PipeLayer: Architecture
Main considerations:
1. Training support
2. Intra-layer parallelism
3. Inter-layer parallelism
32-34 PipeLayer: Architecture
1. Training support
Figure 3: PipeLayer configured for training
Figure annotations:
- Intermediate memory (memory subarray)
- Computation and weight storage (morphable subarray)
- Partial derivative for weight (averaged)
- Training label
Concept of batching:
- Process batch of images with fixed weights
- Update weights after batch
- Reduces update overhead
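Batching can be sketched as follows: the weights stay fixed while the partial derivatives of all images in the batch are accumulated, and a single averaged update is applied at the end. The single linear layer with squared-error loss is an assumption for illustration, and all names are hypothetical.

```python
def batch_update(weights, batch, lr):
    """Apply one weight update for a whole batch of (input, target) pairs."""
    grad_sum = [[0.0] * len(row) for row in weights]
    for x, target in batch:
        # Forward pass with the weights that are fixed for this batch.
        output = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        for i, (o, t) in enumerate(zip(output, target)):
            for j, xj in enumerate(x):
                # Accumulate the partial derivative for weight (i, j).
                grad_sum[i][j] += (o - t) * xj
    n = len(batch)
    # Single update with the averaged partial derivatives.
    return [[w - lr * g / n for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad_sum)]


weights = [[0.0, 0.0]]
batch = [([1.0, 0.0], [1.0]),
         ([0.0, 1.0], [3.0])]

new_weights = batch_update(weights, batch, lr=0.5)
print(new_weights)
```

With a batch of two images this writes the weights once instead of twice, which is the reduced update overhead the slide refers to.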
35-41 PipeLayer: Architecture
1. Training support
Figure 3: PipeLayer configured for training
Walkthrough of a 2-sized batch (ignoring parallelism):
- Process image 1
- Process image 2
- Batch complete: weight update
Image unclear:
- Weight update path not shown
- Text references b derivatives that do not appear in the figure
42-44 PipeLayer: Architecture
2. Intra-layer parallelism
Without parallelism:
- Basic crossbar array matrix-vector computation scheme
- Added complexity: process batch of images in one go, use multiple kernels
With parallelism:
- Duplicate processing structure for parallelism
- Break up computation arrays due to HW size constraints
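Breaking up a computation array can be sketched as tiling: a weight matrix larger than one crossbar is split into fixed-size tiles, each tile produces a partial result, and partial results that belong to the same output are added. The tile size of 2x2 and all names are hypothetical and chosen only to keep the example small.

```python
def tiled_matvec(weights, inputs, tile_rows, tile_cols):
    """Compute out[j] = sum_i inputs[i] * weights[i][j] tile by tile."""
    n_in, n_out = len(weights), len(weights[0])
    outputs = [0] * n_out
    for r0 in range(0, n_in, tile_rows):          # tiles along the input rows
        for c0 in range(0, n_out, tile_cols):     # tiles along the output columns
            r1 = min(r0 + tile_rows, n_in)
            c1 = min(c0 + tile_cols, n_out)
            for j in range(c0, c1):
                # Partial dot product computed by one small array.
                partial = sum(inputs[i] * weights[i][j] for i in range(r0, r1))
                # Partial results for the same output column are accumulated.
                outputs[j] += partial
    return outputs


weights = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
inputs = [1, 0, 2]

result = tiled_matvec(weights, inputs, tile_rows=2, tile_cols=2)
print(result)
```

The tiles are independent of each other, so in hardware they could be evaluated at the same time; duplicating the whole structure would additionally allow several input vectors to be processed in parallel.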
45-52 PipeLayer: Architecture
3. Inter-layer parallelism
Conceptually: images img1, img2, img3, img4 enter the pipeline one after another, so different layers work on different images at the same time
Implications:
- Need to buffer multiple intermediate results for later use
- Weight update requires pipeline flush (does it really?)
Paper seems to agree on flush/stall:
- Last image before update (gap of 2L+1 cycles)
- Update looks larger, but is only 1 cycle
But: How is this pipeline design superior to ISAAC's?
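The effect of the flush can be estimated with a simple cycle count. The model below is a sketch under assumptions read off the slide: one image needs 2L+1 cycles to pass forward and backward through L layers, a new image can enter every cycle, the pipeline drains before the weight update, and the update takes 1 cycle. The function name is hypothetical.

```python
def cycles_per_batch(num_layers, batch_size, pipelined):
    """Estimated cycles to train one batch, including the weight update."""
    latency = 2 * num_layers + 1   # cycles for one image (assumed from the slide)
    update = 1                     # weight update cycle
    if pipelined:
        # Images enter back to back; the last one still needs the full latency.
        return (batch_size - 1) + latency + update
    # Without pipelining, every image occupies the hardware for the full latency.
    return batch_size * latency + update


layers, batch = 3, 4
print(cycles_per_batch(layers, batch, pipelined=True))
print(cycles_per_batch(layers, batch, pipelined=False))
```

Under this model the stall costs a fixed 2L+1 cycles per batch, so larger batches hide it better.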
53 PipeLayer: Implementation
54 PipeLayer: Implementation
- Spike coding driver (for energy/area reduction): input to weighted spikes conversion
- Spike coding: analog input to digital spike sequence without ADC. Output spike count = accumulated input*weight
- Activation function component
- Typical division into memory-only + memory/computation areas
- Details like error propagation not visualized
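The relation "output spike count = accumulated input*weight" can be illustrated with a toy integrate-and-fire model. This is not the paper's circuit: it assumes non-negative integer inputs and weights, one time slot per input bit with slot k weighted by 2^k, and a unit firing threshold. All names are hypothetical.

```python
def spike_coded_dot(inputs, weights, bits=4, threshold=1):
    """Count output spikes of a toy integrate-and-fire unit."""
    charge = 0
    count = 0
    for k in range(bits):                       # one time slot per input bit
        for x, w in zip(inputs, weights):
            if (x >> k) & 1:
                # A spike in slot k carries weight 2**k and is scaled by w.
                charge += w * (1 << k)
        while charge >= threshold:              # fire until below threshold
            charge -= threshold
            count += 1
    return count


inputs = [3, 5]      # must fit into `bits` bits
weights = [2, 1]

spikes = spike_coded_dot(inputs, weights)
print(spikes)
```

The result is read out as a digital spike count, which is why no separate ADC appears in this scheme.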
55 PipeLayer: Discussion
- Limited ReRAM precision
- Previous works showed NNs to take errors well
56 PipeLayer: Evaluation
- Large improvements vs. reference GPU
- Architecture is simulated (could results be impaired?)
57 Summary
The work:
- Successful design of ReRAM-based memory architecture for PIM
- Good improvements in test setup
- Support for training is new (but not a groundbreaking idea)
The paper:
- Sensibly structured
- Appropriate drawings
- Many implicit assumptions; reasoning for claims often missing
- Many grammatical errors
58 Take-aways
1. The work is made possible by progress in an interesting combination of fields, leading to ReRAM-based CNN accelerators:
- 1971: Memristor
- 1990s: Initial PIM concepts
- 2008: Molecular electronics
- 2012: AlexNet CNN
- 2015: Good CNNs without contrast normalization layer
2. Various optimization techniques mentioned in this seminar are used:
- Hardware acceleration / PIM
- Various layers of parallelism
- Precision-speed trade-offs
59 Thanks for your time! Questions?
The following notes are from: Robotic Systems ECE 401RB Fall 2006 Lecture 13: Processors Part 1 Chapter 12, G. McComb, and M. Predko, Robot Builder's Bonanza, Third Edition, Mc- Graw Hill, 2006. I. Introduction
More informationFPGA Power Management and Modeling Techniques
FPGA Power Management and Modeling Techniques WP-01044-2.0 White Paper This white paper discusses the major challenges associated with accurately predicting power consumption in FPGAs, namely, obtaining
More informationSECTION 1 INTRODUCTION. Walt Kester
SECTION 1 INTRODUCTION Walt Kester This book deals with sensors and associated signal conditioning circuits. The topic is broad, but the focus of this book is to concentrate on circuit and signal processing
More informationEyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks Yu-Hsin Chen 1, Joel Emer 1, 2, Vivienne Sze 1 1 MIT 2 NVIDIA 1 Contributions of This Work A novel energy-efficient
More informationMachine Learning. MGS Lecture 3: Deep Learning
Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer
More informationPRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory
2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory Ping Chi, Shuangchen
More informationFrequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM
More informationBridging Analog Neuromorphic and Digital von Neumann Computing
Bridging Analog Neuromorphic and Digital von Neumann Computing Amir Yazdanbakhsh, Bradley Thwaites Advisors: Hadi Esmaeilzadeh and Doug Burger Qualcomm Mentors: Manu Rastogiand Girish Varatkar Alternative
More informationNEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints
NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints Yu Ji, YouHui Zhang, ShuangChen Li,PingChi, CiHang Jiang,PengQu,YuanXie, WenGuang Chen Email: zyh02@tsinghua.edu.cn,
More informationDeep Learning and Its Applications
Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent
More informationDRISA: A DRAM-based Reconfigurable In-Situ Accelerator
DRI: A DRAM-based Reconfigurable In-Situ Accelerator Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie University of California, Santa Barbara Memory Solutions Lab, Samsung
More informationSCIENCE & TECHNOLOGY
Pertanika J. Sci. & Technol. 5 (S): 35-4 (017) SCIENCE & TECHNOLOGY Journal homepage: http://www.pertanika.upm.edu.my/ Transmission Lines Modelling based on RLC Passive and Active Filter Design Ahmed Qasim
More informationTowards Scalable Machine Learning
Towards Scalable Machine Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Fraunhofer Center Machnine Larning Outline I Introduction
More informationDeep Learning for Embedded Security Evaluation
Deep Learning for Embedded Security Evaluation Emmanuel Prouff 1 1 Laboratoire de Sécurité des Composants, ANSSI, France April 2018, CISCO April 2018, CISCO E. Prouff 1/22 Contents 1. Context and Motivation
More informationMemristive stateful logic
Memristive stateful logic Eero Lehtonen, Jussi Poikonen 2 University of Turku, Finland 2 Aalto University, Finland January 22, 24 Outline Basic principle of memristive stateful logic 2 Generalized memristive
More informationDEEP NEURAL NETWORKS FOR OBJECT DETECTION
DEEP NEURAL NETWORKS FOR OBJECT DETECTION Sergey Nikolenko Steklov Institute of Mathematics at St. Petersburg October 21, 2017, St. Petersburg, Russia Outline Bird s eye overview of deep learning Convolutional
More informationTETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, Christos Kozyrakis Stanford University Platform Lab Review Feb 2017 Deep Neural
More informationRAPIDNN: In-Memory Deep Neural Network Acceleration Framework
RAPIDNN: In-Memory Deep Neural Network Acceleration Framework Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar and Tajana Rosing Computer Science and Engineering Department,
More informationHigh-Performance Data Loading and Augmentation for Deep Neural Network Training
High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose
More informationEMERGING NON VOLATILE MEMORY
EMERGING NON VOLATILE MEMORY Innovative components for neuromorphic architecture Leti, technology research institute Contact: leti.contact@cea.fr Neuromorphic architecture Brain-inspired computing has
More informationDeep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur
Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More information