Low-Power Neural Processor for Embedded Human and Face detection
- Benjamin Ball
1 Low-Power Neural Processor for Embedded Human and Face Detection. Olivier Brousse (1), Olivier Boisard (1), Michel Paindavoine (1,2), Jean-Marc Philippe, Alexandre Carbon. (1) GlobalSensing Technologies (GST), Dijon, France; (2) LEAD, Université de Bourgogne, CNRS, Dijon, France; (3) DACLE, CEA LIST, Nano-innov, Palaiseau, France. June 23rd, 2016, NeuroSTIC, O. Brousse
2 Introduction. One way to trade performance against complexity is to take inspiration from human vision, which performs well in terms of detection and recognition. Simple tasks, human brain vs. von Neumann computer (such as a PC): - The human brain recognizes this image in less than one second - But can it calculate ( x = ?) in less than one second? Proposed artificial vision model for embedded systems: - Arithmetic calculations, as used in image filtering for example -> von Neumann (or Harvard) architectures - Object recognition from natural images -> neuro-inspired architectures. Human intelligence: artificial intelligence on silicon
3 Outline: Introduction; Neuro-Inspired Vision Models; Hardware Accelerator for Neuro-Inspired Applications; Application Examples; Conclusion
4 Deep Neural Network Models. ImageNet classification (Hinton's team, hired by Google): 1.2 million high-resolution images, 1,000 different classes, Top-5 error rate of 17% (a huge improvement); learned features on the first layer. Facebook's DeepFace program (lab head: Y. LeCun): 4 million images, 4,000 identities, 97.25% accuracy, vs. % human performance
5 State-of-the-art in Recognition (databases listed by increasing complexity). Columns: Database, # Images, # Classes, Best score. MNIST Handwritten digits 60, , % [3]; GTSRB Traffic sign; CIFAR-10 airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck; ~ 50, % [4]; 50, ,000 % [5]; Caltech-101 ~ 50, % [6]; ImageNet ~ 1,000,000, 1,000, Top-5 83% [1]; DeepFace ~ 4,000,000, 4, % [2]. The state of the art is a deep neural network every time.
6 CNN Organization: "deep" means the number of layers >> 1
7 State-of-the-art CNN Example: the German Traffic Sign Recognition Benchmark (GTSRB). 43 traffic sign types, > 50,000 images. Neurons: 287,843; synapses: 1,388,800; connections: 124,121,800; total memory: 1.5 MB (with 8-bit synapses). Near-human recognition (> 98%) [3]. [3] D. Ciresan, U. Meier, J. Masci, J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural Networks (32), pp , 2012
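The memory figure on this slide can be cross-checked with a few lines of arithmetic. The sketch below only uses the synapse and connection counts quoted above; the interpretation that the gap between the weight storage and the quoted 1.5 MB covers other network state is an assumption, not something the slide states.

```python
# Figures quoted on the slide for the GTSRB network.
SYNAPSES = 1_388_800        # distinct weights to store
CONNECTIONS = 124_121_800   # weight uses in one forward pass
BITS_PER_SYNAPSE = 8

def weight_memory_bytes(synapses: int, bits: int) -> int:
    """Bytes needed to store every synaptic weight at the given precision."""
    return synapses * bits // 8

weight_bytes = weight_memory_bytes(SYNAPSES, BITS_PER_SYNAPSE)
reuse = CONNECTIONS / SYNAPSES  # how often each stored weight is used on average

print(f"weights alone: {weight_bytes / 1e6:.2f} MB")
print(f"average uses per stored weight: {reuse:.1f}")
```

The weights alone come to a little under the 1.5 MB total, and the high reuse ratio reflects the weight sharing of convolutional layers, which is why data locality matters for an accelerator.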
8 Another Neuro-Inspired Model: Hmax (a Neuroscience Approach). Hmax model: Serre et al., IEEE PAMI, 2007; Poggio et al., J Neurophysiol, 2007
9 Neuro-Inspired Models: The Hmax S1 layer using Gabor filters
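As an illustration of what an S1-style filter looks like, here is a minimal sketch of a real-valued Gabor kernel in plain Python. It uses the textbook Gabor formula (a Gaussian envelope times a cosine carrier); the function name and all parameter values (`size`, `wavelength`, `sigma`, `gamma`) are assumptions chosen for the example, since the slide does not give them.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Return a size x size real Gabor kernel as a list of rows (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# A small bank with 4 orientations (0, 45, 90 and 135 degrees); values are illustrative.
bank = [gabor_kernel(7, wavelength=4.0, theta=i * math.pi / 4, sigma=2.0)
        for i in range(4)]
```

Convolving the image with each kernel of such a bank gives one response map per orientation and scale, which is the input to the later pooling stage.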
10 Neuro-Inspired Models: The Hmax [Figure: original image and Gabor filters]
11 [Figure: original image and Gabor filters]
12 Neuro-Inspired Models: The Hmax. Hmax model performance
13 Hmax accelerator: Complexity. 64 Gabor filters. Complexity for a 1 Mpixel image: S1 (optimized Gabor filters): 2.9 GMAC; C1 (max): 0.13 GOP; RBF neural network: 0.4 GOP; total: 3.43 GMAC & OP. One 1 Mpixel IP camera at 30 fps: 103 GOP/sec
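The totals on this slide follow from simple arithmetic, sketched below. The only assumption is the one the slide itself makes: MACs and other operations are added together as if they were the same unit.

```python
# Per-frame cost of each Hmax stage for a 1 Mpixel image, as quoted on the slide
# (in giga-MACs or giga-operations, counted together).
STAGES_GOP = {
    "S1 Gabor filters": 2.9,
    "C1 max": 0.13,
    "RBF neural network": 0.4,
}
FPS = 30  # frame rate of the IP camera

per_frame = sum(STAGES_GOP.values())   # about 3.43 G(MAC+OP) per frame
per_second = per_frame * FPS           # about 103 GOP/sec

print(f"{per_frame:.2f} G(MAC+OP) per frame")
print(f"{per_second:.0f} GOP/sec at {FPS} fps")
```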
15 PNeuro accelerator (joint laboratory of CEA and GST, initiated in 2013). Objective: design a processor that integrates signal processing functions and neuronal functions (Hmax, CNN) on the same chip. PNeuro: a cascadable parallel architecture. [Diagram: data in (signals, images) -> clusters of NeuroCores -> classification result; each NeuroDSP has a link from the previous NeuroDSP and a link to the next NeuroDSP]
16 PNeuro accelerator overview
17 PNeuro accelerator: Main Specifications - Programmable NeuroCores, each of which can perform image/signal processing and neural functions - Optimized for MAC and neural operations - Signal processing: convolution filters, etc. - Neural functions: weighted sum of inputs - Can perform non-linear operations (maxima, tanh, ...) - 1 NeuroCore represents 1 neuron - NeuroCores can be time-multiplexed to implement bigger networks - Memory accesses optimized for data locality and reuse - Variable number of clusters to accommodate different application domains and their performance requirements
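The idea that one NeuroCore stands for one neuron, and that a few cores can be time-multiplexed to cover a larger layer, can be sketched in software as follows. This is a behavioral illustration only, not a model of the PNeuro hardware; the function names and the choice of 4 cores per pass are assumptions.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One neuron: weighted sum of inputs followed by a tanh non-linearity."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)

def run_layer(inputs, weight_rows, n_cores=4):
    """Evaluate a layer with only n_cores neurons computed per pass.

    Returns the outputs and the number of passes that were needed.
    """
    outputs = []
    passes = 0
    for start in range(0, len(weight_rows), n_cores):
        batch = weight_rows[start:start + n_cores]   # neurons mapped to cores in this pass
        outputs.extend(neuron(inputs, w) for w in batch)
        passes += 1
    return outputs, passes

# 10 neurons on 4 cores need 3 passes (4 + 4 + 2).
inputs = [0.5, -1.0, 0.25]
weights = [[0.1 * (i + 1), 0.2, -0.3] for i in range(10)]
outputs, passes = run_layer(inputs, weights, n_cores=4)
```

The number of passes grows as the ceiling of the neuron count divided by the core count, which is the cost of running a bigger network on a fixed number of cores.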
18 PNeuro accelerator: Performance. Profiling results, based on 28 nm FDSOI technology. One cluster of 4 NeuroCores at 1 GHz: 32 GMAC/sec with 70 mW power consumption, including memories and the controller. 32 clusters at 1 GHz: 1024 GMAC/sec, 2.2 W. Energy efficiency: 465 GMAC/sec per W. Full Hmax for one 1 Mpixel IP camera at 30 fps: 103 GOP/sec, which needs 4 clusters of 4 NeuroCores (ceil[103/32]): 280 mW
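The sizing on this slide can be reproduced with a short calculation. The sketch assumes, as the slide does, that one operation costs the same as one MAC and that throughput and power scale linearly with the number of clusters; the constant and function names are chosen for the example.

```python
import math

CLUSTER_GMAC_PER_S = 32.0   # one cluster, as quoted on the slide
CLUSTER_POWER_W = 0.070     # 70 mW per cluster, memories and controller included

def clusters_needed(workload_gop_per_s: float) -> int:
    """Smallest number of clusters whose combined throughput covers the workload."""
    return math.ceil(workload_gop_per_s / CLUSTER_GMAC_PER_S)

def power_w(n_clusters: int) -> float:
    """Power under the assumption of linear scaling with cluster count."""
    return n_clusters * CLUSTER_POWER_W

full_hmax = 103.0                       # GOP/sec for a 1 Mpixel camera at 30 fps
n = clusters_needed(full_hmax)          # 4 clusters
print(f"{n} clusters, {power_w(n) * 1000:.0f} mW")

# The 32-cluster configuration: throughput and the quoted efficiency figure.
throughput_32 = 32 * CLUSTER_GMAC_PER_S
efficiency = throughput_32 / 2.2        # GMAC/sec per W, using the quoted 2.2 W
print(f"{throughput_32:.0f} GMAC/sec, {efficiency:.0f} GMAC/sec per W")
```

Linear scaling gives 2.24 W for 32 clusters, which the slide rounds to 2.2 W.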
20 Face Detection Application Example (1/2)
21 Face Detection Application Example (2/2). Computational complexity divided by 8 (8 scales merged). For one 1 Mpixel camera at 30 fps: 12.9 GOP/sec (103 GOP/sec / 8). Needs one cluster with 2 NeuroCores: power consumption < 35 mW. For VGA at 30 fps, only 1 NeuroCore: < 20 mW
22 Human Detection: Hmin. Original image + 64 Gabor filters (7x7 to 37x37) -> local maxima (C1) -> C1 output -> classification with an RBF neural network (S2, C2)
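The two stages after the Gabor filters can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the pooling here takes the maximum over non-overlapping spatial windows only, and the RBF unit is a plain Gaussian of the distance to a stored prototype. The function names, the window size and `sigma` are assumptions.

```python
import math

def c1_max_pool(response, pool=2):
    """Local maximum over non-overlapping pool x pool windows of a 2D response map."""
    rows, cols = len(response), len(response[0])
    return [
        [max(response[r + dr][c + dc] for dr in range(pool) for dc in range(pool))
         for c in range(0, cols - pool + 1, pool)]
        for r in range(0, rows - pool + 1, pool)
    ]

def rbf_unit(features, prototype, sigma=1.0):
    """Gaussian RBF response: 1.0 when the features match the prototype exactly."""
    d2 = sum((f - p) ** 2 for f, p in zip(features, prototype))
    return math.exp(-d2 / (2 * sigma ** 2))

# Toy S1 response map (4x4) pooled down to 2x2, then compared with a prototype.
s1 = [[1, 2, 0, 1],
      [3, 4, 1, 0],
      [0, 0, 5, 6],
      [1, 0, 7, 8]]
c1 = c1_max_pool(s1)                       # [[4, 1], [1, 8]]
features = [v for row in c1 for v in row]  # flatten for the classifier
score = rbf_unit(features, prototype=[4, 1, 1, 8])
```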
23 Human Detection Application Example
24 Human Detection Application Example
25 Human Detection Application Example. Pipeline: S1 layer (Gabor filters) -> C1 layer (max pooling) -> RBF classification -> human detected. To reduce complexity, the optimization from Masquelier et al. (PLoS Computational Biology, 2007) is used: 5 image scales (1, 0.7, 0.5, 0.35 and 0.25), 4 orientations, one Gabor filter (15x15) per scale and per orientation, hence 20 Gabor filters. Complexity for a 1 Mpixel image: S1 (optimized Gabor filters): reduced from 2.9 GMAC to 0.6 GMAC; C1 (max): 0.1 GOP; RBF neural network: 0.3 GOP; total: reduced from 3.43 GMAC & OP to 1 GMAC & OP
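The filter count and the reduction in per-frame cost can be checked with the figures on the slides. In the sketch below, the per-stage costs of the original model are the ones quoted on the earlier complexity slide; everything else comes from this slide. The variable names are chosen for the example.

```python
SCALES = [1, 0.7, 0.5, 0.35, 0.25]   # image scales used by the optimized model
ORIENTATIONS = 4                     # one 15x15 Gabor filter per scale and orientation

n_filters = len(SCALES) * ORIENTATIONS   # 20 filters instead of 64

# Per-frame cost for a 1 Mpixel image, in G(MAC+OP).
ORIGINAL = {"S1": 2.9, "C1": 0.13, "RBF": 0.4}
OPTIMIZED = {"S1": 0.6, "C1": 0.1, "RBF": 0.3}

total_original = sum(ORIGINAL.values())     # about 3.43
total_optimized = sum(OPTIMIZED.values())   # about 1.0
reduction = total_original / total_optimized

print(f"{n_filters} Gabor filters")
print(f"{total_original:.2f} -> {total_optimized:.2f} G(MAC+OP) per frame "
      f"(divided by {reduction:.2f})")
```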
26 Human Detection Application Example. Computational complexity divided by 3.43. Original Hmax, one 1 Mpixel IP camera at 30 fps: 103 GOP/sec. Optimized Hmax, one 1 Mpixel IP camera at 30 fps: 30 GOP/sec. Using 28 nm FDSOI technology, one cluster of 4 NeuroCores at 1 GHz: 32 GMAC/sec with 70 mW power consumption. Optimized Hmax needs 4 NeuroCores for one 1 Mpixel IP camera at 30 fps: power consumption 70 mW. For VGA at 30 fps, only 2 NeuroCores: 35 mW
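The NeuroCore counts quoted here can be reproduced under a few stated assumptions: the cluster's 32 GMAC/sec and 70 mW split evenly over its 4 NeuroCores, the workload scales linearly with the pixel count, VGA means 640x480, and 1 Mpixel means 1,000,000 pixels. None of these is stated on the slide, so treat the sketch as a plausibility check.

```python
import math

CORE_GMAC_PER_S = 32.0 / 4   # assumption: even split of the cluster throughput
CORE_POWER_MW = 70.0 / 4     # assumption: even split of the cluster power

def cores_needed(workload_gop_per_s: float) -> int:
    """Smallest number of NeuroCores whose combined throughput covers the workload."""
    return math.ceil(workload_gop_per_s / CORE_GMAC_PER_S)

def scale_to_resolution(load_1mp: float, width: int, height: int) -> float:
    """Scale a 1 Mpixel workload linearly with the pixel count (assumption)."""
    return load_1mp * (width * height) / 1_000_000

optimized_1mp = 30.0                                   # GOP/sec, optimized Hmax
optimized_vga = scale_to_resolution(optimized_1mp, 640, 480)

cores_1mp = cores_needed(optimized_1mp)                # 4 NeuroCores
cores_vga = cores_needed(optimized_vga)                # 2 NeuroCores
print(f"1 Mpixel: {cores_1mp} cores, {cores_1mp * CORE_POWER_MW:.0f} mW")
print(f"VGA: {cores_vga} cores, {cores_vga * CORE_POWER_MW:.0f} mW")
```

Under the same assumptions, the face detection workload of 12.9 GOP/sec maps to 2 NeuroCores at 1 Mpixel and 1 NeuroCore at VGA, which matches the counts given for that application.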
27 Human Detection Application Examples
29 Conclusion. The PNeuro architecture is optimized for neuro-inspired algorithms: Hmax, convolutional neural networks and, more generally, deep neural networks. PNeuro is a cascadable parallel architecture. Its performance in 28 nm FDSOI makes embedded applications with very low power consumption feasible: face detection needs only 20 mW for a VGA image at 30 fps; human detection needs only 35 mW for a VGA image at 30 fps. PNeuro is also implemented on FPGA
30 PNeuro on FPGA. First demonstration of an FPGA-based PNeuro: single-cluster configuration (4 NeuroCores); embedded CNN application (60 neurons in the hidden layer, 450 KOps); face extraction, images in the database, 96% recognition rate. Same application ported to 5 different architectures: embedded CPU (Raspberry Pi 2 B, Odroid XU3); embedded GPU (NVIDIA Tegra K1, batch); desktop CPU (Intel i7); PNeuro with quad NeuroCores, using an in-house prototyping board. Results per target (frequency, energy efficiency): Intel i7 (CPU): 3400 MHz, 160 images/W; Quad ARM A15 (CPU): 2000 MHz, 350 images/W; Quad ARM A7 (CPU): 900 MHz, 380 images/W; Tegra K1 (GPU): 850 MHz, 600 images/W; PNeuro (FPGA): 100 MHz, 2000 images/W. The FPGA approach is already competitive with existing CPU and GPU solutions. First FPGA product developed by GST for early 2017. Embedded FPGA: Artix 100 (~1 W), 17.6 cm² for the board, including one cluster
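To make the comparison in the table concrete, the sketch below computes how many times more energy-efficient the FPGA-based PNeuro is than each other target, using only the figures quoted on the slide. The dictionary keys are labels chosen for the example.

```python
# Energy efficiency per target, in images/W, as quoted on the slide.
EFFICIENCY_IMAGES_PER_W = {
    "Intel i7 (CPU)": 160,
    "Quad ARM A15 (CPU)": 350,
    "Quad ARM A7 (CPU)": 380,
    "Tegra K1 (GPU)": 600,
    "PNeuro (FPGA)": 2000,
}

pneuro = EFFICIENCY_IMAGES_PER_W["PNeuro (FPGA)"]

# Ratio of PNeuro's efficiency to each other target's efficiency.
gains = {
    name: pneuro / value
    for name, value in EFFICIENCY_IMAGES_PER_W.items()
    if name != "PNeuro (FPGA)"
}

for name, gain in gains.items():
    print(f"PNeuro (FPGA) vs {name}: {gain:.1f}x")
```

The ratio ranges from about 3.3x against the Tegra K1 to 12.5x against the Intel i7, even though the FPGA runs at 100 MHz.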
31 Article in EE Times; Embedded World demonstration (Feb. 2016)
32 Thank you!
Speculations about Computer Architecture in Next Three Years shuchang.zhou@gmail.com Jan. 20, 2018 About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization
More informationDeep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.
Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer
More informationAccelerating Neuromorphic Vision Algorithms for Recognition
Accelerating Neuromorphic Vision Algorithms for Recognition Ahmed Al Maashri Michael DeBole Matthew Cotter Nandhini Chandramoorthy Yang Xiao Vijaykrishnan Narayanan Chaitali Chakrabarti *Microsystems Design
More informationDeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,
More informationCS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016
CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)
More informationIntro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn
Intro to Deep Learning Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Why this class? Deep Features Have been able to harness the big data in the most efficient and effective
More informationCreating Affordable and Reliable Autonomous Vehicle Systems
Creating Affordable and Reliable Autonomous Vehicle Systems Shaoshan Liu shaoshan.liu@perceptin.io Autonomous Driving Localization Most crucial task of autonomous driving Solutions: GNSS but withvariations,
More informationConvolutional Neural Network Layer Reordering for Acceleration
R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform
More informationImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture
More informationFrequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationSimplifying ConvNets for Fast Learning
Simplifying ConvNets for Fast Learning Franck Mamalet 1 and Christophe Garcia 2 1 Orange Labs, 4 rue du Clos Courtel, 35512 Cesson-Sévigné, France, franck.mamalet@orange.com 2 LIRIS, CNRS, Insa de Lyon,
More information