Low-Power Neural Processor for Embedded Human and Face detection

1 Low-Power Neural Processor for Embedded Human and Face detection
Olivier Brousse (1), Olivier Boisard (1), Michel Paindavoine (1,2), Jean-Marc Philippe, Alexandre Carbon
(1) GlobalSensing Technologies (GST), Dijon, France
(2) LEAD, Université de Bourgogne, CNRS, Dijon, France
(3) DACLE, CEA LIST, Nano-innov, Palaiseau, France
June 23rd 2016, NeuroSTIC, O. Brousse

2 Introduction
Optimizing performance versus complexity means drawing on bio-inspired human vision performance in terms of detection and recognition.
Simple tasks, human brain vs Von Neumann computer (like a PC):
- Recognizes this image in less than one second
- But calculates in less than one second ( x = ?)
Artificial vision model proposal for embedded systems:
- Arithmetic calculations, used in image filtering for example: -> Von Neumann (or Harvard) architectures
- Object recognition from natural images: -> Neuro-inspired
Human intelligence: Artificial Intelligence on Silicon

3 Outline
- Introduction
- Neuro-Inspired Vision Models
- Hardware Accelerator for Neuro-Inspired applications
- Application examples
- Conclusion

4 Deep Neural Network Models
ImageNet classification (Hinton's team, hired by Google):
- 1.2 million high-res images, 1,000 different classes
- Top-5: 17% error rate (huge improvement)
- Learned features on first layer
Facebook's DeepFace Program (labs head: Y. LeCun):
- 4 million images, 4,000 identities
- 97.25% accuracy, vs % human performance

5 State-of-the-art in Recognition
Databases listed in order of increasing complexity (columns: Database, # Images, # Classes, Best score):
- MNIST (handwritten digits): 60, , % [3]
- GTSRB (traffic signs): ~ 50, % [4]
- CIFAR-10 (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck): 50, ,000 % [5]
- Caltech-101: ~ 50, % [6]
- ImageNet: ~ 1,000,000 images, 1,000 classes, Top-5 83% [1]
- DeepFace: ~ 4,000,000 images, 4, % [2]
The state of the art is a Deep Neural Network every time.

6 CNNs Organization
Deep = number of layers >> 1

7 State-of-the-art CNN Example
The German Traffic Sign Recognition Benchmark (GTSRB): 43 traffic sign types, > 50,000 images
- Neurons: 287,843
- Synapses: 1,388,800
- Total memory: 1.5 MB (with 8-bit synapses)
- Connections: 124,121,800
Near-human recognition (> 98%) [3]
[3] D. Ciresan, U. Meier, J. Masci, J. Schmidhuber, Multi-column deep neural network for traffic sign classification, Neural Networks (32), pp , 2012
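As a rough cross-check of the memory figure above, the sketch below computes the storage needed for the synaptic weights alone at 8 bits each, and how often each stored weight is reused on average. The function and variable names are illustrative, and the slide does not say what else is counted in the 1.5 MB total.

```python
# Figures quoted on the slide for the GTSRB network [3].
SYNAPSES = 1_388_800
CONNECTIONS = 124_121_800
BITS_PER_SYNAPSE = 8


def weight_memory_bytes(synapses: int, bits_per_synapse: int) -> int:
    """Bytes needed to store every synaptic weight, rounded up to whole bytes."""
    return (synapses * bits_per_synapse + 7) // 8


weight_bytes = weight_memory_bytes(SYNAPSES, BITS_PER_SYNAPSE)
# Average number of connections served by one stored weight.
reuse = CONNECTIONS / SYNAPSES

print(f"weights: {weight_bytes / 1e6:.2f} MB")
print(f"average weight reuse: {reuse:.1f}x")
```

The weights alone stay below the quoted 1.5 MB, and each stored weight serves roughly 89 connections on average, which is what weight sharing in convolutional layers produces.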

8 Another Neuro-Inspired Model: Hmax (a Neuroscience Approach)
Hmax Model: Serre et al., IEEE PAMI 2007; Poggio et al., J Neurophysiol, 2007

9 Neuro-Inspired Models: The Hmax
S1 layer using Gabor filters
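To make the S1 stage concrete, here is a minimal sketch of one Gabor kernel using the common formulation (a Gaussian envelope multiplied by a cosine carrier along the rotated axis). The function name and the parameter values (`sigma`, `wavelength`, `gamma`) are illustrative assumptions, not the filter bank used in the talk.

```python
import math


def gabor_kernel(size, theta, sigma, wavelength, gamma=0.3):
    """Return a size x size Gabor kernel (size must be odd) as nested lists.

    theta is the orientation in radians, sigma the envelope width,
    wavelength the carrier period, gamma the spatial aspect ratio.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates to the filter orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


# Hypothetical 7x7 filter at orientation 0.
kernel = gabor_kernel(size=7, theta=0.0, sigma=2.8, wavelength=3.5)
```

An S1 response map would then be obtained by convolving the image with one such kernel per scale and orientation, which is where the MAC count of the layer comes from.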

10 Neuro-Inspired Models: The Hmax
[Figure: original image and Gabor filters]

11 [Figure: original image and Gabor filters]

12 Neuro-Inspired Models: The Hmax
Hmax Model performances

13 Hmax accelerator: Complexity
64 Gabor filters, 1 Mpixel image complexity:
- S1: Optimized Gabor filters: 2.9 GMAC
- C1: Max: 0.13 GOP
- RBF Neural Network: 0.4 GOP
- Total: 3.43 GMAC & OP
One IP camera, 1 Mpixel @ 30 fps: 103 GOP/sec
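The per-image and per-second totals above follow from simple addition and scaling by the frame rate. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Per-image complexity of each Hmax stage, as quoted on the slide
# (giga-operations for one 1 Mpixel image).
S1_GMAC = 2.9    # optimized Gabor filters
C1_GOP = 0.13    # local maxima
RBF_GOP = 0.4    # RBF neural network
FRAMES_PER_SECOND = 30

per_image_gop = S1_GMAC + C1_GOP + RBF_GOP
per_second_gop = per_image_gop * FRAMES_PER_SECOND

print(f"per image:  {per_image_gop:.2f} GMAC & OP")
print(f"per second: {per_second_gop:.0f} GOP/sec")
```

The S1 stage accounts for most of the load, which is why the later optimization concentrates on reducing the number of Gabor filters.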

15 PNeuro accelerator (Joint Laboratory CEA & GST, initiated in 2013)
Objective: design a processor that integrates signal processing functions and neuronal functions (Hmax, CNN) within the same chip.
PNeuro: A Cascadable Parallel Architecture
[Block diagram: Data In (signals, images) -> clusters of NeuroCores -> Classification Result, with cascade links from the previous NeuroDSP and to the next NeuroDSP]

16 PNeuro accelerator overview

17 PNeuro accelerator: Main Specifications
- Programmable NeuroCores; each can perform image/signal processing and neural functions
- Optimized for MAC and neural operations
- Signal processing: convolution filters, etc.
- Neural functions: weighted sum of inputs
- Can perform non-linear operations (maxima, tanh, ...)
- 1 NeuroCore represents 1 neuron
- NeuroCores can be time-multiplexed to implement bigger networks
- Optimized memory accesses for data locality and reuse
- Variable number of clusters to accommodate different application domains and related performances
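The neural function and the time-multiplexing idea listed above can be illustrated in software. This is only a conceptual sketch: `neuron` and `run_layer_time_multiplexed` are hypothetical names, and nothing here reflects the actual PNeuro instruction set or scheduling.

```python
import math


def neuron(inputs, weights, bias=0.0, activation=math.tanh):
    """Weighted sum of inputs (one MAC per input) followed by a non-linearity."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    return activation(acc)


def run_layer_time_multiplexed(inputs, weight_rows, n_cores=4):
    """Evaluate a layer with more neurons than cores.

    Each pass maps up to n_cores neurons onto the available cores;
    the cores are then reused for the next group of neurons.
    Returns the layer outputs and the number of passes needed.
    """
    outputs = []
    passes = 0
    for start in range(0, len(weight_rows), n_cores):
        group = weight_rows[start:start + n_cores]
        outputs.extend(neuron(inputs, weights) for weights in group)
        passes += 1
    return outputs, passes


# Hypothetical layer of 10 neurons evaluated on 4 cores.
outputs, passes = run_layer_time_multiplexed([1.0, 0.0], [[0.0, 0.0]] * 10, n_cores=4)
```

With 10 neurons and 4 cores the layer takes 3 passes, which shows the trade-off: a bigger network costs more time rather than more hardware.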

18 PNeuro accelerator: Performances
Profiling results, based on FDSOI 28 nm technology:
- One cluster of 4 Neuro-Cores @ 1 GHz: 32 GMAC/sec with 70 mW power consumption, including memories and the controller
- 32 clusters @ 1 GHz: 1024 GMAC/sec @ 2.2 W
- Energy efficiency: 465 GMAC.s^-1/W
Full Hmax, one IP camera, 1 Mpixel @ 30 fps: 103 GOP/sec
- Needs 4 clusters of 4 Neuro-Cores (ceil(103/32) = 4): 280 mW
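The cluster count, power and efficiency figures above can be reproduced with a few lines of arithmetic. The sketch assumes, as the slide does, that one operation of the Hmax workload maps onto one MAC slot of a cluster; the names are illustrative.

```python
import math

CLUSTER_GMAC_PER_S = 32   # one cluster of 4 Neuro-Cores at 1 GHz
CLUSTER_POWER_MW = 70     # includes memories and the controller


def clusters_needed(workload_gop_per_s):
    """Smallest whole number of clusters that covers the workload."""
    return math.ceil(workload_gop_per_s / CLUSTER_GMAC_PER_S)


full_hmax_clusters = clusters_needed(103)
full_hmax_power_mw = full_hmax_clusters * CLUSTER_POWER_MW

# Efficiency of the large configuration: 1024 GMAC/sec at 2.2 W.
efficiency_gmac_per_s_per_w = 1024 / 2.2

print(full_hmax_clusters, "clusters,", full_hmax_power_mw, "mW")
print(f"{efficiency_gmac_per_s_per_w:.0f} GMAC/sec/W")
```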

20 Face Detection Application Example (1/2)

21 Face Detection Application Example (2/2)
Complexity calculation divided by 8 (merge 8 scales):
- For one camera, 1 Mpixel @ 30 fps: 12.9 GOP.s^-1 (103 GOP.s^-1 / 8)
- Needs one cluster with 2 NeuroCores: power consumption < 35 mW
- For VGA @ 30 fps, only 1 NeuroCore: < 20 mW
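The NeuroCore counts above can be checked with a per-core estimate. The sketch rests on assumptions that are not stated on the slide: throughput and power divide evenly across the 4 cores of a cluster, VGA means 640x480 pixels, 1 Mpixel means 1,000,000 pixels, and the workload scales linearly with pixel count.

```python
import math

CLUSTER_GMAC_PER_S = 32.0
CLUSTER_POWER_MW = 70.0
CORES_PER_CLUSTER = 4
FULL_HMAX_GOP_PER_S = 103.0   # 1 Mpixel at 30 fps
SCALE_MERGE_FACTOR = 8        # "merge 8 scales"

# Assumed even split of a cluster between its cores.
core_gmac_per_s = CLUSTER_GMAC_PER_S / CORES_PER_CLUSTER
core_power_mw = CLUSTER_POWER_MW / CORES_PER_CLUSTER


def cores_needed(workload_gop_per_s):
    """Smallest whole number of cores that covers the workload."""
    return math.ceil(workload_gop_per_s / core_gmac_per_s)


face_1m = FULL_HMAX_GOP_PER_S / SCALE_MERGE_FACTOR
face_vga = face_1m * (640 * 480) / 1_000_000   # assumed linear in pixel count

for label, load in (("1 Mpixel", face_1m), ("VGA", face_vga)):
    n = cores_needed(load)
    print(f"{label}: {load:.1f} GOP/sec -> {n} core(s), about {n * core_power_mw} mW")
```

Under these assumptions the estimate gives 2 cores for the 1 Mpixel stream and 1 core for VGA, in line with the bounds quoted on the slide.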

22 Human detection: Hmin
64 Gabor Filters (7x7 to 37x37) + Original Image -> Local Maxima (C1) -> C1 Output -> Classification with RBF Neural Network (S2, C2)
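The two stages that follow the Gabor filters can be sketched in a few lines. This is a simplified illustration: `max_pool` takes local maxima over non-overlapping spatial windows only (Hmax-style C1 layers typically also pool across neighbouring scales), and `rbf_unit` is a single Gaussian radial basis unit. Names, window size and `sigma` are assumptions.

```python
import math


def max_pool(feature_map, window):
    """Local maxima over non-overlapping window x window blocks."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + dr][c + dc]
                for dr in range(min(window, rows - r))
                for dc in range(min(window, cols - c))
            )
            for c in range(0, cols, window)
        ]
        for r in range(0, rows, window)
    ]


def rbf_unit(features, center, sigma):
    """Gaussian response: 1.0 when features equal the center, falling towards 0 with distance."""
    dist2 = sum((f - c) ** 2 for f, c in zip(features, center))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


s1_map = [[1, 2, 0, 1],
          [3, 4, 1, 0],
          [0, 0, 5, 6],
          [0, 1, 7, 8]]
c1_map = max_pool(s1_map, window=2)
flat = [v for row in c1_map for v in row]
score = rbf_unit(flat, center=flat, sigma=1.0)
```

A classifier would compare the pooled features against several learned centers and combine the resulting responses.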

23 Human Detection Application Example

24 Human Detection Application Example

25 Human Detection Application Example
S1 Layer (Gabor Filters) -> C1 Layer (Max Pooling) -> RBF Classification -> Human Detected
To reduce complexity, optimization from Masquelier et al. (PLoS Computational Biology, 2007):
- 5 image scales (1, 0.7, 0.5, 0.35 and 0.25)
- 4 orientations
- One Gabor filter (15x15) per scale and per orientation: 20 Gabor filters
1 Mpixel image complexity (original -> optimized):
- S1: Optimized Gabor filters: 2.9 GMAC -> 0.6 GMAC
- C1: Max: 0.1 GOP
- RBF Neural Network: 0.3 GOP
- Total: 3.43 GMAC & OP -> 1 GMAC & OP
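The filter count and the reduced complexity quoted above can be recomputed directly from the listed values. The variable names are illustrative.

```python
# Optimized filter bank: one 15x15 Gabor filter per scale and per orientation.
SCALES = (1, 0.7, 0.5, 0.35, 0.25)
ORIENTATIONS = 4
n_filters = len(SCALES) * ORIENTATIONS

# Per-image complexity (giga-operations, 1 Mpixel image) after optimization.
optimized_gop = 0.6 + 0.1 + 0.3
original_gop = 3.43
reduction = original_gop / optimized_gop

FRAMES_PER_SECOND = 30
optimized_gop_per_s = optimized_gop * FRAMES_PER_SECOND

print(n_filters, "Gabor filters")
print(f"{optimized_gop:.2f} GMAC & OP per image, {reduction:.2f}x less than the original")
print(f"{optimized_gop_per_s:.0f} GOP/sec at {FRAMES_PER_SECOND} fps")
```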

26 Human Detection Application Example
Complexity calculation divided by 3.43:
- Original Hmax, one IP camera, 1 Mpixel @ 30 fps: 103 GOP/sec
- Optimized Hmax, one IP camera, 1 Mpixel @ 30 fps: 30 GOP/sec
Using FDSOI 28 nm technology, one cluster of 4 Neuro-Cores @ 1 GHz: 32 GMAC/sec with 70 mW power consumption
- Optimized Hmax needs 4 Neuro-Cores for one IP camera, 1 Mpixel @ 30 fps: power consumption 70 mW
- For VGA @ 30 fps, only 2 Neuro-Cores: 35 mW

27 Human Detection Application Examples

29 Conclusion
- PNeuro architecture optimized for neuro-inspired algorithms: Hmax, Convolutional Neural Networks and, more generally, Deep Neural Networks
- PNeuro: a cascadable parallel architecture
- Performance in FDSOI 28 nm makes embedded applications with very low power consumption possible:
- Face Detection needs only 20 mW for a VGA image @ 30 fps
- Human Detection needs only 35 mW for a VGA image @ 30 fps
- PNeuro is also implemented on FPGA

30 PNeuro on FPGA
First demonstration on an FPGA-based PNeuro:
- Single cluster configuration (4 Neuro-Cores)
- Embedded CNN application (60 neurons on the hidden layer, 450 KOps)
- Faces extraction, images on the database, 96% recognition rate
Same application ported on 5 different architectures:
- Embedded CPU: Raspberry PI 2 B, Odroid Xu3
- Embedded GPU: NVidia Tegra K1 (batch)
- Desktop CPU: Intel I7
- PNeuro, Quad Neuro-Cores, using an in-house prototyping board

Target                Frequency   Energy efficiency
Intel I7 (CPU)        3400 MHz    160 images/W
Quad ARM A15 (CPU)    2000 MHz    350 images/W
Quad ARM A7 (CPU)     900 MHz     380 images/W
Tegra K1 (GPU)        850 MHz     600 images/W
PNeuro (FPGA)         100 MHz     2000 images/W

The FPGA approach is already competitive with existing CPU & GPU solutions.
First FPGA product developed for early 2017 by GST: embedded FPGA Artix 100 (~1 W), 17.6 cm² for the board, including one cluster
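To put the table in perspective, the sketch below computes how many times more energy-efficient the FPGA-based PNeuro is than each other target, using only the figures quoted above. The dictionary and variable names are illustrative.

```python
# Energy efficiency per target, in images/W, as quoted in the table.
EFFICIENCY = {
    "Intel I7 (CPU)": 160,
    "Quad ARM A15 (CPU)": 350,
    "Quad ARM A7 (CPU)": 380,
    "Tegra K1 (GPU)": 600,
    "PNeuro (FPGA)": 2000,
}

pneuro = EFFICIENCY["PNeuro (FPGA)"]
gains = {
    target: pneuro / value
    for target, value in EFFICIENCY.items()
    if target != "PNeuro (FPGA)"
}

for target, gain in gains.items():
    print(f"PNeuro (FPGA) vs {target}: {gain:.1f}x")
```

The gain is obtained at 100 MHz, the lowest clock frequency in the table.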

31 Article in EETimes
Embedded World demonstration (Feb 2016)

32 Thank you!

GST, from Sensor to Decision

GST, from Sensor to Decision GST, from Sensor to Decision Videos Signals Mul2Media sound, images 2D, 3D,.. GST: Created in september 2011 12 persons (10 in R&D and 2 in Business) Technology based on neuro-inspired algorithms: E m

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA

DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA TOPICS COVERED Convolutional Networks Deep Learning Use Cases GPUs cudnn 2 MACHINE LEARNING! Training! Train the model from supervised

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

A new Computer Vision Processor Chip Design for automotive ADAS CNN applications in 22nm FDSOI based on Cadence VP6 Technology

A new Computer Vision Processor Chip Design for automotive ADAS CNN applications in 22nm FDSOI based on Cadence VP6 Technology Dr.-Ing Jens Benndorf (DCT) Gregor Schewior (DCT) A new Computer Vision Processor Chip Design for automotive ADAS CNN applications in 22nm FDSOI based on Cadence VP6 Technology Tensilica Day 2017 16th

More information

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance

More information

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu Artificial Neural Networks Introduction to Computational Neuroscience Ardi Tampuu 7.0.206 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition

More information

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th

Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th Today s Story Why does CNN matter to the embedded world? How to enable CNN in

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

direct hardware mapping of cnns on fpga-based smart cameras

direct hardware mapping of cnns on fpga-based smart cameras direct hardware mapping of cnns on fpga-based smart cameras Workshop on Architecture of Smart Cameras Kamel ABDELOUAHAB, Francois BERRY, Maxime PELCAT, Jocelyn SEROT, Jean-Charles QUINTON Cordoba, June

More information

Computer Architectures for Deep Learning. Ethan Dell and Daniyal Iqbal

Computer Architectures for Deep Learning. Ethan Dell and Daniyal Iqbal Computer Architectures for Deep Learning Ethan Dell and Daniyal Iqbal Agenda Introduction to Deep Learning Challenges Architectural Solutions Hardware Architectures CPUs GPUs Accelerators FPGAs SOCs ASICs

More information

Traffic Sign Localization and Classification Methods: An Overview

Traffic Sign Localization and Classification Methods: An Overview Traffic Sign Localization and Classification Methods: An Overview Ivan Filković University of Zagreb Faculty of Electrical Engineering and Computing Department of Electronics, Microelectronics, Computer

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Deep Face Recognition. Nathan Sun

Deep Face Recognition. Nathan Sun Deep Face Recognition Nathan Sun Why Facial Recognition? Picture ID or video tracking Higher Security for Facial Recognition Software Immensely useful to police in tracking suspects Your face will be an

More information

THE NVIDIA DEEP LEARNING ACCELERATOR

THE NVIDIA DEEP LEARNING ACCELERATOR THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural

More information

Outline GF-RNN ReNet. Outline

Outline GF-RNN ReNet. Outline Outline Gated Feedback Recurrent Neural Networks. arxiv1502. Introduction: RNN & Gated RNN Gated Feedback Recurrent Neural Networks (GF-RNN) Experiments: Character-level Language Modeling & Python Program

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

DIGITS DEEP LEARNING GPU TRAINING SYSTEM

DIGITS DEEP LEARNING GPU TRAINING SYSTEM DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection, Localization,

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Deep Learning & Neural Networks

Deep Learning & Neural Networks Deep Learning & Neural Networks Machine Learning CSE4546 Sham Kakade University of Washington November 29, 2016 Sham Kakade 1 Announcements: HW4 posted Poster Session Thurs, Dec 8 Today: Review: EM Neural

More information

DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses

DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough 1,2 S. K. Lee 2, N. Mulholland 2, P. Hansen 2, S. Kodali 3, D. Brooks 2, G.-Y. Wei 2 1 ARM Research, Boston,

More information

An introduction to Machine Learning silicon

An introduction to Machine Learning silicon An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional

More information

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms Deep Neural Network Hyperparameter Optimization with Genetic Algorithms EvoDevo A Genetic Algorithm Framework Aaron Vose, Jacob Balma, Geert Wenes, and Rangan Sukumar Cray Inc. October 2017 Presenter Vose,

More information

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM

DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM DEEP LEARNING AND DIGITS DEEP LEARNING GPU TRAINING SYSTEM AGENDA 1 Introduction to Deep Learning 2 What is DIGITS 3 How to use DIGITS Practical DEEP LEARNING Examples Image Classification, Object Detection,

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for

More information

Machine Learning on VMware vsphere with NVIDIA GPUs

Machine Learning on VMware vsphere with NVIDIA GPUs Machine Learning on VMware vsphere with NVIDIA GPUs Uday Kurkure, Hari Sivaraman, Lan Vu GPU Technology Conference 2017 2016 VMware Inc. All rights reserved. Gartner Hype Cycle for Emerging Technology

More information

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor

Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm

More information

Deep Learning Processing Technologies for Embedded Systems. October 2018

Deep Learning Processing Technologies for Embedded Systems. October 2018 Deep Learning Processing Technologies for Embedded Systems October 2018 1 Neural Networks Architecture Single Neuron DNN Multi Task NN Multi-Task Vehicle Detection With Region-of-Interest Voting Popular

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just

More information

In-memory computing with emerging memory devices

In-memory computing with emerging memory devices In-memory computing with emerging memory devices Dipartimento di Elettronica, Informazione e Bioingegneria Politecnico di Milano daniele.ielmini@polimi.it Emerging memory devices 2 Resistive switching

More information

Neurmorphic Architectures. Kenneth Rice and Tarek Taha Clemson University

Neurmorphic Architectures. Kenneth Rice and Tarek Taha Clemson University Neurmorphic Architectures Kenneth Rice and Tarek Taha Clemson University Historical Highlights Analog VLSI Carver Mead and his students pioneered the development avlsi technology for use in neural circuits

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Deep Learning with Intel DAAL

Deep Learning with Intel DAAL Deep Learning with Intel DAAL on Knights Landing Processor David Ojika dave.n.ojika@cern.ch March 22, 2017 Outline Introduction and Motivation Intel Knights Landing Processor Intel Data Analytics and Acceleration

More information

Face Recognition A Deep Learning Approach

Face Recognition A Deep Learning Approach Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison

More information

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Anand Joshi CS229-Machine Learning, Computer Science, Stanford University,

More information

Advanced Introduction to Machine Learning, CMU-10715

Advanced Introduction to Machine Learning, CMU-10715 Advanced Introduction to Machine Learning, CMU-10715 Deep Learning Barnabás Póczos, Sept 17 Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio

More information

Object Recognition II

Object Recognition II Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1 Outline Object detection the task, evaluation, datasets Convolutional Neural Networks (CNNs) overview and history Region-based

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017 3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural

More information

Minimizing Computation in Convolutional Neural Networks

Minimizing Computation in Convolutional Neural Networks Minimizing Computation in Convolutional Neural Networks Jason Cong and Bingjun Xiao Computer Science Department, University of California, Los Angeles, CA 90095, USA {cong,xiao}@cs.ucla.edu Abstract. Convolutional

More information

! References: ! Computer eyesight gets a lot more accurate, NY Times. ! Stanford CS 231n. ! Christopher Olah s blog. ! Take ECS 174!

! References: ! Computer eyesight gets a lot more accurate, NY Times. ! Stanford CS 231n. ! Christopher Olah s blog. ! Take ECS 174! Exams ECS 189 WEB PROGRAMMING! If you are satisfied with your scores on the two midterms, you can skip the final! As soon as your Photobooth and midterm are graded, I can give you your course grade (so

More information

From Maxout to Channel-Out: Encoding Information on Sparse Pathways

From Maxout to Channel-Out: Encoding Information on Sparse Pathways From Maxout to Channel-Out: Encoding Information on Sparse Pathways Qi Wang and Joseph JaJa Department of Electrical and Computer Engineering and, University of Maryland Institute of Advanced Computer

More information

Revolutionizing the Datacenter

Revolutionizing the Datacenter Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

Small is the New Big: Data Analytics on the Edge

Small is the New Big: Data Analytics on the Edge Small is the New Big: Data Analytics on the Edge An overview of processors and algorithms for deep learning techniques on the edge Dr. Abhay Samant VP Engineering, Hiller Measurements Adjunct Faculty,

More information

Binary Convolutional Neural Network on RRAM

Binary Convolutional Neural Network on RRAM Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua

More information

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer

DEEP NEURAL NETWORKS AND GPUS. Julie Bernauer DEEP NEURAL NETWORKS AND GPUS Julie Bernauer GPU Computing GPU Computing Run Computations on GPUs x86 CUDA Framework to Program NVIDIA GPUs A simple sum of two vectors (arrays) in C void vector_add(int

More information

Parallelization and optimization of the neuromorphic simulation code. Application on the MNIST problem

Parallelization and optimization of the neuromorphic simulation code. Application on the MNIST problem Parallelization and optimization of the neuromorphic simulation code. Application on the MNIST problem Raphaël Couturier, Michel Salomon FEMTO-ST - DISC Department - AND Team November 2 & 3, 2015 / Besançon

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

OpenCL on FPGAs - Creating custom accelerated solutions

OpenCL on FPGAs - Creating custom accelerated solutions OpenCL on FPGAs - Creating custom accelerated solutions Manuel Greisinger Channel Manager, Central & Eastern Europe Oct 13 th, 2015 ESSEI Technology Day, Gilching, Germany Industry Trends Increasing product

More information

Convolutional Neural Networks

Convolutional Neural Networks Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

Convolutional Deep Belief Networks on CIFAR-10

Convolutional Deep Belief Networks on CIFAR-10 Convolutional Deep Belief Networks on CIFAR-10 Alex Krizhevsky kriz@cs.toronto.edu 1 Introduction We describe how to train a two-layer convolutional Deep Belief Network (DBN) on the 1.6 million tiny images

More information

Neural Computer Architectures

Neural Computer Architectures Neural Computer Architectures 5kk73 Embedded Computer Architecture By: Maurice Peemen Date: Convergence of different domains Neurobiology Applications 1 Constraints Machine Learning Technology Innovations

More information

Deep Learning. Volker Tresp Summer 2014

Deep Learning. Volker Tresp Summer 2014 Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there

More information

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017

DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017 DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

NVIDIA FOR DEEP LEARNING. Bill Veenhuis NVIDIA FOR DEEP LEARNING Bill Veenhuis bveenhuis@nvidia.com Nvidia is the world s leading ai platform ONE ARCHITECTURE CUDA 2 GPU: Perfect Companion for Accelerating Apps & A.I. CPU GPU 3 Intro to AI AGENDA

More information

6. Convolutional Neural Networks

6. Convolutional Neural Networks 6. Convolutional Neural Networks CS 519 Deep Learning, Winter 2017 Fuxin Li With materials from Zsolt Kira Quiz coming up Next Thursday (2/2) 20 minutes Topics: Optimization Basic neural networks No Convolutional

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

CNP: An FPGA-based Processor for Convolutional Networks

CNP: An FPGA-based Processor for Convolutional Networks Clément Farabet clement.farabet@gmail.com Computational & Biological Learning Laboratory Courant Institute, NYU Joint work with: Yann LeCun, Cyril Poulet, Jefferson Y. Han Now collaborating with Eugenio

More information

Deep Learning for Remote Sensing

Deep Learning for Remote Sensing 1 ENPC Data Science Week Deep Learning for Remote Sensing Alexandre Boulch 2 ONERA Research, Innovation, expertise and long-term vision for industry, French government and Europe 3 Materials Optics Aerodynamics

More information

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018

Nvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018 Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

Future Computer Vision Algorithms for Traffic Sign Recognition Systems

Dr. Stefan Eickeler Future of Traffic Sign Recognition Triangular Signs Complex Signs All Circular Signs Recognition of Circular Traffic

Introduction to Neural Networks

Jakob Verbeek 2017-2018 Biological motivation Neuron is basic computational unit of the brain about 10^11 neurons in human brain Simplified neuron model as linear threshold

Deploying Deep Learning Networks to Embedded GPUs and CPUs

Rishu Gupta, PhD Senior Application Engineer, Computer Vision 2015 The MathWorks, Inc. MATLAB Deep Learning Framework Access Data Design + Train

Intel PSG (Altera) Enabling the SKA Community. Lance Brown Sr. Strategic & Technical Marketing Mgr.

lbrown@altera.com, 719-291-7280 Agenda Intel Programmable Solutions Group (Altera) PSG's COTS Strategy

Mocha.jl. Deep Learning in Julia. Chiyuan Zhang CSAIL, MIT

Chiyuan Zhang (@pluskid) CSAIL, MIT Deep Learning Learning with multi-layer (3~30) neural networks, on a huge training set. State-of-the-art on many AI tasks Computer Vision:

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 © Alan Blair, 2017

COMP9444 17s2 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization

CPSC340. State-of-the-art Neural Networks. Nando de Freitas November, 2012 University of British Columbia

Outline of the lecture This lecture provides an overview of two state-of-the-art neural networks:

Profiling the Performance of Binarized Neural Networks. Daniel Lerner, Jared Pierce, Blake Wetherton, Jialiang Zhang

Outline Project Significance Prior Work Research Objectives Hypotheses Testing Framework

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Yaman Umuroglu (NTNU & Xilinx Research Labs Ireland) in collaboration with N Fraser, G Gambardella, M Blott, P Leong, M Jahre and

XPU A Programmable FPGA Accelerator for Diverse Workloads

Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

ENVISION: A 0.26-to-10 TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable CNN Processor in 28nm FDSOI

Bert Moons, Roel Uytterhoeven, Wim Dehaene, Marian Verhelst ESAT/MICAS - KU Leuven

Computer Vision: Making machines see

Roberto Cipolla Department of Engineering http://www.eng.cam.ac.uk/~cipolla/people.html http://www.toshiba.eu/eu/cambridge-research- Laboratory/ Vision: what is where

Machine Learning 13. week

Deep Learning Convolutional Neural Network Recurrent Neural Network Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

Image Classification using Fast Learning Convolutional Neural Networks

pp.50-55 http://dx.doi.org/10.14257/astl.2015.113.11 Keonhee Lee 1 and Dong-Chul Park 2 1 Software Device Research Center Korea

SDA: Software-Defined Accelerator for Large-Scale DNN Systems

Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A

3D Wafer Scale Integration: A Scaling Path to an Intelligent Machine

Arvind Kumar, IBM Thomas J. Watson Research Center Zhe Wan, UCLA Elec. Eng. Dept. Winfried W. Wilcke, IBM Almaden Research Center Subramanian

Two FPGA-DNN Projects: 1. Low Latency Multi-Layer Perceptrons using FPGAs 2. Acceleration of CNN Training on FPGA-based Clusters

*Argonne National Lab +BU & USTC Presented by Martin Herbordt Work by Ahmed

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency with ML accelerators Michael

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA's Xavier

BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Network with Blocked Hashing Techniques

Jingyang Zhu 1, Zhiliang Qian 2*, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017

Outline of talk Introduction Modelling neurons Architecture and technology Principles of operation Summary

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018

shuchang.zhou@gmail.com About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

Accelerating Neuromorphic Vision Algorithms for Recognition

Ahmed Al Maashri Michael DeBole Matthew Cotter Nandhini Chandramoorthy Yang Xiao Vijaykrishnan Narayanan Chaitali Chakrabarti *Microsystems Design

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 Outline Background semantic segmentation Objective,

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

Announcements Please watch the videos I sent you, if you haven't yet (that's your reading)

Intro to Deep Learning. Slides Credit: Andrej Karpathy, Derek Hoiem, Marc Aurelio, Yann LeCun

Why this class? Deep Features Have been able to harness the big data in the most efficient and effective

Creating Affordable and Reliable Autonomous Vehicle Systems

Shaoshan Liu shaoshan.liu@perceptin.io Autonomous Driving Localization Most crucial task of autonomous driving Solutions: GNSS but with variations,

Convolutional Neural Network Layer Reordering for Acceleration

R1-15 SASIMI 2016 Proceedings Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky Ilya Sutskever Geoffrey Hinton University of Toronto Canada Paper with same name to appear in NIPS 2012 Main idea Architecture

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

The Mont-Blanc approach towards Exascale

http://www.montblanc-project.eu Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are

Simplifying ConvNets for Fast Learning

Franck Mamalet 1 and Christophe Garcia 2 1 Orange Labs, 4 rue du Clos Courtel, 35512 Cesson-Sévigné, France, franck.mamalet@orange.com 2 LIRIS, CNRS, Insa de Lyon,