Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th


Today's Story: Why does CNN matter to the embedded world? How do we enable CNN in embedded devices? What is the complexity vs. performance tradeoff? Concluding thoughts. © 2016 Cadence Design Systems, Inc. All rights reserved.

Why does CNN matter to the embedded world?

What is the semiconductor industry asking for? We are trying to motivate bridging the gap between academic developments and the needs of the semiconductor industry.

CNN is growing fast. Today's deep learning industry motto is "Bigger is Better."

Net name                   Layers   Parameters   MACs
LeNet-5 for MNIST (1998)   7        58,996       77,484 M
ImageNet (2012)            8        60 M         1.1 G
DeepFace (2014)            8        >120 M       1.4 G
Ensemble CNN (2014)        16x20    23 M         1.4 G
FaceNet (2014)             22       140 M        1.6 G
VGG for face (2015)        37       29.7 M       15.5 G

Embedded devices' power, price, and form-factor requirements cannot accommodate this trend. Dr. Stephen Hicks, Nuffield Department of Clinical Neurosciences, University of Oxford

Complexity vs. Performance. [Chart: recognition performance vs. network complexity, showing a target recognition rate, the current industry trend toward higher complexity, and the gap between the cloud budget and the embedded-device budget.]

Methodology.
Step 1. Train a network to your heart's desire. Use the tool/programming language you are most familiar with (e.g., Caffe, TensorFlow, MatConvNet, Theano, CNTK, Torch, ...).
Step 2. Iteratively reduce the key parameters. Utilize statistics and linear algebra; use the validation set as guidance for convergence.
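Step 2's loop can be sketched as a simple search that keeps shrinking the network while the validation score stays within a tolerance of the baseline. The `validate` and `shrink` callables below are hypothetical toy stand-ins for a real train-and-evaluate pipeline, not the tooling used in the talk.

```python
# Hedged sketch of Step 2: iteratively shrink a network, watching the
# validation set. validate() and shrink() are toy stand-ins for a real
# train-and-evaluate pipeline.
def reduce_network(params, validate, shrink, tolerance=0.005):
    """Shrink until validation accuracy drops more than `tolerance`."""
    baseline = validate(params)
    while True:
        candidate = shrink(params)                  # e.g. fewer filters/layers
        if baseline - validate(candidate) > tolerance:
            return params                           # last acceptable config
        params = candidate

# Toy model: accuracy is flat down to 16 "filters", then degrades.
validate = lambda n: 0.95 if n >= 16 else 0.95 - 0.01 * (16 - n)
shrink = lambda n: n // 2
print(reduce_network(64, validate, shrink))  # 16
```

The tolerance threshold is the knob that trades recognition rate against complexity, which is the tradeoff the rest of the talk quantifies.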

Balancing performance and complexity

How to do it?
Reducing the number of feature maps: AlexNet has up to 384 feature maps. Do we need them all? Are they all equally important?
Reducing network depth: the Oxford VGG face recognition network has 37 layers.
Divide and conquer: How many classes should a single network handle? What can a single device see?

Feature Map Reduction

Feature Maps. Number of feature maps = number of 3-D filters. These filters are highly correlated. [Diagram: input feature maps/images passing through a convolutional layer to produce output feature maps.]
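As a quick shape check of the identity above: a layer with K output feature maps holds exactly K 3-D filters. The sizes below are illustrative (384 echoes AlexNet's widest layer), not taken from the talk.

```python
import numpy as np

# One 3-D filter per output feature map: a layer producing K output maps
# holds K filters of shape (in_channels, h, w). Sizes are illustrative.
in_channels, K, h, w = 3, 384, 3, 3
filters = np.random.randn(K, in_channels, h, w)   # the layer's filter bank
num_output_feature_maps = filters.shape[0]
assert num_output_feature_maps == K
```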

Redundancies in Filter Weights. Q: How many filters do we need? A: How many independent filters can we really have? Back to linear algebra 101.

Linear Algebra 101. The space spanned by vectors of size N is referred to as an N-dimensional space. Any vector a_i in this space can be expressed as a linear combination of the basis vectors of the space: a_i = Σ_{j=1}^{N} c_j v_j. It can be shown that there are only N independent basis vectors in the space. [Diagram: a vector a_i decomposed along basis vectors v1, v2, v3 with coefficients c1, c2, c3.]

Linear Algebra 101 (cont'd). Q: How do we find the v_j's? A: SVD (Singular Value Decomposition): A = U Σ V^T, where A is M×N, U is M×M, Σ is an M×N diagonal matrix of singular values, and V is N×N.
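A minimal check of the decomposition with NumPy; the random matrix here merely stands in for a flattened filter matrix A.

```python
import numpy as np

# SVD of a small M x N matrix A. The columns of U are the orthonormal
# basis vectors, weighted by the singular values on Sigma's diagonal.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))                   # M=8, N=5, illustrative
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)        # A == U Sigma V^T
```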

Eigen-basis Size Reduction for the Convolutional Filters. The i-th convolutional layer C_i is of size L_i × W_i × H_i × K_i, where L_i, W_i, H_i give the 3-D filter size and K_i is the number of feature maps. Form the filter coefficients into a matrix A of size M × K_i (with M = L_i·W_i·H_i). Apply the SVD decomposition A = U Σ V^T. Use the dominating basis vectors (those associated with the largest singular values) to initialize a reduced set of filters at C_i.
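A hedged sketch of the reduction step above, with made-up layer sizes: flatten the K_i filters into the columns of A, then keep only the dominant left singular vectors as a smaller filter bank.

```python
import numpy as np

# Flatten K filters of size L*W*H into the columns of A, then keep the M
# dominant basis vectors (largest singular values) as reduced filters.
# All sizes are illustrative, not from the talk.
L, W, H, K = 3, 3, 3, 64
M = 16                                            # reduced filter count, M < K
A = np.random.randn(L * W * H, K)                 # one flattened filter/column
U, s, Vt = np.linalg.svd(A, full_matrices=False)
reduced_filters = U[:, :M] * s[:M]                # M scaled dominant bases
assert reduced_filters.shape == (L * W * H, M)
```

In the talk's methodology these reduced filters would initialize layer C_i before the fine-tuning pass that the validation set then guides.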

Results Analysis: Traffic Sign Recognition. German Traffic Sign Recognition Benchmark (GTSRB): 51,840 images of German road signs in 43 classes. Image sizes vary from 15×15 to 222×193. Images are grouped by class and track, with at least 30 images per track. Images are available as color images (RGB), HOG features, Haar features, and color histograms.

Results Analysis: Traffic Sign Recognition (cont'd). [Chart: performance (% correct detection) vs. complexity (MACs per frame, log scale) on GTSRB, comparing the LeNet baseline, the best previous result, and the best result achieved by HCNN.] Eliminating the redundancy in the coefficients improves training efficiency and performance simultaneously, and controls performance degradation as a function of complexity reduction.

Results Analysis: Handwriting Recognition. The database is provided by the Modified National Institute of Standards and Technology (MNIST): 60,000 training images and 10,000 testing images. Black-and-white images, size-normalized to fit in a 20×20-pixel box and centered in a 28×28 field.

Results Analysis: Handwriting Recognition (cont'd). [Chart: performance (% correct detection) vs. complexity (MACs per frame, log scale) on MNIST, against the LeNet-5 baseline.] Eliminating the redundancy in the coefficients improves training efficiency and performance simultaneously, and controls performance degradation as a function of complexity reduction.

Network Depth Reduction

Network Depth. Q1: Are all layers equally important? Q2: How can we use the differences in layer importance to reduce complexity?

Are all layers equally important? Define a metric to measure the amount of refinement each layer can contribute. [Diagram: a network of convolutional layers C1-C6 with pooling, ReLU, and dropout stages, annotated with a per-layer quality metric.]
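The slides do not define the layer-quality metric, so the sketch below uses one plausible proxy: the marginal accuracy gain a read-out classifier sees after each layer. The probe accuracies are made-up placeholders, not measurements from the talk.

```python
# Hedged sketch of a layer-quality (LQ) proxy: the marginal refinement each
# layer adds over its predecessor. The accuracies below are invented
# placeholders standing in for real per-layer probe evaluations.
probe_accuracy = {"C1": 0.62, "C2": 0.71, "C3": 0.73, "C4": 0.74, "C5": 0.88}
prev = 0.10                       # chance level for a 10-class toy problem
lq = {}
for name, acc in probe_accuracy.items():
    lq[name] = acc - prev         # refinement this layer contributes
    prev = acc
low = min(lq, key=lq.get)         # candidate layer to eliminate or starve
print(low)                        # "C4" for these placeholder numbers
```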

How to use the layer quality (LQ) metric? Eliminate low-LQ layers and distribute their functionality over the entire network. [Diagram: the full C1-C6 network before layer elimination.]

How to use the layer quality (LQ) metric? Eliminate low-LQ layers and distribute their functionality over the entire network. This can be tricky due to the nonlinear components of the network. [Diagram: the network with layer C3 removed.] An alternative way to use the LQ metric is to allocate the computational resources based on LQ.

Divide and Conquer

What is this? An animal? A wolf?

What is this? An animal? A wolf? A cute wolf?

What is this? A barking husky? A howling wolf?

Hierarchical Recognition Concept. Not all images are equally hard to recognize. Dynamically allocate resources based on the problem difficulty. Let the CNN automate the partitioning process.

GTSRB: Ideal Traffic-Sign Families. [Diagram: signs grouped into families — Speed Limit Signs, Other Prohibitory Signs, Derestriction Signs, Mandatory Signs.]

Hierarchical-CNN for TSR. Level-1: a Family Classifier CNN (FC-CNN). Level-2: Member Classifier CNNs (MC-CNN 1 through MC-CNN 5), one per family. Using a hierarchical CNN, we are able to dynamically allocate resources in accordance with the problem difficulty. The CNN-oriented family clustering method groups signs into subsets that are preferable to the pre-defined subsets for a CNN classifier. In total, 5+1 different CNNs perform the classification, each with lower complexity than a one-vs-all classifier. [Diagram: the FC-CNN routes each input to one of five family-specific MC-CNNs, which emits the final decision.]
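The two-level decision flow above can be sketched as a router plus per-family classifiers. The `fc_cnn` and `mc_cnn` callables below are deterministic toy stand-ins for the trained networks in the talk.

```python
# Hedged sketch of the hierarchical decision flow: a level-1 family
# classifier routes each input to one of five level-2 member classifiers.
# Both levels are toy stand-ins, not the trained CNNs from the talk.
def fc_cnn(image):
    """Level-1 family classifier: picks one of 5 sign families."""
    return sum(map(ord, image)) % 5       # placeholder for a CNN forward pass

def make_mc_cnn(family):
    def mc_cnn(image):
        """Level-2 member classifier within one family."""
        return (family, sum(map(ord, image)) % 8)  # placeholder class id
    return mc_cnn

mc_cnns = [make_mc_cnn(f) for f in range(5)]

def classify(image):
    family = fc_cnn(image)                # cheap shared first-level decision
    return mc_cnns[family](image)         # only one small family net runs

family, member = classify("speed_limit_50.png")
assert 0 <= family < 5 and 0 <= member < 8
```

Because only one MC-CNN runs per input, the per-image cost tracks the problem's difficulty rather than the full one-vs-all network size.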

Results:

CCR %   Team             Method
99.82   Cadence          HCNN + color
99.65   Tsinghua         Hinge-loss trained CNN
99.46   IDSIA            Committee of CNNs
99.30   Cadence          Modified NYU
99.22   INI-RTCV         Human (best individual)
99.17   Sermanet (NYU)   Updated multi-scale CNN
98.84   INI-RTCV         Human (average)
98.31   Sermanet (NYU)   Multi-scale CNN
96.14   CAOR             Random Forest
95.68   INI-RTCV         LDA (HOG 2)

Alternative Network Architecture. The hierarchical decision process can be incorporated into the network training if we allow a gradual decision after each layer. [Diagram: Input → Conv → Pooling → ReLU → Conv → Pooling → ReLU → Fully Connected.]

Closing Thoughts

What is the semiconductor industry asking for? We are trying to motivate bridging the gap between academic developments and the needs of the semiconductor industry.

See you in Vegas! The premier summer conference for the computer vision community, Computer Vision and Pattern Recognition (CVPR 2016), will be held in Las Vegas from June 27-30. Cadence will be offering a tutorial on low-complexity recognition SoCs: embedded CNN; local vs. cloud compute, why and why not; complexity reduction techniques.