Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.


https://en.wikipedia.org/wiki/residual_neural_network

Deep Neural Network Progress from the Large Scale Visual Recognition Challenge (ILSVRC)

GoogLeNet (slide credit: Google Inc.)

1x1 Convolution
Does it make any sense to do 1x1 convolutions? Can we do dimensionality reduction on the depth (channel) dimension?

1x1 Filters and Average Pooling
A key contribution is using average pooling instead of fully connected layers.

1x1 Convolutions
- When the number of input channels > the number of filters: it acts as dimensionality reduction, (height, width, channels) -> (height, width, filters).
- When the number of input channels = the number of filters: it is a projection onto a space of the same dimension.
- It increases non-linearity without affecting the receptive field.
- It acts like a coordinate-dependent transformation in filter space.
- It suffers less from over-fitting due to the small kernel size (1x1).
- Another perspective: a fully connected layer with weight sharing across spatial positions.
- Used in many networks, including GoogLeNet.
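
As a rough illustration (not from the lecture itself), the sketch below shows a 1x1 convolution reducing channel depth in TensorFlow; the tensor shape and filter count are assumptions chosen for the example.

import tensorflow as tf

# Assumed input: a batch of 28x28 feature maps with 192 channels (NHWC).
x = tf.random.normal([8, 28, 28, 192])
# A 1x1 convolution reduces the depth from 192 channels to 64 filters.
reduce_depth = tf.keras.layers.Conv2D(filters=64, kernel_size=1, activation='relu')
y = reduce_depth(x)
print(y.shape)  # (8, 28, 28, 64): spatial size unchanged, depth reduced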

Choice of Modules
Which one: 1x1 conv? 3x3 conv? 5x5 conv? Pooling? Pick them all!

Inception Module (naïve version)
Apply all of them in parallel within a single module.

Inception Module
Naïve version vs. the version with dimensionality reduction: 1x1 convolutions are inserted before the 3x3 and 5x5 convolutions (and after the pooling branch) to reduce the channel depth.

Inception Module in GoogLeNet
GoogLeNet stacks 9 inception layers.
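
The sketch below is a minimal Keras version of an Inception module with dimensionality reduction; the filter counts are illustrative assumptions, not the exact GoogLeNet values.

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1x1=64, f3x3_reduce=96, f3x3=128,
                     f5x5_reduce=16, f5x5=32, f_pool=32):
    # 1x1 branch
    b1 = layers.Conv2D(f1x1, 1, padding='same', activation='relu')(x)
    # 1x1 reduction followed by a 3x3 convolution
    b2 = layers.Conv2D(f3x3_reduce, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3x3, 3, padding='same', activation='relu')(b2)
    # 1x1 reduction followed by a 5x5 convolution
    b3 = layers.Conv2D(f5x5_reduce, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5x5, 5, padding='same', activation='relu')(b3)
    # 3x3 max pooling followed by a 1x1 projection
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(f_pool, 1, padding='same', activation='relu')(b4)
    # Concatenate all branches along the channel axis
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])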

Classification results on ImageNet (slide credit: Google Inc.)

Team         Year   Place   Error (top-5)   Uses external data
SuperVision  2012   -       16.4%           no
SuperVision  2012   1st     15.3%           ImageNet 22k
Clarifai     2013   -       11.7%           no
Clarifai     2013   1st     11.2%           ImageNet 22k
MSRA         2014   3rd     7.35%           no
VGG          2014   2nd     7.32%           no
GoogLeNet    2014   1st     6.67%           no

GoogLeNet has only 5M parameters (12x fewer than AlexNet, 27x fewer than VGGNet).

GoogLeNet
Approximate an optimal local sparse structure using available dense components:
- 1x1 convolutions capture dense clusters.
- 3x3 and 5x5 convolutions capture more spatially spread-out clusters.
- A pooling layer generally improves performance.
The outputs of all of these are concatenated and passed to the next layer, giving rise to the (naïve) Inception module.

Deep Neural Network Progress from the Large Scale Visual Recognition Challenge (ILSVRC)
Problem: training deeper networks is more difficult because of the vanishing/exploding gradients problem.
Solution: residual networks.

Residual Network
Introduced residual nets in contrast to plain nets. Reached a 3.57% top-5 error rate and won ImageNet 2015 in all sub-competitions.

Residual Block
The residual branch treats its output as a perturbation on top of the base (identity) information carried by the shortcut, which helps avoid the vanishing/exploding gradients problem.

Residual Network
Analogy: the residual is like the difference between an original image and a changed image; the shortcut preserves the base information while the residual branch handles the perturbation.

Shortcut Connections
- Identity shortcuts: y = F(x, {W_i}) + x
- Projection shortcuts: y = F(x, {W_i}) + W_s x
In TensorFlow, the merge is simply y = tf.add(y, x).
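
As a rough sketch (not the lecture's own code), the two shortcut types can be written directly in TensorFlow; here F_x stands for the output of the residual branch, and the 1x1 convolution plays the role of the projection W_s. The function names are illustrative.

import tensorflow as tf
from tensorflow.keras import layers

def identity_shortcut(F_x, x):
    # y = F(x, {W_i}) + x: F_x and x must already have matching shapes
    return tf.add(F_x, x)

def projection_shortcut(F_x, x, out_channels, stride=1):
    # y = F(x, {W_i}) + W_s x: a learned 1x1 convolution W_s matches dimensions
    Ws_x = layers.Conv2D(out_channels, 1, strides=stride)(x)
    return tf.add(F_x, Ws_x)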

Residual learning building block: x -> conv2d -> ReLU -> conv2d -> (+ shortcut from x) -> ReLU -> y

Residual Network Code Example

def _residual_v1(self, x, kernel_size, in_filter, out_filter, stride):
    """Residual unit with 2 sub-layers."""
    with tf.name_scope('residual_v1') as name_scope:
        orig_x = x
        # First conv-BN-ReLU sub-layer (may downsample via stride)
        x = self._conv(x, kernel_size, out_filter, stride)
        x = self._batch_norm(x)
        x = self._relu(x)
        # Second conv-BN sub-layer (stride 1)
        x = self._conv(x, kernel_size, out_filter, 1)
        x = self._batch_norm(x)
        # Shortcut: if the channel count changes, downsample spatially and
        # zero-pad the channel dimension to match (parameter-free shortcut)
        if in_filter != out_filter:
            orig_x = self._avg_pool(orig_x, stride, stride)
            pad = (out_filter - in_filter) // 2
            orig_x = tf.pad(orig_x, [[0, 0], [0, 0], [0, 0], [pad, pad]])
        # Add the shortcut, then apply the final ReLU
        x = self._relu(tf.add(x, orig_x))
        return x

Deep Residual Network

Is ResNet an ensemble model?

Remove a Layer?
What happens if we remove the second layer?

Residual Networks
How many layers should be stacked in a residual block? With a single layer, the block y = W_1 x + x is essentially linear and offers little advantage.

Network Design
Basic design (VGG-style):
- All 3x3 conv (almost)
- When the spatial size is halved, the number of filters is doubled
- Batch normalization
- Simple design, just deep
Other remarks:
- No max pooling (almost)
- No hidden fully connected layers
- No dropout

Network Design: ResNet-152
Use bottleneck blocks. ResNet-152 (11.3 billion FLOPs) has lower complexity than the VGG-16/19 nets (15.3/19.6 billion FLOPs).
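
A minimal sketch of a bottleneck residual block of the kind used in the deeper ResNets: a 1x1 convolution reduces the depth, a 3x3 convolution works on the reduced representation, and a 1x1 convolution restores the depth before the shortcut is added. Filter counts are illustrative, batch normalization is omitted for brevity, and the input is assumed to already have filters * expansion channels.

from tensorflow.keras import layers

def bottleneck_block(x, filters=64, expansion=4):
    shortcut = x  # identity shortcut; assumes matching dimensions
    # 1x1 conv reduces the channel depth
    y = layers.Conv2D(filters, 1, padding='same', activation='relu')(x)
    # 3x3 conv operates on the reduced representation
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(y)
    # 1x1 conv restores the depth (no activation before the addition)
    y = layers.Conv2D(filters * expansion, 1, padding='same')(y)
    # Add the shortcut, then apply ReLU
    return layers.ReLU()(layers.Add()([shortcut, y]))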

Residual Networks
Compare a shallow network (18 layers) with a deeper counterpart (34 layers).

Degradation problem: a deeper model should not have higher training error. Construct the deeper model by copying the original layers from the shallower model and setting the extra layers to identity; it should then achieve at least the same training error. Therefore, solvers must have difficulty approximating identity mappings with multiple non-linear layers.

Deep Neural Network
Overly deep plain nets have higher training error, a general phenomenon observed on many datasets.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. arXiv 2015.

Residual Network
Deeper ResNets have lower training error.

Results
Deep ResNets can be trained without difficulties. Deep ResNets have lower training error, and also lower test error.

Residual Networks: Results
1st place in all five main tracks of the ILSVRC & COCO 2015 competitions:
- ImageNet Classification
- ImageNet Detection
- ImageNet Localization
- COCO Detection
- COCO Segmentation
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. arXiv 2015.

Residual Networks
Dealing with different dimensions between the shortcut and the residual branch:
(A) Zero padding (no extra parameters)
(B) Zero padding and 1x1 conv
(C) All 1x1 conv

Possible Architectures for Residual Blocks
AlphaGo uses the first of these with rectifier non-linearities.
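
The sketch below shows one plausible way to implement the dimension-matching choices in TensorFlow; the helper name and the exact padding scheme are assumptions for illustration, not code from the slides.

import tensorflow as tf
from tensorflow.keras import layers

def make_shortcut(x, out_channels, stride=1, option='A'):
    in_channels = x.shape[-1]
    if in_channels == out_channels and stride == 1:
        return x  # identity shortcut, no parameters needed
    if option == 'A':
        # Zero padding: downsample spatially, then pad the channel dimension
        x = layers.AveragePooling2D(pool_size=stride, strides=stride)(x)
        pad = out_channels - in_channels
        return tf.pad(x, [[0, 0], [0, 0], [0, 0], [0, pad]])
    # Options B and C use a 1x1 convolution (projection); B applies it only
    # when dimensions change (this branch), C would use it for every shortcut.
    return layers.Conv2D(out_channels, 1, strides=stride)(x)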

Highway Networks [Srivastava et al. 2015]
"We refer to networks with this architecture as highway networks, since they allow unimpeded information flow across several layers on information highways."

y = H(x, W_H) · T(x, W_T) + x · C(x, W_C)

where T is the transform gate and C is the carry gate. For simplicity, let C = 1 - T:

y = H(x, W_H) · T(x, W_T) + x · (1 - T(x, W_T))

so that
y = x            if T(x, W_T) = 0
y = H(x, W_H)    if T(x, W_T) = 1

The gate parameters are learned via backpropagation. Initialize the gate bias with a negative value (e.g. -3) to obtain an initial carry behavior (inspired by the LSTM practice of biasing gates to help bridge long-term dependencies early in training).
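
A minimal sketch of a fully connected highway layer following the formula above; the class name and layer sizes are assumptions, and the input dimension is assumed to equal the number of units so that the carry term can be added directly.

import tensorflow as tf
from tensorflow.keras import layers

class HighwayLayer(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        # Transform branch H(x, W_H)
        self.H = layers.Dense(units, activation='relu')
        # Transform gate T(x, W_T); the bias starts at -3 so the layer
        # initially carries the input through almost unchanged
        self.T = layers.Dense(
            units, activation='sigmoid',
            bias_initializer=tf.keras.initializers.Constant(-3.0))

    def call(self, x):
        t = self.T(x)
        # y = H(x) * T(x) + x * (1 - T(x)), i.e. carry gate C = 1 - T
        return self.H(x) * t + x * (1.0 - t)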

Highway vs. Residual Networks [Srivastava et al. 2015]

Highway nets: y = H(x, W_H) · T(x, W_T) + x · (1 - T(x, W_T))
- Can pass the transform branch, the carry branch, or a linear combination of them.
- Can behave differently for different features.
- Learn the gate functions in a data-driven way.
- Extra parameters (for the gates).
- No improvement with deeper nets.

Residual nets: y = F(x, {W_i}) + x
- Always pass both the identity and the transformation.
- Same behavior for all features.
- No data-driven gating.
- Parameter-free shortcut.
- Performs much better in practice.

Highway vs. Residual Networks (CIFAR): results comparing highway nets and residual nets.

Densely Connected CNNs
In a densely connected block, each layer receives the concatenated feature maps of all preceding layers as its input.
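
As a rough sketch of this connectivity pattern (the growth rate and layer count are illustrative assumptions, not the exact DenseNet configuration):

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        # Each new layer sees the concatenation of all previous feature maps
        y = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(x)
        x = layers.Concatenate(axis=-1)([x, y])
    return x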

Depth vs. Width
The authors of residual networks tried to make them as thin as possible in favor of increasing depth and having fewer parameters, and even introduced a «bottleneck» block which makes ResNet blocks even thinner. Note, however, that the residual block with identity mapping that allows very deep networks to be trained is at the same time a weakness of residual networks: as the gradient flows through the network, nothing forces it to go through the residual block weights, so a block can avoid learning anything during training. It is therefore possible that either only a few blocks learn useful representations, or that many blocks share very little information and contribute little to the final goal.

Exploring over 1000 Layers
Testing a 1202-layer network: training finishes and the training error is similar, but the test error is higher because of over-fitting.

Experimental Results
Wide residual networks are 8 times faster to train.

Wide Networks
- Widening consistently improves performance across residual networks of different depths.
- Increasing both depth and width helps until the number of parameters becomes too high and stronger regularization is needed.
- Wide networks can successfully learn with two or more times as many parameters as thin ones; matching that capacity with thin networks would require doubling their depth, making them infeasibly expensive to train.

Conclusions
- Deeper networks can handle more complex problems (larger receptive field, more non-linearity).
- However, training deeper networks is more difficult because of the vanishing/exploding gradients problem.
- Residual networks help avoid this problem.
- Wide networks are an alternative for achieving the same or better performance.
- Whether going deeper or wider, keep the total model size small.