Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech


Today's class: Overview; Convolutional Neural Network (CNN); Training CNN; Understanding and Visualizing CNN

Image Categorization: Training phase. Training images and training labels feed into image feature extraction and classifier training, producing a trained classifier.

Image Categorization: Testing phase. Image features are extracted from a test image and passed to the trained classifier, which outputs a prediction (e.g., Outdoor).

Features are the keys: SIFT [Lowe IJCV 04], HOG [Dalal and Triggs CVPR 05], SPM [Lazebnik et al. CVPR 06], DPM [Felzenszwalb et al. PAMI 10], Color Descriptor [Van De Sande et al. PAMI 10]

Learning a Hierarchy of Feature Extractors. Each layer of the hierarchy extracts features from the output of the previous layer, all the way from pixels to classifier. Layers have (nearly) the same structure: Image/video → Layer 1 → Layer 2 → Layer 3 → Labels

Biological neurons and perceptrons: a biological neuron vs. an artificial neuron (perceptron), a linear classifier

Simple, Complex and Hypercomplex cells David H. Hubel and Torsten Wiesel Suggested a hierarchy of feature detectors in the visual cortex, with higher level features responding to patterns of activation in lower level cells, and propagating activation upwards to still higher level cells. David Hubel's Eye, Brain, and Vision

Hubel/Wiesel Architecture and Multi-layer Neural Network. Hubel and Wiesel's architecture; multi-layer neural network, a non-linear classifier

Multi-layer Neural Network. A non-linear classifier. Training: find network weights $\mathbf{w}$ to minimize the error between true training labels $y_i$ and estimated labels $f_{\mathbf{w}}(x_i)$. Minimization can be done by gradient descent, provided $f$ is differentiable. This training method is called back-propagation.

Convolutional Neural Networks Also known as CNN, ConvNet, DCN CNN = a multi-layer neural network with 1. Local connectivity 2. Weight sharing

CNN: Local Connectivity. With 7 input units (neurons) and 3 hidden units, the number of parameters is 3 x 7 = 21 under global connectivity but only 3 x 3 = 9 under local connectivity, where each hidden unit sees just a 3-input window.

CNN: Weight Sharing. With the same 7 input units and 3 hidden units, local connectivity without weight sharing needs 3 x 3 = 9 parameters (w1 ... w9); with weight sharing, every hidden unit reuses the same filter (w1, w2, w3), so only 3 x 1 = 3 parameters remain.
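
A minimal numpy sketch (my own illustration, not from the slides) makes these counts concrete; it assumes a stride of 2 so that the 7 inputs yield exactly 3 hidden units:

```python
import numpy as np

# 7 input units, one shared filter of 3 weights (w1, w2, w3), stride 2
# (stride chosen so the 7 inputs produce exactly 3 hidden units).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # input layer
w = np.array([0.5, -1.0, 0.5])                     # shared filter weights

hidden = np.array([w @ x[i:i + 3] for i in range(0, 5, 2)])
print(hidden.size)   # 3 hidden units
print(w.size)        # only 3 parameters, reused by every hidden unit
```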

CNN with multiple input channels: with more than one input channel (e.g., Channel 1 and Channel 2), the filter has a separate set of weights per channel, and the per-channel responses are combined into each hidden unit.

CNN with multiple output maps: applying several filters (Filter 1, Filter 2, ...) to the same input produces multiple output (activation) maps, one per filter.

Putting them together: local connectivity, weight sharing, multiple input channels, and multiple output (activation) maps. Image credit: A. Karpathy
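
As a sketch of how these pieces combine in one layer (PyTorch is used purely for illustration; the slides do not prescribe a library), a convolutional layer bundles local connectivity, weight sharing, multiple input channels, and multiple output maps:

```python
import torch
import torch.nn as nn

# 3 input channels (RGB), 16 output activation maps, 3x3 local windows,
# weights shared across all spatial positions.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(conv(x).shape)            # torch.Size([1, 16, 32, 32]): 16 maps
# 16 filters x (3 channels x 3x3 weights) + 16 biases = 448 parameters
print(sum(p.numel() for p in conv.parameters()))
```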

Neocognitron [Fukushima, Biological Cybernetics 1980]. Deformation-resistant recognition: S-cells (simple) extract local features; C-cells (complex) allow for positional errors.

LeNet [LeCun et al. 1998] Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998] LeNet-1 from 1993

What is a Convolution? A weighted moving sum: a filter slides over the input feature map to produce an activation map. slide credit: S. Lazebnik
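
A direct numpy sketch of the weighted moving sum (my own illustration; like most deep learning code, it implements cross-correlation rather than flipped convolution):

```python
import numpy as np

def conv2d(feature_map, kernel):
    """Each output value is the dot product of the kernel with one input window."""
    kh, kw = kernel.shape
    oh = feature_map.shape[0] - kh + 1
    ow = feature_map.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

feature_map = np.random.rand(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # a crude vertical-edge filter
print(conv2d(feature_map, kernel).shape)    # (3, 3) activation map
```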

Convolutional Neural Networks. Each stage, from the input image upward: Convolution (learned) → Non-linearity → Spatial pooling → Normalization → feature maps. slide credit: S. Lazebnik

Convolutional Neural Networks: the convolution stage. Learned filters are convolved with the input feature map. slide credit: S. Lazebnik

Convolutional Neural Networks: the non-linearity stage, e.g., the Rectified Linear Unit (ReLU). slide credit: S. Lazebnik

Convolutional Neural Networks: the spatial pooling stage. Max pooling is a non-linear down-sampling that provides translation invariance. slide credit: S. Lazebnik

Convolutional Neural Networks: the normalization stage. Feature maps before and after contrast normalization. slide credit: S. Lazebnik

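These four stages map one-to-one onto standard layers. A minimal sketch (PyTorch chosen for illustration, with batch normalization standing in for the slides' generic normalization stage):

```python
import torch
import torch.nn as nn

# One CNN stage: convolution -> non-linearity -> spatial pooling -> normalization.
stage = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learned convolution
    nn.ReLU(),                                   # non-linearity
    nn.MaxPool2d(2),                             # spatial pooling (2x down-sampling)
    nn.BatchNorm2d(16),                          # normalization (one common choice)
)

x = torch.randn(1, 3, 32, 32)
print(stage(x).shape)   # torch.Size([1, 16, 16, 16])
```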

Engineered vs. learned features. Convolutional filters are trained in a supervised manner by back-propagating classification error. Traditional pipeline: Image → Feature extraction → Pooling → Classifier → Label. Deep pipeline: Image → Convolution/pool (repeated) → Dense → Dense → Dense → Label.

Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998. ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012. Slide Credit: L. Zitnick

Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998. ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012. * Rectified activations and dropout. Slide Credit: L. Zitnick

SIFT Descriptor [Lowe IJCV 2004]: image pixels → apply gradient filters → spatial pool (sum) → normalize to unit length → feature vector

SIFT Descriptor [Lowe IJCV 2004]: image pixels → apply oriented filters → spatial pool (sum) → normalize to unit length → feature vector. slide credit: R. Fergus

Spatial Pyramid Matching [Lazebnik, Schmid, Ponce CVPR 2006]: SIFT features → filter with visual words → max → multi-scale spatial pool (sum) → classifier. slide credit: R. Fergus

Deformable Part Model Deformable Part Models are Convolutional Neural Networks [Girshick et al. CVPR 15]

AlexNet. Similar framework to LeCun '98 but: bigger model (7 hidden layers, 650,000 units, 60,000,000 params); more data ($10^6$ vs. $10^3$ images); GPU implementation (50x speedup over CPU); trained on two GPUs for a week. A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Using CNN for Image Classification. AlexNet takes a fixed input size of 224x224x3; the fully connected layer fc7 yields a d = 4096 feature vector (with averaging), followed by a softmax layer.
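
A sketch of using a pretrained AlexNet as a d = 4096 feature extractor (torchvision's pretrained model is an assumed stand-in for the network on the slide):

```python
import torch
from torchvision import models

# Load a pretrained AlexNet and drop the final 1000-way layer so that
# fc7 (the last 4096-d fully connected layer) becomes the output.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.classifier = alexnet.classifier[:-1]
alexnet.eval()

x = torch.randn(1, 3, 224, 224)   # fixed 224x224x3 input
with torch.no_grad():
    fc7 = alexnet(x)
print(fc7.shape)                  # torch.Size([1, 4096])
```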

Progress on ImageNet. Top-5 image classification error has fallen steadily: AlexNet (2012), ZF (2013), VGG and GoogLeNet (2014), ResNet (2015), GoogLeNet-v4 (2016).

VGG-Net. The deeper, the better. Key design choices: 3x3 conv. kernels (very small); conv. stride 1 (no loss of information). Other details: rectification (ReLU) non-linearity; 5 max-pool layers (2x reduction); no normalization; 3 fully-connected (FC) layers

VGG-Net. Why 3x3 layers? Stacked conv. layers have a large receptive field: two 3x3 layers give a 5x5 receptive field; three 3x3 layers give a 7x7 receptive field. More non-linearity, and fewer parameters to learn (~140M per net).
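
A quick check of the parameter savings (my own arithmetic, assuming C input and C output channels and ignoring biases): two stacked 3x3 layers cover the same 5x5 receptive field as one 5x5 layer with fewer weights.

```python
C = 64  # channels in and out (an assumed example width)

one_5x5 = 5 * 5 * C * C          # a single 5x5 conv layer
two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 layers, same 5x5 receptive field
print(one_5x5, two_3x3)          # 102400 vs. 73728: ~28% fewer parameters
```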

ResNet. Can we just increase the number of layers? How can we train a very deep network? Residual learning.
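
A minimal residual block sketch (an illustration of the idea, not the exact block from the ResNet paper): the stacked layers learn a residual F(x) that is added back onto the identity shortcut.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = relu(x + F(x)): learn the residual F instead of the full mapping."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + residual)   # identity shortcut + learned residual

block = ResidualBlock(16)
x = torch.randn(1, 16, 8, 8)
print(block(x).shape)   # torch.Size([1, 16, 8, 8])
```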

DenseNet. Shorter connections (as in ResNet) help, so why not just connect them all?

Training Convolutional Neural Networks. Backpropagation + stochastic gradient descent with momentum (see Neural Networks: Tricks of the Trade). Dropout; data augmentation; batch normalization; initialization; transfer learning

Training CNN with gradient descent. A CNN as a composition of functions: $f_{\mathbf{w}}(x) = f_L(\dots f_2(f_1(x; w_1); w_2) \dots ; w_L)$ with parameters $\mathbf{w} = (w_1, w_2, \dots, w_L)$. Empirical loss function: $\mathcal{L}(\mathbf{w}) = \frac{1}{n} \sum_i \ell(z_i, f_{\mathbf{w}}(x_i))$. Gradient descent: $\mathbf{w}^{t+1} = \mathbf{w}^t - \eta_t \nabla \mathcal{L}(\mathbf{w}^t)$, i.e., new weight = old weight - learning rate x gradient.
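
A bare-bones gradient descent loop on a toy differentiable model (a sketch of the update rule above, not the lecture's code; PyTorch autograd stands in for back-propagation):

```python
import torch

# Toy model f_w(x) = w1 * x + w0 with squared loss, fit by gradient descent.
x = torch.randn(100)
z = 3.0 * x + 1.0                        # ground-truth labels
w = torch.zeros(2, requires_grad=True)   # parameters (w1, w0)
eta = 0.1                                # learning rate

for t in range(200):
    loss = ((w[0] * x + w[1] - z) ** 2).mean()   # empirical loss L(w)
    loss.backward()                              # gradient via backprop
    with torch.no_grad():
        w -= eta * w.grad                        # w_{t+1} = w_t - eta * gradient
        w.grad.zero_()

print(w)   # approaches (3.0, 1.0)
```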

An illustrative example: $f(x, y) = xy$, so $\frac{\partial f}{\partial x} = y$ and $\frac{\partial f}{\partial y} = x$. Example: $x = 4$, $y = 3$ gives $f(x, y) = 12$, partial derivatives $\frac{\partial f}{\partial x} = 3$ and $\frac{\partial f}{\partial y} = 4$, and gradient $\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}]$. Example credit: Andrej Karpathy

$f(x, y, z) = (x + y)z$. Let $q = x + y$, so $\frac{\partial q}{\partial x} = 1$ and $\frac{\partial q}{\partial y} = 1$; then $f = qz$, so $\frac{\partial f}{\partial q} = z$ and $\frac{\partial f}{\partial z} = q$. Goal: compute the gradient $\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}]$. Example credit: Andrej Karpathy

$f(x, y, z) = (x + y)z$ with $q = x + y$, $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$, $f = qz$, $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$. Chain rule: $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial x}$. Example credit: Andrej Karpathy

Backpropagation (recursive chain rule). For a gate $q$ with inputs $w_1, w_2, \dots, w_n$ feeding an output $f$: $\frac{\partial f}{\partial w_i} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial w_i}$, where $\frac{\partial q}{\partial w_i}$ is the local gradient (can be computed during the forward pass) and $\frac{\partial f}{\partial q}$ is the gate gradient (the gate receives this during backprop).
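
Karpathy's $(x + y)z$ example takes only a few lines to work through (my own transcription, with example values chosen for illustration): the forward pass stores intermediate values, and the backward pass applies the recursive chain rule.

```python
# Forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass (recursive chain rule)
df_dq = z            # gate gradient flowing into the add gate
df_dz = q            # = 3
df_dx = df_dq * 1.0  # chain rule: df/dx = df/dq * dq/dx, local gradient dq/dx = 1
df_dy = df_dq * 1.0  # likewise for y
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```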

Dropout. Intuition: successful conspiracies. 50 people are planning a conspiracy. Strategy A: plan one big conspiracy involving all 50 people; likely to fail, since all 50 need to play their parts correctly. Strategy B: plan 10 conspiracies, each involving 5 people; likely to succeed! Dropout: A simple way to prevent neural networks from overfitting [Srivastava JMLR 2014]

Dropout Main Idea: approximately combining exponentially many different neural network architectures efficiently Dropout: A simple way to prevent neural networks from overfitting [Srivastava JMLR 2014]
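
A minimal inverted-dropout sketch in numpy (one standard way to implement the idea, not code from the paper): each forward pass samples a different sub-network, approximating an ensemble of exponentially many architectures.

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not train:
        return activations                # no-op at test time
    mask = (np.random.rand(*activations.shape) >= p_drop) / (1.0 - p_drop)
    return activations * mask             # a different sub-network every pass

h = np.ones((2, 8))
print(dropout(h))   # roughly half the units zeroed, survivors scaled by 2
```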

Data Augmentation (Jittering). Create virtual training samples: horizontal flip, random crop, color casting, geometric distortion. Deep Image [Wu et al. 2015]
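
These jitters are typically applied on the fly during training; a sketch with torchvision transforms (one common toolkit; the slide itself names no library):

```python
from torchvision import transforms

# Random jitter applied independently to each training image.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                       # horizontal flip
    transforms.RandomResizedCrop(224),                       # random crop
    transforms.ColorJitter(brightness=0.4, saturation=0.4),  # color casting
    transforms.RandomAffine(degrees=10, shear=5),            # geometric distortion
    transforms.ToTensor(),
])
# Usage: pass `augment` as the `transform` argument of a torchvision dataset.
```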

Parametric Rectified Linear Unit Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification [He et al. 2015]

Swish. Plots of the Swish activation function and its first derivatives. https://arxiv.org/abs/1710.05941

Batch Normalization Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [Ioffe and Szegedy 2015]
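
The core transform is compact enough to sketch in numpy (my own rendering of the per-batch normalization, with learnable scale gamma and shift beta):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply scale and shift."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

x = np.random.randn(32, 4) * 10 + 5         # batch of 32 examples, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 and ~1
```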

Understanding and Visualizing CNN. Find images that maximize some class scores; individual neuron activation; breaking CNNs

Find images that maximize some class scores person: HOG template Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps [Simonyan et al. ICLR Workshop 2014]

Individual Neuron Activation. RCNN [Girshick et al. CVPR 2014]

Map activation back to the input pixel space What input pattern originally caused a given activation in the feature maps? Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 1 Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 2 Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 3 Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Layer 4 and 5 Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]

Network Dissection http://netdissect.csail.mit.edu/

Deep learning libraries: TensorFlow (research + production), PyTorch (research), Caffe2 (production)

Things to remember. Convolutional neural networks: a cascade of conv + ReLU + pool; representation learning; advanced architectures; tricks for training CNNs; visualizing CNNs (activation, dissection)

Resources. http://deeplearning.net/ Hub to many other deep learning resources. https://github.com/christoschristofidis/awesome-deep-learning A resource collection for deep learning. https://github.com/kjw0612/awesome-deep-vision A resource collection for deep learning in computer vision. http://cs231n.stanford.edu/syllabus.html A nice course on CNNs for visual recognition

Things to remember. Overview: neuroscience, perceptron, multi-layer neural networks. Convolutional neural network (CNN): convolution, non-linearity, max pooling; CNN for classification and beyond. Understanding and visualizing CNN: find images that maximize some class scores; visualize individual neuron activations, input patterns, and images; breaking CNNs. Training CNN: dropout; data augmentation; batch normalization; transfer learning