Deep Learning for Computer Vision II

IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar

Paradigm Shift. Traditional pipeline: Feature Extraction (SIFT, HoG, ...) → Part Models / Encoding → Classifier → "Sparrow". Deep learning pipeline: Feature Learning → Classifier → "Sparrow", with the feature learning organized into layers L1, L2, L3, L4 (a hierarchical decomposition).

Common Pipeline (figure slides)

A simple network: x_0 → f_1 → x_1 → f_2 → ... → f_{n-1} → x_{n-1} → f_n → x_n, with parameters w_1, w_2, ..., w_n. Each output x_j depends on the previous output x_{j-1} through a function f_j with parameters w_j, i.e. x_j = f_j(x_{j-1}; w_j).
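
As an illustration, a minimal NumPy sketch of this chained composition; the layer functions and weight shapes below are hypothetical placeholders, not the lecture's network.

```python
import numpy as np

def forward_chain(x0, layers):
    """Compose x_j = f_j(x_{j-1}; w_j) for a list of (f_j, w_j) pairs."""
    x = x0
    for f, w in layers:
        x = f(x, w)
    return x

# Hypothetical example: an affine layer followed by an affine + tanh layer.
layers = [
    (lambda x, w: w @ x,          np.random.randn(8, 4)),   # f_1, w_1
    (lambda x, w: np.tanh(w @ x), np.random.randn(3, 8)),   # f_2, w_2
]
x_n = forward_chain(np.random.randn(4), layers)
print(x_n.shape)  # (3,)
```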

Feed forward neural network: input units x_00, x_01, ..., x_0d are mapped through weight matrices W_1, ..., W_n to output units x_n1, x_n2, ..., x_nc.

Feed forward neural network with a loss: the same stack (inputs x_00, ..., x_0d, weights W_1, ..., W_n, outputs x_n1, ..., x_nc) is followed by a LOSS layer whose scalar output z compares the network prediction with the one-hot target y = [0, 0, ..., 1, 0].

Feed forward neural network: the weights W_1, ..., W_n are updated using back propagation of the gradients of the loss z.

Training: the Vanishing Gradient Problem. Consider a simple chain x_0 → x_1 → x_2 → C with weights w_1, w_2, w_3 and sigmoid units. The derivative of the sigmoid is at most 1/4 (squashing behaviour), so each layer contributes a factor < 1/4 to the gradient flowing back towards the input. The deeper the network, the more quickly the gradients vanish, slowing the rate of change in the initial layers.
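
A small numerical sketch of this squashing behaviour, assuming sigmoid units and (purely for illustration) unit weights: the gradient reaching the first layer is a product of per-layer factors, each bounded by the sigmoid's maximum slope of 1/4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # maximum value is 0.25, attained at z = 0

# Gradient factor reaching the first layer of an n-layer sigmoid chain,
# evaluated at the most favourable point z = 0 and with all weights = 1.
for n in [2, 5, 10, 20]:
    grad = np.prod([sigmoid_prime(0.0) for _ in range(n)])
    print(n, grad)                  # shrinks like (1/4)^n: 0.0625, ~0.00098, ...
```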

Convolutional Network: fully connected vs. locally connected layers. Fully connected layer on a 200x200x3 image: 120,000 hidden units, 14.4 billion parameters; needs huge training data to prevent over-fitting. Locally connected layer with 3x3x3 receptive fields: 120,000 hidden units, 3.2 million parameters; useful when the image is highly registered.

Convolutional Network: convolutional layer with a single feature map vs. multiple feature maps. Input 200x200x3, receptive field 3x3x3: still 120,000 hidden units, but only 27 x #FeatureMaps parameters because the filter weights are shared across spatial locations. Sharing parameters exploits the stationarity property of images and preserves the locality of pixel dependencies.

Convolutional Network: output size. For an input volume of size W1 x H1 x D1, receptive field size F x F, K feature maps, and stride S: W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1, D2 = K. It is also better to use zero padding to preserve the input size spatially.

Convolutional Layer: inputs x_1^(n-1), x_2^(n-1), x_3^(n-1) are mapped by the conv. layer to outputs y_1^n, y_2^n, ..., y_F^n. Here f is a non-linear activation function, F is the number of feature maps, n is the layer index, and * denotes applying the filter by element-by-element multiplication with each local patch followed by summation, i.e. convolution.
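
A toy NumPy sketch of this per-location multiply-and-sum (single input channel, F feature maps, stride 1, no padding); it is an illustration, not the lecture's implementation.

```python
import numpy as np

def conv_layer(x, filters, biases):
    """x: HxW input map, filters: (F, k, k), biases: (F,).
    Returns F feature maps of size (H-k+1, W-k+1), after a ReLU."""
    F, k, _ = filters.shape
    H, W = x.shape
    out = np.zeros((F, H - k + 1, W - k + 1))
    for m in range(F):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                patch = x[i:i + k, j:j + k]
                out[m, i, j] = np.sum(patch * filters[m]) + biases[m]
    return np.maximum(out, 0.0)     # non-linear activation f (ReLU here)

x = np.random.randn(8, 8)
y = conv_layer(x, np.random.randn(4, 3, 3), np.zeros(4))
print(y.shape)                      # (4, 6, 6)
```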

Activation Functions: Sigmoid, tanh, ReLU, Leaky ReLU, maxout.
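
For reference, NumPy definitions of these activations (a sketch: the Leaky ReLU slope 0.01 is an assumed value, and maxout is shown in its two-piece form).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)

def maxout(z1, z2):
    return np.maximum(z1, z2)       # max over two linear pieces

z = np.linspace(-3, 3, 7)
print(relu(z))
```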

Typical Architecture: a typical deep convolutional network stacks CONV → POOL → NORM → CONV → POOL → NORM → FC → SOFTMAX. Besides convolutions, the other layers are pooling, normalization, fully connected, etc.

Pooling Layer. Example: pool size 2x2, stride 2, type Max (figure: a 4x4 input grid reduced to its 2x2 block maxima). Pooling plays the role of an aggregator: it gives invariance to small image transformations and increases the compactness of the representation. Pooling types: Max, Average, L2, etc. Image Courtesy: Ranzato CVPR 14
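
A minimal NumPy sketch of 2x2 max pooling with stride 2, run on an illustrative 4x4 input (the exact grid on the slide is only partially recoverable, so these numbers are for demonstration).

```python
import numpy as np

def max_pool_2x2(x):
    """x: HxW feature map with even H, W. Returns (H/2)x(W/2) block maxima."""
    H, W = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[2, 8, 9, 4],
              [3, 6, 5, 7],
              [3, 1, 6, 4],
              [8, 9, 5, 7]])
print(max_pool_2x2(x))    # [[8 9]
                          #  [9 7]]
```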

Normalization. Local contrast normalization (Jarrett et al. ICCV 09): improves invariance and sparsity. Local response normalization (Krizhevsky et al. NIPS 12): a kind of lateral inhibition, performed across the channels. Batch normalization: the activations of the mini-batch are centered to zero mean and unit variance to prevent internal covariate shift. The common motivation is to keep responses on a similar scale.
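
A minimal sketch of the batch-normalization computation described above (training-time batch statistics only; the learnable scale gamma and shift beta, and the small epsilon for numerical stability, are included as standard assumptions).

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: (batch, features). Center each feature of the mini-batch to
    zero mean and unit variance, then apply a learnable scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(128, 10) * 5.0 + 3.0
y = batch_norm(x)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```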

Multi-layer perceptron: plays the role of a classifier. Fully connected layers are generally used as the final layers to classify the object, represented in terms of discriminative parts and higher-level semantic entities. SoftMax normalizes the output into class probabilities.

Case Study: AlexNet. Winner of the ImageNet ILSVRC-2012 challenge (the paper also reports results on LSVRC-2010). Trained over 1.2M images using SGD with regularization. Deep architecture (60M parameters). Optimized GPU implementation (cuda-convnet). Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NIPS 2012.

AlexNet Architecture. 8 layers in total (5 convolutional layers, 3 fully connected layers), trained on the ImageNet dataset [Deng et al. CVPR 09]. Response-normalization layers follow the first and second convolutional layers. Max-pooling follows the first, second, and fifth convolutional layers. The ReLU non-linearity is applied to the output of every convolutional and fully connected layer. Layer stack (bottom to top): Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output.
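
Not the lecture's code: a single-stream sketch of this layer stack, assuming PyTorch as the framework (the original cuda-convnet implementation was split across two GPUs, and exact hyper-parameters such as the LRN size are taken from the paper's common single-stream reproduction).

```python
import torch
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),     # Layer 1: Conv
    nn.LocalResponseNorm(5), nn.MaxPool2d(3, stride=2),         #          + Norm + Pool
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),    # Layer 2: Conv
    nn.LocalResponseNorm(5), nn.MaxPool2d(3, stride=2),         #          + Norm + Pool
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),   # Layer 3: Conv
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),   # Layer 4: Conv
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),   # Layer 5: Conv
    nn.MaxPool2d(3, stride=2),                                  #          + Pool
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),   # Layer 6: Full
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),          # Layer 7: Full
    nn.Linear(4096, 1000),                                      # Softmax output (logits)
)

x = torch.randn(1, 3, 227, 227)
print(alexnet(x).shape)   # torch.Size([1, 1000])
```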

AlexNet Architecture

Parameter Calculation. Hyper parameters: filter size F, number of filters K, stride S, zero padding P. For an input volume W1 x H1 x D1, the output volume W2 x H2 x D2 is W2 = [(W1 - F + 2P)/S] + 1 (similarly for H2) and D2 = K. The number of parameters in a layer is (F . F . D + 1) . K, where the +1 accounts for the bias of each filter. Example, layer 1: input images are 227 x 227 x 3, F = 11, K = 96, S = 4, P = 0. Each filter has 11 x 11 x 3 = 363 weights plus 1 bias, i.e. 364 weights; # weights = 364 x 96 = 35K (approx.). W2 = (227 - 11)/4 + 1 = 55, so the output size is 55 x 55 x 96.
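
A small helper that evaluates the two formulas above, checked against the layer-1 example (a sketch; the function name is ours).

```python
def conv_output_and_params(W1, H1, D1, F, K, S, P):
    """Spatial output size and parameter count of a convolutional layer."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1 + 1) * K          # +1 for the bias of each filter
    return (W2, H2, K), params

# AlexNet layer 1: 227x227x3 input, 96 filters of 11x11x3, stride 4, no padding.
size, params = conv_output_and_params(227, 227, 3, F=11, K=96, S=4, P=0)
print(size, params)    # (55, 55, 96) 34944  -> approx. 35K weights
```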

AlexNet Architecture: the convolutional layers cumulatively account for about 90-95% of the computation but only about 5% of the parameters; the fully-connected layers contain about 95% of the parameters.

AlexNet Architecture: parameters and neurons per layer.
Softmax Output: 1000 neurons, 4M parameters
Layer 7: Full: 4096 neurons, 16M parameters
Layer 6: Full: 4096 neurons, 37M parameters
Layer 5: Conv + Pool: 43K neurons, 442K parameters
Layer 4: Conv: 65K neurons, 1.3M parameters
Layer 3: Conv: 65K neurons, 884K parameters
Layer 2: Conv + Pool: 187K neurons, 307K parameters
Layer 1: Conv + Pool: 253K neurons, 35K parameters
Input Image
Totals: 650,000 neurons, 60M parameters, 630M connections. Final feature layer: 4096-dimensional. Trained with stochastic gradient descent on two NVIDIA GTX 580 3GB GPUs for about a week.

Training. Learning: minimizing the loss function (incl. regularization) w.r.t. the parameters (filter weights) of the network. The stack CONV → POOL → NORM → CONV → POOL → NORM → FC feeds a LOSS computed on the training pair (x_n, y_n). Mini-batch stochastic gradient descent: 1) sample a batch of data; 2) forward propagation; 3) backward propagation; 4) parameter update.
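
A skeleton of this mini-batch SGD loop in the four steps listed above (a sketch: `loss_fn`, `grad_fn`, and the parameter dictionary are hypothetical placeholders, not functions from the lecture).

```python
import numpy as np

def train(params, data, labels, loss_fn, grad_fn, lr=0.01, batch_size=128, epochs=10):
    """Mini-batch SGD. params: dict of numpy arrays; data, labels: numpy arrays."""
    n = len(data)
    for epoch in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]             # 1. sample a batch
            loss = loss_fn(params, data[idx], labels[idx])    # 2. forward propagation
            grads = grad_fn(params, data[idx], labels[idx])   # 3. backward propagation
            for k in params:                                  # 4. parameter update
                params[k] -= lr * grads[k]
        print(epoch, loss)
    return params
```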

Training: Backpropagation. Consider a layer y = f(x; w) with parameters w, and let z be the scalar loss computed from the loss function h on the network output. The derivative of the loss w.r.t. the parameters is dz/dw = (dz/dy)(dy/dw), and the derivative passed back to the layer input is dz/dx = (dz/dy)(dy/dx). This recursive equation is applicable to each layer of the CONV / POOL / NORM / FC stack in turn.

Training: parameter update. Stochastic gradient descent: θ ← θ - η ∇_θ L, where η is the learning rate and θ is the set of all parameters. Stochastic gradient descent with momentum additionally keeps a velocity v: v ← μ v - η ∇_θ L, then θ ← θ + v.
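
A sketch of the two update rules (plain SGD is the special case mu = 0; `lr` plays the role of η and the dictionaries of arrays are assumed, as in the training skeleton above).

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, mu=0.9):
    """One update with classical momentum:
    v <- mu * v - lr * grad;  theta <- theta + v."""
    for k in params:
        velocity[k] = mu * velocity[k] - lr * grads[k]
        params[k] += velocity[k]
    return params, velocity
```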

Training: loss functions for classification. Soft-max loss / multinomial logistic regression loss: L = -log( exp(x_y) / Σ_j exp(x_j) ), where x is the vector of class scores and y the correct class. Its derivative w.r.t. x_i is p_i - 1[i = y], with p_i = exp(x_i) / Σ_j exp(x_j). Other variations: cross entropy loss, log loss.

Training: loss functions for classification. Hinge loss: L = Σ_{j ≠ y} max(0, x_j - x_y + 1). The hinge loss is a convex function but not differentiable; a sub-gradient exists. Sub-gradient w.r.t. x_i: 1[x_i - x_y + 1 > 0] for i ≠ y, and -Σ_{j ≠ y} 1[x_j - x_y + 1 > 0] for i = y.

Training: loss functions for regression. Euclidean loss / squared loss: L = ½ ||x - y||², with derivative w.r.t. x_i equal to x_i - y_i. Read the MatConvNet manual for the derivatives specific to each layer: http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf
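
NumPy sketches of the three losses and the derivatives / sub-gradients stated above, operating on a score vector x and an integer label y (for classification) or a target vector t (for regression); this is an illustration, not MatConvNet code.

```python
import numpy as np

def softmax_loss(x, y):
    """Soft-max / multinomial logistic loss and its gradient w.r.t. the scores x."""
    p = np.exp(x - x.max())
    p /= p.sum()
    grad = p.copy()
    grad[y] -= 1.0                  # dL/dx_i = p_i - 1[i == y]
    return -np.log(p[y]), grad

def hinge_loss(x, y, margin=1.0):
    """Multiclass hinge loss and a sub-gradient w.r.t. x."""
    margins = np.maximum(0.0, x - x[y] + margin)
    margins[y] = 0.0
    grad = (margins > 0).astype(float)
    grad[y] = -grad.sum()           # -#(violated margins) for the true class
    return margins.sum(), grad

def euclidean_loss(x, t):
    """Squared loss 0.5 * ||x - t||^2 for regression; gradient is x - t."""
    return 0.5 * np.sum((x - t) ** 2), x - t
```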

Generalization. Underfitting: use deeper networks. Overfitting: how to prevent it? Stopping training at the right time; weight penalties (L1, L2, max norm); dropout; model ensembles (e.g., the same model with different initializations). (Figure: top-5 error / training vs. validation accuracy over epochs, illustrating overfitting.)

Generalization: Dropout. A stochastic regularization; the idea is applicable to many other networks. While training, hidden units are retained with a fixed probability p (say 0.5) and dropped out otherwise, temporarily for that pass. While testing, all the units are preserved but their activations are scaled by p. Dropout together with a max-norm constraint is found to be useful. (Figure: the network before and after dropout.) Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014
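
A minimal sketch of this scheme, following the paper's convention that p is the retention probability (so units are dropped with probability 1 - p, and test-time activations are scaled by p).

```python
import numpy as np

def dropout_forward(h, p=0.5, train=True):
    """Standard dropout as in Srivastava et al. 2014.
    h: array of hidden activations, p: retention probability."""
    if train:
        mask = (np.random.rand(*h.shape) < p).astype(h.dtype)
        return h * mask             # randomly zero out units during training
    return h * p                    # keep all units at test time, scaled by p
```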

Generalization: features learned by an autoencoder with one hidden layer on the MNIST dataset, without dropout vs. with dropout; dropout yields sparser features. Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014

Data Augmentation / Jittering: a popular scheme to minimize overfitting. The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. Researchers employ different forms of data augmentation: image translation, horizontal reflections, changing RGB intensities. Control the amount of jitter; excessive jitter can be counterproductive.
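
A sketch of the three label-preserving transformations listed above, applied to an HxWx3 float image array; the shift range and intensity magnitude are illustrative assumptions, and the wrap-around shift is a crude stand-in for the random cropping used in practice.

```python
import numpy as np

def augment(img, max_shift=8, rng=np.random):
    """Random translation, horizontal reflection, and RGB intensity jitter."""
    out = img.copy()
    # Random translation by up to max_shift pixels (wrap-around shift,
    # a simple stand-in for random cropping).
    dx, dy = rng.randint(-max_shift, max_shift + 1, size=2)
    out = np.roll(np.roll(out, dy, axis=0), dx, axis=1)
    # Horizontal reflection with probability 0.5.
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]
    # Small random per-channel change of the RGB intensities.
    out = out + rng.uniform(-0.1, 0.1, size=(1, 1, 3))
    return out
```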

AlexNet Implementation Details. Trained with stochastic gradient descent on two NVIDIA GTX 580 3GB GPUs. Highly optimized GPU implementation of 2D convolution (for a batch size of 128), originally implemented using cuda-convnet. Trained for 90 epochs through a training set of 1.2 million images; training time about 5 to 6 days. Data augmentation and dropout were used to prevent overfitting.

Some results on ImageNet (figure: top-5 classification accuracy of AlexNet, Clarifai, and GoogLeNet). Source: Krizhevsky et al. NIPS 12.

Feature Visualization Corners and other edge/color conjunctions

Feature Visualization Similar textures (note the mesh patterns and text, highlighted with yellow square)

Feature Visualization: object parts (dog faces & bird legs) and entire objects with pose variation (dogs).

Feature evolution during training Lower layers converge faster Higher layers start to converge later

CNN: Visualization (a sequence of figure-only slides; the first shows the stimulus image).

Historical Note: LeNet (1989,1998) Architecture of LeNet-5 used for recognizing digits.

Historical Note: Neocognitron. Inspired by [Hubel & Wiesel 1962]: simple cells detect local features; complex cells pool the outputs of simple cells within a retinotopic neighborhood. Slide Courtesy: LeCun ICML 2013

Summary. Deep Convolutional Networks: Conv, Norm, Pool, FC layers; training by back propagation; many specific enhancements (non-linearity (ReLU), dropout, improved gradient descent, ...); lots of data, lots of computation. Anatomy and physiology of AlexNet: architecture, parameters, feature visualization. Next: what happened during 2012-2016.

IIIT Hyderabad Thank You!!