ECE 6504: Deep Learning for Perception


ECE 6504: Deep Learning for Perception Topics: (Finish) Backprop Convolutional Neural Nets Dhruv Batra Virginia Tech

Administrativia Presentation Assignments: https://docs.google.com/spreadsheets/d/1m76E4mC0wfRjc4HRBWFdAlXKPIzlEwfw1-u7rBw9TJ8/edit#gid=2045905312 (C) Dhruv Batra 2

Recap of last time (C) Dhruv Batra 3

Last Time Notation + Setup Neural Networks Chain Rule + Backprop (C) Dhruv Batra 4

Recall: The Neuron Metaphor Neurons accept information from multiple inputs and transmit information to other neurons. Artificial neuron: multiply inputs by weights along edges, then apply some function to the weighted inputs at each node. Image Credit: Andrej Karpathy, CS231n 5
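In symbols (the standard formulation; this equation is not printed on the slide): a neuron with inputs $x_i$, weights $w_i$, bias $b$, and activation function $f$ computes

$$y = f\Big(\sum_i w_i x_i + b\Big)$$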

Activation Functions sigmoid vs tanh (C) Dhruv Batra 6
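For reference, the two activations compared on the slide are defined as

$$\sigma(x) = \frac{1}{1 + e^{-x}} \in (0, 1), \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\sigma(2x) - 1 \in (-1, 1)$$

so tanh is a rescaled, zero-centered sigmoid.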

A quick note (C) Dhruv Batra Image Credit: LeCun et al. 98 7

Rectified Linear Units (ReLU) (C) Dhruv Batra 8
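For reference, the rectified linear unit and its almost-everywhere derivative:

$$\text{ReLU}(x) = \max(0, x), \qquad \text{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}$$

(the derivative is undefined at $x = 0$; implementations pick 0 or 1 there).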

(C) Dhruv Batra 9

(C) Dhruv Batra 10

Visualizing Loss Functions Sum of individual losses (C) Dhruv Batra Image Credit: Andrej Karpathy, CS231n 11
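The slide's "sum of individual losses" refers to the usual decomposition of the training objective over examples; in symbols (standard form, with an optional regularizer):

$$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i\big(f(x_i; W),\, y_i\big) + \lambda R(W)$$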

Detour (C) Dhruv Batra 12

Logistic Regression as a Cascade [figure: logistic regression unrolled as a cascade of simple modules operating on input x with weights w] Slide Credit: Yann LeCun 13

Key Computation: Forward-Prop. Slide Credit: Yann LeCun 14

Key Computation: Back-Prop. Slide Credit: Yann LeCun 15

Plan for Today: MLPs (Notation, Backprop); CNNs (Notation, Convolutions, Forward pass, Backward pass) (C) Dhruv Batra 16

Multilayer Networks Cascade neurons together: the output from one layer is the input to the next. Each layer has its own set of weights. (C) Dhruv Batra Image Credit: Andrej Karpathy, CS231n 17
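In symbols (a standard formulation; the exact notation on the slide may differ): with activation $f$, and weights $W^{(l)}$ and biases $b^{(l)}$ for layer $l$, the cascade computes

$$h^{(l)} = f\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \qquad h^{(0)} = x$$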

Equivalent Representations. Slide Credit: Yann LeCun 18

Backward Propagation Question: Does BPROP work with ReLU layers only? Answer: Nope, any a.e. (almost everywhere) differentiable transformation works. Question: What's the computational cost of BPROP? Answer: About twice FPROP (need to compute gradients w.r.t. input and parameters at every layer). Note: FPROP and BPROP are duals of each other, e.g. a SUM in FPROP becomes a COPY in BPROP, and vice versa. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato, Yann LeCun 19
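A minimal numpy illustration of the SUM/COPY duality (my example, not from the slides): the backward pass of a sum copies the upstream gradient to every input, and the backward pass of a fan-out (copy) sums the gradients from its consumers.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# FPROP: SUM reduces many inputs to one output.
y = x.sum()                                        # y = 6.0

# BPROP of SUM: dy/dx_i = 1 for every i, so the upstream
# gradient is COPIED to each input.
dL_dy = 2.0
dL_dx = np.full_like(x, dL_dy)                     # [2., 2., 2.]

# FPROP: COPY fans a value out to several consumers.
# BPROP of COPY: the gradients from all consumers are SUMMED.
dL_from_branch1 = np.ones_like(x)
dL_from_branch2 = 3.0 * np.ones_like(x)
dL_dx_fanout = dL_from_branch1 + dL_from_branch2   # [4., 4., 4.]

print(dL_dx, dL_dx_fanout)
```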

Fully Connected Layer Example: 200x200 image, 40K hidden units: ~2B parameters!!! (Each of the 40,000 units connects to all 40,000 pixels: 40,000 x 40,000 = 1.6B weights.) - Spatial correlation is local - Waste of resources, and we don't have enough training samples anyway. Slide Credit: Marc'Aurelio Ranzato 20

Locally Connected Layer Example: 200x200 image 40K hidden units Filter size: 10x10 4M parameters Note: This parameterization is good when input image is registered (e.g., face recognition). Slide Credit: Marc'Aurelio Ranzato 21

Locally Connected Layer STATIONARITY? Statistics is similar at different locations Example: 200x200 image 40K hidden units Filter size: 10x10 4M parameters Note: This parameterization is good when input image is registered (e.g., face recognition). Slide Credit: Marc'Aurelio Ranzato 22

Convolutional Layer Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels Slide Credit: Marc'Aurelio Ranzato 23

"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/ wiki/file:convolution_of_box_signal_with_itself2.gif#/media/file:convolution_of_box_signal_with_itself2.gif (C) Dhruv Batra 24

Convolution Explained http://setosa.io/ev/image-kernels/ https://github.com/bruckner/deepviz (C) Dhruv Batra 25

Convolutional Layer (slides 26-40: step-by-step animation of a kernel sliding across the input to produce an output feature map)

Convolutional Layer Mathieu et al. Fast training of CNNs through FFTs ICLR 2014 41

Convolutional Layer Example: input * [-1 0 1; -1 0 1; -1 0 1] = output (convolution with a vertical-edge kernel) 42

Convolutional Layer Learn multiple filters. E.g.: 200x200 image, 100 filters, filter size: 10x10: 10K parameters (100 x 10 x 10). 43
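A quick sanity check of the parameter counts from the last few slides (a sketch; the sizes are the ones used in the running example):

```python
# Running example: 200x200 input image, 40K hidden units, 10x10 filters.
D = 200 * 200                       # number of input pixels
H = 40_000                          # hidden units (fully/locally connected cases)

fully_connected = D * H             # every unit sees every pixel
locally_connected = H * 10 * 10     # every unit sees one 10x10 patch
convolutional = 100 * 10 * 10       # 100 shared 10x10 filters

print(f"{fully_connected:,}")       # 1,600,000,000  (~2B on the slide)
print(f"{locally_connected:,}")     # 4,000,000
print(f"{convolutional:,}")         # 10,000
```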

Convolutional Nets [figure: LeNet-5] INPUT 32x32 -> C1: feature maps 6@28x28 (convolutions) -> S2: f. maps 6@14x14 (subsampling) -> C3: f. maps 16@10x10 (convolutions) -> S4: f. maps 16@5x5 (subsampling) -> C5: layer 120 (full connection) -> F6: layer 84 (full connection) -> OUTPUT 10 (Gaussian connections) (C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy 44
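A compact sketch of this architecture in PyTorch (PyTorch is not used in these slides; this is purely illustrative, with ReLU and max-pooling standing in for the original sigmoid units and learned subsampling):

```python
import torch
import torch.nn as nn

# LeNet-style ConvNet matching the feature-map sizes on the slide.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 1@32x32  -> 6@28x28   (C1)
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 6@28x28  -> 6@14x14   (S2)
    nn.Conv2d(6, 16, kernel_size=5),  # 6@14x14  -> 16@10x10  (C3)
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 16@10x10 -> 16@5x5    (S4)
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # C5
    nn.ReLU(),
    nn.Linear(120, 84),               # F6
    nn.ReLU(),
    nn.Linear(84, 10),                # OUTPUT
)

x = torch.randn(1, 1, 32, 32)         # one 32x32 grayscale image
print(lenet(x).shape)                 # torch.Size([1, 10])
```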

Convolutional Layer (slides 45-47: the same equation, animated over the output feature maps). Each output feature map $h^n_i$ is obtained by convolving every input feature map $h^{n-1}_j$ with a learned kernel $w^n_{ij}$, summing over the input channels, and rectifying:

$$h^n_i = \max\Big(0, \sum_{j=1}^{\#\text{input channels}} h^{n-1}_j * w^n_{ij}\Big)$$

Convolutional Layer Question: What is the size of the output? What's the computational cost? Answer: It is proportional to the number of filters and depends on the stride. If kernels have size KxK, input has size DxD, stride is 1, and there are M input feature maps and N output feature maps then: - the input has size M@DxD - the output has size N@(D-K+1)x(D-K+1) - the kernels have MxNxKxK coefficients (which have to be learned) - cost: M*K*K*N*(D-K+1)*(D-K+1) Question: How many feature maps? What's the size of the filters? Answer: Usually, there are more output feature maps than input feature maps. Convolutional layers can increase the number of hidden units by big factors (and are expensive to compute). The size of the filters has to match the size/scale of the patterns we want to detect (task dependent). 48
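A direct, deliberately unoptimized numpy sketch of this layer (variable names are mine), useful for checking the N@(D-K+1)x(D-K+1) output size; like most deep-learning libraries, it slides the unflipped kernel (cross-correlation):

```python
import numpy as np

def conv_layer(h_prev, w):
    """h_prev: (M, D, D) input feature maps; w: (N, M, K, K) kernels.
    Returns (N, D-K+1, D-K+1) rectified output feature maps (stride 1)."""
    M, D, _ = h_prev.shape
    N, _, K, _ = w.shape
    out = np.zeros((N, D - K + 1, D - K + 1))
    for i in range(N):                 # loop over output feature maps
        for r in range(D - K + 1):
            for c in range(D - K + 1):
                # sum over input channels j of (patch . kernel)
                out[i, r, c] = np.sum(h_prev[:, r:r+K, c:c+K] * w[i])
    return np.maximum(0.0, out)        # ReLU, as in the equation above

h = np.random.randn(3, 8, 8)           # M=3 input maps, D=8
w = np.random.randn(5, 3, 3, 3)        # N=5 output maps, K=3
print(conv_layer(h, w).shape)          # (5, 6, 6) = N@(D-K+1)x(D-K+1)
```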

Key Ideas A standard neural net applied to images: - scales quadratically with the size of the input - does not leverage stationarity Solution: - connect each hidden unit to a small patch of the input - share the weight across space This is called: convolutional layer. A network with convolutional layers is called convolutional network. LeCun et al. Gradient-based learning applied to document recognition IEEE 1998 49

Pooling Layer Let us assume filter is an eye detector. Q.: how can we make the detection robust to the exact location of the eye? 50

Pooling Layer By pooling (e.g., taking max) filter responses at different locations we gain robustness to the exact spatial location of features. 51

Pooling Layer: Examples
Max-pooling: $h^n_i(r,c) = \max_{\bar r \in N(r),\, \bar c \in N(c)} h^{n-1}_i(\bar r, \bar c)$
Average-pooling: $h^n_i(r,c) = \operatorname{mean}_{\bar r \in N(r),\, \bar c \in N(c)} h^{n-1}_i(\bar r, \bar c)$
L2-pooling: $h^n_i(r,c) = \sqrt{\sum_{\bar r \in N(r),\, \bar c \in N(c)} h^{n-1}_i(\bar r, \bar c)^2}$
L2-pooling over features: $h^n_i(r,c) = \sqrt{\sum_{j \in N(i)} h^{n-1}_j(r, c)^2}$
52
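A numpy sketch of non-overlapping KxK pooling on a single feature map (the helper and its names are mine, not from the slides):

```python
import numpy as np

def pool(h, K, mode="max"):
    """h: (D, D) feature map with D divisible by K.
    Returns the (D/K, D/K) pooled map."""
    D = h.shape[0]
    # Split the map into a grid of non-overlapping KxK pools.
    pools = h.reshape(D // K, K, D // K, K).transpose(0, 2, 1, 3)
    if mode == "max":
        return pools.max(axis=(2, 3))
    if mode == "mean":
        return pools.mean(axis=(2, 3))
    return np.sqrt((pools ** 2).sum(axis=(2, 3)))   # L2-pooling

h = np.arange(16.0).reshape(4, 4)
print(pool(h, 2))          # [[ 5.  7.] [13. 15.]]
```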

Pooling Layer Question: What is the size of the output? What's the computational cost? Answer: The size of the output depends on the stride between the pools. For instance, if pools do not overlap and have size KxK, and the input has size DxD with M input feature maps, then: - output is M@(D/K)x(D/K) - the computational cost is proportional to the size of the input (negligible compared to a convolutional layer) Question: How should I set the size of the pools? Answer: It depends on how invariant or robust to distortions we want the representation to be. It is best to pool slowly (via a few stacks of conv-pooling layers). 53

Pooling Layer: Interpretation (slides 54-55) Task: detect orientation L/R. Conv layer: linearizes manifold. Pooling layer: collapses manifold.

Pooling Layer: Receptive Field Size (slides 56-57) [figure: conv. layer h^{n-1} -> h^n followed by pooling layer] If convolutional filters have size KxK and stride 1, and the pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size (P+K-1)x(P+K-1). For example, 5x5 filters followed by 2x2 pools give each pooling unit a 6x6 receptive field.

ConvNets: Typical Stage One stage (zoom) Convol. Pooling courtesy of K. Kavukcuoglu 58

ConvNets: Typical Stage One stage (zoom) Convol. Pooling Conceptually similar to: SIFT, HoG, etc. 59

Note: after one stage the number of feature maps is usually increased (conv. layer) and the spatial resolution is usually decreased (stride in conv. and pooling layers). Receptive field gets bigger. Reasons: - gain invariance to spatial translation (pooling layer) - increase specificity of features (approaching object specific units) courtesy of K. Kavukcuoglu 60

ConvNets: Typical Architecture One stage (zoom): Convol. Pooling. Whole system: Input Image -> 1st stage -> 2nd stage -> 3rd stage -> Fully Conn. Layers -> Class Labels 61

Visualizing Learned Filters (slides 62-64) (C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14]

Fancier Architectures: Multi-Modal [figure: image CNN and text embedding ("tiger") mapped into a shared representation for matching] Frome et al., DeViSE: a deep visual-semantic embedding model, NIPS 2013 65

Fancier Architectures: Multi-Task [figure: image -> (Conv, Norm, Pool) x4 -> one Fully Conn. head per attribute: Attr. 1, Attr. 2, ..., Attr. N] Zhang et al., PANDA, CVPR 2014 66

Fancier Architectures: Generic DAG Any DAG of differentiable modules is allowed! 67