Como funciona o Deep Learning


Como funciona o Deep Learning (How Deep Learning Works)
Moacir Ponti (with help from Gabriel Paranhos da Costa)
ICMC, Universidade de São Paulo
Contact: www.icmc.usp.br/~moacir, moacir@icmc.usp.br
Uberlândia-MG/Brazil, October 2017

Agenda
1 Image classification basics
2 Searching for an image classification method that beats humans
3 Neural networks: from shallow to deep (motivation and definitions; linear function, loss function, optimization; simple neural network)
4 Convolutional Neural Networks (current architectures; guidelines for training)
5 How it works, limitations and final remarks

Image classification basics: Image classification example
Problem: given two classes of images (class 1: desert, class 2: beach) and a set of 9 images taken from each class, develop a program able to classify a new, unseen image into one of those two classes. Object: image.

Image classification basics: Image classification example
Feature: a set of values extracted from images that can be used to measure the (dis)similarity between images. Any suggestion? Requantize the image to obtain only 64 colours, then use the two most frequent colours as features. Each image is then represented by 2 values: a 2D feature space.


Image classification basics: Image classification example
Classifier: a model built using labeled examples (images for which the classes are known). This model must be able to predict the class of a new image. Any suggestion? Find a partition of the feature space, using the data distribution.

Image classification basics: Image classification example
The examples used to build the classifier form the training set. Training data is seldom linearly separable, therefore there is a training error.

Image classification basics: Image classification example
The model, or classifier, can then be used to predict/infer the class of a new example.

Image classification basics: Image classification example
Now we want to estimate, on future data (not used in training), the classifier's error rate (or, alternatively, its accuracy). The examples used in this stage are known as the test set.

Image classification basics: Terminology
Class: label/category, Ω = {ω_1, ω_2, ..., ω_c}
Dataset: X = {x_1, x_2, ..., x_N}, where each x_i ∈ R^M is an example (object) in the feature space, i.e. a feature vector
Labels: l(x_i) = y_i ∈ Ω is the label assigned to each example
Data matrix with N examples (rows) and M features (columns), and its label vector:
X = [ x_{1,1} x_{1,2} ... x_{1,M} ; x_{2,1} x_{2,2} ... x_{2,M} ; ... ; x_{N,1} x_{N,2} ... x_{N,M} ],   Y = [ l(x_1) = y_1 ; l(x_2) = y_2 ; ... ; l(x_N) = y_N ]

2 Searching for an image classification method that beats humans

Searching for an image classification method that beats humans: Introduction
Recent history of attempts to solve the image classification problem: color, shape and texture descriptors (1970-2000); SIFT (1999+); Histograms of Oriented Gradients (2005+); Bag of Features (2004+); Spatial Pyramid Matching (2006+).

Searching for an image classification method that beats humans: Pipeline
1 Descriptor grid: HOG, LBP, SIFT, SURF
2 Fisher Vectors
3 Spatial Pyramid Matching
4 Classification algorithm

Searching for an image classification method that beats humans: ImageNet / Large Scale Visual Recognition Challenge
ImageNet: 22,000 categories, 14 million images. ImageNet Challenge: 1.4 million images, 1000 classes.

Searching for an image classification method that beats humans: Architectures and number of layers
AlexNet (8), GoogLeNet (22), VGG (16/19), ResNet (34+).

Searching for an image classification method that beats humans: CNNs were not invented in 2012...
Fukushima's Neocognitron (1980), LeCun's LeNet (1998).

3 Neural networks: from shallow to deep

Neural networks: from shallow to deep: Motivation and definitions
Motivation with two problems. We want to find a function of the form f(x) = y (what it computes depends on the task).
Image classification (animals): data available: pairs of images and labels from dogs, cats and turtles; input: an RGB image x; output: a predicted label y (e.g. cat, dog) assigned to the input image.
Anomaly detection in audio for Parkinson's Disease diagnosis: data available: speech audio recorded from people without Parkinson's Disease; input: a time series with 16-bit audio content x; output: the probability y of observing an anomalous audio input (indicating possible disease).

Neural networks: from shallow to deep: Machine Learning (ML) vs Deep Learning (DL)
Machine Learning: a broader area that includes DL. Algorithms aim to infer f() from a space of admissible functions, given training data. Shallow methods often infer a single function f(.), e.g. a linear function f(x) = w·x + θ. Common algorithms: the Perceptron, Support Vector Machines, the Logistic Classifier, etc.

Neural networks: from shallow to deep: Machine Learning (ML) vs Deep Learning (DL)
Deep Learning: involves learning a sequence of representations via composite functions. Given an input x_1, several intermediate representations are produced: x_2 = f_1(x_1), x_3 = f_2(x_2), x_4 = f_3(x_3), ... The output is produced by L nested functions of the form f_L(... f_3(f_2(f_1(x_1, W_1), W_2), W_3) ..., W_L), where the W_i are the parameters (weights) associated with each function i.

Neural networks: from shallow to deep: Linear function, loss function, optimization
A shallow linear classifier. Input: a (vectorized) image x, a weight matrix W and a bias term b:
f(W, x) = W x + b = scores for the possible classes of x

Neural networks: from shallow to deep: Linear classifier for image classification
Input: an image (N × M × 3 numbers) vectorized into a column x. Classes: cat, turtle, owl. Output: f(x, W) = s, a vector with 3 class scores. Example with x = [1, 73, 227, 82]^T:
W x + b:
[ 0.1  0.25  0.1  2.5 ]   [   1 ]   [ 2.0 ]   [ -337.3 ]
[ 0    0.5   0.2  0.6 ] . [  73 ] + [ 1.7 ] = [   38.6 ]
[ 2    0.8   1.8  0.1 ]   [ 227 ]   [ 0.5 ]   [  460.3 ]
                          [  82 ]
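A minimal NumPy sketch of this scoring step; the W, x and b values simply mirror the example above, and Python/NumPy is an assumed choice (the talk does not prescribe a language):

```python
import numpy as np

# Weight matrix W: one row of weights per class, one column per input value
W = np.array([[0.1, 0.25, 0.1, 2.5],
              [0.0, 0.50, 0.2, 0.6],
              [2.0, 0.80, 1.8, 0.1]])
x = np.array([1.0, 73.0, 227.0, 82.0])   # vectorized image (4 values in this toy example)
b = np.array([2.0, 1.7, 0.5])            # one bias per class

scores = W @ x + b                       # f(W, x) = Wx + b: one score per class
print(scores)
```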

Neural networks: from shallow to deep: Linear classifier for image classification
Scores for three example images (one column per image):
cat     -337.3   380.3    8.6
owl      460.3   160.3   26.3
turtle    38.6    17.6   21.8
We need: a loss function that quantifies undesired scenarios in the training set, and an optimization algorithm to find W so that the loss function is minimized!

Neural networks: from shallow to deep: Linear classifier for image classification
We want to optimize some function to produce the best classifier. This function is often called the loss function. Let (X, Y) be the training set, where X are the features and Y the class labels, and let f(.) be a classifier that maps any value in X into a class. An example is the squared loss between predicted and true labels:
l(f(W, x_i), y_i) = (f(W, x_i) - y_i)^2     (1)

Neural networks: from shallow to deep: A linear classifier we would like
(Figure: the feature space partitioned by a turtle classifier, an owl classifier and a cat classifier.)

Neural networks: from shallow to deep: Minimizing the loss function
Use the slope of the loss function over the space of parameters. In one dimension,
df(x)/dx = lim_{δ→0} [f(x + δ) - f(x)] / δ
and for each parameter dimension j,
dl(f(w_j, x_i))/dw_j = lim_{δ→0} [l(f(w_j + δ, x_i)) - l(f(w_j, x_i))] / δ.
We have multiple dimensions, therefore a gradient (a vector of partial derivatives). We may use: 1 a numerical gradient (approximate); 2 the analytic gradient (exact). Gradient descent searches for the valley of the function, moving in the direction of the negative gradient.

Neural networks: from shallow to deep: Gradient descent
(Figure: how a change in a single parameter w affects the squared error loss, an idealized example.)

Neural networks: from shallow to deep: Gradient descent (numerical estimate, worked example)
Start from W = [0.1, 0.25, 0.1, 2.5, 0, ..., 0.1], with loss l(f(W)) = 2.31298. Perturb one coordinate at a time by δ = 0.001 and estimate each partial derivative as (l(f(W + δ e_i)) - l(f(W))) / δ:
w_1 + δ: loss 2.31201, estimated dw_1 = +0.93
w_2 + δ: loss 2.31298, estimated dw_2 = 0.0
w_3 + δ: loss 2.31459, estimated dw_3 = -1.12
Repeating this for every coordinate gives the full gradient estimate dW = [+0.93, 0.0, -1.12, +0.02, +0.5, ..., -5.7]; stepping against this gradient reduces the loss from 2.31298 to 1.98720.
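The coordinate-by-coordinate estimate above can be sketched in Python; the quadratic toy loss below is an assumption made only so the snippet runs end to end:

```python
import numpy as np

def loss(W):
    # Toy loss: squared distance to an arbitrary target (stand-in for the real classifier loss)
    target = np.array([0.5, -0.2, 1.0, 2.0, 0.3])
    return np.sum((W - target) ** 2)

def numerical_gradient(loss_fn, W, delta=1e-3):
    grad = np.zeros_like(W)
    for i in range(W.size):
        W_shift = W.copy()
        W_shift[i] += delta                      # perturb one coordinate at a time
        grad[i] = (loss_fn(W_shift) - loss_fn(W)) / delta
    return grad

W = np.array([0.1, 0.25, 0.1, 2.5, 0.0])
for step in range(10):
    grad = numerical_gradient(loss, W)
    W -= 0.1 * grad                              # move against the gradient (learning rate 0.1)
    print(step, loss(W))                         # the loss should decrease
```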

Neural networks: from shallow to deep: Regularization
l(W) = (1/N) Σ_{i=1..N} l_i(x_i, y_i, W) + λ R(W)
∇_W l(W) = (1/N) Σ_{i=1..N} ∇_W l_i(x_i, y_i, W) + λ ∇_W R(W)
Regularization helps to keep the model simple. Possible methods:
L2: R(W) = Σ_i Σ_j W_{i,j}^2
L1: R(W) = Σ_i Σ_j |W_{i,j}|
Alternatives: dropout and batch normalization.

Neural networks: from shallow to deep: Stochastic Gradient Descent (SGD)
It is hard to compute the gradient when N is large. SGD approximates the sum using a minibatch (a random sample) of instances, typically between 32 and 512. Because it uses only a fraction of the data it is fast, but it often gives noisy estimates at each iteration, requiring more iterations. A sketch of this loop is shown below.
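A schematic minibatch SGD loop in NumPy, also including the L2 penalty from the previous slide; the linear scorer, squared loss and synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, C = 10_000, 64, 3                          # examples, features, classes
X = rng.normal(size=(N, M))
Y = np.eye(C)[rng.integers(0, C, size=N)]        # one-hot labels

W = 0.01 * rng.normal(size=(M, C))
lr, lam, batch_size = 0.1, 1e-4, 128

for it in range(500):
    idx = rng.integers(0, N, size=batch_size)            # random minibatch
    Xb, Yb = X[idx], Y[idx]
    diff = Xb @ W - Yb                                    # prediction error on the minibatch
    loss = np.sum(diff ** 2) / batch_size + lam * np.sum(W ** 2)   # squared loss + L2 penalty
    grad = 2 * Xb.T @ diff / batch_size + 2 * lam * W    # gradient of the loss above
    W -= lr * grad                                        # step against the gradient
```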

Neural networks: from shallow to deep: Simple Neural Network: the neuron
Input: several values (i_0, i_1, ..., i_n); output: a single value x_k. Each connection is associated with a weight w (the connection strength), and there is often a bias value b (intercept). To learn is to adapt the parameters: the weights w and the bias b. The function f(.) is called the activation function (it transforms the output):
net_k = Σ_j w_{k,j} i_j + b_k,    x_k = f_k(net_k)

Neural networks: from shallow to deep: Some activation functions
Sigmoid: f(x) = 1 / (1 + e^{-x})
Hyperbolic tangent: f(x) = tanh(x)
ReLU: f(x) = max(0, x)
Leaky ReLU: f(x) = max(0.1x, x)
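A sketch of a single neuron combined with these activation functions, in NumPy; the input, weight and bias values are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x):
    return np.maximum(0.1 * x, x)

i = np.array([0.5, -1.2, 0.3])      # inputs i_0, i_1, ..., i_n
w = np.array([0.8, 0.1, -0.4])      # one weight per connection
b = 0.2                             # bias term

net = np.dot(w, i) + b              # net_k = sum_j w_{k,j} i_j + b_k
for f in (sigmoid, np.tanh, relu, leaky_relu):
    print(f.__name__, f(net))       # x_k = f_k(net_k)
```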

Neural networks: from shallow to deep: A simpler problem: digit classification
(Figure: example images of handwritten digits.)

Neural networks: from shallow to deep: Neural network with a single layer
The grayscale image is flattened into a vector x; each output neuron k (one per digit, k = 1, ..., 10) computes softmax_k(w_k^T x + b_k).

Neural networks: from shallow to deep: A simpler problem: digit classification
For a minibatch of 64 images, each flattened into 784 values (28 × 28 pixels), X is a 64 × 784 matrix, W is a 784 × 10 weight matrix and b = [b_0, b_1, ..., b_9] is a vector of 10 biases, so that
Y = softmax(X W + b)
is a 64 × 10 matrix with one row of class scores per image.
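In NumPy, this single-layer digit classifier could be sketched as follows (random values stand in for the 64-image minibatch and the learned parameters):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X = np.random.rand(64, 784)                      # 64 flattened 28x28 grayscale images
W = 0.01 * np.random.randn(784, 10)              # one weight column per digit class
b = np.zeros(10)

Y = softmax(X @ W + b)                           # 64x10 matrix of class probabilities
print(Y.shape, Y[0].sum())                       # each row sums to 1
```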

Neural networks: from shallow to deep: Backpropagation
An algorithm that recursively applies the chain rule to compute the weight updates for all parameters. Forward pass: compute the loss function for some training input over all neurons. Backward pass: apply the chain rule to compute the gradient of the loss function, propagating it through all layers of the network, in a graph structure.

Neural networks: from shallow to deep: Simple NN with two layers
The linear classifier was defined as f(W, x) = W x. A two-layer neural network can be seen as f(W_1, W_2, x) = W_2 max(0, W_1 x). Input: a 32 × 32 × 3 image; hidden layer: 256 neurons; output: a vector with 3 scores.

Neural networks: from shallow to deep: Simple NN with two layers
(Diagram: input x with 3072 values, weight matrix W_1 of size 3072 × 256, hidden layer h with 256 neurons, weight matrix W_2 of size 256 × 3, output o with 3 scores.)

Neural networks: from shallow to deep: Simple NN with two layers
Each layer applies an affine map followed by an activation s(.), and the output layer applies a softmax:
f_1(x_1) = s(W_1 x_1 + b_1)
f_2(x_2) = s(W_2 x_2 + b_2)
f_3(x_3) = softmax(W_3 x_3 + b_3), with output neuron k computing softmax_k(w_{3,k}^T x_3 + b_3)
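A forward pass of the two-layer network of the previous slides, sketched in NumPy with ReLU as the hidden activation; the random weights are placeholders for trained parameters:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x  = np.random.rand(3072)                  # 32*32*3 image, flattened
W1 = 0.01 * np.random.randn(256, 3072)     # input -> hidden (256 neurons)
b1 = np.zeros(256)
W2 = 0.01 * np.random.randn(3, 256)        # hidden -> 3 class scores
b2 = np.zeros(3)

h = relu(W1 @ x + b1)                      # hidden representation: max(0, W1 x + b1)
o = softmax(W2 @ h + b2)                   # class probabilities
print(o)
```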

4 Convolutional Neural Networks

Convolutional Neural Networks: Architecture
LeNet. New terminology: convolutional layer, pooling, feature (or activation) maps, fully connected (or dense) layer.

Convolutional Neural Networks: Convolutional layer
Input: N × M × L, e.g. 32 × 32 × 3. Filter (neuron) w of size P × Q × D, e.g. 5 × 5 × 3 (it keeps the depth). Each neuron/filter performs a convolution with the input image; centred at a specific pixel we compute, mathematically, w^T x + b.


Convolutional Neural Networks: Convolutional layer: local receptive field
(Figure: each filter is connected only to a local region of the input.)

Convolutional Neural Networks: Convolutional layer: input size, filter size and stride
The convolutional layer must take into account the input size, the filter size and the convolution stride. An input of size N_I × N_I, a filter of size P × P and a stride s produce an output of size
N_O = (N_I - P)/s + 1
Examples: (7 - 3)/1 + 1 = 5; (7 - 3)/2 + 1 = 3; (7 - 3)/3 + 1 = 2.33 (not an integer, so stride 3 does not fit this input).

Convolutional Neural Networks: Convolutional layer: feature maps
Feature maps are the stacked images generated after convolution with the filters, followed by an activation function (e.g. ReLU). Example: a 32 × 32 × 3 input passed through a convolution layer with 10 filters of size 5 × 5 × 3 produces a stack of feature maps of size 28 × 28 × 10.

Convolutional Neural Networks: Convolutional layer: zero padding
In practice, zero padding is used to avoid losing the borders. Example: input size 10 × 10, filter size 5 × 5, convolution stride 1, zero padding 2: output 10 × 10. General rule: to preserve the image size, use a zero padding of (P - 1)/2. Example: a 32 × 32 × 3 input with P = 5, s = 1 and zero padding z = 2 gives output size (N_I + 2z - P)/s + 1 = (32 + 2·2 - 5)/1 + 1 = 32. A small helper applying this rule is sketched below.
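A small helper applying this output-size rule (a sketch; the function name and error handling are assumptions):

```python
def conv_output_size(n_in, p, stride=1, pad=0):
    """Spatial output size of a convolution: (N_I + 2z - P) / s + 1."""
    size = (n_in + 2 * pad - p) / stride + 1
    if not size.is_integer():
        raise ValueError("filter/stride/padding combination does not tile the input")
    return int(size)

print(conv_output_size(7, 3, stride=1))          # 5
print(conv_output_size(7, 3, stride=2))          # 3
print(conv_output_size(32, 5, stride=1, pad=2))  # 32 (padding (P-1)/2 preserves the size)
```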

Convolutional Neural Networks: Convolutional layer: number of parameters
The number of parameters in a convolutional layer is [(P × P × d) + 1] × K: the filter weights are P × P × d, where d is given by the input depth; K is the number of filters/neurons (each processes the input in a different way); the +1 is the bias term. Example, with a 32 × 32 × 3 image input:
Conv layer 1: P = 5, K = 8: [(5 × 5 × 3) + 1] × 8 = 608 parameters
Conv layer 2: P = 5, K = 16: [(5 × 5 × 8) + 1] × 16 = 3216 parameters
Conv layer 3: P = 1, K = 32: [(1 × 1 × 16) + 1] × 32 = 544 parameters
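The same counting rule as a short function, reproducing the three layer sizes above:

```python
def conv_params(p, depth_in, k):
    """Parameters of a conv layer with K filters of size P x P x depth_in, plus one bias each."""
    return (p * p * depth_in + 1) * k

print(conv_params(5, 3, 8))    # 608
print(conv_params(5, 8, 16))   # 3216
print(conv_params(1, 16, 32))  # 544
```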

Convolutional Neural Networks: Convolutional layer: pooling
Pooling operates over each feature map to make the data smaller. Example: max pooling with downsampling factor 2 and stride 2.
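A 2 × 2 max pooling with stride 2 can be sketched in NumPy by reshaping each feature map into 2 × 2 blocks and taking the maximum of each block (assumes even height and width):

```python
import numpy as np

def max_pool_2x2(fmap):
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)   # a single 4x4 feature map
print(max_pool_2x2(fmap))                         # 2x2 output with the max of each 2x2 block
```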

Convolutional Neural Networks: Convolutional layer: convolution + activation + pooling
Convolution: as seen before. Activation: ReLU. Pooling: max pooling.

Convolutional Neural Networks: Fully connected layer + output layer
Fully connected (FC) layers work as in a regular Multilayer Perceptron: a given neuron operates over all values of the previous layer. Output layer: each neuron represents a class of the problem.

Convolutional Neural Networks: Visualization
Donglai et al. Understanding Intra-Class Knowledge Inside CNN, 2015, Tech Report.

4 Convolutional Neural Networks: Current Architectures

Convolutional Neural Networks: Current Architectures: AlexNet (Krizhevsky, 2012)
60 million parameters. Input 224 × 224.
conv1: K = 96 filters of 11 × 11 × 3, stride 4
conv2: K = 256 filters of 5 × 5 × 48
conv3: K = 384 filters of 3 × 3 × 256
conv4: K = 384 filters of 3 × 3 × 192
conv5: K = 256 filters of 3 × 3 × 192
fc1, fc2: K = 4096

Convolutional Neural Networks: Current Architectures: VGG 19 (Simonyan, 2014)
More layers with smaller filters = fewer parameters. Input 224 × 224; all filters are 3 × 3.
conv 1-2: K = 64, then maxpool
conv 3-4: K = 128, then maxpool
conv 5-6-7-8: K = 256, then maxpool
conv 9-10-11-12: K = 512, then maxpool
conv 13-14-15-16: K = 512, then maxpool
fc1, fc2: K = 4096

Convolutional Neural Networks: Current Architectures: GoogLeNet (Szegedy, 2014)
22 layers. Starts with two convolutional layers. Inception layer ("filter bank"): filters 1 × 1, 3 × 3, 5 × 5 plus a 3 × 3 max pooling; dimensionality is reduced using 1 × 1 filters. 3 classifiers placed in different parts of the network. (In the diagram: blue = convolution, red = pooling, yellow = softmax loss / fully connected layers, green = normalization or concatenation.)

Convolutional Neural Networks: Current Architectures: GoogLeNet inception module
The 1 × 1 convolution reduces the depth of the previous layers by half; this is needed to reduce complexity (e.g. from 256 to 128 dimensions). The module concatenates the outputs of the 3 filter sizes plus an extra max pooling path.

Convolutional Neural Networks: Current Architectures: Inception modules (V2 and V3)
Larger convolutions are factorized into multiple 3 × 3 convolutions, and flattened (asymmetric) convolutions are used to decrease the number of parameters.

Convolutional Neural Networks: Current Architectures: VGG19 vs. a 34-layer plain network ("VGG34") vs. ResNet-34
(Architecture comparison figure.)

Convolutional Neural Networks: Current Architectures: Residual Network, ResNet (He et al., 2015)
Reduces the number of filters and increases the number of layers (34 to 1000+). Residual architecture: a shortcut adds the identity to the block's output before the activation of the next layer. A minimal sketch of such a block is shown below.
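A minimal Keras sketch of the residual idea; the layer sizes and the exact placement of batch normalization and activations are simplified assumptions, not the exact ResNet block:

```python
from tensorflow.keras import layers, Input, Model

def residual_block(x, filters):
    shortcut = x                                              # identity path
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                           # add the identity before the next activation
    return layers.Activation("relu")(y)

inp = Input(shape=(32, 32, 64))
out = residual_block(inp, 64)
model = Model(inp, out)
model.summary()
```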

Convolutional Neural Networks: Current Architectures: Comparison
(Chart comparing architectures; thanks to Qingping Shan, www.qingpingshan.com.)

Convolutional Neural Networks: Current Architectures: DenseNet
Input → (BN + ReLU + Conv: H1) → (BN + ReLU + Conv: H2) → (BN + ReLU + Conv: H3) → (BN + ReLU + Conv: H4) → (BN + ReLU + Conv: H5) → Transition (BN + Conv + Pool)

4 Convolutional Neural Networks: Guidelines for training

Convolutional Neural Networks: Guidelines for training: Tricks
Batch: use mini-batches: to make processing easier, SGD processes several images at the same time; mini-batch size: 128 or 256 (64 or 32 if there is not enough memory); batch normalization: when using ReLU, normalize each batch.
Convergence and training set: learning rate: in SGD apply a decaying learning rate and a fixed momentum; clean data: cleanliness of the data is very important; data augmentation: generate new images by perturbation of existing ones; loss, validation and training error: plot their values for each epoch.

Convolutional Neural Networks: Guidelines for training: Guidelines for new data
Classification (fine-tuning): if the data is similar to ImageNet, fix all conv layers and train the FC layers; if the data is not similar to ImageNet, fix the lower conv layers and train the others. A sketch of the first case follows.
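A hedged Keras sketch of the first case: freeze the convolutional layers of a pretrained network and train only new FC layers. The use of VGG16, and the num_classes, train_images and train_labels names, are hypothetical placeholders:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

num_classes = 10                                             # hypothetical number of target classes
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                                  # fix all conv layers

x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)                  # new FC layers, trained from scratch
out = layers.Dense(num_classes, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, batch_size=32, epochs=10)   # hypothetical dataset
```

For the second case (data not similar to ImageNet), one would instead leave the upper conv layers trainable and freeze only the lower ones before compiling.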

Convolutional Neural Networks: Guidelines for training: Guidelines for new data
Feature extraction for image classification and retrieval: perform a forward pass and take the activation values of the higher conv and/or FC layers; apply some dimensionality reduction, e.g. PCA, Product Quantization, etc.; use an external classifier, e.g. SVM, k-NN, etc. A sketch of this pipeline follows.
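One way to follow these steps, sketched with Keras and scikit-learn; the choice of VGG16 and its fc1 layer name, as well as the images and labels arrays, are assumptions for illustration:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from sklearn.decomposition import PCA
from sklearn.svm import SVC

cnn = VGG16(weights="imagenet", include_top=True)
extractor = Model(cnn.input, cnn.get_layer("fc1").output)    # forward pass up to a high FC layer

images = np.random.rand(100, 224, 224, 3)                    # hypothetical preprocessed images
labels = np.random.randint(0, 2, size=100)                   # hypothetical labels

features = extractor.predict(images)                         # activation values used as features
features = PCA(n_components=64).fit_transform(features)      # dimensionality reduction
clf = SVC().fit(features, labels)                            # external classifier
```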

How it Works, Limitations and final remarks: How it Works
Let k ≥ 1 be an integer and d ≥ 1 any dimension. There exists a function f : R^d → R, computed by a ReLU neural network with 2k^3 + 8 layers (3k^3 + 12 neurons) and 4 + d distinct parameters, such that
inf_{g ∈ C} ∫_{[0,1]^d} |f(x) - g(x)| dx ≥ 1/64,
where C contains: (1) functions computed by (t, α, β)-semi-algebraic networks with at most k layers and 2^k/(tαβ) neurons (ReLU and maxpool); (2) functions computed by linear combinations of t decision trees with at most 2^{k^3}/t nodes, such as boosted decision trees.

How it Works, Limitations and final remarks: Limitations
CNNs are easily fooled.

How it Works, Limitations and final remarks: Concluding remarks
Deep Learning is not a panacea. There are important concerns about the generalization of deep networks. However, these methods can be really useful for finding representations. Many challenges and research frontiers remain for more complex tasks.

How it Works, Limitations and final remarks: References
Ponti, M.; Paranhos da Costa, G. Como funciona o Deep Learning. Tópicos em Gerenciamento de Dados e Informações, 2017.
Ponti, M.; Ribeiro, L.; Nazare, T.; Bui, T.; Collomosse, J. Everything you wanted to know about Deep Learning for Computer Vision but were afraid to ask. SIBGRAPI Conference on Graphics, Patterns and Images, 2017. http://sibgrapi.sid.inpe.br/rep/sid.inpe.br/sibgrapi/2017/09.05.22.09
LeCun, Y. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. Demos: http://yann.lecun.com/exdb/lenet
Szegedy, C. et al. Going Deeper with Convolutions. CVPR 2015. http://www.cs.unc.edu/~wliu/papers/googlenet.pdf
He, K. et al. Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385
Donglai et al. Understanding Intra-Class Knowledge Inside CNN. Tech Report, 2015. http://vision03.csail.mit.edu/cnn_art/index.html
Mahendran, A.; Vedaldi, A. Understanding Deep Image Representations by Inverting Them, 2014.