COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 © Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 7. Image Processing

COMP9444 17s2 Image Processing 1 Outline: Image Datasets and Tasks; Convolution in Detail; AlexNet; Weight Initialization; Batch Normalization; Residual Networks; Dense Networks; Style Transfer

COMP9444 17s2 Image Processing 2 MNIST Handwritten Digit Dataset: black and white, resolution 28×28, 60,000 images, 10 classes (0,1,2,3,4,5,6,7,8,9)

COMP9444 17s2 Image Processing 3 CIFAR Image Dataset: color, resolution 32×32, 50,000 images, 10 classes

COMP9444 17s2 Image Processing 4 ImageNet LSVRC Dataset: color, resolution 227×227, 1.2 million images, 1000 classes

COMP9444 17s2 Image Processing 5 Image Processing Tasks: image classification, object detection, object segmentation, style transfer, generating images, generating art, image captioning

COMP9444 17s2 Image Processing 6 Object Detection

COMP9444 17s2 Image Processing 7 LeNet trained on MNIST. The 5×5 window of the first convolution layer extracts from the original 32×32 image a 28×28 array of features. Subsampling then halves this size to 14×14. The second convolution layer uses another 5×5 window to extract a 10×10 array of features, which the second subsampling layer reduces to 5×5. These activations then pass through two fully connected layers into the 10 output units corresponding to the digits 0 to 9.
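A minimal PyTorch sketch of the LeNet-style architecture described above. The filter counts (6 and 16) and the 120-unit hidden layer are assumptions borrowed from LeNet-5; the slide only specifies the spatial sizes and the 10 outputs.

```python
import torch
import torch.nn as nn

class LeNetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 32x32 -> 28x28
        self.pool1 = nn.MaxPool2d(2)                  # 28x28 -> 14x14
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 14x14 -> 10x10
        self.pool2 = nn.MaxPool2d(2)                  # 10x10 -> 5x5
        self.fc1 = nn.Linear(16 * 5 * 5, 120)         # first fully connected layer
        self.fc2 = nn.Linear(120, num_classes)        # second fully connected layer -> 10 digits

    def forward(self, x):
        x = self.pool1(torch.tanh(self.conv1(x)))
        x = self.pool2(torch.tanh(self.conv2(x)))
        x = torch.tanh(self.fc1(x.flatten(1)))
        return self.fc2(x)

print(LeNetSketch()(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```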

COMP9444 17s2 Image Processing 8 ImageNet Architectures: AlexNet, 8 layers (2012); VGG, 19 layers (2014); GoogleNet, 22 layers (2014); ResNets, 152 layers (2015)

COMP9444 17s2 Image Processing 9 AlexNet Architecture: 5 convolutional layers + 3 fully connected layers; max pooling with overlapping stride; softmax with 1000 classes; 2 parallel GPUs which interact only at certain layers
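For experimentation, a single-GPU variant of AlexNet with this 5-conv + 3-fully-connected structure ships with torchvision; this is a convenience sketch, not the original two-GPU layout.

```python
import torch
import torchvision

model = torchvision.models.alexnet()            # randomly initialised weights
logits = model(torch.zeros(1, 3, 224, 224))     # 1000-way class scores (pre-softmax)
print(logits.shape)                             # torch.Size([1, 1000])
```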

COMP9444 17s2 Image Processing 10 Convolutional Neural Networks. Assume the original image is J×K, with L channels. We apply an M×N filter to these inputs to compute one hidden unit in the convolution layer. In this example J = 6, K = 7, L = 3, M = 3, N = 3.

COMP9444 17s2 Image Processing 11 Convolutional Neural Networks
$$Z^i_{j,k} = g\Big(b_i + \sum_{l=1}^{L}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} K^i_{l,m,n}\, V_{l,\,j+m,\,k+n}\Big)$$
The same weights are applied to the next M×N block of inputs, to compute the next hidden unit in the convolution layer ("weight sharing").
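A direct (and deliberately inefficient) NumPy rendering of the formula above, for a single filter with stride 1 and no padding; variable names follow the slide, and the ReLU used as g is an arbitrary choice.

```python
import numpy as np

def conv_single_filter(V, K, b, g=lambda s: np.maximum(s, 0.0)):
    # V: input of shape (L, J, K_img) = (channels, height, width)
    # K: one filter of shape (L, M, N); b: its scalar bias; g: transfer function (here ReLU)
    L, J, K_img = V.shape
    _, M, N = K.shape
    Z = np.zeros((J + 1 - M, K_img + 1 - N))
    for j in range(J + 1 - M):
        for k in range(K_img + 1 - N):
            Z[j, k] = g(b + np.sum(K * V[:, j:j + M, k:k + N]))
    return Z

V = np.random.randn(3, 6, 7)                                         # L=3, J=6, K=7 as in the example
print(conv_single_filter(V, np.random.randn(3, 3, 3), 0.1).shape)    # (4, 5) = (J+1-M, K+1-N)
```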

COMP9444 17s2 Image Processing 12 Convolutional Neural Networks. If the original image size is J×K and the filter is size M×N, the convolution layer will be (J+1−M) × (K+1−N).

COMP9444 17s2 Image Processing 13 Stride. Assume the original image is J×K, with L channels. We again apply an M×N filter, but this time with a stride of s > 1. In this example J = 7, K = 9, L = 3, M = 3, N = 3, s = 2.

COMP9444 17s2 Image Processing 14 Stride
$$Z^i_{j,k} = g\Big(b_i + \sum_{l=1}^{L}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} K^i_{l,m,n}\, V_{l,\,j+m,\,k+n}\Big)$$
The same formula is used, but j and k are now incremented by s each time. The number of free parameters is 1 + L·M·N.

COMP9444 17s2 Image Processing 15 Stride Dimensions. j takes on the values 0, s, 2s, ..., (J−M); k takes on the values 0, s, 2s, ..., (K−N). The next layer is (1+(J−M)/s) by (1+(K−N)/s).

COMP9444 17s2 Image Processing 16 Stride with Zero Padding. When combined with zero padding of width P, j takes on the values 0, s, 2s, ..., (J+2P−M); k takes on the values 0, s, 2s, ..., (K+2P−N). The next layer is (1+(J+2P−M)/s) by (1+(K+2P−N)/s).
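The output-size formula above can be wrapped in a small helper; the AlexNet conv-1 and pooling figures quoted on the following slides come out of it directly. A sketch using integer division:

```python
def conv_output_size(size, filter_size, stride=1, padding=0):
    """Width of the next layer: 1 + (J + 2P - M) / s, using integer division."""
    return 1 + (size + 2 * padding - filter_size) // stride

print(conv_output_size(224, 11, stride=4, padding=2))    # 55  (AlexNet conv layer 1)
print(conv_output_size(55, 3, stride=2))                 # 27  (overlapping max pooling)
```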

COMP9444 17s2 Image Processing 17 Example: AlexNet Conv Layer 1. For example, in the first convolutional layer of AlexNet, J = K = 224, P = 2, M = N = 11, s = 4. The width of the next layer is 1+(J+2P−M)/s = 1+(224+2×2−11)/4 = 55. Question: If there are 96 filters in this layer, compute the number of: weights per neuron? neurons? connections? independent parameters?

COMP9444 17s2 Image Processing 18 Example: AlexNet Conv Layer 1. For example, in the first convolutional layer of AlexNet, J = K = 224, P = 2, M = N = 11, s = 4. The width of the next layer is 1+(J+2P−M)/s = 1+(224+2×2−11)/4 = 55. Question: If there are 96 filters in this layer, compute the number of: weights per neuron? 1+11×11×3 = 364; neurons? 55×55×96 = 290,400; connections? 55×55×96×364 = 105,705,600; independent parameters? 96×364 = 34,944
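The arithmetic above can be checked line by line:

```python
width = 1 + (224 + 2 * 2 - 11) // 4               # 55
weights_per_neuron = 1 + 11 * 11 * 3              # 364  (bias + 11x11x3 filter)
neurons = width * width * 96                      # 290,400
connections = neurons * weights_per_neuron        # 105,705,600
independent_parameters = 96 * weights_per_neuron  # 34,944  (weights shared across positions)
print(width, weights_per_neuron, neurons, connections, independent_parameters)
```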

COMP9444 17s2 Image Processing 19 Max Pooling

COMP9444 17s2 Image Processing 20 Overlapping Pooling. If the previous layer is J×K, and max pooling is applied with width F and stride s, the size of the next layer will be (1+(J−F)/s) × (1+(K−F)/s). Question: If max pooling with width 3 and stride 2 is applied to the features of size 55×55 in the first convolutional layer of AlexNet, what is the size of the next layer? Answer: 1+(55−3)/2 = 27. Question: How many independent parameters does this add to the model? Answer: None! (no weights to be learned, just computing max)

COMP9444 17s2 Image Processing 21 AlexNet Details: 650K neurons, 630M connections, 60M parameters; more parameters than images, so there is a danger of overfitting

COMP9444 17s2 Image Processing 22 Enhancements: Rectified Linear Units (ReLUs); overlapping pooling (width = 3, stride = 2); stochastic gradient descent with momentum and weight decay; data augmentation to reduce overfitting; 50% dropout in the fully connected layers

COMP9444 17s2 Image Processing 23 Data Augmentation: ten patches of size 224×224 are cropped from each of the original 227×227 images (using zero padding); the horizontal reflection of each patch is also included; at test time, average the predictions on the 10 patches; also include changes in intensity to the RGB channels
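One way to express this style of augmentation with torchvision transforms; the specific crop padding and jitter strength below are illustrative assumptions, not the values used in the original paper.

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(224, padding=2),       # random 224x224 patch (zero padding)
    transforms.RandomHorizontalFlip(),           # include horizontal reflections
    transforms.ColorJitter(brightness=0.4),      # intensity changes to the RGB channels
    transforms.ToTensor(),
])

# At test time, TenCrop yields the corner and centre patches plus their
# reflections, whose predictions are then averaged.
test_crops = transforms.TenCrop(224)
```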

COMP9444 17s2 Image Processing 24 Convolution Kernels: filters on GPU-1 (upper) are color agnostic; filters on GPU-2 (lower) are color specific; these resemble Gabor filters

COMP9444 17s2 Image Processing 25 Dealing with Deep Networks
> 10 layers: weight initialization, batch normalization
> 30 layers: skip connections
> 100 layers: identity skip connections

COMP9444 17s2 Image Processing 26 Statistics. The mean and variance of a set of n samples $x_1,\ldots,x_n$ are given by
$$\text{Mean}[x] = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \text{Var}[x] = \frac{1}{n}\sum_{k=1}^{n}\big(x_k - \text{Mean}[x]\big)^2 = \Big(\frac{1}{n}\sum_{k=1}^{n} x_k^2\Big) - \text{Mean}[x]^2$$
If $w_k$, $x_k$ are independent and $y = \sum_{k=1}^{n} w_k x_k$, then $\text{Var}[y] = n\,\text{Var}[w]\,\text{Var}[x]$.
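A quick Monte Carlo check of the identity $\text{Var}[y] = n\,\text{Var}[w]\,\text{Var}[x]$; the sketch assumes zero-mean w and x, which is the setting used in the initialization analysis that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 20000
w = rng.normal(0.0, 0.5, size=(trials, n))     # Var[w] = 0.25
x = rng.normal(0.0, 2.0, size=(trials, n))     # Var[x] = 4.0
y = (w * x).sum(axis=1)                        # y = sum_k w_k x_k, repeated over many trials
print(y.var(), n * 0.25 * 4.0)                 # both close to 100
```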

COMP9444 17s2 Image Processing 27 Weight Initialization. Consider one layer (i) of a deep neural network with weights $w^{(i)}_{jk}$ connecting the activations $\{x^{(i)}_k\}_{1 \le k \le n_i}$ at the previous layer to $\{x^{(i+1)}_j\}_{1 \le j \le n_{i+1}}$ at the next layer, where $g()$ is the transfer function. Then
$$x^{(i+1)}_j = g\big(\text{sum}^{(i)}_j\big) = g\Big(\sum_{k=1}^{n_i} w^{(i)}_{jk}\, x^{(i)}_k\Big)$$
$$\text{Var}[\text{sum}^{(i)}] = n_i\,\text{Var}[w^{(i)}]\,\text{Var}[x^{(i)}]$$
$$\text{Var}[x^{(i+1)}] \approx G_0\, n_i\,\text{Var}[w^{(i)}]\,\text{Var}[x^{(i)}]$$
where $G_0$ is a constant whose value is estimated to take account of the transfer function. If some layers are not fully connected, we replace $n_i$ with the average number $n^{\text{in}}_i$ of incoming connections to each node at layer $i+1$.

COMP9444 17s2 Image Processing 28 Weight Initialization. If the network has D layers, with input $x = x^{(1)}$ and output $z = x^{(D+1)}$, then
$$\text{Var}[z] \approx \Big(\prod_{i=1}^{D} G_0\, n^{\text{in}}_i\,\text{Var}[w^{(i)}]\Big)\,\text{Var}[x]$$
When we apply gradient descent through backpropagation, the differentials will follow a similar pattern:
$$\text{Var}[\partial_x] \approx \Big(\prod_{i=1}^{D} G_1\, n^{\text{out}}_i\,\text{Var}[w^{(i)}]\Big)\,\text{Var}[\partial_z]$$
In this equation, $n^{\text{out}}_i$ is the average number of outgoing connections for each node at layer $i$, and $G_1$ is meant to estimate the average value of the derivative of the transfer function. For Rectified Linear Units, we can assume $G_0 = G_1 = \frac{1}{2}$.

COMP9444 17s2 Image Processing 29 Weight Initialization. In order to have healthy forward and backward propagation, each term in the product must be approximately equal to 1. Any deviation from this could cause the activations to either vanish or saturate, and the differentials to either decay or explode exponentially.
$$\text{Var}[z] \approx \Big(\prod_{i=1}^{D} G_0\, n^{\text{in}}_i\,\text{Var}[w^{(i)}]\Big)\,\text{Var}[x], \qquad \text{Var}[\partial_x] \approx \Big(\prod_{i=1}^{D} G_1\, n^{\text{out}}_i\,\text{Var}[w^{(i)}]\Big)\,\text{Var}[\partial_z]$$
We therefore choose the initial weights $\{w^{(i)}_{jk}\}$ in each layer (i) such that $G_1\, n^{\text{out}}_i\,\text{Var}[w^{(i)}] = 1$.
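A sketch of this rule for a ReLU layer ($G_1 = \frac{1}{2}$), giving $\text{Var}[w] = 2/n^{\text{out}}$; note that some libraries implement the fan-in variant instead.

```python
import numpy as np

def init_weights(n_in, n_out, rng):
    # choose Var[w] = 1 / (G_1 * n_out), with G_1 = 1/2 for ReLU  =>  std = sqrt(2 / n_out)
    std = np.sqrt(2.0 / n_out)
    return rng.normal(0.0, std, size=(n_out, n_in))

rng = np.random.default_rng(0)
W = init_weights(256, 128, rng)
print(W.var(), 2.0 / 128)     # both approximately 0.0156
```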

COMP9444 17s2 Image Processing 30 Weight Initialization. 22-layer ReLU network (left): $G_1 = \frac{1}{2}$ converges faster than $G_1 = 1$. 30-layer ReLU network (right): $G_0 = \frac{1}{2}$ is successful, while $G_1 = 1$ fails to learn at all.

COMP9444 17s2 Image Processing 31 Batch Normalization. We can normalize the activations $x^{(i)}_k$ of node k in layer (i) relative to the mean and variance of those activations, calculated over a mini-batch of training items:
$$\hat{x}^{(i)}_k = \frac{x^{(i)}_k - \text{Mean}[x^{(i)}_k]}{\sqrt{\text{Var}[x^{(i)}_k]}}$$
These activations can then be shifted and re-scaled to
$$y^{(i)}_k = \beta^{(i)}_k + \gamma^{(i)}_k\, \hat{x}^{(i)}_k$$
$\beta^{(i)}_k, \gamma^{(i)}_k$ are additional parameters, for each node, which are trained by backpropagation along with the other parameters (weights) in the network. After training is complete, $\text{Mean}[x^{(i)}_k]$ and $\text{Var}[x^{(i)}_k]$ are either pre-computed on the entire training set, or updated using running averages.
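A minimal NumPy sketch of the forward pass just described, with a small epsilon added to the variance for numerical stability (an implementation detail not shown on the slide).

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_nodes); gamma, beta: (num_nodes,) trained parameters
    mean = x.mean(axis=0)                        # Mean[x_k] over the mini-batch
    var = x.var(axis=0)                          # Var[x_k] over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)      # normalised activations
    return beta + gamma * x_hat                  # shifted and re-scaled output y_k

x = np.random.randn(32, 4) * 3.0 + 1.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.var(axis=0).round(3))   # approximately 0 and 1 for each node
```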

COMP9444 17s2 Image Processing 32 Going Deeper If we simply stack additional layers, it can lead to higher training error as well as higher test error.

COMP9444 17s2 Image Processing 33 Residual Networks. Idea: Take any two consecutive stacked layers in a deep network and add a skip connection which bypasses these layers and is added to their output.

COMP9444 17s2 Image Processing 34 Residual Networks. The preceding layers attempt to do the whole job, making x as close as possible to the target output of the entire network. F(x) is a residual component which corrects the errors from previous layers, or provides additional details which the previous layers were not powerful enough to compute. With skip connections, both training and test error drop as you add more layers. With more than 100 layers, we need to apply the ReLU before adding the residual instead of afterwards; this is called an identity skip connection.
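A sketch of a basic residual block in PyTorch, in the "ReLU after the addition" form; the identity-skip variant mentioned above would move the activation before the addition. The channel count is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)        # the skip connection adds x to the residual F(x)

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)     # torch.Size([1, 16, 8, 8])
```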

COMP9444 17s2 Image Processing 35 Dense Networks Recently, good results have been achieved using networks with densely connected blocks, within which each layer is connected by shortcut connections to all the preceding layers.
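A sketch of one densely connected block, where each layer receives the concatenation of the block input and all preceding layers' outputs; the growth rate and number of layers are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth, growth, 3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = F.relu(layer(torch.cat(features, dim=1)))   # shortcut to all preceding layers
            features.append(out)
        return torch.cat(features, dim=1)

print(DenseBlock(16)(torch.randn(1, 16, 8, 8)).shape)   # torch.Size([1, 52, 8, 8])
```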

COMP9444 17s2 Image Processing 36 Texture Synthesis

COMP9444 17s2 Image Processing 37 Neural Texture Synthesis
1. pretrain CNN on ImageNet (VGG-19)
2. pass input texture through CNN; compute feature map $F^l_{ik}$ for the i-th filter at spatial location k in layer (depth) l
3. compute the Gram matrix for each pair of features: $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$
4. feed (initially random) image into CNN
5. compute L2 distance between Gram matrices of original and new image
6. backprop to get gradient on image pixels
7. update image and go to step 5
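Step 3 as code: for a feature map F of shape (number of filters, number of spatial locations), the Gram matrix is simply F Fᵀ.

```python
import numpy as np

def gram_matrix(F):
    # F has shape (num_filters, num_spatial_locations); G[i, j] = sum_k F[i, k] * F[j, k]
    return F @ F.T

F = np.random.randn(64, 32 * 32)   # e.g. 64 filters over a 32x32 feature map
print(gram_matrix(F).shape)        # (64, 64)
```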

COMP9444 17s2 Image Processing 38 Neural Texture Synthesis. We can introduce a scaling factor $w_l$ for each layer l in the network, and define the cost function as
$$E_{\text{style}} = \frac{1}{4}\sum_{l=0}^{L} \frac{w_l}{N_l^2 M_l^2} \sum_{i,j} \big(G^l_{ij} - A^l_{ij}\big)^2$$
where $N_l$, $M_l$ are the number of filters, and size of feature maps, in layer l, and $G^l_{ij}$, $A^l_{ij}$ are the Gram matrices for the original and synthetic image.

COMP9444 17s2 Image Processing 39 Neural Style Transfer: content + style → new image

COMP9444 17s2 Image Processing 40 Neural Style Transfer

COMP9444 17s2 Image Processing 41 Neural Style Transfer. For Neural Style Transfer, we minimize the cost function
$$E_{\text{total}} = \alpha E_{\text{content}} + \beta E_{\text{style}} = \frac{\alpha}{2}\sum_{i,k}\big(F^l_{ik}(x) - F^l_{ik}(x_c)\big)^2 + \frac{\beta}{4}\sum_{l=0}^{L} \frac{w_l}{N_l^2 M_l^2}\sum_{i,j}\big(G^l_{ij} - A^l_{ij}\big)^2$$
where
$x_c, x$ = content image, synthetic image
$F^l_{ik}$ = i-th filter at position k in layer l
$N_l, M_l$ = number of filters, and size of feature maps, in layer l
$w_l$ = weighting factor for layer l
$G^l_{ij}, A^l_{ij}$ = Gram matrices for style image, and synthetic image
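A NumPy sketch of the two cost terms at a single layer l, using randomly generated feature maps as stand-ins; in practice E_style sums over several layers and the gradient is taken with respect to the synthetic image's pixels.

```python
import numpy as np

def content_loss(F_x, F_c, alpha=1.0):
    # (alpha/2) * sum over filters i and positions k of (F_ik(x) - F_ik(x_c))^2
    return 0.5 * alpha * np.sum((F_x - F_c) ** 2)

def style_loss(G, A, N_l, M_l, w_l=1.0, beta=1.0):
    # (beta/4) * (w_l / (N_l^2 M_l^2)) * sum over i,j of (G_ij - A_ij)^2
    return 0.25 * beta * w_l / (N_l ** 2 * M_l ** 2) * np.sum((G - A) ** 2)

N_l, M_l = 64, 32 * 32               # number of filters and feature-map size at layer l
F_x = np.random.randn(N_l, M_l)      # features of the synthetic image
F_c = np.random.randn(N_l, M_l)      # features of the content image
G, A = F_x @ F_x.T, F_c @ F_c.T      # Gram matrices (A stands in for the style image's)
print(content_loss(F_x, F_c) + style_loss(G, A, N_l, M_l))
```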

COMP9444 17s2 Image Processing 42 References
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012.
Understanding the difficulty of training deep feedforward neural networks, Glorot & Bengio, 2010.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe & Szegedy, ICML 2015.
Deep Residual Learning for Image Recognition, He et al., 2016.
Densely Connected Convolutional Networks, Huang et al., 2016.
A Neural Algorithm of Artistic Style, Gatys et al., 2015.