Machine Learning. MGS Lecture 3: Deep Learning


Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning

WHAT IS DEEP LEARNING? Shallow network: only one hidden layer. Deep network, simplest case: more than one hidden layer. What we really mean by a deep network (on the slide, each rectangle is a layer): a long stack of such layers.

WHAT IS DEEP LEARNING? Definition: a hierarchical organisation with more than one (non-linear) hidden layer between the input and the output variables; the output of one layer is the input of the next layer. Methods: (deep) Neural Networks, Convolutional Neural Networks, Restricted Boltzmann Machines / Deep Belief Networks, Recurrent Neural Networks.


WHY DEEP LEARNING? Hierarchy is a powerful and compact representation. Deep: (x_1 + x_2 + a)(x_3 + x_4 + b) = x_1 x_3 + x_1 x_4 + b x_1 + x_2 x_3 + x_2 x_4 + b x_2 + a x_3 + a x_4 + ab. Shallow: a single layer can only form sums such as x_1 + x_2 + x_3 + x_4 + a, so it needs a separate term for every product in the expansion above, whereas the deep (two-layer) form reuses the intermediate sums.

WHY DEEP LEARNING? Sharing lower-level representations to build an object. Natural data organisation.

CONVOLUTIONAL NEURAL NETWORKS

Sliding window convolution: convolve one filter over the whole image to produce a response map.

Convolutions in a NN: the filter weights are not fixed but learnt from data (they are NN weights). The value of a hidden neuron (its response) is a non-linearity, Sigmoid(.) or ReLU(.), applied to the weighted sum of the pixel values in the input layer.

Convolutions in a NN: for an input image with dim(Im) = m x n, using a 3 x 3 convolution filter and jumping one pixel at a time (stride = 1), the response map has dim(Response) = (m-2) x (n-2).
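A minimal NumPy sketch (my own illustration, not the lecture's code) of a single "valid" convolution with a 3 x 3 filter and stride 1, showing the (m-2) x (n-2) response-map size and the per-neuron computation described above (weighted sum followed by a ReLU); the array names are illustrative.

```python
import numpy as np

def conv2d_valid(image, filt, stride=1):
    """'Valid' 2D convolution: slide `filt` over `image`, no padding."""
    m, n = image.shape
    k, _ = filt.shape
    out_h = (m - k) // stride + 1      # (m-2) for a 3x3 filter, stride 1
    out_w = (n - k) // stride + 1
    response = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            response[i, j] = np.sum(patch * filt)   # weighted sum = hidden-neuron input
    return np.maximum(response, 0.0)                # ReLU non-linearity

image = np.random.rand(28, 28)          # m x n grayscale input
filt = np.random.randn(3, 3)            # learnable filter weights
print(conv2d_valid(image, filt).shape)  # (26, 26) = (m-2) x (n-2)
```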

Link to linear classification: the simplest NN is a convolution followed by a decision (thresholding); at test time it behaves like logistic regression.

Multiple convolutions: a grayscale image (1 channel) convolved with n filters gives n response maps, i.e. an n-channel image, which can itself be convolved again: you can go deep!

Pooling: spatial sub-sampling.

Conv. + max-pooling: the input is convolved to give a response map, which is then spatially sub-sampled by taking the maximum over each pooling window.
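A small sketch (illustrative, not from the slides) of non-overlapping 2 x 2 max-pooling over a response map, assuming its dimensions are divisible by the pool size:

```python
import numpy as np

def max_pool(response, size=2):
    """Non-overlapping max-pooling: keep the maximum of each size x size window."""
    h, w = response.shape
    cropped = response[:h - h % size, :w - w % size]          # crop to a multiple of `size`
    blocks = cropped.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

response = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(response))   # 2 x 2 map: the maximum of each 2 x 2 block
```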

Other aspects. Stride: the information in contiguous patches is highly redundant; stride = 2 means convolutions are done only every 2 pixels. Fully connected layers: the last 1 to 3 layers of the network, like a standard NN. Decision layer: the last layer makes the decision (typically a logistic regressor).
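A quick worked check (mine, not the lecture's) of how the stride changes the response-map size; with a k x k filter and stride s over an m x n image, the "valid" output is floor((m-k)/s)+1 by floor((n-k)/s)+1:

```python
def conv_output_size(m, n, k, stride):
    """'Valid' convolution output size for a k x k filter with the given stride."""
    return (m - k) // stride + 1, (n - k) // stride + 1

print(conv_output_size(224, 224, 3, 1))   # (222, 222): stride 1
print(conv_output_size(224, 224, 3, 2))   # (111, 111): stride 2 roughly halves the map
```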

What does a CNN look like? AlexNet: 7 hidden layers, 650,000 neurons, 60,000,000 parameters, trained on 2 GPUs for a week (the final CNN). A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

A more modern one: GoogLeNet

BREAK

Training Convolutional Neural Networks

Training formulation: the objective combines a data term measuring how well the model fits the training data (e.g. the L2-norm over the k outputs, or (multinomial) logistic regression; anything differentiable works) with a term that penalises complex solutions to avoid overfitting. The objective is optimised through gradient descent (the backpropagation algorithm).
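A sketch of one possible training objective in the spirit of this slide (my own illustration, not the lecture's): a multinomial logistic-regression (softmax cross-entropy) data term over k outputs plus an L2 penalty on the weights; everything is differentiable, so it can be optimised with gradient descent / backpropagation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def objective(W, X, y, lam=1e-3):
    """Data fit (cross-entropy over k classes) + L2 regularisation on W."""
    probs = softmax(X @ W)                         # N x k class probabilities
    data_loss = -np.log(probs[np.arange(len(y)), y]).mean()
    reg_loss = lam * np.sum(W ** 2)                # penalise complex solutions
    return data_loss + reg_loss

X = np.random.randn(100, 20)                       # 100 samples, 20 features
y = np.random.randint(0, 5, size=100)              # k = 5 classes
W = 0.01 * np.random.randn(20, 5)
print(objective(W, X, y))
```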

Stochastic Gradient Descent (SGD): the training set is often too large to even keep in memory! Instead of minimisation through gradient descent on the full training set, SGD performs each update using the gradient computed on a single randomly chosen sample; in practice you do this with mini-batches.
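A minimal mini-batch SGD loop (a sketch assuming a generic `loss_grad(W, X, y)` function that returns the gradient on a batch; not the lecture's code):

```python
import numpy as np

def sgd(W, X, y, loss_grad, lr=0.1, batch_size=32, epochs=10):
    """Mini-batch stochastic gradient descent on parameters W."""
    n = len(X)
    for epoch in range(epochs):
        order = np.random.permutation(n)           # visit samples in random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # one mini-batch, not the full set
            W = W - lr * loss_grad(W, X[idx], y[idx])
    return W

# Example: linear least squares, gradient of 0.5 * ||Xw - y||^2 averaged over the batch
lsq_grad = lambda W, X, y: X.T @ (X @ W - y) / len(X)
X, true_w = np.random.randn(200, 3), np.array([1.0, -2.0, 0.5])
y = X @ true_w
print(sgd(np.zeros(3), X, y, lsq_grad))            # approaches true_w
```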

Practicalities of GD: as always, take care of overfitting. When using SGD, monitor a proxy of the full loss (e.g. a running average over mini-batches). Gradient checking: you will always make mistakes computing derivatives, so use an empirical (finite-difference) approximation of the derivative to check them.
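A sketch of the empirical derivative check mentioned above (central finite differences), assuming a scalar loss `f` of a parameter vector; the names are illustrative:

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-5):
    """Central-difference approximation of df/dw, used to check analytic gradients."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Example: f(w) = ||w||^2 has analytic gradient 2w
w = np.random.randn(4)
print(np.allclose(numerical_gradient(lambda v: np.sum(v ** 2), w), 2 * w, atol=1e-4))
```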

The challenges of training. Underfitting: the network is trained only to a sub-optimal configuration; remedies include variants of gradient descent, sheer computational power, and ReLU. Overfitting: the network does not generalise well to new data; remedies include standard regularisation, pre-trained models, and drop-out.

Backpropagation and the vanishing gradient. Backprop can be derived simply using the chain rule: the derivative of the error we make with respect to any weight is the product of the derivatives along the path from the output back to that weight, and each weight is adjusted so that we would NOT make that error. A sigmoid-like activation has a strong gradient only for inputs roughly between -1 and 1 and is almost flat elsewhere, so gradients vanish as they are propagated back through many layers; the Rectified Linear Unit (ReLU) avoids this.
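A short numeric illustration (mine, not the lecture's) of why sigmoids cause vanishing gradients while ReLUs do not: the sigmoid's derivative is at most 0.25 and decays quickly away from 0, whereas the ReLU's derivative stays at 1 for any positive input.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
d_sigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))   # <= 0.25, tiny for large |x|
d_relu = lambda x: float(x > 0)                         # 1 for all positive inputs

for x in [0.0, 1.0, 4.0, 10.0]:
    print(x, d_sigmoid(x), d_relu(x))
# At x = 4 the sigmoid gradient is ~0.018; multiplied across many layers it vanishes,
# while the ReLU gradient stays at 1.
```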

Making things work. Problem: small gradients, large flat valleys. Solutions: variants of gradient descent; powerful GPUs for massive convolution parallelisation (CUDA); patience, since training from scratch takes weeks/months; pre-training.

Layer-wise greedy pre-training: often unsupervised (pick random images and try to represent them); greedy learning, where each layer is optimised independently of the others; it uses a large amount of data, and since the data need not be labelled you can pick a lot from everywhere. Afterwards, fine-tune with supervised data specific to the problem at hand, this time training the whole network at once.
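A compact, runnable sketch of the idea (my own illustration, not the lecture's method): each layer is trained as a small autoencoder on unlabelled data, independently of the others, and its codes become the input of the next layer; the stack would then be fine-tuned end-to-end on labelled, task-specific data.

```python
import numpy as np

def train_autoencoder(X, hidden, lr=0.1, epochs=200):
    """Train a tiny one-hidden-layer autoencoder on X (unsupervised, MSE loss)
    and return the encoder weights. Illustrative batch gradient descent only."""
    rng = np.random.default_rng(0)
    d = X.shape[1]
    W_enc = 0.1 * rng.standard_normal((d, hidden))
    W_dec = 0.1 * rng.standard_normal((hidden, d))
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)                  # codes
        err = H @ W_dec - X                     # reconstruction error
        grad_dec = H.T @ err / len(X)
        grad_enc = X.T @ ((err @ W_dec.T) * (1 - H ** 2)) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc

def greedy_pretrain(X, layer_sizes):
    """Each layer is trained on its own (greedy); its codes feed the next layer."""
    weights, H = [], X
    for size in layer_sizes:
        W = train_autoencoder(H, size)
        weights.append(W)
        H = np.tanh(H @ W)
    return weights   # afterwards: fine-tune the whole stack with labelled data

X = np.random.rand(500, 20)                     # stand-in for unlabelled data
print([W.shape for W in greedy_pretrain(X, [16, 8])])
```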


Use pre-trained networks. The most modern approach (last 1-2 years): take a pre-trained very deep CNN and fine-tune it to your problem! M. Oquab, L. Bottou, I. Laptev, J. Sivic, Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks, CVPR 2014.
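As a hedged illustration of this fine-tuning recipe (not part of the lecture): a sketch using the Keras applications API, assuming TensorFlow/Keras is installed and a 10-class target problem; `train_images` and `train_labels` are placeholders for your own data.

```python
from tensorflow import keras

# Pre-trained ImageNet features, without the original classification head
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # keep the pre-trained features frozen at first

model = keras.Sequential([
    base,
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),   # new decision layer for our task
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)   # fine-tune on the target data
```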

Regularisation: penalise complex solutions to avoid overfitting. AlexNet has 60,000,000 parameters, and it is a simple CNN! L2 regularisation adds a penalty proportional to the sum of squared weights; L1 regularisation penalises the sum of absolute weights and enforces sparsity.
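A one-line sketch (illustrative) of the two penalties that would be added to the training loss, with lambda controlling their strength:

```python
import numpy as np

l2_penalty = lambda W, lam: lam * np.sum(W ** 2)       # shrinks all weights smoothly
l1_penalty = lambda W, lam: lam * np.sum(np.abs(W))    # pushes many weights to exactly 0 (sparsity)

W = np.array([0.5, -0.2, 0.0, 3.0])
print(l2_penalty(W, 0.01), l1_penalty(W, 0.01))
```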

Regularisation: Dropout. Idea: cripple the neural network by removing hidden units stochastically. Each hidden unit is set to 0 with probability 0.5, so hidden units cannot co-adapt to other units and must be more generally useful. You can use different dropout probabilities, but 0.5 works well in practice.
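A small sketch (illustrative, not the lecture's code) of dropout applied to a layer of hidden activations at training time, using the common "inverted dropout" rescaling so nothing changes at test time:

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    """Set each hidden unit to 0 with probability p during training (inverted dropout)."""
    if not training:
        return h                                          # nothing is dropped at test time
    mask = (np.random.rand(*h.shape) >= p) / (1.0 - p)    # keep with prob 1-p, rescale
    return h * mask

h = np.random.rand(1, 8)
print(dropout(h, p=0.5))     # roughly half the activations are zeroed
```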

Publicly-available packages

How good does it get? http://parkorbird.flickr.com/ http://demo.caffe.berkeleyvision.org/

Deep learning = CNN? (Sneak Peek)

Generative models: you can reconstruct data of a given class, e.g. with Auto-Encoders or Restricted Boltzmann Machines.

Temporal data: Recurrent Neural Networks (e.g. LSTM-NNs).

Visualise high-dimensional data

All kinds of problems

Even audio can be done with CNNs.

3D faces from 2D images Created by Aaron Jackson in our own CVL http://cvl-demos.cs.nott.ac.uk/vrn/