1 Machine Learning: The Breadth of ML, Neural Networks & Deep Learning. Marc Toussaint (University of Stuttgart), Duy Nguyen-Tuong (Bosch Center for Artificial Intelligence). Summer 2017

2 Neural Networks

Consider a regression problem with input $x \in \mathbb{R}^d$ and output $y \in \mathbb{R}$.
Linear function ($\beta \in \mathbb{R}^d$): $f(x) = \beta^\top x$
1-layer Neural Network function ($W_0 \in \mathbb{R}^{h_1 \times d}$): $f(x) = \beta^\top \sigma(W_0 x)$
2-layer Neural Network function: $f(x) = \beta^\top \sigma(W_1 \sigma(W_0 x))$
Neural Networks are a special function model $y = f(x, w)$, i.e. a special way to parameterize non-linear functions. 2/22
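
As an illustration (not part of the original slides), a minimal NumPy sketch of the three function models above; the dimensions, the random parameters, and the use of a logistic σ are assumptions made for the example:

```python
import numpy as np

def sigma(z):                      # logistic activation, used as sigma throughout
    return 1.0 / (1.0 + np.exp(-z))

d, h1, h2 = 4, 8, 6                # input and hidden-layer sizes (arbitrary example)
rng = np.random.default_rng(0)
x = rng.normal(size=d)

beta  = rng.normal(size=d)         # linear model:  f(x) = beta^T x
f_lin = beta @ x

W0    = rng.normal(size=(h1, d))   # 1-layer NN:    f(x) = beta1^T sigma(W0 x)
beta1 = rng.normal(size=h1)
f_1l  = beta1 @ sigma(W0 @ x)

W1    = rng.normal(size=(h2, h1))  # 2-layer NN:    f(x) = beta2^T sigma(W1 sigma(W0 x))
beta2 = rng.normal(size=h2)
f_2l  = beta2 @ sigma(W1 @ sigma(W0 @ x))
```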

3 Neural Networks: Training

How to determine the weights $W_{l,ij}$ in layer $l$ for node $j$, given the samples $\{x_i, y_i\}$?
Idea:
Initialize the weights $W_{l,j}$ for each layer $l$ and each node $j$.
First, propagate $x_i$ through the network, bottom up (Forward Propagation).
Then, compute the error between prediction and ground truth $y_i$, given an error function $\ell$.
Subsequently, propagate the error backwards through the network, and recursively compute the error gradients for each $W_{l,ij}$ (Back-Propagation).
Update the weights $W_{l,j}$ using the computed error gradients for each sample $\{x_i, y_i\}$.

Notation: Consider $L$ hidden layers, each $h_l$-dimensional.
Let $z_l = W_{l-1} x_{l-1}$ be the inputs to all neurons in layer $l$.
Let $x_l = \sigma(z_l)$ be the activation of all neurons in layer $l$.
Redundantly, we denote by $x_0 \equiv x$ the activation of the input layer, and by $\phi(x) \equiv x_L$ the activation of the last hidden layer. 3/22

4 Neural Networks: Basic Equations

Forward propagation: An L-layer NN recursively computes, for $l = 1, \ldots, L$:
$$z_l = W_{l-1} x_{l-1}, \qquad x_l = \sigma(z_l)$$
and then computes the output $f \equiv z_{L+1} = W_L x_L$.

Backpropagation: Given some loss $\ell(f)$, let $\delta_{L+1} = \frac{\partial \ell}{\partial f}$. We can recursively compute the loss gradient w.r.t. the inputs of layer $l$, for $l = L, \ldots, 1$:
$$\delta_l = \frac{d\ell}{dz_l} = \frac{d\ell}{dz_{l+1}} \frac{\partial z_{l+1}}{\partial x_l} \frac{\partial x_l}{\partial z_l} = [\delta_{l+1} W_l] \circ [x_l \circ (1 - x_l)]$$
where $\circ$ is an element-wise product. The gradient w.r.t. the weights is:
$$\frac{d\ell}{dW_{l,ij}} = \frac{d\ell}{dz_{l+1,i}} \frac{\partial z_{l+1,i}}{\partial W_{l,ij}} = \delta_{l+1,i}\, x_{l,j} \quad\text{or}\quad \frac{d\ell}{dW_l} = \delta_{l+1}^\top x_l^\top$$

Weight update: many different weight updates are possible, given the gradients $\frac{d\ell}{dW_l}$; for example, the delta rule:
$$W_l^{\text{new}} = W_l^{\text{old}} + \Delta W_l = W_l^{\text{old}} - \eta \frac{d\ell}{dW_l}$$ 4/22
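
A minimal NumPy sketch of these forward and backward recursions for a single sample with a squared-error loss; the layer sizes, the initialization, and the logistic σ (so that σ'(z_l) = x_l ∘ (1 − x_l)) are assumptions for illustration:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 1]                       # input, two hidden layers (L=2), scalar output
W = [rng.normal(scale=0.5, size=(sizes[l+1], sizes[l])) for l in range(3)]  # W_0, W_1, W_L

x_in, y = rng.normal(size=sizes[0]), 1.0

# forward propagation: z_l = W_{l-1} x_{l-1}, x_l = sigma(z_l), f = W_L x_L
xs = [x_in]
for l in range(1, 3):
    xs.append(sigma(W[l-1] @ xs[-1]))
f = W[2] @ xs[2]

# backpropagation: delta_{L+1} = dl/df, delta_l = (delta_{l+1} W_l) * x_l * (1 - x_l)
loss = (f - y) ** 2                        # squared-error loss for this sample
delta = 2.0 * (f - y)                      # delta_{L+1}
grads = [None] * 3
grads[2] = np.outer(delta, xs[2])          # dl/dW_L = delta_{L+1} x_L^T
for l in range(2, 0, -1):
    delta = (delta @ W[l]) * xs[l] * (1.0 - xs[l])
    grads[l-1] = np.outer(delta, xs[l-1])  # dl/dW_{l-1} = delta_l x_{l-1}^T

eta = 0.1
W = [Wl - eta * g for Wl, g in zip(W, grads)]   # delta rule: one gradient-descent step
```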

5 Neural Networks: Regression

In the standard regression case, $y \in \mathbb{R}$, we typically assume a squared error loss $\ell(f) = \sum_i (f(x_i, w) - y_i)^2$. We have
$$\delta_{L+1} = \sum_i 2\,(f(x_i, w) - y_i)$$

Regularization: Add an $L_2$ or $L_1$ regularization. First compute all gradients as before, then add $\lambda W_{l,ij}$ (for $L_2$), or $\lambda\,\mathrm{sign}(W_{l,ij})$ (for $L_1$) to the gradient. Historically, this is called weight decay, as the additional gradient leads to a step decaying the weights.

The optimal output weights are as for standard regression:
$$W_L = (X^\top X + \lambda I)^{-1} X^\top y$$
where $X$ is the data matrix of activations $x_L \equiv \phi(x)$. 5/22
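
A small sketch of the closed-form output-weight solution above; the activation matrix X and targets y here are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
n, hL = 50, 8
X = rng.normal(size=(n, hL))        # rows are activations phi(x_i) = x_L of the last hidden layer
y = rng.normal(size=n)              # regression targets
lam = 0.1                           # L2 regularization strength lambda

# optimal output weights, as in ridge regression: W_L = (X^T X + lam*I)^(-1) X^T y
W_L = np.linalg.solve(X.T @ X + lam * np.eye(hL), X.T @ y)
```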

6 Neural Networks: Classification

Consider the multi-class case $y \in \{1, \ldots, M\}$. Then we have $M$ output neurons to represent the discriminative function
$$f(x, y, w) = (W_L x_L)_y, \qquad W_L \in \mathbb{R}^{M \times h_L}$$
Choosing a neg-log-likelihood objective $\to$ logistic regression.
Choosing a hinge loss objective $\to$ NN + SVM.

For a given $x$, let $y^*$ be the correct class. The one-vs-all hinge loss is
$$\sum_{y \neq y^*} \max\{0,\, 1 - (f_{y^*} - f_y)\}$$
For an output neuron $y \neq y^*$ this implies a gradient $\delta_y = [f_{y^*} < f_y + 1]$.
For the output neuron $y^*$ this implies a gradient $\delta_{y^*} = -\sum_{y \neq y^*} [f_{y^*} < f_y + 1]$.
Only data points inside the margin induce an error (and gradient). This is also called the Perceptron Algorithm. 6/22
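
A short sketch of the one-vs-all hinge loss and the resulting output-neuron gradients δ_y; the output values and the correct class are made-up numbers:

```python
import numpy as np

f = np.array([1.2, 0.3, 0.9, -0.5])     # output neuron values f_y for M = 4 classes (example)
y_true = 0                               # index of the correct class y*

margin_violated = (f[y_true] < f + 1.0)  # per class: is f_{y*} < f_y + 1 ?
margin_violated[y_true] = False          # only competing classes y != y* count

delta = margin_violated.astype(float)    # delta_y = [f_{y*} < f_y + 1] for y != y*
delta[y_true] = -margin_violated.sum()   # delta_{y*} = -sum of the margin violations

hinge_loss = np.sum(np.maximum(0.0, 1.0 - (f[y_true] - np.delete(f, y_true))))
```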

7 Neural Networks: Dimensionality Reduction

Dimensionality reduction can be performed with autoencoders. An autoencoder typically is an NN with a narrow hidden layer that is trained to reproduce the input:
$$\min \sum_i \| y(x_i) - x_i \|^2$$
The hidden layer ("bottleneck") needs to find a good representation/compression. Similar to the PCA objective, but nonlinear.
Stacking autoencoders yields Deep Autoencoders. 7/22
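
A minimal autoencoder sketch matching the reconstruction objective above (forward pass and loss only; training would proceed by backpropagation as on the earlier slides). The bottleneck size and the toy data are assumptions:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
d, k = 10, 3                               # input dimension and bottleneck size (assumed)
X = rng.normal(size=(100, d))              # toy data, one sample per row

W_enc = rng.normal(scale=0.1, size=(k, d)) # encoder: z = sigma(W_enc x)
W_dec = rng.normal(scale=0.1, size=(d, k)) # decoder: y(x) = W_dec z

Z = sigma(X @ W_enc.T)                     # bottleneck codes (the learned representation)
Y = Z @ W_dec.T                            # reconstructions y(x_i)
reconstruction_loss = np.sum((Y - X) ** 2) # min sum_i ||y(x_i) - x_i||^2
```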

8 Remarks

An NN is usually trained based on the gradient $\frac{\partial f(x)}{\partial W_l}$ (the output weights can be optimized analytically, as for linear regression).
NNs are a very powerful function class (by tweaking/training the weights one can approximate any non-linear function).
BUT:
Are there any guarantees on generalization?
What happens with the gradients when the NN is very deep?
How can NNs be used to learn intelligent (autonomous) behavior (e.g. Autonomous Learning, Reinforcement Learning, Robotics, etc.)?
Is there any insight into what the neurons will actually represent (e.g. discovering/developing abstractions, hierarchies, etc.)?
Deep Learning is a revival of Neural Networks and was mainly driven by the latter point, i.e. learning useful representations. 8/22

9 Deep Learning: Basic Concept

Idea: learn hierarchical features from data, from simple features to complex features.
Deep Learning can also be performed with other frameworks, e.g. Deep Gaussian Processes.
So what has changed compared to classical NNs?
Algorithmic advances, e.g. Dropout, ReLUs, pre-training
More general models, e.g. Deep GPs, Deep Kernel Machines, ...
More computational power (e.g. GPUs)
Large data sets
Deep Learning is useful for very high dimensional problems with many labeled or unlabeled samples (e.g. vision and speech tasks). 9/22

10 Typical Process to Train a Deep Network

Pre-process data, e.g. ZCA, distortions
Network type, e.g. convolutional network
Activation function, e.g. ReLU
Regularization, e.g. dropout
Network training, e.g. stochastic gradient descent with Adadelta
Combining multiple models, e.g. ensemble of networks
Optimizing high-level parameters, e.g. with Bayesian optimization
Many heuristics are involved when training Deep Networks. 10/22

11 Example: 2-D Convolutional Network

Open parameters:
Nr. of layers
Nr. of feature maps per convolution
Filter size for each convolution
Subsampling size
Nr. of hidden units 11/22

12 Pre-Processing Steps

1. Removing means from images: subtract the mean from the images, standardize the data.
2. Distortions of images: add distorted images to the training data, e.g. randomly translate & rotate images.
3. Zero Component Analysis (ZCA): perform the transformation
$$\tilde{x} = P^\top \Lambda^{-1} P\, x, \qquad \Lambda = \mathrm{diag}\big(\sqrt{\sigma_1 + \epsilon}, \sqrt{\sigma_2 + \epsilon}, \ldots, \sqrt{\sigma_n + \epsilon}\big)$$
where $P$ contains the eigenvectors and $\sigma_i$ the eigenvalues of the data covariance. In practice, $\epsilon$ has the effect of strengthening the edges. 12/22
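
A possible NumPy sketch of the ZCA transformation above; the toy data, the value of ε, and the eigendecomposition convention (eigenvectors as columns of P) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 64))                  # rows are (flattened) images
X = X - X.mean(axis=0)                          # step 1: remove the mean

C = X.T @ X / X.shape[0]                        # covariance of the centered data
sig, P = np.linalg.eigh(C)                      # eigenvalues sigma_i, eigenvectors as columns of P
eps = 1e-2                                      # epsilon: strengthens edges / avoids division by ~0

Lambda_inv = np.diag(1.0 / np.sqrt(sig + eps))  # Lambda^{-1} with Lambda = diag(sqrt(sigma_i + eps))
X_zca = X @ (P @ Lambda_inv @ P.T)              # apply the (symmetric) ZCA transform to every row
```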

13 Activation Function: Rectified Linear Units

New activation function: rectified linear units (ReLUs)
ReLU: $f(z) = \max(0, z)$
non-saturating
sparse activation
helps against vanishing gradients
Relation to logistic activations: a sum of shifted logistic units approximates the softplus, which in turn approximates the ReLU:
$$\sum_{n} \mathrm{logistic}(z - n) \;\approx\; \log(1 + e^z) \;\approx\; \max(0, z)$$ 13/22
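
A small sketch illustrating the stated relation; the particular offsets of the shifted logistic units are one common choice and an assumption here, not necessarily the lecture's exact construction:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softplus(z):
    return np.log1p(np.exp(z))          # log(1 + e^z), a smooth approximation of max(0, z)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
# a finite sum of shifted logistic units closely tracks the softplus
stacked = sum(logistic(z - n + 0.5) for n in range(1, 50))
print(np.max(np.abs(stacked - softplus(z))))   # small
print(np.max(np.abs(softplus(z) - relu(z))))   # at most log(2), attained at z = 0
```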

14 Deep Networks and Overfitting

Overfitting: good training, bad testing performance. Deep models are very sensitive to overfitting, due to their complex model structures.
How to avoid overfitting:
Weight decay, penalizing $\|W\|_1$ or $\|W\|_2$
Early stopping: recognize overfitting on a validation data set (see the sketch below)
Pre-training: initialize the parameters meaningfully
Dropout 14/22
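
A minimal early-stopping sketch on a synthetic linear-regression problem (all data, model, and hyperparameters are stand-ins); the point is only the stop-when-validation-worsens logic:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 20, 60
w_true = rng.normal(size=d)
X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y_tr  = X_tr  @ w_true + 0.5 * rng.normal(size=n)   # noisy training targets
y_val = X_val @ w_true + 0.5 * rng.normal(size=n)   # held-out validation targets

w, eta = np.zeros(d), 0.01
best_val, best_w, patience, bad_epochs = np.inf, w.copy(), 10, 0
for epoch in range(1000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / n       # gradient of the training loss
    w -= eta * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:                          # validation improves: remember these weights
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:                                            # validation got worse: possible overfitting
        bad_epochs += 1
        if bad_epochs >= patience:                   # stop early, keep the best weights seen
            break
w = best_w
```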

15 Dropout

Training (Backpropagation):
randomly deactivate each unit with probability p
compute the error for the new (thinned) network architecture
perform a gradient descent step
Prediction (Forward Propagation):
multiply the output of each unit by its keep probability (1 - p)
this preserves the expected value of the output for a single layer. 15/22
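
A sketch of the dropout mechanics above, with p_drop denoting the deactivation probability (so the test-time scale is the keep probability 1 − p_drop); batch size and layer width are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
p_drop = 0.5                                 # probability of deactivating a unit
h = rng.normal(size=(32, 128))               # activations of one hidden layer (batch x units)

# training: randomly deactivate each unit, then backpropagate through the thinned network
mask = rng.random(h.shape) >= p_drop
h_train = h * mask

# prediction: keep all units but scale by the keep probability (1 - p_drop),
# so the expected layer output matches training
h_test = h * (1.0 - p_drop)
```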

17 ADADELTA: Stochastic Gradient Descent

Computation of update steps on a batch of samples.
ADADELTA uses only first-order gradients.
Simple in implementation and application.
Applicable for large data sets and large numbers of parameters.
ADADELTA update rule:
$$x_{t+1} = x_t + \Delta x_t, \qquad \Delta x_t = -\eta_t\, g_t, \qquad \eta_t = \alpha\, \frac{\sqrt{\sum_{i=1}^{T} \rho^i (1-\rho)\, \Delta x_{t-i}^2}}{\sqrt{\sum_{i=0}^{T} \rho^i (1-\rho)\, g_{t-i}^2}}$$
Remarks:
Adaptive learning rate $\eta_t$; the parameters $\alpha$ and $\rho$ must be chosen.
Estimation of the learning rate from previous gradients $g_t$ and updates $\Delta x_t$.
The algorithm has been shown to work well in practice. 16/22
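
A sketch of one ADADELTA step in its standard running-average form (Zeiler 2012); the extra global scale α from the slide's update rule is kept as an optional parameter, and the toy quadratic is only for demonstration:

```python
import numpy as np

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6, alpha=1.0):
    """One ADADELTA update for a parameter vector x (a sketch of the standard method;
    alpha is the additional global scale appearing in the slide's update rule)."""
    Eg2, Edx2 = state
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2                        # running average of squared gradients
    dx = -alpha * np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad  # adaptive per-dimension step
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2                        # running average of squared updates
    return x + dx, (Eg2, Edx2)

# usage on a toy quadratic f(x) = ||x||^2 with gradient 2x
x = np.array([5.0, -3.0])
state = (np.zeros_like(x), np.zeros_like(x))
for t in range(2000):
    x, state = adadelta_step(x, 2 * x, state)
print(x)   # x has moved toward the minimum at the origin
```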

23 Bayesian Optimization

Optimizing selected network parameters, e.g. the decay rate ρ.
The objective function is unknown (i.e. parameters → prediction errors).
Bayesian Optimization: optimizing while approximating the objective function, inferring the objective function from data, i.e. [parameters, errors].
The loop:
Initialize the parameters.
Train the network with the parameters.
Compute the prediction error on validation data.
Learn the objective function: parameters → validation error.
Choose the next parameters according to a selection criterion, and repeat. 17/22

25 Bayesian Optimization with Gaussian Prior

Learning the objective function with Gaussian process regression.
GP prediction for a test point $x_t$: $\mathcal{N}(\mu(x_t), \nu(x_t))$.
The selection criterion is computed based on $\mu(x_t)$ and $\nu(x_t)$.
Expected Improvement criterion for a given point $x$:
$$a_{EI}(x) = \sqrt{\nu(x)}\,\big[\gamma(x)\,\Phi_{\mathrm{norm}}(\gamma(x)) + \phi_{\mathrm{norm}}(\gamma(x))\big], \qquad \gamma(x) = \frac{y_{\mathrm{best}} - \mu(x)}{\sqrt{\nu(x)}}$$
$\Phi_{\mathrm{norm}}(\cdot)$ ... normal cumulative distribution function, $\phi_{\mathrm{norm}}(\cdot)$ ... normal probability density function, $y_{\mathrm{best}}$ ... currently best measurement/observation. 18/22
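
A compact NumPy/SciPy sketch combining the GP prediction and the Expected Improvement criterion above; the RBF kernel, its length scale, the noise level, and the toy [parameter, error] data are assumptions:

```python
import numpy as np
from scipy.stats import norm

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel between the row vectors of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, Xs, noise=1e-4):
    """GP posterior mean mu(x_t) and variance nu(x_t) at the test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)
    nu = np.clip(np.diag(Kss) - np.sum(V**2, axis=0), 1e-12, None)
    return mu, nu

def expected_improvement(mu, nu, y_best):
    """a_EI = sqrt(nu)*(gamma*Phi(gamma) + phi(gamma)), gamma = (y_best - mu)/sqrt(nu):
    expected improvement over the currently best (lowest) validation error."""
    s = np.sqrt(nu)
    gamma = (y_best - mu) / s
    return s * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

# toy usage: past [parameter, validation-error] pairs, then pick the candidate maximizing EI
X = np.array([[0.1], [0.5], [0.9]])          # already evaluated parameter settings
y = np.array([0.30, 0.22, 0.35])             # their validation errors
Xs = np.linspace(0, 1, 101)[:, None]         # candidate parameters
mu, nu = gp_predict(X, y, Xs)
x_next = Xs[np.argmax(expected_improvement(mu, nu, y.min()))]
```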

26 Ensembles: Boosting Prediction Performance

Standard ML approach to improve test performance: combine the outputs of different models.
We can use different random weight initializations and training with/without the validation set.
How to combine the predictions?
Each network gives us a prediction, e.g. $p_1 = (0.4, 0.3, 0.3)$, $p_2 = (0.35, 0.35, 0.3)$, $p_3 = (0.1, 0.9, 0.0)$.
We can take the arithmetic or geometric mean, e.g. $p_{\mathrm{avg}} = (0.28, 0.52, 0.2)$.
The class prediction is the index with the highest score, e.g. class 2. 19/22
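
The slide's numerical example, reproduced as a few lines of NumPy:

```python
import numpy as np

# class-probability outputs of three independently trained networks (from the slide)
p1 = np.array([0.40, 0.30, 0.30])
p2 = np.array([0.35, 0.35, 0.30])
p3 = np.array([0.10, 0.90, 0.00])

p_avg = np.mean([p1, p2, p3], axis=0)        # arithmetic mean -> approx. (0.28, 0.52, 0.20)
predicted_class = int(np.argmax(p_avg)) + 1  # class 2 (using the slide's 1-based class index)
```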

27 Results on Traffic Sign Recognition

Team              Deep Learning used
IDSIA             yes
human average
BOSCH deep nets   yes
Sermanet          yes
CAOR              no
INI-RTCV          no
INI-RTCV          no
INI-RTCV          no
(The CCR (%) column of the original table is not preserved in this transcription.)

Correct classification rate (CCR) on the final stage of the German Traffic Sign Recognition Benchmark, with training and test images from 43 different German road sign classes. 20/22

28 Remarks

Various approaches exist for optimizing and training Deep Nets, e.g. Bayesian optimization, pre-processing, dropout, ...
Choice of appropriate techniques based on the application, experience and knowledge in Machine Learning.
Try out different training approaches and gain experience.
Keep up with the developments in the Deep Learning community.
Further research problems: Bayesian Deep Learning, unsupervised learning, generative deep models, deep reinforcement learning, adversarial problems, etc.

(Figures: error rate over the years for traditional approaches vs. those involving deep learning, only the 10 best results plotted; and the number of Deep Learning publications per year according to Google Scholar.) 21/22

29 Deep Learning: Further Reading

Weston, Ratle & Collobert: Deep Learning via Semi-Supervised Embedding, ICML.
Hinton & Salakhutdinov: Reducing the Dimensionality of Data with Neural Networks, Science 313, 2006.
Bengio & LeCun: Scaling Learning Algorithms Towards AI. In Bottou et al. (Eds.), Large-Scale Kernel Machines, MIT Press.
Hadsell, Chopra & LeCun: Dimensionality Reduction by Learning an Invariant Mapping, CVPR.
Glorot & Bengio: Understanding the difficulty of training deep feedforward neural networks, AISTATS 2010.
... and newer papers citing those. 22/22
