COMP 551 Applied Machine Learning Lecture 14: Neural Networks


COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

Kaggle: Project 2

Recall the perceptron: we take a linear combination of the inputs, $\mathbf{w} \cdot \mathbf{x}$, and threshold it. The output of the threshold is taken as the predicted class.

Decision surface of a perceptron: a perceptron can represent many functions, but to represent non-linearly separable functions (e.g. XOR) we need a network of perceptron-like elements. However, if we connect perceptrons into networks, the error surface of the network is not differentiable (because of the hard threshold).

Example: a network representing XOR. (Figure: a small network of units N1, N2, N3 with intermediate outputs o1, o2.)
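As an illustration (not the exact network from the slide's figure, whose weights are not shown), here is a minimal Python sketch of a two-layer network of hard-threshold units that computes XOR; the unit roles and the specific weights are hand-picked for the example:

```python
def step(z):
    """Hard threshold unit: outputs 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden unit h1 behaves like OR, h2 like AND (hand-picked weights and thresholds).
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # The output unit fires when OR holds but AND does not, i.e. XOR.
    return step(1.0 * h1 - 1.0 * h2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"XOR({a}, {b}) = {xor_net(a, b)}")
```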

Recall the sigmoid function: $\sigma(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}}$. The sigmoid provides a soft threshold, whereas the perceptron provides a hard threshold. Here $\sigma$ is the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$, which has the following nice property: $\frac{d\sigma(z)}{dz} = \sigma(z)\,(1 - \sigma(z))$. Using this, we can derive a gradient descent rule to train: one sigmoid unit; multi-layer networks of sigmoid units.
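A minimal NumPy sketch of the sigmoid and a numerical check of the derivative identity above (the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: a soft threshold with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    """Derivative of the sigmoid, using sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Numerical check of the identity at a few points.
z = np.array([-2.0, 0.0, 3.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_deriv(z)))  # expected: True
```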

Feed-forward neural networks: a collection of neurons with sigmoid activation, arranged in layers. Layer 0 is the input layer; its units just copy the input. The last layer (layer K) is the output layer; its units provide the output. Layers 1, ..., K-1 are hidden layers; their outputs are not observed outside the network.

Why this name? In feed-forward networks, the output of units in layer k becomes input to the units in layers k+1, k+2, ..., K. There are no cross-connections between units in the same layer, and no backward ("recurrent") connections from layers downstream. Typically, units in layer k provide input to units in layer k+1 only. In fully-connected networks, all units in layer k provide input to all units in layer k+1.

Feed-forward neural networks, notation: $w_{ji}$ denotes the weight on the connection from unit i to unit j. By convention, $x_{j0} = 1$ for all j. The output of unit j, denoted $o_j$, is computed using a sigmoid, $o_j = \sigma(\mathbf{w}_j \cdot \mathbf{x}_j)$, where $\mathbf{w}_j$ is the vector of weights entering unit j and $\mathbf{x}_j$ is the vector of inputs to unit j. By definition, $x_{ji} = o_i$. Given an input, how do we compute the output? How do we train the weights?

Computing the output of the network: suppose we want the network to make a prediction for an instance <x, y=?>. Run a forward pass through the network. For each layer k = 1, ..., K: (1) compute the output of all neurons in layer k; (2) copy this output as the input to the next layer. The output of the last layer is the predicted output ŷ.
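A minimal NumPy sketch of this forward pass for a fully-connected sigmoid network (the layer sizes, the function name `forward_pass`, and the separate bias vectors are illustrative choices; the slides instead fold the bias in via $x_{j0} = 1$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, weights, biases):
    """Forward pass through a fully-connected sigmoid network.

    weights[k-1], biases[k-1] hold the parameters of layer k;
    the input layer (layer 0) just copies x.
    """
    activation = x
    for W, b in zip(weights, biases):
        # Step 1: compute the output of all neurons in this layer.
        activation = sigmoid(W @ activation + b)
        # Step 2: this output becomes the input to the next layer (next iteration).
    return activation  # output of the last layer = predicted y

# Example: a 2-3-1 network with illustrative random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
print(forward_pass(np.array([0.5, -1.0]), weights, biases))
```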

Learning in feed-forward neural networks: assume the network structure (units + connections) is given. The learning problem is to find a good set of weights that minimizes the error at the output of the network. Approach: gradient descent, because the hypothesis computed by the network, $h_\mathbf{w}$, is: differentiable, because of the choice of sigmoid units; and very complex, hence direct computation of the optimal weights is not possible.

Gradient-descent preliminaries for neural networks. Assume we have a fully connected network with: N input units (indexed 1, ..., N); H hidden units in a single layer (indexed N+1, ..., N+H); one output unit (indexed N+H+1). Suppose we want to compute the weight update after seeing instance <x, y>. Let $o_i$, i = 1, ..., N+H+1, be the outputs of all units in the network for the given input x. The sum-squared error function is $J(\mathbf{w}) = \frac{1}{2}\,(y - o_{N+H+1})^2$.

Gradient-descent update for the output node: take the derivative with respect to the weights $w_{N+H+1,j}$ entering unit $o_{N+H+1}$, using the chain rule: $\frac{\partial J(\mathbf{w})}{\partial w_{N+H+1,j}} = \frac{\partial J(\mathbf{w})}{\partial \sigma} \cdot \frac{\partial \sigma}{\partial w_{N+H+1,j}}$, with $\frac{\partial J(\mathbf{w})}{\partial \sigma} = -(y - o_{N+H+1})$. Hence, we can write $\frac{\partial J(\mathbf{w})}{\partial w_{N+H+1,j}} = -\delta_{N+H+1}\, x_{N+H+1,j}$, where $\delta_{N+H+1} = (y - o_{N+H+1})\, o_{N+H+1}\,(1 - o_{N+H+1})$ is the error (correction) term of the output unit.

Gradient-descent update for a hidden node: the derivative with respect to the remaining weights, $w_{l,j}$ with j = 1, ..., N and l = N+1, ..., N+H, can be computed using the chain rule. Recall that $x_{N+H+1,l} = o_l$. Hence we have $\frac{\partial J(\mathbf{w})}{\partial w_{l,j}} = -\delta_{N+H+1}\, w_{N+H+1,l}\, o_l\,(1 - o_l)\, x_{l,j}$. Putting these together, and using similar notation as before: $\frac{\partial J(\mathbf{w})}{\partial w_{l,j}} = -\delta_l\, x_{l,j}$, where $\delta_l = \delta_{N+H+1}\, w_{N+H+1,l}\, o_l\,(1 - o_l)$.

Gradient-descent update for a hidden node (illustration): the derivative with respect to the weights $w_{k,j}$, j = 1, ..., N and k = N+1, ..., N+H, is computed again using the chain rule. (Figure of the backpropagation flow through a hidden unit; image from http://openi.nlm.nih.gov/detailedresult.php?img=2716495_bcr2257-1&req=4)
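To make the delta expressions above concrete, here is a small numerical check in NumPy (a sketch under simplifying assumptions: a 2-2-1 network with no bias inputs; all variable names are illustrative). It compares the analytic gradients against finite-difference estimates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny instance of the setup above: N = 2 inputs, H = 2 hidden units, 1 output.
rng = np.random.default_rng(1)
W_hid = rng.normal(size=(2, 2))   # row l: weights entering hidden unit l
w_out = rng.normal(size=2)        # weights entering the output unit
x, y = np.array([0.3, -0.7]), 1.0

def loss(W_hid, w_out):
    """Sum-squared error J(w) = 1/2 (y - o_out)^2 for the single instance <x, y>."""
    o_hid = sigmoid(W_hid @ x)
    o_out = sigmoid(w_out @ o_hid)
    return 0.5 * (y - o_out) ** 2

# Analytic gradients from the delta expressions above (dJ/dw = -delta * input).
o_hid = sigmoid(W_hid @ x)
o_out = sigmoid(w_out @ o_hid)
delta_out = (y - o_out) * o_out * (1 - o_out)
grad_w_out = -delta_out * o_hid
delta_hid = delta_out * w_out * o_hid * (1 - o_hid)
grad_W_hid = -np.outer(delta_hid, x)

# Finite-difference checks on one output weight and one hidden weight.
eps = 1e-6
wp, wm = w_out.copy(), w_out.copy()
wp[0] += eps
wm[0] -= eps
print(np.isclose((loss(W_hid, wp) - loss(W_hid, wm)) / (2 * eps), grad_w_out[0]))    # True
Wp, Wm = W_hid.copy(), W_hid.copy()
Wp[0, 1] += eps
Wm[0, 1] -= eps
print(np.isclose((loss(Wp, w_out) - loss(Wm, w_out)) / (2 * eps), grad_W_hid[0, 1]))  # True
```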

Stochastic gradient descent:
Initialization: initialize all weights to small random numbers.
Repeat until convergence:
1. Forward pass: pick a training example <x, y> and feed it through the network to compute the output $o = o_{N+H+1}$.
2. Backpropagation: for the output unit, compute the correction $\delta_{N+H+1} = o\,(1 - o)\,(y - o)$; for each hidden unit h, compute its share of the correction $\delta_h = o_h\,(1 - o_h)\, w_{N+H+1,h}\, \delta_{N+H+1}$.
3. Gradient descent: update each network weight $w_{j,i} \leftarrow w_{j,i} + \alpha\, \delta_j\, x_{j,i}$.
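A compact NumPy sketch of this procedure for the single-hidden-layer, single-output network described above (the function name, layer sizes, and hyperparameters are illustrative choices rather than values from the slides; the bias is handled as an extra input fixed to 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sgd(X, y, n_hidden=3, lr=0.5, n_epochs=5000, seed=0):
    """Stochastic gradient descent with backpropagation for a network with one
    sigmoid hidden layer and one sigmoid output unit (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    # Initialization: small random weights; column 0 of each row is the bias (x_0 = 1).
    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in + 1))
    W2 = rng.normal(scale=0.1, size=n_hidden + 1)
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            # Forward pass.
            x = np.append(1.0, X[i])
            h = sigmoid(W1 @ x)
            h_b = np.append(1.0, h)
            o = sigmoid(W2 @ h_b)
            # Backpropagation: corrections for the output and hidden units.
            delta_out = o * (1 - o) * (y[i] - o)
            delta_h = h * (1 - h) * W2[1:] * delta_out
            # Gradient descent: weight updates.
            W2 += lr * delta_out * h_b
            W1 += lr * np.outer(delta_h, x)
    return W1, W2

# Illustrative usage: learn XOR (convergence is not guaranteed for every seed).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, W2 = train_sgd(X, y, lr=1.0)
for xi in X:
    h_b = np.append(1.0, sigmoid(W1 @ np.append(1.0, xi)))
    print(xi, float(sigmoid(W2 @ h_b)))
```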

Organizing the training data:
- Stochastic gradient descent: compute the error on a single example at a time (as in the previous slide).
- Batch gradient descent: compute the error on all examples; loop through the training data, accumulating the weight changes, then update all weights and repeat.
- Mini-batch gradient descent: compute the error on a small subset; randomly select a mini-batch (i.e. a subset of the training examples), calculate the error on the mini-batch, use it to update the weights, and repeat.
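A skeleton of the mini-batch variant (illustrative: `grad` stands in for any function returning the gradient of the error on the given examples, and the default hyperparameters are arbitrary):

```python
import numpy as np

def minibatch_indices(n_examples, batch_size, rng):
    """Yield random mini-batches of example indices covering one epoch."""
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]

def train_minibatch(X, y, weights, grad, lr=0.1, batch_size=32, n_epochs=10, seed=0):
    """Generic mini-batch gradient descent loop; grad(weights, X_batch, y_batch)
    is assumed to return the gradient of the error on the given mini-batch."""
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for idx in minibatch_indices(len(X), batch_size, rng):
            # Compute the error gradient on the mini-batch only, then update.
            # batch_size = 1 recovers stochastic gradient descent;
            # batch_size = len(X) recovers batch gradient descent.
            weights = weights - lr * grad(weights, X[idx], y[idx])
    return weights
```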


Expressiveness of feed-forward NN. A single sigmoid neuron? Same representational power as a perceptron: Boolean AND, OR, NOT, but not XOR. A neural network with a single hidden layer?

Learning the identity function

Learning the identity function. (Figure: the network structure and the learned hidden-layer weights.)


Expressiveness of feed-forward NN. A single sigmoid neuron? Same representational power as a perceptron: Boolean AND, OR, NOT, but not XOR. A neural network with a single hidden layer? It can represent every Boolean function, but might require a number of hidden units that is exponential in the number of inputs; every bounded continuous function can also be approximated with arbitrary precision by such a network, given enough hidden units. A neural network with two hidden layers? Any function can be approximated to arbitrary accuracy by a network with two hidden layers.

Project 3: Visual add / multiply (due Nov. 13). For each image, give the result of the equation: if it's an A, add the 2 digits; if it's an M, multiply the 2 digits.

Final notes. What you should know: the definition and components of neural networks; training by backpropagation. Additional information about neural networks: video and slides from the Montreal Deep Learning Summer School: http://videolectures.net/deeplearning2017_larochelle_neural_networks/ https://drive.google.com/file/d/0byukrdicdk7-c2s2rjbisms2uza/view?usp=drive_web https://drive.google.com/file/d/0byukrdicdk7-uxb1r1zpx082mek/view?usp=drive_web Tutorial 3 is today: TR3120, 6-7pm. Project #2 peer reviews will open next Monday on CMT.