SGD: Stochastic Gradient Descent
Improving SGD
Hantao Zhang, Deep Learning with Python. Reading: Chapter 2

SGD: Stochastic Gradient Descent
Main Idea: Given a set of input/output examples D = { (x, y) }.
- Define the network as a function f(w, x) on the weights w and the input x.
- Define the cost, say C = ½ · Σ_{(x,y)∈D} ‖a(x) − y‖² / |D|, and try to minimize it.
For each epoch, repeat the following:
1. Compute a(x) = f(w, x) and C = ½ · Σ_{(x,y)∈D} ‖a(x) − y‖² / |D|.
2. Compute ∂C/∂w.
3. Update w by w = w − η·(∂C/∂w) to decrease C.
For large datasets this is expensive: we don't want to load all the data D into memory, and the gradient depends on all the data.
An alternative: pick a small subset of examples, called a mini-batch, B << D, and
- approximate the gradient using C = ½ · Σ_{(x,y)∈B} ‖a(x) − y‖² / |B|;
- on average ∂C/∂w points in the right direction;
- take a step in that direction and repeat (a NumPy sketch of one such step follows below).
|B| = one example is a very popular choice, called online update.
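To make the update concrete, here is a minimal NumPy sketch of one mini-batch SGD step. The names sgd_step and grad_fn are hypothetical stand-ins for whatever computes the per-example gradient ∂C/∂w; they are not from the slides.

import numpy as np

def sgd_step(w, batch_x, batch_y, grad_fn, eta=0.1):
    # One mini-batch SGD step: average the per-example gradients over the
    # batch B, then move the weights a small step against that average.
    grads = [grad_fn(w, x, y) for x, y in zip(batch_x, batch_y)]
    avg_grad = sum(grads) / len(grads)   # approximates dC/dw over the batch
    return w - eta * avg_grad            # w = w - eta * (dC/dw)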
Batch Update
With on-line (stochastic) update we update the weights after every pattern.
With batch update we accumulate the changes for each weight over a batch, and update the weights at the end of the batch.
Batch update usually gives the correct direction of the gradient for the entire data set, while on-line update can move the weights in directions quite different from the average gradient of the entire data set:
- individual instances are noisy, and
- a single instance will not, by itself, represent the average gradient.
Size of the mini-batch? Another hyperparameter to choose through experiments and experience (a sketch of slicing the training set into mini-batches follows below).

Stochastic Gradient Descent
Since the true gradient is only approximated, the loss will not always decrease (locally), because the training points in each batch are chosen at random. It still converges over time.
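As a rough illustration of how the mini-batches themselves can be formed, the following sketch shuffles the training set once per epoch and slices it; the names train_x, train_y, and batch_size are assumptions, not code from the slides.

import numpy as np

def make_batches(train_x, train_y, batch_size):
    # Shuffle the training set and yield (batch_x, batch_y) slices of size batch_size.
    n = train_x.shape[0]
    idx = np.random.permutation(n)            # new random order each epoch
    for start in range(0, n, batch_size):
        sel = idx[start:start + batch_size]
        yield train_x[sel], train_y[sel]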
Computation in a General NN
There are L layers, l ∈ {1, 2, ..., L}, plus the input layer (l = 0); each layer is fully connected to the next.

An example of a 4-layer NN:
- 4 weight matrices: W_0, W_1, W_2, W_3, where W_i[j,k] = link weight from the j-th neuron in layer i to the k-th neuron in layer (i+1)
- 4 outputs: y_1, y_2, y_3, y_4 (excluding y_0 = x)
- 4 weighted sums: y_0 → z_0 → y_1 → z_1 → y_2 → z_2 → y_3 → z_3 → y_4, with z_i = y_i·W_i + b_i and y_{i+1} = a(z_i)

δ_i and θ_i: Let the cost be C ≈ ½‖y_L − y‖², and define θ_i ≡ ∂C/∂y_i and δ_i ≡ ∂C/∂z_i. Then
- θ_L = (y_L − y)
- δ_i = θ_{i+1} ⊙ a'(z_i)
- θ_i = W_i·δ_i
- ∂C/∂W_i[j,k] = y_i[j]·δ_i[k], i.e. ∂C/∂W_i = y_iᵀ·δ_i (outer product)

For a mini-batch B, define ∇_i = Σ_B (∂C/∂W_i). Update W_i by W_i = W_i − η·∇_i, or W_i = W_i − η·∇_i / |B|.

Improve readability (one example at a time; negative layer indices changed to positive):

def backprop(self, x, y):
    # feedforward computation
    y_act = x
    y_acts = [x]    # list to store all the activations
    z_sums = []     # list to store all the z vectors
    for b, w in zip(self.biases, self.weights):
        z = np.dot(y_act, w) + b
        z_sums.append(z)
        y_act = self.activation(z)
        y_acts.append(y_act)
    # backward propagation
    theta = self.cost_derivative(y_act, y)
    for i in range(self.num_layers-1, -1, -1):
        ad = self.activation_derivative(z_sums[i], y_acts[i+1])
        delta = theta * ad
        self.delta_b[i] = delta
        y_hat = y_acts[i][:, np.newaxis]
        delta_hat = delta[np.newaxis, :]
        self.delta_w[i] = np.dot(y_hat, delta_hat)
        if (i > 0):
            theta = np.dot(self.weights[i], delta)
    return (self.delta_b, self.delta_w)

Recap: z_i = y_i·W_i + b_i, y_{i+1} = a(z_i), C ≈ ½‖y_L − y‖², θ_i ≡ ∂C/∂y_i, δ_i ≡ ∂C/∂z_i, θ_L = (y_L − y), δ_i = θ_{i+1} ⊙ a'(z_i), θ_i = W_i·δ_i, ∂C/∂W_i = y_iᵀ·δ_i.
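The backprop method above assumes a surrounding network object that owns the weights, biases, and activation functions. The following is a minimal sketch of what such a class might look like; the attribute names match the slides' code, but the initialization details (Gaussian weights, sigmoid activation) are assumptions.

import numpy as np

class network:
    def __init__(self, sizes):
        # sizes, e.g. [784, 30, 10]: input layer, hidden layer(s), output layer
        self.sizes = sizes
        self.num_layers = len(sizes) - 1     # number of weight layers
        self.weights = [np.random.randn(m, n) for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(n) for n in sizes[1:]]
        self.delta_b = [np.zeros(b.shape) for b in self.biases]   # per-example gradients
        self.delta_w = [np.zeros(w.shape) for w in self.weights]
        self.nabla_b = [None] * self.num_layers   # filled in by the batched backprop
        self.nabla_w = [None] * self.num_layers

    def activation(self, z):                 # sigmoid, assumed
        return 1.0 / (1.0 + np.exp(-z))

    def activation_derivative(self, z, y_act):
        # a'(z) = a(z)(1 - a(z)) = y(1 - y) for the sigmoid
        return y_act * (1.0 - y_act)

    def cost_derivative(self, y_act, y):     # dC/dy_L for C = 0.5*||y_L - y||^2
        return y_act - y

    def feedforward(self, x):
        for b, w in zip(self.biases, self.weights):
            x = self.activation(np.dot(x, w) + b)
        return x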
Improve performance
Before: one example at a time is passed to backprop.

def update_batch(self, batch_x, batch_y, eta):
    # batch_x, batch_y: mini-batch of examples
    # eta: learning rate
    nabla_b = np.array([np.zeros(b.shape) for b in self.biases])
    nabla_w = np.array([np.zeros(w.shape) for w in self.weights])
    for x, y in zip(batch_x, batch_y):
        delta_b, delta_w = self.backprop(x, y)
        nabla_w = nabla_w + delta_w
        nabla_b = nabla_b + delta_b
    self.weights -= eta*nabla_w
    self.biases -= eta*nabla_b

Recap: for a mini-batch B, ∇_i = Σ_B (∂C/∂W_i); update W_i = W_i − η·∇_i or W_i = W_i − η·∇_i / |B|.

Improve performance
Now: a whole batch of examples is fed to backprop at once.

def update_batch(self, batch_x, batch_y, eta):
    # batch_x, batch_y: mini-batch of examples
    # eta: learning rate
    nabla_w, nabla_b = self.backprop(batch_x, batch_y)
    for i in range(self.num_layers):
        self.weights[i] -= eta*np.sum(nabla_w[i], axis=0)
        self.biases[i] -= eta*np.sum(nabla_b[i], axis=0)
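The driver in mnist.py later calls net.sgd(...), which the slides do not show. A plausible minimal sketch, assuming it simply reshuffles the data each epoch, slices mini-batches, and calls update_batch, would be another method of the same class:

def sgd(self, train_x, train_y, epochs, batch_size, eta, test_data=None):
    # Mini-batch stochastic gradient descent (sketch; not from the slides).
    train_y = np.array(train_y)
    n = train_x.shape[0]
    for epoch in range(epochs):
        idx = np.random.permutation(n)       # reshuffle each epoch
        for start in range(0, n, batch_size):
            sel = idx[start:start + batch_size]
            self.update_batch(train_x[sel], train_y[sel], eta)
        if test_data is not None:
            print("Epoch %d: %d test errors" % (epoch, self.evaluate(test_data)))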
Improve performance
The batched backprop: x is now a whole batch of examples.

def backprop(self, x, y):
    # feedforward, x is a batch of examples
    y_act = x
    y_acts = [x]
    z_wsums = []
    for b, w in zip(self.biases, self.weights):
        z = np.dot(y_act, w) + b
        z_wsums.append(z)
        y_act = self.activation(z)
        y_acts.append(y_act)
    # backward propagation
    theta = self.cost_derivative(y_act, y)
    for i in range(self.num_layers-1, -1, -1):
        ad = self.activation_derivative(z_wsums[i], y_acts[i+1])
        delta = np.multiply(theta, ad)
        y_hat = y_acts[i][:, :, np.newaxis]
        delta_hat = delta[:, np.newaxis, :]
        self.nabla_w[i] = np.multiply(y_hat, delta_hat)
        self.nabla_b[i] = delta
        if (i > 0):
            theta = np.dot(delta, np.transpose(self.weights[i]))
    return (self.nabla_w, self.nabla_b)

MNIST: Database of handwritten digits
yann.lecun.com/exdb/mnist/, by Yann LeCun's team at NYU.
- Has a training set of 60K examples (about 6K examples per digit) and a test set of 10K examples.
- Each digit is a 28 × 28 pixel grey-level image. The digit itself occupies the central 20 × 20 pixels, and its center of mass lies at the center of the box.
- It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
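To see why the extra np.newaxis dimensions in the batched backprop compute a per-example outer product, here is a small shape check; the sizes 5, 30, and 10 are arbitrary illustration values, not from the slides.

import numpy as np

batch, n_in, n_out = 5, 30, 10
y_i = np.random.rand(batch, n_in)          # activations of layer i, one row per example
delta = np.random.rand(batch, n_out)       # delta of layer i, one row per example

y_hat = y_i[:, :, np.newaxis]              # shape (5, 30, 1)
delta_hat = delta[:, np.newaxis, :]        # shape (5, 1, 10)
nabla_w = np.multiply(y_hat, delta_hat)    # broadcasts to shape (5, 30, 10)

# Summing over the batch axis equals the sum of per-example outer products.
check = sum(np.outer(y_i[k], delta[k]) for k in range(batch))
print(np.allclose(nabla_w.sum(axis=0), check))   # True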
MNIST: Database of handwritten digits
MNIST also keeps a performance record of image-recognition programs (test error rates):
- LeCun's Convolutional Neural Network variations: 0.8%, 0.6% and 0.4% on MNIST
- Tangent Distance (Simard, LeCun & Denker): 2.5%
- Randomized Decision Trees (Amit, Geman & Wilder): 0.8%
- K-NN based shape context/TPS matching (Belongie, Malik & Puzicha): 0.6%
- SVM on orientation histograms (Maji & Malik): 0.8%
Network.py's performance:
- architecture = [784, 30, 10], epochs = 30: 4.6%, 70 seconds
- architecture = [784, 60, 30, 10], epochs = 30: 4.2%, 100 seconds

mnist.py

import gzip
import pickle
import numpy as np

def vectorized_digit(j):
    """Return a 10-dimensional unit vector with a 1.0 in the j-th position
    and zeroes elsewhere. This is used to convert a digit in (0...9) into a
    corresponding desired output from the neural network."""
    e = np.zeros((10))
    e[j] = 1.0
    return e

f = gzip.open('mnist.pkl.gz', 'rb')
train_data, valid_data, test_data = pickle.load(f, encoding="latin1")
f.close()

# train_x.shape = (50000, 784); 784 = 28x28, one flattened 28x28 image per row.
# train_data[1].shape = (50000,)  the digit labels
# train_y has shape (50000, 10) once the labels are vectorized.
train_x = train_data[0]
train_y = [vectorized_digit(y) for y in train_data[1]]
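As a quick sanity check of the data layout, one row of train_x can be reshaped back into a 28 × 28 image and displayed. This is only a sketch: matplotlib is not used anywhere in the slides and is an optional extra dependency here.

import matplotlib.pyplot as plt

img = train_x[0].reshape(28, 28)          # one row of train_x is one flattened image
plt.imshow(img, cmap='gray')
plt.title('label = %d' % train_data[1][0])
plt.show()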
mnist.py (continued)

import time
start_time = time.time()

import mlnnsgd as mlnn
net = mlnn.network([784, 60, 10])
print('creating Network =', net.sizes)
print('weight shapes:', [w.shape for w in net.weights])
net.sgd(train_x, train_y, epochs=10, batch_size=100, eta=0.1, test_data=test_data)
print("run time: %s seconds" % (time.time() - start_time))

# in mlnn.py: two equivalent versions of evaluate.
def evaluate(self, test_data):
    """Return the number of test inputs for which the neural network
    outputs the incorrect result."""
    # vectorized version: feed all test images through at once
    digits = np.argmax(self.feedforward(test_data[0]), axis=1)
    return np.count_nonzero(digits - test_data[1])

def evaluate_loop(self, test_data):
    """Same result, one example at a time (slower)."""
    s = 0
    for x, y in zip(test_data[0], test_data[1]):
        if np.argmax(self.feedforward(x)) != y:
            s = s + 1
    return s

Backpropagation Observations
- The procedure is (relatively) efficient.
- All computations are local: each node uses only its own inputs and outputs.
- What is good enough? The outputs rarely reach the targets (0 or 1) exactly; typically, train until within 0.1 of the target.
- How can we improve the performance further?
Hyperparameter Selection
- Learning rate: pick a small value, e.g. 0.1, as a starting point.
- Connectivity: typically fully connected between layers.
- Number of hidden nodes:
  - Too many nodes make learning slower and can overfit; too many is usually OK if a reasonable stopping criterion is used.
  - Too few will underfit.
- Number of layers: 1 (common) or 2 hidden layers are usually sufficient for good results; with more layers, attenuation of the gradient makes learning very slow. Modern deep learning approaches, however, show significant improvement using many layers.
- Manually set hyperparameters by trial-and-error runs. Often sequential, or binary search: find one hyperparameter value with the others held constant, freeze it, find the next hyperparameter, etc.
- Random search is empirically the most consistently effective: typically each hyperparameter is chosen with a uniform distribution on a log scale for each trial (a sketch appears below).
- Hyperparameters could also be learned by the learning algorithm, in which case you must take care not to overfit the training data.

Performance of NN Training
Convergence of backpropagation: let gradient descent find a local minimum quickly. What affects convergence?
- NN size and training-set size
- Learning rate
- Initial weight values
- Derivative values
Avoiding overfitting: generalize well and work better on unseen cases. What affects overfitting?
- NN architecture
- Weight values: weight decay through regularization
- Stopping earlier (early stopping)
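A minimal sketch of random search on a log scale, reusing the mlnn.network class and the sgd/evaluate calls from mnist.py; the ranges and the number of trials below are illustrative assumptions, not values from the slides.

import numpy as np
import mlnnsgd as mlnn

def random_log_uniform(lo, hi):
    # Sample uniformly on a log scale between lo and hi.
    return 10 ** np.random.uniform(np.log10(lo), np.log10(hi))

best_err, best_cfg = None, None
for trial in range(10):
    eta = random_log_uniform(0.001, 1.0)       # learning rate
    hidden = int(random_log_uniform(10, 200))  # hidden-layer size
    net = mlnn.network([784, hidden, 10])
    net.sgd(train_x, train_y, epochs=10, batch_size=100, eta=eta)
    err = net.evaluate(test_data)              # better: a separate validation set
    if best_err is None or err < best_err:
        best_err, best_cfg = err, (eta, hidden)
print('best (eta, hidden):', best_cfg, 'with', best_err, 'errors')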
Hidden Nodes
- Typically one fully connected hidden layer. A common initial choice is 2n or 2·log n hidden nodes, where n is the number of inputs.
- In practice, train with a small number of hidden nodes, then keep doubling, etc., until there is no more significant improvement on test sets.
- All output and hidden nodes should have bias weights.
- Hidden nodes discover new, higher-order features which are fed into the output layer.

Local Minima
- SGD in general has more difficulties with simple tasks than with more complex tasks.
- Good news with MLPs: many dimensions make for many descent options.
- Local minima are more common with very simple/toy problems, and very rare with larger problems and larger nets.
- Even if there are occasional local-minima problems, one could simply train multiple nets and pick the best (a sketch follows below).
- Some algorithms add noise to the updates to escape minima.
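A minimal sketch of the "train multiple nets and pick the best" idea, again reusing mlnn.network and the calls from mnist.py; the number of restarts is an arbitrary choice.

import mlnnsgd as mlnn

candidates = []
for restart in range(5):                      # each net starts from different random weights
    net = mlnn.network([784, 60, 10])
    net.sgd(train_x, train_y, epochs=10, batch_size=100, eta=0.1)
    candidates.append((net.evaluate(test_data), net))

best_errors, best_net = min(candidates, key=lambda pair: pair[0])
print('best net has %d test errors' % best_errors)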
Local Minima and Neural Networks
A neural network can get stuck in a local minimum for small networks, but for most large networks (many weights), local minima rarely occur in practice. This is because, with so many weight dimensions, it is unlikely that we are at a minimum in every dimension simultaneously: there is almost always a way down.

Backpropagation Summary
- Excellent empirical results.
- Scaling is the pleasant surprise: local minima become very rare as problem and network complexity increase.
- The most common neural network approach; there are many other styles of neural networks.
- User-defined parameters are usually handled by multiple experiments.
- Many variants:
  - Regression: typically linear output nodes, normal hidden nodes.
  - Adaptive parameters.
  - Many different learning-algorithm approaches: higher-order gradient descent (Newton, conjugate gradient, etc.).
  - Recurrent networks.
  - Deep networks!
- Still an active research area.