Dept. of Computing Science & Math
1 Lecture 4: Multi-Layer Perceptrons
2 Review of Gradient Descent Learning
1. The purpose of neural network training is to minimize the output errors on a particular set of training data by adjusting the network weights w.
2. We define a cost function E(w) that measures how far the current network's output is from the desired one.
3. Partial derivatives of the cost function, ∂E(w)/∂w, tell us which direction we need to move in weight space to reduce the error.
4. The learning rate η specifies the step sizes we take in weight space for each iteration of the weight update equation.
5. We keep stepping through weight space until the errors are small enough (a code sketch of one such step follows this list).
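The update loop is easy to state in code. Below is a minimal sketch, not from the lecture, of a single gradient descent step in Python/NumPy; `cost_grad` is a hypothetical helper standing in for whatever computes ∂E(w)/∂w.

```python
import numpy as np

def gradient_descent_step(w, cost_grad, eta=0.1):
    """One iteration of gradient descent: move a small step
    against the gradient of the cost function.

    w         -- current weight vector
    cost_grad -- hypothetical helper returning dE/dw at w
    eta       -- learning rate (step size in weight space)
    """
    return w - eta * cost_grad(w)

# Example: minimise E(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = gradient_descent_step(w, lambda v: 2.0 * v)
print(w)  # close to the global minimum at [0, 0]
```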
3 Graphical Representation of GDR
[Figure: total error plotted against a weight w_i, marking a local minimum, the global minimum, and the ideal weight.]
4 Review of Perceptron Training
1. Generate a training pair or pattern x that you wish your network to learn.
2. Set up your network with N input units fully connected to M output units.
3. Initialize the weights w at random.
4. Select an appropriate error function E(w) and learning rate η.
5. Apply the weight change Δw = -η ∂E(w)/∂w to each weight w for each training pattern p. One set of updates for all the weights for all the training patterns is called one epoch of training.
6. Repeat step 5 until the network error function is small enough (a code sketch of this loop follows this list).
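As a concrete illustration, here is a minimal sketch of steps 2-6 for a single-layer network, assuming linear output units and the squared-error cost; the names and values are illustrative, not from the lecture.

```python
import numpy as np

def train_perceptron(X, T, eta=0.1, epochs=100):
    """Delta-rule training of a single-layer network (linear outputs).

    X -- (patterns, N) input array, T -- (patterns, M) target array.
    One pass over all patterns below is one epoch of training.
    """
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, (X.shape[1], T.shape[1]))  # step 3
    for _ in range(epochs):                               # step 6
        for x, t in zip(X, T):                            # step 5
            y = x @ w                      # network output
            w += eta * np.outer(x, t - y)  # dw = -eta dE/dw for E = 1/2 (t-y)^2
    return w
```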
5 Review of XOR and Linear Separability
Recall that it is not possible to find weights that enable Single Layer Perceptrons to deal with non-linearly separable problems like XOR:

  in1  in2 | out
   0    0  |  0
   0    1  |  1
   1    0  |  1
   1    1  |  0

[Figure: the four XOR patterns plotted in the (I1, I2) plane; no single straight line separates the two output classes.]

The proposed solution was to use a more complex network that is able to generate more complex decision boundaries. That network is the Multi-Layer Perceptron.
6 Multi-Layer Perceptrons (MLPs)

Output layer:  Y_k = f( Σ_j w_jk O_j )
Hidden layer:  O_j = f( Σ_i w_ij X_i )
Input layer:   X_i

[Figure: input units X_1, X_2, X_3, ..., X_i feed hidden units O_j via weights w_ij, which feed output units Y_1, Y_2, ..., Y_k via weights w_jk.]
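Reading the two equations bottom-up gives the forward pass of the network. A minimal NumPy sketch (the array shapes are my assumption; bias terms omitted for brevity):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, w_ij, w_jk):
    """Forward pass through a 2-layer MLP.

    x    -- input vector (X_i), length N
    w_ij -- input-to-hidden weights, shape (N, M)
    w_jk -- hidden-to-output weights, shape (M, P)
    """
    o = sigmoid(x @ w_ij)   # hidden layer: O_j = f(sum_i w_ij X_i)
    y = sigmoid(o @ w_jk)   # output layer: Y_k = f(sum_j w_jk O_j)
    return o, y
```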
7 Can We Use a Generalized Form of the PLR/Delta Rule to Train the MLP?
Recall the PLR/Delta rule: adjust neuron weights to reduce the error at the neuron's output:

w_new = w_old + η δ x, where δ = (desired output - actual output)

Main problem: how do we adjust the weights in the hidden layer, so they reduce the error in the output layer, when there is no specified target response in the hidden layer?
Solution: alter the non-linear Perceptron (discrete threshold) activation function to make it differentiable and hence help derive a Generalized Delta Rule for MLP training.
[Figures: the discrete threshold function and its smooth sigmoid replacement.]
8 Sigmoid (S-shaped) Function Properties
Approximates the threshold function.
Smoothly differentiable (everywhere), and hence DR applicable.
Positive slope.
A popular choice is f(a) = 1 / (1 + e^(-a)).
[Figure: sigmoid curve rising smoothly from 0 to 1.]
The derivative of the sigmoidal function is: f'(a) = f(a) (1 - f(a))
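In code, the derivative identity means we never need to differentiate numerically; a direct transcription (mine, not the lecture's):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_deriv(a):
    """f'(a) = f(a) * (1 - f(a)): computable from the activation alone."""
    f = sigmoid(a)
    return f * (1.0 - f)
```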
9 Weight Update Rule
Generally, the weight change from any unit j to unit k by gradient descent (i.e. a weight change by a small increment in the negative direction of the gradient) is now called the Generalized Delta Rule (GDR) or Backpropagation:

w_new = w_old - η ∂E/∂w = w_old + η δ_k x_j

Now the delta is more complicated because of the sigmoid function. With error E(w) = ½ Σ_k (target_k - out_k)² and out_k = f(a_k) = 1 / (1 + e^(-a_k)), applying the chain rule gives:

δ_k = (target_k - out_k) f'(a_k) = (target_k - out_k) out_k (1 - out_k)
10 Weight Update Rule (2)
For the output units, delta is the output error multiplied by a gradient term:

δ_k = (target_k - out_k) out_k (1 - out_k)

For hidden units we also need a value of error. A suitable quantity to use is the weighted sum of the output deltas from a hidden unit:

δ_j = o_j (1 - o_j) Σ_k δ_k w_jk

And again the weight change is:

Δw_ij = η δ_j x_i
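A minimal sketch of both delta computations, vectorised over units; the array shapes are my assumption:

```python
import numpy as np

def backprop_deltas(y, target, o, w_jk):
    """Deltas for the Generalized Delta Rule with sigmoid units.

    y      -- output activations (length P), target -- desired outputs
    o      -- hidden activations (length M)
    w_jk   -- hidden-to-output weights, shape (M, P)
    """
    delta_out = (target - y) * y * (1.0 - y)        # output error times gradient term
    delta_hid = o * (1.0 - o) * (w_jk @ delta_out)  # weighted sum of output deltas
    return delta_out, delta_hid
```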
11 Training of a 2-Layer Feed-Forward Network
1. Take the set of training patterns you wish the network to learn.
2. Set up the network with N input units fully connected to M non-linear hidden units via connections with weights w_ij, which in turn are fully connected to P output units via connections with weights w_jk.
3. Generate random initial weights, e.g. from the range [-t, +t].
4. Select an appropriate error function E(w) and learning rate η.
5. Apply the weight update equation Δw_jk = -η ∂E(w)/∂w_jk to each weight w_jk for each training pattern p.
6. Do the same for all hidden layers (weights w_ij).
7. Repeat steps 5-6 until the network error function is small enough (a runnable sketch of one such epoch follows this list).
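Putting the pieces together, here is a runnable sketch of one on-line training epoch for a 2-layer network, applied to the XOR problem from slide 5. The bias weights (handled by appending a constant input of 1) and all numeric values are my additions, not from the lecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_epoch(X, T, w_ij, w_jk, eta=0.5):
    """One on-line epoch of backpropagation (steps 5-6)."""
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)                 # input plus bias unit
        o = sigmoid(xb @ w_ij)                 # hidden activations O_j
        ob = np.append(o, 1.0)                 # hidden plus bias unit
        y = sigmoid(ob @ w_jk)                 # network outputs Y_k
        delta_out = (t - y) * y * (1.0 - y)
        delta_hid = o * (1.0 - o) * (w_jk[:-1] @ delta_out)
        w_jk += eta * np.outer(ob, delta_out)  # dw_jk = eta * delta_k * o_j
        w_ij += eta * np.outer(xb, delta_hid)  # dw_ij = eta * delta_j * x_i
    return w_ij, w_jk

# Learn XOR with 2 hidden units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
rng = np.random.default_rng(1)
w_ij = rng.uniform(-1, 1, (3, 2))      # 2 inputs + bias -> 2 hidden
w_jk = rng.uniform(-1, 1, (3, 1))      # 2 hidden + bias -> 1 output
for _ in range(5000):                  # restart from new random weights if a
    w_ij, w_jk = train_epoch(X, T, w_ij, w_jk)  # local minimum traps the net
for x in X:
    o = sigmoid(np.append(x, 1.0) @ w_ij)
    y = sigmoid(np.append(o, 1.0) @ w_jk)
    print(x, np.round(y, 2))
```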
12 Practical Considerations for Learning Rules
There are a number of important issues about training single layer neural networks that need further resolving:
1. Do we need to pre-process the training data? If so, how?
2. How do we choose the initial weights from which we start the training?
3. How do we choose an appropriate learning rate η?
4. Should we change the weights after each training pattern, or after the whole set?
5. Are some activation/transfer functions better than others?
6. How do we avoid local minima in the error function?
7. How do we know when we should stop the training?
8. How many hidden units do we need?
9. Should we have different learning rates for the different layers?
We shall now consider each of these issues one by one.
13 Pre-processing of the Training Data
In principle, we can just use any raw input-output data to train our networks. However, in practice, it often helps the network to learn appropriately if we carry out some pre-processing of the training data before feeding it to the network.
We should make sure that the training data is representative: it should not contain too many examples of one type at the expense of another. On the other hand, if one class of pattern is easy to learn, having large numbers of patterns from that class in the training set will only slow down the over-all learning process.
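One common pre-processing step (my example; the slide does not prescribe a particular method) is to rescale each input component to zero mean and unit variance so that no single input dominates the weighted sums:

```python
import numpy as np

def standardise(X):
    """Rescale each input component of X (patterns x features)
    to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0.0] = 1.0   # leave constant features unscaled
    return (X - mu) / sigma
```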
14 Choosing the Initial Weight Values
The gradient descent learning algorithm treats all the weights in the same way, so if we start them all off with the same values, all the hidden units will end up doing the same thing and the network will never learn properly. For that reason, we generally start off all the weights with small random values. Usually we take them from a flat distribution around zero, [-t, +t], or from a Gaussian distribution around zero with standard deviation t.
Choosing a good value of t can be difficult. Generally, it is a good idea to make it as large as you can without saturating any of the sigmoids.
We usually hope that the final network performance will be independent of the choice of initial weights, but we need to check this by training the network from a number of different random initial weight sets.
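A sketch of the two initialisation schemes mentioned above; the default t is illustrative only, since the slide's advice is to pick t as large as possible without saturating the sigmoids:

```python
import numpy as np

def init_weights(n_in, n_out, t=0.5, gaussian=False, seed=None):
    """Small random initial weights: flat on [-t, +t], or
    Gaussian around zero with standard deviation t."""
    rng = np.random.default_rng(seed)
    if gaussian:
        return rng.normal(0.0, t, (n_in, n_out))
    return rng.uniform(-t, t, (n_in, n_out))
```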
15 Choosing the Learning Rate
Choosing a good value for the learning rate η is constrained by two opposing facts:
1. If η is too small, it will take too long to get anywhere near the minimum of the error function.
2. If η is too large, the weight updates will over-shoot the error minimum and the weights will oscillate, or even diverge.
Unfortunately, the optimal value is very problem- and network-dependent, so one cannot formulate reliable general prescriptions. Generally, one should try a range of different values (e.g. η = 0.1, 0.01, 1.0) and use the results as a guide.
16 Batch Training vs. On-line Training
Batch Training: update the weights after all training patterns have been presented.
On-line Training (or Sequential Training): a natural alternative is to update all the weights immediately after processing each training pattern.
On-line learning does not perform true gradient descent, and the individual weight changes can be rather erratic. Normally a much lower learning rate η will be necessary than for batch learning. However, because each weight now has N updates per epoch (where N is the number of patterns), rather than just one, overall the learning is often much quicker. This is particularly true if there is a lot of redundancy in the training data, i.e. many training patterns containing similar information.
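The difference between the two schedules is just where the weight update sits relative to the pattern loop. A schematic comparison, assuming a hypothetical helper `grad(x, t, w)` that returns the per-pattern gradient dE_p/dw:

```python
def online_epoch(X, T, w, grad, eta):
    """Sequential training: update immediately after each pattern."""
    for x, t in zip(X, T):
        w = w - eta * grad(x, t, w)   # N small, possibly erratic steps
    return w

def batch_epoch(X, T, w, grad, eta):
    """Batch training: accumulate the full gradient, one step per epoch."""
    g = sum(grad(x, t, w) for x, t in zip(X, T))
    return w - eta * g                # true gradient descent
```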
17 Choosing the Transfer Function
We have already seen that having a differentiable transfer/activation function is important for the gradient descent algorithm to work. We have also seen that, in terms of computational efficiency, the standard sigmoid (i.e. the logistic function) is a particularly convenient replacement for the step function of the Simple Perceptron.
The logistic function ranges from 0 to 1. There is some evidence that an anti-symmetric transfer function (e.g. tanh), i.e. one that satisfies f(-x) = -f(x), enables the gradient descent algorithm to learn faster.
When the outputs are required to be non-binary, i.e. continuous real values, having sigmoidal transfer functions no longer makes sense. In these cases, a simple linear transfer function f(x) = x is appropriate.
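For reference, the three transfer functions discussed above side by side (a trivial sketch):

```python
import numpy as np

def logistic(a):
    """Ranges from 0 to 1; convenient replacement for the step function."""
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    """Anti-symmetric, f(-x) = -f(x); ranges -1 to 1, often learns faster."""
    return np.tanh(a)

def linear(a):
    """f(x) = x, for continuous real-valued outputs."""
    return a
```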
18 Local Minima
Cost functions can quite easily have more than one minimum. If we start off in the vicinity of a local minimum, we may end up at the local minimum rather than the global minimum. Starting with a range of different initial weight sets increases our chances of finding the global minimum. Any variation from true gradient descent will also increase our chances of stepping into the deeper valley.
19 When to Stop Training
The sigmoid function only takes on its extreme values of 0 and 1 at x = ±∞. In effect, this means that the network can only achieve its binary targets when at least some of its weights reach ±∞. So, given finite gradient descent step sizes, our networks will never reach their binary targets. Even if we offset the targets (to 0.1 and 0.9, say), we will generally require an infinite number of increasingly small gradient descent steps to achieve those targets.
Clearly, if the training algorithm can never actually reach the minimum, we have to stop the training process when it is near enough. What constitutes near enough depends on the problem. If we have binary targets, it might be enough that all outputs are within 0.1 (say) of their targets. Or, it might be easier to stop the training when the sum squared error function becomes less than a particular small value (0.2, say).
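Both "near enough" tests translate directly into code; the tolerance values below are the slide's own examples:

```python
import numpy as np

def should_stop(outputs, targets, tol=0.1, sse_limit=0.2):
    """Stop when every output is within tol of its binary target,
    or when the sum-squared error drops below sse_limit."""
    within_tol = np.all(np.abs(outputs - targets) < tol)
    sse = np.sum((outputs - targets) ** 2)
    return within_tol or sse < sse_limit
```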
20 How Many Hidden Units?
The best number of hidden units depends in a complex way on many factors, including:
1. The number of training patterns
2. The numbers of input and output units
3. The amount of noise in the training data
4. The complexity of the function or classification to be learned
5. The type of hidden unit activation function
6. The training algorithm
Too few hidden units will generally leave high training and generalisation errors due to under-fitting. Too many hidden units will result in low training errors, but will make the training unnecessarily slow, and will result in poor generalisation unless some other technique (such as regularisation) is used to prevent over-fitting.
Virtually all rules of thumb you hear about are actually nonsense. A sensible strategy is to try a range of numbers of hidden units and see which works best.
21 Different Learning Rates for Different Layers?
A network as a whole will usually learn most efficiently if all its neurons are learning at roughly the same speed. So maybe different parts of the network should have different learning rates η. There are a number of factors that may affect the choices:
1. The later network layers (nearer the outputs) will tend to have larger local gradients (deltas) than the earlier layers (nearer the inputs).
2. The activations of units with many connections feeding into or out of them tend to change faster than units with fewer connections.
3. The activations required for linear units will be different from those for sigmoidal units.
4. There is empirical evidence that it helps to have different learning rates η for the thresholds/biases compared with the real connection weights.
In practice, it is often quicker to just use the same rates η for all the weights and thresholds, rather than spending time trying to work out appropriate differences. A very powerful approach is to use evolutionary strategies to determine good learning rates.