Neural Networks for Classification
1 Neural Networks for Classification
Andrei Alexandrescu
June 19
2
- Neural Networks: History
- What is a Neural Network?
- Examples of Neural Networks
- Elements of a Neural Network
3 Neural Networks: History
- Modeled after the human brain
- Experimentation and marketing predated theory
- Considered the forefront of the AI spring
- Suffered from the AI winter
- Theory today still not fully developed and understood
4 What is a Neural Network?
- Essentially: a network of interconnected functional elements, each with several inputs and one output:
    y(x_1, ..., x_n) = f(w_1 x_1 + w_2 x_2 + ... + w_n x_n)    (1)
- The w_i are parameters (weights)
- f is the activation function
- Crucial for learning: addition is used for integrating the inputs
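Equation (1) can be sketched in a couple of lines of Python (the function name and sample numbers are illustrative, not from the slides):

```python
def unit(xs, ws, f):
    """One network element: activation f applied to the weighted sum of inputs."""
    return f(sum(w * x for w, x in zip(ws, xs)))

# y = f(0.5*1 + (-0.25)*2) = f(0.0), here with a rectifying activation
y = unit([1.0, 2.0], [0.5, -0.25], lambda v: max(v, 0.0))
```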
5 Examples of Neural Networks
- Logical functions with 0/1 inputs and outputs
- Fourier series: F(x) = Σ_{i ≥ 0} (a_i cos(ix) + b_i sin(ix))    (2)
- Taylor series: F(x) = Σ_{i ≥ 0} a_i (x − x_0)^i    (3)
- Automata
6 Elements of a Neural Network
- The function performed by an element
- The topology of the network
- The method used to train the weights
7
- The Perceptron
- Perceptron Capabilities
- Bias
- Training the Perceptron
- Algorithm
- Summary
8 The Perceptron
- n inputs, one output:
    y(x_1, ..., x_n) = f(w_1 x_1 + ... + w_n x_n)    (4)
- Oldest activation function (McCulloch/Pitts), the step function:
    f(v) = 1_{v ≥ 0}(v), i.e. 1 if v ≥ 0, else 0    (5)
9 Perceptron Capabilities
- Advertised to be as extensive as the brain itself
- Can (only) distinguish between two linearly separable sets
- Smallest undecidable function: XOR
- Minsky's proof started the AI winter
- It was not fully understood what connected layers could do
10 Bias
- Notice that the decision hyperplane must go through the origin
- Could be achieved by preprocessing the input, but that is not always desirable or possible
- Add a bias input:
    y(x_1, ..., x_n) = f(w_0 + w_1 x_1 + ... + w_n x_n)    (6)
- Same as an input connected to the constant 1
- We consider that ghost input implicit henceforth
11 Training the Perceptron
- Switch to vector notation: y(x) = f(w·x) = f_w(x)    (7)
- Assume we need to separate sets of points A and B. Define the error:
    E(w) = Σ_{x ∈ A} (1 − f_w(x)) + Σ_{x ∈ B} f_w(x)    (8)
- Goal: E(w) = 0
- Start from a random w and improve it
12 Algorithm
1. Start with random w, set t = 0
2. Select a vector x ∈ A ∪ B
3. If x ∈ A and w·x ≤ 0, then w_{t+1} = w_t + x
4. Else if x ∈ B and w·x ≥ 0, then w_{t+1} = w_t − x
5. Conditionally go to step 2
- Guaranteed to converge iff A and B are linearly separable!
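A minimal sketch of this training loop, assuming the bias is folded in as a constant-1 input as described on the bias slide (the point sets and function names are illustrative):

```python
import random

def train_perceptron(A, B, max_steps=10000, seed=0):
    """Perceptron learning for two linearly separable point sets.
    Encodes A as sign +1 and B as sign -1, so both update rules from the
    slide collapse into one: w <- w + sign * x on a misclassified x."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(A[0]))]
    data = [(x, 1) for x in A] + [(x, -1) for x in B]
    for _ in range(max_steps):
        wrong = [(x, s) for x, s in data
                 if s * sum(wi * xi for wi, xi in zip(w, x)) <= 0]
        if not wrong:          # E(w) = 0: every point on its correct side
            return w
        x, s = rng.choice(wrong)
        w = [wi + s * xi for wi, xi in zip(w, x)]
    return w

# Separate points by first coordinate; the second coordinate is the bias input 1.
A = [(0.8, 1.0), (0.9, 1.0), (0.7, 1.0)]
B = [(0.1, 1.0), (0.2, 1.0), (0.3, 1.0)]
w = train_perceptron(A, B)
```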
13 Summary
- Simple training
- Limited capabilities
- Reasonably efficient training
- Simplex and linear programming are better alternatives
14
- A Misunderstanding of Epic Proportions
- Workings
- Capabilities
- Training Prerequisite
- Output Activation
- The Backpropagation Algorithm
- The Task
- Training. The Delta Rule
- Gradient Locality
- Regularization
- Local Minima
15
- Let's connect the output of a perceptron to the input of another
- What can we compute with this horizontal combination?
- (We already take vertical combination for granted)
16 A Misunderstanding of Epic Proportions
- Some say "two-layered network": two cascaded layers of computational units
- Some say "three-layered network": there is one extra input layer that does nothing
- Let's arbitrarily choose "three-layered": Input, Hidden, Output
17 Workings
- The hidden layer maps inputs into a second space: feature space, classification space
- This makes the job of the output layer easier
18 Capabilities
- Each hidden unit computes a linear separation of the input space
- Several hidden units can carve a polytope in the input space
- Output units can distinguish polytope membership
- Any union of polytopes can be decided
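This is exactly how one hidden layer cracks XOR, the function a lone perceptron cannot decide. A sketch with hand-picked weights (the particular weights are illustrative, not from the slides; the first weight of each unit is the bias):

```python
def step(v):
    return 1 if v >= 0 else 0

def layer(xs, weights):
    """One layer of step units; each row is [bias weight, w_1, ..., w_n]."""
    return [step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], xs)))
            for w in weights]

def xor_net(x1, x2):
    hidden = layer([x1, x2], [[-0.5, 1, 1],    # fires on x1 OR x2
                              [-1.5, 1, 1]])   # fires on x1 AND x2
    return layer(hidden, [[-0.5, 1, -1]])[0]   # OR but not AND = XOR
```

The two hidden units carve the half-planes whose intersection is the XOR region; the output unit then tests membership.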
19 Training Prerequisite
- The step function is bad for gradient descent techniques
- Replace it with a smooth step, the sigmoid:
    f(v) = 1 / (1 + e^{−v})    (9)
- Notable fact: f′(v) = f(v)(1 − f(v)), which makes the derivative cheap to compute
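The derivative identity can be checked numerically against a finite-difference estimate:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# f'(v) = f(v) * (1 - f(v)) versus a central-difference approximation
v, h = 0.3, 1e-6
analytic = sigmoid(v) * (1 - sigmoid(v))
numeric = (sigmoid(v + h) - sigmoid(v - h)) / (2 * h)
```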
20 Output Activation
- Simple binary discrimination: zero-centered sigmoid
    f(v) = (1 − e^{−v}) / (1 + e^{−v})    (10)
- Probability distribution: softmax
    f(v_i) = e^{v_i} / Σ_j e^{v_j}    (11)
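Both output activations in a few lines (a sketch; note that the zero-centered sigmoid in (10) is the same function as tanh(v/2)):

```python
import math

def zero_centered_sigmoid(v):
    """(1 - e^{-v}) / (1 + e^{-v}); ranges over (-1, 1), equals tanh(v/2)."""
    return (1 - math.exp(-v)) / (1 + math.exp(-v))

def softmax(vs):
    """Exponentiate and normalize: outputs are positive and sum to 1."""
    exps = [math.exp(v) for v in vs]
    total = sum(exps)
    return [e / total for e in exps]
```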
21 The Backpropagation Algorithm
- Works on any differentiable activation function
- Gradient descent in weight space
- Metaphor: a ball rolls on the error function's envelope
- Condition: no flat portion (the ball would stop in indifferent equilibrium)
- Some add a slight pull term:
    f(v) = (1 − e^{−v}) / (1 + e^{−v}) + cv    (12)
22 The Task
- Minimize the error function:
    E = (1/2) Σ_{i=1}^p ||o_i − t_i||^2    (13)
- where: o_i are the actual outputs, t_i the desired outputs, p the number of patterns
23 Training. The Delta Rule
- Compute the gradient ∇E = (∂E/∂w_1, ..., ∂E/∂w_l)
- Update weights:
    Δw_i = −γ ∂E/∂w_i,  i = 1, ..., l    (14)
- Expect to find a point where ∇E = 0
- The algorithm for computing ∇E is backpropagation (beyond the scope of this class)
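The delta rule itself is independent of how the gradient is obtained. A sketch on a toy error function whose gradient we can write down directly (the quadratic and its minimum are made up for illustration):

```python
def gradient_descent(grad, w, gamma=0.1, steps=100):
    """Delta rule: repeatedly apply w_i <- w_i - gamma * dE/dw_i."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - gamma * gi for wi, gi in zip(w, g)]
    return w

# E(w) = 1/2 * ((w0 - 3)^2 + (w1 + 1)^2), minimized at (3, -1)
grad = lambda w: [w[0] - 3, w[1] + 1]
w = gradient_descent(grad, [0.0, 0.0])
```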
24 Gradient Locality
- Only summation guarantees locality of backpropagation
- Otherwise backpropagation would propagate errors due to one input to all inputs
- Essential to use summation as input integration!
25 Regularization
- Weights can grow uncontrollably
- Add a regularization term that opposes weight growth:
    Δw_i = −γ ∂E/∂w_i − α w_i    (15)
- Very important practical trick
- Also avoids overspecialization; forces a smoother output
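Equation (15) as one update step (a sketch; with a zero gradient the extra term simply shrinks each weight by a factor of 1 − α per step, which is why it is often called weight decay):

```python
def decay_step(w, grad, gamma=0.1, alpha=0.01):
    """Delta rule with regularization: dw_i = -gamma * dE/dw_i - alpha * w_i."""
    return [wi - gamma * gi - alpha * wi for wi, gi in zip(w, grad)]
```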
26 Local Minima
- The gradient surf can stop in a local minimum
- Biggest issue with neural networks (overspecialization is the second biggest)
- Convergence is not guaranteed either, but regularization helps
27
- Discrete Inputs
- One-Hot Encoding
- Optimizing One-Hot Encoding
- One-Hot Encoding: Interesting Tidbits
28 Discrete Inputs
- Many NLP applications foster discrete features
- Neural nets expect real numbers
- Smooth: similar outputs for similar inputs
- Any two discrete inputs are just as different from each other
- Treating them as integral numbers is undemocratic
29 One-Hot Encoding
- One discrete feature with n values becomes n real inputs
- The i-th feature value sets the i-th input to 1 and all others to 0
- The Hamming distance between any two distinct inputs is now constant!
- Disadvantage: the input vector is much larger
30 Optimizing One-Hot Encoding
- Each hidden unit has all inputs zero except the i-th one, and even that one is just multiplied by 1
- Regroup weights by discrete input, not by hidden unit: a matrix w of size n × l
- Input i just copies row i to the output (virtual multiplication by 1)
- Cheap computation; the delta rule applies as usual
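The equivalence is easy to verify: multiplying a one-hot vector by the matrix is exactly a row copy (the matrix values here are illustrative):

```python
def one_hot(i, n):
    """n real inputs, the i-th set to 1 and all others to 0."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def matvec_rows(x, W):
    """Row-vector x (length n) times matrix W (n rows of length l)."""
    return [sum(x[j] * W[j][k] for j in range(len(x)))
            for k in range(len(W[0]))]

W = [[0.1, 0.2],   # n = 3 feature values,
     [0.3, 0.4],   # l = 2 values fed to the hidden layer
     [0.5, 0.6]]
```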
31 One-Hot Encoding: Interesting Tidbits
- The row w_i is a continuous representation of discrete feature i
- Only one row is trained per sample
- The size of the continuous representation can be chosen depending on the feature's complexity
- Mix this continuous representation freely with truly continuous features, such as acoustic features
32
- Multi-Label Classification
- Soft Training
33 Multi-Label Classification
- n real outputs summing to 1
- Normalization is included in the softmax function:
    f(v_i) = e^{v_i} / Σ_j e^{v_j} = e^{v_i − v_max} / Σ_j e^{v_j − v_max}    (16)
- Train with 1 − ε for the known label and ε/(n − 1) for all others (avoids saturation)
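A sketch of both points on this slide: subtracting v_max before exponentiating avoids overflow without changing the result, and the smoothed targets sum to 1 (function names are illustrative):

```python
import math

def softmax_stable(vs):
    """Softmax with the maximum subtracted, as in (16); identical result,
    but e^{v - v_max} never overflows since v - v_max <= 0."""
    m = max(vs)
    exps = [math.exp(v - m) for v in vs]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(label, n, eps=0.05):
    """1 - eps for the known label, eps/(n - 1) for every other output."""
    return [1 - eps if j == label else eps / (n - 1) for j in range(n)]

# The naive softmax would compute e^1000 and overflow here; this one is fine.
p = softmax_stable([1000.0, 1001.0, 1002.0])
```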
34 Soft Training
- Maybe the targets are a known probability distribution
- Or we want to reduce the number of training cycles
- Train with the actual desired distributions as desired outputs
- Example: for feature vector x, labels l_1, l_2, l_3 are possible with equal probability
- Train with (1 − ε)/3 for the three and ε/(n − 3) for all others
35
- Language Modeling
- Lexicon Learning
- Word Sense Disambiguation
36 Language Modeling
- Input: n-gram context; may include arbitrary word features (cool!)
- Output: probability distribution of the next word
- Automatically figures out which features are important
37 Lexicon Learning
- Input: word-level features (root, stem, morphology)
- Input: most frequent previous/next words
- Output: probability distribution over the word's possible POS tags
38 Word Sense Disambiguation
- Input: bag of words in context, local collocations
- Output: probability distribution over senses
40
- Neural nets are a respectable machine learning technique
- Theory not fully developed
- Local optima and overspecialization are killers
- Yet they can learn very complex functions
- Long training time
- Short testing time
- Small memory requirements
For Monday
- Read chapter 18, sections 10-12
- The material in sections 8 and 9 is interesting, but we won't take time to cover it this semester
- Homework: Chapter 18, exercise 25 a-b; Program 4
More informationCPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2018
CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2018 Last Time: Multi-Dimensional Scaling Multi-dimensional scaling (MDS): Non-parametric visualization: directly optimize the z i locations.
More informationInstructor: Jessica Wu Harvey Mudd College
The Perceptron Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Eric Eaton (UPenn), David Kauchak (Pomona), and the many others who made their course
More informationarxiv: v1 [cs.lg] 25 Jan 2018
A New Backpropagation Algorithm without Gradient Descent arxiv:1802.00027v1 [cs.lg] 25 Jan 2018 Varun Ranganathan Student at PES University varunranga1997@hotmail.com January 2018 S. Natarajan Professor
More informationIMPLEMENTING DEEP LEARNING USING CUDNN 이예하 VUNO INC.
IMPLEMENTING DEEP LEARNING USING CUDNN 이예하 VUNO INC. CONTENTS Deep Learning Review Implementation on GPU using cudnn Optimization Issues Introduction to VUNO-Net DEEP LEARNING REVIEW BRIEF HISTORY OF NEURAL
More informationDeep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies
More informationLECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS
LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationCS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016
CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional
More informationHybrid PSO-SA algorithm for training a Neural Network for Classification
Hybrid PSO-SA algorithm for training a Neural Network for Classification Sriram G. Sanjeevi 1, A. Naga Nikhila 2,Thaseem Khan 3 and G. Sumathi 4 1 Associate Professor, Dept. of CSE, National Institute
More informationNeural Nets. General Model Building
Neural Nets To give you an idea of how new this material is, let s do a little history lesson. The origins of neural nets are typically dated back to the early 1940 s and work by two physiologists, McCulloch
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationCS 354R: Computer Game Technology
CS 354R: Computer Game Technology AI Fuzzy Logic and Neural Nets Fall 2018 Fuzzy Logic Philosophical approach Decisions based on degree of truth Is not a method for reasoning under uncertainty that s probability
More informationParallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade
Parallelization in the Big Data Regime 5: Data Parallelization? Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for Big data 1 / 23 Announcements...
More information!!! Warning!!! Learning jargon is always painful even if the concepts behind the jargon are not hard. So, let s get used to it. In mathematics you don't understand things. You just get used to them. von
More informationMathematical Programming and Research Methods (Part II)
Mathematical Programming and Research Methods (Part II) 4. Convexity and Optimization Massimiliano Pontil (based on previous lecture by Andreas Argyriou) 1 Today s Plan Convex sets and functions Types
More informationArtificial Intellegence
Artificial Intellegence Neural Net: Based on Nature Perceptron Variations Perceptrons: A Basic Neural Net In machine learning, the perceptron is an algorithm for supervised classification of an input into
More informationLinear Discriminant Functions: Gradient Descent and Perceptron Convergence
Linear Discriminant Functions: Gradient Descent and Perceptron Convergence The Two-Category Linearly Separable Case (5.4) Minimizing the Perceptron Criterion Function (5.5) Role of Linear Discriminant
More informationCPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2016
CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2016 Assignment 5: Due Friday. Assignment 6: Due next Friday. Final: Admin December 12 (8:30am HEBB 100) Covers Assignments 1-6. Final from
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP
More informationNeural Networks: Optimization Part 1. Intro to Deep Learning, Fall 2018
Neural Networks: Optimization Part 1 Intro to Deep Learning, Fall 2018 1 Story so far Neural networks are universal approximators Can model any odd thing Provided they have the right architecture We must
More informationNeural Networks (pp )
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationLogical Rhythm - Class 3. August 27, 2018
Logical Rhythm - Class 3 August 27, 2018 In this Class Neural Networks (Intro To Deep Learning) Decision Trees Ensemble Methods(Random Forest) Hyperparameter Optimisation and Bias Variance Tradeoff Biological
More information