An Algorithm for Incremental Construction of Feedforward Networks of Threshold Units with Real Valued Inputs

Dhananjay S. Phatak
Electrical Engineering Department, State University of New York, Binghamton, NY
(Proceedings of WCNN'96, San Diego, CA)

ABSTRACT

A novel algorithm is presented to construct feedforward networks of threshold units. The inputs of the threshold units (including external network inputs) are not restricted to be binary: they can assume any REAL values. The algorithm is derived from the Cascade-Correlation and Adaline algorithms. The cascade architecture is examined from a linear systems perspective, which reveals a connection between the training of hidden units and raising the rank of a set of vectors consisting of the network inputs and the outputs of the hidden units. This leads to our algorithm, which is shown to work successfully on several standard benchmarks. The algorithm can generate a UNIQUE solution (both the topology as well as the weight and bias values), irrespective of the random initializations. The algorithm is flexible, and the search-based steps therein can be completely replaced by well known methods from linear algebra, which could lead to a considerable reduction in learning time. Merits and drawbacks of this method are discussed in the context of other relevant approaches. Several further extensions are suggested.

I Introduction

The sigmoid function is commonly used as the nonlinear squashing function of the processing elements (or units) in feedforward ANNs (Artificial Neural Networks). Reasons for the widespread use of the sigmoid function are (i) it is a one-to-one continuous function with well defined derivatives of all orders, and (ii) it approximates a step (threshold) function: it can be considered to be a soft step. Most training algorithms involve optimization of an objective function, such as the total error in backpropagation or the correlation in the Cascade-Correlation [1], with respect to the adjustable parameters (i.e., the weights and biases). The optimization is typically achieved via gradient based methods. Hence, the squashing function of the units must be differentiable. A step function is a highly many-to-one function and is not differentiable everywhere. Therefore a network of threshold units cannot be trained by gradient based methods. From a hardware implementation perspective, however, a step function is highly desirable: it is a lot simpler to realize in hardware than a sigmoid. Hence, several researchers have addressed the construction of feedforward networks of threshold units [3, 5, 6, 9]. The inputs, however, are assumed to be binary in most cases. Most tasks of practical utility, including regression tasks (real valued inputs and real valued, continuously variable outputs) as well as classification tasks (real valued inputs and discrete outputs), specify real valued inputs, as illustrated by several benchmarks [2]. Hence the restriction to binary valued inputs necessitates quantization and binary encoding of real valued inputs. This might lead to a very large number of inputs to the network if the number of quantization levels is high.

This paper presents a method to incrementally build networks of threshold units with real valued inputs. The method is based on the Cascade-Correlation (abbreviated Cascor) [1] and Adaline algorithms. The next section analyses the cascade architecture from a linear systems perspective. This leads to our algorithm, which is outlined in Section III along with a tabular summary of its performance on several benchmarks. The last section presents conclusions and extensions.
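The contrast between the two squashing functions can be made concrete with a minimal Python sketch (an illustrative addition, not part of the original paper): the sigmoid is smooth and has a nonzero derivative everywhere, whereas the step function used for threshold units is piecewise constant, so its derivative is zero almost everywhere and gradient based training receives no useful signal from it.

```python
import numpy as np

def sigmoid(u):
    """Smooth squashing function: one-to-one, differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-u))

def step(u):
    """Threshold squashing function (bipolar convention assumed): +1 if u >= 0, else -1."""
    return np.where(u >= 0.0, 1.0, -1.0)

u = np.linspace(-2.0, 2.0, 5)
print(sigmoid(u))   # varies smoothly with u
print(step(u))      # piecewise constant; derivative is 0 almost everywhere
```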
II Cascade Architecture from a Linear Systems Perspective

As mentioned above, the algorithm is similar to the Cascade-Correlation (please refer to [1, 10] for details). We therefore examine the Cascor from a linear systems perspective. In the following, vectors are indicated with an overline, while matrices are indicated by an underline.

Assume that there are $P$ patterns (samples) in the training set. Then, the output of unit $k$ (it can be a hidden or an output unit) for pattern $i = 1, \ldots, P$ is denoted by $y_{ki}$ and is given by

$y_{ki} = f(u_{ki})$, where $u_{ki} = [\text{weighted sum of inputs} + \text{bias}]$ for unit $k$ for pattern $i$,   (1)

and $f$ denotes the squashing function of the unit. In vector notation, equation (1) can be rewritten as

$\overline{y}_k = f(\overline{u}_k)$,   (2)

where the squashing function $f$ is applied independently to each of the $P$ components of vector $\overline{u}_k$ to obtain vector $\overline{y}_k$.

For the purpose of illustration, assume that the problem specified has $n$ inputs and 1 output, without loss of generality (it is easy to extend the ensuing analysis to more than one output, and the final algorithm does handle multiple outputs). Typically, the number of samples $P \gg n$, the number of inputs. Assume the output unit to be linear. Cascor starts off by connecting the $n$ inputs to the output unit and minimizes the total error

$E_{total} = \sum_{i=1}^{P} (d_i - y_i)^2$, where $d_i$ and $y_i$ are the desired (target) and actual outputs for pattern $i$.   (3)

Since the output unit is linear, $E_{total}$ is a quadratic function of the weights, which implies that there is a unique minimum. Any gradient based descent method can lead the search to this unique minimum. Linearity of the output unit implies that minimizing $E_{total}$ in equation (3) is equivalent to solving, in the "least squares" sense, the linear system

$\underline{M}^{(0)} \, \overline{w}^{(0)} = \overline{d}$,   (4)

where

$\underline{M}^{(0)} = [\overline{I}_1, \ldots, \overline{I}_{n+1}]$, $\quad \overline{d} = [d_1 \ d_2 \cdots d_P]^T$, $\quad \overline{w}^{(0)} = [w_1 \ w_2 \cdots w_{n+1}]^T$.   (5)

In equation (4), vector $\overline{d}$ represents the target outputs for each of the $P$ samples, $\overline{w}^{(0)}$ is a vector with $n+1$ components corresponding to the bias and the $n$ weights of the output unit, and $\underline{M}^{(0)}$ is a $P \times (n+1)$ dimensional matrix whose columns are the vectors $\overline{I}_1, \ldots, \overline{I}_{n+1}$. The $j$th component of vector $\overline{I}_m$ is the $m$th external input to the network for training sample $j$ (the first column $\overline{I}_1$ corresponds to the bias unit). The minimum norm solution $\overline{w}^{(0)}_{opt}$ (which minimizes $E_{total}$ defined in equation (3)) to the system of equations (4) is unique and is given by

$\overline{w}^{(0)}_{opt} = \underline{M}^{(0)\dagger} \, \overline{d}$, where $\underline{M}^{(0)\dagger}$ is the Moore-Penrose pseudoinverse of matrix $\underline{M}^{(0)}$.   (6)

After this training, if the desired error bound is not met, then more units are installed one by one in a cascade topology [1, 10]. Every new unit that gets installed is connected to the output unit, so that after installing $k$ hidden units, minimization of $E_{total}$ in equation (3) is equivalent to solving, in the least-squares sense, the linear system

$\underline{M}^{(k)} \, \overline{w}^{(k)} = \overline{d}$,   (7)

where

$\underline{M}^{(k)} = [\overline{I}_1, \ldots, \overline{I}_{n+1}, \overline{H}_1, \ldots, \overline{H}_k]$, $\quad \overline{w}^{(k)} = [w_1 \ w_2 \cdots w_{n+1} \cdots w_{n+1+k}]^T$.   (8)

In (7), the superscript "(k)" indicates that $k$ hidden units have been installed so far. The $P$ dimensional vectors $\overline{H}_1, \ldots, \overline{H}_k$ correspond to the outputs of the $k$ hidden units:

$\overline{H}_j = f(\overline{u}_j)$, $\quad j = 1, \ldots, k$.   (9)

For instance, the $j$th component of vector $\overline{H}_i$ is the output of hidden unit $i$ for pattern $j$. Note that every hidden unit in effect adds a new column to matrix $\underline{M}$.
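The least-squares step above has a direct realization with standard linear algebra routines. The following Python/NumPy sketch (an illustrative reconstruction, not code from the paper; the function names are hypothetical) builds the matrix $\underline{M}^{(k)}$ from a bias column, the external inputs, and the outputs of any installed hidden units, and solves for the minimum-norm output weights via the Moore-Penrose pseudoinverse as in equation (6).

```python
import numpy as np

def build_M(X, hidden_outputs):
    """Assemble M^(k): columns are [bias, external inputs, hidden unit outputs].

    X              : (P, n) array of external inputs, one row per training pattern
    hidden_outputs : list of (P,) vectors H_1, ..., H_k (empty list for k = 0)
    """
    P = X.shape[0]
    bias = np.ones((P, 1))                        # column I_1: the bias unit
    cols = [bias, X] + [h.reshape(P, 1) for h in hidden_outputs]
    return np.hstack(cols)                        # shape (P, n + 1 + k)

def output_weights(M, d):
    """Minimum-norm least-squares weights w_opt = pinv(M) @ d, as in equation (6)."""
    return np.linalg.pinv(M) @ d

# Tiny usage example with made-up data (P = 4 patterns, n = 2 inputs, linear output unit).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([-1.0, 1.0, 1.0, -1.0])              # XOR-like targets
M0 = build_M(X, [])                                # k = 0: no hidden units yet
w0 = output_weights(M0, d)
print("residual error:", np.sum((M0 @ w0 - d) ** 2))
```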
A close examination of Cascade-Correlation in its original [1] and modified [10] forms indicates that every new hidden unit that is installed raises the rank of the matrix $\underline{M}$, i.e.,

$\mathrm{rank}(\underline{M}^{(k)}) = \mathrm{rank}\!\left([\underline{M}^{(k-1)} \;\; \overline{H}_k]\right) > \mathrm{rank}(\underline{M}^{(k-1)})$.   (10)

As long as the hidden unit's squashing function $f$ (it must be nonlinear) and its weights are such that its output vector $\overline{H}$ is linearly independent of the output vectors of the previously installed hidden units and the inputs to the network, the unit will further reduce the error when it gets installed.
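The rank condition in equation (10) can be checked numerically. Below is a minimal sketch (an assumed implementation of the test, not the paper's code) that decides whether a candidate hidden unit's output vector $\overline{H}_k$ is linearly independent of the existing columns of $\underline{M}^{(k-1)}$, i.e., whether appending it raises the rank.

```python
import numpy as np

def raises_rank(M_prev, H_k, tol=None):
    """Return True if appending column H_k to M_prev increases its rank (equation (10))."""
    M_new = np.hstack([M_prev, H_k.reshape(-1, 1)])
    return np.linalg.matrix_rank(M_new, tol=tol) > np.linalg.matrix_rank(M_prev, tol=tol)

# Example: a column that is a linear combination of existing columns does NOT raise the rank.
M_prev = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
print(raises_rank(M_prev, M_prev[:, 0] + 2.0 * M_prev[:, 1]))   # False
print(raises_rank(M_prev, np.array([1.0, -1.0, 1.0])))          # True
```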
This is the key to the success of any algorithm that generates a cascade architecture: it must utilize the nonlinearity $f$ of the hidden units in such a way that each hidden unit $k$, upon installation, generates an output vector $\overline{H}_k$ which raises the rank of matrix $\underline{M}$. This observation is the foundation of our algorithm, which is presented next.

III The Algorithm and its Performance on Benchmarks

If the task at hand specifies continuously variable real outputs, then the output units are assumed to be linear, i.e., $y = f(u) = u$. Otherwise, for discrete outputs (required for classification tasks), the squashing function of the output units is a threshold or step function:

$y = \mathrm{Step}(u) = \begin{cases} +1 & \text{if } u \geq 0 \\ -1 & \text{otherwise} \end{cases}$, where $u = (\text{weighted sum of inputs} + \text{bias})$.   (11)

All hidden units are threshold units. The algorithm is summarized by the following steps:

Step 1: Connect all inputs to each output unit and train all weights feeding all output units to minimize the objective function

$C = \sum_{o} \sum_{i=1}^{P} (d_{oi} - u_{oi})^2$, where the sum over $o$ covers all outputs.   (12)

Note that this is similar to the Adaline algorithm, where the inputs $u$ to the units are used to calculate the weight adjustments. If the specified error tolerance is not met, proceed to the next step.

Step 2: Install hidden units one by one. Each hidden unit $k = 1, 2, \ldots$ is installed in three steps.

2.1 In the first step, the input of the new hidden unit is connected to all (external) network inputs as well as to the outputs of all previously installed hidden units. Its output is not connected anywhere in the network. The input side connections of the unit are trained to minimize a discrepancy

$D = \sum_{o} \sum_{p=1}^{P} (e_{op} - u_{kp})^2$, where $e_{op}$ = residual error at output unit $o$ for pattern $p$ = $y_{op} - d_{op}$,   (13)

and $u_{kp}$ is the resultant input to hidden unit $k$ (which is being installed) for pattern $p$. Once again, note that the input $u$ of the unit being installed is used to calculate the weight adjustments. Hence, $D$ is a quadratic function of the weights, and minimization of $D$ leads to a unique minimum solution for the weight values (which can also be obtained via the pseudoinverse).

2.2 Now examine whether $\overline{H}_k = f(\overline{u}_k) = \mathrm{Step}(\overline{u}_k)$ is linearly independent of all the network inputs and the outputs of the previous hidden units, i.e., whether the vectors $\{\overline{I}_1, \ldots, \overline{I}_{n+1}, \overline{H}_1, \ldots, \overline{H}_k\}$ form a linearly independent set. If they do, then proceed to part 3 of step 2. Otherwise, try other weight sets (via trial and error or random initializations) until the linear independence condition is met. We would like to point out that in all the benchmark simulations tried so far, including the highly nonlinear and complex two spirals classification task [1, 2], minimization of $D$ in step 2.1 above has always led to a vector $\overline{H}_k$ that is linearly independent of the previous ones.

2.3 Once the linear independence condition is met, the input side connections of the hidden unit are frozen forever. Its output is now connected to the network output units, and all connections feeding output units are trained to minimize the objective function $C$ defined in equation (12).

Step 3: Iterate step 2, installing more units one at a time, till the desired error bound is met or some pre-determined maximum number of units is exceeded (which is deemed to be a failure).

We have run this algorithm on several benchmarks. Illustrative results for some benchmarks from the CMU collection [2] are shown in Table 1 below.
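Putting the pieces together, the following is a compact end-to-end sketch of the construction loop described in steps 1 through 3, for a single linear output unit. It is one plausible reading of the algorithm under stated assumptions: the helper names are hypothetical, pseudoinverse least squares is used for both $C$ and $D$, and random re-initialization is used as the retry mechanism in step 2.2; none of this is code from the paper.

```python
import numpy as np

def step_fn(u):
    return np.where(u >= 0.0, 1.0, -1.0)          # threshold squashing, equation (11)

def lstsq_weights(A, b):
    return np.linalg.pinv(A) @ b                  # unique minimum-norm quadratic minimizer

def build_threshold_net(X, d, tol=1e-3, max_hidden=50, rng=np.random.default_rng(0)):
    """Incrementally construct a cascade of threshold units (single linear output assumed)."""
    P, n = X.shape
    M = np.hstack([np.ones((P, 1)), X])           # columns: bias, external inputs
    w_out = lstsq_weights(M, d)                   # Step 1: Adaline-like output training
    hidden_weights = []
    while np.sum((M @ w_out - d) ** 2) > tol and len(hidden_weights) < max_hidden:
        e = M @ w_out - d                         # residual errors e_op (Step 2.1)
        v = lstsq_weights(M, e)                   # input-side weights minimizing D
        H = step_fn(M @ v)
        while np.linalg.matrix_rank(np.column_stack([M, H])) <= np.linalg.matrix_rank(M):
            v = rng.standard_normal(M.shape[1])   # Step 2.2: retry until the rank is raised
            H = step_fn(M @ v)
        hidden_weights.append(v)                  # Step 2.3: freeze the input-side weights
        M = np.column_stack([M, H])               # the unit's output becomes a new column of M
        w_out = lstsq_weights(M, d)               # retrain all connections feeding the output
    return hidden_weights, w_out
```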
IV Discussion and Conclusion

The algorithm successfully learns the benchmarks listed in the table (and several others, which were omitted for the sake of brevity). It handles real valued inputs, thereby obviating their quantization and binary encoding. Besides the simplicity of threshold units, one of the main advantages
of our method is that unique weight values can be obtained at each step, by seeking out the exact minimum of the corresponding objective function ($C$ or $D$, both of which are quadratics). This implies that, given a problem, it is possible to generate a fixed topology and weights irrespective of the random initializations. In fact, the quadratic minimization can be achieved via the least-squares method. Consequently, gradient calculations can be completely avoided. There are several well known linear algebra packages for least squares solution, singular value decomposition, etc., which can be easily incorporated in this algorithm.

Benchmark | (inputs, outputs) | No. of hidden units | No. of non-output units = [inputs + 1 (bias) + hidden] | Rank of matrix M at successful termination | Test set errors (%)
Two Spirals classification [1, 2] | (2, 1) | | | |
Sonar data classification [4] | (60, 1) | | | |
Speaker independent vowel recognition [11] | (10, 4) | | | |

Table 1: Performance of the algorithm on benchmarks from the CMU collection [2]

The main drawbacks (of the current version) are (i) the number of hidden units required is considerably larger than that of a corresponding net with sigmoidal units; and (ii) the generalization performance (in terms of percentage errors on the test set) is also not as good as that of a net with sigmoidal units. These outcomes can be expected, since the threshold function is a highly many-to-one and discontinuous function. A smooth continuous function like the sigmoid will naturally give better interpolation than a discontinuous function such as a step.

Several further extensions are possible. Selection of an intermediate objective function (instead of the discrepancy $D$) which will guarantee that the rank of $\underline{M}$ gets raised and expedite the convergence is being investigated. Other possibilities include a mixture of sigmoidal and threshold units along with other nonlinearities $f$ (such as clamped linear, multiple steps, etc.) and layered architectures with restricted depth and fan-in [10].

References

[1] Fahlman, S. E., and Lebiere, C., "The Cascade-Correlation Learning Architecture". In Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed., Morgan Kaufmann, 1990.
[2] Fahlman, S. E., et al., Neural Nets Learning Algorithms and Benchmarks Database, maintained by S. E. Fahlman et al. at the Computer Science Dept., Carnegie Mellon University.
[3] Frattale Mascioli, F. M., and Martinelli, G., "A constructive algorithm for binary neural networks: the oil-spot algorithm". IEEE Transactions on Neural Networks, vol. 6, no. 3, May 1995.
[4] Gorman, R. P., and Sejnowski, T. J., "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets". Neural Networks, vol. 1, 1988.
[5] Gray, D. L., and Michel, A. N., "A training algorithm for binary feedforward neural networks". IEEE Transactions on Neural Networks, vol. 3, 1992.
[6] Marchand, M., Golea, M., and Rujan, P., "A convergence theorem for sequential learning in two-layer perceptrons". Europhysics Letters, vol. 11, 1990.
[7] Martinelli, G., and Mascioli, F. M., "Cascade Perceptron". IEE Electronics Letters, vol. 28, 1992.
[8] Martinelli, G., Mascioli, F. M., and Bei, G., "Cascade neural network for binary mapping". IEEE Transactions on Neural Networks, vol. 4, 1993.
[9] Muselli, M., "On sequential construction of binary neural networks". IEEE Transactions on Neural Networks, vol. 6, no. 3, May 1995.
[10] Phatak, D. S., and Koren, I., "Connectivity and Performance Tradeoffs in the Cascade Correlation Learning Architecture". IEEE Transactions on Neural Networks, vol. 5, Nov. 1994.
[11] Robinson, A. J., and Fallside, F., "A Dynamic Connectionist Model for Phoneme Recognition". In Proc. of nEuro, Paris, June 1988.