Subgoal Chaining and the Local Minimum Problem

Jonathan P. Lewis, Michael K. Weir
Department of Computer Science, University of St. Andrews, St. Andrews, Fife KY16 9SS, Scotland

Abstract

It is well known that performing gradient descent on fixed surfaces may result in poor travel through getting stuck in local minima and other surface features. Subgoal chaining in supervised learning is a method to improve travel for neural networks by directing local variation in the surface during training. This paper shows, however, that linear subgoal chains such as those used in ERA are not sufficient to overcome the local minimum problem, and examines non-linear subgoal chains as a possible alternative.

1 Introduction

A problem long recognised as important for gradient descent techniques used in optimisation is that of local minima. The problem is how to avoid convergence to solutions of the minimisation condition \nabla F = \nabla f(S) = 0 that do not correspond to the lowest value of F, where F = f(S) is a potential function over a state space S.

An interesting technique, expanded range approximation (ERA), has recently been put forward [1] which purports to deal with the problem for supervised feedforward neural networks, where the potential function is the least mean square (LMS) output error and the state space is the neural weight space. The ERA method consists of compressing the range of target values d_p for the training set down to their mean value \langle d \rangle for each output unit,

    \langle d \rangle = \frac{1}{P} \sum_{p=1}^{P} d_p    (1)

and then progressively expanding these compressed targets linearly back toward their original values. That is, a modified training set for the inputs x_p is defined as

    S(\lambda) = \{ x_p, \{ \langle d \rangle + \lambda (d_p - \langle d \rangle) \} \}    (2)

where the parameter \lambda is increased in regular steps from 0 to 1. In our own terminology, the value of \lambda corresponds to a particular subgoal setting, and the increases in \lambda generate a linear subgoal chain from the mean-valued targets to the final goal targets for the original training set.

There have been other approaches to achieving optimal learning, such as in [2]. Our group has also developed a subgoal chain approach [3] to improve robustness through better goal direction. However, what is interesting about ERA is its simplicity and the claim made for it, namely that ERA "is guaranteed to succeed in avoiding local minima in a large class of problems". This claim appears to be supported empirically through ERA achieving a 100% success rate on the XOR problem.

In this paper, we examine the above claim made for ERA in the avoidance of local minima both theoretically and empirically. In particular, we show that the linear chaining unfortunately still fails to avoid convergence to local minima for the goal. Finally, our analysis and results suggest that subgoal chaining remains a potentially attractive approach provided the realisability of the subgoals is taken into account.
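
To make the subgoal chain in (1) and (2) concrete, the following minimal sketch (our own illustrative Python, not code from [1]; the function name era_subgoal_targets is an assumption of ours) builds the sequence of subgoal target sets obtained by expanding the compressed targets linearly from their mean back to the goal targets.

import numpy as np

def era_subgoal_targets(d, n_steps):
    # Linear ERA subgoal chain, eqs. (1)-(2).
    #   d       : array of shape (P, M) holding the goal targets d_p
    #             for P patterns and M output units.
    #   n_steps : number of expansion steps; lambda runs 0, 1/n_steps, ..., 1.
    # Returns one target array per subgoal, starting at the mean-valued
    # targets (lambda = 0) and ending at the goal targets (lambda = 1).
    d = np.asarray(d, dtype=float)
    d_mean = d.mean(axis=0)                       # <d> for each output unit, eq. (1)
    lambdas = np.linspace(0.0, 1.0, n_steps + 1)
    return [d_mean + lam * (d - d_mean) for lam in lambdas]   # eq. (2)

# Example: goal targets 0.2 / 0.8 for a single output unit, compressed
# to their mean 0.5 and expanded back in 10 regular steps.
goal = np.array([[0.2], [0.8], [0.8], [0.2]])
chain = era_subgoal_targets(goal, n_steps=10)
print(chain[0].ravel())    # first subgoal: [0.5 0.5 0.5 0.5]
print(chain[-1].ravel())   # last subgoal = goal targets: [0.2 0.8 0.8 0.2]

Each element of the returned chain plays the role of S(\lambda) for one value of \lambda; the inputs x_p themselves are never altered.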

2 The Status of Attractors in Linear Subgoal Chains

The designers of ERA use three main stages. The first is to begin training on the mid-range output state as the first subgoal defined by (1). The mid-range output state corresponds to the global minimum weight states for this first subgoal, which can be set exactly or trained iteratively. Some mid-range weight states for an N-H-M net are given exactly by

    w_{0k} = \ln\left( \frac{\langle d_k \rangle}{1 - \langle d_k \rangle} \right) - \sum_{j=1}^{H} w_{jk} f(w_{0j}), \qquad w_{ij} = 0, \; i = 1, \ldots, N, \; j = 1, \ldots, H, \; k = 1, \ldots, M    (3)

where w_{ij} are weights to hidden unit j from unit i, w_{jk} are weights to output unit k from unit j, and unit 0 is the bias unit. For a single layer N-M net the mid-range state is similar to (3), only without the sum and with w_{jk} = 0 for k = 1, \ldots, M and j = 1, \ldots, N.

The second stage is to make a small enough step along the linear chain so that the new subgoal's global minimum contains the mid-range state in its basin. The third stage is to repeat the same size of step to move iteratively along the chain until the goal is reached. The claim is made that the range can thereby be progressively expanded up to \lambda = 1 without displacing the system from the global minimum at any step. This gives rise to the notion that local minima are avoided, so that travel to the goal's global minimum is always successful.
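
A mid-range weight state of the form (3) can be written down directly rather than trained. The sketch below (again our own illustrative Python, assuming the standard logistic activation; it is not the authors' code) zeroes the weights into the hidden layer and chooses each output bias so that every pattern yields the mean target, whatever the hidden-to-output weights happen to be.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def midrange_weights(n_in, n_hidden, d_mean, rng=None):
    # Mid-range weight state for an N-H-M net in the spirit of eq. (3):
    # weights into the hidden layer are zero (here including the hidden
    # biases, so each hidden unit emits f(0) = 0.5 for every pattern);
    # the output bias w_0k is the inverse sigmoid of <d_k> minus the
    # constant hidden contribution, so every output equals its mean target.
    rng = np.random.default_rng(rng)
    W_h = np.zeros((n_hidden, n_in + 1))                  # column 0 is the bias weight w_0j
    W_o = np.zeros((len(d_mean), n_hidden + 1))
    W_o[:, 1:] = rng.normal(scale=0.5, size=(len(d_mean), n_hidden))  # arbitrary w_jk
    h = sigmoid(W_h[:, 0])                                # constant hidden outputs f(w_0j)
    for k, dk in enumerate(d_mean):
        W_o[k, 0] = np.log(dk / (1.0 - dk)) - W_o[k, 1:] @ h          # eq. (3)
    return W_h, W_o

def forward(W_h, W_o, x):
    hid = sigmoid(W_h @ np.concatenate(([1.0], x)))
    return sigmoid(W_o @ np.concatenate(([1.0], hid)))

# Any input now produces the mean target, here 0.5, for a 2-2-1 net.
W_h, W_o = midrange_weights(n_in=2, n_hidden=2, d_mean=[0.5], rng=0)
print(forward(W_h, W_o, np.array([0.3, -0.7])))           # -> [0.5]

Because the inputs cannot influence the outputs in this state, the net already sits at a global minimum of the first (\lambda = 0) subgoal.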

Our main concern is with the third stage in the design. The main design feature is that the travel surface is changed with every subgoal. This surface change, however, may not entirely fit with the designers' intentions. In particular, there is the possibility of the attractor influencing the weight state transitions changing from being a global to a local minimum. In this event, the ERA method can no longer rely on passing from one global minimum to the next. The worst case is when the weight state is in a subgoal attractor basin which is that of a local minimum for the goal. In this situation, the local minimum is the most attractive state, whereafter no further progress can be made towards the goal through the linear subgoal chain.

This point is illustrated in Figures 1 and 2. Figure 1 is a stylised view of weight space with a local minimum as the current state and a global minimum. Figure 2 is a stylised view of the output space corresponding to the weight space in Figure 1. In Figure 2, various paths such as the line L1 attempted by ERA are not realisable. In particular, paths from local minima which monotonically decrease error with respect to the goal do not exist, no matter what surface variation occurs. Other paths such as L2 or L3 which initially lead away from the goal do exist, but have an initial increase in error. The increase is not only with respect to the goal but to any subgoal along L1. Consequently, the weight state will converge to the local minimum rather than the global minimum when linear chaining is used.

Figure 1: Weight space with a local minimum W1 and global minimum W2. Dashed lines indicate error contours. L2 and L3 are paths from W1 to W2.

Figure 2: Output space with achievable outputs P1 and P2 corresponding to weight states W1 and W2 in Figure 1. The concentric circles denote error contour lines with respect to the goal at P2. L1 is an unrealisable path if P1 is a local minimum for the goal. L2 and L3 are realisable routes to the goal P2 from the local minimum, but not using a linear subgoal chain.

In short, the designers of ERA do not allow for the fact that its linear subgoal chain may not have an associated connecting path to the goal, no matter how small the step size is. The consequence is that ERA's main claim, to be guaranteed to succeed in avoiding local minima in a large class of problems, is undermined. On the contrary, local minima are a major source of difficulty for ERA.

3 Empirical Examples of Goal-failure

In this section we construct a number of examples which show ERA failing to attain the global minimum for the goal. These examples fall into a variety of categories, principally those where the network does and does not contain hidden units. Some examples for each category are minimal and artificial in order to show the principles of construction clearly. In a further example we also use a larger and more irregular data set to indicate the scope of the counter examples. The general principle of construction behind all the examples will be the same, namely to place a local minimum for the goal in the way of the training path of weight states being trained by the linear subgoal chain. For simplicity, this is mostly done by placing local minima at or near a weight state satisfying the initial ERA subgoal. This makes it either certain or at least likely that the local minimum basin on the goal's error-weight surface contains ERA's mid-range weight state.

3.1 Example Construction Principles

By using a technique similar to one used by Brady [4] and examined in [5] it is possible to establish local minima in LMS error. Many of our problems differ from Brady's, though, in using inseparable data, so that the best fit to the data involves misclassification. The training set is divided between non-spoiler points and a relatively small set of spoiler points. Without the spoiler points, the non-spoiler points, by definition, meet their targets exactly at the global minimum. When the spoilers are added there is no weight state where the new data set meets all of its targets exactly. Furthermore, the spoilers are set so that there is, apart from the global minimum, at least one local minimum for the goal at or near mid-range states satisfying ERA's initial subgoal. The local minimum occurs where the points all have exact or near mid-range values. The global minimum has the non-spoilers nearly meeting their targets exactly, leaving the fewer spoilers with relatively high error. Figure 3, explained in detail below, illustrates a design for a 2-1 net where the points have exact mid-range values at the local minimum. The local minimum has zero weight values in this case, while the global minimum corresponds to a linear separation near the line H1 in Figure 3.

Figure 3: An inseparable input point set for a 2-1 net which creates a local minimum for the goal at the mid-range output state. There are 4 points in each class: 7 non-spoilers and 1 spoiler. The global minimum corresponds to a linear separation near H1.

3.2 Counter Examples for Single-layer Nets

The following counter examples are for a 2-1 net, where a local minimum for the goal is at or near ERA's mid-range state. A weight state for the mid-range may be set to be a local minimum by first arranging the spoiler and non-spoiler points relative to one another so that

    \frac{\partial E}{\partial w_{ij}} = 0 \quad \forall i, j    (4)

where E is the LMS error over all the patterns and w_{ij} is a weight from input i to output unit j as defined in (3). To check that the zero-valued gradients are minima, second order derivatives or convergence tests can be used. In order to ensure (4) holds we make the following observations. Firstly,

    \frac{\partial E}{\partial w_{ij}} = \sum_{p=1}^{P} \frac{\partial E_p}{\partial ex_{pj}} \frac{\partial ex_{pj}}{\partial w_{ij}} = -\sum_{p=1}^{P} \delta_{pj} \, inp_{pi} = -\left( \sum_{p \in A} \delta_{pj} \, inp_{pi} + \sum_{p \in B} \delta_{pj} \, inp_{pi} \right)    (5)

where E_p is the LMS error for pattern p, ex_{pj} is the excitation of output unit j for pattern p, and inp_{pi} is the input i for pattern p. \delta_{pj} is defined for each pattern p and output unit j with a sigmoidal activation function as

    \delta_{pj} = -\frac{\partial E_p}{\partial ex_{pj}} = (T_{pj} - out_{pj}) \, out_{pj} (1 - out_{pj})    (6)

where T_{pj} is the target for pattern p and output unit j, and out_{pj} is the output of unit j for pattern p. If we set the number of patterns for each class to be equal, we establish the mid-range output value as

    out_{MR} = \frac{T_{Aj} + T_{Bj}}{2}    (7)

where T_{Aj} and T_{Bj} are the goal targets for patterns of class A and B respectively. Using (6) and (7) then yields

    \delta_{Aj} = (T_{Aj} - out_{MR}) \, out_{MR} (1 - out_{MR}) = -(T_{Bj} - out_{MR}) \, out_{MR} (1 - out_{MR}) = -\delta_{Bj}    (8)

For the bias weight we have therefore created zero gradients, because inp_{pi} = 1 in (5) for a bias unit i. In order to create zero-valued error-weight gradients for weights connected from all input lines i we require in addition that

    \sum_{p \in A} inp_{pi} = \sum_{p \in B} inp_{pi}    (9)

The training set in Figure 3 is designed to obey (7) and (9) and so has zero-valued error-weight gradients at the mid-range state. In contrast, the points in Figure 4 have been generated in a more random way to make the example less artificial. The latter input point set no longer has a local minimum at the mid-range exactly, but merely close to the mid-range. It is possible to do this for single layer nets by moving points slightly relative to the positions determined by (9) in a random fashion.
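
The conditions (4)-(9) are straightforward to verify numerically. The sketch below (our own Python; the two-class point set is a made-up example satisfying (7) and (9), not the actual data of Figure 3 or Figure 4) evaluates the error-weight gradient (5) for a 2-1 sigmoid unit at the zero-weight mid-range state and confirms that it vanishes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lms_gradient(W, X, T):
    # Gradient of the LMS error of a single sigmoid unit w.r.t. its
    # weights, eq. (5): dE/dw_i = -sum_p delta_p * inp_pi, with
    # delta_p = (T_p - out_p) out_p (1 - out_p) as in eq. (6).
    # X carries a leading bias input of 1 for every pattern.
    out = sigmoid(X @ W)
    delta = (T - out) * out * (1.0 - out)
    return -X.T @ delta

# A hypothetical inseparable two-class point set obeying (7) and (9):
# equal class sizes, goal targets 0.8 / 0.2 (mid-range 0.5), and equal
# per-input sums over the two classes.
A = np.array([[1, 0], [0, 1], [2, 1], [1, 2]], dtype=float)   # class A, target 0.8
B = np.array([[2, 2], [0, 0], [1, 1], [1, 1]], dtype=float)   # class B, target 0.2
X = np.hstack([np.ones((8, 1)), np.vstack([A, B])])           # prepend the bias input
T = np.array([0.8] * 4 + [0.2] * 4)

W_midrange = np.zeros(3)          # zero weights -> every output is 0.5
print(np.allclose(lms_gradient(W_midrange, X, T), 0.0))       # -> True, i.e. eq. (4) holds

As noted above, a vanishing gradient only establishes a stationary point; second order or convergence checks are still needed to confirm that it is a minimum.
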
Figure 5 shows a function approximation problem. This counter example is an exception to the design placing a local minimum at or near the mid-range state. The design is similar to one described in [4], which was modified so that the previous global minimum state was shifted in weight space and status to become a local minimum at the end of ERA's subgoal path. The goal targets of 0.29 and 0.77 created the desired local minimum.

Figure 4: An inseparable input point set for a 2-1 net which creates a local minimum for the goal near the mid-range output state. There are 25 points in each class, including the 4 class A spoilers. H1 and H2 denote the global and local minimum separation lines respectively.

Figure 5: A function approximation example for a 2-1 net which creates a local minimum associated with the line H1 at the end of ERA's path from the mid-range to the goal. The global minimum hyperplane is near the line H2.

3.3 Counter Examples for Multi-layer Nets

The following counter example is for a 2-2-1 net. Figure 6 shows an example which has a local minimum for the goal at the mid-range state. Zero error-weight gradients are again created through the following equations. The mid-range weight states given by (3) constrain links from the hidden units j to carry the same output K to all other units over all the patterns. For the output units we therefore observe that

    \frac{\partial E}{\partial w_{jk}} = \sum_{p=1}^{P} \frac{\partial E_p}{\partial ex_{pk}} \frac{\partial ex_{pk}}{\partial w_{jk}} = -\sum_{p=1}^{P} \delta_{pk} \, out_{pj} = -\left( \sum_{p \in A} \delta_{pk} K + \sum_{p \in B} \delta_{pk} K \right)    (10)

where w_{jk} is the weight from hidden unit j to output unit k, ex_{pk} is the excitation of output unit k for pattern p, and \delta_{pk} is in essence the same output unit delta as defined in (6) with k substituted for j. With the same substitution made in (7) we obtain

    \delta_{Ak} = -\delta_{Bk}    (11)

From (10) and (11) we can therefore see that for all links from the bias unit and hidden units we have zero-valued error-weight gradients for the output unit. Now we require

    \frac{\partial E}{\partial w_{ij}} = -\sum_{p=1}^{P} \delta_{pj} \, out_{pi} = 0    (12)

where w_{ij} are weights to hidden unit j from previous layers, and out_{pi} is the output of unit i to a hidden unit for pattern p. \delta_{pj} is a hidden unit delta for unit j. For a sigmoidal activation function it can be written as

    \delta_{pj} = \left( \sum_{k} \delta_{pk} w_{jk} \right) out_{pj} (1 - out_{pj})    (13)

where out_{pj} is the output of hidden unit j for pattern p. With w_{jk} and out_{pj} constant over all patterns p it follows from (11) that

    \delta_{Aj} = -\delta_{Bj}    (14)

In conjunction, (12) and (14) applied recursively ensure that we have zero-valued error-weight gradients for all links from hidden units, and from the bias unit, to other hidden units. For links from input lines to hidden units to have zero-valued error-weight gradients, we now in addition require (9) to be satisfied. The input and target sets in Figure 6 have been designed to obey the conditions in (7) and (9) and hence have zero-valued error-weight gradients.

Figure 6: A training set for a 2-2-1 net which creates a local minimum at the mid-range output state. There are 8 points in each class. The global minimum occurs near the 2 hidden unit hyperplanes associated with the lines H1 and H2. The local minimum hidden unit hyperplanes are parallel to the x-y plane.
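
The same kind of numerical check extends to the 2-2-1 case of (10)-(14). The sketch below (our own Python, reusing the hypothetical point set from the single-layer check rather than the data of Figure 6) evaluates the full back-propagation gradients at a mid-range state built as in (3) and confirms that they all vanish, for an arbitrary choice of hidden-to-output weights.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(W_h, W_o, X, T):
    # Full-batch LMS gradients of a 2-2-1 sigmoid net, following
    # eqs. (10)-(13).  X has a leading bias column; W_h is (H, N+1),
    # W_o is (1, H+1).
    hid = sigmoid(X @ W_h.T)                                   # (P, H)
    hid_b = np.hstack([np.ones((len(X), 1)), hid])             # prepend bias output
    out = sigmoid(hid_b @ W_o.T).ravel()
    delta_out = (T - out) * out * (1 - out)                    # eq. (6), with k for j
    g_out = -(hid_b.T @ delta_out)                             # eq. (10)
    delta_hid = (delta_out[:, None] * W_o[:, 1:]) * hid * (1 - hid)   # eq. (13)
    g_hid = -(delta_hid.T @ X)                                 # eq. (12)
    return g_hid, g_out

# Same hypothetical point set as in the single-layer check (obeys (7) and (9)).
A = np.array([[1, 0], [0, 1], [2, 1], [1, 2]], dtype=float)
B = np.array([[2, 2], [0, 0], [1, 1], [1, 1]], dtype=float)
X = np.hstack([np.ones((8, 1)), np.vstack([A, B])])
T = np.array([0.8] * 4 + [0.2] * 4)

# Mid-range state: zero weights into the hidden layer, arbitrary
# hidden-to-output weights, output bias set as in eq. (3) so that every
# output is the mean target 0.5.
W_h = np.zeros((2, 3))
W_o = np.array([[0.0, 0.6, -0.3]])
W_o[0, 0] = np.log(0.5 / 0.5) - W_o[0, 1:] @ sigmoid(W_h[:, 0])
g_hid, g_out = backprop_gradients(W_h, W_o, X, T)
print(np.allclose(g_hid, 0), np.allclose(g_out, 0))            # -> True True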

3.4 Experiments

The problems were all run from 25 different random initial weight-states. Training was done using back-propagation. The learning rate was set to be very low, with no momentum being used, in order to encourage robust, i.e. successful, training where possible. The tolerance for determining successful training on both subgoal and goal targets was problem dependent and was set in order to distinguish between final local and global minimum states. The goal target values were 0.2 and 0.8, apart from the function approximation example where they were 0.29 and 0.77. For ERA, 10 and 100 step subgoal chains were tested.

Single Layer Net Results

In this section we present the results for the various test problems with a short discussion. The results summarised in Table 1 show ERA failing completely on all single layer net problems, whereas standard (Std) training can find the global minimum with up to 32% success. One can see that if ERA's mid-range state is a local minimum for the goal, ERA is bound to fail to reach the global minimum (data set 1). Failure also occurs when the mid-range state lies near a local minimum for the goal (data set 2), or when a local minimum for the goal lies at the end of the training path directed by the linear subgoal chain (data set 3). Failing to reach the global minimum results in severe misclassification for the linearly inseparable problems (1 and 2). For the function approximation example (data set 3), failure to reach the global minimum produces poor approximation.

Table 1: Results for single layer net experiments. Data sets 1, 2 and 3 refer to the training sets displayed in Figures 3, 4 and 5 respectively. Columns: data set, training method, learning rate, % of runs finding the global minimum, and average cycles for success. ERA fails (0% finding the global minimum) on all three data sets, while Std succeeds on a minority of runs.

Multi Layer Net Results

Table 2 shows the results for the multi-layer net experiments. One can see that for data set 4, which has zero-valued error-weight gradients, ERA has almost a 50% chance of finding the global minimum, whereas for the single layer counter examples ERA fails completely. Unlike the single layer case, there is more than one mid-range weight state now, only some of which both obey (7) and (9) and are local minima. It would appear that roughly 50% of the mid-range weight states are local minima, since we get roughly 50% failure for ERA.

It should be noted that when ERA succeeds in finding the global minimum in data set 4 it takes a lot longer than a standard unchained technique, due to initially shallow gradients at ERA's starting weight state. On data set 4 ERA takes roughly 2 to 4 times as many cycles as standard training and has a significantly lower success rate.

Table 2: Results for multi layer net experiments. Data set 4 refers to the counter example displayed in Figure 6. Columns: data set, training method, learning rate, subgoal chain step size (N/A for Std), % of runs finding the global minimum, and average cycles for success. ERA chains of 10 and 100 steps find the global minimum in roughly 50% of runs and take roughly 2 to 4 times as many cycles as Std.
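
For reference, the experimental protocol of this section can be sketched as follows (our own schematic Python harness, not the software used for Tables 1-3; the toy data set in the driver is a small separable problem chosen only to show the protocol, and the helper names train_backprop and train_era are ours). Standard training runs plain batch back-propagation against the goal targets, while ERA training works through the linear subgoal chain of (2); a run counts as successful only if every target is met within a tolerance.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(W, X, T, lr=0.1, max_cycles=20000, tol=0.1):
    # Batch gradient descent on the LMS error of a single sigmoid unit,
    # no momentum.  A run succeeds once every output lies within `tol`
    # of its target.  Returns (weights, cycles used, success flag).
    for cycle in range(1, max_cycles + 1):
        out = sigmoid(X @ W)
        if np.all(np.abs(T - out) < tol):
            return W, cycle, True
        delta = (T - out) * out * (1 - out)
        W = W + lr * (X.T @ delta)               # step along -dE/dw, eq. (5)
    return W, max_cycles, False

def train_era(W, X, T, n_steps=10, **kw):
    # ERA protocol: train to each subgoal of the linear chain (2) in
    # turn, giving up if any subgoal cannot be met.
    T_mean = np.full_like(T, T.mean())
    total = 0
    for lam in np.linspace(0.0, 1.0, n_steps + 1):
        W, cycles, ok = train_backprop(W, X, T_mean + lam * (T - T_mean), **kw)
        total += cycles
        if not ok:
            return W, total, False
    return W, total, True

# Toy driver in the style of Section 3.4: 25 random initial weight
# states on a small *separable* 2-1 problem (not one of the paper's
# counter examples), purely to show the protocol.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((4, 1)),
               np.array([[2, 1], [1, 2], [-1, -2], [-2, -1]], dtype=float)])
T = np.array([0.8, 0.8, 0.2, 0.2])
for name, trainer in [("Std", train_backprop), ("ERA", train_era)]:
    results = [trainer(rng.normal(scale=0.5, size=3), X, T) for _ in range(25)]
    wins = sum(ok for _, _, ok in results)
    print(f"{name}: {wins}/25 runs reached the goal targets")

On the counter-example sets of Figures 3-6 the success test would instead be the problem-dependent error tolerance described above, chosen to distinguish the final local and global minimum states.
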
4.0 Conclusions

The successful construction and testing of counter examples for linear subgoal chaining leads to the question of why the success and failure rates for ERA's linear subgoal chaining can be so extreme, i.e. 100% success for XOR and 100% failure for some counter examples. We surmise that the answer lies in ERA's starting conditions in particular and in the mechanism of linear subgoal chaining in general. ERA's starting conditions are such that it is necessary to pass through or near the mid-range state to begin with for every problem. This means that the outcome is dependent on the mid-range state rather than the initial weight-state. We have found that for XOR the first subgoal after the mid-range state is exactly realisable. The same is true for all subsequent subgoals, with a successful path to the goal being the result. Hence all weight initialisations yield success. The counter examples causing failure placed a local minimum for the goal on ERA's travel path so that the state transitions converge towards it and become stuck. This resulted in ERA's failure every time because, once convergent towards a local minimum for the goal, there is no significant progress towards the remaining subgoals.

Despite appearances to the contrary, linear subgoal chaining is very similar to the standard unchained approach in the basis for its success and failure. That is, while it undoubtedly generates different travel paths to those of the unchained approach, due to using varying error-weight surfaces, it has similar attractor basins. What we mean by this is that linear subgoal chaining may be treated as having an underlying travel surface which is an amalgamation of the varying error-weight surfaces. The basins of this surface may differ in shape but are the same in number and have the same attractors. Success or failure for ERA therefore depends on whether the mid-range state is in the goal's attractor basin on the underlying travel surface or not and, step-size issues apart, nothing else. That is why it is all-or-none for some problems.

Linear subgoal chaining without the mid-range starting condition and initialised randomly may be expected to have a more variable success rate depending on the basin distribution for a problem.

It has become clear that the linear subgoal chaining technique employed in [1] can fail to reach the global minimum, just as standard unchained training can. Nonetheless we believe subgoal chaining potentially remains an attractive approach provided the realisability of subgoals can be taken into account.

5. Non-linear Subgoal Chaining

The counter example in Figure 5 was used to test the feasibility of a non-linear subgoal chaining approach. The subgoal chains were derived from outputs taken along weight-space paths between the local and global minimum for the goal. The local and global minimum weight states were obtained beforehand in multiple training runs using a standard unchained approach.

The results for the test with non-linear chains are displayed in Table 3. It shows the non-linear subgoal chaining method (NLSG) manages to obtain the global minimum in 100% of the trials for data set 1, on which standard training succeeds in 8% of the trials and ERA in 0%. This is a major improvement on the standard unchained method and ERA. In terms of training cycles the non-linear chains perform similarly to standard training (see Table 1, data set 1). The potential of non-linear subgoal chaining is confirmed by the experimental runs. We believe that such non-linear subgoal chains may be found by performing adaptive subgoal chain shaping during training, according to progress and realisability criteria.

Table 3: Results for non-linear subgoal chaining experiments on data set 1. Columns: data set, training method, learning rate, % of runs finding the global minimum, and average cycles for success. NLSG finds the global minimum in 100% of the trials, with training cycles similar to Std.

6. Summary

This paper has presented results to show that linear subgoal chaining such as used in [1] cannot overcome the local minimum problem. In doing so we have provided a mathematical model for designing training sets which have a local minimum at ERA's starting mid-range state, and have proposed non-linear subgoal chaining as a feasible technique to overcome local minima. Non-linear subgoal chaining has been shown to be potentially very useful in overcoming local minima without causing any substantial loss in training speed. Such chaining is currently being developed and, as our research progresses, we intend to report on the value of this technique.

References

[1] Gorse, D., Shepherd, A. J. and Taylor, J. G. The new ERA in supervised learning. Neural Networks, 10(2), 1995.
[2] Cetin, B. C., Barhen, J. and Burdick, J. W. Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization. Journal of Optimization Theory and Applications, 77:97-126, 1993.
[3] Weir, M. and Fernandes, A. Tangent Hyperplanes and Subgoals as a Means of Controlling Direction in Goal Finding. Proceedings of the World Conference on Neural Networks, Vol. III, San Diego, California, 1994.
[4] Brady, M. L. et al. Back-Propagation Fails to Separate Where Perceptrons Succeed. IEEE Transactions on Circuits and Systems, 36(5), May 1989.
[5] Sontag, E. D. and Sussmann, H. J. Back-propagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems, 3:91-106, 1989.
