Chapter 4: Supervised Learning: Multilayer Networks II

Extensive research has been conducted, aimed at developing improved supervised learning algorithms for feedforward networks.

4.1 Madalines

A "Madaline" is a combination of many Adalines, trained using an algorithm that follows the "Minimum Disturbance" principle. Learning algorithms for Madalines have gone through three stages of development: MRI (Madaline Rule I), MRII, and MRIII.

Figure 4.1: Madaline I network with two Adalines (inputs x_1, x_2; the Adaline outputs are combined by an OR gate).

"Madaline I" has an adaptive hidden layer and a fixed logic device (AND/OR/majority gate) at the outer layer. The goal of its training procedure is to decrease network error (at each input presentation) by making the smallest possible perturbation to the weights of some Adaline. Note that an Adaline's output must change in sign in order to have any effect on the network output. If the net input into an Adaline is large, reversing its output requires a large change in its incoming weights. Therefore, the algorithm attempts to find an existing Adaline (hidden node) whose net input is smallest in magnitude, and whose output reversal would reduce network error. This output reversal is accomplished by applying the LMS algorithm to the weights on connections leading into that Adaline. The number of Adalines for which output reversal is necessary depends on the choice of the outermost node function. If the outermost node computes the OR of its inputs, then the network output can be changed from 0 to 1 by modifying a single Adaline, but changing the network output from 1 to 0 requires modifying all the Adalines whose current output is 1.

* * *

The Madaline II architecture uses Adalines with modifiable weights at the output layer of the network, instead of fixed logic devices. The weights are initialized to small random values, and training patterns are repeatedly presented. The algorithm successively modifies weights in earlier layers before later ones, using the LMS or α-LMS rule.

Figure 4.2: Madaline II network with one hidden layer of Adalines.

Adalines whose weights have already been adjusted may be revisited when new input patterns are presented. It is possible to adjust the weights of several Adalines simultaneously. Since the node functions are not differentiable, there can be no explicit gradient descent. The algorithm may repeatedly cycle through the same sequence of states.

* * *

Madaline III networks are feedforward, with sigmoid node functions. The weights of all nodes are adapted in each iteration, using the MRIII training algorithm. Although the approach is equivalent to backpropagation, each weight change involves considerably more computation.

Algorithm MRIII:
  repeat
    Present a training pattern i to the network;
    Compute node outputs;
    for h = 1 to the number of hidden layers, do
      for each node k in layer h, do
        Let E be the current network error;
        Let E_k be the network error when a small perturbation ε > 0 is added to the kth node's net input;
        Adjust the weights on connections leading into the kth node using
          Δw = −η i (E_k^2 − E^2) / ε
        or
          Δw = −η i E (E_k − E) / ε,
        where η is a small learning rate and i is the vector of inputs to that node;
      end-for
    end-for
  until network error is satisfactory or an upper bound on the number of iterations is reached.

Figure 4.3: MRIII training algorithm for Madaline III.
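As a rough illustration of the perturbation-based update above, the following sketch (not from the text; the function name mrIII_step and the tiny two-layer sigmoid network are assumptions made for illustration) measures how the error changes when ε is added to one hidden node's net input and nudges that node's incoming weights accordingly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mrIII_step(W1, W2, x, d, eta=0.1, eps=1e-3):
    """One MRIII-style sweep over the hidden nodes of a small 2-layer sigmoid net.

    W1: (n_hidden, n_in) hidden weights; W2: (n_hidden,) output weights
    (single output unit); x: (n_in,) input pattern; d: scalar target.
    For each hidden node k, the network error E is re-measured after adding a
    small perturbation eps to that node's net input; node k's incoming weights
    are then changed by  dW = -eta * x * (E_k**2 - E**2) / eps.
    """
    for k in range(W1.shape[0]):
        net_h = W1 @ x
        h = sigmoid(net_h)
        E = float(d - sigmoid(W2 @ h))            # current network error
        h_pert = h.copy()
        h_pert[k] = sigmoid(net_h[k] + eps)       # perturb node k's net input
        E_k = float(d - sigmoid(W2 @ h_pert))     # error after the perturbation
        W1[k, :] -= eta * x * (E_k ** 2 - E ** 2) / eps
    return W1
```

Repeated over many pattern presentations, this behaves like a finite-difference estimate of gradient descent on the squared error, which is why MRIII is equivalent to backpropagation while requiring considerably more computation per weight change.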

* * *

4.2 Adaptive Multilayer Networks

Small networks are more desirable than large networks that perform the same task, because:
1. training is faster,
2. the number of parameters in the system is smaller,
3. fewer training samples are needed, and
4. the system is more likely to generalize well for new test samples.

There are three approaches to building a network of "optimal" size:
1. A large network may be built and then "pruned" by eliminating nodes and connections that can be considered unimportant.
2. Starting with a very small network, the size of the network is repeatedly increased by small increments until performance is satisfactory.
3. A sufficiently large network is trained, and unimportant connections and nodes are then pruned, following which new nodes with random weights are re-introduced and the network is retrained. This process is continued until a network of acceptable size and performance is obtained, or until further pruning attempts are unsuccessful.

These methods are based on sound heuristics, but are not guaranteed to result in a network of optimal size.

* * *

Network pruning algorithms train a large network and then repeatedly "prune" it until both the size and the performance of the network are satisfactory. Pruning algorithms use penalties to help determine whether some of the nodes and links in a trained network can be removed, leading to a network with better generalization properties. A generic algorithm for such methods is described below.

Algorithm Network-Pruning:
  Train a network large enough for the current problem;
  repeat
    Find a node or connection whose removal does not penalize performance beyond a predetermined level;
    Delete this node or connection;
    (Optional:) Retrain the resulting network;
  until further pruning degrades performance excessively.

Figure 4.4: Generic network pruning algorithm.
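The generic loop leaves the pruning criterion unspecified. One common instantiation, sketched below purely for illustration (the names prune_by_magnitude and eval_error, and the tolerance value, are not from the text), repeatedly deletes the smallest-magnitude remaining weight and stops when validation error degrades too much.

```python
import numpy as np

def prune_by_magnitude(weights, eval_error, max_degradation=0.02):
    """Repeatedly zero out the smallest-magnitude remaining weight.

    weights:    1-D array of trained connection weights (modified in place).
    eval_error: callable returning validation error for the current weights.
    Pruning stops once error rises more than max_degradation above its
    starting value; an optional retraining step could follow each deletion.
    """
    base_error = eval_error(weights)
    alive = weights != 0.0
    while alive.any():
        idx = np.argmin(np.where(alive, np.abs(weights), np.inf))
        saved = weights[idx]
        weights[idx] = 0.0                       # tentatively delete this connection
        if eval_error(weights) > base_error + max_degradation:
            weights[idx] = saved                 # undo: degradation is excessive
            break
        alive[idx] = False                       # deletion accepted
    return weights
```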

The following are some pruning procedures:

1. Connections associated with weights of small magnitude may be eliminated from the trained network. Nodes whose associated connections all have small-magnitude weights may also be pruned.

2. Connections whose existence does not significantly affect the network outputs (or error) may be pruned. Such connections may be detected by examining the change in network output when a connection weight is changed to 0, or by testing whether the relevant derivative is negligible, i.e., whether the network outputs change little when the weight is modified.

3. Input nodes can be pruned if the resulting change in network output is negligible. This reduces the relevant input dimensionality, by detecting which network inputs are unnecessary for output computation.

4. Le Cun, Denker, and Solla (1990) identify the weights that can be pruned from the network by examining the second derivatives of the error function. A Taylor expansion of the error about the current weight gives

  ΔE ≈ E' Δw + (1/2) E'' (Δw)^2,

where E' = ∂E/∂w and E'' = ∂^2E/∂w^2. If the network has already been trained until a local minimum of E has been reached, using an algorithm such as backpropagation, then E' ≈ 0, and the above expression simplifies to

  ΔE ≈ (1/2) E'' (Δw)^2.

Pruning a connection corresponds to changing the connection weight from w to 0, i.e., Δw = −w. The connection is pruned if

  ΔE = (1/2) E'' (−w)^2 = E'' w^2 / 2

is small enough. The pruning criterion hence examines whether E'' w^2 / 2 is below some threshold.
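A minimal sketch of this second-derivative criterion, assuming the diagonal second derivatives E'' have already been estimated for each weight; the helper names obd_saliencies and prune_least_salient are invented here.

```python
import numpy as np

def obd_saliencies(weights, second_derivs):
    """Estimated error increase for deleting each connection: E'' * w**2 / 2."""
    return 0.5 * second_derivs * weights ** 2

def prune_least_salient(weights, second_derivs, threshold):
    """Zero out every weight whose estimated error increase falls below threshold."""
    pruned = weights.copy()
    pruned[obd_saliencies(weights, second_derivs) < threshold] = 0.0
    return pruned
```

Both arrays have the same shape; after pruning, the remaining network is typically retrained, as in Figure 4.4.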

Marchand's algorithm obtains an "optimal"-size network for classification problems by repeatedly adding a perceptron node to the hidden layer. Let T^+_k and T^-_k represent the training samples of the two classes that remain to be correctly classified at the kth iteration. At each iteration, a node's weights are trained such that either |T^+_{k+1}| < |T^+_k| or |T^-_{k+1}| < |T^-_k|, and (T^-_{k+1} ∪ T^+_{k+1}) ⊂ (T^-_k ∪ T^+_k), ensuring that the algorithm eventually terminates at some step m, when either T^-_m or T^+_m is empty. Eventually, we have two sets of hidden units (H^- and H^+) that together give the correct answers on all samples of each class. The connection from the kth hidden node to the output node carries the weight

  w_k = 1/2^k if the kth node belongs to H^+, and w_k = −1/2^k if the kth node belongs to H^-.

This weight assignment scheme ensures that nodes added later do not modify the correct results obtained in earlier iterations of the algorithm, since |w_k| exceeds the combined magnitude of all later weights.
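A small illustrative check (not from the text) of why the 1/2^k weighting works: the contribution of the earliest active hidden unit always exceeds the combined contribution of all later units, so units added later cannot overturn a decision already made.

```python
def marchand_output(h, signs):
    """Threshold of sum_k signs[k] * (1 / 2**(k+1)) * h[k], with k = 0 the earliest unit.

    h:     list of 0/1 hidden-unit outputs, in the order the units were added.
    signs: +1 for units in H+, -1 for units in H-.
    """
    total = sum(s * out / 2 ** (k + 1) for k, (s, out) in enumerate(zip(signs, h)))
    return 1 if total > 0 else 0

# Even if every later unit pulls the other way, the first active unit wins,
# because 1/2**(k+1) exceeds the sum of all the later coefficients.
print(marchand_output([1, 1, 1, 1], [+1, -1, -1, -1]))   # prints 1
```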

* * *

The Upstart algorithm develops additional "subnets" for samples misclassified in earlier iterations. Each new node is trained using the pocket algorithm. If any samples are misclassified by a node, two subnets are introduced between the input nodes and other nodes in the network. Each subnet attempts to correctly separate the samples of one class misclassified by the "parent" node from all samples of the other class for which the parent node was invoked. This is a simpler task than the one confronted by the parent node, ensuring that the recursive algorithm terminates. Large-magnitude weights are attached to the connections from the new nodes to the parent node, in such a way that neither of the new nodes can affect the performance of the parent node on samples correctly classified by the parent alone. The upstart algorithm can be extended to introduce a small network module at a time, instead of a single node, as in the "block-start" algorithm.

* * *

The Neural Tree algorithm develops a decision tree in which the output of a node (∈ {0, 1}) determines whether the final answer is to come from its left subtree or its right subtree. Each node is trained using the LMS or pocket algorithm. A node is a leaf node (new subtrees are not required) if all or most of the corresponding training samples belong to the same class.

* * *

Cascade correlation: This algorithm has two important features: (1) cascade architecture development, and (2) correlation learning. Moreover, the architecture is not strictly feedforward. In the cascade architecture development process, new single-node hidden layers are successively added to a steadily growing layered neural network until performance is judged adequate. Each node may employ a nonlinear node function such as the hyperbolic tangent, whose output lies in the closed interval [−1.0, 1.0].

Fahlman and Lebiere suggest using the Quickprop learning algorithm. When a node is added, its input weights are trained first; then all the weights on the connections to the output layer are trained, while leaving the other weights unchanged. The weights to each new hidden node are trained to maximize the covariance of the node's output with the current network error, i.e., to maximize

  S(w_new) = Σ_{k=1}^{K} Σ_{p=1}^{P} (x_{new,p} − x̄_new)(E_{k,p} − Ē_k),

where w_new is the vector of weights to the new node from all the pre-existing input and hidden units, x_{new,p} is the output of the new node for the pth input, E_{k,p} is the error of the kth output node for the pth sample before the new node is added, and x̄_new and Ē_k are averages over the training set.
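A short sketch of the quantity being maximized, written for one candidate node (illustrative; the name candidate_score is invented, and the sketch follows the formula exactly as written above, whereas Fahlman and Lebiere's original objective sums the per-output covariances in absolute value). In practice the candidate's input weights w_new would be adjusted, e.g., by Quickprop or gradient ascent, to make this score as large as possible before the node is installed.

```python
import numpy as np

def candidate_score(x_new, E):
    """S(w_new) = sum_k sum_p (x_new_p - mean(x_new)) * (E_kp - mean_p(E_k)).

    x_new: shape (P,)   candidate-node outputs over the P training patterns.
    E:     shape (P, K) residual errors of the K output nodes, measured
           before the candidate node is added to the network.
    """
    xc = x_new - x_new.mean()
    Ec = E - E.mean(axis=0)
    return float(np.sum(xc @ Ec))
```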

Figure 4.5 (modified Fig. 4.19 from text): Cascade correlation example, showing the network as the first and then the second single-node hidden layers are added (inputs x, y, and a bias of 1); solid lines indicate connections currently being modified.

The Tiling algorithm develops a multilayer feedforward network, successively adding new hidden layers into which flow the outputs of the previous layer. Within each layer, a "master" node is first trained using the pocket algorithm, and then "ancillary" nodes are successively added to separate samples of different classes that are not yet separated by the nodes currently in that layer. Each layer is constructed such that, for every pair of training samples belonging to different classes, some node in that layer produces different outputs.

* * *

4.3 Prediction Networks

Prediction problems constitute a special subclass of function approximation problems, in which the values of variables need to be determined from their values at previous instants. Two classes of neural networks have been used for prediction tasks: recurrent networks and feedforward networks.

* * *

Recurrent Networks

Recurrent neural networks contain connections from output nodes to hidden-layer and/or input-layer nodes, and they allow interconnections between nodes of the same layer, particularly between the nodes of hidden layers. Rumelhart, Hinton, and Williams's (1986) training procedure is essentially the same as the backpropagation algorithm. They view recurrent networks as feedforward networks with a large number of layers, each layer thought of as representing a time delay in the network. Using this approach, the fully connected recurrent network with three nodes is considered equivalent to a feedforward neural network with k hidden layers, for some value of k. Weights in different layers are constrained to be identical, to capture the structure of the recurrent network:

  w^{(ℓ, ℓ−1)}_{i,j} = w^{(ℓ−1, ℓ−2)}_{i,j}.

Figure 4.6: (a) Fully connected recurrent neural network with 3 nodes; (b) the equivalent feedforward version, unrolled over time steps 0, 1, ..., k, k+1, used in Rumelhart's training procedure.

Figure 4.7: Recurrent network with hidden nodes, to which Williams and Zipser's algorithm can be applied.

Williams and Zipser proposed another training procedure, applicable to recurrent networks with hidden nodes. The net input to the kth node consists of inputs from other nodes as well as external inputs. The output of each node depends on the outputs of the other nodes at the previous instant:

  o_k(t + 1) = f(net_k(t)).

Algorithm Recurrent Network Training (Williams and Zipser):
  Assume randomly chosen weights; let t = 0 and π^k_{i,j}(0) = 0 for each i, j, k;
  while MSE is unsatisfactory and computational bounds are not exceeded, do
    Modify the weights:
      Δw_{i,j}(t) = η Σ_{k ∈ U} (d_k(t) − o_k(t)) π^k_{i,j}(t),
    where U is the set of nodes with a specified target value d_k(t);
    Update the sensitivities:
      π^k_{i,j}(t + 1) = f'(net_k(t)) [ Σ_{ℓ ∈ U} w_{k,ℓ} π^ℓ_{i,j}(t) + δ_{i,k} z_j(t) ],
    where δ_{i,k} is 1 if i = k and 0 otherwise, and z_j(t) is the jth input to the nodes at time t;
    Increment t;
  end-while.

Figure 4.8: Williams and Zipser's recurrent network training algorithm.
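A compact sketch of this update for a small fully recurrent network of tanh units (illustrative only; the array shapes, the tanh node function, and the name rtrl_step are choices made here, not taken from the text). The array p[k, i, j] plays the role of the sensitivities π^k_{i,j} above.

```python
import numpy as np

def rtrl_step(W, y, p, x, d, mask, eta=0.05):
    """One time step of Williams-Zipser training for n fully recurrent units.

    W:    (n, m+n)    weights; each unit sees m external inputs and the n unit outputs.
    y:    (n,)        current unit outputs o_k(t).
    p:    (n, n, m+n) sensitivities p[k, i, j] ~ d o_k / d w_ij.
    x:    (m,)        external input at time t.
    d, mask: (n,)     target values, and booleans marking the units that have targets.
    """
    n, m = len(y), len(x)
    z = np.concatenate([x, y])                   # combined input vector z(t)
    net = W @ z
    fprime = 1.0 - np.tanh(net) ** 2             # f'(net_k(t)) for tanh units

    # Weight update: dW_ij = eta * sum over target units k of (d_k - o_k) * p[k, i, j]
    err = np.where(mask, d - y, 0.0)
    W = W + eta * np.tensordot(err, p, axes=1)

    # Sensitivity update: p[k,i,j] <- f'(net_k) * (sum_l W[k, m+l] * p[l,i,j] + delta_ik * z_j)
    kron = np.zeros_like(p)
    kron[np.arange(n), np.arange(n), :] = z
    p = fprime[:, None, None] * (np.tensordot(W[:, m:], p, axes=1) + kron)

    y = np.tanh(net)                             # outputs o_k(t+1) for the next instant
    return W, y, p
```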

* * *

Feedforward networks for forecasting

The generic network model consists of a preliminary preprocessing component that transforms an external input vector x(t) into a preprocessed vector x̄(t). The feedforward network is then trained to compute the desired output value for a specific input x̄(t).

Figure 4.9: Generic neural network model for prediction: a preprocessor transforms x(t) into x̄(t), and a neural network predictor maps x̄(t) to the predicted x(t+1).

Tapped delay-line neural network (TDNN): Consider a prediction task where x(t) is to be predicted from x(t−1) and x(t−2). In this simple case, the external input at time t consists of the single value x(t), and the preprocessed vector x̄(t) is (x(t−1), x(t−2)), supplied as input to the feedforward network. For this example, preprocessing consists merely of storing past values of the variable and supplying them to the network along with the latest value. Many preprocessing transformations for prediction problems can be described as

  x̄_i(t) = Σ_{τ=0}^{t} c_i(t − τ) x(τ),

where each c_i is a known "kernel" function.
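A small sketch of this family of preprocessing transformations (names invented for illustration); a tapped delay line is the special case in which each kernel simply picks out one past value.

```python
def preprocess(history, kernels):
    """x_bar_i(t) = sum over tau = 0..t of c_i(t - tau) * x(tau).

    history: list [x(0), x(1), ..., x(t)] of past scalar inputs.
    kernels: one kernel function c_i(j) per preprocessed component.
    """
    t = len(history) - 1
    return [sum(c(t - tau) * history[tau] for tau in range(t + 1)) for c in kernels]

# Tapped delay line with two taps, so that x_bar(t) = (x(t-1), x(t-2)) as in the TDNN example:
delay_line = [lambda j: 1.0 if j == 1 else 0.0,
              lambda j: 1.0 if j == 2 else 0.0]
print(preprocess([3.0, 5.0, 7.0, 11.0], delay_line))   # -> [7.0, 5.0]
```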

Two further examples: in "exponential trace memories," the kernel is c_i(j) = (1 − μ_i) μ_i^j, where μ_i ∈ [−1, 1]; the kernel function for "discrete-time gamma memories" is

  c_i(j) = C(j, l_i) (1 − μ_i)^{l_i + 1} μ_i^{j − l_i}  if j ≥ l_i,  and  c_i(j) = 0 otherwise,

where C(j, l_i) is the binomial coefficient "j choose l_i", the delay l_i is a non-negative integer, and μ_i ∈ [0, 1].

* * *

4.4 Radial Basis Functions (RBF)

A function is radially symmetric (or is an RBF) if its output depends on the distance of the input vector from a stored vector specific to that function. Neural networks whose node functions are radially symmetric are referred to as RBF-nets. RBF nodes compute a non-increasing function ρ of distance, with ρ(u_1) ≥ ρ(u_2) whenever u_1 < u_2. The Gaussian,

  ρ_g(u) ∝ e^{−(u/c)^2},

is the most widely used RBF.

RBF-nets are generally called upon for use in function approximation problems, particularly for interpolation. In many function approximation problems, we need to determine the behavior of the function at a new input, given its behavior at the training samples. Such problems are often solved by linear interpolation.

Figure 4.10: Interpolation at a new point x_0 surrounded by observations (x_1 : 11), (x_2 : 8), (x_3 : 9), (x_4 : 3), (x_5 : 8), (x_6 : 17), (x_7 : 15), (x_8 : 10); D_j is the Euclidean distance between x_j and x_0, and (x_5 : 8) indicates that f(x_5) = 8.

The four nearest observations can be used for interpolation at x_0, giving

  (8 D_2^{−1} + 9 D_3^{−1} + 3 D_4^{−1} + 8 D_5^{−1}) / (D_2^{−1} + D_3^{−1} + D_4^{−1} + D_5^{−1}).

* * *

If nodes in an RBF-net with linear radial basis node functions ρ_0(D) = D^{−1}, corresponding to each of the training samples x_p for p = 1, ..., P_0, surround the new input x, the output of the network at x is proportional to

  (1/P_0) Σ_p d_p ρ_0(||x − x_p||),

where d_p = f(x_p) is the output corresponding to x_p, the pth pattern. To avoid the problem of deciding which of the training samples surround the new input, we may instead use the output

  (1/P) Σ_{p=1}^{P} d_p ρ(||x − x_p||),

where d_1, ..., d_P are the outputs corresponding to the entire set of training samples, and ρ is any RBF. For the network size to be reasonably small, we cannot have one node representing each x_p. Hence similar training samples are clustered together, and the output is

  o = (1/N) Σ_{i=1}^{N} φ_i ρ(||μ_i − x||),

where N is the number of clusters, μ_i is the center of the ith cluster, and φ_i is the desired mean output over all samples of the ith cluster.

Training involves learning the values of the weights w_1 = φ_1/N, ..., w_N = φ_N/N and the centers μ_1, ..., μ_N, minimizing

  E = Σ_{p=1}^{P} E_p = Σ_{p=1}^{P} (d_p − o_p)^2.

Gradient descent suggests the following update rules:

  Δw_i = η_i (d_p − o_p) ρ(||x_p − μ_i||)  and
  Δμ_{i,j} = −η_{i,j} w_i (d_p − o_p) R'(||x_p − μ_i||^2) (x_{p,j} − μ_{i,j}),

where R is defined for convenience such that R(z^2) = ρ(z). Note that the learning rates η_i and η_{i,j} may differ for each w_i and μ_{i,j}. This learning algorithm requires considerable computation. A faster alternative, "partially off-line training," consists of two steps:

1. Some clustering procedure is used to estimate the centroid μ_i and the spread of each cluster.
2. Using one node per cluster (with μ_i fixed), gradient descent is invoked to find the weights w_i.
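A rough sketch of the partially off-line procedure just described, under two assumptions made here for illustration: Gaussian basis functions of a fixed width, and a least-squares solve in place of iterating gradient descent for the output weights (for fixed centers both reach the same minimum of E). The centers would come from step 1, e.g., a k-means run; all names are invented.

```python
import numpy as np

def rbf_design_matrix(X, centers, width):
    """Gaussian responses rho(||x_p - mu_i||) for every pattern/center pair."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / width ** 2)

def train_rbf_weights(X, d, centers, width):
    """Step 2 of partially off-line training: centers fixed, solve for the weights."""
    Phi = rbf_design_matrix(X, centers, width)       # shape (P, N)
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)      # minimizes sum_p (d_p - o_p)**2
    return w

def rbf_predict(X, centers, width, w):
    return rbf_design_matrix(X, centers, width) @ w
```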

* * *

4.5 Polynomial Networks

Many practical problems require computing or approximating functions that are polynomials of the input variables, a task that can require many nodes and extensive training if familiar node functions (sigmoids, Gaussians, etc.) are employed. Networks whose node functions allow them to directly compute polynomials and functions of polynomials are referred to as "polynomial networks." The following are some examples (a small sketch of a higher-order unit appears at the end of this section).

1. Higher-order networks contain no hidden layers, but have nodes referred to as "higher-order processing units" that perform complex computations. A higher-order node applies a nonlinear function f to a polynomial in the input variables, and the resulting node output is

  f( w_0 + Σ_{j_1} w_{j_1} i_{j_1} + ... + Σ_{j_1, j_2, ..., j_k} w_{j_1 j_2 ... j_k} i_{j_1} i_{j_2} ... i_{j_k} ).

The LMS training algorithm can be applied to train such a network. However, the number of weights required increases rapidly with the input dimensionality n and the polynomial degree k.

2. Sigma-pi networks contain "sigma-pi units" that apply nonlinear functions to sums of weighted products of input variables:

  f( w_0 + ... + Σ_{j_1 ≠ j_2 ≠ j_3 ≠ j_1} w_{j_1 j_2 j_3} i_{j_1} i_{j_2} i_{j_3} + ... ).

This model does not allow terms with higher powers of the input variables; therefore sigma-pi networks are not universal approximators.

3. "Product units" are nodes that compute products Π_{j=1}^{n} i_j^{p_{j,i}}, where each exponent p_{j,i} is an integer whose value is to be determined. Networks may contain these in addition to some "ordinary" nodes that apply sigmoids or step functions to their net input. The learning algorithm must determine both the weights and the exponents p_{j,i} for product units, in addition to determining the weights for ordinary nodes. Backpropagation training may be used, and learning is slow.

4. Networks with the functional link architecture conduct linear regression, estimating the weights in Σ_j w_j φ_j(i), where each "basis function" φ_j is chosen from a predefined set of components (such as linear functions, products, and sinusoidal functions).

5. Pi-sigma networks contain a hidden layer in which each hidden node computes w_{k,0} + Σ_j w_{k,j} i_j, and the output node computes

  f( Π_k ( w_{k,0} + Σ_j w_{k,j} i_j ) ).

Weights between the input layer and the hidden layer are adapted during the training process, and can be trained quickly using the LMS algorithm.

6. Ridge-polynomial networks consist of components that generalize pi-sigma networks, and are trained using an adaptive network-construction algorithm. These networks have universal approximation capabilities.
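The sketch promised above: a single second-degree higher-order unit (item 1), written out explicitly. The degree-2 truncation and the sigmoid choice for f are assumptions made for illustration.

```python
import numpy as np

def higher_order_unit(i, w0, w1, w2):
    """f(w0 + sum_j w1[j]*i_j + sum_{j1,j2} w2[j1,j2]*i_{j1}*i_{j2}), with sigmoid f.

    i: (n,) inputs;  w1: (n,) first-order weights;  w2: (n, n) second-order weights.
    """
    poly = w0 + w1 @ i + i @ w2 @ i
    return 1.0 / (1.0 + np.exp(-poly))

# The polynomial terms are fixed functions of the inputs, so all weights enter linearly
# and the unit can be trained with the LMS rule on the expanded input vector
# (1, i_1, ..., i_n, i_1*i_1, i_1*i_2, ...), as noted for higher-order networks above.
```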

* * *

4.6 Regularization

Regularization involves optimizing a function E + λP, where E is the original cost (or error) function, P is a "stabilizer" that incorporates a priori problem-specific requirements or constraints, and λ is a constant that controls the relative importance of E and P. Regularization can be implemented explicitly or implicitly.

Examples of explicit regularization:

Introducing P = Σ_j w_j^2 into the cost function penalizes weights of large magnitude.

A weight decay term may be introduced into the generalized delta rule, so that the weight change Δw = −η(∂E/∂w) − λw depends on the negative gradient of the mean squared error as well as on the current weight. This favors the development of networks with smaller weight magnitudes (a small sketch of this update appears at the end of this section).

"Smoothing" penalties may be introduced into the error function, e.g., terms proportional to |∂^2E/∂w_i^2| and other higher-order derivatives of the error function. Such penalties prevent a network's output function from having very high curvature, and may prevent a network from over-specializing on the training data to account for outliers in the data.

Examples of implicit regularization:

Training data may be perturbed artificially, using Gaussian noise. This is equivalent to using a stabilizer that imposes a smoothness constraint on the derivative of the squared error function with respect to the input variables.

Adding random noise to the weights imposes a smoothness constraint on the derivatives of the squared error function with respect to the weights.

RBF-nets constitute a special case of networks that accomplish regularization.
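The weight-decay update referred to above, as a minimal sketch (the learning rate and decay constant are arbitrary illustrative values):

```python
def weight_decay_step(w, grad_E, eta=0.1, lam=1e-4):
    """Generalized delta rule with weight decay:  w <- w - eta * dE/dw - lam * w.

    The extra -lam*w term corresponds to adding a penalty proportional to
    sum(w**2) to the cost, so weights that the error gradient does not
    actively support shrink toward zero over many updates.
    """
    return w - eta * grad_E - lam * w
```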
