Extensive research has been conducted, aimed at developing
|
|
- Dorthy Sparks
- 5 years ago
- Views:
Transcription
1 Chapter 4 Supervised Learning: Multilayer Networks II Extensive research has been conducted, aimed at developing improved supervised learning algorithms for feedforward networks. 4.1 Madalines A \Madaline" is a combination of many Adalines, trained an algorithm that follows the \Minimum Disturbance" principle. Learning algorithms for Madalines have gone through three stages of development: MRI (Madaline Rule I), MRII and MRIII. 1
2 2 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Adaline x 1 OR x 2 Adaline Figure 4.1: Madaline I network with two Adalines \Madaline I" has an adaptive hidden layer and a xed logic device (AND/OR/majority gate) at the outer layer. The goal of its training procedure is to decrease network error (at each input presentation) by making the smallest possible perturbation to the weights of some Adaline. Note that the Adaline output must change in sign in order to have any eect on the network output. If the net input into an Adaline is large, its output reversal requires a large amount ofchange in its incoming weights. Therefore, the algorithm attempts to nd an existing Adaline (hidden node) whose net input is smallest in magnitude, and whose output reversal would reduce
3 4.1. MADALINES 3 network error. This output reversal is accomplished by applying the LMS algorithm to weights on connections leading into that Adaline. The number of Adalines for which output reversal is necessary depends on the choice of the outermost node function. If the outermost node computes the OR of its inputs, then network output can be changed from 0 to 1by modifying a single Adaline, but changing network output from 1 to 0 requires modifying all the Adalines for which current outputis1. * ::::::::::::::::::::::::::::::::::::::::::::::::::* Madaline II architecture uses Adalines with modiable weights at the output layer of the network, instead of xed logic devices. The weights are initialized to small random values, and training patterns are repeatedly presented. The algorithm successively modies weights in earlier layers before later ones, using the LMS or -LMS rule. Adalines whose weights have already been adjusted
4 4 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Adaline Adaline Adaline Figure 4.2: Madaline II with one hidden layer may be revisited, when new input patterns are presented. It is possible to adjust the weights of several Adalines simultaneously. Since the node functions are not dierentiable, there can be no explicit gradient descent. The algorithm may repeatedly cycle through the same sequence of states. * ::::::::::::::::::::::::::::::::::::::::::::::::::* Madaline III networks are feedforward, with sigmoid node functions. Weights of all nodes are adapted in each iteration, using the MRIII training algorithm. Although the approach is equivalent to backpropagation, each weight change involves considerably more computation.
5 4.1. MADALINES 5 Algorithm MRIII repeat Present a training pattern i to the network Compute node outputs for h = 1 to the number of hidden layers, do for each nodek in layer h, do Let E be the current network error Let network error be E k when >0 (small) is added to the kth node's net input Adjust weights on connections to kth node using or w = ;i(e 2 k ; E2 )= w = ;ie(e k ; E)= end-for end-for until network error is satisfactory or upper bound on no. of iterations is reached. Figure 4.3: MRIII Training Algorithm for Madaline III
6 6 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II * ::::::::::::::::::::::::::::::::::::::::::::::::::* 4.2 Adaptive Multilayer Networks Small networks are more desirable than large networks that perform the same task because: 1. training is faster, 2. the number of parameters in the system is smaller, 3. fewer training samples are needed, 4. the system is more likely to generalize well for new test samples. There are three approaches to build a network of \optimal" size: 1. A large network may be built and then \pruned" by eliminating nodes and connections that can be considered unimportant. 2. Starting with a very small network, the size of the network is repeatedly increased by small increments
7 4.2. ADAPTIVE MULTILAYER NETWORKS 7 until performance is satisfactory. 3. A suciently large network is trained, and unimportant connections and nodes are then pruned, following which new nodes with random weights are re-introduced and the network is retrained. This process is continued until a network of acceptable size and performance levels is obtained, or further pruning attempts become unsuccessful. These methods are based on sound heuristics, but are not guaranteed to result in an optimal size network. * ::::::::::::::::::::::::::::::::::::::::::::::::::* Network pruning algorithms train a large network and then repeatedly `prune' it until size and performance of the network are both satisfactory. Pruning algorithms use penalties to help determine if some of the nodes and links in a trained network can be removed, leading to a network with better generalization properties. A generic algorithm for such methods is described.
8 8 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Algorithm Network-Pruning: Train a network large enough for the current problem repeat Find a node or connection whose removal does not penalize performance beyond a predetermined level Delete this node or connection (Optional:) Retrain the resulting network until further pruning degrades performance excessively. Figure 4.4: Generic network pruning algorithm.
9 4.2. ADAPTIVE MULTILAYER NETWORKS 9 The following are some pruning procedures: 1. Connections associated with weights of small magnitude may be eliminated from the trained network. Nodes whose associated connections have small magnitude weights may also be pruned. 2. Connections whose existence does not signicantly aect network outputs (or error) may be pruned. These may be detected by examining the change in network output when a connection weightischanged to 0 or by testing is negligible, i.e., if the network outputs change little when a weight is modied. 3. Input nodes can be pruned if the resulting change in network output is negligible. This results in a reduction in relevant input dimensionality, by detecting whichnetwork inputs are unnecessary for output computation.
10 10 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II 4. Lecun, Denker and Solla (1990) identify the weights that can be pruned from the network, by examining the second derivatives of the error function. Note that w+1 2 (E00 )(w) 2 where E If a network has already been trained until a local minimum of E has been reached, using an algorithm such as backpropagation, 0, and the above equation simplies to E 1 2 (E00 )(w) 2 : Pruning a connection corresponds to changing the connection weight fromw to 0, i.e., w = ;w. The connection is pruned if E = 1 2 (E00 )(;w) 2 =(E 00 )w 2 =2 is small enough. The pruning criterion hence examines whether (E 00 )w 2 =2 is below some threshold. :
11 4.2. ADAPTIVE MULTILAYER NETWORKS 11 Marchand's algorithm obtains an \optimal" size network for classication problems, repeatedly adding a perceptron node to the hidden layer. Let T + k and T ; k represent the training samples of two classes that remain to be correctly classied at the kth iteration. At each iteration, a node's weights are trained such that either jt + k+1 j < jt + k j or jt ; k+1 j < jt ; k (T ; k+1 [ T + k+1 ) (T ; k [ T + k j, and ), ensuring that the algorithm terminates eventually at the mth step, when either T ; m or T + m is empty. Eventually, wehave two sets of hidden units (H ; and H + ), that together give the correct answers on all samples of each class. The connection from the kth hidden node to the output node carries the weight w k = 8 < : 1=2 k if the kth node belongs to H + ;1=2 k if the kth node belongs to H ; The above weight assignment scheme ensures that nodes added later do not modify the correct results obtain in earlier iterations of the algorithm.
12 12 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II * ::::::::::::::::::::::::::::::::::::::::::::::::::* Upstart algorithm develops additional \subnets" for samples misclassied in earlier iterations. Each new node is trained using the pocket algorithm. If any samples are misclassied by a node, two subnets are introduced between the input nodes and other nodes in the network. Each subnet attempts to correctly separate samples of one class misclassied by the \parent" node from all samples of the other class for which the parent node was invoked. This is a simpler task than the one confronted by the parent node, ensuring that the recursive algorithm terminates. Large magnitude weights are attached to the connections from new nodes to the parent node, in such away that neither of the new nodes can aect performance of the parent nodeon samples correctly classied by the parent alone. The upstart algorithm can be extended to introduce a small network module at a time, instead of a single node, as in the \block-start" algorithm.
13 4.2. ADAPTIVE MULTILAYER NETWORKS 13 * ::::::::::::::::::::::::::::::::::::::::::::::::::* Neural Tree algorithm develops a decision tree, in which the output of a node (2 f0 1g) determines whether the nal answer is to come from its left subtree or its right subtree. Each node is trained using the LMS or pocket algorithm. A node is a leaf node (new subtrees are not required) if all or most of the corresponding training samples belong to the same class. * ::::::::::::::::::::::::::::::::::::::::::::::::::* Cascade correlation: This algorithm has two important features: (1) the cascade architecture development, and (2) correlation learning. Moreover, the architecture is not strictly feedforward. In the cascade architecture development process, new single-node hidden layers are successively added to a steadily growing layered neural network until performance is judged adequate. Each node may employ a nonlinear node function such asthehyperbolic tangent, whose output lies in the closed interval [;1:0 1:0].
14 14 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Fahlman and Lebiere suggest using the Quickprop learning algorithm. When a node is added, its input weights are trained rst. Then all the weights on the connections to the output layer are trained while leaving other weights unchanged. Weights to each new hidden node are trained to maximize covariance with current network error, i.e., to maximize S(w new )= KX k=1 PX p=1 (x new p ; x new )(E k p ; Ek ) where w new is the vector of weights to the new node from all the pre-existing input and hidden units, x new p is the output of the new node for the pth input, E k p is the error of the kth output node for the pth sample before the new node is added, and x new and Ek are averages (over the training set).
15 4.2. ADAPTIVE MULTILAYER NETWORKS 15 Output node Output node Output node First hidden node First hidden node x y 1 x y 1 x y 1 Input nodes Input nodes Input nodes Output node Output node Second hidden node Second hidden node First hidden node First hidden node x y 1 x y 1 Input nodes Input nodes Figure 4.5: (modied Fig.4.19 from text) Cascade correlation example solid lines indicate connections currently being modied.
16 16 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Tiling algorithm develops a multilayer feedforward network, successively adding new hidden layers into which ow the outputs of the previous layer. Within each layer, a \master" node is rst trained using the pocket algorithm, and then \ancillary" nodes are successively added to separate samples of dierent classes that are not yet separated by nodes currently in that layer. Each layer is constructed such that for every pair of training samples that belong to dierent classes, some node in that layer produces dierent outputs. * ::::::::::::::::::::::::::::::::::::::::::::::::::* 4.3 Prediction Networks Prediction problems constitute a special subclass of function approximation problems, in which the values of variables need to be determined from values at previous instants. Two classes of neural networks have been used for prediction tasks: recurrent networks and feedforward networks.
17 4.3. PREDICTION NETWORKS 17 * ::::::::::::::::::::::::::::::::::::::::::::::::::* Recurrent Networks Recurrent neural networks contain connections from output nodes to hidden layer and/or input layer nodes, and they allow interconnections between nodes of the same layer, particularly between the nodes of hidden layers. Rumelhart, Hinton, and Williams's (1986) training procedure is essentially the same as the backpropagation algorithm. They view recurrent networks as feedforward networks with a large number of layers. Each layer is thought of as representing a time delay in the network. Using this approach, the fully connected neural network with three nodes is considered equivalent to a feedforward neural network with k hidden layers, for some value of k. Weights in dierent layers are constrained to be identical, to capture the structure of a recurrent network: w (` `;1) i j = w (`;1 `;2) i j.
18 18 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Output Output Output Time w 3,1 k+1 w 1,1 k w 1, w 1,1. w 3,1 1 w 1,3 Input Input Input 0 (a) (b) Figure 4.6: (a) Fully connected recurrent neural network with 3 nodes (b) Equivalent feedforward version, for Rumelhart's training procedure.
19 4.3. PREDICTION NETWORKS 19 Output Output Input Figure 4.7: Recurrent network with hidden nodes, to which Williams and Zipser's algorithm can be applied. Williams and Zipser proposed another training procedure for a recurrent network with hidden nodes. The net input to the kth node consists of inputs from other nodes as well as external inputs. The output of each node depends on outputs of other nodes at the previous instant: o k (t +1)=f(net k (t)). * ::::::::::::::::::::::::::::::::::::::::::::::::::* Feedforward networks for forecasting The generic network model consists of a preliminary preprocessing component that transforms an external
20 20 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II Algorithm Recurrent Network Training: Assume randomly chosen weights, t = 0, k i j =0 for each i j k: while MSE is unsatisfactory and k (t i j bounds are not exceeded, do Modify the weights: w i j (t) = X k2u (d k (t) ; o k i j where U is the set of nodes with a specied target value d k (t) = f 0 (net k (t)) " X `2U w k i j # + i k z j (t) Increment t end-while. Figure 4.8: Williams and Zipser's Recurrent network training algorithm
21 4.3. PREDICTION NETWORKS 21 input vector x(t) into a preprocessed vector x(t). The feedforward network is trained to compute the desired output value for a specic input x(t). x(t) Preprocessor x(t) Predicted Neural x(t+1) Network Predictor Figure 4.9: Generic neural network model for prediction. Tapped delay-line neural network (TDNN): Consider a prediction task where x(t) is to be predicted from x(t ; 1) x(t ; 2): In this simple case, x at time t consists of a single input x(t), and x at time t consists of the vector (x(t ; 1) x(t ; 2)) supplied as input to the feedforward network. For this example, preprocessing consists merely of storing past values of the variable and supplying them to the network along with the latest value. Many preprocessing transformations for prediction problems can be described as: x i (t) = tx =0 c i (t ; )x()
22 22 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II where c is a known \kernel" function. For example, in \exponential trace memories," c i (j) =(1; i ) j i, where i 2 [;1 1]. The kernel function for \discrete-time gamma memories" is c i (j) = 8 < : ; j l i (1 ; i ) l i +1 j;l i i 0 otherwise if j l i where delay l i is a non-negative integer and i 2 [0 1]. * ::::::::::::::::::::::::::::::::::::::::::::::::::* 4.4 Radial Basis Functions (RBF) A function is radially symmetric (or is an RBF) if its output depends on the distance of the input vector from a stored vector specic to that function. Neural networks whose node functions are radially symmetric functions are referred to as RBF-nets. RBF nodes compute a non-increasing function of distance, with (u 1 ) (u 2 ) whenever u 1 <u 2. The Gaussian, g (u) / e ;(u=c)2
23 4.4. RADIAL BASIS FUNCTIONS (RBF) 23 is the most widely used RBF. RBF-nets are generally called upon for use in function approximation problems, particularly for interpolation. In many function approximation problems, we needto determine the behavior of the function at a new input, given the behavior of the function at training samples. Such problems are often solved by linear interpolation. (x 1 :11) (x 2 :8) D 2 D 3 (x 3 :9) (x 6 :17) (x D 5 x 8 :10) 0 (x 7 :15) D 4 (x 5 :8) (x 4 :3) Figure 4.10: D j is the Euclidean distance between x j and x 0. (x 5 : 8) indicates that f(x 5 )=8. The four nearest observations can be used for interpolation at x 0,giving (8D ;1 2 +9D ;1 3 +3D ;1 4 +8D ;1 5 )=(D ;1 2 + D ;1 3 + D ;1 4 + D ;1 5 ). * ::::::::::::::::::::::::::::::::::::::::::::::::::* If nodes in an RBF-net with linear radial basis node functions 0 (D) = D ;1 corresponding to each of the
24 24 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II training samples x p for p =1 ::: P 0 surround the new input x, the output of the network at x is proportional to 1 X P 0 p d p 0 (kx ; x p k) where d p = f(x p ) is the output corresponding to x p, the p-th pattern. To avoid the problem of deciding which of the training samples do surround the new input, we may use the output 1 where d 1 ::: d P P PX p=1 d p (kx ; x p k) are the outputs corresponding to the entire set of training samples, and is any RBF. For network size to be reasonably small, we cannot have one node to representeach x p. Hence similar training samples are clustered together, and output NX o = 1 N i=1 where N is the number of clusters, i is the center of ' i (k i ; xk) the ith cluster, and ' i is the desired mean output of all samples of the ith cluster.
25 4.4. RADIAL BASIS FUNCTIONS (RBF) 25 Training involves learning the values of w 1 = ' 1 N ::: w N = ' N N minimizing E = PX p=1 E p = PX p=1 1 ::: N (d p ; o p ) 2 : Gradient descent suggests the following update rules: w i = i (d p ; o p )(kx p ; i k)and i j = ; i j w i (d p ; o p )R 0 (kx p ; i k 2 )(x p j ; i j ) where R is dened for convenience such thatr(z 2 ) = (z). Note that i and i j may dier for each w i and i j. This learning algorithm requires considerable computation. A faster alternative, \partially o-line training," consists of two steps: 1. Some clustering procedure is used to estimate the centroid ( i ) and spread for each cluster. 2. Using one node per cluster (with xed i ), gradient descent one is invoked to nd weights (w i ). * ::::::::::::::::::::::::::::::::::::::::::::::::::*
26 26 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II 4.5 Polynomial Networks Many practical problems require computing or approximating functions that are polynomials of the input variables, a task that can require many nodes and extensive training if familiar node functions (sigmoids, Gaussians, etc.) are employed. Networks whose node functions allow them to directly compute polynomials and functions of polynomials are referred to as \polynomial networks". The following are some examples. 1. Higher-order networks contain no hidden layers, but have nodes referred to as \higher-order processing units" that perform complex computations. A higherorder node applies a nonlinear function f to a polynomial in the input variables, and the resulting node output is f(w 0 + X j 1 w j1 i j1 + :::+ X j 1 j 2 ::: j k w j1 j 2 ::: j k i j1 i j2 :::i jk )
27 4.5. POLYNOMIAL NETWORKS 27 The LMS training algorithm can be applied to train such a network. However, the number of weights required increases rapidly with the input dimensionality n and polynomial degree k. 2. Sigma-pi networks contain \sigma-pi units" that apply nonlinear functions to sums of weighted products of input variables: f(w 0 + :::+ X j 1 6=j 2 6=j 3 6=j 1 w j1 j 2 j 3 i j1 i j2 i j3 + :::) This model does not allow terms with higher powers of input variables, therefore sigma-pi networks are not universal approximators. 3. \Product units" are nodes that compute products n j=1 ip j i j, where each p j i is an integer whose value is to be determined. Networks may contain these in addition to some \ordinary" nodes that apply sigmoids or step functions to their net input. The learning algorithm must determine both the weights and exponents p ji for product units, in addition to deter-
28 28 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II mining weights for ordinary nodes. Backpropagation training may beused,andlearningisslow. 4. Networks with functional link architecture conduct linear regression, estimating the weights in P j w j j (i), where each \basis function" j is chosen from a predened set of components (such as linear functions, products and sinusoidal functions). 5. Pi-sigma networks contain a hidden layer, in which each hidden node computes w k 0 + P j w k ji j, and the output node computes f Q k(w k 0 + P j w k ji j ) : Weights between the input layer and the hidden layer are adapted during the training process, and can be trained quickly using the LMS algorithm. 6. Ridge-polynomial networks consist of components that generalize Pi-sigma networks, and are trained using an adaptive network construction algorithm. These networks have universal approximation capabilities. * ::::::::::::::::::::::::::::::::::::::::::::::::::*
29 4.6. REGULARIZATION Regularization Regularization involves optimizing a function E +jp j 2, where E is the original cost (or error) function, P is a \stabilizer" that incorporates a priori problem-specic requirements or constraints, and is a constant that controls the relative importance of E and P. Regularization can be implemented explicitly or implicitly. Examples of explicit regularization: Introduction of P = P j w2 j into the cost function penalizes large magnitude weights. A weight decay term may be introduced into the generalized delta rule, so that the weight change w = ;(@E=@w) ; w depends on the negative gradient of the mean square error as well as the current weight. This favors the development ofnetworks with smaller weight magnitudes.
30 30 CHAPTER 4. SUPERVISED LEARNING: MULTILAYER NETWORKS II \Smoothing" penalties may beintroduced into the error function, e.g., terms proportional to j 2 E=wi 2 j and other higher-order derivatives of the error function. Such penalties prevent a network's output function from having very high curvature, and may prevent anetwork from over-specializing on the training data to account for outliers in the data. Examples of implicit regularization: Training data may be perturbed articially, using Gaussian noise. This is equivalent to using a stabilizer that imposes a smoothness constraint on the derivative of the squared error function with respect to the input variables. Adding random noise to the weights imposes a smoothness constraint on the derivatives of the squared error function with respect to the weights. RBF-nets constitute a special case of networks that accomplish regularization.
APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES
APPLICATION OF THE FUZZY MIN-MAX NEURAL NETWORK CLASSIFIER TO PROBLEMS WITH CONTINUOUS AND DISCRETE ATTRIBUTES A. Likas, K. Blekas and A. Stafylopatis National Technical University of Athens Department
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationA Population-Based Learning Algorithm Which Learns Both. Architectures and Weights of Neural Networks y. Yong Liu and Xin Yao
A Population-Based Learning Algorithm Which Learns Both Architectures and Weights of Neural Networks y Yong Liu and Xin Yao Computational Intelligence Group Department of Computer Science University College,
More information5 Learning hypothesis classes (16 points)
5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated
More informationAnalysis of Decision Boundaries Generated by Constructive Neural Network Learning Algorithms
Computer Science Technical Reports Computer Science 995 Analysis of Decision Boundaries Generated by Constructive Neural Network Learning Algorithms ChunHsien Chen Iowa State University R. G. Parekh Iowa
More informationClassifier C-Net. 2D Projected Images of 3D Objects. 2D Projected Images of 3D Objects. Model I. Model II
Advances in Neural Information Processing Systems 7. (99) The MIT Press, Cambridge, MA. pp.949-96 Unsupervised Classication of 3D Objects from D Views Satoshi Suzuki Hiroshi Ando ATR Human Information
More informationAssignment 2. Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions
ENEE 739Q: STATISTICAL AND NEURAL PATTERN RECOGNITION Spring 2002 Assignment 2 Classification and Regression using Linear Networks, Multilayer Perceptron Networks, and Radial Basis Functions Aravind Sundaresan
More informationArtificial neural networks are the paradigm of connectionist systems (connectionism vs. symbolism)
Artificial Neural Networks Analogy to biological neural systems, the most robust learning systems we know. Attempt to: Understand natural biological systems through computational modeling. Model intelligent
More informationArtificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5
Artificial Neural Networks Lecture Notes Part 5 About this file: If you have trouble reading the contents of this file, or in case of transcription errors, email gi0062@bcmail.brooklyn.cuny.edu Acknowledgments:
More informationNeuro-Remodeling via Backpropagation of Utility. ABSTRACT Backpropagation of utility is one of the many methods for neuro-control.
Neuro-Remodeling via Backpropagation of Utility K. Wendy Tang and Girish Pingle 1 Department of Electrical Engineering SUNY at Stony Brook, Stony Brook, NY 11794-2350. ABSTRACT Backpropagation of utility
More informationFor Monday. Read chapter 18, sections Homework:
For Monday Read chapter 18, sections 10-12 The material in section 8 and 9 is interesting, but we won t take time to cover it this semester Homework: Chapter 18, exercise 25 a-b Program 4 Model Neuron
More informationThis leads to our algorithm which is outlined in Section III, along with a tabular summary of it's performance on several benchmarks. The last section
An Algorithm for Incremental Construction of Feedforward Networks of Threshold Units with Real Valued Inputs Dhananjay S. Phatak Electrical Engineering Department State University of New York, Binghamton,
More information4. Feedforward neural networks. 4.1 Feedforward neural network structure
4. Feedforward neural networks 4.1 Feedforward neural network structure Feedforward neural network is one of the most common network architectures. Its structure and some basic preprocessing issues required
More informationData Mining. Neural Networks
Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most
More informationLab 2: Support Vector Machines
Articial neural networks, advanced course, 2D1433 Lab 2: Support Vector Machines March 13, 2007 1 Background Support vector machines, when used for classication, nd a hyperplane w, x + b = 0 that separates
More informationJ. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report. February 5, 1998
Density Estimation using Support Vector Machines J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, C. Watkins. Technical Report CSD-TR-97-3 February 5, 998!()+, -./ 3456 Department of Computer Science
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationMODIFIED KALMAN FILTER BASED METHOD FOR TRAINING STATE-RECURRENT MULTILAYER PERCEPTRONS
MODIFIED KALMAN FILTER BASED METHOD FOR TRAINING STATE-RECURRENT MULTILAYER PERCEPTRONS Deniz Erdogmus, Justin C. Sanchez 2, Jose C. Principe Computational NeuroEngineering Laboratory, Electrical & Computer
More informationNeural Network Learning. Today s Lecture. Continuation of Neural Networks. Artificial Neural Networks. Lecture 24: Learning 3. Victor R.
Lecture 24: Learning 3 Victor R. Lesser CMPSCI 683 Fall 2010 Today s Lecture Continuation of Neural Networks Artificial Neural Networks Compose of nodes/units connected by links Each link has a numeric
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More informationLinear Separability. Linear Separability. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons
Linear Separability Input space in the two-dimensional case (n = ): - - - - - - w =, w =, = - - - - - - w = -, w =, = - - - - - - w = -, w =, = Linear Separability So by varying the weights and the threshold,
More information3 Nonlinear Regression
CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationof Perceptron. Perceptron CPU Seconds CPU Seconds Per Trial
Accelerated Learning on the Connection Machine Diane J. Cook Lawrence B. Holder University of Illinois Beckman Institute 405 North Mathews, Urbana, IL 61801 Abstract The complexity of most machine learning
More informationConstructively Learning a Near-Minimal Neural Network Architecture
Constructively Learning a Near-Minimal Neural Network Architecture Justin Fletcher and Zoran ObradoviC Abetract- Rather than iteratively manually examining a variety of pre-specified architectures, a constructive
More informationNeural Network Neurons
Neural Networks Neural Network Neurons 1 Receives n inputs (plus a bias term) Multiplies each input by its weight Applies activation function to the sum of results Outputs result Activation Functions Given
More informationA Systematic Overview of Data Mining Algorithms
A Systematic Overview of Data Mining Algorithms 1 Data Mining Algorithm A well-defined procedure that takes data as input and produces output as models or patterns well-defined: precisely encoded as a
More informationThe recursive element of the Upstart algorithm.
The Upstart Algorithm : a method for constructing and training feed-forward neural networks Marcus Frean Department of Physics and Centre for Cognitive Science Edinburgh University, The Kings Buildings
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More information3 Nonlinear Regression
3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear
More informationCMPT 882 Week 3 Summary
CMPT 882 Week 3 Summary! Artificial Neural Networks (ANNs) are networks of interconnected simple units that are based on a greatly simplified model of the brain. ANNs are useful learning tools by being
More informationA NEW METHOD FOR MULTILAYER NEURAL NETWORKS. A Thesis. Submitted to the Graduate School. of the University of Notre Dame
A NEW METHOD FOR CONSTRUCTING AND TRAINING MULTILAYER NEURAL NETWORKS A Thesis Submitted to the Graduate School of the University of Notre Dame in Partial Fulllment of the Requirements for the Degree of
More informationReinforcement Control via Heuristic Dynamic Programming. K. Wendy Tang and Govardhan Srikant. and
Reinforcement Control via Heuristic Dynamic Programming K. Wendy Tang and Govardhan Srikant wtang@ee.sunysb.edu and gsrikant@ee.sunysb.edu Department of Electrical Engineering SUNY at Stony Brook, Stony
More informationBasis Functions. Volker Tresp Summer 2017
Basis Functions Volker Tresp Summer 2017 1 Nonlinear Mappings and Nonlinear Classifiers Regression: Linearity is often a good assumption when many inputs influence the output Some natural laws are (approximately)
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationIEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER 1999 1271 Inverting Feedforward Neural Networks Using Linear and Nonlinear Programming Bao-Liang Lu, Member, IEEE, Hajime Kita, and Yoshikazu
More informationBumptrees for Efficient Function, Constraint, and Classification Learning
umptrees for Efficient Function, Constraint, and Classification Learning Stephen M. Omohundro International Computer Science Institute 1947 Center Street, Suite 600 erkeley, California 94704 Abstract A
More informationSupervised Learning in Neural Networks (Part 2)
Supervised Learning in Neural Networks (Part 2) Multilayer neural networks (back-propagation training algorithm) The input signals are propagated in a forward direction on a layer-bylayer basis. Learning
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More information2. Neural network basics
2. Neural network basics Next commonalities among different neural networks are discussed in order to get started and show which structural parts or concepts appear in almost all networks. It is presented
More informationArtificial Neural Networks MLP, RBF & GMDH
Artificial Neural Networks MLP, RBF & GMDH Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical
More informationCOMPUTATIONAL INTELLIGENCE
COMPUTATIONAL INTELLIGENCE Radial Basis Function Networks Adrian Horzyk Preface Radial Basis Function Networks (RBFN) are a kind of artificial neural networks that use radial basis functions (RBF) as activation
More informationUsing Local Trajectory Optimizers To Speed Up Global. Christopher G. Atkeson. Department of Brain and Cognitive Sciences and
Using Local Trajectory Optimizers To Speed Up Global Optimization In Dynamic Programming Christopher G. Atkeson Department of Brain and Cognitive Sciences and the Articial Intelligence Laboratory Massachusetts
More informationAdaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh
Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented which, for a large-dimensional exponential family G,
More informationNeural Networks: A Classroom Approach Satish Kumar Department of Physics & Computer Science Dayalbagh Educational Institute (Deemed University)
Chapter 6 Supervised Learning II: Backpropagation and Beyond Neural Networks: A Classroom Approach Satish Kumar Department of Physics & Computer Science Dayalbagh Educational Institute (Deemed University)
More informationPattern Classification Algorithms for Face Recognition
Chapter 7 Pattern Classification Algorithms for Face Recognition 7.1 Introduction The best pattern recognizers in most instances are human beings. Yet we do not completely understand how the brain recognize
More informationIMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM
Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve
More informationSupport Vector Machines
Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining
More information1. Approximation and Prediction Problems
Neural and Evolutionary Computing Lab 2: Neural Networks for Approximation and Prediction 1. Approximation and Prediction Problems Aim: extract from data a model which describes either the depence between
More informationSimple Model Selection Cross Validation Regularization Neural Networks
Neural Nets: Many possible refs e.g., Mitchell Chapter 4 Simple Model Selection Cross Validation Regularization Neural Networks Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February
More informationModel learning for robot control: a survey
Model learning for robot control: a survey Duy Nguyen-Tuong, Jan Peters 2011 Presented by Evan Beachly 1 Motivation Robots that can learn how their motors move their body Complexity Unanticipated Environments
More informationCSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks
CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks Part IV 1 Function approximation MLP is both a pattern classifier and a function approximator As a function approximator,
More information2. CNeT Architecture and Learning 2.1. Architecture The Competitive Neural Tree has a structured architecture. A hierarchy of identical nodes form an
Competitive Neural Trees for Vector Quantization Sven Behnke and Nicolaos B. Karayiannis Department of Mathematics Department of Electrical and Computer Science and Computer Engineering Martin-Luther-University
More informationConnectivity and Performance Tradeoffs in the Cascade Correlation Learning Architecture
Connectivity and Performance Tradeoffs in the Cascade Correlation Learning Architecture D. S. Phatak and I. Koren Department of Electrical and Computer Engineering University of Massachusetts, Amherst,
More informationMulti Layer Perceptron with Back Propagation. User Manual
Multi Layer Perceptron with Back Propagation User Manual DAME-MAN-NA-0011 Issue: 1.3 Date: September 03, 2013 Author: S. Cavuoti, M. Brescia Doc. : MLPBP_UserManual_DAME-MAN-NA-0011-Rel1.3 1 INDEX 1 Introduction...
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationNotes on Multilayer, Feedforward Neural Networks
Notes on Multilayer, Feedforward Neural Networks CS425/528: Machine Learning Fall 2012 Prepared by: Lynne E. Parker [Material in these notes was gleaned from various sources, including E. Alpaydin s book
More informationIntroduction to Neural Networks: Structure and Training
Introduction to Neural Networks: Structure and Training Qi-Jun Zhang Department of Electronics Carleton University, Ottawa, ON, Canada A Quick Illustration Example: Neural Network Model for Delay Estimation
More informationA Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York
A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine
More informationConstructive Feedforward Neural. ASurvey. Tin-Yau Kwok & Dit-Yan Yeung. Hong Kong University of Science and Technology. Clear Water Bay, Kowloon
Constructive Feedforward Neural Networks for Regression Problems: ASurvey Tin-Yau Kwok & Dit-Yan Yeung Technical Report HKUST-CS95-43 September 1995 Department of Computer Science Hong Kong University
More informationLearning via Optimization
Lecture 7 1 Outline 1. Optimization Convexity 2. Linear regression in depth Locally weighted linear regression 3. Brief dips Logistic Regression [Stochastic] gradient ascent/descent Support Vector Machines
More informationAssignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation
Farrukh Jabeen Due Date: November 2, 2009. Neural Networks: Backpropation Assignment # 5 The "Backpropagation" method is one of the most popular methods of "learning" by a neural network. Read the class
More informationNeuro-Fuzzy Inverse Forward Models
CS9 Autumn Neuro-Fuzzy Inverse Forward Models Brian Highfill Stanford University Department of Computer Science Abstract- Internal cognitive models are useful methods for the implementation of motor control
More informationContrained K-Means Clustering 1 1 Introduction The K-Means clustering algorithm [5] has become a workhorse for the data analyst in many diverse elds.
Constrained K-Means Clustering P. S. Bradley K. P. Bennett A. Demiriz Microsoft Research Dept. of Mathematical Sciences One Microsoft Way Dept. of Decision Sciences and Eng. Sys. Redmond, WA 98052 Renselaer
More informationConstructive Neural Network Learning Algorithms for Multi-Category Pattern Classification
Computer Science Technical Reports Computer Science 12-5-1995 Constructive Neural Network Learning Algorithms for Multi-Category Pattern Classification Raesh G. Parekh Iowa State University Jihoon Yang
More informationLocally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005
Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997
More information11/14/2010 Intelligent Systems and Soft Computing 1
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationIntroduction to ANSYS DesignXplorer
Lecture 4 14. 5 Release Introduction to ANSYS DesignXplorer 1 2013 ANSYS, Inc. September 27, 2013 s are functions of different nature where the output parameters are described in terms of the input parameters
More informationCHAPTER VI BACK PROPAGATION ALGORITHM
6.1 Introduction CHAPTER VI BACK PROPAGATION ALGORITHM In the previous chapter, we analysed that multiple layer perceptrons are effectively applied to handle tricky problems if trained with a vastly accepted
More informationNetworks for Control. California Institute of Technology. Pasadena, CA Abstract
Learning Fuzzy Rule-Based Neural Networks for Control Charles M. Higgins and Rodney M. Goodman Department of Electrical Engineering, 116-81 California Institute of Technology Pasadena, CA 91125 Abstract
More informationA Novel Pruning Algorithm for Optimizing Feedforward Neural Network of Classification Problems
Chapter 5 A Novel Pruning Algorithm for Optimizing Feedforward Neural Network of Classification Problems 5.1 Introduction Many researchers have proposed pruning algorithms in numerous ways to optimize
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationData Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4.
Data Mining Chapter 4. Algorithms: The Basic Methods (Covering algorithm, Association rule, Linear models, Instance-based learning, Clustering) 1 Covering approach At each stage you identify a rule that
More informationIntroduction to Machine Learning
Introduction to Machine Learning Isabelle Guyon Notes written by: Johann Leithon. Introduction The process of Machine Learning consist of having a big training data base, which is the input to some learning
More informationCenter for Automation and Autonomous Complex Systems. Computer Science Department, Tulane University. New Orleans, LA June 5, 1991.
Two-phase Backpropagation George M. Georgiou Cris Koutsougeras Center for Automation and Autonomous Complex Systems Computer Science Department, Tulane University New Orleans, LA 70118 June 5, 1991 Abstract
More information6.034 Quiz 2, Spring 2005
6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationIn this assignment, we investigated the use of neural networks for supervised classification
Paul Couchman Fabien Imbault Ronan Tigreat Gorka Urchegui Tellechea Classification assignment (group 6) Image processing MSc Embedded Systems March 2003 Classification includes a broad range of decision-theoric
More informationA B. A: sigmoid B: EBA (x0=0.03) C: EBA (x0=0.05) U
Extending the Power and Capacity of Constraint Satisfaction Networks nchuan Zeng and Tony R. Martinez Computer Science Department, Brigham Young University, Provo, Utah 8460 Email: zengx@axon.cs.byu.edu,
More informationTraining Digital Circuits with Hamming Clustering
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 47, NO. 4, APRIL 2000 513 Training Digital Circuits with Hamming Clustering Marco Muselli, Member, IEEE, and Diego
More information6. Backpropagation training 6.1 Background
6. Backpropagation training 6.1 Background To understand well how a feedforward neural network is built and it functions, we consider its basic first steps. We return to its history for a while. In 1949
More informationSoft Threshold Estimation for Varying{coecient Models 2 ations of certain basis functions (e.g. wavelets). These functions are assumed to be smooth an
Soft Threshold Estimation for Varying{coecient Models Artur Klinger, Universitat Munchen ABSTRACT: An alternative penalized likelihood estimator for varying{coecient regression in generalized linear models
More informationResearch on outlier intrusion detection technologybased on data mining
Acta Technica 62 (2017), No. 4A, 635640 c 2017 Institute of Thermomechanics CAS, v.v.i. Research on outlier intrusion detection technologybased on data mining Liang zhu 1, 2 Abstract. With the rapid development
More informationDESIGN OF NEURAL NETWORKS FOR FAST CONVERGENCE AND ACCURACY: DYNAMICS AND CONTROL
DESIGN OF NEURAL NETWORKS FOR FAST CONVERGENCE AND ACCURACY: DYNAMICS AND CONTROL Peiman G. Maghami and Dean W. Sparks, Jr. Abstract-A procedure for the design and training of artificial neural networks,
More informationExercise: Training Simple MLP by Backpropagation. Using Netlab.
Exercise: Training Simple MLP by Backpropagation. Using Netlab. Petr Pošík December, 27 File list This document is an explanation text to the following script: demomlpklin.m script implementing the beckpropagation
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationData Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationCOMP 551 Applied Machine Learning Lecture 16: Deep Learning
COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all
More informationDr. Qadri Hamarsheh Supervised Learning in Neural Networks (Part 1) learning algorithm Δwkj wkj Theoretically practically
Supervised Learning in Neural Networks (Part 1) A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. Variety of learning algorithms are existing,
More informationNonparametric Regression
Nonparametric Regression John Fox Department of Sociology McMaster University 1280 Main Street West Hamilton, Ontario Canada L8S 4M4 jfox@mcmaster.ca February 2004 Abstract Nonparametric regression analysis
More informationAccelerometer Gesture Recognition
Accelerometer Gesture Recognition Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate
More informationWeek 3: Perceptron and Multi-layer Perceptron
Week 3: Perceptron and Multi-layer Perceptron Phong Le, Willem Zuidema November 12, 2013 Last week we studied two famous biological neuron models, Fitzhugh-Nagumo model and Izhikevich model. This week,
More informationRichard S. Zemel 1 Georey E. Hinton North Torrey Pines Rd. Toronto, ONT M5S 1A4. Abstract
Developing Population Codes By Minimizing Description Length Richard S Zemel 1 Georey E Hinton University oftoronto & Computer Science Department The Salk Institute, CNL University oftoronto 0 North Torrey
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationThe task of inductive learning from examples is to nd an approximate definition
1 Initializing Neural Networks using Decision Trees Arunava Banerjee 1.1 Introduction The task of inductive learning from examples is to nd an approximate definition for an unknown function f(x), given
More information