Multi Layer Perceptron trained by Quasi Newton learning rule

Feed-forward neural networks provide a general framework for representing non-linear functional mappings between a set of input variables and a set of output variables (Bishop 2006). This goal is achieved by representing the non-linear function of many variables as a composition of non-linear activation functions of one variable:

$$y_k = \sum_{j=1}^{M} w_{kj}^{(2)}\, g\!\left(\sum_{i=1}^{d} w_{ji}^{(1)}\, x_i\right) \qquad (1)$$

A Multi-Layer Perceptron (MLP) may be represented by a graph: the input layer (x_i) is made of a number of perceptrons equal to the number of input variables (d), while the output layer has as many neurons as the output variables (K). The network may have an arbitrary number of hidden layers (in most cases one), each of which may in turn have an arbitrary number of perceptrons (M). In a fully connected feed-forward network, each node of a layer is connected to all the nodes of the adjacent layers. Each connection carries an adaptive weight, which represents the strength of the synaptic connection between neurons (w_kj^(l)). The response of each perceptron to its inputs is given by a non-linear function g, referred to as the activation function. Notice that equation (1) assumes a linear activation function for the neurons of the output layer. We shall refer to the topology of an MLP, together with the weight matrix of its connections, as the model.

In order to find the model that best fits the data, the network must be provided with a set of examples: the training phase therefore requires the Knowledge Base (KB), i.e. the training set.

The learning rule of our MLP is the Quasi Newton Algorithm (QNA). In general, Quasi Newton Algorithms are variable metric methods used to find local maxima and minima of functions (Davidon 1968) and, in the case of MLPs, they can be used to find the stationary (i.e. zero gradient) point of the learning function. The Newton method is the general basis for a whole family of so-called Quasi Newton methods. One of those methods, implemented here, is the L-BFGS algorithm (Byrd et al. 1994; Broyden 1970; Fletcher 1970; Goldfarb 1970; Shanno 1970). More rigorously, the QNA is an optimization of the learning rule, also because, as described below, the implementation is based on a statistical approximation of the Hessian obtained through cyclic gradient calculations which, as said in the previous section, are at the basis of the Back Propagation (BP; Bishop 2006) method.

As is well known, the classical Newton method uses the Hessian of the function. The step of the method is defined as the product of the inverse Hessian matrix and the function gradient. If the function is a positive definite quadratic form, the minimum is reached in one step; in the case of an indefinite quadratic form (which has no minimum), the method reaches a maximum or a saddle point. In short, the method finds the stationary point of a quadratic form. In practice we usually deal with functions which are not quadratic forms, but if such a function is smooth it is sufficiently well described by a quadratic form in the neighborhood of the minimum. However, the Newton method can converge both to a minimum and to a maximum (taking a step in the direction of increasing function values). Quasi Newton methods solve this problem as follows: they use a positive definite approximation instead of the exact Hessian. If the Hessian is positive definite, the step is taken using the Newton method; if the Hessian is indefinite, it is first modified to make it positive definite, and then the Newton step is performed.
The step is always performed in the direction of decreasing function values. When the Hessian is positive definite, it is used to generate a quadratic approximation of the surface, which should improve convergence; when the Hessian is indefinite, the step simply moves to where the function decreases. Some modifications of Quasi Newton methods perform a precise linear minimum search along the indicated direction, but it has been proved that it is enough to decrease the function value sufficiently, without finding a precise minimum. The L-BFGS algorithm tries to perform a step using the Newton method; if this does not lead to a decrease of the function value, it shortens the step length until a smaller function value is found. Up to here the procedure seems quite simple, but it is not: the Hessian of a function is not always available and in many cases it is far too complicated, while more often only the function gradient can be calculated. Therefore, the following operation is used: the Hessian of the function is generated on the basis of N consecutive gradient calculations, and then the Quasi Newton step is performed.
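As an illustration of the safeguard just described (making the Hessian positive definite before taking a Newton step), the following minimal sketch clamps the Hessian eigenvalues to a small positive value and then computes the step. It assumes a NumPy environment with a gradient and Hessian supplied by the user; it is only a toy version of the idea, not the MLPQNA implementation.

```python
import numpy as np

def safeguarded_newton_step(grad, hess, eps=1e-6):
    """Return the step -B^{-1} g, where B is the Hessian made positive
    definite by clamping its eigenvalues (toy safeguard)."""
    # Symmetric eigendecomposition of the (possibly indefinite) Hessian.
    eigval, eigvec = np.linalg.eigh(hess)
    # Clamp eigenvalues so the modified Hessian is positive definite.
    eigval = np.maximum(eigval, eps)
    # Newton step computed with the modified Hessian.
    hess_pd_inv = eigvec @ np.diag(1.0 / eigval) @ eigvec.T
    return -hess_pd_inv @ grad

# Example on a 2-D quadratic with an indefinite Hessian (a saddle point):
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Hessian
print(safeguarded_newton_step(g, H))       # a descent direction despite the saddle
```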

There are special formulas which allow a Hessian approximation to be obtained iteratively; at each approximation step the matrix remains positive definite. The L-BFGS algorithm does not generate the Hessian itself but directly its inverse matrix, so no time is wasted inverting the Hessian.

In order to better understand the process at the basis of the Quasi Newton method, let us start from the classical Gradient Descent Algorithm (Back Propagation; Bishop 2006). In the standard GDA, the direction of each updating step is calculated through the descent gradient of the error, while its length is determined by the learning rate. A more sophisticated approach is to move along the negative gradient direction (line search direction) not by a fixed length, but until the minimum of the function along that direction is reached. This is possible by calculating the descent gradient and analyzing its behavior with the variation of the learning rate (Brescia 2012). Suppose that at step t the current weight vector is $w(t)$ and consider a search direction $d(t) = -\nabla E(t)$. If we select the parameter $\lambda$ in order to minimize $E(\lambda) = E\bigl(w(t) + \lambda\, d(t)\bigr)$, the new weight vector can then be expressed as:

$$w(t+1) = w(t) + \lambda\, d(t) \qquad (2)$$

The line search problem is in practice a one-dimensional minimization problem. A simple solution is to vary $\lambda$ in small intervals, evaluate the error function at each new position, and stop when the error starts to increase. There exist many other methods to solve this problem. For example, the parabolic search of a minimum calculates the parabolic curve crossing pre-defined learning rate points: the minimum of the parabolic curve crossing the fixed points with the lowest error values is a good approximation of the minimum of $E(\lambda)$. There are also trust-region strategies for finding the minimum of an error function, whose main concept is to iteratively grow or contract a region of the function over which a quadratic model function is adjusted to better approximate the error function. In this sense the technique is considered dual to line search, because it tries to find the best size of the region by preliminarily fixing the moving step, whereas the line search strategy always chooses the step direction before selecting the step size (Celis et al. 1985).

Up to now we have supposed that the optimal search direction for a line-search based method is given at each step by the negative gradient. That is not always true. If the minimization is done along the negative gradient, the next search direction (the new gradient) will be orthogonal to the previous one; in fact, when the line search finds the minimum we have

$$\frac{\partial}{\partial \lambda}\, E\bigl(w(t) + \lambda\, d(t)\bigr) = 0 \qquad (3)$$

and hence

$$g(t+1)^{\mathrm{T}}\, d(t) = 0 \qquad (4)$$

where $g(t+1) \equiv \nabla E(t+1)$. By always selecting the next direction equal to the negative gradient, oscillations of the error function are obtained which slow down the convergence process. The solution is to select the next directions so that the gradient component parallel to the previous search direction (which is zero) remains unchanged at each step. Suppose we have already minimized with respect to the direction $d(t)$, starting from the point $w(t)$ and reaching the point $w(t+1)$.
At the point $w(t+1)$, relation (4) reads $g(t+1)^{\mathrm{T}}\, d(t) = 0$; by choosing the next direction $d(t+1)$ so that the gradient component parallel to $d(t)$ remains equal to zero, i.e. $g\bigl(w(t+1) + \lambda\, d(t+1)\bigr)^{\mathrm{T}}\, d(t) = 0$, it is possible to build a sequence of directions $d$ such that each direction is conjugate to the previous ones over the dimension $W$ of the search space (conjugate gradients method; Golub et al. 1999).
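Before moving on to the conjugate gradient formulas, the simple bracketing line search mentioned above (vary λ in small steps and stop as soon as the error starts to increase) can be sketched as follows. The step size, growth factor and function names are illustrative assumptions of this sketch, not part of the MLPQNA code.

```python
import numpy as np

def simple_line_search(error_fn, w, d, lam0=1e-3, grow=2.0, max_iter=50):
    """Toy line search along direction d: enlarge lambda until the error
    starts to rise, then return the best lambda found so far."""
    best_lam, best_err = 0.0, error_fn(w)
    lam = lam0
    for _ in range(max_iter):
        err = error_fn(w + lam * d)
        if err >= best_err:              # error started to increase: stop
            break
        best_lam, best_err = lam, err
        lam *= grow                      # keep moving along d
    return best_lam

# Example with a 1-D quadratic error E(w) = (w - 3)^2, searching along +w:
E = lambda w: (w[0] - 3.0) ** 2
print(simple_line_search(E, np.array([0.0]), np.array([1.0])))  # coarse estimate of the minimum at 3
```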

In the presence of a quadratic error function, an algorithm based on this technique has a weight update of the form:

$$w(t+1) = w(t) + \alpha(t)\, d(t) \qquad (5)$$

with

$$\alpha(t) = -\frac{d(t)^{\mathrm{T}}\, g(t)}{d(t)^{\mathrm{T}}\, H\, d(t)} \qquad (6)$$

Furthermore, $d$ is taken for the first step as the negative gradient, and afterwards as a linear combination of the current gradient and of the previous search direction:

$$d(t+1) = -g(t+1) + \beta(t)\, d(t) \qquad (7)$$

with

$$\beta(t) = \frac{g(t+1)^{\mathrm{T}}\, H\, d(t)}{d(t)^{\mathrm{T}}\, H\, d(t)} \qquad (8)$$

This algorithm finds the minimum of a quadratic error function in at most $W$ steps, where $W$ is the dimension of the weight space. On the other hand, the computational cost of each step is high, because the determination of the values of $\alpha$ and $\beta$ involves the Hessian matrix $H$, which is highly expensive in terms of calculations. Fortunately, the coefficients $\alpha$ and $\beta$ can be obtained from analytical expressions that do not use the Hessian matrix explicitly. For example, the term $\beta$ can be calculated in one of the following ways:

1) expression of Polak-Ribiere:
$$\beta(t) = \frac{g(t+1)^{\mathrm{T}}\,\bigl(g(t+1) - g(t)\bigr)}{g(t)^{\mathrm{T}}\, g(t)}$$

2) expression of Hestenes-Stiefel:
$$\beta(t) = \frac{g(t+1)^{\mathrm{T}}\,\bigl(g(t+1) - g(t)\bigr)}{d(t)^{\mathrm{T}}\,\bigl(g(t+1) - g(t)\bigr)}$$

3) expression of Fletcher-Reeves:
$$\beta(t) = \frac{g(t+1)^{\mathrm{T}}\, g(t+1)}{g(t)^{\mathrm{T}}\, g(t)}$$

These expressions are equivalent if the error function is exactly quadratic; otherwise they assume different values. Typically the Polak-Ribiere expression obtains better results because, if the algorithm is slow and consecutive gradients are very similar to each other, it produces values of $\beta$ such that the search direction tends to assume the negative gradient direction (Vetterling et al. 1992), which corresponds to a restart of the procedure. Concerning the parameter $\alpha$, its value can be obtained by using the line search method directly.

The method of conjugate gradients reduces the number of steps needed to minimize the error to at most $W$, because there can be at most $W$ conjugate directions in a $W$-dimensional space. In practice, however, the algorithm is slower because during the learning process the conjugacy of the search directions tends to deteriorate. To avoid the deterioration it is useful to restart the algorithm after $W$ steps, by resetting the search direction to the negative gradient direction.

By using a local quadratic approximation of the error function, we can obtain an expression for the position of the minimum. The gradient at every point $w$ is in fact given by:

$$\nabla E = H\,(w - w^{*}) \qquad (9)$$

where $w^{*}$ corresponds to the minimum of the error function, which therefore satisfies the condition:

$$w^{*} = w - H^{-1}\, \nabla E \qquad (10)$$
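As a compact illustration of equations (5)-(8) with the Polak-Ribiere choice of β and the periodic restart discussed above, here is a sketch of a non-linear conjugate gradient loop. The `line_search` argument stands for any one-dimensional minimizer (for instance the toy bracketing search sketched earlier); the routine is an illustrative sketch under these assumptions, not the MLPQNA code.

```python
import numpy as np

def conjugate_gradient(error_fn, grad_fn, w, line_search, n_iter=100, tol=1e-8):
    """Non-linear CG with Polak-Ribiere beta and a restart every W steps."""
    g = grad_fn(w)
    d = -g                                    # first direction: negative gradient
    for t in range(n_iter):
        alpha = line_search(error_fn, w, d)   # 1-D minimization along d (eq. 5)
        w = w + alpha * d
        g_new = grad_fn(w)
        if np.linalg.norm(g_new) < tol:       # gradient small enough: stop
            break
        beta = g_new @ (g_new - g) / (g @ g)  # Polak-Ribiere expression
        if (t + 1) % len(w) == 0:             # restart after W steps
            beta = 0.0
        d = -g_new + beta * d                 # eq. (7)
        g = g_new
    return w
```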

The vector $H^{-1}\, \nabla E$ is known as the Newton direction and it is the basis for a variety of optimization strategies, such as for instance the QNA which, instead of calculating the matrix $H$ and then its inverse, uses a series of intermediate steps of lower computational cost to generate a sequence of matrices which are increasingly accurate approximations of $H^{-1}$. From the Newton formula (10) we note that the weight vectors at steps $t$ and $t+1$ are related to the corresponding gradients by the formula:

$$w(t+1) - w(t) = H^{-1}\,\bigl(g(t+1) - g(t)\bigr) \qquad (11)$$

known as the Quasi Newton condition. The approximation $G$ of the inverse Hessian is therefore built so as to satisfy this condition. The formula for $G$ is:

$$G(t+1) = G(t) + \frac{p\, p^{\mathrm{T}}}{p^{\mathrm{T}} v} - \frac{G(t)\, v\, v^{\mathrm{T}}\, G(t)}{v^{\mathrm{T}}\, G(t)\, v} + \bigl(v^{\mathrm{T}}\, G(t)\, v\bigr)\, u\, u^{\mathrm{T}} \qquad (12)$$

where the vectors are

$$p = w(t+1) - w(t), \qquad v = g(t+1) - g(t), \qquad u = \frac{p}{p^{\mathrm{T}} v} - \frac{G(t)\, v}{v^{\mathrm{T}}\, G(t)\, v} \qquad (13)$$

Initializing the procedure with the identity matrix is equivalent to taking, at the first step, the direction of the negative gradient, while at each subsequent step the direction $-G\, g$ is guaranteed to be a descent direction. The above expression could, however, carry the search outside the interval of validity of the quadratic approximation; the solution is then to use the line search to find the minimum of the function along the search direction. With such a scheme, the weight update (5) becomes:

$$w(t+1) = w(t) - \alpha(t)\, G(t)\, g(t) \qquad (14)$$

where $\alpha$ is obtained by the line search.

The following algorithm describes the MLP trained by the QNA method. Let us consider a generic MLP with weight vector $w(t)$ at time $t$:

1) Initialize all weights $w(0)$ with small random values (typically normalized in [-1, 1]), set the error threshold $\varepsilon$, set $t = 0$ and $G(0) = I$;
2) Present the whole training set to the network and calculate $E(w(t))$, the error function for the current weight configuration;
3) If $t = 0$
   a) then $d(t) = -g(t)$;
   b) else $d(t) = -G(t-1)\, g(t)$;
4) Calculate $w(t+1) = w(t) + \alpha(t)\, d(t)$, where $\alpha$ is obtained by the line search (cf. equation (6));
5) Calculate $G(t)$ with equation (12);
6) If $E(w(t+1)) > \varepsilon$, then set $t = t+1$ and go to 2; else STOP.

One of the main advantages of QNA, compared with conjugate gradients, is that the line search does not require the calculation of $\alpha$ with high precision, because it is not a critical parameter. On the contrary, the downside is that a large amount of memory is required to store the $W \times W$ matrix $G$ when $W$ is large.
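The core of the QNA, the inverse-Hessian update of equations (12)-(13), can be sketched as a single function. Variable names follow the text, NumPy is assumed, and the routine is only illustrative, not the MLPQNA implementation.

```python
import numpy as np

def quasi_newton_update(G, w_new, w_old, g_new, g_old):
    """One update of the inverse-Hessian approximation G (eq. 12-13)."""
    p = (w_new - w_old).reshape(-1, 1)    # weight change
    v = (g_new - g_old).reshape(-1, 1)    # gradient change
    pv = (p.T @ v).item()                 # scalar p^T v
    Gv = G @ v
    vGv = (v.T @ Gv).item()               # scalar v^T G v
    u = p / pv - Gv / vGv                 # eq. (13)
    # eq. (12): rank-two correction plus the u u^T term.
    return G + (p @ p.T) / pv - (Gv @ Gv.T) / vGv + vGv * (u @ u.T)
```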

One way to reduce the required memory is to replace at each step the matrix $G$ with the identity matrix. With such a replacement, and multiplying by $g$ (the current gradient), we obtain:

$$d(t+1) = -g(t+1) + A\, p + B\, v \qquad (15)$$

Note that if the line search returns exact values, the above equation produces mutually conjugate directions. $A$ and $B$ are scalar values defined as:

$$A = -\left(1 + \frac{v^{\mathrm{T}} v}{p^{\mathrm{T}} v}\right)\frac{p^{\mathrm{T}}\, g(t+1)}{p^{\mathrm{T}} v} + \frac{v^{\mathrm{T}}\, g(t+1)}{p^{\mathrm{T}} v}, \qquad B = \frac{p^{\mathrm{T}}\, g(t+1)}{p^{\mathrm{T}} v} \qquad (16)$$

As discussed, we use a slightly modified version of the QNA, known as L-QNA or limited-memory QNA (L-BFGS; Nocedal 1980). The algorithm of the MLP trained with L-QNA is the following (a toy sketch of this direction update is given after this discussion). Let us consider a generic MLP with weight vector $w(t)$ at time $t$:

1) Initialize all weights $w(0)$ with small random values (typically normalized in [-1, 1]), set the error threshold $\varepsilon$, set $t = 0$;
2) Present the whole training set to the network and calculate $E(w(t))$, the error function for the current weight configuration;
3) If $t = 0$
   a) then $d(t) = -g(t)$;
   b) else $d(t) = -g(t) + A\, p + B\, v$, where $p = w(t) - w(t-1)$ and $v = g(t) - g(t-1)$;
4) Calculate $w(t+1) = w(t) + \alpha(t)\, d(t)$, where $\alpha$ is obtained by the line search (cf. equation (6));
5) Calculate $A$ and $B$ for the next iteration, as reported in (16);
6) If $E(w(t+1)) > \varepsilon$, then set $t = t+1$ and go to 2; else STOP.

Note that the algorithm works well even with approximate values of $\alpha$. During the exploration of the parameter space, in order to find the direction of minimum error, QNA starts in the "wrong" direction: at the first step the method can only follow the error gradient, and so it takes the direction of steepest descent. In the subsequent steps, however, it incorporates the information gathered from the gradients along the steps taken, building up an approximate model of the Hessian.

As is well known, all line search methods, being based on techniques that search for the minimum error by exploring the error function surface, are likely to get stuck in a local minimum. Many solutions have been proposed in the literature (Floudas and Jongen 2005). Incorporating a random component into the weight update is one general way to escape a local minimum. Genetic Algorithms have also been employed to deal with this problem, by proceeding through multiple initial weight settings and recombining trained weights during the process (Fu 1994); the cost of both approaches, however, is a prolonged training time. In order to accelerate the convergence of GDA, Newton's method uses the information contained in the second-order derivatives. QNA is able to further optimize the convergence time by approximating second-order information with first-order terms (Shanno 1990). By exploiting the information of the second derivatives, QNA is able to avoid local minima of the error function and to follow the error function trend more precisely, revealing a natural capability to find the absolute minimum of the optimization problem. This last feature, however, can become a downside of the model, especially when the signal-to-noise ratio of the data is very poor; with clean data, such as the high quality spectroscopic redshifts used for model training, the QNA performance turns out to be extremely precise.

In the L-BFGS version of the algorithm, when the dimensionality is large, the amount of memory required to store a Hessian is too big, along with the machine time required to process it. Therefore, instead of using the complete number of gradient values to generate a Hessian, a smaller number of values can be used. On the one hand the convergence slows down; on the other hand the performance may even improve. At first sight this statement seems paradoxical, but it contains no contradiction: the convergence is measured by the number of iterations, whereas the performance depends on the number of processor time units spent to calculate the result.
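For completeness, here is a toy version of the limited-memory direction update of equations (15)-(16) inside a bare training loop. The arguments `error_fn`, `grad_fn` and `line_search` are placeholders for the user's error function, its gradient and any one-dimensional minimizer; this memoryless sketch keeps only one gradient pair and illustrates the formulas, it is not the MLPQNA/L-BFGS implementation (which stores several past pairs).

```python
import numpy as np

def lqna_train(error_fn, grad_fn, line_search, w, eps=1e-6, max_iter=200):
    """Toy memoryless quasi-Newton loop using the direction of eq. (15)-(16)."""
    g = grad_fn(w)
    d = -g                                       # step 3a: steepest descent at t = 0
    for _ in range(max_iter):
        alpha = line_search(error_fn, w, d)      # step 4: 1-D search along d
        w_new = w + alpha * d
        g_new = grad_fn(w_new)
        if error_fn(w_new) <= eps:               # step 6: stopping condition
            return w_new
        p, v = w_new - w, g_new - g              # weight and gradient changes
        pv = p @ v
        B = (p @ g_new) / pv                     # eq. (16)
        A = -(1.0 + (v @ v) / pv) * B + (v @ g_new) / pv
        d = -g_new + A * p + B * v               # eq. (15): new search direction
        w, g = w_new, g_new
    return w
```

Only a handful of vectors per iteration are needed by this update, which is why the memory footprint stays linear in the number of weights rather than quadratic, at the price of a rougher Hessian approximation.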

Related to the computational cost there is also the strategy adopted in terms of stopping criteria for the method. As is well known, the process of adjusting the weights based on the gradients is repeated until a minimum is reached. In practice, one has to decide the stopping condition of the algorithm. Several criteria exist; the most commonly used are: (i) terminating the algorithm when the gradient is sufficiently small (by definition the gradient is zero at a minimum); (ii) terminating when the error to be minimized falls below a fixed threshold; (iii) terminating on the basis of cross validation. Cross validation can be used to monitor the generalization performance during training and to terminate the algorithm when there is no further improvement. The basic mechanism consists of dividing the data into a training and a test set: the network is trained on the training set and its performance is evaluated on the test set. Statistically significant results are obtained by trying multiple independent data partitions and averaging the performance. The first two criteria are mainly sensitive to the choice of specific parameters and may lead to poor results if the parameters are improperly set. Cross validation does not suffer from this drawback: it avoids overfitting the data and improves the generalization performance of the model; however, it is much more computationally expensive.
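As an illustration of the cross-validation based stopping criterion (iii), the sketch below monitors the error on a held-out test set and stops training when it no longer improves. The helpers `train_one_epoch` and `evaluate` are assumptions of this sketch, standing in for one pass of the learning rule and for the error evaluation respectively.

```python
def train_with_validation_stop(train_one_epoch, evaluate, w, train_set, test_set,
                               patience=10, max_epochs=1000):
    """Stop when the test-set error has not improved for `patience` epochs."""
    best_w, best_err, stall = w, evaluate(w, test_set), 0
    for _ in range(max_epochs):
        w = train_one_epoch(w, train_set)        # one pass of the learning rule
        err = evaluate(w, test_set)              # generalization estimate
        if err < best_err:
            best_w, best_err, stall = w, err, 0  # improvement: keep these weights
        else:
            stall += 1                           # no improvement this epoch
            if stall >= patience:
                break                            # terminate: no further improvement
    return best_w
```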

REFERENCES

Bishop, C. M., Pattern Recognition and Machine Learning. 2006, Springer.
Brescia, M., New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases. 2012, contribution to the volume Horizons in Computer Science Research, T. S. Clary (ed.), Series Horizons in Computer Science, Vol. 7, Nova Science Publishers.
Broyden, C. G., The convergence of a class of double-rank minimization algorithms. 1970, Journal of the Institute of Mathematics and Its Applications, Vol. 6.
Byrd, R. H., et al., 1994, Mathematical Programming, 63, 4.
Celis, M.; Dennis, J. E.; Tapia, R. A., A trust region strategy for nonlinear equality constrained optimization. 1985, in Numerical Optimization, P. Boggs, R. Byrd and R. Schnabel (eds), SIAM, Philadelphia, USA.
Davidon, W. C., 1968, Comput. J., 10, 406.
Fletcher, R., A New Approach to Variable Metric Algorithms. 1970, Computer Journal, Vol. 13.
Floudas, C. A.; Jongen, H. Th., Global Optimization: Local Minima and Transition Points. 2005, Journal of Global Optimization, Vol. 32, No. 3.
Fu, LiMin, Neural Networks in Computer Intelligence. 1994, E. M. Munson and L. Goldberg (eds), McGraw-Hill, NY.
Goldfarb, D., A Family of Variable Metric Updates Derived by Variational Means. 1970, Mathematics of Computation, Vol. 24.
Golub, G. H.; Ye, Q., Inexact Preconditioned Conjugate Gradient Method with Inner-Outer Iteration. 1999, SIAM Journal of Scientific Computation, Vol. 21.
Nocedal, J., Updating Quasi-Newton Matrices with Limited Storage. 1980, Mathematics of Computation, Vol. 35.
Shanno, D. F., Conditioning of quasi-Newton methods for function minimization. 1970, Mathematics of Computation, Vol. 24.
Shanno, D. F., Recent Advances in Numerical Techniques for Large-Scale Optimization. 1990, in Neural Networks for Control, MIT Press, Cambridge, MA.
Vetterling, W. T.; Flannery, B. P., Conjugate Gradients Methods in Multidimensions. 1992, in Numerical Recipes in C: The Art of Scientific Computing, W. H. Press and S. A. Teukolsky (eds), Cambridge University Press, 2nd edition.
