Training of Neural Networks. Q.J. Zhang, Carleton University

1 Training of Neural Networks

2 Notation: x: input of the original modeling problem or the neural network; y: output of the original modeling problem or the neural network; w: internal weights/parameters of the neural network; m: number of outputs of the model; y = f(x, w): neural network model; d: data for y (e.g., training data)

3 Define Model Input-Output: define the model input-output (x, y), for example, x: physical/geometrical parameters of the component; y: S-parameters of the component

4 Data Generation: (a) Generate (x, y) samples (x_k, y_k), k = 1, 2, ..., P, such that the finished NN best (accurately) represents the original x~y problem. (b) Data generator. Measurement: for each given x_k, measure the values of y_k, k = 1, 2, ..., P. Simulation: for each given x_k, use a simulator to calculate y_k, k = 1, 2, ..., P.

5 Comparison of Neural Network Based Microwave Model Development Using Data from Two Types of Data Generators

Availability of problem theory/equations:
- Measurement data: the model can be developed even if the theory/equations are not known, or are difficult to implement in CAD.
- Simulation data: the model can be developed only for problems whose theory is implemented in a simulator.

Assumptions:
- Measurement data: no assumptions are involved and the model could include all the effects, e.g., 3D full-wave effects, fringing effects, etc.
- Simulation data: often involves assumptions, and the model will be limited by the assumptions made by the simulator, e.g., 2.5D EM.

Input parameter sweep:
- Measurement data: data generation could be expensive or infeasible if a geometrical parameter, e.g., transistor gate length, needs to be sampled/changed.
- Simulation data: relatively easier to sweep any parameter in the simulator, because the changes are numerical and not physical/manual.

6 Comparison of Neural Network Based Microwave Model Development Using Data from Two Types of Data Generators (continued)

Sources of small and large/gross errors:
- Measurement data: equipment limitations and tolerances.
- Simulation data: accuracy limitations and non-convergence of simulations.

Feasibility of getting the desired output:
- Measurement data: development of models is possible for measurable responses only. For example, the drain charge of an FET may not be easy to measure.
- Simulation data: any response can be modeled as long as it can be computed by the simulator.

7 Data Generation: (c) Range of x to be sampled. For testing data and validation data: [x_min, x_max] should represent the user-intended range in which the NN is to be used. For training data: the default range of x samples should be equal to the user-intended range, or, if feasible, slightly beyond the user-intended range.

8 Data Generation (d) Distribution of x samples (where data should be sampled): uniform grid distribution; non-uniform grid distribution; Design of Experiments (DOE) methodology (central-composite design, 2^n factorial design); star distribution; random distribution.

9 Data Generation (continued): (e) Number of samples P. Theoretical factors: for the grid distribution case, Shannon's theorem; for the random distribution case, statistical confidence.

10 Input / Output Scaling. The orders of magnitude of the various x and d values in microwave applications can be very different from one another. Scaling of training data is desirable for efficient neural network training. The data can be scaled such that the various x (or d) values have a similar order of magnitude.

11 Input / Output Scaling. Notation: x and y -- original x and y; x̃ and ỹ -- scaled x and y; x_min, x_max -- obtained from data; x̃_min, x̃_max -- dictated by the NN trainer.

Linear scale:
Scale formula: $\tilde{x} = \tilde{x}_{\min} + \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}\,(\tilde{x}_{\max} - \tilde{x}_{\min})$
De-scale formula: $x = x_{\min} + \dfrac{\tilde{x} - \tilde{x}_{\min}}{\tilde{x}_{\max} - \tilde{x}_{\min}}\,(x_{\max} - x_{\min})$

Log scale:
Scale formula: $\tilde{x} = \tilde{x}_{\min} + \dfrac{\ln(x/x_{\min})}{\ln(x_{\max}/x_{\min})}\,(\tilde{x}_{\max} - \tilde{x}_{\min})$
De-scale formula: $x = x_{\min}\,\exp\!\left[\dfrac{\tilde{x} - \tilde{x}_{\min}}{\tilde{x}_{\max} - \tilde{x}_{\min}}\,\ln(x_{\max}/x_{\min})\right]$
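For concreteness, here is a minimal Python sketch of the scale/de-scale formulas above (the target range [-1, 1] and the function names are illustrative assumptions, not part of the slides):

```python
import numpy as np

def linear_scale(x, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Map x from [x_min, x_max] to the trainer-dictated range [xt_min, xt_max]."""
    return xt_min + (x - x_min) / (x_max - x_min) * (xt_max - xt_min)

def linear_descale(xt, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Inverse of linear_scale: map scaled values back to the original range."""
    return x_min + (xt - xt_min) / (xt_max - xt_min) * (x_max - x_min)

def log_scale(x, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Logarithmic scaling, useful when x spans several orders of magnitude."""
    return xt_min + np.log(x / x_min) / np.log(x_max / x_min) * (xt_max - xt_min)

def log_descale(xt, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Inverse of log_scale."""
    return x_min * np.exp((xt - xt_min) / (xt_max - xt_min) * np.log(x_max / x_min))

# Example: gate lengths of order 1e-7 m scaled to [-1, 1] for training.
x = np.array([1e-7, 2e-7, 5e-7])
xt = linear_scale(x, x.min(), x.max())
assert np.allclose(linear_descale(xt, x.min(), x.max()), x)
```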

12 Illustration of Data Scaling [flow diagram: data generation → data scaling → neural network training → finished model for user; the training data (x, d) are scaled before training, and the outputs y of the trained neural network model are de-scaled for the user]

13 Divide Data into Training Set, Validation Set and Testing Set. Notation: P -- total number of data samples generated; D -- set of all data, D = {1, 2, ..., P}; Tr -- training data set; V -- validation data set; Te -- test data set. Ideally, each data set (Tr, V, Te) should be an adequate representation of the original y = f(x) problem over the entire range x_min ~ x_max, and the three sets should have no overlap.

14 Divide Data into Training Set, Validation Set and Testing Set. Case 1: when the original data is quite sufficient, split D into non-overlapping sets. Case 2: when data is very limited, duplicate D, such that Tr = V = Te = D. Case 3: split data D into 2 sets.
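A minimal Python sketch of Case 1 (a random, non-overlapping split); the 60/20/20 proportions and the helper name are illustrative assumptions:

```python
import numpy as np

def split_data(P, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split sample indices {0, ..., P-1} into non-overlapping
    training, validation and test sets (Case 1: data is sufficient)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(P)
    n_tr = int(train_frac * P)
    n_v = int(val_frac * P)
    Tr, V, Te = idx[:n_tr], idx[n_tr:n_tr + n_v], idx[n_tr + n_v:]
    return Tr, V, Te

Tr, V, Te = split_data(P=100)
assert len(set(Tr) & set(V)) == 0 and len(set(V) & set(Te)) == 0
```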

15 Training / Validation and Testing

Training error:
$E_{Tr}(w) = \left[\dfrac{1}{\mathrm{size}(T_r)\,m}\sum_{k \in T_r}\sum_{j=1}^{m}\left|\dfrac{y_j(x_k, w) - d_{jk}}{y_{j,\max} - y_{j,\min}}\right|^{q}\right]^{1/q}$

Validation error:
$E_{V}(w) = \left[\dfrac{1}{\mathrm{size}(V)\,m}\sum_{k \in V}\sum_{j=1}^{m}\left|\dfrac{y_j(x_k, w) - d_{jk}}{y_{j,\max} - y_{j,\min}}\right|^{q}\right]^{1/q}$

Test error:
$E_{Te}(w) = \left[\dfrac{1}{\mathrm{size}(T_e)\,m}\sum_{k \in T_e}\sum_{j=1}^{m}\left|\dfrac{y_j(x_k, w) - d_{jk}}{y_{j,\max} - y_{j,\min}}\right|^{q}\right]^{1/q}$
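The same error measure written as a short Python sketch (following the reconstruction above; the array shapes and names are assumptions):

```python
import numpy as np

def q_norm_error(y_model, d, y_min, y_max, q=2):
    """Normalized q-norm error over a data set: y_model and d are
    (num_samples, m) arrays of model outputs and data; y_min and y_max
    are length-m vectors used for per-output normalization."""
    num_samples, m = d.shape
    resid = np.abs((y_model - d) / (y_max - y_min)) ** q
    return (resid.sum() / (num_samples * m)) ** (1.0 / q)
```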

16 Where to Use Each Error Criterion. Training error: the training error E_Tr(w) and its derivative ∂E_Tr(w)/∂w are used to determine how to update w during training. Validation error: the validation error E_V(w) is used as a stopping criterion during training, i.e., to determine whether training is sufficient. Test error: the test error E_Te(w) is used after training has finished to provide a final assessment of the quality of the trained neural network; the test error is not involved during training.

17 Flow-chart Showing Neural Network Training, Validation and Testing

START: select a neural network structure, e.g., MLP, and assign random initial values to all the weight parameters. Then repeat:
1. Perform feedforward computation for all samples in the training set and evaluate the training error.
2. Compute the derivatives of the training error w.r.t. the ANN internal weights.
3. Update the neural network weight parameters using a gradient-based algorithm (e.g., BP (backpropagation), quasi-Newton).
4. Perform feedforward computation for all samples in the validation set and evaluate the validation error.
5. If the desired accuracy is not achieved, go back to step 1; otherwise STOP training.
After training: perform feedforward computation for all samples in the test set and evaluate the test error as an independent quality measure of the trained neural network model.
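As a sketch (not the original course code), the loop above might look as follows in Python; train_grad, train_error and val_error are assumed user-supplied callables, and the plain gradient step stands in for whichever update algorithm is chosen:

```python
import numpy as np

def train(w0, train_grad, train_error, val_error,
          eta=0.01, eps=1e-3, max_epochs=1000):
    """Gradient-based training with the validation error as stopping criterion.
    w0: random initial weights; train_grad(w) returns dE_Tr/dw;
    train_error(w) and val_error(w) return E_Tr(w) and E_V(w)."""
    w = w0.copy()
    for epoch in range(max_epochs):
        g = train_grad(w)          # derivatives of training error w.r.t. w
        w = w - eta * g            # weight update (here: a simple BP-style step)
        if val_error(w) < eps:     # desired accuracy achieved?
            break
    return w, train_error(w), val_error(w)

# After training, the test error E_Te(w) would be evaluated once on the
# returned w as an independent quality measure.
```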

18 Initial Value of NN Weights Before Training. MLP: small random values. RBF/Wavelet: estimate the centers & widths of the RBFs, or the translations & dilations of the wavelets. Knowledge-Based NN: use physical/electrical experience.

19 Overlearning. Definition (strict math): E_Tr ≈ 0, E_V >> E_Tr. Observation: the NN has memorized the training data, but cannot generalize well. Possible reasons: (a) too many hidden neurons; (b) not enough training data. Actions: (a) add training data; (b) delete hidden neurons; (c) back up / retrieve a previous solution.

20 Neural Network Over-Learning [plot of output (y) vs. input (x): the neural network curve passes through the training data but deviates from the validation data]

21 Underlearning. Definition (strict math): E_Tr >> 0. Observation: the NN cannot even represent the problem at the training points. Possible reasons: (a) not enough hidden neurons; (b) training stuck at a local solution; (c) not enough training. Actions: (a) add hidden neurons; (b) train more; (c) perturb the solution, then train.

22 Neural Network Under-Learning [plot of output (y) vs. input (x): the neural network curve fails to follow both the training data and the validation data]

23 Perfect Learning. Definition (strict math): E_Tr ≈ E_V ≈ 0. Observation: the NN generalizes well.

24 Perfect Learning of Neural Networks [plot of output (y) vs. input (x): the neural network curve follows both the training data and the validation data]

25 Types of Training. Sample-by-sample (or online) training: ANN weights are updated every time a training sample is presented to the network, i.e., the weight update is based on the training error from that sample. Batch-mode (or offline) training: ANN weights are updated after each epoch, i.e., the weight update is based on the training error from all the samples in the training data set, where an epoch is defined as a stage of ANN training that involves presenting all the samples in the training data set to the neural network once for the purpose of learning. Supervised training: uses (x and y) data for training. Unsupervised training: uses only x data for training.
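The distinction between the two supervised update modes can be sketched as follows (illustrative Python; sample_grad(w, k) is an assumed callable returning the error gradient for training sample k, and w is assumed to be a NumPy array):

```python
def epoch_sample_by_sample(w, train_set, sample_grad, eta):
    """Online training: update w after each individual training sample."""
    for k in train_set:
        w = w - eta * sample_grad(w, k)
    return w

def epoch_batch_mode(w, train_set, sample_grad, eta):
    """Batch-mode training: accumulate the gradient over the whole epoch,
    then update w once."""
    g = sum(sample_grad(w, k) for k in train_set)
    return w - eta * g / len(train_set)
```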

26 Neural Network Training. The error between the training data d and the neural network outputs y is fed back to the neural network to guide the update of the internal weights w of the network. [block diagram: training data (x, d) → neural network with weights w → output y; the training error d − y is fed back to adjust w]

27 Training Problem Statement: Given training data (x_k, d_k), k ∈ Tr; validation data (x_k, d_k), k ∈ V; and the NN model y(x, w). Find values of w such that the validation error is minimized:

$\min_{\text{epoch}} \; E_V(\text{epoch})$

where

$E_V(\text{epoch}) = \left[\dfrac{1}{P_V\,m}\sum_{k \in V}\sum_{j=1}^{m}\left|\dfrac{y_j(x_k, w(\text{epoch})) - d_{jk}}{y_{\max,j} - y_{\min,j}}\right|^{q}\right]^{1/q}, \qquad P_V = \mathrm{size}(V)$

$w(\text{epoch}) = w(\text{epoch}-1) + \Delta w(\text{epoch}-1), \qquad w(\text{epoch}=0) = \text{user's / software initial guess}$

and Δw(epoch−1) is the update determined by the optimization algorithm (training algorithm), which minimizes the training error.

28 Steps of Gradient-Based Training Algorithms:
Step 1: w = initial guess, epoch = 0.
Step 2: If E_V(epoch) < ε (given accuracy criterion) or epoch > max_epoch (maximum number of epochs), stop.
Step 3: Calculate E_Tr(w) and ∂E_Tr(w)/∂w using part of or all the training data.
Step 4: Use the optimization algorithm to find Δw; set w ← w + Δw.
Step 5: If all training data have been used, then epoch = epoch + 1 and go to Step 2; else go to Step 3.

29 Update of w in Gradient-Based Methods. Δw = η h, where h is the direction of the update of w and η is the step size of the update. Gradient-based methods use information of E_Tr(w) and ∂E_Tr(w)/∂w to determine the update direction of w. The step size η is determined by: a small fixed constant set by the user; an adaptive constant during training; or a line minimization method to find the best value of η.

30 Line Minimization. Problem statement: let a scalar function of one variable be defined as φ(η) = E_Tr(w + η h). Given the present value of w and the direction h, find η such that φ(η) is minimized. Solution methods (1-dimensional optimization methods): sectioning methods (golden section method, Fibonacci method, bisection method) and interpolation methods (quadratic method, cubic method).
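A minimal Python sketch of the golden section method applied to φ(η) = E_Tr(w + ηh); the bracket [a, b], the tolerance, and the example function are assumptions:

```python
import math

def golden_section(phi, a, b, tol=1e-5):
    """Golden-section search for the minimizer of a unimodal phi on [a, b],
    e.g. phi(eta) = E_Tr(w + eta * h) for the current w and direction h."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0        # ~0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = phi(c), phi(d)
    while (b - a) > tol:
        if fc < fd:                # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = phi(c)
        else:                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

# Example: a quadratic section with its minimum at eta = 0.3.
eta_opt = golden_section(lambda eta: (eta - 0.3) ** 2, 0.0, 1.0)
```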

31 Back-Propagation (BP) (Rumelhart, Hinton & Williams, 1986). We use the negative gradient direction, h = −∂E_Tr(w)/∂w, in Δw = η h. The neural network weights are updated during training as

$w \leftarrow w - \eta\,\dfrac{\partial E_{Tr}(w)}{\partial w}$

or, with momentum,

$w \leftarrow w - \eta\,\dfrac{\partial E_{Tr}(w)}{\partial w} + \alpha\,\Delta w_{\text{epoch}-1}$

where η is called the learning rate and α is called the momentum factor.
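The momentum form of the BP update as a short Python sketch (the function name and the default values of η and α are illustrative):

```python
import numpy as np

def bp_update(w, grad, prev_dw, eta=0.1, alpha=0.5):
    """One BP weight update: grad is dE_Tr/dw at the current w, and prev_dw
    is the weight change from the previous epoch (zero initially)."""
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw
```

The returned dw is carried over to the next epoch as prev_dw.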

32 Determining η and α for BP: set η and α as fixed constants, or let η and α be adaptive, e.g., η = c / epoch, where c is a constant.

Search-then-converge (STC) schedule (Darken, 1992):

$\eta = \eta_0\,\dfrac{1 + \dfrac{c}{\eta_0}\dfrac{\text{epoch}}{\tau}}{1 + \dfrac{c}{\eta_0}\dfrac{\text{epoch}}{\tau} + \tau\left(\dfrac{\text{epoch}}{\tau}\right)^{2}}$

where η_0, τ and c are user-defined.

Delta-bar-delta (Jacobs, 1988): (a) a separate learning rate η_i for each weight w_i of the NN; (b) each η_i is adjusted during training using present and previous information of ∂E_Tr(w)/∂w_i, and the weight update is $\Delta w_i = -\eta_i\,\dfrac{\partial E_{Tr}(w)}{\partial w_i}$.
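A sketch of the search-then-converge schedule as reconstructed above (the parameter defaults are arbitrary assumptions):

```python
def stc_learning_rate(epoch, eta0=0.5, c=1.0, tau=100.0):
    """Search-then-converge schedule: roughly eta0 for epoch << tau,
    decaying like c/epoch for epoch >> tau."""
    r = epoch / tau
    a = (c / eta0) * r
    return eta0 * (1.0 + a) / (1.0 + a + tau * r * r)
```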

33 Conjugate Gradient Method. Δw = η h. Let $\nabla E = \dfrac{\partial E_{Tr}(w)}{\partial w}$. Then

$h(\text{epoch}) = -\nabla E + \gamma\,h(\text{epoch}-1)$

with the initial direction h = −∇E (γ = 0), and

$\gamma = \dfrac{\lVert \nabla E(\text{epoch}) \rVert^{2}}{\lVert \nabla E(\text{epoch}-1) \rVert^{2}}$ (Fletcher-Reeves), or $\gamma = \dfrac{\left(\nabla E(\text{epoch}) - \nabla E(\text{epoch}-1)\right)^{T}\nabla E(\text{epoch})}{\lVert \nabla E(\text{epoch}-1) \rVert^{2}}$ (Polak-Ribiere).

The step size η is determined by a line minimization method or a trust region method. Speed: generally faster than BP. Memory: a few vectors of length N_W, where N_W is the total number of NN weights/parameters in w.
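The direction update in a short Python sketch (the function name and the variant switch are assumptions):

```python
import numpy as np

def cg_direction(grad, prev_grad, prev_h, variant="fletcher-reeves"):
    """Conjugate-gradient search direction h = -grad + gamma * prev_h.
    Pass prev_h = None on the first epoch to get h = -grad."""
    if prev_h is None:
        return -grad
    if variant == "fletcher-reeves":
        gamma = (grad @ grad) / (prev_grad @ prev_grad)
    else:  # Polak-Ribiere
        gamma = ((grad - prev_grad) @ grad) / (prev_grad @ prev_grad)
    return -grad + gamma * prev_h
```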

34 Quasi-Newton Method. Let H be the Hessian matrix of E_Tr w.r.t. w, and B be the inverse of H. Weight update: Δw = η h, where $h = -B\,\dfrac{\partial E_{Tr}}{\partial w}$. Information from Δw and Δg is used to approximate B:

$B_{\text{epoch}=0} = I$

$B_{\text{epoch}} = B_{\text{epoch}-1} + \dfrac{\Delta w\,\Delta w^{T}}{\Delta w^{T}\Delta g} - \dfrac{B_{\text{epoch}-1}\,\Delta g\,\Delta g^{T}\,B_{\text{epoch}-1}}{\Delta g^{T}\,B_{\text{epoch}-1}\,\Delta g}$ (DFP formula)

where Δw = w(epoch) − w(epoch−1) and Δg = ∇E(epoch) − ∇E(epoch−1). Speed: fast. Main memory needed: $N_W^{2}$ (large).
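The DFP update of the approximate inverse Hessian in a short Python sketch (vector and matrix shapes are assumptions; B starts as the identity):

```python
import numpy as np

def dfp_update(B, dw, dg):
    """DFP update of the inverse-Hessian approximation B, given the weight
    change dw = w(epoch) - w(epoch-1) and the gradient change
    dg = grad(epoch) - grad(epoch-1)."""
    Bdg = B @ dg
    return (B + np.outer(dw, dw) / (dw @ dg)
              - np.outer(Bdg, Bdg) / (dg @ Bdg))

# Search direction for the next update: h = -dfp_update(B, dw, dg) @ grad,
# with the step size eta chosen by a line minimization.
```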

35 Levenberg-Marquardt Method. Obtain Δw by solving the linear equations

$\left(J^{T} J + \mu I\right)\Delta w = -J^{T} e$

where $e = [e_1, e_2, \ldots, e_{N_e}]^{T}$ with $e_{jk} = \dfrac{y_j(x_k, w) - d_{jk}}{y_{\max,j} - y_{\min,j}}$, and J is the Jacobian, $J = \dfrac{\partial e}{\partial w^{T}}$. Setting μ = δ > 0 gives the typical Levenberg-Marquardt method; μ = 0 gives Gauss-Newton. This method is good if e can be very small, e.g., small-residue problems. The computation needs an LU decomposition. Main memory needed: $N_W^{2}$ (large).
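The Levenberg-Marquardt step in a short Python sketch (a sketch of the linear solve above, not production code):

```python
import numpy as np

def lm_step(J, e, mu):
    """Solve (J^T J + mu * I) dw = -J^T e for the weight update dw.
    mu = 0 reduces to the Gauss-Newton step; larger mu moves the update
    toward a (scaled) steepest-descent step."""
    n_w = J.shape[1]
    A = J.T @ J + mu * np.eye(n_w)
    return np.linalg.solve(A, -J.T @ e)   # LU-based dense solve
```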

36 Other Training Methods. Huber-Quasi-Newton: similar to Quasi-Newton, except that the error function for training is based on the Huber function rather than the conventional least-squares error function. The Huber formulation allows the training algorithm to robustly handle both small random errors and accidental large errors in the training data.

37 Other Training Methods (continued). Simplex Method: uses information of E_Tr(w) only. The method starts with several initial guesses of w; these initial points form a simplex in the w space. The method then iteratively updates the simplex using basic moves such as reflection, expansion, and contraction, according to the values of E_Tr(w) at the vertices of the simplex. The error E_Tr(w) generally decreases as the simplex is updated.

38 Other Training Methods (continued). Genetic Algorithm: uses information of E_Tr(w) only, and searches for a global minimum. The algorithm starts with several initial points of w, called a population. A fitness value is defined for each w such that a w with lower error E_Tr(w) has a higher fitness. The w points with high fitness values are more likely to be selected as parents, from whom new points of w, called offspring, are produced. This process continues, and the fitness of the population improves.

39 Other Training Methods (continued). Simulated Annealing Algorithm: uses information of E_Tr(w) only, and also searches for a global minimum. The method uses a temperature parameter to control the training process. In the early stages of training the temperature is high and new w points with higher error values can be accepted, thus avoiding being trapped in a local minimum. In the late stages of training the temperature is low and only the w points leading to a reduction in error are accepted.
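The temperature-controlled acceptance step is commonly implemented with a Metropolis-style rule; the slide does not spell out the exact rule, so the following Python sketch is an assumption:

```python
import math
import random

def accept_move(delta_error, temperature):
    """Accept a candidate w: always if E_Tr decreases; otherwise with
    probability exp(-delta_error / temperature), so uphill moves are common
    at high temperature (early training) and rare at low temperature."""
    if delta_error <= 0.0:
        return True
    return random.random() < math.exp(-delta_error / temperature)
```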

40 Qualitative Comparison of Different Algorithms. Ranked from fast convergence rate (and more memory need and implementation effort) to slow convergence rate (and less memory need and implementation effort): Levenberg-Marquardt (for small-residue problems), Quasi-Newton, Conjugate Gradient, BP.

41 Comparison of Training Algorithms for a 3-Conductor Microstrip Line Example (5 input neurons, 28 hidden neurons, 5 output neurons). Training algorithms compared: Adaptive Backpropagation, Conjugate Gradient, Quasi-Newton, Levenberg-Marquardt; for each, the table reports the number of epochs, training error (%), average test error (%), and CPU time (in seconds). [table values not reproduced in this transcription]

42 Comparison of Training Algorithms for a MESFET Example (4 input neurons, 60 hidden neurons, 8 output neurons). Training algorithms compared: Adaptive Backpropagation, Conjugate Gradient, Quasi-Newton, Levenberg-Marquardt; for each, the table reports the number of epochs, training error (%), average test error (%), and CPU time (in seconds). [table values not reproduced in this transcription]
