MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms



1 Introduction

In supervised Machine Learning (ML) we have a set of data points or observations for which we know the desired output, expressed in terms of categorical classes, numerical or logical variables, or as a generic observed description of any real problem. The desired output provides some level of supervision, in that the learning model uses it to adjust its parameters or make decisions, allowing it to predict the correct output for new data. When the algorithm is able to correctly predict observations, we call it a classifier. Some classifiers are also capable of providing results in a more probabilistic sense, i.e. the probability that a data point belongs to a class. We usually refer to such model behavior as regression.

A typical workflow for supervised learning is shown in the diagram of Figure 1.

Figure 1 The typical workflow based on supervised machine learning.

The process is based on the following main steps:

- Pre-processing of data. First we need to build input patterns that are appropriate for feeding into our supervised learning algorithm. This includes scaling and preparation of data;
- Create data sets for training and evaluation. This is done by randomly splitting the universe of data patterns. The training set is made of the data used by the classifier to learn their internal feature correlations, whereas the evaluation set is used to validate the already trained model in order to get an error rate (or other validation measures) that helps to assess the performance and accuracy of the classifier. Typically, more data are used for training than for validation;
- Training of the model. We execute the model on the training data set. The result is a model that (in the successful case) has learned how to predict the outcome when new unknown data are submitted;
- Validation. After the model has been created, its performance must of course be tested in terms of accuracy, completeness and contamination (or its dual, the purity). It is particularly crucial to do this on data that the model has not seen yet. This is the main reason why, in the previous steps, we separated the data set into training patterns and a subset of data not used for training. We intend to verify and measure the generalization capabilities of the model. It is very easy to learn every single combination of input vectors and their mappings to the output as observed on the training data, and we can achieve a very low error in doing so, but how do the very same rules or mappings perform on new data that may have different input-to-output mappings? If the classification error on the validation set is significantly higher than the training error, then we have to go back and adjust the model parameters. The reason could be that the model has essentially memorized the answers seen in the training data, thus losing its generalization capability. This is the typical behavior in case of overfitting, and there are various techniques for overcoming it;
- Use (Run). If validation was successful, the model has correctly learned the underlying real problem. We can then proceed to use the model to classify/predict new data.

As suggested by its name, the tool MLPQNA-LEMON refers to two kinds of supervised ML models. Both are based on the same topological architecture (namely the Multi Layer Perceptron neural network) and differ in the backward learning phase, where it is possible to choose between the QNA and the LEMON rule.

The QNA (Quasi Newton Algorithm) learning rule belongs to the family of Newton's methods, aimed at finding the stationary point of a function through a statistical approximation of the Hessian of the training error, obtained by a cyclic gradient calculation. MLPQNA makes use of the well-known L-BFGS algorithm (Limited-memory Broyden Fletcher Goldfarb Shanno), originally designed for problems with a wide parameter space.

LEMON (LEvenberg-Marquardt Optimization Network) is based on the modified Levenberg-Marquardt method, which makes use of the exact Hessian of the error function (and not of its linearized approximation). For networks with up to several hundreds of weights this algorithm is comparable with the QNA (and often faster). Its main advantage, however, is that it does not require stopping criteria. This method almost always converges exactly to one of the minima of a function.

2 Implementation information

The implementation is in C++. It has been tested and validated on 64-bit machines running MS Windows 7/8/10 or Linux Ubuntu/SL. It does not require external packages (it embeds some third-party packages as local libraries), except for some DLLs (Dynamic Link Libraries) in the case of Windows OS. The program is executed from the command line; its interface is organized into options, use cases and functional use cases.

The MLPQNA rule is based on the L-BFGS algorithm (limited-memory BFGS), a quasi-Newton method with fixed iteration cost O(Npatterns * Weights) and moderate memory requirements O(Weights). This algorithm is ideally suited to large-scale problems, and also performs well on problems of average and small dimensions.

LEMON is based on the modified Levenberg-Marquardt method, using the exact Hessian of the error function (NOT its linearized approximation). For networks with up to several hundreds of weights this algorithm is comparable with L-BFGS (and often faster). Its main advantage, however, is that it does not require any stopping criteria to be specified (hence the absence of the two QNA parameters wstep and iterations). This method almost always converges exactly to one of the minima of a function. Nevertheless, it is at a disadvantage when solving large-scale problems: its iteration cost is high, equal to O(Npatterns * Weights^2), and so are its memory requirements, equal to O(Weights^2).

3 Learning rules

The learning algorithm for an MLP must update the network weights in order to minimize the error function by following either of the mentioned rules (QNA or LEMON).
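As a rough orientation, the two rules can be summarized by their generic textbook update steps. The notation used here ($w_k$ for the weight vector at iteration $k$, $g_k$ for the gradient of the error function $E$ at $w_k$, $H_k$ for the approximate inverse Hessian, $\alpha_k$ for the step length, $\mu_k$ for the damping factor) is introduced for illustration only and is not taken from the package. In particular, reading the modified Levenberg-Marquardt rule as a damped Newton step on the exact Hessian is an assumption.

```latex
% Quasi-Newton (L-BFGS): the inverse Hessian H_k is approximated from
% recent differences of gradients and weights (secant condition)
w_{k+1} = w_k - \alpha_k H_k g_k ,
\qquad
H_{k+1} \left( g_{k+1} - g_k \right) = w_{k+1} - w_k

% Levenberg-Marquardt style damped step using the exact Hessian of E
\left( \nabla^2 E(w_k) + \mu_k I \right) \Delta w_k = - g_k ,
\qquad
w_{k+1} = w_k + \Delta w_k
```

In this form the difference in cost is visible: the first rule only stores vectors of the size of the weight vector, while the second has to build and solve with a full Weights x Weights matrix.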

Among the various learning parameters, the most important one concerns regularization. The implemented MLPQNA and LEMON models use Tikhonov regularization (also known as weight decay). When the regularization factor is accurately chosen, the generalization error of the trained neural network can be improved and training can be accelerated. However, the best decay parameter cannot be chosen a priori, since it depends strongly on the specific problem; it must be selected through a heuristic trial-and-error process. Therefore, if it is unknown which decay value to choose (as is usually the case), one can experiment with values in the range from 0.001 (weak regularization) up to 100 (very strong regularization). The search should start from the minimum and multiply the Decay value by a factor of 3 to 10 at each step, while checking the network's generalization error by cross-validation. Note that if the specified Decay value is too small (less than 0.001), it is automatically increased to the permissible minimum: the MLPQNA+LEMON package always applies at least a minimum regularization to a task.

To implement the weight decay rule, we minimize a more complex merit function:

$$ f = E + \frac{\lambda S}{2} $$

Here E is the training set error, S is the sum of squares of the network weights, and the decay coefficient λ controls the amount of smoothing applied to the network. Optimization is performed from the initial point until the optimizer stops successfully.

The figure below shows a spectrum of neural networks trained with different values of λ, from zero (no regularization) to infinitely large λ. It can be seen that the tendency to overfit is controlled by continuously changing λ. Zero λ corresponds to an overfitted network. Infinitely large λ gives an underfitted network, equal to some constant. Between these extreme values there is a range of networks which reproduce the dataset with different degrees of precision and smoothness. Again, as shown, the perfect network is outside of this range. A good neural network can be chosen by tuning the weight decay coefficient λ. The optimal λ can be selected by using a test set or by cross-validation (in the latter case the whole dataset can be used for training).

Figure 2 Learning trend by optimization as conditioned by the decay parameter.

For classification problems we use an MLP network with a linear output layer and SOFTMAX normalization of the outputs. The network outputs are non-negative and their sum (over all neurons of the output layer) is strictly equal to one, which permits using them as the probabilities that the input vector belongs to each of the classes (in the limiting case, the outputs of the trained network converge to these probabilities). The number of outputs in such a network is always at least two (a restriction imposed by elementary logic).
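The SOFTMAX normalization described above can be sketched in a few lines. This is a generic illustration and not the package's C++ code; the function name `softmax` and the sample values are chosen here for the example.

```python
import math


def softmax(z):
    """Map the raw linear outputs z to non-negative values that sum to one."""
    m = max(z)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Three linear outputs of a hypothetical 3-class network
    probs = softmax([2.0, 1.0, -1.0])
    print(probs, sum(probs))
```

Whatever the raw outputs are, the normalized values are strictly positive and sum to one, which is what allows them to be read as class probabilities.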

There are two basic views commonly held in statistics on what the solution of a classification problem should look like. The first viewpoint is that any object shall be assigned to one and only one of the classes. For example, if e-mail classification is in question, then "spam" and "non-spam" classes can be distinguished. There can be some uncertainty in the classification (an e-mail can be somewhat similar to spam), but only the final decision, whether it is spam or non-spam, is returned.

The second approach, the one used by the MLPQNA+LEMON package, consists in obtaining a vector of posterior probabilities, that is, a vector whose components are the probabilities that the object belongs to each class. The algorithm does not take any decision on the classification of an e-mail. It just reports the probability that a particular e-mail is spam and the probability that it is not. The decision based on this information is left to the user.

The second approach is more flexible than the first one, and more reasonable: the classification algorithm cannot know the order of priority the user has in mind. In some cases it is necessary to minimize the error made on one of the classes, e.g. the misclassification of a legitimate e-mail as spam. Then the e-mail will be classified as spam only if there is a very small probability (e.g. less than 0.05%) that it is NOT spam. In other cases all classes are equivalent, and the class with the maximum conditional probability can simply be chosen. Therefore, the outcome of any classification algorithm of our package is a posterior probability vector, instead of the class to which an object is assigned.

After the model is built, the error on a test (or training) set needs to be estimated.

To estimate regression results, three measures of error can be used: the root-mean-square error, the average error and the average relative error (the latter being calculated over the records with a nonzero value of the dependent variable). These three measures of error are well known and need no further discussion.

For a classification problem, five measures of error can be used. The first and best known is the classification error (the number or percentage of incorrectly classified cases). The second, equally well known, is cross-entropy. The MLPQNA+LEMON package uses the average cross-entropy per record, estimated in bits (base 2 logarithm). The use of the average cross-entropy (instead of the total cross-entropy) yields comparable estimates for different test sets.

The remaining three error measures are again the root-mean-square error, the average error and the average relative error. However, as opposed to the regression task, they are used here to characterize the error in the posterior probability vector. The error measures how much the probability vector computed by the classification algorithm differs from the vector derived from the test set (whose components are equal to 0 or 1, depending on the class the object belongs to). The meaning of the root-mean-square error and of the average error is straightforward: it is the error in the approximation of the conditional probabilities, averaged over all probabilities. The average relative error is the average error in approximating the probability that an object is correctly classified (the same as the average error for binary tasks).
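The five classification measures described above can be sketched as follows. This is one reading of the definitions, written for illustration and not taken from the package: posterior vectors and 0/1 target vectors are plain Python lists, the function names are hypothetical, and the average relative error is interpreted as the mean of (1 - probability assigned to the true class).

```python
import math


def true_class(target):
    """Index of the component equal to 1 in a 0/1 target vector."""
    return target.index(max(target))


def classification_error(posteriors, targets):
    """Fraction of records whose most probable class is not the true one."""
    wrong = sum(1 for p, t in zip(posteriors, targets)
                if p.index(max(p)) != true_class(t))
    return wrong / len(targets)


def avg_cross_entropy_bits(posteriors, targets):
    """Average per record of -log2(probability assigned to the true class)."""
    total = sum(-math.log2(p[true_class(t)])
                for p, t in zip(posteriors, targets))
    return total / len(targets)


def rms_error(posteriors, targets):
    """Root-mean-square difference, averaged over all probabilities."""
    sq = [(pi - ti) ** 2
          for p, t in zip(posteriors, targets) for pi, ti in zip(p, t)]
    return math.sqrt(sum(sq) / len(sq))


def avg_error(posteriors, targets):
    """Mean absolute difference, averaged over all probabilities."""
    ab = [abs(pi - ti)
          for p, t in zip(posteriors, targets) for pi, ti in zip(p, t)]
    return sum(ab) / len(ab)


def avg_relative_error(posteriors, targets):
    """Mean error on the probability of the true class (assumed reading)."""
    total = sum(1.0 - p[true_class(t)] for p, t in zip(posteriors, targets))
    return total / len(targets)


if __name__ == "__main__":
    post = [[0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]
    targ = [[1, 0], [1, 0], [0, 1]]
    for measure in (classification_error, avg_cross_entropy_bits,
                    rms_error, avg_error, avg_relative_error):
        print(measure.__name__, measure(post, targ))
```

On a two-class example such as the one in the usage section, the average error and the average relative error coincide, in agreement with the remark on binary tasks. A posterior of exactly zero for the true class would make the cross-entropy undefined, so a real implementation would need to guard against it.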
Taking these considerations into account, the MLP error functions implemented in the MLPQNA-LEMON package (calculated at the end of each batch Hessian cycle) are the following.

REGRESSION ERROR

Least-squares error + Tikhonov regularization:

$$ E = \sum_{i=1}^{N_{patterns}} (y_i - t_i)^2 + \|W\|^2 \, \frac{\lambda}{2} $$

where y and t are, respectively, the output and the target for each pattern, while W is the weight matrix of the MLP.

CLASSIFICATION ERROR

1. Cross entropy enabled: cross-entropy per record estimated in bits (logarithm);

$$ E = \sum_{i=1}^{N_{patterns}} \ln\left( \frac{1}{y_i} \right) $$

2. Cross entropy disabled: percentage of misclassified patterns at each cycle.

From a theoretical point of view, it is possible to write down the (extremely complex) expression of the regression function that calculates the output (for example the zphot value) of an MLP network (trained by MLPQNA+LEMON) with two hidden layers. Given:

- $i = 1 \dots N$: index of the N input neurons (features, for example related to magnitudes/colors) for a single input pattern $x = \{x_i\}_{i=1}^{N}$;
- $h_1 = 1 \dots H_1$: index of the $H_1$ neurons of the first hidden layer;
- $h_2 = 1 \dots H_2$: index of the $H_2$ neurons of the second hidden layer;
- $y$: index of the single output neuron;
- $w_{h_1 i}$, $w_{h_2 h_1}$, $w_{y h_2}$: weights among network layers (respectively hidden1-input, hidden2-hidden1 and output-hidden2);
- $\theta_{h_1}$, $\theta_{h_2}$, $\theta_y$: biases related to the different neuron layers (respectively hidden 1, hidden 2 and output);

where each neuron of all layers has the activation function $f(x) = \tanh(x)$, we obtain:

$$ zphot(x) = \tanh\left( \sum_{h_2=1}^{H_2} w_{y h_2} \tanh\left( \sum_{h_1=1}^{H_1} w_{h_2 h_1} \tanh\left( \sum_{i=1}^{N} w_{h_1 i} x_i + \theta_{h_1} \right) + \theta_{h_2} \right) + \theta_y \right) $$

The fully expanded exponential form follows by substituting $\tanh(u) = \frac{e^{2u} - 1}{e^{2u} + 1}$ at each layer.
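The regression function of a two-hidden-layer network and the regularized regression error can be evaluated numerically with the short sketch below. It is an illustration under stated assumptions and not the package's C++ code: weights are nested Python lists indexed as [destination neuron][source neuron], biases are added to the weighted sums, the names `layer`, `zphot` and `regression_error` are chosen here, and whether the biases enter the regularization term is left to the caller through the flat list `all_weights`.

```python
import math


def layer(inputs, weights, biases):
    """One fully connected tanh layer: out_j = tanh(sum_i w[j][i] * x_i + theta_j)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def zphot(x, w_h1, th_h1, w_h2, th_h2, w_y, th_y):
    """Forward pass of an N-H1-H2-1 network with tanh on every neuron."""
    h1 = layer(x, w_h1, th_h1)          # first hidden layer
    h2 = layer(h1, w_h2, th_h2)         # second hidden layer
    return layer(h2, [w_y], [th_y])[0]  # single output neuron


def regression_error(outputs, targets, all_weights, decay):
    """Sum of squared residuals plus (decay / 2) * sum of squared weights."""
    data_term = sum((y - t) ** 2 for y, t in zip(outputs, targets))
    penalty = 0.5 * decay * sum(w * w for w in all_weights)
    return data_term + penalty


if __name__ == "__main__":
    # Hypothetical 2-2-1-1 network and a single input pattern
    out = zphot([0.5, -0.5],
                [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                [[1.0, 1.0]], [0.0],
                [1.0], 0.3)
    print(out)
    print(regression_error([1.0, 2.0], [0.0, 2.0], [1.0, -2.0], 0.1))
```

Writing the network as three calls to the same `layer` function makes the nesting of the closed-form expression explicit: each hidden layer simply feeds the next one.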

4 Main commands

In terms of command-line interface, we distinguish among three kinds of interfaces:

- Command-line options (e.g. prompt>> mlpqna-lemon [option]):
  o help: list of command lines valid for all functional use cases;
  o version: code version information;
  o author: author information;
- Command-line use cases (e.g. prompt>> mlpqna-lemon [use_case]):
  o Classification: multivariate classification;
  o Regression: non-linear regression;
- Command-line functional use cases (e.g. prompt>> mlpqna-lemon [use_case] [function]):
  o QNA TRAIN: training using the QNA learning rule;
  o LEMON TRAIN: training using the LEMON learning rule;
  o TEST: testing;
  o RUN: execution of a trained/tested model on arbitrary data.

In order to select the specific use case at command-line level, the user must provide a sequence of parameters, which differs according to the specific use case. The list of these command-line parameters is provided in the following sections.

4.1 TRAINING command line

This is the exact composition of the command-line parameter list in the case of training:

>> mlpqna_lemon use_case function decay restarts wstep iterations ninp nout nhidlay nhid1 nhid2 CE input_path CV k 1 W_init W_name exp_dir

Here is the description of each parameter:

1. mlpqna_lemon: name of the program;
2. use_case: [integer] code for the use case (10 classification, 20 regression);
3. function: [integer] functional use case (3 QNA train, 7 LEMON train);
4. decay: [float] decay parameter of the QNA/LEMON weight updating law. Weight decay constant (>= 0.001). The decay term 'Decay*||Weights||^2' is added to the error function. Default value = 0.001;
5. restarts: [integer] max number of random gradient calculations approximating the Hessian;
6. wstep: [integer] max error on the Hessian approximation (fixed to 0 for LEMON);
7. iterations: [integer] max number of iterations for each restart (fixed to 0 for LEMON);
8. ninp: [integer] number of input nodes (input data features);
9. nout: [integer] number of output nodes (1 for regression, arbitrary for classification);
10. nhidlay: [integer] number of hidden layers of the network (1 or 2);
11. nhid1: [integer] number of neurons of the first hidden layer;
12. nhid2: [integer] number of neurons of the second hidden layer;
13. 1: fixed (internal reasons);
14. input_path: [string] relative/absolute pathname of the input data file;
15. CV: [1/0] flag to enable/disable k-fold cross validation;
16. k: [integer] number of cross validation folds (ignored if CV is disabled);
17. CE: [1/0] flag to enable/disable Cross Entropy (used for classification only);
18. W_init: [integer] weight initialization choice. It specifies how to initialize the network weights: 702 for RANDOM initialization in [-1, +1], or 704 for FROM_FILE, to be used when resuming a previous training phase;
19. W_name: [character string] name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights. To be used when parameter 18 is set to FROM_FILE. If parameter 18 is RANDOM, it is not considered;
20. exp_dir: [string] relative pathname for the output (DIR must exist and ex/re or ex/cl must be created within it).

4.2 TEST/RUN command line

This is the exact composition of the command-line parameter list in the case of test/run:

>> mlpqna_lemon use_case function ninp nout nhidlay nhid1 nhid2 input_path 1 trainedweightsfile trainedparamsfile exp_dir

Here is the description of each parameter:

1. mlpqna_lemon: name of the program;
2. use_case: [integer] code for the use case (10 classification, 20 regression);
3. function: [integer] functional use case (4 test, 5 run);
4. ninp: [integer] number of input nodes (input data features);
5. nout: [integer] number of output nodes (1 for regression, arbitrary for classification);
6. nhidlay: [integer] number of hidden layers of the network (1 or 2);
7. nhid1: [integer] number of neurons of the first hidden layer;
8. nhid2: [integer] number of neurons of the second hidden layer;
9. input_path: [string] relative/absolute pathname of the input data file;
10. 1: fixed (internal reasons);
11. trainedweightsfile: [string] relative pathname of the model trained weights file;
12. trainedparamsfile: [string] relative pathname of the model trained parameters file;
13. exp_dir: [string] relative pathname for the output.

5 Input/Output

The following sections describe the type and contents of the input/output interface, depending on the functional use case.
5.1 Input

Input data must be a file in CSV format, without metadata headers. For classification, the class target columns associated with each input pattern must be represented in binary codification (e.g. 100, 010, 001 for 3-class labels).

5.2 Training Output

When executed under the training use case, the output consists of the following files, stored in a predefined directory sub-tree. This sub-tree starts from the execution directory and branches into two different sub-trees, depending on the functionality domain of the current execution:

- ./ex/cl for the classification case
- ./ex/re for the regression case

In one of these directories the following output files are automatically generated at the end of execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- trainlog.txt: log file with detailed information about experiment configuration, main results and parameter setup;
- trainpartialerror.txt: ASCII (space-separated) file with partial values at each training iteration of the QNA algorithm. Useful to obtain a graphical view of the learning process. Each row is composed of three columns:
  o training step;
  o number of iterations of the current step (number of Hessian approximations <= MaxIts);
  o batch error of the current step (MSE, or Cross Entropy value if selected in classification mode);
- trainedweights.txt: final network weights, frozen at the end of batch training. It can be used in a new training experiment to restore an old one;
- frozen_train_net.txt: internal network node values as frozen at the end of training, to be given as network input file in test/run cases;
- traintestoutlog.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset;
- train_output.txt: ASCII file with network outputs and related targets for all input patterns (simplified, non-verbose version of traintestoutlog.txt, for internal use only);
- traintestconfmatrix.txt: confusion matrix calculated at the end of training. It results from the values stored in the traintestoutlog.txt file. Useful to obtain a simple statistical evaluation of the whole training results. In the case of regression it is an adapted version.

5.3 Test Output

When executed under the test use case, the output consists of the following files, stored in a predefined directory sub-tree. This sub-tree starts from the execution directory and branches into two different sub-trees, depending on the functionality domain of the current execution:

- ./ex/cl for the classification case
- ./ex/re for the regression case

In one of these directories the following output files are automatically generated at the end of execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- testoutlog.txt: output values as calculated after test, with the respective target values. It can be used to evaluate the network output for each input pattern;
- test_output.txt: ASCII file with network outputs and related targets for all input patterns (simplified, non-verbose version of testoutlog.txt, for internal use only);
- testconfmatrix.txt: confusion matrix calculated at the end of test. It results from the values stored in the testoutlog.txt file. Useful to obtain a simple statistical evaluation of the whole test results. In the case of regression it is an adapted version.

5.4 Run Output

When executed under the run use case, the output consists of the following files, stored in a predefined directory sub-tree. This sub-tree starts from the execution directory and branches into two different sub-trees, depending on the functionality domain of the current execution:

- ./ex/cl for the classification case
- ./ex/re for the regression case

In one of these directories the following output files are automatically generated at the end of execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- run_output.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset.
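As a closing illustration of Sections 4.2 and 5.1, the sketch below prepares a header-less CSV with binary-coded class targets and assembles the argument list of a TEST execution in the order given in Section 4.2. Several points are assumptions made for the example: the target codes are written as separate 0/1 columns after the features, the class names, file names and directory are hypothetical, and the use of frozen_train_net.txt as trainedparamsfile is a guess based on its description in Section 5.2. The helper names `one_hot`, `to_csv` and `build_test_command` do not belong to the package.

```python
import csv
import io


def one_hot(label, classes):
    """Binary codification of a class label, e.g. the second of 3 classes -> [0, 1, 0]."""
    return [1 if label == c else 0 for c in classes]


def to_csv(patterns, labels, classes):
    """Header-less CSV text: feature columns followed by the 0/1 target columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for features, label in zip(patterns, labels):
        writer.writerow(list(features) + one_hot(label, classes))
    return buf.getvalue()


def build_test_command(use_case, ninp, nout, nhidlay, nhid1, nhid2,
                       input_path, weights_file, params_file, exp_dir):
    """Argument list for the TEST function (code 4), in the order of Section 4.2."""
    return ["mlpqna_lemon", str(use_case), "4",
            str(ninp), str(nout), str(nhidlay), str(nhid1), str(nhid2),
            input_path, "1",          # the fixed value 1 precedes the weights file
            weights_file, params_file, exp_dir]


if __name__ == "__main__":
    classes = ("star", "galaxy", "quasar")  # hypothetical class names
    print(to_csv([(0.1, 0.2), (0.3, 0.4)], ["galaxy", "star"], classes), end="")
    cmd = build_test_command(10, 2, 3, 2, 5, 3, "data.csv",
                             "ex/cl/trainedweights.txt",
                             "ex/cl/frozen_train_net.txt", "out")
    print(" ".join(cmd))
```

Building the command as a list of strings keeps the positional order explicit and can be passed unchanged to a process launcher; the exact values accepted by the program should be checked against its own help output.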


More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Learning from Data: Adaptive Basis Functions

Learning from Data: Adaptive Basis Functions Learning from Data: Adaptive Basis Functions November 21, 2005 amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

3 Nonlinear Regression

3 Nonlinear Regression 3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

6. Linear Discriminant Functions

6. Linear Discriminant Functions 6. Linear Discriminant Functions Linear Discriminant Functions Assumption: we know the proper forms for the discriminant functions, and use the samples to estimate the values of parameters of the classifier

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu

Natural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward

More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

06: Logistic Regression

06: Logistic Regression 06_Logistic_Regression 06: Logistic Regression Previous Next Index Classification Where y is a discrete value Develop the logistic regression algorithm to determine what class a new input should fall into

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Logistic Regression

Logistic Regression Logistic Regression 2016-05-26 Agenda Model Specification Model Fitting Bayesian Logistic Regression Online Learning and Stochastic Optimization Generative versus Discriminative Classifiers

More information

Supervised Learning in Neural Networks (Part 2)

Supervised Learning in Neural Networks (Part 2) Supervised Learning in Neural Networks (Part 2) Multilayer neural networks (back-propagation training algorithm) The input signals are propagated in a forward direction on a layer-bylayer basis. Learning

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

A Neural Network Model Of Insurance Customer Ratings

A Neural Network Model Of Insurance Customer Ratings A Neural Network Model Of Insurance Customer Ratings Jan Jantzen 1 Abstract Given a set of data on customers the engineering problem in this study is to model the data and classify customers

More information

I211: Information infrastructure II

I211: Information infrastructure II Data Mining: Classifier Evaluation I211: Information infrastructure II 3-nearest neighbor labeled data find class labels for the 4 data points 1 0 0 6 0 0 0 5 17 1.7 1 1 4 1 7.1 1 1 1 0.4 1 2 1 3.0 0 0.1

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

Assignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation

Assignment # 5. Farrukh Jabeen Due Date: November 2, Neural Networks: Backpropation Farrukh Jabeen Due Date: November 2, 2009. Neural Networks: Backpropation Assignment # 5 The "Backpropagation" method is one of the most popular methods of "learning" by a neural network. Read the class

More information

Logical Rhythm - Class 3. August 27, 2018

Logical Rhythm - Class 3. August 27, 2018 Logical Rhythm - Class 3 August 27, 2018 In this Class Neural Networks (Intro To Deep Learning) Decision Trees Ensemble Methods(Random Forest) Hyperparameter Optimisation and Bias Variance Tradeoff Biological

More information

Chapter 7: Numerical Prediction

Chapter 7: Numerical Prediction Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.

More information

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur

Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Deep Learning for Visual Computing Prof. Debdoot Sheet Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 05 Classification with Perceptron Model So, welcome to today

More information

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13 CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit

More information

For Monday. Read chapter 18, sections Homework:

For Monday. Read chapter 18, sections Homework: For Monday Read chapter 18, sections 10-12 The material in section 8 and 9 is interesting, but we won t take time to cover it this semester Homework: Chapter 18, exercise 25 a-b Program 4 Model Neuron

More information

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies

More information

Image Compression: An Artificial Neural Network Approach

Image Compression: An Artificial Neural Network Approach Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and

More information

Machine Learning using Matlab. Lecture 3 Logistic regression and regularization

Machine Learning using Matlab. Lecture 3 Logistic regression and regularization Machine Learning using Matlab Lecture 3 Logistic regression and regularization Presentation Date (correction) 10.07.2017 11.07.2017 17.07.2017 18.07.2017 24.07.2017 25.07.2017 Project proposals 13 submissions,

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Predict the box office of US movies

Predict the box office of US movies Predict the box office of US movies Group members: Hanqing Ma, Jin Sun, Zeyu Zhang 1. Introduction Our task is to predict the box office of the upcoming movies using the properties of the movies, such

More information

Report: Privacy-Preserving Classification on Deep Neural Network

Report: Privacy-Preserving Classification on Deep Neural Network Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information


CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one

More information

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions

Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions ENEE 739Q SPRING 2002 COURSE ASSIGNMENT 2 REPORT 1 Classification and Regression using Linear Networks, Multilayer Perceptrons and Radial Basis Functions Vikas Chandrakant Raykar Abstract The aim of the

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

CS535 Big Data Fall 2017 Colorado State University   10/10/2017 Sangmi Lee Pallickara Week 8- A. CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE

More information

Model Answers to The Next Pixel Prediction Task

Model Answers to The Next Pixel Prediction Task Model Answers to The Next Pixel Prediction Task December 2, 25. (Data preprocessing and visualization, 8 marks) (a) Solution. In Algorithm we are told that the data was discretized to 64 grey scale values,...,

More information

Logistic Regression. Abstract

Logistic Regression. Abstract Logistic Regression Tsung-Yi Lin, Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl60} January 4, 013 Abstract Logistic regression

More information

Univariate and Multivariate Decision Trees

Univariate and Multivariate Decision Trees Univariate and Multivariate Decision Trees Olcay Taner Yıldız and Ethem Alpaydın Department of Computer Engineering Boğaziçi University İstanbul 80815 Turkey Abstract. Univariate decision trees at each

More information

An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting.

An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting. An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting. Mohammad Mahmudul Alam Mia, Shovasis Kumar Biswas, Monalisa Chowdhury Urmi, Abubakar

More information

Hyperparameters and Validation Sets. Sargur N. Srihari

Hyperparameters and Validation Sets. Sargur N. Srihari Hyperparameters and Validation Sets Sargur N. 1 Topics in Machine Learning Basics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation

More information

Data Mining. Neural Networks

Data Mining. Neural Networks Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most

More information

5 Machine Learning Abstractions and Numerical Optimization

5 Machine Learning Abstractions and Numerical Optimization Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information


COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18 Lecture 6: k-nn Cross-validation Regularization LEARNING METHODS Lazy vs eager learning Eager learning generalizes training data before

More information



More information

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah

Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Improving the way neural networks learn Srikumar Ramalingam School of Computing University of Utah Reference Most of the slides are taken from the third chapter of the online book by Michael Nielson:

More information

Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

S2 Text. Instructions to replicate classification results.

S2 Text. Instructions to replicate classification results. S2 Text. Instructions to replicate classification results. Machine Learning (ML) Models were implemented using WEKA software Version 3.8. The software can be free downloaded at this link:

More information

Machine Learning in Telecommunications

Machine Learning in Telecommunications Machine Learning in Telecommunications Paulos Charonyktakis & Maria Plakia Department of Computer Science, University of Crete Institute of Computer Science, FORTH Roadmap Motivation Supervised Learning

More information

How Learning Differs from Optimization. Sargur N. Srihari

How Learning Differs from Optimization. Sargur N. Srihari How Learning Differs from Optimization Sargur N. 1 Topics in Optimization Optimization for Training Deep Models: Overview How learning differs from optimization Risk, empirical

More information

Tested Paradigm to Include Optimization in Machine Learning Algorithms

Tested Paradigm to Include Optimization in Machine Learning Algorithms Tested Paradigm to Include Optimization in Machine Learning Algorithms Aishwarya Asesh School of Computing Science and Engineering VIT University Vellore, India International Journal of Engineering Research

More information

The Fly & Anti-Fly Missile

The Fly & Anti-Fly Missile The Fly & Anti-Fly Missile Rick Tilley Florida State University (USA) Abstract Linear Regression with Gradient Descent are used in many machine learning applications. The algorithms are

More information

Programming Exercise 4: Neural Networks Learning

Programming Exercise 4: Neural Networks Learning Programming Exercise 4: Neural Networks Learning Machine Learning Introduction In this exercise, you will implement the backpropagation algorithm for neural networks and apply it to the task of hand-written

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information


COMPUTATIONAL INTELLIGENCE COMPUTATIONAL INTELLIGENCE Fundamentals Adrian Horzyk Preface Before we can proceed to discuss specific complex methods we have to introduce basic concepts, principles, and models of computational intelligence

More information

DAME Web Application REsource Plugin Creator User Manual

DAME Web Application REsource Plugin Creator User Manual DAME Web Application REsource Plugin Creator User Manual DAMEWARE-MAN-NA-0016 Issue: 2.1 Date: March 20, 2014 Authors: S. Cavuoti, A. Nocella, S. Riccardi, M. Brescia Doc. : ModelPlugin_UserManual_DAMEWARE-MAN-NA-0016-Rel2.1

More information

Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class

Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class Mini-project 2 CMPSCI 689 Spring 2015 Due: Tuesday, April 07, in class Guidelines Submission. Submit a hardcopy of the report containing all the figures and printouts of code in class. For readability

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

CS 4510/9010 Applied Machine Learning. Neural Nets. Paula Matuszek Fall copyright Paula Matuszek 2016

CS 4510/9010 Applied Machine Learning. Neural Nets. Paula Matuszek Fall copyright Paula Matuszek 2016 CS 4510/9010 Applied Machine Learning 1 Neural Nets Paula Matuszek Fall 2016 Neural Nets, the very short version 2 A neural net consists of layers of nodes, or neurons, each of which has an activation

More information

Neural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick

Neural Networks. Theory And Practice. Marco Del Vecchio 19/07/2017. Warwick Manufacturing Group University of Warwick Neural Networks Theory And Practice Marco Del Vecchio Warwick Manufacturing Group University of Warwick 19/07/2017 Outline I 1 Introduction 2 Linear Regression Models 3 Linear

More information

A neural network that classifies glass either as window or non-window depending on the glass chemistry.

A neural network that classifies glass either as window or non-window depending on the glass chemistry. A neural network that classifies glass either as window or non-window depending on the glass chemistry. Djaber Maouche Department of Electrical Electronic Engineering Cukurova University Adana, Turkey

More information

Data Preprocessing. Supervised Learning

Data Preprocessing. Supervised Learning Supervised Learning Regression Given the value of an input X, the output Y belongs to the set of real values R. The goal is to predict output accurately for a new input. The predictions or outputs y are

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

Exercise: Training Simple MLP by Backpropagation. Using Netlab.

Exercise: Training Simple MLP by Backpropagation. Using Netlab. Exercise: Training Simple MLP by Backpropagation. Using Netlab. Petr Pošík December, 27 File list This document is an explanation text to the following script: demomlpklin.m script implementing the beckpropagation

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information