DAta Mining Exploration Project
Pag. 1 of 31

DAta Mining Exploration Project
General Purpose Multi Layer Perceptron Neural Network (trained by Back Propagation & Quasi-Newton) Data Mining Model

User Manual

Doc.: DAME-MAN-NA-0008
Issue: 1.0
Date: September 02, 2010
Prepared by: M. Brescia, 02/09/2010
Released by: G. Longo, 02/09/2010
Revision Matrix

Issue  Author      Date        Section/Paragraph Affected  Reason/Initiation/Documents/Remarks
0.1    M. Brescia  02/09/2010  All                         First draft release
INDEX

1 Reference & Applicable Documents
2 Abbreviations & Acronyms
3 Introduction
3.1 Design Issues
3.1.1 The MLP implementation overview
3.1.2 The BP implementation
3.1.3 The QNA implementation
4 System Architectural Design
4.1 Chosen System Architecture
4.2 System Interface description
Wrapping design & implementation requirements
4.3 User Interface description
Input dataset format
MLP-BP wrapping requirements and execution details
TRAIN USE CASE
TEST USE CASE
RUN USE CASE
MLP-QNA wrapping requirements and execution details
TRAIN USE CASE
TEST USE CASE
RUN USE CASE
STATISTICAL TRAIN USE CASE
APPENDIX Scientific case test with MLP-QNA
The Science case
Test procedure and results

TABLE INDEX
Tab. 1 Reference & Applicable Documents
Tab. 2 Abbreviations and acronyms
Tab. 3 Test results

FIGURE INDEX
Fig. 1 MLP architecture
Fig. 2 Bipolar sigmoid activation function
Fig. 3 Execution time comparison between the 4 tests
Fig. 4 Training iterations comparison between the 4 tests
1 Reference & Applicable Documents

1. sdd_template_voneural-sdd-na-0000-rel0.1, Software Design Description Document Guidelines; M. Brescia
2. SuiteDesign_VONEURAL-PDD-NA-0001-Rel2.0, Suite Project Description Document; VO-Neural team
3. DMPlugins_DAME-TRE-NA-0016-Rel0.3, deployed Model-functionality DMPlugins Description report; A. Di Guido, M. Brescia
4. dm-model_voneural-sdd-na-0008-rel1.2, Data Mining Model Component Software Design Description; S. Cavuoti, A. Di Guido
5. framework_voneural-sdd-na-0005-rel1.0, Framework Component Software Design Description; O. Laurino, M. Fiore
6. Eberhart, R. C., Dobbins, R. W., Neural Networks PC Tools: A practical guide, Academic Press
7. Nguyen, D., Widrow, B., "Improving The Learning Speed of 2-layer Neural Networks by Choosing Initial Values of The Adaptive Weights", IJCNN, USA
8. Fernández-Redondo, M., Hernández-Espinosa, C., "A Comparison among Weight Initialization Methods for Multilayer Feedforward Networks", IJCNN, Italy
9. Byrd, R. H., Nocedal, J., Schnabel, R. B., "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods", Mathematical Programming, 63, 4

Tab. 1 Reference & Applicable Documents
2 Abbreviations & Acronyms

Abbreviation & Acronym   Meaning
BFGS     Broyden-Fletcher-Goldfarb-Shanno
BP       Back Propagation
CE       Cross Entropy
CSV      Comma Separated Value
DAME     Data Mining & Exploration
DM       Data Mining
DMM      Data Mining Model
GP       General Purpose
GRID     Global Resource Information Database
L-BFGS   Limited memory BFGS
MLP      Multi Layer Perceptron
MSE      Mean Square Error
NN       Neural Network
OOP      Object Oriented Programming
QNA      Quasi Newton Algorithm
SA       Stand Alone
SVM      Support Vector Machine
TS       Tournament Selection
TBC      To Be Completed
UML      Unified Modeling Language
VO       Virtual Observatory
XML      eXtensible Markup Language

Tab. 2 Abbreviations and acronyms
3 Introduction

This document describes a data mining (DM) model used to solve non-linear optimization problems. It is based on the design of a general purpose feed-forward neural network architecture, providing a Soft Computing instrument that implements supervised learning. The classical MLP architecture is associated with two types of learning algorithm: the standard gradient descent weight update rule, named Back Propagation (BP), and the statistical Quasi-Newton Algorithm (QNA). In the MLP-BP case both batch and on-line learning modes are available, while in the MLP-QNA case only batch learning is provided. Hereinafter the term MLP-GP indicates the general model implemented, while MLP-BP refers to the MLP-GP associated with the BP algorithm and MLP-QNA to the MLP associated with the QNA learning algorithm.

3.1 Design Issues

The model described here is intended to become one of the DM models officially integrated into the DAME Suite. To achieve this goal a set of standardization rules is followed, in order to make the package compliant with the specific environment specifications [2, 3, 4, 5]. These guidelines basically concern input/output data format, compiling and execution dependencies, and DMM wrapper conditions and requirements.

3.1.1 The MLP implementation overview

The MLP architecture is one of the most typical feed-forward neural network models. The term feed-forward identifies the basic behavior of such neural models, in which the impulse is always propagated in the same direction, i.e. from the input layer towards the output layer, through one or more hidden layers (the network brain), by combining the weighted sums of the inputs of all neurons (except those of the input layer). The neurons are organized in layers, each with its own role. The input signal, simply propagated through the neurons of the input layer, is used to stimulate the following hidden and output neuron layers.
The output of each neuron is obtained by means of an activation function applied to the weighted sum of its inputs. Different shapes of this activation function can be applied, from the simplest (linear) up to the sigmoid. The number of hidden layers represents the degree of complexity of the energy solution space in which the network output moves while looking for the best solution. For example, in a typical classification problem, the number of hidden layers indicates the number of hyper-planes used to split the parameter space (i.e. the number of possible classes) in order to classify each input pattern. What typically differs among such neural network architectures is the learning algorithm used to train the network. There is a dichotomy between supervised and unsupervised learning methods.
Fig. 1 MLP architecture

In the first case (supervised), the network must first be trained (training phase): the input patterns are submitted to the network as couples (input, desired known output). The feed-forward algorithm is then executed and, at the end of the input submission, the network output is compared with the corresponding desired output in order to quantify the learning quality. The comparison can be performed in a batch way (after the submission of the entire input pattern set) or incrementally (after each single input pattern submission), and the metric used to measure the distance between desired and obtained outputs can be chosen according to problem-specific requirements (in the MLP-BP the MSE, Mean Square Error, is used). After each comparison, and until the desired error distance is reached (typically the error tolerance is a pre-calculated value or a constant imposed by the user), the weights of the hidden layers are changed according to a particular law or learning technique. After the training phase is finished (or arbitrarily stopped), the network should be able not only to give the correct output for each input already used in the training set, but also to achieve a certain degree of generalization, i.e. to give the correct output for inputs never used to train it. The degree of generalization obviously varies depending on how good the learning phase has been. This important feature is possible because the network does not associate a single input with a single output, but discovers the relationship underlying their association. After training, such a neural network can be seen as a black box able to perform a particular function (input-output correlation) whose analytical shape is not known a priori. In order to obtain the best training, the training set must be as homogeneous as possible and able to describe a great variety of samples.
The bigger the training set, the higher the network generalization capability. Despite these considerations, it should always be taken into account that neural networks are usually applied to problems requiring high flexibility (quantitative results) more than high precision (qualitative results). Concerning the hidden layer choice, it is possible to define zero hidden layers (SLP, Single Layer Perceptron, able to solve only linear separations of the parameter space), or 1 or 2 hidden layers, depending on the complexity the user wants to introduce in the non-linear problem solving experiment. The second learning type (unsupervised) basically refers to neural models able to classify/cluster patterns into several categories, based on their common features, by submitting training inputs without related desired outputs. This is not the learning case approached with the MLP architecture, so no further information is given in this document.
3.1.2 The BP implementation

In the feed-forward process, the network calculates the output from the given input. We use the bipolar logistic function as the activation function in the hidden and output layers, while the input layer uses the identity function. Choosing an appropriate activation function can also contribute to much faster learning. Theoretically, a sigmoid function with a lower saturation speed gives a better result.

f_sigmoid(x) = 2 / (1 + e^(-σx)) - 1

f'_sigmoid(x) = 2σ e^(-σx) / (e^(-σx) + 1)^2

Fig. 2 bipolar sigmoid activation function

Its slope σ can be manipulated to see how it affects the learning speed. A larger slope makes the weight values move faster towards the saturation region (faster convergence), while a smaller slope makes the weight values move slower but allows a more refined weight adjustment. Next, the calculated output is compared to the desired output to calculate the error. The mission is then to minimize this error, and the method chosen for minimizing it also determines the learning speed. Gradient descent is the most common method: the weights are updated as

w(t+1) = w(t) - η ∂E/∂w

where η is the learning rate and E the network error.
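The two formulas above can be transcribed directly; this is an illustrative sketch with assumed function names:

```cpp
#include <cmath>

// Bipolar sigmoid: f(x) = 2 / (1 + e^(-sigma*x)) - 1.
// Bounded in (-1, +1), with f(0) = 0.
double f_sigmoid(double x, double sigma) {
    return 2.0 / (1.0 + std::exp(-sigma * x)) - 1.0;
}

// Its derivative: f'(x) = 2*sigma*e^(-sigma*x) / (e^(-sigma*x) + 1)^2.
// Maximal at x = 0 (value sigma/2), vanishing in the saturation regions.
double f_sigmoid_prime(double x, double sigma) {
    double e = std::exp(-sigma * x);
    return 2.0 * sigma * e / ((e + 1.0) * (e + 1.0));
}
```

A quick numerical check of f' against a centered difference of f confirms the two expressions are consistent.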
Besides the plain gradient descent method, there are several other techniques that guarantee a faster learning speed. In this case, we can make the classical BP learning process much faster by adding a momentum term or by using an adaptive learning rate. The feed-forward network error is calculated with the standard MSE function.

In momentum learning, the weight update at time (t+1) contains the momentum of the previous update, so we need to keep the previous values of error and output:

Δw(t+1) = -η ∂E/∂w + α Δw(t)

The variable α is the momentum value; it should be greater than zero and smaller than one.

In the MLP-BP implementation, the momentum is not the only improvement made to the standard BP algorithm. In our case we have also implemented an adaptive learning rule, [7], described in the following. For adaptive learning, the idea is to change the learning rate automatically based on the current error E and the previous error Ei: the last two errors are observed and the learning rate is adjusted in the direction that would have reduced the second error. Parameter A determines how rapidly the learning rate is adjusted; it should be less than one and greater than zero. Another possible method is to multiply the current learning rate by a factor greater than one if the current error is smaller than the previous one, and by a factor less than one if the current error is bigger than the previous one. In the literature it is also suggested to discard the weight changes if the error is increasing; this leads to a better result. The adaptive learning routine is in the function ann_train_network_from_file, where the learning rate update is performed either on-line (updated epoch by epoch after each single pattern presentation) or in batch (updated after a whole dataset presentation).
Moreover, concerning the MLP network weight initialization, several alternative methods have been implemented, [8]. It is known that the initialization values influence the speed of convergence. Several methods are available for this purpose. The most common is to initialize the weights at random, with uniform distribution, inside a certain small user-defined range; in the MLP-BP we call this method HARD_RANDOM. A better method bounds the range to the fixed interval [-1, +1]; we call this method simply RANDOM.
Widely known as a very good weight initialization method is the Nguyen-Widrow method; we call this method NGUYEN. The Nguyen-Widrow weight initialization algorithm can be expressed by the following steps: first, we assign random numbers between -1 and 1 to all hidden node weights; next, we calculate the norm of these random numbers by calling the function get_norm_of_weight; finally, with all the necessary data available, we apply the Nguyen-Widrow rescaling formula. All the weight initialization routines are located in the function initialize_weights.

It is also possible to resume a trained network, to perform a further training session starting from the previously stored final weight setup. In this case the user should set the method FROM_FILE, specifying the stored weights file name as input.

3.1.3 The QNA implementation

In this case, the BP algorithm is completely replaced by an adapted version of the classical Newton method for optimization problems. The Newton method is the general basis for a whole family of so-called Quasi-Newton methods. One of those methods, implemented here, is the L-BFGS algorithm, [9]. More rigorously, the QNA is an optimization of the learning rule, also because, as described below, the implementation is based on a statistical approximation of the Hessian by cyclic gradient calculation which, as said in the previous section, is at the base of the BP method.

As known, the classical Newton method uses the Hessian of a function. The step of the method is defined as the product of the inverse Hessian matrix and the function gradient. If the function is a positive definite quadratic form, we can reach the function minimum in one step. In case of an indefinite quadratic form (which has no minimum), we will reach a maximum or a saddle point. In short, the method finds the stationary point of a quadratic form. In practice, we usually have functions which are not quadratic forms.
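The three Nguyen-Widrow steps above can be sketched as follows, assuming the usual formulation of the rescaling formula with scale factor beta = 0.7 * h^(1/n) (h = number of hidden neurons, n = number of inputs). The function names echo, but are not, the get_norm_of_weight / initialize_weights routines mentioned in the text:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Euclidean norm of a weight vector (role of get_norm_of_weight).
double norm_of(const std::vector<double>& w) {
    double s = 0.0;
    for (double v : w) s += v * v;
    return std::sqrt(s);
}

// Nguyen-Widrow initialization sketch:
// 1) draw each hidden weight uniformly in [-1, 1];
// 2) compute the norm of each neuron's weight vector;
// 3) rescale the vector so its length becomes beta = 0.7 * h^(1/n).
void nguyen_widrow_init(std::vector<std::vector<double>>& hidden_w,
                        int n_inputs) {
    double beta = 0.7 * std::pow((double)hidden_w.size(), 1.0 / n_inputs);
    for (auto& w : hidden_w) {
        for (double& v : w)
            v = 2.0 * std::rand() / RAND_MAX - 1.0;  // step 1
        double nrm = norm_of(w);                     // step 2
        for (double& v : w)
            v = beta * v / nrm;                      // step 3
    }
}
```

After initialization every hidden neuron's weight vector has length beta, which spreads the active regions of the sigmoids evenly across the input space.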
If such a function is smooth, it is described sufficiently well by a quadratic form in the neighborhood of the minimum. However, the Newton method can converge both to a minimum and to a maximum (taking a step in the direction of a function increase). Quasi-Newton methods solve this problem as follows: they use a positive definite approximation instead of the Hessian. If the Hessian is positive definite, we make the step using the Newton method. If the Hessian is indefinite, we modify it to make it positive definite, and then perform a step using the Newton method. The step is always performed in the direction of the function decrement. In case of a positive definite Hessian, we use it to generate a quadratic surface approximation, which should improve convergence. If the Hessian is indefinite, we just move to where the function decreases.
Some modifications of Quasi-Newton methods perform a precise linear minimum search along the indicated line, but it has been proved that it is enough to sufficiently decrease the function value, and not necessary to find a precise minimum. The L-BFGS algorithm tries to perform a step using the Newton method; if it does not lead to a decrease of the function value, it shortens the step length to find a lower function value.

Up to here it seems quite simple, but it is not. The Hessian of a function is not always available, and in many cases it is too complicated; more often we can only calculate the function gradient. Therefore, the following operation is used: the Hessian of the function is generated on the basis of N consecutive gradient calculations, and the quasi-Newton step is performed. There are special formulas which allow us to iteratively obtain a Hessian approximation; at each approximation step, the matrix remains positive definite.

The algorithm uses the L-BFGS update scheme. BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (more precisely, this scheme generates not the Hessian but its inverse matrix, so we do not have to waste time inverting a Hessian). The L in the scheme name comes from the words "Limited memory": in case of big dimensions, the amount of memory required to store a Hessian (N^2) is too big, along with the machine time required to process it. Therefore, instead of using N gradient values to generate a Hessian, we can use a smaller number of values, which requires a memory capacity of order N*M. In practice, M is usually chosen between 3 and 7; in difficult cases it is reasonable to increase this constant up to 20. Of course, as a result we get not the Hessian but its approximation: on the one hand, the convergence slows down; on the other hand, the performance could even improve. At first sight, this statement is paradoxical.
But it contains no contradiction: the convergence is measured by the number of iterations, whereas the performance depends on the amount of processor time spent to calculate the result. As a matter of fact, this method was designed to optimize functions of a large number of arguments (hundreds and thousands), because in this case it is worth accepting an increased iteration number, due to the lower approximation precision, since the overheads become much lower. This is particularly useful in astrophysical data mining problems, where the parameter space is usually dimensionally huge and confused by a low signal-to-noise ratio. But we can use these methods for small dimension problems too. The main advantage of the method is scalability: it provides high performance when solving high dimensionality problems, and it allows small dimension problems to be solved as well.

From the implementation point of view, in the MLP-QNA case the following features are available to the end user:

- only batch learning mode is available;
- strict separation between classification and regression functionality modes;
- for classification mode, the Cross Entropy method is available to compare output and target network values; it is alternatively possible to use the standard MSE rule, which is mandatory for regression mode;
- k-fold cross validation method, to improve training performance and to avoid overfitting problems;
- resume of training from past experiments, by using the weights stored in an external file at the end of the training phase;
- confusion matrix calculated and stored in an external file for both classification and regression modes (in the latter case an adapted version is provided); it is useful after training and test sessions to evaluate model performance.
4 System Architectural Design

4.1 Chosen System Architecture

The choice of the MLP-GP system architecture is not free, but bounded by the specific requirements issued by the DAME Suite environment. The MLP-GP is one of the supported DM models to be integrated into the Suite infrastructure, in terms of I/O data format, XML parameter description, functionality association (design pattern integration as specified in [4]) and DMPlugin package constraints, as specified in [3] and [5].

4.2 System Interface description

Wrapping design & implementation requirements

In order to wrap the MLP-GP model (a library implemented in C++) into the DAME Suite, we have to create a Java class called MLPGP.java. The DAME Suite has a class interface called DMMInterface, which represents a generic data mining model that can be added to the Suite. Therefore the MLPGP Java class must implement the DMMInterface class and its specified use cases:

- Train;
- Test;
- Run;
- Full (as a sequential combination of the previous ones).

This wrapping phase is foreseen to be provided as soon as possible.

4.3 User Interface description

In order to be integrated into the DAME Suite, the code of MLP-GP has been structured taking into account the DMM design pattern requirements and the distinction into different functionality and use case constraints. The MLP-GP has to be put into the supervised model hierarchy, associated with specific functionality modes. The MLP-GP, in particular with the QNA learning rule, can be used for the following functionality modes:

- Classification;
- Regression.

Also the use case requirements mentioned in the previous section have been strictly followed (train, test, run, full use cases allowed). At run time the program can be executed in the form of a formatted command line.
Depending on the use case and functionality of the current experiment, the following are the details of the command line parameters to be specified to execute the MLP-GP. In principle, the program is executed by compiling a command line as a suffix string for the executable program mycnn_bp.exe. In all the following cases the command line must be respected in terms of number and order of parameters.

Input dataset format

The dataset input file (with target columns for the train and test cases, without target columns for the run case) to be accepted by the program MLP-BP is exclusively in CSV format, with no header or special character at the beginning of the file. IMPORTANT: the file MUST be provided WITHOUT any header and with NO CARRIAGE RETURN after the last pattern row!

MLP-BP wrapping requirements and execution details

Here the details about the launch of the MLP with BP are reported.

TRAIN USE CASE

The following are the pre-formatted training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 0 TRAIN_BP (training case);
3. Learning rate [double]. The BP learning rate, in the range [0, 1];
4. Momentum factor [double]. The momentum value, in the range [0, 1];
5. Learning changing factor [double]. Used for the adaptive learning rule, in the range ]0, 1[;
6. Slope σ argument of the bipolar sigmoid activation function [double];
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Weight initializing rule [integer]:
    a. 701 HARD_RANDOM. Random values within the user specified range [-m_init_val, +m_init_val];
    b. 702 RANDOM. Random values within [-1, +1];
    c. 703 NGUYEN. First scales into [-1, +1], then applies the specific rule (see section 3.1.2);
    d. 704 FROM_FILE. An already created weight file is used. This is the case of resumed training or of test/run sessions, where an already trained network must be used;
13. Range of the user defined weight initialization [double]. Used only if the rule selected as parameter 12 is HARD_RANDOM, otherwise it is not considered;
14. Training input dataset file name (with full relative path) [character string];
15. Training log file name (with full relative path) [character string];
16. Partial training error file name (with full relative path) [character string];
17. Training network weight file name (with full relative path) [character string];
18. Number of training iterations [integer]. Stop condition;
19. Learning MSE error threshold [double]. Stop condition;
20. Training input dataset internal column order [integer]:
    a. 705 INPUT_FIRST. The file contains the input columns first and then the target columns;
    b. 706 OUTPUT_FIRST. The file contains the target columns first and then the input columns;
21. Training mode [integer]:
    a. 300 BATCH;
    b. 301 INCREMENTAL (on-line mode to update the network weights).

As an example of training command line:

mycnn_bp input.txt trainlog.txt trainpartialerror.txt trainedweights.txt

Training Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results;
- <partial training error file name>: user defined file with the partial error values at each training iteration.
Useful to obtain a graphical view of the learning process;
- <trained weights file name>: final network weights frozen at the end of the training. It can be used in a new training experiment to restore the old one.

All these files have to be registered as official output files of the experiment.
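A minimal reader for the dataset format required above (plain CSV, no header line, every row a pattern of numeric columns) might look like the following sketch; it is illustrative and not part of the MLP-BP program:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Load patterns from a CSV stream with no header: each non-empty line is
// split on commas and every cell is parsed as a number. Whether the inputs
// or the targets come first is up to the caller (the 705/706 flag above).
std::vector<std::vector<double>> load_csv_patterns(std::istream& in) {
    std::vector<std::vector<double>> patterns;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate a stray blank line
        std::vector<double> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ','))
            row.push_back(std::stod(cell));
        patterns.push_back(row);
    }
    return patterns;
}
```

Reading via std::getline also makes the "no carriage return after the last pattern row" requirement harmless for the reader side, since a final line without a terminator is still returned.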
TEST USE CASE

The following are the pre-formatted test command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 1 TEST_BP (test case);
3. Slope σ argument of the bipolar sigmoid activation function [double];
4. Number of input neurons [integer]. It must match the number of input dataset columns;
5. Number of output neurons [integer]. It must match the number of target dataset columns;
6. Number of hidden layers [integer]. It may be 0, 1 or 2;
7. Number of first hidden layer neurons [integer]. If parameter 6 is 0, this field is not considered;
8. Number of second hidden layer neurons [integer]. If parameter 6 is < 2, this field is not considered;
9. Input weight file name (with full relative path) [character string];
10. Test input dataset file name (with full relative path) [character string];
11. Test output log file name (with full relative path) [character string];
12. Test input dataset internal column order [integer]:
    a. 705 INPUT_FIRST. The file contains the input columns first and then the target columns;
    b. 706 OUTPUT_FIRST. The file contains the target columns first and then the input columns.

As an example of test command line:

mycnn_bp trainedweights.txt input.txt testlog.txt 705

Test Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results.
All these files have to be registered as official output files of the experiment.

RUN USE CASE

The following are the pre-formatted run command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 2 RUN_BP (run case);
3. Slope σ argument of the bipolar sigmoid activation function [double];
4. Number of input neurons [integer]. It must match the number of input dataset columns;
5. Number of output neurons [integer]. It must match the number of target dataset columns;
6. Number of hidden layers [integer]. It may be 0, 1 or 2;
7. Number of first hidden layer neurons [integer]. If parameter 6 is 0, this field is not considered;
8. Number of second hidden layer neurons [integer]. If parameter 6 is < 2, this field is not considered;
9. Input weight file name (with full relative path) [character string];
10. Run input dataset file name (with full relative path) [character string];
11. Run output log file name (with full relative path) [character string].

As an example of run command line:

mycnn_bp trainedweights.txt run.txt runlog.txt

Run Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results.

All these files have to be registered as official output files of the experiment.
MLP-QNA wrapping requirements and execution details

Here the details about the launch of the MLP with QNA are reported.

TRAIN USE CASE

The following are the pre-formatted training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 3 TRAIN_QNA (training case);
3. Decay [double]. Weight decay constant (>= 0.001). The decay term Decay*||Weights||^2 is added to the error function. Default value = 0.001;
4. Restarts [integer]. Number of restarts from a random position, > 0. If you do not know what value to choose, use 2 (THIS IS THE NUMBER OF MAX TRAINING CYCLES PERFORMED ANYWAY). Default value = 20;
5. Wstep [double]. Stopping criterion. The algorithm stops if the step size is less than WStep. A zero step size means stopping after MaxIts iterations. Default value = 0.01;
6. MaxIts [integer]. Stopping criterion. The algorithm stops after MaxIts iterations (NOT gradient calculations). Zero MaxIts means stopping when the step is sufficiently small (use Wstep). Default value = 1500;
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Cross Entropy flag [integer]. Used in classification mode only; in regression mode this parameter is not considered. Default value = 0:
    a. 0 not used (standard MSE used);
    b. 1 used;
13. Name of input dataset file (with full relative path) [character string];
14. K-fold cross validation flag [integer]. Default value = 0:
    a. 0 not used (standard training without validation);
    b. 1 used (training with validation sequence);
15. K-fold cross validation k value [integer]. If parameter 14 is 0, this parameter is not considered. Default value = 5;
16. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as the positive cases). To be used only in classification mode; in case of regression it is not considered. Default value = 1:
    a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
    b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
17. Weight initialization choice [integer]. It states how to initialize the network weights; it is possible to resume a previous training phase:
    a. 702 RANDOM initialization between [-1, +1];
    b. 704 FROM_FILE. To be used in case of a past training resume;
18. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string]. To be used in case parameter 17 is set to FROM_FILE; if parameter 17 is RANDOM, this field is not considered.

As examples of command lines:

Classification mode training command line:

mycnn_bp datasets/agn_7_stat_full.txt none

In this case:
- network with 1 hidden layer of 15 neurons;
- it uses cross entropy (flag set to 1);
- it uses cross validation (flag set to 1) with k = 10;
- weights are initialized randomly (702), with weight file name none (not used).

mycnn_bp datasets/agn_7_stat_full.txt experiments/regression/trainedweights.txt

In this case:
- network with 2 hidden layers of, respectively, 15 and 6 neurons;
- it does not use cross entropy (flag set to 0), but the simple MSE error;
- it does not use cross validation (flag set to 0), so the k value is not considered;
- weights are initialized by restoring a past training (704), with weight file name experiments/regression/trainedweights.txt.

Regression mode training command line:
mycnn_bp datasets/agn_7_stat_full.txt experiments/regression/trainedweights.txt

Training Output

When executed under the training use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- trainlog.txt: log file with detailed information about the experiment configuration, main results and parameter setup;
- trainpartialerror.txt: ASCII (space separated) file with partial values at each training iteration of the QNA algorithm, useful to obtain a graphical view of the learning process. Each row is composed of three columns:
o training step;
o number of iterations of the current step (number of Hessian approximations <= MaxIts);
o current step batch error (MSE, or Cross Entropy value if selected in classification mode);
- trainedweights.txt: final network weights, frozen at the end of the batch training. It can be used in a new training experiment to restore the old one;
- frozen_train_net.txt: internal network node values as frozen at the end of training, to be given as network input file in the test/run cases;
- traintestoutlog.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of traintestoutlog.txt, for internal use only);
- traintestconfmatrix.txt: confusion matrix calculated at the end of training. It results from the values stored into the traintestoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole training results. In the regression case it is an adapted version.

Some of the above files, as described, are not very useful for the end user, being created for internal use only. In particular, the main files to be registered as official output files are those underlined in the previous list.
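The three-column format of trainpartialerror.txt can be inspected with a few lines of code. The sketch below is not part of the DAME suite; the helper function and the sample values are illustrative only:

```python
# Hedged sketch (not part of the DAME suite): parse a trainpartialerror.txt
# log. Per the manual, each row holds three space-separated columns:
# training step, number of iterations of the step (Hessian approximations
# <= MaxIts), and the current step batch error (MSE or Cross Entropy).
def parse_train_log(path):
    rows = []
    with open(path) as f:
        for line in f:
            step, iters, err = line.split()
            rows.append((int(step), int(iters), float(err)))
    return rows

# Illustrative file content, mimicking the documented format.
with open("trainpartialerror_example.txt", "w") as f:
    f.write("1 120 0.25\n2 98 0.11\n3 77 0.04\n")

log = parse_train_log("trainpartialerror_example.txt")
print("training steps:", len(log), "- final batch error:", log[-1][2])
```

Plotting the third column against the first gives the graphical view of the learning process mentioned above.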
TEST USE CASE

The following are the test command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 4 TEST_QNA (test case);
3. Number of input neurons [integer]. It must match the number of input dataset columns;
4. Number of output neurons [integer]. It must match the number of target dataset columns;
5. Number of hidden layers [integer]. It may be 0, 1 or 2;
6. Number of first hidden layer neurons [integer]. If parameter 5 is 0, this field is not considered;
7. Number of second hidden layer neurons [integer]. If parameter 5 is < 2, this field is not considered;
8. Name of input dataset file (with full relative path) [character string];
9. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
10. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string];
11. Name of the file (with full relative path if loaded from a different directory) with the internal network node values as frozen at the end of the training phase [character string].

As examples of command lines:

Classification mode test command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/classification/trainedweights.txt experiments/classification/frozen_train_net.txt

Regression mode test command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/regression/trainedweights.txt experiments/regression/frozen_train_net.txt

Test Output

When executed under the test use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- testoutlog.txt: output values as calculated after the test, with the respective target values. It can be used to evaluate the network output for each input pattern;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of testoutlog.txt, for internal use only);
- testconfmatrix.txt: confusion matrix calculated at the end of the test. It results from the values stored into the testoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole test results. In the regression case it is an adapted version.

The files to be registered as official output files are those underlined in the previous list.

RUN USE CASE

The following are the run command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 5 RUN_QNA (run case);
3. Number of input neurons [integer]. It must match the number of input dataset columns;
4. Number of output neurons [integer]. It must match the number of target dataset columns;
5. Number of hidden layers [integer]. It may be 0, 1 or 2;
6. Number of first hidden layer neurons [integer]. If parameter 5 is 0, this field is not considered;
7. Number of second hidden layer neurons [integer]. If parameter 5 is < 2, this field is not considered;
8. Name of input dataset file (with full relative path) [character string];
9. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
10. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string];
11. Name of the file (with full relative path if loaded from a different directory) with the internal network node values as frozen at the end of the training phase [character string].

As examples of command lines:

Classification mode run command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/classification/trainedweights.txt experiments/classification/frozen_train_net.txt

Regression mode run command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/regression/trainedweights.txt experiments/regression/frozen_train_net.txt

Run Output

When executed under the run use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- RunOutLog.txt: output values as calculated during the run. It can be used to evaluate the network output for each input pattern.

All these files have to be registered as official output files.

STATISTICAL TRAIN USE CASE

The following are the statistical training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 6 STAT_TRAIN_QNA (statistical training case). It generates a verbose log file reporting a step-by-step training procedure with an incremental dimension of the input dataset (see the related section below for details);
3. Decay [double]. Weight decay constant (>= 0.001). The decay term Decay*|Weights|^2 is added to the error function. Default value = 0.001;
4. Restarts [integer]. Number of restarts from a random position, > 0. If you don't know which value to choose, use 2 (THIS IS THE NUMBER OF MAX TRAINING CYCLES PERFORMED ANYWAY). Default value = 20;
5. Wstep [double]. Stopping criterion: the algorithm stops if the step size is less than WStep (a zero step size means stopping after MaxIts iterations). Default value = 0.01;
6. MaxIts [integer]. Stopping criterion: the algorithm stops after MaxIts iterations (NOT gradient calculations). A zero MaxIts means stopping when the step is sufficiently small (use Wstep). Default value = 1500;
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Cross Entropy flag [integer]. Used in classification mode only; in regression mode this parameter is not considered. Default value = 0:
a. 0 not used (standard MSE used);
b. 1 used;
13. Name of input dataset file (with full relative path) [character string];
14. K-fold cross validation flag [integer]. Default value = 0:
a. 0 not used (standard training without validation);
b. 1 used (training with validation sequence);
15. K-fold cross validation k value [integer]. If parameter 14 is 0, this parameter is not considered. Default value = 5;
16. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);

As examples of command lines:

Classification mode statistical training command line:

mycnn_bp datasets/agn_7_stat_full.txt

Regression mode statistical training command line:

mycnn_bp datasets/agn_7_stat_full.txt

Statistical Training Output

When executed under the statistical training use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- trainlog.txt: log file with detailed information about the experiment configuration, main results and parameter setup;
- trainpartialerror.txt: ASCII (space separated) file with partial values at each training iteration of the QNA algorithm, useful to obtain a graphical view of the learning process. Each row is composed of three columns:
o training step;
o number of iterations of the current step (number of Hessian approximations <= MaxIts);
o current step batch error (MSE, or Cross Entropy value if selected in classification mode);
- trainedweights.txt: final network weights, frozen at the end of the batch training. It can be used in a new training experiment to restore the old one;
- frozen_train_net.txt: internal network node values as frozen at the end of training, to be given as network input file in the test/run cases;
- traintestoutlog.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of traintestoutlog.txt, for internal use only);
- traintestconfmatrix.txt: confusion matrix calculated at the end of training. It results from the values stored into the traintestoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole training results;
- stat.txt: complete log with statistical information about both the method and the algorithm performances.

Some of the above files, as described, are not very useful for the end user, being created for internal use only. In particular, the main files to be registered as official output files are those underlined in the previous list.
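The Cross Entropy flag (parameter 12) switches the classification error function from the standard MSE to cross entropy. As a minimal sketch of the two measures, assuming network outputs in (0, 1) and binary targets (an illustrative formulation, not the exact one implemented inside the DAME suite):

```python
import math

# Illustrative error functions for classification, assuming outputs in (0, 1)
# and binary targets; not the exact formulation used inside the DAME suite.
def mse(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def cross_entropy(outputs, targets, eps=1e-12):
    # eps guards the logarithms against outputs of exactly 0 or 1.
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for o, t in zip(outputs, targets)) / len(outputs)

outs, targs = [0.9, 0.2, 0.8], [1, 0, 1]
print("MSE:", mse(outs, targs), "Cross Entropy:", cross_entropy(outs, targs))
```

Cross entropy penalizes confident wrong outputs much more heavily than MSE, which is why it is often preferred for classification.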
5 APPENDIX Scientific case test with MLP-QNA

In the following, details about the statistical training/test use cases with the MLP-QNA algorithm are reported. The described experiments were done for a commissioned scientific case. Both classification and regression are used as functionality modes. The purpose of such tests is to verify the correctness of the algorithm, its efficiency and the preliminary scientific results (to be investigated in more detail with the specialists of the team).

5.1 The Science case

The scientific problem has two main goals:

1) to determine the accuracy of recognizing globular clusters in nearby galaxies (< 20 Mpc) from single-band images taken with HST, by separating these sources from background stellar contaminants, compact galaxies and AGNs. Usually such recognition is done through color selection, possibly integrated with morphological parameters able to measure the angular extension of single sources. In our case, as a preliminary step, we consider photometric data only;

2) to extract from the parameter space the parameters which influence the formation of X-ray binary sources (LMXBs) in globular clusters. To investigate this goal, however, both X-ray and optical data are required.

Moreover, an intrinsically important result would be to prove the capability and robustness of neural networks as an automatic and easy way to reach the mentioned goals, instead of more complex traditional methods. Last but not least, if the proposed method is able to reach the two goals by using only photometric parameters, another important outcome would be to consider the morphological (structural) information of the sources as secondary in the recognition process. But this is a matter for deeper investigation in the near future.

The dataset used in these preliminary tests consists of source catalogues obtained from HST images of the galaxy NGC 1399, taken in the HST broad V band (F606W). For these sources we have the photometric parameters (reported below). At the moment M. Paolillo is checking the possibility to obtain a more precise catalogue, with color information for a larger number of sources, together with more precise information about the morphological parameters of all considered sources.

Concerning traditional methods, for example by considering the magnitude and stellar attributes of SExtractor, M. Paolillo is able to obtain an accuracy of 92% with +/- 10% of contamination, within m_v ~ 24.5 (by comparing with the C-R color classification only) on a dataset of ~2700 sources.

The photometric information (input columns of our dataset) used is:

1) mag_iso: isophotal magnitude;
2) mag_aper1: fixed aperture magnitude vector;
3) mag_aper2;
4) mag_aper3;
5) kron_radius: Kron apertures;
6) ellipticity: 1 - B_IMAGE/A_IMAGE;
7) fwhm_image: FWHM assuming a Gaussian core.
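Figures of merit like the quoted 92% accuracy with +/- 10% contamination can be derived from a 2x2 confusion matrix such as the one stored in traintestconfmatrix.txt. The sketch below is an assumption-level illustration only: the matrix layout (rows = true classes, columns = network outputs) and the handling of the calculation mode parameter are hypothetical choices, not the DAME implementation:

```python
# Hypothetical sketch: derive accuracy and contamination from a 2x2 confusion
# matrix. Assumed layout: rows = true classes, columns = predicted classes.
# mode follows the manual's convention: 1 = positives on the primary diagonal,
# 0 = positives on the secondary diagonal.
def classification_rates(matrix, mode=1):
    if mode == 0:
        matrix = [row[::-1] for row in matrix]  # move positives to the primary diagonal
    tp, fn = matrix[0][0], matrix[0][1]
    fp, tn = matrix[1][0], matrix[1][1]
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    contamination = fp / (tp + fp) if (tp + fp) else 0.0  # false-positive fraction
    return accuracy, contamination

acc, cont = classification_rates([[90, 10], [8, 92]], mode=1)
print(f"accuracy = {acc:.3f}, contamination = {cont:.3f}")
```

With the illustrative matrix above, accuracy is (90+92)/200 and contamination is 8/(90+8), mirroring the kind of evaluation discussed for the science case.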
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationClassification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska
Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute
More informationGraphical User Interface User Manual
Graphical User Interface User Manual DAME-MAN-NA-0010 Issue: 1.3 Date: September 04, 2013 Author: M. Brescia, S. Cavuoti Doc. : GUI_UserManual_DAME-MAN-NA-0010-Rel1.3 1 DAME we make science discovery happen
More informationDeep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES
Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated
More informationMultilayer Feed-forward networks
Multi Feed-forward networks 1. Computational models of McCulloch and Pitts proposed a binary threshold unit as a computational model for artificial neuron. This first type of neuron has been generalized
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017
3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationLecture 20: Neural Networks for NLP. Zubin Pahuja
Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple
More informationDAMEWARE. Data Mining & Exploration Web Application Resource
arxiv:1603.00720v2 [astro-ph.im] 16 Mar 2016 DAMEWARE Data Mining & Exploration Web Application Resource Issue: 1.5 Date: March 1, 2016 Authors: M. Brescia, S. Cavuoti, F. Esposito, M. Fiore, M. Garofalo,
More informationEE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR
EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real
More informationDeep Learning With Noise
Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu
More informationCPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2018
CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2018 Last Time: Multi-Dimensional Scaling Multi-dimensional scaling (MDS): Non-parametric visualization: directly optimize the z i locations.
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training
More informationRandom Search Report An objective look at random search performance for 4 problem sets
Random Search Report An objective look at random search performance for 4 problem sets Dudon Wai Georgia Institute of Technology CS 7641: Machine Learning Atlanta, GA dwai3@gatech.edu Abstract: This report
More information5 Machine Learning Abstractions and Numerical Optimization
Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer
More informationThis leads to our algorithm which is outlined in Section III, along with a tabular summary of it's performance on several benchmarks. The last section
An Algorithm for Incremental Construction of Feedforward Networks of Threshold Units with Real Valued Inputs Dhananjay S. Phatak Electrical Engineering Department State University of New York, Binghamton,
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationAn Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting.
An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting. Mohammad Mahmudul Alam Mia, Shovasis Kumar Biswas, Monalisa Chowdhury Urmi, Abubakar
More informationA neural network that classifies glass either as window or non-window depending on the glass chemistry.
A neural network that classifies glass either as window or non-window depending on the glass chemistry. Djaber Maouche Department of Electrical Electronic Engineering Cukurova University Adana, Turkey
More informationPARALLEL TRAINING OF NEURAL NETWORKS FOR SPEECH RECOGNITION
PARALLEL TRAINING OF NEURAL NETWORKS FOR SPEECH RECOGNITION Stanislav Kontár Speech@FIT, Dept. of Computer Graphics and Multimedia, FIT, BUT, Brno, Czech Republic E-mail: xkonta00@stud.fit.vutbr.cz In
More informationA NEW EFFICIENT VARIABLE LEARNING RATE FOR PERRY S SPECTRAL CONJUGATE GRADIENT TRAINING METHOD
1 st International Conference From Scientific Computing to Computational Engineering 1 st IC SCCE Athens, 8 10 September, 2004 c IC SCCE A NEW EFFICIENT VARIABLE LEARNING RATE FOR PERRY S SPECTRAL CONJUGATE
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More information(1) Department of Physics University Federico II, Via Cinthia 24, I Napoli, Italy (2) INAF Astronomical Observatory of Capodimonte, Via
(1) Department of Physics University Federico II, Via Cinthia 24, I-80126 Napoli, Italy (2) INAF Astronomical Observatory of Capodimonte, Via Moiariello 16, I-80131 Napoli, Italy To measure the distance
More informationEfficient Iterative Semi-supervised Classification on Manifold
. Efficient Iterative Semi-supervised Classification on Manifold... M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani Sharif University of Technology, Tehran, Iran. Presented by Pooria Joulani
More informationReview on Methods of Selecting Number of Hidden Nodes in Artificial Neural Network
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationArtificial Neural Networks MLP, RBF & GMDH
Artificial Neural Networks MLP, RBF & GMDH Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical
More informationLogistic Regression. Abstract
Logistic Regression Tsung-Yi Lin, Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl60}@ucsd.edu January 4, 013 Abstract Logistic regression
More informationCOMBINING NEURAL NETWORKS FOR SKIN DETECTION
COMBINING NEURAL NETWORKS FOR SKIN DETECTION Chelsia Amy Doukim 1, Jamal Ahmad Dargham 1, Ali Chekima 1 and Sigeru Omatu 2 1 School of Engineering and Information Technology, Universiti Malaysia Sabah,
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More informationM. Sc. (Artificial Intelligence and Machine Learning)
Course Name: Advanced Python Course Code: MSCAI 122 This course will introduce students to advanced python implementations and the latest Machine Learning and Deep learning libraries, Scikit-Learn and
More informationRapid growth of massive datasets
Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,
More informationA Neural Network Model Of Insurance Customer Ratings
A Neural Network Model Of Insurance Customer Ratings Jan Jantzen 1 Abstract Given a set of data on customers the engineering problem in this study is to model the data and classify customers
More informationCHAPTER VI BACK PROPAGATION ALGORITHM
6.1 Introduction CHAPTER VI BACK PROPAGATION ALGORITHM In the previous chapter, we analysed that multiple layer perceptrons are effectively applied to handle tricky problems if trained with a vastly accepted
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationClustering algorithms and autoencoders for anomaly detection
Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More information2. Neural network basics
2. Neural network basics Next commonalities among different neural networks are discussed in order to get started and show which structural parts or concepts appear in almost all networks. It is presented
More informationNeural Networks Laboratory EE 329 A
Neural Networks Laboratory EE 329 A Introduction: Artificial Neural Networks (ANN) are widely used to approximate complex systems that are difficult to model using conventional modeling techniques such
More informationHashing. Hashing Procedures
Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements
More informationUnsupervised Image Segmentation with Neural Networks
Unsupervised Image Segmentation with Neural Networks J. Meuleman and C. van Kaam Wageningen Agricultural University, Department of Agricultural, Environmental and Systems Technology, Bomenweg 4, 6703 HD
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More information