DAta Mining Exploration Project
Pag. 1 of 31

DAta Mining Exploration Project
General Purpose Multi Layer Perceptron Neural Network (trained by Back Propagation & Quasi-Newton) Data Mining Model

User Manual

Doc.: DAME-MAN-NA-0008
Issue: 1.0
Date: September 02, 2010
Prepared by: M. Brescia, 02/09/2010
Released by: G. Longo, 02/09/2010
Revision Matrix

Issue  Author      Date        Section/Paragraph Affected  Reason/Initiation/Documents/Remarks
0.1    M. Brescia  02/09/2010  All                         First draft release
INDEX

1 Reference & Applicable Documents
2 Abbreviations & Acronyms
3 Introduction
3.1 Design Issues
3.1.1 The MLP implementation overview
3.1.2 The BP implementation
3.1.3 The QNA implementation
4 System Architectural Design
4.1 Chosen System Architecture
4.2 System Interface description
Wrapping design & implementation requirements
4.3 User Interface description
Input dataset format
MLP-BP wrapping requirements and execution details
TRAIN USE CASE
TEST USE CASE
RUN USE CASE
MLP-QNA wrapping requirements and execution details
TRAIN USE CASE
TEST USE CASE
RUN USE CASE
STATISTICAL TRAIN USE CASE
APPENDIX Scientific case test with MLP-QNA
The Science case
Test procedure and results

TABLE INDEX
Tab. 1 Reference & Applicable Documents
Tab. 2 Abbreviations and acronyms
Tab. 3 Test results

FIGURE INDEX
Fig. 1 MLP architecture
Fig. 2 Bipolar sigmoid activation function
Fig. 3 Execution time comparison between the 4 tests
Fig. 4 Training iterations comparison between the 4 tests
1 Reference & Applicable Documents

1. sdd_template_voneural-sdd-na-0000-rel0.1, Software Design Description Document Guidelines; M. Brescia
2. SuiteDesign_VONEURAL-PDD-NA-0001-Rel2.0, Suite Project Description Document; VO-Neural team
3. DMPlugins_DAME-TRE-NA-0016-Rel0.3, deployed Model-functionality DMPlugins Description report; A. Di Guido, M. Brescia
4. dm-model_voneural-sdd-na-0008-rel1.2, Data Mining Model Component Software Design Description; S. Cavuoti, A. Di Guido
5. framework_voneural-sdd-na-0005-rel1.0, Framework Component Software Design Description; O. Laurino, M. Fiore
6. Eberhart, R. C., Dobbins, R. W., Neural Networks PC Tools: A practical guide, Academic Press
7. Nguyen, D., Widrow, B., "Improving The Learning Speed of 2-layer Neural Networks by Choosing Initial Values of The Adaptive Weights", IJCNN, USA
8. Fernández-Redondo, M., Hernández-Espinosa, C., "A Comparison among Weight Initialization Methods for Multilayer Feedforward Networks", IJCNN, Italy
9. Byrd, R. H., Nocedal, J., Schnabel, R. B., "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods", Mathematical Programming, 63, 4

Tab. 1 Reference & Applicable Documents
2 Abbreviations & Acronyms

Abbreviation & Acronym   Meaning
BFGS     Broyden-Fletcher-Goldfarb-Shanno
BP       Back Propagation
CE       Cross Entropy
CSV      Comma Separated Value
DAME     Data Mining & Exploration
DM       Data Mining
DMM      Data Mining Model
GP       General Purpose
GRID     Global Resource Information Database
L-BFGS   Limited memory BFGS
MLP      Multi Layer Perceptron
MSE      Mean Square Error
NN       Neural Network
OOP      Object Oriented Programming
QNA      Quasi Newton Algorithm
SA       Stand Alone
SVM      Support Vector Machine
TS       Tournament Selection
TBC      To Be Completed
UML      Unified Modeling Language
VO       Virtual Observatory
XML      eXtensible Markup Language

Tab. 2 Abbreviations and acronyms
3 Introduction

This document describes a data mining (DM) model used to solve non-linear optimization problems. It is based on the design of a general purpose feed-forward neural network architecture, providing a Soft Computing instrument that implements supervised learning. The classical MLP architecture is associated with two types of learning algorithm: the standard gradient descent weight update rule, named Back Propagation (BP), and the statistical Quasi-Newton Algorithm (QNA). In the MLP-BP case both batch and on-line learning modes are available, while in the MLP-QNA case only batch learning is provided. Hereinafter the term MLP-GP indicates the general model implemented, while MLP-BP refers to the MLP-GP associated with the BP algorithm and MLP-QNA to the MLP associated with the QNA learning algorithm.

3.1 Design Issues

The model described here is intended to become one of the DM models officially integrated into the DAME Suite. To achieve this goal a set of standardization rules is followed, in order to make the package compliant with the specific environment specifications [2, 3, 4, 5]. These guidelines basically concern input/output data format, compiling and execution dependencies, and DMM wrapper conditions and requirements.

3.1.1 The MLP implementation overview

The MLP architecture is one of the most typical feed-forward neural network models. The term feed-forward identifies the basic behavior of such neural models, in which the impulse is always propagated in the same direction, i.e. from the input layer towards the output layer, through one or more hidden layers (the network brain), by combining the weighted sums of the inputs of all neurons (except those of the input layer). The neurons are organized in layers, each with its own role. The input signal, simply propagated through the neurons of the input layer, is used to stimulate the following hidden and output neuron layers.
The output of each neuron is obtained by means of an activation function applied to the weighted sum of its inputs. Different shapes of this activation function can be applied, from the simplest (linear) up to the sigmoid. The number of hidden layers represents the degree of complexity of the energy solution space in which the network output moves while looking for the best solution. For example, in a typical classification problem, the number of hidden layers indicates the number of hyper-planes used to split the parameter space (i.e. the number of possible classes) in order to classify each input pattern. What typically differs among such neural network architectures is the learning algorithm used to train the network. There is a dichotomy between supervised and unsupervised learning methods.
Fig. 1 MLP architecture

In the first case (supervised), the network must first be trained (training phase): the input patterns are submitted to the network as couples (input, desired known output). The feed-forward algorithm is then executed and, at the end of the input submission, the network output is compared with the corresponding desired output in order to quantify the learning quality. The comparison can be performed in a batch way (after the submission of the entire input pattern set) or incrementally (after each single input pattern submission), and the metric used to measure the distance between desired and obtained outputs can be chosen according to problem-specific requirements (in the MLP-BP the MSE, Mean Square Error, is used). After each comparison, and until the desired error distance is reached (typically the error tolerance is a pre-calculated value or a constant imposed by the user), the weights of the hidden layers are changed according to a particular law or learning technique. After the training phase is finished (or arbitrarily stopped), the network should be able not only to give the correct output for each input already used in the training set, but also to achieve a certain degree of generalization, i.e. to give the correct output for inputs never used to train it. The degree of generalization obviously varies depending on how good the learning phase has been. This important feature is possible because the network does not associate a single input with a single output, but discovers the relationship underlying their association. After training, such a neural network can be seen as a black box able to perform a particular function (input-output correlation) whose analytical shape is not known a priori. In order to obtain the best training, the training set must be as homogeneous as possible and able to describe a great variety of samples.
The bigger the training set, the higher the network generalization capability. Despite these considerations, it should always be taken into account that neural networks are usually applied to problems requiring high flexibility (quantitative results) more than high precision (qualitative results). Concerning the hidden layer choice, it is possible to define zero hidden layers (SLP, Single Layer Perceptron, able to solve only linear separations of the parameter space), or 1 or 2 hidden layers, depending on the complexity the user wants to introduce in the non-linear problem solving experiment. The second learning type (unsupervised) basically refers to neural models able to classify/cluster patterns into several categories, based on their common features, by submitting training inputs without related desired outputs. This is not the learning case approached with the MLP architecture, so no further information is given in this document.
3.1.2 The BP implementation

In the feed-forward process, the network calculates the output from the given input. We use the bipolar logistic function as the activation function in the hidden and output layers, while the input layer uses the identity function. Choosing an appropriate activation function can also contribute to much faster learning. Theoretically, a sigmoid function with a lower saturation speed gives a better result.

f_sigmoid(x) = 2 / (1 + e^(-σx)) - 1

f'_sigmoid(x) = 2σ e^(-σx) / (e^(-σx) + 1)^2

Fig. 2 bipolar sigmoid activation function

Its slope σ can be manipulated to see how it affects the learning speed. A larger slope makes the weight values move faster towards the saturation region (faster convergence), while a smaller slope makes the weight values move slower but allows a more refined weight adjustment. Next, the calculated output is compared to the desired output to calculate the error. The mission is then to minimize this error, and the method chosen for minimizing it also determines the learning speed. Gradient descent is the most common method: the weights are updated as

w(t+1) = w(t) - η ∂E/∂w

where η is the learning rate and E the network error.
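The two formulas above can be transcribed directly; this is an illustrative sketch with assumed function names:

```cpp
#include <cmath>

// Bipolar sigmoid: f(x) = 2 / (1 + e^(-sigma*x)) - 1.
// Bounded in (-1, +1), with f(0) = 0.
double f_sigmoid(double x, double sigma) {
    return 2.0 / (1.0 + std::exp(-sigma * x)) - 1.0;
}

// Its derivative: f'(x) = 2*sigma*e^(-sigma*x) / (e^(-sigma*x) + 1)^2.
// Maximal at x = 0 (value sigma/2), vanishing in the saturation regions.
double f_sigmoid_prime(double x, double sigma) {
    double e = std::exp(-sigma * x);
    return 2.0 * sigma * e / ((e + 1.0) * (e + 1.0));
}
```

A quick numerical check of f' against a centered difference of f confirms the two expressions are consistent.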
Besides the plain gradient descent method, there are several other techniques that guarantee a faster learning speed. In this case, we can make the classical BP learning process much faster by adding a momentum term or by using an adaptive learning rate. The feed-forward network error is calculated with the standard MSE function.

In momentum learning, the weight update at time (t+1) contains the momentum of the previous update, so we need to keep the previous values of error and output:

Δw(t+1) = -η ∂E/∂w + α Δw(t)

The variable α is the momentum value; it should be greater than zero and smaller than one.

In the MLP-BP implementation, the momentum is not the only improvement made to the standard BP algorithm. In our case we have also implemented an adaptive learning rule, [7], described in the following. For adaptive learning, the idea is to change the learning rate automatically based on the current error E and the previous error Ei: the last two errors are observed and the learning rate is adjusted in the direction that would have reduced the second error. Parameter A determines how rapidly the learning rate is adjusted; it should be less than one and greater than zero. Another possible method is to multiply the current learning rate by a factor greater than one if the current error is smaller than the previous one, and by a factor less than one if the current error is bigger than the previous one. In the literature it is also suggested to discard the weight changes if the error is increasing; this leads to a better result. The adaptive learning routine is in the function ann_train_network_from_file, where the learning rate update is performed either on-line (updated epoch by epoch after each single pattern presentation) or in batch (updated after a whole dataset presentation).
Moreover, concerning the MLP network weight initialization, several alternative methods have been implemented, [8]. It is known that the initialization values influence the speed of convergence. Several methods are available for this purpose. The most common is to initialize the weights at random, with uniform distribution, inside a certain small user-defined range; in the MLP-BP we call this method HARD_RANDOM. A better method bounds the range to the fixed interval [-1, +1]; we call this method simply RANDOM.
Widely known as a very good weight initialization method is the Nguyen-Widrow method; we call this method NGUYEN. The Nguyen-Widrow weight initialization algorithm can be expressed by the following steps: first, we assign random numbers between -1 and 1 to all hidden node weights; next, we calculate the norm of these random numbers by calling the function get_norm_of_weight; finally, with all the necessary data available, we apply the Nguyen-Widrow rescaling formula. All the weight initialization routines are located in the function initialize_weights.

It is also possible to resume a trained network, to perform a further training session starting from the previously stored final weight setup. In this case the user should set the method FROM_FILE, specifying the stored weights file name as input.

3.1.3 The QNA implementation

In this case, the BP algorithm is completely replaced by an adapted version of the classical Newton method for optimization problems. The Newton method is the general basis for a whole family of so-called Quasi-Newton methods. One of those methods, implemented here, is the L-BFGS algorithm, [9]. More rigorously, the QNA is an optimization of the learning rule, also because, as described below, the implementation is based on a statistical approximation of the Hessian by cyclic gradient calculation which, as said in the previous section, is at the base of the BP method.

As known, the classical Newton method uses the Hessian of a function. The step of the method is defined as the product of the inverse Hessian matrix and the function gradient. If the function is a positive definite quadratic form, we can reach the function minimum in one step. In case of an indefinite quadratic form (which has no minimum), we will reach a maximum or a saddle point. In short, the method finds the stationary point of a quadratic form. In practice, we usually have functions which are not quadratic forms.
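The three Nguyen-Widrow steps above can be sketched as follows, assuming the usual formulation of the rescaling formula with scale factor beta = 0.7 * h^(1/n) (h = number of hidden neurons, n = number of inputs). The function names echo, but are not, the get_norm_of_weight / initialize_weights routines mentioned in the text:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Euclidean norm of a weight vector (role of get_norm_of_weight).
double norm_of(const std::vector<double>& w) {
    double s = 0.0;
    for (double v : w) s += v * v;
    return std::sqrt(s);
}

// Nguyen-Widrow initialization sketch:
// 1) draw each hidden weight uniformly in [-1, 1];
// 2) compute the norm of each neuron's weight vector;
// 3) rescale the vector so its length becomes beta = 0.7 * h^(1/n).
void nguyen_widrow_init(std::vector<std::vector<double>>& hidden_w,
                        int n_inputs) {
    double beta = 0.7 * std::pow((double)hidden_w.size(), 1.0 / n_inputs);
    for (auto& w : hidden_w) {
        for (double& v : w)
            v = 2.0 * std::rand() / RAND_MAX - 1.0;  // step 1
        double nrm = norm_of(w);                     // step 2
        for (double& v : w)
            v = beta * v / nrm;                      // step 3
    }
}
```

After initialization every hidden neuron's weight vector has length beta, which spreads the active regions of the sigmoids evenly across the input space.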
If such a function is smooth, it is described sufficiently well by a quadratic form in the neighborhood of the minimum. However, the Newton method can converge both to a minimum and to a maximum (taking a step in the direction of a function increase). Quasi-Newton methods solve this problem as follows: they use a positive definite approximation instead of the Hessian. If the Hessian is positive definite, we make the step using the Newton method. If the Hessian is indefinite, we modify it to make it positive definite, and then perform a step using the Newton method. The step is always performed in the direction of the function decrement. In case of a positive definite Hessian, we use it to generate a quadratic surface approximation, which should improve convergence. If the Hessian is indefinite, we just move to where the function decreases.
Some modifications of Quasi-Newton methods perform a precise linear minimum search along the indicated line, but it has been proved that it is enough to sufficiently decrease the function value, and not necessary to find a precise minimum. The L-BFGS algorithm tries to perform a step using the Newton method; if it does not lead to a decrease of the function value, it shortens the step length to find a lower function value.

Up to here it seems quite simple, but it is not. The Hessian of a function is not always available, and in many cases it is too complicated; more often we can only calculate the function gradient. Therefore, the following operation is used: the Hessian of the function is generated on the basis of N consecutive gradient calculations, and the quasi-Newton step is performed. There are special formulas which allow us to iteratively obtain a Hessian approximation; at each approximation step, the matrix remains positive definite.

The algorithm uses the L-BFGS update scheme. BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (more precisely, this scheme generates not the Hessian but its inverse matrix, so we do not have to waste time inverting a Hessian). The L in the scheme name comes from the words "Limited memory": in case of big dimensions, the amount of memory required to store a Hessian (N^2) is too big, along with the machine time required to process it. Therefore, instead of using N gradient values to generate a Hessian, we can use a smaller number of values, which requires a memory capacity of order N*M. In practice, M is usually chosen between 3 and 7; in difficult cases it is reasonable to increase this constant up to 20. Of course, as a result we get not the Hessian but its approximation: on the one hand, the convergence slows down; on the other hand, the performance could even improve. At first sight, this statement is paradoxical.
But it contains no contradiction: the convergence is measured by the number of iterations, whereas the performance depends on the amount of processor time spent to calculate the result. As a matter of fact, this method was designed to optimize functions of a large number of arguments (hundreds and thousands), because in this case it is worth accepting an increased iteration number, due to the lower approximation precision, since the overheads become much lower. This is particularly useful in astrophysical data mining problems, where the parameter space is usually dimensionally huge and confused by a low signal-to-noise ratio. But we can use these methods for small dimension problems too. The main advantage of the method is scalability: it provides high performance when solving high dimensionality problems, and it allows small dimension problems to be solved as well.

From the implementation point of view, in the MLP-QNA case the following features are available to the end user:

- only batch learning mode is available;
- strict separation between classification and regression functionality modes;
- for classification mode, the Cross Entropy method is available to compare output and target network values; it is alternatively possible to use the standard MSE rule, which is mandatory for regression mode;
- k-fold cross validation method, to improve training performance and to avoid overfitting problems;
- resume of training from past experiments, by using the weights stored in an external file at the end of the training phase;
- confusion matrix calculated and stored in an external file for both classification and regression modes (in the latter case an adapted version is provided); it is useful after training and test sessions to evaluate model performance.
4 System Architectural Design

4.1 Chosen System Architecture

The choice of the MLP-GP system architecture is not free, but bounded by the specific requirements issued by the DAME Suite environment. The MLP-GP is one of the supported DM models to be integrated into the Suite infrastructure, in terms of I/O data format, XML parameter description, functionality association (design pattern integration as specified in [4]) and DMPlugin package constraints, as specified in [3] and [5].

4.2 System Interface description

Wrapping design & implementation requirements

In order to wrap the MLP-GP model (a library implemented in C++) into the DAME Suite, we have to create a Java class called MLPGP.java. The DAME Suite has a class interface called DMMInterface, which represents a generic data mining model that can be added to the Suite. Therefore the MLPGP Java class must implement the DMMInterface class and its specified use cases:

- Train;
- Test;
- Run;
- Full (as a sequential combination of the previous ones).

This wrapping phase is foreseen to be provided as soon as possible.

4.3 User Interface description

In order to be integrated into the DAME Suite, the code of MLP-GP has been structured taking into account the DMM design pattern requirements and the distinction into different functionality and use case constraints. The MLP-GP has to be put into the supervised model hierarchy, associated with specific functionality modes. The MLP-GP, in particular with the QNA learning rule, can be used for the following functionality modes:

- Classification;
- Regression.

Also the use case requirements mentioned in the previous section have been strictly followed (train, test, run, full use cases allowed). At run time the program can be executed in the form of a formatted command line.
Depending on the use case and functionality of the current experiment, the following are the details of the command line parameters to be specified to execute the MLP-GP. In principle, the program is executed by compiling a command line as a suffix string for the executable program mycnn_bp.exe. In all the following cases the command line must be respected in terms of number and order of parameters.

Input dataset format

The dataset input file (with target columns for the train and test cases, without target columns for the run case) to be accepted by the program MLP-BP is exclusively in CSV format, with no header or special character at the beginning of the file. IMPORTANT: the file MUST be provided WITHOUT any header and with NO CARRIAGE RETURN after the last pattern row!

MLP-BP wrapping requirements and execution details

Here the details about the launch of the MLP with BP are reported.

TRAIN USE CASE

The following are the pre-formatted training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 0 TRAIN_BP (training case);
3. Learning rate [double]. The BP learning rate, in the range [0, 1];
4. Momentum factor [double]. The momentum value, in the range [0, 1];
5. Learning changing factor [double]. Used for the adaptive learning rule, in the range ]0, 1[;
6. Slope σ argument of the bipolar sigmoid activation function [double];
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Weight initializing rule [integer]:
    a. 701 HARD_RANDOM. Random values within the user specified range [-m_init_val, +m_init_val];
    b. 702 RANDOM. Random values within [-1, +1];
    c. 703 NGUYEN. First scales into [-1, +1], then applies the specific rule (see section 3.1.2);
    d. 704 FROM_FILE. An already created weight file is used. This is the case of resumed training or of test/run sessions, where an already trained network must be used;
13. Range of the user defined weight initialization [double]. Used only if the rule selected as parameter 12 is HARD_RANDOM, otherwise it is not considered;
14. Training input dataset file name (with full relative path) [character string];
15. Training log file name (with full relative path) [character string];
16. Partial training error file name (with full relative path) [character string];
17. Training network weight file name (with full relative path) [character string];
18. Number of training iterations [integer]. Stop condition;
19. Learning MSE error threshold [double]. Stop condition;
20. Training input dataset internal column order [integer]:
    a. 705 INPUT_FIRST. The file contains the input columns first and then the target columns;
    b. 706 OUTPUT_FIRST. The file contains the target columns first and then the input columns;
21. Training mode [integer]:
    a. 300 BATCH;
    b. 301 INCREMENTAL (on-line mode to update the network weights).

As an example of training command line:

mycnn_bp input.txt trainlog.txt trainpartialerror.txt trainedweights.txt

Training Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results;
- <partial training error file name>: user defined file with the partial error values at each training iteration.
Useful to obtain a graphical view of the learning process;
- <trained weights file name>: final network weights frozen at the end of the training. It can be used in a new training experiment to restore the old one.

All these files have to be registered as official output files of the experiment.
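A minimal reader for the dataset format required above (plain CSV, no header line, every row a pattern of numeric columns) might look like the following sketch; it is illustrative and not part of the MLP-BP program:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Load patterns from a CSV stream with no header: each non-empty line is
// split on commas and every cell is parsed as a number. Whether the inputs
// or the targets come first is up to the caller (the 705/706 flag above).
std::vector<std::vector<double>> load_csv_patterns(std::istream& in) {
    std::vector<std::vector<double>> patterns;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // tolerate a stray blank line
        std::vector<double> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ','))
            row.push_back(std::stod(cell));
        patterns.push_back(row);
    }
    return patterns;
}
```

Reading via std::getline also makes the "no carriage return after the last pattern row" requirement harmless for the reader side, since a final line without a terminator is still returned.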
TEST USE CASE

The following are the pre-formatted test command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 1 TEST_BP (test case);
3. Slope σ argument of the bipolar sigmoid activation function [double];
4. Number of input neurons [integer]. It must match the number of input dataset columns;
5. Number of output neurons [integer]. It must match the number of target dataset columns;
6. Number of hidden layers [integer]. It may be 0, 1 or 2;
7. Number of first hidden layer neurons [integer]. If parameter 6 is 0, this field is not considered;
8. Number of second hidden layer neurons [integer]. If parameter 6 is < 2, this field is not considered;
9. Input weight file name (with full relative path) [character string];
10. Test input dataset file name (with full relative path) [character string];
11. Test output log file name (with full relative path) [character string];
12. Test input dataset internal column order [integer]:
    a. 705 INPUT_FIRST. The file contains the input columns first and then the target columns;
    b. 706 OUTPUT_FIRST. The file contains the target columns first and then the input columns.

As an example of test command line:

mycnn_bp trainedweights.txt input.txt testlog.txt 705

Test Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results.
All these files have to be registered as official output files of the experiment.

RUN USE CASE

The following are the pre-formatted run command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 2 RUN_BP (run case);
3. Slope σ argument of the bipolar sigmoid activation function [double];
4. Number of input neurons [integer]. It must match the number of input dataset columns;
5. Number of output neurons [integer]. It must match the number of target dataset columns;
6. Number of hidden layers [integer]. It may be 0, 1 or 2;
7. Number of first hidden layer neurons [integer]. If parameter 6 is 0, this field is not considered;
8. Number of second hidden layer neurons [integer]. If parameter 6 is < 2, this field is not considered;
9. Input weight file name (with full relative path) [character string];
10. Run input dataset file name (with full relative path) [character string];
11. Run output log file name (with full relative path) [character string].

As an example of run command line:

mycnn_bp trainedweights.txt run.txt runlog.txt

Run Output

The following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- <log file name>: user defined log file with information about experiment results.

All these files have to be registered as official output files of the experiment.
MLP-QNA wrapping requirements and execution details

Here the details about the launch of the MLP with QNA are reported.

TRAIN USE CASE

The following are the pre-formatted training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
   a. 10 CLASSIFICATION;
   b. 20 REGRESSION;
2. Use case [integer]:
   a. 3 TRAIN_QNA (training case);
3. Decay [double]. Weight decay constant (>= 0.001). The decay term Decay*||Weights||^2 is added to the error function. Default value = 0.001;
4. Restarts [integer]. Number of restarts from a random position, > 0. If you do not know what value to choose, use 2 (THIS IS THE NUMBER OF MAX TRAINING CYCLES PERFORMED ANYWAY). Default value = 20;
5. Wstep [double]. Stopping criterion. The algorithm stops if the step size is less than WStep. A zero step size means stopping after MaxIts iterations. Default value = 0.01;
6. MaxIts [integer]. Stopping criterion. The algorithm stops after MaxIts iterations (NOT gradient calculations). Zero MaxIts means stopping when the step is sufficiently small (use Wstep). Default value = 1500;
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Cross Entropy flag [integer]. Used in classification mode only; in regression mode this parameter is not considered. Default value = 0:
    a. 0 not used (standard MSE used);
    b. 1 used;
13. Name of input dataset file (with full relative path) [character string];
14. K-fold cross validation flag [integer]. Default value = 0:
    a. 0 not used (standard training without validation);
    b. 1 used (training with validation sequence);
15. K-fold cross validation k value [integer]. If parameter 14 is 0, this parameter is not considered. Default value = 5;
16. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as the positive cases). To be used only in classification mode; in case of regression it is not considered. Default value = 1:
    a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
    b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
17. Weight initialization choice [integer]. It states how to initialize the network weights; it is possible to resume a previous training phase:
    a. 702 RANDOM initialization between [-1, +1];
    b. 704 FROM_FILE. To be used in case of a past training resume;
18. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string]. To be used in case parameter 17 is set to FROM_FILE; if parameter 17 is RANDOM, this field is not considered.

As examples of command lines:

Classification mode training command line:

mycnn_bp datasets/agn_7_stat_full.txt none

In this case:
- network with 1 hidden layer of 15 neurons;
- it uses cross entropy (flag set to 1);
- it uses cross validation (flag set to 1) with k = 10;
- weights are initialized randomly (702), with weight file name none (not used).

mycnn_bp datasets/agn_7_stat_full.txt experiments/regression/trainedweights.txt

In this case:
- network with 2 hidden layers of, respectively, 15 and 6 neurons;
- it does not use cross entropy (flag set to 0), but the simple MSE error;
- it does not use cross validation (flag set to 0), so the k value is not considered;
- weights are initialized by restoring a past training (704), with weight file name experiments/regression/trainedweights.txt.

Regression mode training command line:
mycnn_bp datasets/agn_7_stat_full.txt experiments/regression/trainedweights.txt

Training Output

When executed under the training use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- trainlog.txt: log file with detailed information about the experiment configuration, main results and parameter setup;
- trainpartialerror.txt: ASCII (space separated) file with partial values at each training iteration of the QNA algorithm, useful to obtain a graphical view of the learning process. Each row is composed of three columns:
o training step;
o number of iterations of the current step (number of Hessian approximations <= MaxIts);
o current step batch error (MSE, or Cross Entropy value if selected in classification mode);
- trainedweights.txt: final network weights, frozen at the end of the batch training. It can be used in a new training experiment to restore the old one;
- frozen_train_net.txt: internal network node values as frozen at the end of training, to be given as network input file in the test/run cases;
- traintestoutlog.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of traintestoutlog.txt, for internal use only);
- traintestconfmatrix.txt: confusion matrix calculated at the end of training. It results from the values stored into the traintestoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole training results. In the regression case it is an adapted version.

Some of the above files, as described, are not very useful for the end user, being created for internal use only. In particular, the main files to be registered as official output files are those underlined in the previous list.
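The three-column format of trainpartialerror.txt can be inspected with a few lines of code. The sketch below is not part of the DAME suite; the helper function and the sample values are illustrative only:

```python
# Hedged sketch (not part of the DAME suite): parse a trainpartialerror.txt
# log. Per the manual, each row holds three space-separated columns:
# training step, number of iterations of the step (Hessian approximations
# <= MaxIts), and the current step batch error (MSE or Cross Entropy).
def parse_train_log(path):
    rows = []
    with open(path) as f:
        for line in f:
            step, iters, err = line.split()
            rows.append((int(step), int(iters), float(err)))
    return rows

# Illustrative file content, mimicking the documented format.
with open("trainpartialerror_example.txt", "w") as f:
    f.write("1 120 0.25\n2 98 0.11\n3 77 0.04\n")

log = parse_train_log("trainpartialerror_example.txt")
print("training steps:", len(log), "- final batch error:", log[-1][2])
```

Plotting the third column against the first gives the graphical view of the learning process mentioned above.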
TEST USE CASE

The following are the test command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 4 TEST_QNA (test case);
3. Number of input neurons [integer]. It must match the number of input dataset columns;
4. Number of output neurons [integer]. It must match the number of target dataset columns;
5. Number of hidden layers [integer]. It may be 0, 1 or 2;
6. Number of first hidden layer neurons [integer]. If parameter 5 is 0, this field is not considered;
7. Number of second hidden layer neurons [integer]. If parameter 5 is < 2, this field is not considered;
8. Name of input dataset file (with full relative path) [character string];
9. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
10. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string];
11. Name of the file (with full relative path if loaded from a different directory) with the internal network node values as frozen at the end of the training phase [character string].

As examples of command lines:

Classification mode test command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/classification/trainedweights.txt experiments/classification/frozen_train_net.txt

Regression mode test command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/regression/trainedweights.txt experiments/regression/frozen_train_net.txt

Test Output

When executed under the test use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- testoutlog.txt: output values as calculated after the test, with the respective target values. It can be used to evaluate the network output for each input pattern;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of testoutlog.txt, for internal use only);
- testconfmatrix.txt: confusion matrix calculated at the end of the test. It results from the values stored into the testoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole test results. In the regression case it is an adapted version.

The files to be registered as official output files are those underlined in the previous list.

RUN USE CASE

The following are the run command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 5 RUN_QNA (run case);
3. Number of input neurons [integer]. It must match the number of input dataset columns;
4. Number of output neurons [integer]. It must match the number of target dataset columns;
5. Number of hidden layers [integer]. It may be 0, 1 or 2;
6. Number of first hidden layer neurons [integer]. If parameter 5 is 0, this field is not considered;
7. Number of second hidden layer neurons [integer]. If parameter 5 is < 2, this field is not considered;
8. Name of input dataset file (with full relative path) [character string];
9. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);
10. Name of the weight file (with full relative path if loaded from a different directory) to be loaded to initialize the network weights [character string];
11. Name of the file (with full relative path if loaded from a different directory) with the internal network node values as frozen at the end of the training phase [character string].

As examples of command lines:

Classification mode run command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/classification/trainedweights.txt experiments/classification/frozen_train_net.txt

Regression mode run command line:

mycnn_bp datasets/test_agn_ridotto.txt 1 experiments/regression/trainedweights.txt experiments/regression/frozen_train_net.txt

Run Output

When executed under the run use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- RunOutLog.txt: output values as calculated during the run. It can be used to evaluate the network output for each input pattern.

All these files have to be registered as official output files.

STATISTICAL TRAIN USE CASE

The following are the statistical training command line parameters (ALL parameters are required, in a strict sequential order):

1. Functionality case [integer]:
a. 10 CLASSIFICATION;
b. 20 REGRESSION;
2. Use case [integer]:
a. 6 STAT_TRAIN_QNA (statistical training case). It generates a verbose log file reporting a step-by-step training procedure with an incremental dimension of the input dataset (see the related section below for details);
3. Decay [double]. Weight decay constant (>= 0.001). The decay term Decay*|Weights|^2 is added to the error function. Default value = 0.001;
4. Restarts [integer]. Number of restarts from a random position, > 0. If you don't know which value to choose, use 2 (THIS IS THE NUMBER OF MAX TRAINING CYCLES PERFORMED ANYWAY). Default value = 20;
5. Wstep [double]. Stopping criterion: the algorithm stops if the step size is less than WStep (a zero step size means stopping after MaxIts iterations). Default value = 0.01;
6. MaxIts [integer]. Stopping criterion: the algorithm stops after MaxIts iterations (NOT gradient calculations). A zero MaxIts means stopping when the step is sufficiently small (use Wstep). Default value = 1500;
7. Number of input neurons [integer]. It must match the number of input dataset columns;
8. Number of output neurons [integer]. It must match the number of target dataset columns;
9. Number of hidden layers [integer]. It may be 0, 1 or 2;
10. Number of first hidden layer neurons [integer]. If parameter 9 is 0, this field is not considered;
11. Number of second hidden layer neurons [integer]. If parameter 9 is < 2, this field is not considered;
12. Cross Entropy flag [integer]. Used in classification mode only; in regression mode this parameter is not considered. Default value = 0:
a. 0 not used (standard MSE used);
b. 1 used;
13. Name of input dataset file (with full relative path) [character string];
14. K-fold cross validation flag [integer]. Default value = 0:
a. 0 not used (standard training without validation);
b. 1 used (training with validation sequence);
15. K-fold cross validation k value [integer]. If parameter 14 is 0, this parameter is not considered. Default value = 5;
16. Confusion matrix calculation mode [integer]. This is an internal parameter used to define the calculation of the confusion matrix (which diagonal is to be considered as positive cases). To be used only in classification mode; in the regression case it is not considered. Default value = 1:
a. 0 reverse mode (positive cases are on the secondary diagonal of the matrix);
b. 1 standard mode (positive cases are on the primary diagonal of the matrix);

As examples of command lines:

Classification mode statistical training command line:

mycnn_bp datasets/agn_7_stat_full.txt

Regression mode statistical training command line:

mycnn_bp datasets/agn_7_stat_full.txt

Statistical Training Output

When executed under the statistical training use case, the output is composed of the following files, stored into a pre-defined directory sub-tree. This sub-tree starts from the execution directory and branches into two different subtrees, depending on the functionality domain of the current execution:

- ./experiments/classification for the classification case
- ./experiments/regression for the regression case

In one of such directories the following output files are automatically generated at the end of the execution:

- errorlog.txt: error report file, containing details about any incorrect condition or exception that caused an abnormal exit from the execution. This file is not created if the program ends normally;
- trainlog.txt: log file with detailed information about the experiment configuration, main results and parameter setup;
- trainpartialerror.txt: ASCII (space separated) file with partial values at each training iteration of the QNA algorithm, useful to obtain a graphical view of the learning process. Each row is composed of three columns:
o training step;
o number of iterations of the current step (number of Hessian approximations <= MaxIts);
o current step batch error (MSE, or Cross Entropy value if selected in classification mode);
- trainedweights.txt: final network weights, frozen at the end of the batch training. It can be used in a new training experiment to restore the old one;
- frozen_train_net.txt: internal network node values as frozen at the end of training, to be given as network input file in the test/run cases;
- traintestoutlog.txt: output values as calculated after training, with the respective target values. It can be used to evaluate the network output for each input pattern. It corresponds to an embedded test session done by submitting the training dataset as test dataset;
- temptrash.txt: ASCII file with network outputs and related targets for all input patterns (a simplified, not verbose, version of traintestoutlog.txt, for internal use only);
- traintestconfmatrix.txt: confusion matrix calculated at the end of training. It results from the values stored into the traintestoutlog.txt file, and is useful to obtain a simple statistical evaluation of the whole training results;
- stat.txt: complete log with statistical information about both the method and the algorithm performances.

Some of the above files, as described, are not very useful for the end user, being created for internal use only. In particular, the main files to be registered as official output files are those underlined in the previous list.
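The Cross Entropy flag (parameter 12) switches the classification error function from the standard MSE to cross entropy. As a minimal sketch of the two measures, assuming network outputs in (0, 1) and binary targets (an illustrative formulation, not the exact one implemented inside the DAME suite):

```python
import math

# Illustrative error functions for classification, assuming outputs in (0, 1)
# and binary targets; not the exact formulation used inside the DAME suite.
def mse(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def cross_entropy(outputs, targets, eps=1e-12):
    # eps guards the logarithms against outputs of exactly 0 or 1.
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for o, t in zip(outputs, targets)) / len(outputs)

outs, targs = [0.9, 0.2, 0.8], [1, 0, 1]
print("MSE:", mse(outs, targs), "Cross Entropy:", cross_entropy(outs, targs))
```

Cross entropy penalizes confident wrong outputs much more heavily than MSE, which is why it is often preferred for classification.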
5 APPENDIX Scientific case test with MLP-QNA

In the following, details about the statistical training/test use cases with the MLP-QNA algorithm are reported. The described experiments were done for a commissioned scientific case. Both classification and regression are used as functionality modes. The purpose of such tests is to verify the correctness of the algorithm, its efficiency and the preliminary scientific results (to be investigated in more detail with the specialists of the team).

5.1 The Science case

The scientific problem has two main goals:

1) to determine the accuracy of recognizing globular clusters in nearby galaxies (< 20 Mpc) from single-band images taken with HST, by separating these sources from background stellar contaminants, compact galaxies and AGNs. Usually such recognition is done through color selection, possibly integrated with morphological parameters able to measure the angular extension of single sources. In our case, as a preliminary step, we consider photometric data only;

2) to extract from the parameter space the parameters which influence the formation of X-ray binary sources (LMXBs) in globular clusters. To investigate this goal, however, both X-ray and optical data are required.

Moreover, an intrinsically important result would be to prove the capability and robustness of neural networks as an automatic and easy way to reach the mentioned goals, instead of more complex traditional methods. Last but not least, if the proposed method is able to reach the two goals by using only photometric parameters, another important outcome would be to consider the morphological (structural) information of the sources as secondary in the recognition process. But this is a matter for deeper investigation in the near future.

The dataset used in these preliminary tests consists of source catalogues obtained from HST images of the galaxy NGC 1399, taken in the HST broad V band (F606W). For these sources we have the photometric parameters (reported below). At the moment M. Paolillo is checking the possibility to obtain a more precise catalogue, with color information for a larger number of sources, together with more precise information about the morphological parameters of all considered sources.

Concerning traditional methods, for example by considering the magnitude and stellar attributes of SExtractor, M. Paolillo is able to obtain an accuracy of 92% with +/- 10% of contamination, within m_v ~ 24.5 (by comparing with the C-R color classification only) on a dataset of ~2700 sources.

The photometric information (input columns of our dataset) used is:

1) mag_iso: isophotal magnitude;
2) mag_aper1: fixed aperture magnitude vector;
3) mag_aper2;
4) mag_aper3;
5) kron_radius: Kron apertures;
6) ellipticity: 1 - B_IMAGE/A_IMAGE;
7) fwhm_image: FWHM assuming a Gaussian core.
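Figures of merit like the quoted 92% accuracy with +/- 10% contamination can be derived from a 2x2 confusion matrix such as the one stored in traintestconfmatrix.txt. The sketch below is an assumption-level illustration only: the matrix layout (rows = true classes, columns = network outputs) and the handling of the calculation mode parameter are hypothetical choices, not the DAME implementation:

```python
# Hypothetical sketch: derive accuracy and contamination from a 2x2 confusion
# matrix. Assumed layout: rows = true classes, columns = predicted classes.
# mode follows the manual's convention: 1 = positives on the primary diagonal,
# 0 = positives on the secondary diagonal.
def classification_rates(matrix, mode=1):
    if mode == 0:
        matrix = [row[::-1] for row in matrix]  # move positives to the primary diagonal
    tp, fn = matrix[0][0], matrix[0][1]
    fp, tn = matrix[1][0], matrix[1][1]
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    contamination = fp / (tp + fp) if (tp + fp) else 0.0  # false-positive fraction
    return accuracy, contamination

acc, cont = classification_rates([[90, 10], [8, 92]], mode=1)
print(f"accuracy = {acc:.3f}, contamination = {cont:.3f}")
```

With the illustrative matrix above, accuracy is (90+92)/200 and contamination is 8/(90+8), mirroring the kind of evaluation discussed for the science case.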
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationClassification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska
Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute
More informationGraphical User Interface User Manual
Graphical User Interface User Manual DAME-MAN-NA-0010 Issue: 1.3 Date: September 04, 2013 Author: M. Brescia, S. Cavuoti Doc. : GUI_UserManual_DAME-MAN-NA-0010-Rel1.3 1 DAME we make science discovery happen
More informationDeep Learning. Practical introduction with Keras JORDI TORRES 27/05/2018. Chapter 3 JORDI TORRES
Deep Learning Practical introduction with Keras Chapter 3 27/05/2018 Neuron A neural network is formed by neurons connected to each other; in turn, each connection of one neural network is associated
More informationMultilayer Feed-forward networks
Multi Feed-forward networks 1. Computational models of McCulloch and Pitts proposed a binary threshold unit as a computational model for artificial neuron. This first type of neuron has been generalized
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017
3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural
More informationMachine Learning in Biology
Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant
More informationLecture 20: Neural Networks for NLP. Zubin Pahuja
Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple
More informationDAMEWARE. Data Mining & Exploration Web Application Resource
arxiv:1603.00720v2 [astro-ph.im] 16 Mar 2016 DAMEWARE Data Mining & Exploration Web Application Resource Issue: 1.5 Date: March 1, 2016 Authors: M. Brescia, S. Cavuoti, F. Esposito, M. Fiore, M. Garofalo,
More informationEE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR
EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real
More informationDeep Learning With Noise
Deep Learning With Noise Yixin Luo Computer Science Department Carnegie Mellon University yixinluo@cs.cmu.edu Fan Yang Department of Mathematical Sciences Carnegie Mellon University fanyang1@andrew.cmu.edu
More informationCPSC 340: Machine Learning and Data Mining. Deep Learning Fall 2018
CPSC 340: Machine Learning and Data Mining Deep Learning Fall 2018 Last Time: Multi-Dimensional Scaling Multi-dimensional scaling (MDS): Non-parametric visualization: directly optimize the z i locations.
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Neural Computation : Lecture 14 John A. Bullinaria, 2015 1. The RBF Mapping 2. The RBF Network Architecture 3. Computational Power of RBF Networks 4. Training
More informationRandom Search Report An objective look at random search performance for 4 problem sets
Random Search Report An objective look at random search performance for 4 problem sets Dudon Wai Georgia Institute of Technology CS 7641: Machine Learning Atlanta, GA dwai3@gatech.edu Abstract: This report
More information5 Machine Learning Abstractions and Numerical Optimization
Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer
More informationThis leads to our algorithm which is outlined in Section III, along with a tabular summary of it's performance on several benchmarks. The last section
An Algorithm for Incremental Construction of Feedforward Networks of Threshold Units with Real Valued Inputs Dhananjay S. Phatak Electrical Engineering Department State University of New York, Binghamton,
More informationIndex. Umberto Michelucci 2018 U. Michelucci, Applied Deep Learning,
A Acquisition function, 298, 301 Adam optimizer, 175 178 Anaconda navigator conda command, 3 Create button, 5 download and install, 1 installing packages, 8 Jupyter Notebook, 11 13 left navigation pane,
More informationAn Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting.
An Algorithm For Training Multilayer Perceptron (MLP) For Image Reconstruction Using Neural Network Without Overfitting. Mohammad Mahmudul Alam Mia, Shovasis Kumar Biswas, Monalisa Chowdhury Urmi, Abubakar
More informationA neural network that classifies glass either as window or non-window depending on the glass chemistry.
A neural network that classifies glass either as window or non-window depending on the glass chemistry. Djaber Maouche Department of Electrical Electronic Engineering Cukurova University Adana, Turkey
More informationPARALLEL TRAINING OF NEURAL NETWORKS FOR SPEECH RECOGNITION
PARALLEL TRAINING OF NEURAL NETWORKS FOR SPEECH RECOGNITION Stanislav Kontár Speech@FIT, Dept. of Computer Graphics and Multimedia, FIT, BUT, Brno, Czech Republic E-mail: xkonta00@stud.fit.vutbr.cz In
More informationA NEW EFFICIENT VARIABLE LEARNING RATE FOR PERRY S SPECTRAL CONJUGATE GRADIENT TRAINING METHOD
1 st International Conference From Scientific Computing to Computational Engineering 1 st IC SCCE Athens, 8 10 September, 2004 c IC SCCE A NEW EFFICIENT VARIABLE LEARNING RATE FOR PERRY S SPECTRAL CONJUGATE
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More information(1) Department of Physics University Federico II, Via Cinthia 24, I Napoli, Italy (2) INAF Astronomical Observatory of Capodimonte, Via
(1) Department of Physics University Federico II, Via Cinthia 24, I-80126 Napoli, Italy (2) INAF Astronomical Observatory of Capodimonte, Via Moiariello 16, I-80131 Napoli, Italy To measure the distance
More informationEfficient Iterative Semi-supervised Classification on Manifold
. Efficient Iterative Semi-supervised Classification on Manifold... M. Farajtabar, H. R. Rabiee, A. Shaban, A. Soltani-Farani Sharif University of Technology, Tehran, Iran. Presented by Pooria Joulani
More informationReview on Methods of Selecting Number of Hidden Nodes in Artificial Neural Network
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationArtificial Neural Networks MLP, RBF & GMDH
Artificial Neural Networks MLP, RBF & GMDH Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical
More informationLogistic Regression. Abstract
Logistic Regression Tsung-Yi Lin, Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl60}@ucsd.edu January 4, 013 Abstract Logistic regression
More informationCOMBINING NEURAL NETWORKS FOR SKIN DETECTION
COMBINING NEURAL NETWORKS FOR SKIN DETECTION Chelsia Amy Doukim 1, Jamal Ahmad Dargham 1, Ali Chekima 1 and Sigeru Omatu 2 1 School of Engineering and Information Technology, Universiti Malaysia Sabah,
More informationModule 1 Lecture Notes 2. Optimization Problem and Model Formulation
Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization
More informationM. Sc. (Artificial Intelligence and Machine Learning)
Course Name: Advanced Python Course Code: MSCAI 122 This course will introduce students to advanced python implementations and the latest Machine Learning and Deep learning libraries, Scikit-Learn and
More informationRapid growth of massive datasets
Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,
More informationA Neural Network Model Of Insurance Customer Ratings
A Neural Network Model Of Insurance Customer Ratings Jan Jantzen 1 Abstract Given a set of data on customers the engineering problem in this study is to model the data and classify customers
More informationCHAPTER VI BACK PROPAGATION ALGORITHM
6.1 Introduction CHAPTER VI BACK PROPAGATION ALGORITHM In the previous chapter, we analysed that multiple layer perceptrons are effectively applied to handle tricky problems if trained with a vastly accepted
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationClustering algorithms and autoencoders for anomaly detection
Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More information2. Neural network basics
2. Neural network basics Next commonalities among different neural networks are discussed in order to get started and show which structural parts or concepts appear in almost all networks. It is presented
More informationNeural Networks Laboratory EE 329 A
Neural Networks Laboratory EE 329 A Introduction: Artificial Neural Networks (ANN) are widely used to approximate complex systems that are difficult to model using conventional modeling techniques such
More informationHashing. Hashing Procedures
Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements
More informationUnsupervised Image Segmentation with Neural Networks
Unsupervised Image Segmentation with Neural Networks J. Meuleman and C. van Kaam Wageningen Agricultural University, Department of Agricultural, Environmental and Systems Technology, Bomenweg 4, 6703 HD
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More information