CHAPTER 4 DETERMINATION OF OPTIMAL PATTERN RECOGNITION TECHNIQUE BY SOFT COMPUTING ANALYSIS


4.1 INTRODUCTION

Pattern recognition is the act of taking in raw data and taking an action based on the category of the pattern. Over the years, Artificial Neural Networks (ANNs) have become recognised as a powerful tool for pattern recognition and are considered a remarkable approach to solving difficult pattern recognition problems. These networks are capable of recognising spatial or temporal relationships and performing tasks such as classification. A distinctive feature of artificial neural networks is that the acquired knowledge is stored in the synaptic weights of the processing elements (Mohamed 2009). There are a number of ANN types and algorithms which allow the design of neural networks and the computation of weight values. The challenging issue in designing an ANN for a particular application, however, is to find a specialised architecture in terms of the required number of neurons, the number of hidden layers and the learning procedure (Nelson 2002). This is all the more vital when the network must meet requirements on sensitivity, specificity and repeatability. This chapter focuses on finding the optimal architecture and learning algorithm for the particular application of identifying E.coli from other pathogens together with its life stage.

The simulation tool used in this study was NeuroSolutions software (v. 4.2, NeuroDimension Inc., Gainesville, Florida, USA). From the literature survey, it was found that the pattern recognition methods used for this type of analysis were the back propagation neural network (Pavlou 2002a, Pavlou 2004, Hao 2003, Xing 2005, Fend 2006, Hong 2007), the genetic algorithm (Pavlou 2002a, Pavlou 2004), the self organising map (Ritaban 2002) and fuzzy c-means clustering (Ping 1997, Ritaban 2002). In all these works the percentage of recognition varied from 85% to 100%. This chapter focuses on finding a model which produces 100% recognition. Accordingly, in this work various pattern recognition techniques, namely the Multi Layer Perceptron (MLP), the Principal Component Analysis Neural Network (PCANN) and the Support Vector Machine (SVM), were employed to find the architecture best suited to this application. In all three models, a genetic algorithm was invoked for network optimisation: selecting the training parameters, selecting the inputs and evolving the network architecture.

4.2 APPLYING GA FOR NETWORK ARCHITECTURE OPTIMISATION

The topology of a network, that is, the number of nodes and the location and number of connections among them, has a significant impact on the performance of the network and its generalisation skills. An optimised network offers benefits such as fast response (which requires a minimum network size) and compatibility with VLSI hardware implementation (which requires minimum connectivity). Genetic Algorithms (GAs) are an alternative method for arriving at an optimal neural network design (Fiezelew 2007). They employ a parallel multipoint probabilistic search strategy that is biased toward reinforcing search points of high fitness. The most distinguishing feature of GAs is their flexibility and applicability to a wide range of optimisation problems. In the domain of neural networks, GAs are useful as global search methods for synthesising the weights of generally interconnected networks, optimal network architectures, learning parameters, optimal learning rules, etc. (Kermani 1999, Montana 1989).

As the performance of a neural network is critically dependent on the choice of processing elements and network architecture, this investigation focuses on implementing a genetic algorithm for optimising the architecture of the pattern recognition network.

In a GA, a genotype is an array of genes, where every gene takes a value from a properly defined domain (Goldberg 1991). Each genotype codes a phenotype, or candidate solution, for the domain of interest (here, a class of neural architectures). Such codings may use genes that take numeric values to represent a few parameters, or complex structures of symbols that are turned into phenotypes (neural networks) by means of a proper decoding process (Illeana 2004, Fiezelew 2007). The resulting neural networks (the phenotypes) are also equipped with learning algorithms that train them using a stimulus data set. The evaluation of a phenotype determines the fitness of its corresponding genotype (Rich 1991). The evolutionary procedure works on a population of such genotypes, preferentially selecting genotypes that code phenotypes with high fitness and reproducing them. Genetic operators such as mutation, crossover and selection are used to introduce variety into the population and to test variants of the candidate solutions represented in the current population. In this way, over several generations, the population gradually moves towards genotypes that correspond to phenotypes with high fitness (Srivastava 1998). In this work, the genotype codes only the architecture of a neural network with forward connections; the training of the weights for those connections is carried out by the various learning algorithms (Figure 4.1).

Figure 4.1 Optimisation of ANN by genetic algorithm

The hybrid algorithm for the generation of neural networks uses a direct coding scheme and proceeds through the following steps:

Step 1: Create an initial population of individuals (neural networks) with random topologies and learning parameters.
Step 2: Train each neural network with a learning algorithm.
Step 3: Calculate the fitness f(x) of each string x in the population.
Step 4: Select parents from the population by the selection operator.
Step 5: Recombine both parents by the crossover operator with probability p_c to produce two offspring.
Step 6: Mutate each offspring randomly by the mutation operator with probability p_m, train each child network using the learning algorithm and place the offspring into the new population.
Step 7: Repeat Steps 2 to 6 for a given number of generations.

For its computation, the genetic algorithm maintains a collection of samples called a population of strings; the computation model was given by Vose and Liepins (1991). A sketch of the hybrid loop is given below.
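To make the hybrid procedure concrete, the sketch below outlines one way the genotype-to-phenotype loop could be organised. It is an illustrative simplification, not the NeuroSolutions implementation: the genotype holds only hidden-layer sizes and a learning rate, the train_and_score function is a hypothetical stand-in for training the decoded network and returning its fitness, and the operator details are deliberately minimal.

```python
import random

# Hypothetical genotype: hidden-layer sizes plus a learning rate.
def random_genotype(max_layers=2, max_neurons=20):
    layers = [random.randint(1, max_neurons) for _ in range(random.randint(1, max_layers))]
    return {"hidden": layers, "lr": random.uniform(0.1, 0.9)}

def train_and_score(genotype):
    """Stand-in for decoding the genotype into an ANN, training it and
    returning a fitness value (e.g. 1 / (1 + cross-validation MSE))."""
    # Placeholder fitness: prefers small networks with moderate learning rates.
    penalty = sum(genotype["hidden"]) * 0.01 + abs(genotype["lr"] - 0.5)
    return 1.0 / (1.0 + penalty)

def evolve(pop_size=50, generations=100, p_c=0.9, p_m=0.01):
    population = [random_genotype() for _ in range(pop_size)]        # Step 1
    for _ in range(generations):                                     # Step 7
        fitness = [train_and_score(g) for g in population]           # Steps 2-3
        ranked = [g for _, g in sorted(zip(fitness, population),
                                       key=lambda t: t[0], reverse=True)]
        next_pop = ranked[:2]                                        # keep the two best (for illustration)
        while len(next_pop) < pop_size:
            p1, p2 = random.sample(ranked[:pop_size // 2], 2)        # Step 4: parents from the fitter half
            child = {"hidden": list(p1["hidden"]), "lr": p1["lr"]}
            if random.random() < p_c:                                # Step 5: crossover
                child["lr"] = p2["lr"]
            if random.random() < p_m:                                # Step 6: mutation
                child["hidden"][random.randrange(len(child["hidden"]))] = random.randint(1, 20)
            next_pop.append(child)
        population = next_pop
    return max(population, key=train_and_score)

print(evolve(pop_size=10, generations=5))
```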

From the initial population, subsequent populations are computed by employing three genetic operators: selection, crossover and mutation. More detailed treatments of these operators are found in the NeuroSolutions manual.

Selection Operator

Selection is a genetic operator that chooses a chromosome from the current generation's population for inclusion in the next generation's population. Before making it into the next generation's population, the selected chromosomes may undergo crossover and/or mutation, depending on the probabilities of crossover and mutation. The purpose of selection is to emphasise the fitter individuals in the population, in the hope that their offspring will in turn have even higher fitness (Melanie 1999). Selection has to be balanced with the variation introduced by crossover and mutation: if selection is too strong, suboptimal but highly fit individuals take over the population, reducing the diversity needed for further change and progress; if selection is too weak, evolution proceeds too slowly. Different selection methods, such as roulette and tournament selection, are available.

Crossover Operator

Crossover is a genetic operator that mates two chromosomes from the parental set to produce a new offspring chromosome. The idea behind crossover is that the new chromosome may be better than both of its parents if it takes the best characteristics from each of them. Crossover occurs during evolution according to a user-definable crossover probability P_c. Crossover can exhibit simultaneously high levels of preservation, survival and construction, since it shares information between fit individuals. Of the three genetic operators, crossover is the most crucial for obtaining global results; it is responsible for mixing the partial information contained in the strings of the population. Based on empirical evidence, reasonable values for the probability of crossover have been found to lie in the range of 0.6 to 0.99 (Jong 1975, Grefenstette 1986, Schaffer 1989). Different crossover operators, such as one-point and uniform crossover, are available.

Mutation Operator

Mutation is a genetic operator that alters one or more gene values in a chromosome from its initial state. It is an important part of the genetic search, as it helps to prevent the population from stagnating at a local optimum. Because mutation is considered a disruptor of new schemas, it is kept at a low value and is often implemented with a parameter that remains constant during the genetic algorithm search. Mutation can introduce entirely new gene values into the gene pool, and with these new gene values the genetic algorithm may be able to arrive at a better solution than was previously possible. Various mutation operators exist, such as flip bit, boundary, Gaussian and uniform mutation. Mutation forces diversity into the population and explores the search space, allowing the search to overcome local minima. Applying mutation too frequently, however, destroys the highly fit strings in the population, which slows and impedes convergence to the solution. Empirically, it has been found that reasonable values for the probability of mutation are small, of the order of 0.01 (Jong 1975, Grefenstette 1986, Schaffer 1989). A sketch of these three operators is given below.
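As an illustration of the three operators, the following sketch implements roulette-wheel selection, uniform crossover and uniform mutation for a simple fixed-length, real-valued chromosome. It is a minimal, generic example and is not taken from the NeuroSolutions implementation used in this work; the crossover probability of 0.9 and mixing ratio of 0.5 merely echo the values adopted later in this chapter.

```python
import random

def roulette_select(population, fitness):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness)
    pick = random.uniform(0.0, total)
    running = 0.0
    for chrom, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]

def uniform_crossover(parent1, parent2, p_c=0.9, mix_ratio=0.5):
    """With probability p_c, each gene of the child is taken from parent2
    with probability mix_ratio; otherwise the child copies parent1."""
    if random.random() > p_c:
        return list(parent1)
    return [g2 if random.random() < mix_ratio else g1
            for g1, g2 in zip(parent1, parent2)]

def uniform_mutation(chrom, p_m=0.01, low=0.0, high=1.0):
    """Each gene is replaced by a fresh random value with probability p_m."""
    return [random.uniform(low, high) if random.random() < p_m else g
            for g in chrom]

# Example: a population of 4 chromosomes, each with 5 genes in [0, 1].
population = [[random.random() for _ in range(5)] for _ in range(4)]
fitness = [sum(c) for c in population]          # toy fitness: sum of genes
p1 = roulette_select(population, fitness)
p2 = roulette_select(population, fitness)
child = uniform_mutation(uniform_crossover(p1, p2))
print(child)
```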

Parameters Setting for Genetic Algorithm

The next decision in implementing a genetic algorithm is to choose the values of the various parameters, such as population size, crossover rate and mutation rate. These parameters typically interact with one another nonlinearly, so they cannot be optimised one at a time. De Jong's experiments identified a best population size, a best single-point crossover rate of approximately 0.6 per pair of parents and a best (small) mutation rate per bit (Jong 1975). Schaffer et al. found that the best settings for population size, crossover rate and mutation rate were independent of the problem in their test suite (Schaffer 1992); these settings were similar to those found by Grefenstette, namely a population size of 20 to 30 together with the corresponding crossover and mutation rates (Grefenstette 1986).

For this research work, design parameters such as the number of inputs, the number of hidden layers, the learning rate, the learning parameter and the number of processing elements in the hidden layer are genetically optimised. The population size is fixed at 50 with a maximum of 100 generations. The operators of the genetic algorithm, including selection, crossover and mutation, were varied to obtain the required sensitivity and specificity. Of the numerous selection schemes, those available in the NeuroSolutions software were employed in this work: roulette, tournament, top and best selection. Different crossover methods, namely one-point, two-point and uniform crossover, were employed. Mutation was implemented with a parameter that was kept constant during the genetic algorithm search, as uniform mutation with probability 0.01. For one set of data, these parameters were applied and the performances measured are shown in Figures 4.2, 4.3 and 4.4 and in Table 4.1.

Figure 4.2 Performance of LM algorithm

Figure 4.3 Performance of GA with LM algorithm: Selection - Roulette, Crossover - Uniform, Mutation - Uniform

Figure 4.4 Performance of GA with LM algorithm: Selection - Best, Crossover - Uniform, Mutation - Uniform

Table 4.1 Performance of GA with different Selection and Crossover operators. For each selection operator (Roulette, Tournament, Top, Best) and each crossover operator (One Point, Two Point, Uniform), the table reports the generation at which the average fitness was reached and the corresponding MSE.

From the above figures and table, the performance measures can be compared; the average fitness of all individuals, evaluated over the evaluation steps, was analysed across the various genetic optimisation models. Of these models, best selection with uniform crossover (crossover probability 0.9, mixing ratio 0.5) gave the better results. Hence, in this work the selection method was kept as best selection, the crossover operator as uniform crossover with probability 0.9 and mixing ratio 0.5, and the mutation as uniform mutation with probability 0.01.

4.3 CLASSIFICATION METHODOLOGY

The performance characteristic of adaptive systems is used directly to change the parameters, through systematic procedures called learning or training rules, so that the system output improves with respect to the desired goal. Before learning, the data set was divided into a training set and a testing set.

Training Phase

During the learning process, the network adjusts its parameters, the synaptic weights, in response to a stimulus input so that its actual output converges to the desired output.

At this stage, the synaptic weights of each processing unit are dynamically modified to reach a defined error level according to an optimisation criterion called the learning algorithm. This is done in order to identify the best architecture with a given number of neurons for a specific problem. When the error falls below a threshold level and the actual output response matches the desired one, the network has completed the learning phase.

Testing Phase

Once the learning phase is completed, the network is subjected to the test inputs. Even if the classification error on the learning samples reaches zero, there is a probability that samples with similar but not identical characteristics will be misclassified. The testing set therefore comprises samples with characteristics similar, but not identical, to the learning ones.

Cross Validation

More features, more hidden units and longer training times enable the neural network to learn the data in its training set with greater accuracy. Sometimes, due to over fitting, instead of learning to approximate the function present in the data, the network simply memorises every training example. The noise in the training data is then memorised as part of the function, often destroying the ability of the network to generalise. With good generalisation as the goal, it becomes very difficult to define the stopping criterion by looking only at the training learning curve; in particular, the network may end up over fitting the training data if the training session is not stopped at the right time. Two popular techniques for improving the generalisation of artificial neural network models are Bayesian regularisation and cross-validated early stopping. As this research work uses the NeuroSolutions tool, the cross-validated early stopping method is used for improved generalisation. The onset of over fitting can be detected by using cross validation, in which the training exemplars are split into a training subset and a validation subset.

The training subset is used to train the network in the usual way, with one small modification: the training session is stopped periodically (every certain number of epochs) and the network is evaluated on the validation set after each training period. The early stopping heuristic suggests that the minimum point on the validation learning curve should be used as the point at which to stop the training session (Fiezelew 2007).

Parameter Setting for Various Learning Algorithms

After each pass through the training set, training was suspended and each vector in the cross validation set was fed to the neural network's input units. The value produced at the neural network's output unit in response to each vector was compared with the desired output value, and from this the MSE between the desired and actual output values was calculated over the entire cross validation set. The criterion for when to stop training was the number of passes through the training set that minimised the MSE on the cross validation set. To account for the possibility of local minima in error space, each architecture was trained several times, using a different random initialisation of the weights on each occasion. This was then repeated using the cross validation set as the training set and vice versa. For each learning technique of the ANN, the performance was evaluated by the Mean Squared Error (MSE), the Normalised Mean Squared Error (NMSE), the correlation coefficient (r) and the percentage of recognition. A sketch of the early stopping procedure is given below.
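The following sketch illustrates the cross-validated early stopping criterion just described. It is a schematic toy example, not the NeuroSolutions training code: a single-weight linear model stands in for the neural network, but the essential pattern, evaluating the cross validation MSE after every pass and retaining the weights of the pass that minimises it, is the same.

```python
import random

def early_stopping_fit(train_set, cv_set, max_passes=200, lr=0.1):
    """Fit a single weight w to data pairs (x, t) by gradient descent on MSE,
    keeping the w that minimises the MSE on the cross validation set."""
    def mse(w, data):
        return sum((t - w * x) ** 2 for x, t in data) / len(data)

    w = random.uniform(-1.0, 1.0)
    best_w, best_mse, best_pass = w, mse(w, cv_set), 0
    for n_pass in range(1, max_passes + 1):
        for x, t in train_set:                       # one pass through the training set
            w += lr * (t - w * x) * x                # delta-rule update
        cv_error = mse(w, cv_set)                    # evaluate on the cross validation set
        if cv_error < best_mse:                      # remember the best pass so far
            best_w, best_mse, best_pass = w, cv_error, n_pass
    return best_w, best_pass, best_mse               # weights from the minimum-CV-MSE pass

train = [(x / 10.0, 0.8 * x / 10.0 + random.gauss(0, 0.05)) for x in range(10)]
cv = [(x / 10.0, 0.8 * x / 10.0 + random.gauss(0, 0.05)) for x in range(5)]
print(early_stopping_fit(train, cv))
```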

Generally, while designing an ANN, one hidden layer is sufficient for most problems. Two hidden layers are required for modelling data with discontinuities such as a saw-tooth wave pattern. Using two hidden layers rarely improves the model and may introduce a greater risk of converging to a local minimum; there is no theoretical reason for using more than two hidden layers. Hence, in this work, for all learning algorithms the architecture was built with a maximum of two hidden layers. A simple experiment was conducted with different activation functions for the neurons in the hidden layer (Figure 4.5). The activation function at the output layer was kept as the softmax axon, as suggested in the NeuroSolutions tool manual.

Figure 4.5 Percentage classification for different transfer functions at the hidden layer

The percentage classification in Figure 4.5 suggests that the tanh activation is better, and hence it was used as the activation function for all neurons in the hidden layer for the various learning algorithms. For the output neurons, the softmax activation function was used.

4.4 DATA SET

The exemplars obtained from the sensor array were divided into three sets for training, cross validation and testing. Each exemplar consists of 12 values, corresponding to the responses of the 12 sensors. The data set obtained for E.coli identification contained 75 exemplars and is referred to throughout this chapter as data set 1 (DS1). Similarly, the data set for discriminating the life stages of E.coli contained 100 exemplars and is referred to throughout this chapter as data set 2 (DS2).

The typical proportion of training exemplars lies between 30% and 90% of the complete data set, and the typical proportion of cross validation exemplars lies between 10% and 90%. In this work, to find the optimum number of hidden layer neurons for deciding the network configuration, the numbers of exemplars for each class were kept almost equal and the data set was divided into training, cross validation and testing sets in the ratio of 70-80%, 10-15% and 10-15% respectively.

Data Set 1 (DS1)

Out of the 75 sets of readings, seventy percent of the randomised sample data set was used as the training set (53 samples), with 10 samples for cross validation and 12 samples for testing. The four different bacteria groups together with the one control medium were further grouped into three classes: E.coli, control and other pathogens. This grouping was done in order to identify E.coli and discriminate it from the others, including the control medium. The number of input neurons used is 12 and the number of output neurons is three.

Data Set 2 (DS2)

The data set consists of 100 samples, of which seventy-eight randomised samples were used for training, 7 for cross validation and 15 for testing. The output is grouped into four classes: lag, log, stationary and death phase. The number of input neurons used is 12 and the number of output neurons is four.
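As a small illustration of the partitioning described above, the sketch below randomises a list of exemplars and splits it in the stated proportions for DS1; the exemplar contents are placeholders, since the actual sensor responses are not reproduced here.

```python
import random

def split_exemplars(exemplars, n_train, n_cv, n_test):
    """Randomise the exemplars and split them into training,
    cross validation and testing sets of the given sizes."""
    assert n_train + n_cv + n_test == len(exemplars)
    shuffled = exemplars[:]
    random.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])

# DS1: 75 exemplars of 12 sensor readings each -> 53 / 10 / 12 split.
ds1 = [[0.0] * 12 for _ in range(75)]        # placeholder sensor responses
train, cv, test = split_exemplars(ds1, 53, 10, 12)
print(len(train), len(cv), len(test))        # 53 10 12
```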

4.5 MULTI LAYER PERCEPTRON (MLP)

The most common neural network model is the multi layer perceptron (MLP). It is a feed forward network (Figure 4.6). This network learns a correct mapping between input and output patterns via a learning algorithm; the process of learning used here is supervised learning.

Figure 4.6 Architecture of Multi Layer Perceptron Network

Design Approach

The most common form of learning is back propagation (BP), as it is easy to use, has few parameters to adjust and is applicable to a wide range of problems. Here the weights are changed based on their previous value and a correction term derived from the generalised delta learning rule (Haykin 2003). The error function E over the input patterns is given by Equation (4.1):

E = (1/2) Σ_{k=1}^{P} (t_k - y_k)^2     (4.1)

where y_k is the actual output for the k-th pattern, t_k is the desired output and P is the total number of training patterns. However, this algorithm has several shortcomings, such as the inability to guarantee that an arbitrary mapping will be learned and slow learning. Several alternative algorithms have been developed to improve the training speed and to reach the global minimum, such as back propagation with momentum, quick propagation, etc. (Fausett 1994). The MLP model used in this study was trained using different algorithms: back propagation with an adaptive learning rate and momentum, conjugate gradient, quick propagation, delta-bar-delta and the Levenberg-Marquardt algorithm.

All weights were initially set to small random values, and then the set of training inputs was presented sequentially to the network. The number of training epochs was fixed, all samples were subjected to batch learning, and the recommended values were used for all learning parameters.

Momentum

Empirical evidence shows that the use of a term called momentum in the back propagation algorithm can be helpful in speeding up convergence and avoiding local minima. This momentum term is known as the heavy ball method in numerical analysis. The role of momentum is to filter out rapid changes in the error surface (Phansalkar 1994); it keeps the weight changes going in the same direction even when a local minimum is encountered (Drago 1995). The momentum term is added to the weight update equation, and the value of momentum should be in the range 0 to 0.9, with a learning rate of 0.5 to 0.9 (Kevin 2005). The idea of using momentum is to stabilise the weight change by making non-radical revisions, combining the gradient-descent term with a fraction of the previous weight change (Yu 1993). The change in weight, Δw, is given by Equation (4.2):

Δw(i) = -η ∂E/∂w(i) + α Δw(i-1)     (4.2)

where Δ denotes the weight change step, η is the learning rate, α is the momentum and i is the index of the current weight change. In this work, the initial weights were set randomly up to 0.5 and the non-linearity offset was set to a fixed value. The momentum and the learning rate were varied from 0.1 to 0.9 and from 0.5 to 0.9 respectively. A sketch of the update is given below.
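A minimal sketch of the update in Equation (4.2) is shown below for a single weight. The gradient value is a stand-in for ∂E/∂w computed by back propagation, and the constants are illustrative rather than those used in the NeuroSolutions runs.

```python
def momentum_update(w, prev_dw, grad, lr=0.1, momentum=0.5):
    """One weight update with momentum: combine the gradient-descent step
    with a fraction of the previous weight change (Equation 4.2)."""
    dw = -lr * grad + momentum * prev_dw
    return w + dw, dw

# Toy example: minimise E(w) = (w - 2)^2, so dE/dw = 2 * (w - 2).
w, dw = 0.0, 0.0
for _ in range(30):
    w, dw = momentum_update(w, dw, grad=2.0 * (w - 2.0))
print(round(w, 3))   # approaches 2.0
```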

The performance analyses of this learning algorithm for various architectures are given in Figures 4.7, 4.8, 4.9 and 4.10.

Figure 4.7 Performance measure of MLP with momentum for various neuron elements in one hidden layer (DS1)

Figure 4.8 Performance measure of MLP with momentum for various neuron elements in two hidden layers (DS1)

Figure 4.9 Performance measure of MLP with momentum for various neuron elements in one hidden layer (DS2)

Figure 4.10 Performance measure of MLP with momentum for various neuron elements in two hidden layers (DS2)

From the figures it can be noted that the performance is good with momentum values of 0.9 for the hidden layer and 0.5 for the output layer. However, this learning algorithm does not perform well in classification, as verified by the percentage error obtained on the training and cross validation sets.

Conjugate Gradient

The conjugate gradient (CG) technique was developed by Hestenes and Stiefel in 1952 and was later improved by Moller in 1993 (Fletcher 1964). As an optimisation technique, the conjugate gradient method copes well with large numbers of weights. It performs a series of line searches across the error surface: it determines the direction of steepest descent, projects a line in that direction to locate the minimum, and makes one weight update per epoch. Another search is then performed along a conjugate direction from this point. This direction is chosen to ensure that all directions that have already been minimised remain minimised, under the assumption that the error surface is quadratic. If the quadratic assumption is wrong and the chosen direction does not slope downward, the algorithm falls back to the line of steepest descent and searches in that direction (Nawi 2006). Each epoch involves searching one specific direction. The resulting search does not generally follow the steepest descent, but it often converges faster than a search along the steepest direction, as it searches one direction at a time. As the algorithm moves closer to the minimum point, the quadratic assumption becomes more accurate and the minimum is then located quickly (Bayati 2009). The update equation is given by Equation (4.3):

w_i^(k+1) = w_i^k + 2 η ε^k x_i^k     (4.3)

where η is the learning rate, ε^k is the error at iteration step k, x_i^k is the input value to weight i at iteration k and w_i^k is the value of weight i at iteration k. In this work, the initial weights were set randomly up to 0.5 and the non-linearity offset was set to a fixed value. The learning rate was varied between 0.5 and 0.9. A sketch of the conjugate direction search is given below.
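To illustrate the conjugate direction idea (rather than the NeuroSolutions implementation), the sketch below applies the classical conjugate gradient method to a small quadratic error surface E(w) = 0.5 w^T A w - b^T w, for which the line search step has a closed form. On a quadratic surface of dimension n it reaches the minimum in at most n iterations; the matrix A and vector b are purely illustrative.

```python
def conjugate_gradient(A, b, w, iters=2):
    """Minimise E(w) = 0.5*w^T A w - b^T w (gradient A w - b) by
    searching along successive conjugate directions."""
    def mat_vec(M, v):
        return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    r = [bi - avi for bi, avi in zip(b, mat_vec(A, w))]     # negative gradient
    d = list(r)                                             # first direction: steepest descent
    for _ in range(iters):
        Ad = mat_vec(A, d)
        alpha = dot(r, r) / dot(d, Ad)                      # exact line search on the quadratic
        w = [wi + alpha * di for wi, di in zip(w, d)]
        r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r_new, r_new) / dot(r, r)                # Fletcher-Reeves coefficient
        d = [rni + beta * di for rni, di in zip(r_new, d)]  # next conjugate direction
        r = r_new
    return w

# Example: the minimum of E lies at w = [1, 2] for this A and b.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [5.0, 5.0]
print(conjugate_gradient(A, b, w=[0.0, 0.0]))
```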

Similar to MLP with momentum, the performance measures can be analysed from Figures 4.11, 4.12, 4.13 and 4.14.

Figure 4.11 Performance measure of MLP with CG for various neuron elements in one hidden layer (DS1)

Figure 4.12 Performance measure of MLP with CG for various neuron elements in two hidden layers (DS1)

Figure 4.13 Performance measure of MLP with CG for various neuron elements in one hidden layer (DS2)

Figure 4.14 Performance measure of MLP with CG for various neuron elements in two hidden layers (DS2)

From the figures, it can be noted that the performance is satisfactory with five neurons in one hidden layer for both DS1 and DS2. For two hidden layers, the best architectures found for DS1 and for DS2 give a comparatively low MSE. However, this learning algorithm does not perform well in classification, as verified by the percentage error obtained on the training and cross validation sets.

Levenberg Marquardt

The Levenberg-Marquardt (LM) algorithm is basically a Hessian-based algorithm for nonlinear least squares optimisation. Hessian-based algorithms allow the network to learn more subtle features of a complicated mapping (Deepak 2005). The training process converges quickly as the solution is approached, because the Hessian does not vanish at the solution. In the LM algorithm, all inputs are presented to the network and their corresponding outputs are computed; for those outputs, the mean squared errors are calculated (Fletcher 1987). Then the Jacobian matrix J(z) is computed, where z represents the weights and biases of the network, and the Levenberg-Marquardt weight update equation is solved to obtain Δz. The error is recomputed using z + Δz. If this new error is smaller than the previously computed error, the training parameter μ is reduced by a user-defined factor; if the error is not reduced, μ is increased by a user-defined factor. This is repeated until the training process reaches a minimum error. The algorithm is assumed to have converged when the norm of the gradient is less than some predetermined value, or when the error has been reduced to some error goal. The weight update vector Δz is calculated from Equations (4.4) and (4.5):

Δz = [J^T(z) J(z) + μI]^(-1) J^T(z) E     (4.4)

where E is the error vector of size P, calculated as

E = [t_1 - y_1, t_2 - y_2, ..., t_P - y_P]^T     (4.5)

Here J^T(z)J(z) is referred to as the Hessian matrix, I is the identity matrix and μ is the learning parameter. For μ = 0 the algorithm becomes the Gauss-Newton method; for very large μ the LM algorithm becomes steepest descent, that is, the error back propagation algorithm. The parameter μ is automatically adjusted at each iteration in order to secure convergence. The LM algorithm requires computation of the Jacobian matrix J(z) and the inversion of the square matrix J^T(z)J(z) at each iteration step. The initial value of μ was set to a small value; μ was incremented by a factor of 10 when the error increased and decremented by a factor of 0.1 when the error decreased. A sketch of a single LM step is given below.
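As an illustration of Equations (4.4) and (4.5), the sketch below applies the LM update to a one-parameter model y = exp(a·x), for which J^T J + μI reduces to a scalar. It is a toy example under these assumptions, not the NeuroSolutions network training code, but the accept/reject logic for μ follows the description above.

```python
import math

def lm_fit(xs, ts, a=0.0, mu=0.001, iters=20):
    """Levenberg-Marquardt fit of the one-parameter model y = exp(a*x)."""
    def errors(a):
        return [t - math.exp(a * x) for x, t in zip(xs, ts)]       # E, Equation (4.5)
    def sse(e):
        return sum(ei * ei for ei in e)

    e = errors(a)
    for _ in range(iters):
        jac = [x * math.exp(a * x) for x in xs]                    # dy/da for each pattern
        # Equation (4.4) in scalar form: delta = (J^T E) / (J^T J + mu)
        delta = sum(j * ei for j, ei in zip(jac, e)) / (sum(j * j for j in jac) + mu)
        a_new = a + delta
        e_new = errors(a_new)
        if sse(e_new) < sse(e):          # error reduced: accept the step, decrease mu
            a, e, mu = a_new, e_new, mu * 0.1
        else:                            # error increased: reject the step, increase mu
            mu *= 10.0
    return a

# Synthetic data generated with a = 0.7; the fit should recover roughly 0.7.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ts = [math.exp(0.7 * x) for x in xs]
print(round(lm_fit(xs, ts), 3))
```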

The performance measures are analysed in Figures 4.15, 4.16, 4.17 and 4.18.

Figure 4.15 Performance measure of MLP with LM for various neuron elements in one hidden layer (DS1)

Figure 4.16 Performance measure of MLP with LM for various neuron elements in two hidden layers (DS1)

Figure 4.17 Performance measure of MLP with LM for various neuron elements in one hidden layer (DS2)

Figure 4.18 Performance measure of MLP with LM for various neuron elements in two hidden layers (DS2)

From the figures, it can be noted that the performance is satisfactory with the selected architectures for DS1 and DS2. Moreover, this learning algorithm does perform well in classification, as verified by the percentage error it gives on the training and cross validation sets.

Quick Propagation

Quick Propagation (QP), formulated by Patterson, is a batch update algorithm. It works out the average gradient of the error surface across all cases before updating the weights once at the end of the epoch. QP works by assuming that the error surface is locally quadratic, with the axes of the hyper-ellipsoid error surface aligned with the weights. The algorithm converges on the minimum very rapidly. The weight changes Δw are calculated using the quick propagation formula given in Equation (4.6):

Δw(t) = a Δw(t-1)     (4.6)

where a is the acceleration coefficient. In this work, the initial weights were set randomly up to 0.5 and the non-linearity offset was set to a fixed value. The performance analyses of this learning algorithm for various architectures are given in Figures 4.19 and 4.20.

Figure 4.19 Performance measure of MLP with QP for various neuron elements in one hidden layer (DS1)

Figure 4.20 Performance measure of MLP with QP for various neuron elements in one hidden layer (DS2)

From the figures, it can be noted that the performance is good with momentum values of 0.8 for the hidden layer and 0.5 for the output layer. For DS1 and DS2, particular one-hidden-layer architectures were found to perform better than the others.

Delta Bar Delta

Delta bar delta was developed by Robert Jacobs in 1988 to improve the learning rate of standard back propagation networks. The delta bar delta network uses the same architecture as a back propagation network; the difference lies in its unique algorithmic method of learning. The delta bar delta paradigm uses a learning method in which each weight has its own self-adapting coefficient, and it does not use the momentum factor of the back propagation architecture. The remaining operations of the network, such as feed forward recall, are identical to the normal back propagation architecture. Delta bar delta is a heuristic approach to training artificial networks: past error values can be used to infer future error values, and knowing the probable errors enables the system to take intelligent steps in adjusting the weights. Every connection weight of a network should have its own learning rate, the claim being that the step size appropriate for one connection weight may not be appropriate for all weights in that layer; furthermore, these learning rates should be allowed to vary over time. By assigning a learning rate to each connection and permitting this learning rate to change continuously over time, more degrees of freedom are introduced, which reduces the time to convergence. With different learning rates permitted for each connection weight, the connection weights are updated on the basis of the partial derivatives of the error with respect to the weight itself. The learning rate update rule is given by Equation (4.7).

Δη_ij(n+1) = κ,            if S_ij(n-1) D_ij(n) > 0
           = -φ η_ij(n),   if S_ij(n-1) D_ij(n) < 0
           = 0,            otherwise     (4.7)

where η_ij is the learning rate for each weight, D_ij(n) is the gradient, κ and φ are the constant increment and the decrement factor respectively, and S_ij(n) = (1 - ξ) D_ij(n) + ξ S_ij(n-1), where ξ is a small constant.

In this work, the initial weights were set randomly up to 0.5 and the non-linearity offset was set to a fixed value. The learning rate was varied between 0.5 and 0.9. A brief sketch of the adaptation rule is given below.
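The sketch below illustrates the per-weight learning rate adaptation of Equation (4.7) on a single weight. The values chosen for κ, φ and ξ and the toy gradient are illustrative assumptions rather than the NeuroSolutions settings.

```python
def delta_bar_delta_step(eta, s_prev, grad, kappa=0.01, phi=0.2, xi=0.7):
    """Adapt one weight's learning rate as in Equation (4.7) and return the
    new learning rate together with the updated smoothed gradient S."""
    if s_prev * grad > 0:        # consistent gradient sign: increase eta by a constant
        eta += kappa
    elif s_prev * grad < 0:      # sign change: decrease eta proportionally
        eta -= phi * eta
    s = (1.0 - xi) * grad + xi * s_prev   # exponentially smoothed gradient trace
    return eta, s

# Toy run: minimise E(w) = (w - 1)^2 with a per-weight adaptive learning rate.
w, eta, s = 4.0, 0.05, 0.0
for _ in range(50):
    grad = 2.0 * (w - 1.0)
    eta, s = delta_bar_delta_step(eta, s, grad)
    w -= eta * grad              # weight update with the adapted learning rate
print(round(w, 3))               # approaches 1.0
```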

The performance analyses of this learning algorithm for various architectures are given in Figures 4.21, 4.22, 4.23 and 4.24.

Figure 4.21 Performance measure of MLP with DBD for various neuron elements in one hidden layer (DS1)

Figure 4.22 Performance measure of MLP with DBD for various neuron elements in two hidden layers (DS1)

Figure 4.23 Performance measure of MLP with DBD for various neuron elements in one hidden layer (DS2)

Figure 4.24 Performance measure of MLP with DBD for various neuron elements in two hidden layers (DS2)

From the figures, it can be noted that the performance is satisfactory with the selected architectures for DS1 and DS2. However, this learning algorithm does not perform well in classification, as verified by the percentage error it gives on the training and cross validation sets.

Performance Measures for MLP for Various Learning Algorithms

A breadboard created for the MLP using the NeuroSolutions software is shown in Figure 4.25; it contains the genetic control inspector, the learning inspector (here Levenberg-Marquardt), the Axon inspector and the TanhAxon inspector. To find the best learning algorithm among the different algorithms, the data sets DS1 and DS2 were subjected to training, cross validation and testing.

Figure 4.25 Active Breadboard for MLP

For learning using momentum, breadboards were saved with one and two hidden layers and various initial weights. Out of the various training runs, the better result was obtained from an architecture with two hidden layers, nine processing elements in the first hidden layer and eight processing elements in the second hidden layer for DS1. The momentum values were 0.9, 0.9 and 0.5 for the first hidden, second hidden and output layer respectively. This architecture gave a correlation coefficient of 47.9% on cross validation. It recognised the log phase with only 25% accuracy, and the lag, stationary and death phases with 80%, 66.6% and 100% accuracy respectively. For quick propagation, the momentum values were kept at 0.8 and 0.5 for the hidden and output layer respectively; for DS2, an architecture with one hidden layer of 14 processing elements gives the better result. Similarly, for all learning algorithms, the satisfactorily rated results for their best-suited architectures were tabulated according to the correlation percentage and MSE on the cross validation set (Figures 4.26 and 4.27).

Figure 4.26 Performance measure of MLP for various training algorithms (DS1)

Figure 4.27 Performance measure of MLP for various training algorithms (DS2)

From Figures 4.26 and 4.27, it can be noted that for E.coli discrimination (DS1) and for life phase identification (DS2), the LM learning algorithm gives lower percentage errors than the other algorithms. It gives a correlation coefficient of 99.71% for DS1 and a correlation coefficient of 93.93% for DS2.

4.6 PRINCIPAL COMPONENT ANALYSIS NEURAL NETWORK (PCANN)

The Principal Component Analysis Neural Network is a feed forward neural network. A feed forward network is usually trained by an external teacher, i.e. by supervised learning, in which the network separates the input parameter space on the basis of features associated with it during learning. A feed forward network may also be trained unsupervised, according to learning rules which impose a certain condition on its output. Unsupervised feed forward neural networks measure the correlation of the input data, identify certain features or perform principal component analysis (Hertz 1991). A Principal Component Analysis (PCA) network is simply a procedure for plain Hebbian learning with constrained weight vector growth; it measures the correlation of the input data by identifying certain features. The Hebbian learning rule is a biologically inspired scheme strongly associated with unsupervised learning (Kung 1994). For the unsupervised training of feed forward networks, two robust learning rules are available for implementing the Hebbian principle: Oja's and Sanger's learning rules. Of the two, Sanger's rule was preferred for PCA, as it naturally orders the PCA components by magnitude.

Sanger's PCA

Sanger proposed a learning rule for the unsupervised training of a single-layered neural network formed by linear units, implementing a PCA network (Figure 4.28). This network consists of a Sanger's synapse that compresses the input data down to its N largest "components", where N denotes the number of processing elements in the axon (Sanger 1989, Kung 1994, Haykin 2003). Thus, the actual information contained in the input can be efficiently compressed.

Figure 4.28 General architecture for Sanger's PCA

Sanger's rule finds exactly the first N principal components, and the update equation is given by Equation (4.8):

Δw_ij = η y_j ( x_i - Σ_{k=1}^{j} w_ik y_k )     (4.8)

where w_ij defines the synaptic weight, or connection strength, between the i-th input and the j-th output neuron, x and y are the input and output vectors respectively, and η is the learning rate parameter. A sketch of this update is given below.
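As an illustration of Equation (4.8), the sketch below applies Sanger's rule to a small set of two-dimensional inputs. It is a minimal, generic example (not the NeuroSolutions Sanger synapse), extracting N = 2 components whose weight vectors move towards the leading principal directions of the data; the data set and learning rate are assumptions made for the example.

```python
import random

def sanger_update(W, x, eta=0.05):
    """One Sanger's rule update (Equation 4.8). W[j][i] holds the weight from
    input i to output j; the sum over k runs only up to the current output j."""
    y = [sum(W[j][i] * x[i] for i in range(len(x))) for j in range(len(W))]
    for j in range(len(W)):
        for i in range(len(x)):
            back = sum(W[k][i] * y[k] for k in range(j + 1))   # reconstruction up to unit j
            W[j][i] += eta * y[j] * (x[i] - back)
    return W

# Toy data: zero-mean 2-D samples stretched along the direction (1, 1).
random.seed(0)
data = [(t + random.gauss(0, 0.1), t + random.gauss(0, 0.1))
        for t in [random.uniform(-1, 1) for _ in range(200)]]

W = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]  # 2 outputs x 2 inputs
for _ in range(50):                 # repeated passes over the data
    for x in data:
        W = sanger_update(W, list(x))
print([[round(w, 2) for w in row] for row in W])   # first row close to +/-(0.71, 0.71)
```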

Performance Measures for PCANN for Various Learning Algorithms

As in the competitive network, the PCA network can be used as a pre-processor for a supervised network. The PCA network significantly reduces the dimensionality of the input to the supervised network, creating a smaller, easier-to-train network. The outputs are related to the eigenvalues and can be used as input to another supervised network for classification. The supervised network used in this investigation is the Multi Layer Perceptron, trained with back propagation by the Levenberg-Marquardt (LM) algorithm. The active breadboard is shown in Figure 4.29, and the performance analyses for various numbers of principal components are shown in Figures 4.30 and 4.31.

Figure 4.29 Active Breadboard for PCANN

Figure 4.30 Performance measure of various PCANN (DS1)

Figure 4.31 Performance measure of various PCANN (DS2)

From the analysis, it can be stated that six principal components with 8 processing elements in one hidden layer give the better performance for DS1, while eight principal components with 8 processing elements in one hidden layer give the better performance for DS2.

4.7 SUPPORT VECTOR MACHINE (SVM)

The SVM is a supervised machine learning technique. Instead of minimising the error on the training data, like ordinary neural networks, the Support Vector Machine minimises a bound on the expected generalisation error (Haykin 2003, Gunn 1998). The benefits of the SVM are that it gives a globally optimal solution and that its computational time is much less than that of other computing techniques. It gives a sparser model solution and minimises the expected generalisation error, rather than the empirical error, on the training data (Cortes 1995).

The SVM technique relies on a margin on either side of a hyperplane that separates the data classes. The margin is maximised by creating the largest possible distance between the separating hyperplanes, which reduces the upper bound on the expected generalisation error (Burges 1998, Xiao-Dong 2005, Pardo 2005). An optimum hyperplane can be found by minimising the squared norm of the separating hyperplane, with certain data points found to lie on its margin. These data points are called Support Vector (SV) points, and the optimal solution is obtained by expanding over these points only. Once the hyperplane has been created, the kernel function is used to map new points into the feature space for classification. A great benefit arises from the kernel formulation: though it is possible to design specific kernel mappings incorporating domain knowledge that separate the training data precisely, it is much easier to use one from a family of kernel functions that represent familiar machine learning techniques. Different kernels such as the polynomial, Gaussian radial basis function, exponential radial basis function, multilayer perceptron, Fourier series and spline kernels are available.

Radial Basis Function

In this work the kernel is the Radial Basis Function (RBF), which places a Gaussian at each data sample. Here, the RBF network uses back propagation to train a linear combination of the Gaussians to produce the final result, whereas the SVM uses the idea of large margin classifiers for training. This decouples the capacity of the classifier from the input space and at the same time provides good generalisation.

Kernel Adatron Algorithm

In this work, the Kernel Adatron (KA) algorithm extended to the RBF network is used to implement the SVM. The advantages of the Adatron algorithm are its simple structure and the fact that its implementation is simple and straightforward.

It allows the implementation of both hard and soft margin SVMs with very few changes. The algorithm recasts the Adatron in the feature space of the SVM and is hence called the Kernel Adatron, or KA, algorithm. It precomputes the inner products (the kernel computation) and is iterative: it maps the inputs to a high dimensional feature space and then optimally separates the data into their respective classes by isolating those inputs which fall close to the class boundaries. Therefore, this algorithm is especially effective in separating sets of data which share complex boundaries. The step size should be chosen experimentally. The algorithm is as given below:

Step 0: Define
        f_AD(x_i) = y_i Σ_{j=1}^{m} y_j α_j K(x_i, x_j) + b
        M_AD = min_i f_AD(x_i), i ∈ {1, ..., m}
Step 1: Initialisation: set the Lagrange multipliers α_i, i ∈ {1, ..., m}, the learning rate η, the bias b and a small threshold t.
Step 2: While M_AD ≤ t:
Step 3:     Choose a pattern x_i, i ∈ {1, ..., m}.
Step 4:     Calculate the update Δα_i = η (1 - f_AD(x_i)).
Step 5:     If (α_i + Δα_i) > 0, then set α_i ← α_i + Δα_i and b ← b + y_i Δα_i.
Step 6: End while.

This algorithm is called the Kernel Adatron and can adapt an RBF network to have an optimal margin. A sketch of the procedure is given below.
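The following sketch implements the steps above for a Gaussian (RBF) kernel on a tiny two-class problem. It is an illustrative reading of the algorithm, with an assumed stopping rule based on a fixed number of sweeps rather than the margin threshold t and an assumed clipping of negative multipliers; it is not the NeuroSolutions component.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2*sigma^2))."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def kernel_adatron(X, y, eta=0.5, sweeps=100):
    """Train the multipliers and bias with the Kernel Adatron update (Steps 2-5)."""
    m = len(X)
    K = [[rbf_kernel(X[i], X[j]) for j in range(m)] for i in range(m)]  # Step 0 precomputation
    alpha = [0.1] * m          # Step 1: initial Lagrange multipliers
    b = 0.0                    # Step 1: initial bias
    for _ in range(sweeps):    # stand-in for the "while M_AD <= t" loop
        for i in range(m):     # Step 3: visit each pattern in turn
            f_ad = y[i] * (sum(y[j] * alpha[j] * K[i][j] for j in range(m)) + b)
            delta = eta * (1.0 - f_ad)          # Step 4
            if alpha[i] + delta > 0:            # Step 5
                alpha[i] += delta
                b += y[i] * delta
            else:
                alpha[i] = 0.0                  # assumed clipping for the margin constraint
    return alpha, b

def classify(x, X, y, alpha, b):
    return 1 if sum(y[j] * alpha[j] * rbf_kernel(x, X[j]) for j in range(len(X))) + b > 0 else -1

# Tiny two-class example: class +1 near (1, 1), class -1 near (-1, -1).
X = [[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.8, -1.2]]
y = [1, 1, -1, -1]
alpha, b = kernel_adatron(X, y)
print(classify([0.9, 1.1], X, y, alpha, b), classify([-1.1, -0.9], X, y, alpha, b))
```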

Performance Measures for SVM

The typical SVM built is shown by the active breadboard (Figure 4.32). The first three components implement the expansion of the dimensionality by placing a Gaussian at each input; the second three components implement the large margin classifier that trains the parameters. The step size was varied from 0.01 to 0.1 and the performance can be analysed from Figures 4.33 and 4.34. From these figures, it can be noted that a step size of 0.02 gives satisfactory results for DS1 and a step size of 0.06 gives satisfactory results for DS2.

Figure 4.32 Active Breadboard for SVM

Figure 4.33 Performance measure of SVM for various step sizes (DS1)

Figure 4.34 Performance measure of SVM for various step sizes (DS2)

4.8 INFERENCE

Out of the different algorithms used, the MLP with LM learning proved to be more efficient than the others, as explained in Chapter 6.


More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

Dr. Qadri Hamarsheh Supervised Learning in Neural Networks (Part 1) learning algorithm Δwkj wkj Theoretically practically

Dr. Qadri Hamarsheh Supervised Learning in Neural Networks (Part 1) learning algorithm Δwkj wkj Theoretically practically Supervised Learning in Neural Networks (Part 1) A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. Variety of learning algorithms are existing,

More information

The Binary Genetic Algorithm. Universidad de los Andes-CODENSA

The Binary Genetic Algorithm. Universidad de los Andes-CODENSA The Binary Genetic Algorithm Universidad de los Andes-CODENSA 1. Genetic Algorithms: Natural Selection on a Computer Figure 1 shows the analogy between biological i l evolution and a binary GA. Both start

More information

Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach

Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach 1 Sparse Matrices Reordering using Evolutionary Algorithms: A Seeded Approach David Greiner, Gustavo Montero, Gabriel Winter Institute of Intelligent Systems and Numerical Applications in Engineering (IUSIANI)

More information

Identification of intelligent computational models by evolutionary and gradient based learning algorithms

Identification of intelligent computational models by evolutionary and gradient based learning algorithms Identification of intelligent computational models by evolutionary and gradient based learning algorithms Ph.D. Thesis Booklet János Botzheim Supervisor: László T. Kóczy, Ph.D., D.Sc. Budapest University

More information

Multi-Objective Optimization Using Genetic Algorithms

Multi-Objective Optimization Using Genetic Algorithms Multi-Objective Optimization Using Genetic Algorithms Mikhail Gaerlan Computational Physics PH 4433 December 8, 2015 1 Optimization Optimization is a general term for a type of numerical problem that involves

More information

IN recent years, neural networks have attracted considerable attention

IN recent years, neural networks have attracted considerable attention Multilayer Perceptron: Architecture Optimization and Training Hassan Ramchoun, Mohammed Amine Janati Idrissi, Youssef Ghanou, Mohamed Ettaouil Modeling and Scientific Computing Laboratory, Faculty of Science

More information

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter

Review: Final Exam CPSC Artificial Intelligence Michael M. Richter Review: Final Exam Model for a Learning Step Learner initially Environm ent Teacher Compare s pe c ia l Information Control Correct Learning criteria Feedback changed Learner after Learning Learning by

More information

Lecture 6: Genetic Algorithm. An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved

Lecture 6: Genetic Algorithm. An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved Lecture 6: Genetic Algorithm An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved Lec06/1 Search and optimization again Given a problem, the set of all possible

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms

MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms MLPQNA-LEMON Multi Layer Perceptron neural network trained by Quasi Newton or Levenberg-Marquardt optimization algorithms 1 Introduction In supervised Machine Learning (ML) we have a set of data points

More information

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs) Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Dynamic Analysis of Structures Using Neural Networks

Dynamic Analysis of Structures Using Neural Networks Dynamic Analysis of Structures Using Neural Networks Alireza Lavaei Academic member, Islamic Azad University, Boroujerd Branch, Iran Alireza Lohrasbi Academic member, Islamic Azad University, Boroujerd

More information

Monika Maharishi Dayanand University Rohtak

Monika Maharishi Dayanand University Rohtak Performance enhancement for Text Data Mining using k means clustering based genetic optimization (KMGO) Monika Maharishi Dayanand University Rohtak ABSTRACT For discovering hidden patterns and structures

More information

Chap.12 Kernel methods [Book, Chap.7]

Chap.12 Kernel methods [Book, Chap.7] Chap.12 Kernel methods [Book, Chap.7] Neural network methods became popular in the mid to late 1980s, but by the mid to late 1990s, kernel methods have also become popular in machine learning. The first

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Approximation Algorithms and Heuristics November 21, 2016 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff Inria Saclay Ile-de-France 2 Exercise: The Knapsack

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Approximation Algorithms and Heuristics November 6, 2015 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff INRIA Lille Nord Europe 2 Exercise: The Knapsack Problem

More information

Chapter 14 Global Search Algorithms

Chapter 14 Global Search Algorithms Chapter 14 Global Search Algorithms An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Introduction We discuss various search methods that attempts to search throughout the entire feasible set.

More information

Artificial Intelligence Application (Genetic Algorithm)

Artificial Intelligence Application (Genetic Algorithm) Babylon University College of Information Technology Software Department Artificial Intelligence Application (Genetic Algorithm) By Dr. Asaad Sabah Hadi 2014-2015 EVOLUTIONARY ALGORITHM The main idea about

More information

Introduction to Genetic Algorithms

Introduction to Genetic Algorithms Advanced Topics in Image Analysis and Machine Learning Introduction to Genetic Algorithms Week 3 Faculty of Information Science and Engineering Ritsumeikan University Today s class outline Genetic Algorithms

More information

Lecture 4. Convexity Robust cost functions Optimizing non-convex functions. 3B1B Optimization Michaelmas 2017 A. Zisserman

Lecture 4. Convexity Robust cost functions Optimizing non-convex functions. 3B1B Optimization Michaelmas 2017 A. Zisserman Lecture 4 3B1B Optimization Michaelmas 2017 A. Zisserman Convexity Robust cost functions Optimizing non-convex functions grid search branch and bound simulated annealing evolutionary optimization The Optimization

More information

GENETIC ALGORITHM with Hands-On exercise

GENETIC ALGORITHM with Hands-On exercise GENETIC ALGORITHM with Hands-On exercise Adopted From Lecture by Michael Negnevitsky, Electrical Engineering & Computer Science University of Tasmania 1 Objective To understand the processes ie. GAs Basic

More information

An evolutionary annealing-simplex algorithm for global optimisation of water resource systems

An evolutionary annealing-simplex algorithm for global optimisation of water resource systems FIFTH INTERNATIONAL CONFERENCE ON HYDROINFORMATICS 1-5 July 2002, Cardiff, UK C05 - Evolutionary algorithms in hydroinformatics An evolutionary annealing-simplex algorithm for global optimisation of water

More information

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS NABEEL AL-MILLI Financial and Business Administration and Computer Science Department Zarqa University College Al-Balqa'

More information

In this assignment, we investigated the use of neural networks for supervised classification

In this assignment, we investigated the use of neural networks for supervised classification Paul Couchman Fabien Imbault Ronan Tigreat Gorka Urchegui Tellechea Classification assignment (group 6) Image processing MSc Embedded Systems March 2003 Classification includes a broad range of decision-theoric

More information

Pattern Classification Algorithms for Face Recognition

Pattern Classification Algorithms for Face Recognition Chapter 7 Pattern Classification Algorithms for Face Recognition 7.1 Introduction The best pattern recognizers in most instances are human beings. Yet we do not completely understand how the brain recognize

More information

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization J.Venkatesh 1, B.Chiranjeevulu 2 1 PG Student, Dept. of ECE, Viswanadha Institute of Technology And Management,

More information

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search Outline Genetic Algorithm Motivation Genetic algorithms An illustrative example Hypothesis space search Motivation Evolution is known to be a successful, robust method for adaptation within biological

More information

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION 6 NEURAL NETWORK BASED PATH PLANNING ALGORITHM 61 INTRODUCTION In previous chapters path planning algorithms such as trigonometry based path planning algorithm and direction based path planning algorithm

More information

Hierarchical Learning Algorithm for the Beta Basis Function Neural Network

Hierarchical Learning Algorithm for the Beta Basis Function Neural Network Third International Conference on Systems, Signals & Devices March 2-24, 2005 Sousse, Tunisia Volume III Communication and Signal Processing Hierarchical Learning Algorithm for the Beta Basis Function

More information

RIMT IET, Mandi Gobindgarh Abstract - In this paper, analysis the speed of sending message in Healthcare standard 7 with the use of back

RIMT IET, Mandi Gobindgarh Abstract - In this paper, analysis the speed of sending message in Healthcare standard 7 with the use of back Global Journal of Computer Science and Technology Neural & Artificial Intelligence Volume 13 Issue 3 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global

More information

Reducing Graphic Conflict In Scale Reduced Maps Using A Genetic Algorithm

Reducing Graphic Conflict In Scale Reduced Maps Using A Genetic Algorithm Reducing Graphic Conflict In Scale Reduced Maps Using A Genetic Algorithm Dr. Ian D. Wilson School of Technology, University of Glamorgan, Pontypridd CF37 1DL, UK Dr. J. Mark Ware School of Computing,

More information

Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

More on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization

More on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial

More information

Using Genetic Algorithms in Integer Programming for Decision Support

Using Genetic Algorithms in Integer Programming for Decision Support Doi:10.5901/ajis.2014.v3n6p11 Abstract Using Genetic Algorithms in Integer Programming for Decision Support Dr. Youcef Souar Omar Mouffok Taher Moulay University Saida, Algeria Email:Syoucef12@yahoo.fr

More information

Introduction to Design Optimization: Search Methods

Introduction to Design Optimization: Search Methods Introduction to Design Optimization: Search Methods 1-D Optimization The Search We don t know the curve. Given α, we can calculate f(α). By inspecting some points, we try to find the approximated shape

More information

Artificial Neural Networks MLP, RBF & GMDH

Artificial Neural Networks MLP, RBF & GMDH Artificial Neural Networks MLP, RBF & GMDH Jan Drchal drchajan@fel.cvut.cz Computational Intelligence Group Department of Computer Science and Engineering Faculty of Electrical Engineering Czech Technical

More information

Genetic Algorithms. Kang Zheng Karl Schober

Genetic Algorithms. Kang Zheng Karl Schober Genetic Algorithms Kang Zheng Karl Schober Genetic algorithm What is Genetic algorithm? A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization

More information

Support Vector Machines (a brief introduction) Adrian Bevan.

Support Vector Machines (a brief introduction) Adrian Bevan. Support Vector Machines (a brief introduction) Adrian Bevan email: a.j.bevan@qmul.ac.uk Outline! Overview:! Introduce the problem and review the various aspects that underpin the SVM concept.! Hard margin

More information

Ensemble methods in machine learning. Example. Neural networks. Neural networks

Ensemble methods in machine learning. Example. Neural networks. Neural networks Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you

More information

Clustering with Reinforcement Learning

Clustering with Reinforcement Learning Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of

More information

DERIVATIVE-FREE OPTIMIZATION

DERIVATIVE-FREE OPTIMIZATION DERIVATIVE-FREE OPTIMIZATION Main bibliography J.-S. Jang, C.-T. Sun and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, New Jersey,

More information

Vulnerability of machine learning models to adversarial examples

Vulnerability of machine learning models to adversarial examples Vulnerability of machine learning models to adversarial examples Petra Vidnerová Institute of Computer Science The Czech Academy of Sciences Hora Informaticae 1 Outline Introduction Works on adversarial

More information

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real

More information

Research Article A New Optimized GA-RBF Neural Network Algorithm

Research Article A New Optimized GA-RBF Neural Network Algorithm Computational Intelligence and Neuroscience, Article ID 982045, 6 pages http://dx.doi.org/10.1155/2014/982045 Research Article A New Optimized GA-RBF Neural Network Algorithm Weikuan Jia, 1 Dean Zhao,

More information

Learning. Learning agents Inductive learning. Neural Networks. Different Learning Scenarios Evaluation

Learning. Learning agents Inductive learning. Neural Networks. Different Learning Scenarios Evaluation Learning Learning agents Inductive learning Different Learning Scenarios Evaluation Slides based on Slides by Russell/Norvig, Ronald Williams, and Torsten Reil Material from Russell & Norvig, chapters

More information

MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS

MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS In: Journal of Applied Statistical Science Volume 18, Number 3, pp. 1 7 ISSN: 1067-5817 c 2011 Nova Science Publishers, Inc. MAXIMUM LIKELIHOOD ESTIMATION USING ACCELERATED GENETIC ALGORITHMS Füsun Akman

More information