CHAPTER 4 KNOWLEDGE DISCOVERY USING PSO AND DE TECHNIQUES

4.1 Introduction

In the recent past, there has been an enormous increase in the amount of data stored in electronic format. It has been estimated that the amount of data collected in the world doubles every 18 months, and the size and number of databases are increasing even faster. The ability to collect data rapidly has outpaced the ability to analyze it; we are becoming data rich but information poor. Information is crucial for decision making, especially in business operations. In response to these trends, the term Data Mining (or Knowledge Discovery) has been coined to describe a variety of techniques that identify nuggets of information or decision-making knowledge in bodies of data and extract them in a form that can be put to use in areas such as decision support, prediction, forecasting and estimation. Automated tools must be developed to help extract meaningful information from this flood of data. Moreover, these tools must be sophisticated enough to search for correlations among the data that the user has not specified, since the potential for unforeseen relationships among the data is very high. A successful tool set will locate useful nuggets of information in the otherwise chaotic data space and present them to the user in a contextual format.

There is an urgent need for a new generation of techniques for automating Data Mining and Knowledge Discovery in Databases (KDD). KDD is a broad area that integrates methods from several fields, including statistics, databases, AI, machine learning, pattern recognition, machine discovery, uncertainty modeling, data visualization, high performance computing, optimization, management information systems (MIS), and knowledge-based systems. Knowledge Discovery in Databases is defined as the process of identifying useful and novel structure (models) in data [89-90]. It can be viewed as a multi-stage process, summarized as follows:

Data gathering: e.g., databases, data warehouses, Web crawling.
Data cleaning: eliminating errors, e.g., GPA = 7.3.
Feature extraction: obtaining only the interesting attributes of the data.
Data mining: discovering and extracting meaningful patterns.
Visualization of the data.
Verification and evaluation of results: drawing conclusions.

Data mining is considered the main step in the knowledge discovery process; it is concerned with the algorithms used to extract potentially valuable patterns, associations, trends, sequences and dependencies in data [87-90]. Key business examples include web site access analysis for improvements in

e-commerce advertising, fraud detection, screening and investigation, retail site or product analysis, and customer segmentation. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. Additionally, the application of Data Mining techniques further exploits the value of a data warehouse by converting expensive volumes of data into valuable assets for future tactical and strategic business development. Management information systems should provide advanced capabilities that give users the power to ask more sophisticated and pertinent questions; they empower the right people by providing the specific information they need.

Data mining techniques can be categorized either by task or by method. Based on tasks, they include Association Rule Discovery, Classification, Clustering, Sequential Pattern Discovery, Regression, Deviation Detection, and so on. Most researchers refer to the first three as the main Data Mining tasks. The rest are either related more to other fields, such as regression, or considered sub-fields of one of the main tasks, such as sequential pattern discovery and deviation detection. Based on methods, we have the following:

Algorithms for mining spatial, textual, image and other complex data
Incremental discovery methods and re-use of discovered knowledge
Integration of discovery methods

Data structures and query evaluation methods for Data Mining
Parallel and distributed Data Mining techniques
Issues and challenges in dealing with massive or small data sets
Fundamental issues from statistics, databases, optimization, and information processing in general, as they relate to the problem of extracting patterns and models from data.

In our work we have limited ourselves mainly to the clustering task from the first category and to the development of algorithms for mining complex data and image data from the second category. In this chapter we present unsupervised pattern classification (clustering) studies for numerical and image datasets. The first part deals with the application of PSO to clustering; the classical k-means algorithm is discussed and simulated for several sample datasets. Next, DE and a new adaptable DE (ADE) are used for clustering. Finally, a suitable hybridization of PSO with ADE is modeled and tested on clustering problems.

4.2 Approaches to Clustering

Clustering algorithms are divided into two broad classes:

Centroid approaches: The centroid or central point of each cluster is estimated, and points are assigned to the cluster of their nearest centroid.
Hierarchical approaches: Starting with each point as a cluster by itself, nearby clusters are repeatedly merged.

Clustering algorithms can also be classified according to whether or not they assume a Euclidean distance, and whether they use a centroid or a hierarchical approach. The following three algorithms are examples of this classification:

BFR: Centroid based; assumes a Euclidean measure, with clusters formed by a Gaussian process in each dimension around the centroid.
GRGPF: Centroid based, but uses only a distance measure, not a Euclidean space.
CURE: Hierarchical and Euclidean; this algorithm deals with odd-shaped clusters.

The following subsections describe a few popular clustering algorithms.

4.3 Centroid Clustering

The k-means algorithm is a popular main-memory algorithm. k cluster centroids are picked, and points are assigned to clusters by picking the centroid closest to the point in question. As points are assigned to clusters, the centroid of a cluster may migrate. The BFR algorithm [94] is based on k-means; it reads its data once, consuming a main-memory-full at a time. The algorithm works best if the clusters are normally distributed around a central point, perhaps with a different standard deviation in each dimension.
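The centroid assignment loop just described can be sketched as follows. This is a minimal, generic k-means illustration (random initial centroids, Euclidean assignment), not the toolbox code described later in this chapter:

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(max_iter):
        # distance of every point to every centroid, then nearest-centroid assignment
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)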

Fastmap [5] picks k pairs of points (a_i, b_i), each of which serves as the ends of one of the k axes of a k-dimensional space. Using the law of cosines, the projection x of any point c onto the line ab can be calculated using only the distances between points, not any assumed coordinates of these points in a plane.

4.4 Hierarchical Clustering

Hierarchical clustering [5][94] is a general technique that can take, in the worst case, O(n^2) time to cluster n points. The general outline of this approach is as follows:

Start with each point in a cluster by itself.
Repeatedly select two clusters to merge. In general, we want to pick the two clusters that are closest, but there are various ways to measure closeness. Some possibilities:
o Distance between their centroids (or, if the space is not Euclidean, between their clustroids).
o Minimum distance between nodes in the clusters.
o Maximum distance between nodes in the clusters.
o Average distance between nodes of the clusters.
End the merging process when we have few enough clusters. Possible stopping criteria:
o Use a k-means approach: merge until only k clusters remain.

o Stop merging when the only clusters that could result from a merge fail to meet some criterion of compactness, e.g., the average distance of nodes to their clustroid or centroid is too high.

The GRGPF algorithm [5][94] assumes there is a distance measure D but no Euclidean space, and that there is too much data to fit in main memory. It uses an R-tree-like data structure to store clusters; nodes of the tree are disk blocks, and different information is stored at leaf and interior nodes. CURE (Clustering Using REpresentatives) [94] is an efficient algorithm for large databases and is more robust to outliers. It starts with a main memory full of random points and uses the hierarchical approach.

There are many approaches in the Data Mining literature for overcoming some of the above problems, and in the recent past many researchers have shown keen interest in using evolutionary computation techniques to address these difficulties. In our work we have contributed a few such approaches, using PSO and DE to overcome some disadvantages of k-means. Our main focus has been to produce clusterings that neither get trapped in local optima nor depend on the initial random centroids. In the next two sections of this chapter we highlight PSO and DE based clustering for several datasets, including an image dataset.

4.5 Clustering using PSO

Research efforts have made it possible to view data clustering as an optimization problem. This view offers a chance to apply the PSO algorithm to evolving a set of candidate cluster centroids and thus determining a near optimal partitioning of the dataset at hand. An important advantage of PSO is its ability to cope with local optima by maintaining, recombining and comparing several candidate solutions simultaneously. In contrast, local search heuristics such as simulated annealing refine only a single candidate solution and are notoriously weak at coping with local optima. Deterministic local search, which is used in algorithms like k-means, always converges to the nearest local optimum from the starting position of the search.

A PSO-based clustering algorithm was first introduced by Omran et al. [36]. Their results showed that the PSO based method outperformed k-means, FCM and a few other state-of-the-art clustering algorithms. In their method, Omran et al. used a quantization-error based fitness measure for judging the performance of a clustering algorithm. The quantization error is defined as

J_e = \frac{1}{K} \sum_{j=1}^{K} \Big[ \sum_{\vec{X}_p \in C_j} d(\vec{X}_p, \vec{V}_j) / n_j \Big]    (4.1)

where C_j is the j-th cluster, \vec{V}_j is its centroid and n_j is the number of data points belonging to it. Each particle in the PSO algorithm represents a possible set of K cluster centroids:

\vec{Z}_i(t) = \big( \vec{V}_{i,1}, \vec{V}_{i,2}, \ldots, \vec{V}_{i,K} \big)

where \vec{V}_{i,p} refers to the p-th cluster centroid vector of the i-th particle. The quality of each particle is measured by the following fitness function:

f(\vec{Z}_i, M_i) = w_1 \, d_{max}(M_i, \vec{X}_i) + w_2 \, \big( R_{max} - d_{min}(\vec{Z}_i) \big) + w_3 \, J_e    (4.2)

In the above expression, R_{max} is the maximum feature value in the dataset and M_i is the matrix representing the assignment of patterns to the clusters of the i-th particle. Each element m_{i,k,p} indicates whether pattern \vec{X}_p belongs to cluster C_k of the i-th particle. The user-defined constants w_1, w_2 and w_3 weigh the contributions of the different sub-objectives. In addition,

d_{max}(M_i, \vec{X}_i) = \max_{k \in \{1,2,\ldots,K\}} \Big\{ \sum_{\vec{X}_p \in C_{i,k}} d(\vec{X}_p, \vec{V}_{i,k}) / n_{i,k} \Big\}    (4.3)

is the maximum average distance of patterns to their assigned centroid, and

d_{min}(\vec{Z}_i) = \min_{p \neq q} \big\{ d(\vec{V}_{i,p}, \vec{V}_{i,q}) \big\}    (4.4)

is the minimum Euclidean distance between any pair of clusters. Here n_{i,k} is the number of patterns that belong to cluster C_{i,k} of particle i. The fitness function defines a multi-objective optimization problem that minimizes the intra-cluster distance, maximizes inter-cluster separation, and reduces the quantization error.
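A minimal sketch of these fitness components, assuming Euclidean distance and a label vector in place of the assignment matrix M_i; the weights w1, w2, w3 and R_max are supplied by the user, as stated above:

import numpy as np

def fitness(X, centroids, labels, w1, w2, w3, r_max):
    """Clustering fitness of eq. (4.2): w1*d_max + w2*(R_max - d_min) + w3*J_e."""
    K = len(centroids)
    avg_dists = []
    for k in range(K):
        members = X[labels == k]
        avg_dists.append(np.linalg.norm(members - centroids[k], axis=1).mean()
                         if len(members) else 0.0)
    d_max = max(avg_dists)             # eq. (4.3): worst average intra-cluster distance
    j_e = sum(avg_dists) / K           # eq. (4.1): quantization error
    # eq. (4.4): smallest distance between any pair of centroids
    d_min = min(np.linalg.norm(centroids[p] - centroids[q])
                for p in range(K) for q in range(K) if p != q)
    return w1 * d_max + w2 * (r_max - d_min) + w3 * j_e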

The PSO clustering algorithm is summarized below.

Step 1: Initialize each particle with K random cluster centers.
Step 2: For iteration_count = 1 to maximum_iteration:
  (a) For each particle i:
      (i) For each pattern \vec{X}_p in the dataset, calculate the Euclidean distance of \vec{X}_p to all cluster centroids and assign \vec{X}_p to the cluster with the nearest centroid.
      (ii) Calculate the fitness function f(\vec{Z}_i, M_i).
  (b) Find the personal best and global best positions of each particle.
  (c) Update the cluster centroids according to the velocity-update and position-update formulas of PSO.
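The loop above can be sketched as follows. This illustrative gbest PSO uses the standard velocity and position updates with inertia weight w and acceleration coefficients c1 and c2, and, for brevity, the quantization error alone as the fitness; the full fitness of eq. (4.2) could be substituted. The default parameter values are illustrative only:

import numpy as np

def quantization_error(X, centroids):
    """Eq. (4.1): mean of the per-cluster average distances to the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    errs = [d[labels == k, k].mean() for k in range(len(centroids)) if np.any(labels == k)]
    return float(np.mean(errs))

def pso_clustering(X, K, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """gbest PSO clustering: each particle encodes K centroids, flattened to one vector."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    pos = X[rng.choice(n, size=(n_particles, K))].reshape(n_particles, K * dim)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([quantization_error(X, p.reshape(K, dim)) for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(K * dim), rng.random(K * dim)
            # standard velocity and position updates
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] += vel[i]
            fit = quantization_error(X, pos[i].reshape(K, dim))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), fit
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest.reshape(K, dim)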

Van der Merwe and Engelbrecht [31] hybridized this approach with the k-means algorithm for clustering general datasets: a single particle of the swarm is initialized with the result of the k-means algorithm, while the rest of the swarm is initialized randomly. In 2003, Xiao et al. used a new approach based on the synergy of PSO and Self Organizing Maps (SOM) for clustering gene expression data, obtaining promising results by applying the hybrid SOM-PSO algorithm to gene expression data of Yeast and Rat Hepatocytes. Paterlini and Krink compared the performance of k-means, GA, PSO and Differential Evolution (DE) for a representative-point evaluation approach to partitional clustering; their results show that PSO and DE outperform the k-means algorithm. In [95], the authors proposed a PSO based hybrid algorithm for classifying text documents. They applied PSO, k-means and a hybrid PSO clustering algorithm to four different text document datasets; the results illustrate that the hybrid PSO algorithm can generate more compact clustering results in a shorter span of time than the k-means algorithm.

In this work we have simulated a few datasets to reinforce the above findings. Comparisons are made with the classical k-means algorithm. We have chosen the basic gbest PSO and some lbest PSO models for simulation; for lbest PSO we have chosen the lbest_ring and Von Neumann lbest models. We have also proposed a hybrid PSO model with k-means: first k-means is run, and the resulting best centroid set is taken as one of the particles in the swarm with which PSO continues its simulation.

4.5.1 Dataset Description

For this work we have investigated four datasets:

Iris plants database: a well-understood database with 4 inputs, 3 classes and 150 data vectors.
Wine: a classification problem with well-behaved class structures; 13 inputs, 3 classes and 178 data vectors.
Hayes-Roth: 132 data vectors with 3 classes and 5 inputs.
Diabetes: 768 data vectors with 2 classes and 8 inputs.
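As an illustration of how such an experiment could be set up, the following assumes scikit-learn's bundled copy of the Iris data (the thesis uses the UCI originals) and reuses the kmeans, pso_clustering and quantization_error sketches given earlier in this chapter:

from sklearn.datasets import load_iris

X = load_iris().data                              # 150 vectors, 4 features, 3 known classes
centroids_km, labels_km = kmeans(X, k=3)          # k-means sketch from Section 4.3
centroids_pso = pso_clustering(X, K=3)            # gbest PSO sketch from Section 4.5
print("k-means J_e:", quantization_error(X, centroids_km))
print("PSO     J_e:", quantization_error(X, centroids_pso))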

4.5.2 Results of Simulations

This section compares the results of the k-means and PSO algorithms on the clustering problems described above. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to the following two criteria:

the quantization error;
the inter-cluster distances, i.e. the distances between the centroids of the clusters, where the objective is to maximize the distance between clusters.

For all the results reported, averages over 20 simulations are given. The PSO algorithms used 20 particles. The Hybrid PSO takes its seed from the result of k-means clustering; this seed is used as one particle in the PSO swarm. For PSO, the inertia weight w varies as per [27]: the initial weight is fixed at 0.9 and the final weight at 0.4. The acceleration coefficients α1 and α2 are fixed at 1.042 to ensure good convergence.
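The decreasing inertia weight can be computed per iteration as in the sketch below; the linear schedule from 0.9 down to 0.4 is the common choice and is assumed here to be the form intended by [27]:

def inertia_weight(t, max_iter, w_start=0.9, w_end=0.4):
    """Linearly decrease the inertia weight from w_start to w_end over max_iter iterations."""
    return w_start - (w_start - w_end) * t / max_iter

# e.g. inertia_weight(0, 100) -> 0.9, inertia_weight(100, 100) -> 0.4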

Table 4.1 summarizes the results obtained from the five clustering algorithms for the problems cited above. The values reported are averages over 20 simulations, with standard deviations to indicate the range of values to which the algorithms converge. Consider first the fitness of the solutions, i.e. the quantization error: for all data sets, PSO based clustering is better than k-means. The lbest_vonneumann model provides better fitness values, in terms of both quantization error and inter-cluster distance, for all data sets except Wine. For Wine and Hayes-Roth, Hybrid PSO gives good results; for these data sets lbest_vonneumann gives the worst quantization error but a comparatively good inter-cluster distance measure. The standard deviations (std) are found to be very close, indicating that the algorithms converge to consistent results.

The results in Table 4.1 present a strong case for PSO based clustering, which is why many researchers prefer it. However, to our knowledge no PSO based clustering tool is available for researchers and academicians to use and investigate. In our work we have therefore developed a comprehensive MATLAB toolbox for clustering, aimed at beginners in clustering research. The next sub-section presents a brief overview of the toolbox.

Dataset      Algorithm          Quantization error (std)    Inter-cluster distance (std)
Iris         k-means            0.834 (0.21075)             9.3238 (1.6821)
             PSO_gbest          0.026209 (0.017964)         16.41 (2.4693)
             lbest_ring         0.026615 (0.014664)         21.079 (3.8171)
             lbest_vonneumann   0.011477 (0.019458)         20.278 (0.2204)
             Hybrid PSO         0.69743 (0.019081)          18.598 (0.65266)
Hayes Roth   k-means            11.961 (1.573)              8.9353 (1.2419)
             PSO_gbest          0.77086 (0.0408)            324.25 (5.7895)
             lbest_ring         3.99 (3.3429)               313.1 (3.4562)
             lbest_vonneumann   3.8265 (0.98856)            350.73 (23.272)
             Hybrid PSO         0.55914 (1.6488)            325.51 (66.738)
Wine         k-means            116.29 (0.83715)            2019.2 (0.234)
             PSO_gbest          10.765 (3.7278)             3272.8 (292.89)
             lbest_ring         33.622 (7.6328)             2859.7 (339.91)
             lbest_vonneumann   11.709 (1.6749)             3450.8 (222.42)
             Hybrid PSO         4.9763 (3.3043)             3598.7 (483.11)
Diabetes     k-means            78.984 (7.6654)             20.92 (3.332)
             PSO_gbest          69.222 (2.4839)             30.12 (2.719)
             lbest_ring         36.98 (2.397)               35.108 (2.475)
             lbest_vonneumann   30.205 (2.6501)             38.1074 (2.4714)
             Hybrid PSO         48.545 (3.097)              32.958 (3.471)

Table 4.1 Comparison of clustering results (mean and standard deviation over 20 runs)

4.5.3 PSO Clustering Tool Box

Both in academia and in industry, the scope of research in the area of PSO is expanding rapidly. It is therefore worthwhile to provide good quality learning material for beginners. This work was done using MATLAB (Matrix Laboratory), a computational environment for modeling, analyzing and simulating a wide variety of intelligent systems. It also gives students access to numerous design and analysis tools in the Fuzzy Systems, Neural Networks and Optimization toolboxes.

In this work, a software tutorial for PSO has been developed to aid the teaching of PSO concepts and their application to data clustering. The software offers facilities to simulate the classical k-means clustering algorithm and PSO clustering, and overall performance can be improved by suitable hybridization of k-means and PSO. The software provides scope for experimenting with various possibilities of hybridization, and the learner can choose different PSO tuning parameters along with suitable particle sizes and iteration counts to obtain better clustering performance. The software is GUI based and supported by various plots and graphs that reinforce the derived results; its ease of operation is its hallmark.

In this work we tried hybridization in two ways. The first is the k-means + PSO technique, in which the k-means clustering algorithm is executed once; the result of this algorithm is used to seed the initial swarm, while the rest of the

swarm is initialized randomly, and the PSO algorithm is then executed as discussed earlier in this thesis. The second is the PSO + k-means technique: the PSO algorithm is executed once, the resulting gbest is used as one of the centroids for k-means while the rest of the centroids are initialized randomly, and k-means is then executed. Our software offers facilities for exploring these possibilities, with various options for choosing parameters and numbers of iterations. A sketch of both seeding schemes is given below.
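The two seeding schemes can be sketched as follows, reusing the kmeans and pso_clustering sketches from earlier sections; this illustrates the seeding idea only and is not the MATLAB toolbox implementation:

import numpy as np

def kmeans_then_pso(X, K, n_particles=20, seed=0):
    """k-means + PSO: seed one particle with the k-means centroids, the rest randomly."""
    rng = np.random.default_rng(seed)
    km_centroids, _ = kmeans(X, K, seed=seed)
    swarm = X[rng.choice(len(X), size=(n_particles, K))].reshape(n_particles, -1)
    swarm[0] = km_centroids.reshape(-1)       # first particle carries the k-means result
    # ... continue with the PSO iterations of Section 4.5 on this pre-seeded swarm ...
    return swarm

def pso_then_kmeans(X, K, seed=0):
    """PSO + k-means: run PSO once, use its gbest centroids to initialize k-means."""
    centroids = pso_clustering(X, K, seed=seed).copy()
    for _ in range(100):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k] for k in range(K)])
    return centroids, labels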

As the name suggests, this software tutorial is for learning how PSO can be applied in the area of clustering. The main idea is to involve the user in setting the swarm size and the other parameters of the PSO clustering algorithm. For this application, three data sets have been taken, namely Iris, Wine and breast cancer (collected from the UCI machine learning repository). As the data sets considered here are pre-classified, the number of clusters is taken to be the same as the number of classes. The graphical user interface for the k-means clustering algorithm is shown in Fig 4.5.

Figure 4.5 Results of k-means clustering on three datasets

Here the result of clustering is reported in terms of intra-cluster and inter-cluster similarities as well as the quantization error, which are tabulated in Table 4.2. A confusion matrix is also given, so that an accuracy comparison can be made between the expected and actual clusters. The time taken by the algorithm to cluster the data is also reported. The corresponding screenshot for the Iris, Wine and breast cancer datasets is given in Fig 4.6.

Measures/datasets        Iris       Wine      Cancer
Intra cluster distance   1.94212    293.617   671.53
Inter cluster distance   10.167     1474.22   1331.33
Quantization error       0.647374   97.0723   335.765
Time in sec              24.305     1.7562    6.75965

Table 4.2 Results of k-means Clustering

Figure 4.6 Results of PSO based Clustering

In Fig 4.6, scope is given for the user to specify all the PSO parameters, such as swarm size, inertia weight and acceleration coefficients. The results of clustering are presented in the same way as for k-means clustering and are tabulated in Table 4.3. As a sample, results are computed with swarm size 3, inertia weight 0.72 and c1 = c2 = 1; the user can, however, experiment with any values to see how the PSO clustering algorithm performs.

Swarm size = 3, inertia weight = 0.72, c1 = 1 and c2 = 1

Measures/datasets        Iris       Wine      Cancer
Intra cluster distance   0.648096   145.849   222.833
Inter cluster distance   3.37355    749.14    432.382
Quantization error       0.216032   48.6163   111.416
Time in sec              19.9997    12.9741   76.1937

Table 4.3 Results of gbest PSO Clustering

Swarm size = 3, inertia weight = 0.72, c1 = 1 and c2 = 1

Measures/datasets        Iris       Wine      Cancer
Intra cluster distance   0.986767   148.68    334.202
Inter cluster distance   4.95916    811.311   640.836
Quantization error       0.328922   49.5601   167.101
Time in sec              12.6541    14.3183   43.847

Table 4.4 Results of the k-means + PSO Clustering algorithm

Swarm size = 3, inertia weight = 0.72, c1 = 1 and c2 = 1

Measures/datasets        Iris       Wine      Cancer
Intra cluster distance   0.621062   142.808   220.765
Inter cluster distance   5.08348    737.112   665.667
Quantization error       0.223687   47.9361   111.882
Time in sec              8.75372    10.8275   38.1585

Table 4.5 Results of the PSO + k-means Clustering algorithm

This tool gives very good scope to students who want to experiment with the PSO clustering algorithm by taking different sets of parameters.

4.6 Comparative Analysis with Experimental Results

This software tutorial gives a comparative study of all four clustering algorithms: k-means, PSO, k-means + PSO and PSO + k-means. According to the experimental results shown in Table 4.4 and Table 4.5, the PSO + k-means hybrid achieves better accuracy than the other three algorithms. The fitness curves are given in Fig 4.7.

Figure 4.7 Fitness Curves

In Fig 4.7, the intra-cluster distance, inter-cluster distance, quantization error and time are marked in blue, red, green and black respectively. The graph shows how the intra-cluster distance, inter-cluster distance and quantization error fitness curves vary across the four clustering algorithms: k-means, PSO, k-means + PSO and PSO + k-means.

4.7 Clustering using DE

Data clustering is the process of grouping a set of data vectors into a number of clusters or bins such that data vectors within the same cluster are similar to one another and dissimilar to the elements of other clusters.

Clustering algorithms can be grouped into two main classes, namely supervised and unsupervised. In supervised clustering, the learning algorithm has an external teacher that indicates the target class to which each data vector should belong. In unsupervised clustering no teacher exists, and data vectors are grouped based on their distance from one another. Data clustering can be hierarchical or partitional; in this work we confine our discussion to partitional clustering. Partitional clustering algorithms attempt to decompose the dataset into a set of disjoint clusters by optimizing a criterion that minimizes the intra-cluster distance and maximizes the inter-cluster distance. The overall objective is to decompose the given dataset into a predefined number of clusters that are compact (the data objects belonging to a cluster lie close together) and well separated from the other clusters. Hence clustering can also be treated as an optimization problem, and evolutionary techniques can be applied to obtain optimal solutions. In this work we have implemented standard DE for clustering problems.

The k-means algorithm falls under the partitional clustering techniques; it was introduced by MacQueen in 1967. The demerit of k-means lies in the fact that its performance depends largely on the initial centroid values chosen: at times, due to improper initial centroids, k-means gets trapped in local optima and provides poor clustering results. A considerable amount of research has been done by Data Mining researchers to alleviate this problem. Due to

space limitations, details of those studies are omitted here. It is worth noting, however, that many of the suggested approaches to this problem make use of evolutionary algorithms (EAs). EAs are particularly well suited to this kind of problem because of their population-based approach: instead of a single initialization of centroids, a set of candidate centroid configurations is initialized at the beginning of the simulation. As a result, the candidate solutions become fitter and fitter through the EA's process, offering an optimal solution at the end. In our work we have used DE as the EA tool to attack this weakness of k-means. We have performed function optimization with DE and compared it with PSO to investigate the superiority of DE; finally, we have implemented DE for clustering and compared it with PSO clustering. All comparisons are presented in the following sections, but first, for the sake of continuity, we present a brief overview of DE and its mutation variants.

4.7.1 Basics of DE and its Mutation Variants

Differential Evolution (DE) is a parallel direct search method developed by Storn and Price in 1997. It is a population-based global optimization algorithm that uses a real-coded representation [54]. This approach to numerical optimization is simple to implement and requires little or no parameter tuning, yet gives remarkable performance. Like all other evolutionary algorithms, the initial population is chosen randomly.

4.7.1.1 Classical DE

Like all other evolutionary algorithms, the DE method consists of three basic steps:

i. Generation of a population of N individuals in the d-dimensional search space, randomly distributed over the entire search domain:
   \vec{X}_i(t) = \big( x_{i,1}(t), x_{i,2}(t), x_{i,3}(t), \ldots, x_{i,d}(t) \big), where t = 0, 1, 2, \ldots
ii. Replacement of the current population by a better, fitter new population.
iii. Repetition of this replacement until the termination criterion is met or satisfactory results are obtained.

Initialization → Mutation → Recombination → Selection

Figure 4.8 Basic Scheme of an Evolutionary Algorithm

The basic steps of this scheme are as follows.

A. Mutation

After the random generation of the population, in each generation a donor vector \vec{V}_i(t) is created for each \vec{X}_i(t). This donor vector can be created in different ways, as described under the DE mutation schemes (Section 4.7.1.2).

B. Recombination

A trial (offspring) vector is created by combining components from the donor vector \vec{V}_i(t) and the target vector \vec{X}_i(t). This can be done in the following way:

U_{i,j}(t) = \begin{cases} V_{i,j}(t) & \text{if } rand_{i,j}(0,1) \le CR \\ X_{i,j}(t) & \text{otherwise} \end{cases}    (4.5)

where CR is the crossover rate.

C. Selection

Selection in DE adopts the Darwinian principle of "Survival of the Fittest". If the trial vector yields a better fitness value, it replaces its target in the next generation; otherwise the target vector is retained in the population. Hence the population either improves (with respect to the fitness function) or remains the same, but never deteriorates:

\vec{X}_i(t+1) = \begin{cases} \vec{U}_i(t) & \text{if } f(\vec{U}_i(t)) \le f(\vec{X}_i(t)) \\ \vec{X}_i(t) & \text{if } f(\vec{X}_i(t)) < f(\vec{U}_i(t)) \end{cases}    (4.6)
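A compact sketch of the binomial crossover of eq. (4.5) and the greedy selection of eq. (4.6), assuming a minimization problem; forcing at least one donor component through is standard DE practice, though it is not shown in eq. (4.5):

import numpy as np

def crossover(target, donor, cr, rng):
    """Binomial crossover, eq. (4.5): take donor components with probability CR."""
    mask = rng.random(len(target)) <= cr
    mask[rng.integers(len(target))] = True   # standard DE convention, not shown in eq. (4.5)
    return np.where(mask, donor, target)

def select(target, trial, f):
    """Greedy selection, eq. (4.6): keep whichever of target/trial has the better fitness."""
    return trial if f(trial) <= f(target) else target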

4.7.1.2 DE Mutation Schemes

The five different mutation schemes suggested by Price [54] are as follows:

1. Scheme 1: DE/rand/1
To create a donor vector \vec{V}_i(t) for the i-th member, three other parameter vectors (say the o_1-th, o_2-th and o_3-th vectors) are chosen randomly from the current population. A scalar number F scales the difference of two of the three vectors, and the result is added to the third one. For the i-th donor vector this process is

\vec{V}_i(t+1) = \vec{X}_{o_1}(t) + F \big( \vec{X}_{o_2}(t) - \vec{X}_{o_3}(t) \big)    (4.7)

2. Scheme 2: DE/rand-to-best/1
This scheme follows the same procedure as Scheme 1, but the donor vector is generated from two randomly selected members of the population (say \vec{X}_{o_2}(t) and \vec{X}_{o_3}(t)) and the best vector of the current generation, \vec{X}_{best}(t). For the i-th donor vector, at time t+1,

\vec{V}_i(t+1) = \vec{X}_i(t) + \lambda \big( \vec{X}_{best}(t) - \vec{X}_i(t) \big) + F \big( \vec{X}_{o_2}(t) - \vec{X}_{o_3}(t) \big)    (4.8)

where \lambda is a control parameter of DE in the range [0, 2]. To reduce the number of parameters, we take \lambda = F.

3. Scheme 3: DE/best/1
This scheme is identical to Scheme 1 except that the scaled difference is added to the best vector of the current population:

\vec{V}_i(t+1) = \vec{X}_{best}(t) + F \big( \vec{X}_{o_1}(t) - \vec{X}_{o_2}(t) \big)    (4.9)

4. Scheme 4: DE/best/2
In this scheme the donor vector is formed using two difference vectors, as shown below:

\vec{V}_i(t+1) = \vec{X}_{best}(t) + F \big( \vec{X}_{o_1}(t) - \vec{X}_{o_2}(t) \big) + F \big( \vec{X}_{o_3}(t) - \vec{X}_{o_4}(t) \big)    (4.10)

5. Scheme 5: DE/rand/2
Here five different vectors are selected randomly from the population in order to generate the donor vector:

\vec{V}_i(t+1) = \vec{X}_{o_1}(t) + F_1 \big( \vec{X}_{o_2}(t) - \vec{X}_{o_3}(t) \big) + F_2 \big( \vec{X}_{o_4}(t) - \vec{X}_{o_5}(t) \big)    (4.11)

Here F_1 and F_2 are two weighting factors selected in the range 0 to 1. To reduce the number of parameters we may choose F_1 = F_2 = F. The experiments conducted in this study use Scheme 1, DE/rand/1 (equation 4.7).

4.7.1.3 Procedure for DE

i. Randomly initialize the positions of the vectors.
ii. Evaluate the fitness of each vector.
iii. For each vector, create a difference offspring (donor and trial vector).
iv. Evaluate the fitness of the difference offspring.
v. If an offspring is better than its parent, replace the parent by the offspring in the next generation.
vi. Loop to step ii until the termination criterion is met, usually a sufficiently good fitness or a maximum number of iterations.
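A minimal DE/rand/1 sketch following this procedure, with binomial crossover and greedy selection; F = 0.8 and CR = 0.9 match the settings used later in this chapter, and the sphere function is only a placeholder objective:

import numpy as np

def de_rand_1_bin(f, bounds, pop_size=30, F=0.8, CR=0.9, max_gen=200, seed=0):
    """Classical DE (Scheme 1, DE/rand/1, binomial crossover, greedy selection)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0]), np.asarray(bounds[1])
    d = len(lo)
    pop = lo + rng.random((pop_size, d)) * (hi - lo)        # random initial population
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(pop_size):
            # mutation, eq. (4.7): three distinct random vectors, none equal to i
            o1, o2, o3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            donor = pop[o1] + F * (pop[o2] - pop[o3])
            # binomial crossover, eq. (4.5)
            mask = rng.random(d) <= CR
            mask[rng.integers(d)] = True
            trial = np.where(mask, donor, pop[i])
            # greedy selection, eq. (4.6)
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = fit.argmin()
    return pop[best], fit[best]

# Example: minimize the sphere function in 10 dimensions (placeholder objective)
if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x * x))
    x_best, f_best = de_rand_1_bin(sphere, bounds=(-5.0 * np.ones(10), 5.0 * np.ones(10)))
    print(f_best)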

4.7.2 Basic Framework for DE Clustering

Data vectors can be clustered using classical DE as follows; a code sketch of this framework is given after the outline.

i. Initialize each vector of the population to contain K randomly selected cluster centers.
ii. For I = 1 to I_max do:
   a) For each vector i of the population:
   b) For each object Z_p in the data set:
      i. Calculate the Euclidean distance d(Z_p, V_{i,j}) to all cluster centroids of vector i.
      ii. Assign Z_p to the cluster C_{i,j} such that d(Z_p, V_{i,j}) = min_{k=1..K} d(Z_p, V_{i,k}).
   c) Change the population members according to the DE algorithm outlined in Section 3.8, using the vectors' fitness to guide the evolution of the population.
iii. Report the cluster centers and the partition obtained by the globally best vector at time I = I_max.
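A sketch of this framework, with each population member encoding K centroids and the quantization error serving as the illustrative fitness; the population size of 5 and 50 iterations match the experimental settings reported below:

import numpy as np

def de_clustering(X, K, pop_size=5, F=0.8, CR=0.9, max_iter=50, seed=0):
    """Cluster X with classical DE: each population member encodes K centroids."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape

    def fitness(vec):
        cents = vec.reshape(K, dim)
        d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
        labels = d.argmin(axis=1)                       # step (b): nearest-centroid assignment
        errs = [d[labels == k, k].mean() for k in range(K) if np.any(labels == k)]
        return float(np.mean(errs))                     # quantization error of this partition

    # step (i): each vector holds K randomly chosen data points as initial centroids
    pop = X[rng.choice(n, size=(pop_size, K))].reshape(pop_size, K * dim)
    fit = np.array([fitness(v) for v in pop])
    for _ in range(max_iter):                           # step (ii)
        for i in range(pop_size):
            o1, o2, o3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            donor = pop[o1] + F * (pop[o2] - pop[o3])   # DE/rand/1 mutation
            mask = rng.random(K * dim) <= CR
            mask[rng.integers(K * dim)] = True
            trial = np.where(mask, donor, pop[i])       # binomial crossover
            ft = fitness(trial)
            if ft <= fit[i]:                            # greedy selection
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()].reshape(K, dim)            # step (iii): globally best vector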

4.7.2.1 Experimental Results

Three well known datasets from the UCI machine learning repository have been considered for this experiment:

1. Fisher's iris dataset (n = 150, p = 4, c = 3)
2. Wisconsin breast cancer dataset (n = 683, p = 9, c = 2)
3. Wine recognition dataset (n = 178, p = 13, c = 3)

For PSO, the inertia weight is w = 0.7 and c1 = c2 = 2; for DE, the crossover rate is CR = 0.9 and the weighting factor is F = 0.8. These algorithms were applied to cluster the three datasets described above. A fitness comparison of PSO and DE is reported in Table 4.7. Fig 4.9 and Fig 4.10 show three-dimensional plots of the observations for the PSO and DE algorithms respectively on the Iris dataset.

Number of iterations = 50, number of vectors/particles = 5

Dataset          Clusters   Mean intra-cluster distance (PSO)   Mean intra-cluster distance (DE)
Iris             3          46.9929                             38.047
Wine             3          5600.2                              5846
Breast Cancer    2          75444                               75285

Table 4.7 Mean Intra-cluster Distances of PSO and DE

Figure 4.9 (a-d) PSO generated three-dimensional clusters of the IRIS dataset

Figure 4.10 (a-d) DE generated three-dimensional clusters of the IRIS dataset

4.8 Summary

From the discussion of PSO and DE based clustering, it is evident that both are very good candidates for clustering. The results presented above clearly suggest that they are able to overcome the local-optima problem of k-means. In this chapter, various PSO models for clustering have been discussed and compared with DE based clustering. These findings underline the need to develop a new variant of DE and to explore various methods of hybridization.