Knowledge Discovery using PSO and DE Techniques


CHAPTER 4 KNOWLEDGE DISCOVERY USING PSO AND DE TECHNIQUES

4.1 Introduction

In the recent past, there has been an enormous increase in the amount of data stored in electronic format. It has been estimated that the amount of collected data in the world doubles every 18 months, and the size and number of databases are increasing even faster. The ability to collect data rapidly has outpaced the ability to analyze it: we are becoming data rich but information poor. Information is crucial for decision making, especially in business operations. In response to these trends, the term Data Mining (or Knowledge Discovery) has been coined to describe a variety of techniques for identifying nuggets of information or decision-making knowledge in bodies of data, and for extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. Automated tools must be developed to help extract meaningful information from this flood of data. Moreover, these tools must be sophisticated enough to search for correlations among the data that the user has not specified, as the potential for unforeseen relationships to exist among the data is very high. A successful tool set will locate useful nuggets of information in the otherwise chaotic data space and present them to the user in a contextual format.

There is an urgent need for a new generation of techniques for automating Data Mining and Knowledge Discovery in Databases (KDD). KDD is a broad area that integrates methods from several fields, including statistics, databases, AI, machine learning, pattern recognition, machine discovery, uncertainty modeling, data visualization, high performance computing, optimization, management information systems (MIS), and knowledge-based systems. The term Knowledge Discovery in Databases is defined as the process of identifying useful and novel structure (models) in data [89-90]. It can be viewed as a multi-stage process. These stages are summarized as follows:

Data gathering: e.g., databases, data warehouses, Web crawling.
Data cleaning: eliminating errors, e.g., an impossible value such as GPA = 7.3.
Feature extraction: obtaining only the interesting attributes of the data.
Data mining: discovering and extracting meaningful patterns.
Visualization of the data.
Verification and evaluation of results: drawing conclusions.

Data mining is considered the main step in the knowledge discovery process and is concerned with the algorithms used to extract potentially valuable patterns, associations, trends, sequences and dependencies in data [87-90]. Key business examples include web site access analysis for improvements in

e-commerce advertising, fraud detection, screening and investigation, retail site or product analysis, and customer segmentation. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. Additionally, the application of Data Mining techniques further exploits the value of a data warehouse by converting expensive volumes of data into valuable assets for future tactical and strategic business development. Management information systems should provide advanced capabilities that give users the power to ask more sophisticated and pertinent questions; this empowers the right people by providing the specific information they need. Data mining techniques can be categorized either by task or by method. Based on tasks, they include Association Rule Discovery, Classification, Clustering, Sequential Pattern Discovery, Regression, Deviation Detection, etc. Most researchers refer to the first three as the main Data Mining tasks; the rest are either related more to other fields, such as regression, or are considered sub-fields of one of the main tasks, such as sequential pattern discovery and deviation detection. Based on methods, we have the following:

Algorithms for mining spatial, textual, image and other complex data.
Incremental discovery methods and re-use of discovered knowledge.
Integration of discovery methods.

Data structures and query evaluation methods for Data Mining.
Parallel and distributed Data Mining techniques.
Issues and challenges in dealing with massive or small data sets.
Fundamental issues from statistics, databases, optimization, and information processing in general as they relate to the problem of extracting patterns and models from data.

In our work we have limited ourselves mainly to the clustering task from the first category and to the development of algorithms for mining complex data and image data from the second category. In this chapter we present unsupervised pattern classification (clustering) studies for numerical and image datasets. The first part deals with the application of PSO to clustering; in this part the classical k-means algorithm is discussed and simulated on several sample datasets. Next we use DE and a new adaptable DE (ADE) for clustering. Later, a suitable hybridization of PSO with ADE is also modeled and tested on clustering problems.

4.2 Approaches to Clustering

Clustering algorithms are divided into two broad classes:

Centroid approaches: the centroid or central point of each cluster is estimated, and points are assigned to the cluster of their nearest centroid.
Hierarchical approaches: starting with each point as a cluster by itself, nearby clusters are repeatedly merged.

Clustering algorithms can also be classified according to whether or not they assume a Euclidean distance, and whether they use a centroid or a hierarchical approach. The following three algorithms exemplify this classification:

BFR: centroid-based, assumes a Euclidean measure, with clusters formed by a Gaussian process in each dimension around the centroid.
GRGPF: centroid-based, but uses only a distance measure, not a Euclidean space.
CURE: hierarchical and Euclidean; this algorithm deals with odd-shaped clusters.

The following subsections describe a few popular clustering algorithms.

4.3 Centroid Clustering

The k-means algorithm is a popular main-memory algorithm. k cluster centroids are picked, and points are assigned to clusters by picking the centroid closest to the point in question. As points are assigned to clusters, the centroid of a cluster may migrate. The BFR algorithm [94] is based on k-means; it reads its data once, consuming one main-memory-full at a time. The algorithm works best if the clusters are normally distributed around a central point, perhaps with a different standard deviation in each dimension.
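To make the centroid approach concrete, the following is a minimal k-means sketch in Python/NumPy; the function name, the tolerance and the random initialization strategy are our own illustrative choices, not part of the BFR algorithm described above.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, rng=None):
    """Minimal k-means: assign points to the nearest centroid, then re-estimate centroids."""
    rng = np.random.default_rng(rng)
    # Pick k distinct random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Distance of every point to every centroid, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Re-estimate each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:  # centroids stopped migrating
            break
        centroids = new_centroids
    return centroids, labels
```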

Fastmap [5] picks k pairs of points (a_i, b_i), each of which serves as the ends of one of the k axes of a k-dimensional space. Using the law of cosines, the projection x of any point c onto the line ab can be calculated using only the distances between points, not any assumed coordinates of these points in a plane.

4.4 Hierarchical Clustering

Hierarchical clustering [5][94] is a general technique that can take, in the worst case, O(n^2) time to cluster n points. The general outline of this approach is as follows:

Start with each point in a cluster by itself.
Repeatedly select two clusters to merge. In general, we want to pick the two clusters that are closest, but there are various ways to measure closeness. Some possibilities:
  o Distance between their centroids (or, if the space is not Euclidean, between their clustroids).
  o Minimum distance between nodes in the clusters.
  o Maximum distance between nodes in the clusters.
  o Average distance between nodes of the clusters.
End the merger process when we have few enough clusters. Some possibilities:
  o Use a k-means approach: merge until only k clusters remain.

  o Stop merging when the only clusters that could result from further merging fail to meet some criterion of compactness, e.g., the average distance of nodes to their clustroid or centroid is too high.

(A code sketch of this merge loop is given at the end of this section.) The GRGPF algorithm [5][94] assumes there is a distance measure D but no Euclidean space. It also assumes that there is too much data to fit in main memory. It stores clusters in an R-tree-like data structure; nodes of the tree are disk blocks, and different information is stored at leaf and interior nodes. CURE (Clustering Using Representatives) [94] is an efficient algorithm for large databases and is more robust to outliers. It starts with a main memory full of random points and clusters them using the hierarchical approach.

There are many approaches in the Data Mining literature to overcoming some of the above problems. In the recent past, many researchers have shown keen interest in using evolutionary computation techniques to address some of these difficulties. In our work we contribute a few such approaches using PSO and DE to overcome some disadvantages of k-means. Our main focus has been to produce clusterings that do not get trapped in local optima and do not depend on the initial random centroids. In the next two sections of this chapter we highlight PSO- and DE-based clustering on several datasets, including an image dataset.
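Returning to the general hierarchical outline at the start of this section, a naive agglomerative sketch is given below (Python/NumPy, our own illustrative code). It merges until k clusters remain, with minimum- or maximum-distance linkage; it is not the GRGPF or CURE algorithm, both of which add disk-based structures and sampling.

```python
import numpy as np

def agglomerative(X, k, linkage="min"):
    """Naive agglomerative clustering: start with singleton clusters and
    repeatedly merge the closest pair under the chosen linkage."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances

    def dist(a, b):
        block = D[np.ix_(a, b)]  # distances between the two clusters' members
        return block.min() if linkage == "min" else block.max()

    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # merge j into i (j > i, so indices stay valid)
    return clusters
```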

4.5 Clustering using PSO

Research efforts have made it possible to view data clustering as an optimization problem. This view offers a chance to apply the PSO algorithm to evolve a set of candidate cluster centroids and thus determine a near-optimal partitioning of the dataset at hand. An important advantage of PSO is its ability to cope with local optima by maintaining, recombining and comparing several candidate solutions simultaneously. In contrast, local search heuristics such as simulated annealing refine only a single candidate solution and are notoriously weak at coping with local optima. Deterministic local search, as used in algorithms like k-means, always converges to the local optimum nearest the starting position of the search.

A PSO-based clustering algorithm was first introduced by Omran et al. [36]. Their results showed that the PSO-based method outperformed k-means, FCM and a few other state-of-the-art clustering algorithms. In their method, Omran et al. used a fitness measure based on the quantization error for judging the performance of a clustering algorithm. The quantization error is defined as:

J_e = \frac{1}{K} \sum_{j=1}^{K} \Big[ \sum_{X_p \in C_j} d(X_p, V_j) / n_j \Big]   (4.1)

where V_j is the centroid of the j-th cluster C_j and n_j is the number of data points belonging to that cluster. Each particle in the PSO algorithm represents a possible set of K cluster centroids:

Z_i(t) = (V_{i,1}, V_{i,2}, \ldots, V_{i,K})

where V_{i,p} refers to the p-th cluster centroid vector of the i-th particle. The quality of each particle is measured by the following fitness function:

f(Z_i, M_i) = w_1 \, d_{max}(M_i, X_i) + w_2 \, (R_{max} - d_{min}(Z_i)) + w_3 \, J_e   (4.2)

In the above expression, R_{max} is the maximum feature value in the dataset and M_i is the matrix representing the assignment of patterns to the clusters of the i-th particle: each element m_{i,k,p} indicates whether pattern X_p belongs to cluster C_k of the i-th particle. The user-defined constants w_1, w_2 and w_3 weigh the contributions of the different sub-objectives. In addition,

d_{max} = \max_{k \in \{1,2,\ldots,K\}} \Big\{ \sum_{X_p \in C_{i,k}} d(X_p, V_{i,k}) / n_{i,k} \Big\}   (4.3)

is the maximum average intra-cluster distance, and

d_{min}(Z_i) = \min_{p \neq q} \{ d(V_{i,p}, V_{i,q}) \}   (4.4)

is the minimum Euclidean distance between any pair of cluster centroids. Here n_{i,k} is the number of patterns that belong to cluster C_{i,k} of particle i. The fitness function defines a multi-objective optimization problem: it minimizes the intra-cluster distance, maximizes inter-cluster separation, and reduces the quantization error.
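As a concrete reading of equation (4.1), the quantization error can be computed as below; this is a sketch of our own, assuming Euclidean distance and the rows of X as data vectors.

```python
import numpy as np

def quantization_error(X, centroids):
    """Quantization error J_e of eq. (4.1): the mean, over clusters, of the
    average point-to-centroid distance within each cluster."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)          # each point joins its nearest centroid
    K = len(centroids)
    per_cluster = []
    for j in range(K):
        members = d[labels == j, j]    # distances of cluster j's members to V_j
        per_cluster.append(members.mean() if len(members) else 0.0)
    return sum(per_cluster) / K
```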

The PSO clustering algorithm is summarized below.

Step 1: Initialize each particle with K random cluster centers.
Step 2: Repeat for iteration_count = 1 to maximum_iteration:
  (a) Repeat for each particle i:
    (i) Repeat for each pattern X_p in the dataset: calculate the Euclidean distance of X_p to all cluster centroids, and assign X_p to the cluster with the nearest centroid.
    (ii) Calculate the fitness function f(Z_i, M_i).
  (b) Find the personal best and global best positions of each particle.
  (c) Update the cluster centroids according to the velocity- and coordinate-updating formulas of PSO.
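A minimal sketch of this gbest PSO clustering loop follows. For brevity, the quantization error alone serves as the fitness rather than the full multi-objective function of equation (4.2), and the parameter defaults are common choices of our own, not values prescribed by the algorithm.

```python
import numpy as np

def pso_cluster(X, K, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, rng=None):
    """gbest PSO over flattened sets of K centroids; fitness = quantization error."""
    rng = np.random.default_rng(rng)
    n, dim = X.shape
    # Each particle encodes K centroids, flattened into a vector of length K*dim.
    pos = X[rng.choice(n, size=(n_particles, K))].reshape(n_particles, K * dim)
    vel = np.zeros_like(pos)

    def fitness(p):
        c = p.reshape(K, dim)
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        return np.mean([d[labels == j, j].mean() if np.any(labels == j) else 0.0
                        for j in range(K)])

    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity and coordinate updates toward pbest and gbest.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        if pbest_f.min() < fitness(gbest):
            gbest = pbest[pbest_f.argmin()].copy()
    return gbest.reshape(K, dim)
```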

Van der Merwe and Engelbrecht [31] hybridized this approach with the k-means algorithm for clustering general datasets: a single particle of the swarm is initialized with the result of the k-means algorithm, while the rest of the swarm is initialized randomly. In 2003, Xiao et al. used a new approach based on the synergism of PSO and Self-Organizing Maps (SOM) for clustering gene expression data, obtaining promising results by applying the hybrid SOM-PSO algorithm to the gene expression data of Yeast and Rat Hepatocytes. Paterlini and Krink compared the performance of k-means, GA, PSO and Differential Evolution (DE) for a representative-point evaluation approach to partitional clustering; their results show that PSO and DE outperform the k-means algorithm. In [95] the authors proposed a PSO-based hybrid algorithm for classifying text documents. They applied PSO, k-means and a hybrid PSO clustering algorithm to four different text document datasets; the results illustrate that the hybrid PSO algorithm can generate more compact clustering results in a shorter span of time than the k-means algorithm.

In this work we have run simulations on a few datasets to reinforce the above findings. Comparisons are made with the classical k-means algorithm. We have chosen the basic gbest PSO and some lbest PSO models for simulation; for lbest PSO we use the lbest_ring and Von Neumann lbest models. We have also proposed a hybrid of PSO with k-means: first k-means is run, and the resulting best centroids are taken as one of the particles in the PSO swarm before PSO continues the simulation.

Dataset Description

For this work we investigated four datasets:

Iris plants database: a well-understood database with 4 inputs, 3 classes and 150 data vectors.
Wine: a classification problem with well-behaved class structures; 13 inputs, 3 classes and 178 data vectors.
Hayes Roth: 132 data vectors with 3 classes and 5 inputs.

Diabetes: 768 data vectors with 2 classes and 8 inputs.

Results of Simulations

This section compares the results of the k-means and PSO algorithms on the four clustering problems above. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to the following two criteria:

the quantization error;
the inter-cluster distances, i.e., the distances between the centroids of the clusters, where the objective is to maximize the distance between clusters.

For all the results reported, averages over 20 simulations are given. The PSO algorithms used 20 particles. The hybrid PSO takes its seed from the result of k-means clustering; this seed is treated as one particle in the PSO swarm. For PSO, the inertia weight w varies as in [27], decreasing from an initial value of 0.9 to a final value of 0.4. The acceleration coefficients α_1 and α_2 are fixed at constant values to ensure good convergence.
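The time-varying inertia weight referred to above is commonly implemented as a linear decrease over the run; a standard formulation (our notation, with t the current iteration and t_max the iteration limit) is

w(t) = w_{max} - (w_{max} - w_{min}) \, \frac{t}{t_{max}}, \qquad w_{max} = 0.9, \; w_{min} = 0.4.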

Table 4.1 summarizes the results obtained from the five clustering algorithms on the problems cited above. The values reported are averages over 20 simulations, with standard deviations to indicate the range of values to which the algorithms converge. First, consider the fitness of the solutions, i.e., the quantization error: for all data sets, PSO-based clustering is better than k-means. lbest_vonneumann provides the best fitness values, in terms of both quantization error and inter-cluster distance, for all data sets except Wine. For Wine and Hayes Roth, the hybrid PSO gives good results; on these data sets lbest_vonneumann gives the worst quantization error but a comparatively good inter-cluster distance measure. The standard deviations (std) are found to be very close, indicating consistent convergence of the algorithms to good results. The results in Table 4.1 present a strong case in favour of PSO-based clustering, which is why many researchers prefer to use it. However, to our knowledge no PSO-based clustering tool is available for researchers and academicians to use and investigate. In our work we have therefore developed a comprehensive MATLAB tool box for clustering, aimed at beginners in this research area. The next sub-section presents a brief description of the tool box.

Table 4.1 Comparison of clustering results. For each of the four datasets (Iris, Hayes Roth, Wine, Diabetes), the table reports the quantization error (with std) and the inter-cluster distance (with std), averaged over 20 simulations, for five algorithms: k-means, PSO_gbest, lbest_ring, lbest_vonneumann and Hybrid PSO. (The numeric entries are not recoverable from the source text.)

PSO Clustering Tool Box

Both in academia and in industry, the scope of research in the area of PSO is expanding rapidly. It is therefore worthwhile to provide good quality learning material for beginners. This work is done using MATLAB (Matrix Laboratory). MATLAB is a computational environment for modeling, analyzing and simulating a wide variety of intelligent systems. It also gives students access to numerous design and analysis tools through its Fuzzy Systems, Neural Networks and Optimization tool boxes. In this work, a software tutorial for PSO has been developed to aid the teaching of PSO concepts and their application to data clustering. The software offers facilities to simulate the classical k-means clustering algorithm and PSO clustering, and overall performance can be improved by suitable hybridization of k-means and PSO. The software provides scope for experimenting with various hybridization possibilities, and the learner can choose different tuning parameters for PSO, along with suitable particle sizes and iteration counts, to obtain better clustering performance. The software is GUI based and supported by various plots and graphs that reinforce the derived results; ease of operation is its hallmark.

In this work we tried hybridization in two ways. The first is the k-means + PSO technique, wherein the k-means clustering algorithm is executed once; the result of this algorithm is used to seed the initial swarm, while the rest of the

swarm is initialized randomly. The PSO algorithm is then executed as discussed earlier in this thesis. The second is the PSO + k-means technique: first the PSO algorithm is executed once; the resulting gbest is used as one of the centroids for k-means, while the rest of the centroids are initialized randomly, and the k-means algorithm is then executed. Our software offers the facilities to explore these possibilities with various options for choosing parameters and numbers of iterations. As the name says, this software tutorial is for learning how PSO can be applied in the area of clustering. The main idea is to involve the user in setting the swarm size and the other parameters of the PSO clustering algorithm. For this application, three data sets have been taken, namely Iris, Wine and Breast Cancer (collected from the UCI Machine Learning Repository). As the data sets considered here are pre-classified, the number of clusters is taken to be the same as the number of classes. The graphical user interface for the k-means clustering algorithm is shown in Fig 4.5.
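As an illustration of the k-means + PSO seeding just described, the sketch below builds the initial swarm; it reuses the kmeans function sketched in Section 4.3 and is our own illustrative code, not the tool box implementation.

```python
import numpy as np

def seeded_swarm(X, K, n_particles=20, rng=None):
    """Initial swarm for the k-means + PSO hybrid: one particle carries the
    k-means centroids, the rest are random sets of K data points."""
    rng = np.random.default_rng(rng)
    seed, _ = kmeans(X, K)                 # k-means executed once (sketch in Section 4.3)
    particles = [seed.reshape(-1)]         # the seed particle
    for _ in range(n_particles - 1):       # the rest of the swarm is random
        particles.append(X[rng.choice(len(X), size=K, replace=False)].reshape(-1))
    return np.array(particles)             # use in place of the PSO loop's random init
```

The PSO + k-means order reverses this: run PSO once, use the resulting gbest as one of the initial centroids for k-means, and initialize the remaining centroids randomly.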

Figure 4.5 Results of k-means clustering on three datasets

Here the result of clustering is shown in terms of intra-class and inter-class similarities and the quantization error, which are tabulated in Table 4.2. A confusion matrix is also given, from which an accuracy comparison can be made between the expected and actual clusters. The time taken by the algorithm to cluster the data is also reported. The screen shot for PSO-based clustering on the Iris, Wine and Breast Cancer datasets is given in Fig 4.6.

Table 4.2 Results of k-means clustering: intra-cluster distance, inter-cluster distance, quantization error and time in seconds for the Iris, Wine and Cancer datasets. (Numeric entries are not recoverable from the source text.)

Figure 4.6 Results of PSO-based clustering

In Fig 4.6, scope is given for the user to specify all the PSO parameters, such as swarm size, inertia weight and acceleration coefficients. The results of clustering are shown in the same way as for k-means clustering and are tabulated in Table 4.3. As a sample, results are computed taking the swarm size as 3, the inertia weight as 0.72, and c1 = c2 = 1; however, the user can experiment with this software using any values to see how the PSO clustering algorithm performs.

Table 4.3 Results of gbest PSO clustering (swarm size = 3, inertia weight = 0.72, c1 = c2 = 1): intra-cluster distance, inter-cluster distance, quantization error and time in seconds for the Iris, Wine and Cancer datasets. (Numeric entries are not recoverable from the source text.)

Table 4.4 Results of the k-means + PSO clustering algorithm and Table 4.5 Results of the PSO + k-means clustering algorithm report the same measures (intra-cluster distance, inter-cluster distance, quantization error and time in seconds) for the Iris, Wine and Cancer datasets, under the same settings: swarm size = 3, inertia weight = 0.72, c1 = c2 = 1. (Numeric entries are not recoverable from the source text.)

This tool gives very good scope to students who want to experiment with the PSO clustering algorithm by trying different sets of parameters.

4.6 Comparative Analysis with Experimental Results

The software tutorial gives a comparative study of all four clustering algorithms: k-means, PSO, k-means + PSO and PSO + k-means. According to the experimental results shown in Table 4.4 and Table 4.5, the accuracy of PSO + k-means (the hybrid algorithm) is high compared to that of the other three algorithms. The fitness curve graph is given in Fig 4.7.

Figure 4.7 Fitness curves

In the graph, the intra-cluster distance, inter-cluster distance, quantization error and time are marked in blue, red, green and black respectively. The graph shows how the intra-cluster distance, inter-cluster distance and quantization error fitness curves vary across the four clustering algorithms: k-means, PSO, k-means + PSO and PSO + k-means.

4.7 Clustering using DE

Data clustering is the process of grouping a set of data vectors into a number of clusters or bins such that data vectors within the same cluster are similar to one another and dissimilar to the elements of other clusters.

Clustering algorithms can be grouped into two main classes, namely supervised and unsupervised. In supervised clustering, the learning algorithm has an external teacher that indicates the target class to which each data vector should belong; in unsupervised clustering no teacher exists, and data vectors are grouped based on their distance from one another. Data clustering can be hierarchical or partitional; in this work we confine our discussion to partitional clustering. Partitional clustering algorithms attempt to decompose the dataset into a set of disjoint clusters by optimizing a criterion that minimizes the intra-cluster distance and maximizes the inter-cluster distance. The overall objective is to decompose the given dataset into a predefined number of clusters that are compact (the data objects within a cluster lie close together) and well separated from the other clusters. Hence clustering can be treated as an optimization problem, and evolutionary techniques can be applied to obtain optimum solutions. In this work we have implemented standard DE for clustering problems.

The k-means algorithm is a partitional clustering technique introduced by MacQueen in 1967. Its demerit lies in the fact that its performance depends largely on the initial centroid values chosen: with improper initial centroids, k-means can become trapped in local optima and produce poor clustering results. A considerable amount of research has been done by Data Mining researchers to alleviate this problem. Due to

space limitations, details of those studies are omitted here. It is worth noting, however, that many of the suggested approaches to this problem make use of evolutionary algorithms (EAs). EAs are particularly suitable for this kind of problem because of their population-based approach: instead of a single initialization of centroids, a set of candidate centroids can be initialized at the beginning of the simulation. As a result, the candidate solutions become progressively fitter over the course of the EA's run, yielding an optimal solution at the end. In our work we have used DE as the EA tool to attack this weakness of k-means. We have performed function optimization with DE and compared it against PSO to investigate the relative merits of DE, and we have then implemented DE for clustering and compared it with PSO-based clustering. All comparisons are presented in the following sections; first, for the sake of continuity, we briefly present DE and its mutation variants in the next section.

Basics of DE and its Mutation Variants

Differential Evolution (DE), developed by Storn and Price in 1997, is a parallel direct search method and a population-based global optimization algorithm that uses a real-coded representation [54]. This approach to numerical optimization is simple to implement and requires little or no parameter tuning, yet gives remarkable performance. Like all other evolutionary algorithms, its initial population is chosen randomly.

Classical DE

Like all other evolutionary algorithms, the DE method consists of three basic steps:

i. Generation of a population of N individuals in the d-dimensional search space, randomly distributed over the entire search domain:
   X_i(t) = [x_{i,1}(t), x_{i,2}(t), x_{i,3}(t), \ldots, x_{i,d}(t)], where t = 0, 1, 2, \ldots
ii. Replacement of the current population by a better-fit new population.
iii. Repetition of this replacement until the termination criterion is met, usually when satisfactory results are obtained.

The basic scheme of an evolutionary algorithm is shown in Fig 4.8: Initialization, followed by repeated Mutation, Recombination and Selection.

Figure 4.8 Basic scheme of an evolutionary algorithm

A. Mutation

After the random generation of the population, in each generation a donor vector V_i(t) is created for each X_i(t). This donor vector can be created in different ways, as described in the DE mutation schemes below.

B. Recombination

A trial offspring vector is created by combining components from the donor vector V_i(t) and the target vector X_i(t). This can be done in the following way:

U_{i,j}(t) = V_{i,j}(t)   if rand_{i,j}(0,1) \le CR,
U_{i,j}(t) = X_{i,j}(t)   otherwise,   (4.5)

where CR is the crossover rate.

C. Selection

Selection in DE adopts the Darwinian principle of survival of the fittest: if the trial vector yields a better fitness value, it replaces its target in the next generation; otherwise the target vector is retained in the population. Hence the population either gets better (w.r.t. the fitness function) or remains the same, but never deteriorates:

X_i(t+1) = U_i(t)   if f(U_i(t)) \le f(X_i(t)),
X_i(t+1) = X_i(t)   if f(X_i(t)) < f(U_i(t)).   (4.6)
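Equations (4.5) and (4.6) translate directly into code; below is a minimal sketch for a minimization problem, using illustrative helper names of our own.

```python
import numpy as np

def crossover(target, donor, CR=0.9, rng=np.random.default_rng()):
    """Eq. (4.5): take the donor component where rand_{i,j}(0,1) <= CR,
    otherwise keep the target's component."""
    mask = rng.random(target.shape) <= CR
    return np.where(mask, donor, target)

def select(target, trial, f):
    """Eq. (4.6): the trial replaces the target only if it is at least as fit
    (minimization), so the population never deteriorates."""
    return trial if f(trial) <= f(target) else target
```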

DE Mutation Schemes

The five different mutation schemes suggested by Price [54] are as follows:

1. Scheme 1: DE/rand/1

In this scheme, to create a donor vector V_i(t) for the i-th member, three other parameter vectors (say the o_1-th, o_2-th and o_3-th vectors) are chosen randomly from the current population. A scalar number F scales the difference of two of the three vectors, and the resultant is added to the third one. For the i-th donor vector, this process can be written as

V_i(t+1) = X_{o_1}(t) + F \, (X_{o_2}(t) - X_{o_3}(t))   (4.7)

2. Scheme 2: DE/rand-to-best/1

This scheme follows the same procedure as Scheme 1, but the donor vector is now generated from two randomly selected members of the population (say X_{o_2}(t) and X_{o_3}(t)) together with the best vector of the current generation, X_{best}(t). For the i-th donor vector at time t+1 this can be expressed as

V_i(t+1) = X_i(t) + \lambda \, (X_{best}(t) - X_i(t)) + F \, (X_{o_2}(t) - X_{o_3}(t))   (4.8)

where λ is a control parameter of DE in the range [0, 2]. To reduce the number of parameters, we take λ = F.

3. Scheme 3: DE/best/1

This scheme is identical to Scheme 1 except that the scaled difference is added to the best vector of the current population:

V_i(t+1) = X_{best}(t) + F \, (X_{o_1}(t) - X_{o_2}(t))   (4.9)

4. Scheme 4: DE/best/2

In this scheme, the donor vector is formed using two difference vectors, as shown below:

V_i(t+1) = X_{best}(t) + F \, (X_{o_1}(t) - X_{o_2}(t)) + F \, (X_{o_3}(t) - X_{o_4}(t))   (4.10)

5. Scheme 5: DE/rand/2

Here five different vectors are selected randomly from the population in order to generate the donor vector:

V_i(t+1) = X_{o_1}(t) + F_1 \, (X_{o_2}(t) - X_{o_3}(t)) + F_2 \, (X_{o_4}(t) - X_{o_5}(t))   (4.11)

Here F_1 and F_2 are two weighting factors selected in the range from 0 to 1. To reduce the number of parameters we may choose F_1 = F_2 = F. The experiments conducted in this study use Scheme 1, DE/rand/1 (equation 4.7).

Procedure for DE

i. Randomly initialize the positions of the particles.
ii. Evaluate the fitness of each particle.
iii. For each particle, create a difference-offspring.
iv. Evaluate the fitness of the difference-offspring.
v. If an offspring is better than its parent, replace the parent by the offspring in the next generation.
vi. Loop to step ii until the stopping criterion is met, usually a sufficiently good fitness or a maximum number of iterations.
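Putting the pieces together, below is a compact sketch of the DE procedure above using Scheme 1, DE/rand/1 (equations 4.5-4.7); the sphere objective at the end is only a placeholder for demonstration.

```python
import numpy as np

def de_rand_1(f, bounds, N=20, F=0.8, CR=0.9, iters=100, rng=None):
    """Classical DE with the DE/rand/1 mutation scheme (eq. 4.7), minimizing f."""
    rng = np.random.default_rng(rng)
    lo, hi = np.asarray(bounds[0]), np.asarray(bounds[1])
    d = len(lo)
    pop = lo + rng.random((N, d)) * (hi - lo)          # i. random initialization
    fit = np.array([f(x) for x in pop])                # ii. evaluate fitness
    for _ in range(iters):
        for i in range(N):
            # Three distinct random members, all different from i.
            o1, o2, o3 = rng.choice([j for j in range(N) if j != i], 3, replace=False)
            donor = pop[o1] + F * (pop[o2] - pop[o3])  # mutation, eq. (4.7)
            mask = rng.random(d) <= CR                 # recombination, eq. (4.5)
            trial = np.where(mask, donor, pop[i])
            ft = f(trial)
            if ft <= fit[i]:                           # selection, eq. (4.6)
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()], fit.min()

# Example: minimize the sphere function in 5 dimensions.
best, best_f = de_rand_1(lambda x: np.sum(x**2), (np.full(5, -5.0), np.full(5, 5.0)))
```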

Basic Framework for DE Clustering

Data vectors can be clustered using classical DE as follows:

i. Initialize each vector to contain K randomly selected cluster centers.
ii. For I = 1 to I_max do:
  a) For each vector i:
  b) For each object Z_p in the data set:
    i. Calculate the Euclidean distance d(Z_p, A_{ij}) of Z_p to all cluster centroids C_{ij}.
    ii. Assign Z_p to the cluster C_{ij} such that d(Z_p, A_{ij}) = min_{k=1,\ldots,K} { d(Z_p, A_{ik}) }.
  c) Change the population members according to the DE algorithm outlined above, using the vectors' fitness to guide the evolution of the population.
iii. Report the cluster centers and the partition obtained by the globally best vector at time I = I_max.

Experimental Results

Three well-known datasets from the machine learning repository have been considered for this experiment:

1. Fisher's Iris dataset (n = 150, p = 4, c = 3)
2. Wisconsin Breast Cancer dataset (n = 683, p = 9, c = 2)

3. Wine recognition dataset (n = 178, p = 13, c = 3)

For PSO, the inertia weight is w = 0.7 and c1 = c2 = 2. For DE, the crossover rate is CR = 0.9 and the weighting factor is F = 0.8. These algorithms were applied to cluster the three datasets described above. A fitness comparison of the two algorithms, PSO and DE, is reported in Table 4.7. Figs 4.9 and 4.10 show three-dimensional plots of the clustered observations for the PSO and DE algorithms respectively on the Iris dataset.

Table 4.7 Mean intra-cluster distances of PSO and DE (number of iterations = 50, number of vectors/particles = 5) for the Iris, Wine and Breast Cancer datasets. (Numeric entries are not recoverable from the source text.)

Figure 4.9 (a-d) PSO-generated three-dimensional clusters of the Iris dataset

Figure 4.10 (a-d) DE-generated three-dimensional clusters of the Iris dataset

4.8 Summary

From the discussion of PSO- and DE-based clustering, it is evident that both are very good candidates for clustering. The results shown above clearly suggest that they are able to overcome the local-optima problem of k-means. In this chapter, various PSO models for clustering have been discussed and compared with DE-based clustering. These results stress the need to develop a new variant of DE and to explore various methods of hybridization.


More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING

CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING 41 CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING 3.1 INTRODUCTION This chapter describes the clustering process based on association rule mining. As discussed in the introduction, clustering algorithms have

More information

Fuzzy Ant Clustering by Centroid Positioning

Fuzzy Ant Clustering by Centroid Positioning Fuzzy Ant Clustering by Centroid Positioning Parag M. Kanade and Lawrence O. Hall Computer Science & Engineering Dept University of South Florida, Tampa FL 33620 @csee.usf.edu Abstract We

More information

GENETIC ALGORITHM VERSUS PARTICLE SWARM OPTIMIZATION IN N-QUEEN PROBLEM

GENETIC ALGORITHM VERSUS PARTICLE SWARM OPTIMIZATION IN N-QUEEN PROBLEM Journal of Al-Nahrain University Vol.10(2), December, 2007, pp.172-177 Science GENETIC ALGORITHM VERSUS PARTICLE SWARM OPTIMIZATION IN N-QUEEN PROBLEM * Azhar W. Hammad, ** Dr. Ban N. Thannoon Al-Nahrain

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Road map. Basic concepts

Road map. Basic concepts Clustering Basic concepts Road map K-means algorithm Representation of clusters Hierarchical clustering Distance functions Data standardization Handling mixed attributes Which clustering algorithm to use?

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR

EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real

More information

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University

More information

Data Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1

Data Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques Chapter 7.1-4 March 8, 2007 Data Mining: Concepts and Techniques 1 1. What is Cluster Analysis? 2. Types of Data in Cluster Analysis Chapter 7 Cluster Analysis 3. A

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Cluster-based instance selection for machine classification

Cluster-based instance selection for machine classification Knowl Inf Syst (2012) 30:113 133 DOI 10.1007/s10115-010-0375-z REGULAR PAPER Cluster-based instance selection for machine classification Ireneusz Czarnowski Received: 24 November 2009 / Revised: 30 June

More information

Data Informatics. Seon Ho Kim, Ph.D.

Data Informatics. Seon Ho Kim, Ph.D. Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu Clustering Overview Supervised vs. Unsupervised Learning Supervised learning (classification) Supervision: The training data (observations, measurements,

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

COMPARISON OF ALGORITHMS FOR NONLINEAR REGRESSION ESTIMATES

COMPARISON OF ALGORITHMS FOR NONLINEAR REGRESSION ESTIMATES COMPSTAT 2004 Symposium c Physica-Verlag/Springer 2004 COMPARISON OF ALGORITHMS FOR NONLINEAR REGRESSION ESTIMATES Tvrdík J. and Křivý I. Key words: Global optimization, evolutionary algorithms, heuristics,

More information

Handling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization

Handling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization Handling Multi Objectives of with Multi Objective Dynamic Particle Swarm Optimization Richa Agnihotri #1, Dr. Shikha Agrawal #1, Dr. Rajeev Pandey #1 # Department of Computer Science Engineering, UIT,

More information

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof. Ruiz Problem

More information