Volume 4, Issue 12, December 2014, ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

An Hybrid Technique for Data Clustering Using Genetic Algorithm with Particle Swarm Optimization

Sundararajan S*, Associate Professor and Head/MCA, SNS College of Technology, Coimbatore, Tamil Nadu, India
Dr. Karthikeyan S, The Director/Computer Applications, Karpagam University, Coimbatore, Tamil Nadu, India

Abstract
Data clustering is useful in several areas such as machine learning, data mining, wireless sensor networks and pattern recognition. The best-known clustering approach is K-means, which has been applied successfully to numerous clustering problems, but it has limitations such as convergence to local optima and sensitivity to the choice of initial points. Clustering is the procedure of grouping objects into disjoint classes, known as clusters, so that objects within a class are highly similar to one another and dissimilar to the objects in other classes. The Firefly algorithm is commonly used for clustering problems, but it also has disadvantages. To overcome the problems of the Firefly algorithm, this work proposes a Hybrid K-Mean with GA/PSO method. The hybrid method merges the standard velocity and update rules of PSO with the idea of selection from GAs. The hybrid algorithm is compared with the standard GA and PSO approaches. Experimental results show that the proposed method reduces these limitations and improves the accuracy rate.

Keywords Clustering, K-Mean, Firefly algorithm, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).

I. INTRODUCTION
Clustering is a branch of unsupervised learning in which a set of samples, frequently vectors in a multidimensional space, is grouped into clusters in such a way that samples in the same cluster are similar in some sense and samples in different clusters are dissimilar in the same sense.
Cluster analysis is a difficult problem because there are many ways of measuring similarity and dissimilarity, concepts that have no universal definition. Data clustering, also called cluster analysis, segmentation, taxonomy analysis, or unsupervised classification, is a technique of constructing groups of objects, or clusters, in such a way that objects in different clusters are quite distinct and objects in one cluster are very similar. Data clustering is often confused with classification, in which objects are assigned to predefined classes. In data clustering, the classes themselves also have to be defined.

Both GA and PSO have their own strengths and weaknesses. The PSO algorithm is conceptually simple and can be implemented in a few lines of code. PSOs also have memory, whereas in a GA, if an individual is not selected, the information contained in that individual is lost. However, without a selection operator, PSOs may waste resources on a poor individual that is stuck in a poor region of the search space. A PSO's group interactions enhance the search for an optimal solution, whereas GAs have trouble finding an exact solution and are good at reaching a global region. This paper proposes a hybrid algorithm (GA/PSO) that combines the strengths of GAs with those of PSO to improve performance. The hybrid algorithm joins the standard velocity and position update rules of PSOs with the ideas of selection and crossover from GAs. The algorithm is designed so that the GA performs a global search and the PSO performs a local search. Other hybrid GA/PSO algorithms have been proposed and have been tested on function minimization problems.

The remainder of this paper is organized as follows. Section II summarizes the concepts and related works. Section III details the proposed method, and Section IV discusses the experiments and the achieved results. Finally, Section V presents the conclusions of the work.

II. LITERATURE SURVEY
Zhang et al (2014) propose a novel fuzzy hybrid quantum artificial immune clustering algorithm based on cloud model (C-FHQAI) to solve the stochastic problem. Abdel-Kader (2010) offers a hybrid two-phase GAI-PSO + k-means data clustering algorithm that performs fast data clustering and can avoid premature convergence to local optima. Mahmood (2010) considers the problem of placing copies of objects in a distributed web server system so as to minimize the cost of serving read and write requests when the web servers have limited storage capacities. Particle Swarm Optimization is a population-based global search algorithm that mimics the cognitive and social behaviour of swarms. PSO produces improved results on complex and multi-peak problems. Rana et al (2011) review PSO and its variants and offer a guide for researchers working in the area of PSO and data clustering. Leung et al show the performance of GA/PSO.

2014, IJARCSSE All Rights Reserved
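The PSO velocity and position update, combined with GA-style selection and crossover as described in the Introduction, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy objective `sphere`, the parameter values `w`, `c1`, `c2`, the "keep the better half" selection and the arithmetic crossover are all assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


def hybrid_ga_pso(objective, dim=2, n_particles=20, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a PSO update followed by GA selection and crossover."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions (PSO memory)
    pbest_fit = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]                        # best fitness after each iteration

    for _ in range(n_iter):
        # PSO part: standard velocity and position update for every particle.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]

        # GA part: selection keeps the better half; each particle in the worse
        # half is replaced by an arithmetic crossover of two selected parents.
        order = sorted(range(n_particles), key=lambda i: objective(pos[i]))
        half = n_particles // 2
        parents = order[:half]
        for i in order[half:]:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            pos[i] = [alpha * pa + (1 - alpha) * pb
                      for pa, pb in zip(pos[a], pos[b])]

        # Update personal and global bests.
        for i in range(n_particles):
            fit = objective(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best, best_fit, history = hybrid_ga_pso(sphere)
```

Because the global best is only replaced by a strictly better position, `history` never increases. For clustering, `objective` would be replaced by a clustering criterion evaluated on candidate cluster centres.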

Akhshabi et al (2014) propose a particle swarm optimization (PSO) based Memetic Algorithm (MA) that hybridizes PSO with a local search technique for solving a no-wait flow shop scheduling problem. The major objective is to minimize the total flow time. Within the framework of the proposed algorithm, a local version of PSO with a ring-shaped topology is used as the global search. Li et al (2014) invoke PSO to improve PLS-DA by simultaneously selecting the optimal variable subset, the associated weights and the best number of latent variables in PLS-DA, forming a new algorithm named PSO-PLSDA. Chuang et al (2012) propose chaotic particle swarm optimization (CPSO) algorithms that identify the best SNP combination for cancer association studies containing seven SNPs. Muthukaruppan and Er (2012) present a PSO-based fuzzy expert system for the diagnosis of coronary artery disease (CAD). The proposed system is based on the Cleveland and Hungarian Heart Disease datasets. Because the datasets contain numerous input attributes, a decision tree (DT) was used to identify the attributes that contribute towards the diagnosis. Marinakis and Marinaki (2013) introduce a new nature-inspired algorithm that uses a hybridized Particle Swarm Optimization algorithm with a new neighborhood topology for effectively solving the Feature Selection Problem (FSP). Goldberg (1989) presents genetic algorithms in search, optimization and machine learning. An improved GA and PSO culled hybrid algorithm is given by Li et al (2008), and an improved GA with a novel PSO-GA-based hybrid algorithm by Shi et al (2005). Mohamad et al (2013) report that classification can be performed by selecting a small subset of informative genes from gene expression data.
However, many computational techniques have difficulty selecting such a small subset, because the number of samples is small compared with the vast number of genes (high dimension), many of which are irrelevant or noisy.

III. PROPOSED DATA CLUSTERING FOR HYBRID GA/PSO ALGORITHM
The approaches considered in this work, and the proposed method built on them, are described below.

3.1 K-Mean
The K-means algorithm clusters D-dimensional data vectors into a predefined number of clusters, using the Euclidean distance as the similarity criterion. Euclidean distances between data vectors within a cluster are small compared with the distances to vectors in other clusters. The vectors of a cluster are associated with one centroid vector, which represents the centre of that cluster and is the mean of the data vectors that belong to it.

3.2 Modified firefly algorithm (MFA)
In the standard Firefly Algorithm (FA), in every iteration a brighter firefly exerts its influence over the other fireflies and attracts them towards itself in maximization problems. In the standard FA, fireflies move regardless of the global optimum, which reduces the ability of the firefly algorithm to find the global best. In this work, to remove this weakness of FA and to improve the collective movement of the fireflies, a Modified Firefly Algorithm (MFA) is proposed. The proposed algorithm uses the global optimum in the fireflies' movement. The global optimum depends on the optimization problem: it is the firefly that has the largest or the smallest value. The global optimum is updated in every iteration of the algorithm. In the proposed algorithm, when a firefly is compared with another firefly, instead of only the one brighter firefly being allowed to influence and attract its neighbours, the global optimum of the current iteration is also allowed to influence the others and affect their movement.
In the MFA, when a firefly is compared with a corresponding firefly and the corresponding firefly is brighter, the compared firefly moves toward the corresponding firefly, with the movement also guided by the global optimum.

3.3 Proposed Hybrid K-Mean with GA/PSO
K-means clustering is the most popular and simplest method for data clustering. In the k-means algorithm, k random cluster centres are chosen first, and then every data vector is assigned to a cluster based on Euclidean distance. Because the K cluster centres are initialized randomly, the algorithm may become trapped in local optima. In this work, to improve the accuracy of the k-means algorithm, k-means is initialized with the best centres, which are found by the Hybrid GA/PSO algorithm.

$Z_{1,1}, Z_{1,2}, \ldots, Z_{1,d}, Z_{2,1}, Z_{2,2}, \ldots, Z_{2,d}, \ldots, Z_{K,1}, Z_{K,2}, \ldots, Z_{K,d}$
Figure 1: Structure of a firefly position in clustering.
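The position structure of Figure 1 and the idea of starting k-means from optimized centres can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `decode`, `fitness` and `kmeans_from`, the sum-of-squared-distances fitness, and the tiny data set are hypothetical, and `best_position` merely stands in for the best individual that a GA/PSO run would return.

```python
def decode(position, k, d):
    """Split a flat vector (Z_11..Z_1d, ..., Z_K1..Z_Kd) into K centroids of dimension d."""
    return [position[i * d:(i + 1) * d] for i in range(k)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(position, data, k, d):
    """Assumed criterion: total squared distance of each point to its nearest centroid."""
    centers = decode(position, k, d)
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def kmeans_from(centers, data, max_iter=100):
    """Plain k-means that starts from the given centres instead of random ones."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in data]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its previous centre
        if new_centers == centers:       # converged: centres no longer move
            break
        centers = new_centers
    return centers, labels


# Hypothetical example with K = 2 clusters and D = 2 dimensions.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
best_position = [1.0, 1.0, 9.0, 9.0]   # stand-in for the best GA/PSO individual
centers, labels = kmeans_from(decode(best_position, 2, 2), data)
```

Flattening the K centroids into one vector of length K × D lets a population-based optimizer treat a whole clustering as a single individual, while `kmeans_from` only refines centres that are already in a good region.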

In the proposed technique, the clustering has two stages. The first stage initializes the GA/PSO population with random values. As Figure 1 shows, the data is D-dimensional and there are K clusters, so every GA/PSO individual has K × D dimensions. The GA/PSO algorithm runs for a predefined number of iterations. In the second stage, k-means starts from the position of the best GA/PSO individual and refines the cluster centres. Together, these two stages form the proposed hybrid clustering algorithm.

IV. EXPERIMENTAL RESULTS
In this section, the experimental results are discussed. Two algorithms are compared in this section, namely the standard Firefly algorithm and the Hybrid GA/PSO. The proposed Hybrid GA/PSO method is used for data clustering. Experiments were conducted on two datasets from the UCI repository: Iris and Wine. The results compare the Firefly algorithm, GA/PSO and the proposed Hybrid GA/PSO.

Table 1: Comparison of the Inter Cluster Distance values
Data points | KFA algorithm | Proposed Hybrid GA/PSO
Iris Dataset | .5261 | .6473
Wine Dataset | .3837 | .787

Figure 1: The Inter-Clustering distance (Iris Dataset and Wine Dataset; KFA and Hybrid GA/PSO)

Table 1 and Figure 1 show the comparison of the inter-cluster distance of the proposed Hybrid GA/PSO with KFA. The inter-cluster distance values are larger for the proposed method.

Table 2: No. of Clustering and Execution Time for proposed method
Approach | Iris: No. of Clustering | Iris: Execution Time (in secs.) | Wine: No. of Clustering | Wine: Execution Time (in secs.)
PSO | .293 | 58 | .473 | 45
KFA | .448 | 42 | .628 | 31
Hybrid K-Mean with GA/PSO | .766 | 18 | .836 | 1

Figure 2: No. of Clustering for the approaches (Iris and Wine; PSO, KFA and Hybrid K-Mean with GA/PSO)

Table 2 and Figure 2 show the number of clustering for PSO, KFA and Hybrid K-Mean with GA/PSO on the Iris and Wine datasets. The proposed Hybrid GA/PSO method has the highest values.

Figure 3: Execution Time in seconds (Iris and Wine; PSO, KFA and Hybrid K-Mean with GA/PSO)

Table 2 and Figure 3 show the execution time for PSO, KFA and Hybrid GA/PSO on the Iris and Wine datasets. The proposed Hybrid K-Mean with GA/PSO method has a lower execution time than the other approaches.

V. CONCLUSION
Genetic algorithms and particle swarm optimization are closely related through their inherent parallel characteristics: both algorithms start from a randomly created population, and both use a fitness value to evaluate the population. The PSO methodology was examined for the document clustering problem. It was found that the document clustering problem is successfully tackled with PSO by optimizing the clustering process. A most useful advantage of PSO is its capacity to cope with local optima by maintaining, recombining and evaluating numerous candidate solutions concurrently. The Hybrid K-Mean with GA/PSO algorithm merges the fast convergence of the PSO algorithm with the GA's ability to exploit previous solutions, in order to avoid premature convergence. PSO and GA are therefore used in this research to develop efficient, robust and flexible algorithms for document clustering problems.

REFERENCES
[1] Abdel-Kader, Rehab F, 2010, "Genetically improved PSO algorithm for efficient data clustering", In Machine Learning and Computing (ICMLC), 2010 Second International Conference on, pp. 71-75.
[2] Zhang, Ren-Long, Mi-Yuan Shan, Xiao-Hong Liu, and Li-Hong Zhang, 2014, "A novel fuzzy hybrid quantum artificial immune clustering algorithm based on cloud model", Engineering Applications of Artificial Intelligence 35: 1-13.
[3] Akhshabi, M., Tavakkoli-Moghaddam, R., & Rahnamay-Roodposhti, F, 2014, A hybrid particle swarm optimization algorithm for a no-wait flow shop scheduling problem with the total flow time, The International Journal of Advanced Manufacturing Technology, 70(5-8), 1181-1188.
[4] Li, Y. Q., Liu, Y. F., Song, D. D., Zhou, Y. P., Wang, L., Xu, S., & Cui, Y. F, 2014, Particle swarm optimization-based protocol for partial least-squares discriminant analysis: Application to 1H nuclear magnetic resonance analysis of lung cancer metabonomics, Chemometrics and Intelligent Laboratory Systems, 135, 192-200.
[5] Chuang, L. Y., Chang, H. W., Lin, M. C., & Yang, C. H, 2012, Chaotic particle swarm optimization for detecting SNP-SNP interactions for CXCL12-related genes in breast cancer prevention, European Journal of Cancer Prevention, 21(4), 336-342.
[6] Muthukaruppan, S., & Er, M. J, 2012, A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary artery disease, Expert Systems with Applications, 39(14), 11657-11665.
[7] Mohamad, M. S., Omatu, S., Deris, S., & Yoshioka, M, 2013, A Constraint and Rule in an Enhancement of Binary Particle Swarm Optimization to Select Informative Genes for Cancer Classification, In Trends and Applications in Knowledge Discovery and Data Mining (pp. 168-178). Springer Berlin Heidelberg.
[8] Marinakis, Y., & Marinaki, M, 2013, A hybridized particle swarm optimization with expanding neighborhood topology for the feature selection problem, In Hybrid Metaheuristics (pp. 37-51). Springer Berlin Heidelberg.
[9] Mahmood, Amjad, 2010, "Replicating web contents using a hybrid particle swarm optimization", Information Processing & Management 46, no. 2: 170-179.
[10] Rana, Sandeep, Sanjay Jasola, and Rajesh Kumar, 2011, "A review on particle swarm optimization algorithms and their applications to data clustering", Artificial Intelligence Review 35, no. 3: 211-222.

[11] Frank H.F. Leung, 2003, Tuning of the Structure and Parameters of a Neural Network Using an Improved Genetic Algorithm, In IEEE Transactions on Neural Networks, Vol. 14, No. 1, pp. 79-88.
[12] Goldberg D.E, 1989, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Company Inc., London.
[13] Akhshabi, M., Tavakkoli-Moghaddam, R., & Rahnamay-Roodposhti, F, 2014, A hybrid particle swarm optimization algorithm for a no-wait flow shop scheduling problem with the total flow time, The International Journal of Advanced Manufacturing Technology, 70(5-8), 1181-1188.
[14] Li W. T., Shi X. W., Xu L. and Hei Y.Q, 2008, Improved GA and PSO Culled Hybrid Algorithm for Antenna Array Pattern Synthesis, Progress In Electromagnetics Research, PIER 80, pp. 461-476.
[15] Shi, X.H., Liang Y.C., Lee H.P., Lu C. and Wang L.M., 2005, An Improved GA and a Novel PSO-GA-Based Hybrid Algorithm, Information Processing Letters, Vol. 93, No. 5, pp. 255-261.