
IJCSES International Journal of Computer Sciences and Engineering Systems, Vol. 1, No. 4, October 2007
ISSN 0973-4406

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

Tzu-Chieh LIN 1, Hsiang-Cheh HUANG 2, Bin-Yih LIAO 1 and Jeng-Shyang PAN 1

1 Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan
E-mail: {tclin, byliao, jspan}@bit.kuas.edu.tw
2 Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan
E-mail: hchuang@nuk.edu.tw

Abstract

The partitioning or clustering method is an important research branch in the data mining area: it divides a dataset into an arbitrary number of clusters based on the correlation attributes of the elements of the dataset. Most datasets have an original number of clusters, which is estimated with a cluster validity index, but most existing methods give erroneous estimates for many real datasets. To solve this problem, this paper applies the optimization technique of the genetic algorithm (GA) to a new adaptive cluster validity index, called the Gene Index (GI). The algorithm applies the GA to adjust the weighting factors of the adaptive cluster validity index in order to train an optimal cluster validity index. It is tested with many real datasets, and the results show that the proposed algorithm gives higher performance and estimates the original cluster number of real datasets more accurately than current cluster validity index methods.

Key words: Clustering, Genetic Algorithm, Cluster Validity Index, Optimization, Data Mining.

1. Introduction

Data partitioning is commonly encountered in real applications, and many schemes have been proposed in the literature to assess the performance of specific algorithms. The main concern of data partitioning is how to correctly divide the data points into clusters. Some algorithms in the literature are specifically designed for certain databases; they may perform well in those cases but not always in general.
In this paper, we propose a generalized scheme, integrated with optimization techniques, for better partitioning of the data. A number of indices have been proposed in the literature to assess the performance of data clustering. The main ideas are twofold: (1) data points within the same cluster should lie as close together as possible, and (2) data points in different clusters should be as far apart as possible. Based on these two concepts, a variety of cluster validity indices have been proposed. We perform the necessary simulations and verify that not all of the indices perform well. Therefore, we employ the genetic algorithm (GA) [1] to achieve better performance in data partitioning.

This paper is organized as follows. In Section 2 we review data partitioning schemes and cluster validity indices. In Section 3 we describe the proposed algorithm, which integrates existing indices and trains with the GA. Simulation results are demonstrated in Section 4. Finally, we conclude the paper in Section 5.

2. Data Partitioning Schemes and Cluster Validity Indices

In this paper, we employ the fuzzy C-means (FCM) [2] algorithm for data clustering and then compare several indices. Using the concepts of fuzzy theory, a data point does not absolutely belong to a certain cluster; a floating-point number represents its degree of belonging to each cluster. The major drawback of FCM and similar algorithms is that the correct number of clusters cannot be known exactly in advance. Thus, cluster validity indices with several kinds of representations have been proposed to estimate the correct number of clusters. Every index has its advantages and drawbacks. We cite several commonly encountered indices, perform verifications in Sec. 2.7, and finally combine the advantages of these indices in the genetic-based cluster validity index proposed in Sec. 3.

2.1 Cluster validity index: PC index

The PC (partition coefficient) index [3] was one of the measures used in the early days, with the definition in Eq. (1):
PC(U) = \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^2    (1)

Manuscript received July 30, 2007. Manuscript revised September 20, 2007.

where u_ik denotes the degree of membership of the data point x_k in cluster i, x_k is the k-th of the d-dimensional measured data points (we use d = 2 here as an example), under the conditions u_ik ∈ [0, 1] for all i, k, and \sum_{i=1}^{c} u_{ik} = 1 for all k. When assessing the effectiveness of a clustering algorithm, the larger the PC index value, the better the performance.

2.2 Cluster validity index: PE index

The PE (partition entropy) index was also proposed in [3], with the definition in Eq. (2):

PE(U) = -\frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log u_{ik}    (2)

The smaller the PE index value, the better the performance.

2.3 Cluster validity index: XB index

The XB index was proposed by Xie and Beni [4] with the two important concepts of compactness and separation. For a good clustering result, the data points within the same cluster should be as compact as possible, while any two different clusters should be as far apart as possible. It can be formulated as Eq. (3):

XB(U, V; X) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^2 \, \|x_k - \nu_i\|^2}{n \cdot \min_{i \neq j} \|\nu_i - \nu_j\|^2}    (3)

where x_k is the k-th of the d-dimensional measured data points (d = 2 here) and ν_i is the d-dimensional center of cluster i. In Eq. (3), the numerator expresses the compactness and the denominator the separation. Therefore, the smaller the XB index value, the better the performance.

2.4 Cluster validity index: K index

The K index was proposed by Kwon [5] as an improvement of the XB index. From Eq. (3) we find that as the number of clusters approaches n, XB approaches 0, which is generally incorrect for practical applications. Modifying Eq. (3), we obtain Eq. (4):

K(U, V; X) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^2 \, \|x_k - \nu_i\|^2 + \frac{1}{c} \sum_{i=1}^{c} \|\nu_i - \bar{\nu}\|^2}{\min_{i \neq j} \|\nu_i - \nu_j\|^2}    (4)

where \bar{ν} denotes the geometric center of the data points. The smaller the K index value, the better the performance.

2.5 Cluster validity index: B_rit index

The B_rit index was proposed in [6].
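Before turning to the B_rit index, the FCM partition and the four indices above can be sketched in code. This is a minimal illustration, not the authors' implementation; the function names, the NumPy formulation, and the random initialization are our own assumptions, with U a c × n membership matrix whose columns sum to 1.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means: returns memberships U (c x n) and centers V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                   # columns sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # fuzzy-weighted centers
        D = np.fmax(np.linalg.norm(X[None] - V[:, None], axis=2), 1e-12)
        U_new = D ** (-2 / (m - 1)) / np.sum(D ** (-2 / (m - 1)), axis=0)
        if np.abs(U_new - U).max() < tol:
            return U_new, V
        U = U_new
    return U, V

def pc_index(U):
    """Partition coefficient, Eq. (1): mean squared membership; larger is better."""
    return float(np.sum(U ** 2) / U.shape[1])

def pe_index(U):
    """Partition entropy, Eq. (2); smaller is better."""
    Uc = np.clip(U, 1e-12, 1.0)
    return float(-np.sum(Uc * np.log(Uc)) / U.shape[1])

def xb_index(U, V, X):
    """Xie-Beni index, Eq. (3): fuzzy compactness over n times the minimum
    squared distance between two cluster centers; smaller is better."""
    D2 = np.sum((X[None] - V[:, None]) ** 2, axis=2)     # c x n squared distances
    sep = np.sum((V[:, None] - V[None]) ** 2, axis=2)    # pairwise center distances
    np.fill_diagonal(sep, np.inf)
    return float(np.sum(U ** 2 * D2) / (X.shape[0] * sep.min()))

def k_index(U, V, X):
    """Kwon index, Eq. (4): adds a punishing term so the value does not
    vanish as the number of clusters approaches n."""
    D2 = np.sum((X[None] - V[:, None]) ** 2, axis=2)
    sep = np.sum((V[:, None] - V[None]) ** 2, axis=2)
    np.fill_diagonal(sep, np.inf)
    punish = np.sum((V - X.mean(axis=0)) ** 2) / V.shape[0]
    return float((np.sum(U ** 2 * D2) + punish) / sep.min())
```

On a toy set of two tight, well-separated blobs, PC comes out close to 1 and PE close to 0 at the correct cluster count, matching the criteria stated above.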
It is also composed of compactness and separation parameters in order to obtain the optimal number of clusters, but here the two measures are derived independently. First, the separation between clusters is denoted by G(c):

G(c) = \frac{\max_{i,j} \delta(i, j)}{\min_{i \neq j} \delta(i, j)}    (5)

where δ(i, j) is a distance measure between the geometric centers of clusters i and j, with the definition

\delta(i, j) = \left[ (\nu_i - \nu_j)^T A (\nu_i - \nu_j) \right]^{1/2}    (6)

and A denotes a positive definite matrix of dimension d × d (2 × 2 here). For simplicity, the identity matrix I commonly replaces A in Eq. (6). Next, the compactness is represented by the ratio between the variance of the data points within every cluster and the variance of the whole data set, denoted by wt(c):

wt(c) = \frac{\sum_{q=1}^{d} \sum_{k=1}^{c} var_q(k)}{\sum_{q=1}^{d} var_{total}(q)}    (7)

where var_q(k) denotes the variance of cluster k along dimension q and var_total(q) denotes the variance of the whole data set along dimension q. From experimental results, the value of G(c) is much larger than that of wt(c), with the ranges G(c) ∈ [0, 20] and wt(c) ∈ [0, 0.8]; thus a weighting factor α is needed to balance the effects of both factors:

B_rit(c) = G(c) + \alpha \cdot wt(c), \quad \alpha = \frac{\max G(c)}{\max wt(c)}    (8)

From the derivations above, the smaller the B_rit index, the better the clustering performance.

2.6 Cluster validity index: S index

The S index was proposed in [7]. It also adopts the concepts of compactness and separation. Unlike the B_rit index of Sec. 2.5, both factors are normalized to values between 0 and 1 to balance their effects. The compactness is measured by the mean within-cluster distance:

u(c, V; X) = \frac{1}{c} \sum_{i=1}^{c} \frac{1}{n_i} \sum_{x \in c_i} \|x - \nu_i\|    (9)

where n_i denotes the number of data points within cluster i and ν_i is the geometric center of cluster i. The separation measure is simply o(c, V) = 1/d_min, where d_min denotes the minimum distance between any two clusters.
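Both the B_rit combination of Eq. (8) and the S-index normalization operate over the whole range of candidate cluster numbers c, so a sketch naturally works on per-c arrays of the raw measures. The helper names and the array-based formulation are our own assumptions; the raw G(c), wt(c), u(c), and o(c) values would come from the definitions above.

```python
import numpy as np

def brit_combine(G, wt):
    """B_rit(c) = G(c) + alpha * wt(c), Eq. (8), where alpha = max G / max wt
    balances the two ranges (G roughly in [0, 20], wt in [0, 0.8])."""
    G, wt = np.asarray(G, float), np.asarray(wt, float)
    alpha = G.max() / wt.max()
    return G + alpha * wt

def s_index(u, o):
    """S(c) = u_n(c) + o_n(c): min-max normalize the compactness u(c) and the
    separation o(c) over the candidate c values, then sum; smaller is better."""
    norm = lambda v: (np.asarray(v, float) - min(v)) / (max(v) - min(v))
    return norm(u) + norm(o)
```

The estimated cluster number is then the argmin over the returned array, offset by the smallest candidate c.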

Next, the compactness and separation measures are normalized:

u_n(c, V; X) = \frac{u(c, V; X) - \min u(c, V; X)}{\max u(c, V; X) - \min u(c, V; X)}    (10)

o_n(c, V) = \frac{o(c, V) - \min o(c, V)}{\max o(c, V) - \min o(c, V)}    (11)

where the minima and maxima are taken over the candidate numbers of clusters. Finally, the S index is defined as

S(c, V; X) = u_n(c, V; X) + o_n(c, V)    (12)

The smaller the S index value, the better the performance.

2.7 Preliminary results with existing indices

To evaluate the effectiveness of the existing indices, we generate a two-dimensional, 2000-point, 9-cluster testing database called `My_sample`, illustrated in Fig. 1. All six indices are examined, and the results are given in Table 1. With this database, we expect the column with k=9 to perform best, i.e., to have the largest PC value and the smallest values of the other five indices. As Table 1 shows, not all of the indices indicate that the correct clustering result is at k=9. Moreover, the criterion for PC is to search for the maximum value, while for the remaining indices the criterion is to find the minimum. Based on these two findings, optimization techniques can be incorporated into the clustering algorithm to search for better and more correct results.

3. Genetic-based Cluster Validity Index

As seen in Secs. 2.1 to 2.6, every index has its own specific concept for data clustering, and the results in Sec. 2.7 show a diversity of performances. Therefore, we employ the genetic algorithm (GA) to find an optimized result based on the concepts of the indices above. The GA consists of three major steps: crossover, mutation, and selection. Based on the fitness function, we integrate our clustering scheme with the GA procedures.

Table 1: The index values for clustering from 2 to 10 clusters in six different schemes for the My_sample database. The shaded blocks represent the correct clustering results.
index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.722 | 0.673 | 0.625 | 0.640 | 0.674 | 0.720 | 0.757 | 0.797 | 0.770
PE | 0.63 | 0.852 | 1.05 | 1.07 | 1.030 | 0.936 | 0.855 | 0.754 | 0.834
XB | 0.277 | 0.0 | 0.88 | 0.224 | 0.095 | 0.00 | 0.068 | 0.042 | 0.628
K | 555 | 22 | 377 | 449 | 9 | 202 | 39 | 87 | 300
B_rit | 7.87 | 0.85 | 9.86 | 0.77 | 8.4 | 8.79 | 8.59 | 8.39 | 7.42
S | 1.000 | 0.69 | 0.69 | 0.485 | 0.334 | 0.32 | 0.26 | 0.220 | 1.000

3.1 Preprocessing for the GA

We need chromosomes to perform the three GA steps. We employ five popularly used databases, namely auto-mpg [8], bupa [9], cmc [10], iris [11], and wine [12] (Table 2), for the GA optimization. Half of the data set in each database is used for training, and the other half is used for testing.

3.2 Deciding the fitness function

After considering practical implementation of the GA, and based on the indices described in Secs. 2.1 to 2.6, we propose the genetic-based index for data clustering. The fitness function is

gene(U, V; X) = \alpha \cdot \frac{INTRA(k)}{MSD_t} + \beta \cdot \frac{\max_{i,j} d(i, j)}{\min_{i \neq j} d(i, j)}    (13)

The first term expresses the compactness, with

INTRA(k) = \frac{1}{n} \sum_{t=1}^{k} \sum_{x \in c_t} \|x - \nu_t\|^2    (14)

MSD_t = \frac{1}{n_t} \sum_{x \in c_t} \|x - \nu_t\|^2    (15)

In the second term, d(i, j) is the same distance as defined in Eq. (6). The weighting factors α and β act as the output after GA training. The goal of the optimization is to minimize the fitness function; under the best condition, the fitness value reaches 0.

3.3 Procedures in GA training

The GA procedures for the optimized cluster validity index are described as follows.

Fig. 1: The two-dimensional, 2000-point, 9-cluster database My_sample.

Step 1. Producing the chromosomes: 40 chromosomes are produced. Each chromosome denotes the weighting factors in the fitness function, i.e., (α_i, β_i), 1 ≤ i ≤ 40. Because the fitness function is composed of two opposing conditions, only the ratio between the two weights matters; we set 0 ≤ α_i, β_i ≤ 1.
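A runnable sketch of the fitness function in Eq. (13) and of the training procedure described in Step 1 above and Steps 2 to 5 below is given here. It is an illustration under our reading of the garbled Eqs. (14)-(15) (overall within-cluster scatter INTRA over the mean per-cluster scatter MSD); the function names, the crisp-label interface, and the pairing logic are our own assumptions, not the authors' code.

```python
import random
import numpy as np

def gene_fitness(X, labels, V, alpha, beta):
    """Eq. (13): alpha * INTRA(k)/MSD_t + beta * max d(i,j)/min d(i,j),
    with INTRA and MSD as in our reading of Eqs. (14)-(15)."""
    k = V.shape[0]
    scatter = [np.sum((X[labels == t] - V[t]) ** 2, axis=1) for t in range(k)]
    intra = sum(s.sum() for s in scatter) / len(X)        # Eq. (14)
    msd = np.mean([s.mean() for s in scatter])            # Eq. (15), averaged over t
    d = np.linalg.norm(V[:, None] - V[None], axis=2)      # center distances, Eq. (6)
    iu = np.triu_indices(k, 1)
    return alpha * intra / msd + beta * d[iu].max() / d[iu].min()

def ga_train(fitness, pop_size=40, iters=1000):
    """Steps 1-5: random (alpha, beta) chromosomes in [0, 1]; keep the better
    half, crossover by swapping weights within 5 random pairs, mutate the
    remaining 10 by re-randomizing alpha (5 of them) or beta (the other 5)."""
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(iters):
        pop.sort(key=fitness)                             # Step 2: selection
        if fitness(pop[0]) == 0:                          # Step 5: early stop
            break
        survivors = pop[:pop_size // 2]
        random.shuffle(survivors)
        children = []
        for (a1, b1), (a2, b2) in zip(survivors[0:10:2], survivors[1:10:2]):
            children += [(a1, b2), (a2, b1)]              # Step 3: crossover
        for i, (a, b) in enumerate(survivors[10:20]):     # Step 4: mutation
            children.append((random.random(), b) if i < 5 else (a, random.random()))
        pop = survivors + children
    return min(pop, key=fitness)                          # best (alpha, beta)
```

Since ga_train expects only a callable, gene_fitness would be wrapped with a fixed FCM partition per candidate cluster number during training.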

Table 2: The five databases used in this paper.

Training database | # of data points | Testing database | # of data points
auto-mpg_train | 196 | auto-mpg_test | 196
bupa_train | 173 | bupa_test | 172
cmc_train | 737 | cmc_test | 736
iris_train | 75 | iris_test | 75
wine_train | 89 | wine_test | 89

Fitness values are calculated from the training databases in Table 2. At the beginning of the first iteration, the chromosome values are set randomly; during training, they are modified based on the output of the previous iteration.

Step 2. Selecting the better chromosomes: all 40 chromosomes are evaluated with the fitness function and the corresponding fitness scores are calculated. The 20 chromosomes with the smaller fitness values are kept for use in the next iteration, and the other 20 are discarded. The 20 new chromosomes for the next iteration are produced by crossover and mutation from the 20 that remain.

Step 3. Crossover of chromosomes: from the 20 remaining chromosomes, we randomly choose 10 and gather them into 5 pairs to perform the crossover operation. By swapping the α or β values of every pair, 10 new chromosomes are produced.

Step 4. Mutation of chromosomes: the 10 chromosomes not chosen in Step 3 are used in this step. The α values of the first five chromosomes are replaced by randomly generated new α values; the same operation is performed on the β values of the other five.

Step 5. The stopping condition: once the pre-determined number of iterations is reached, or the fitness value equals 0, training stops, and the weighting factors corresponding to the smallest fitness score in the final iteration, (α, β), are the output.

4. Simulation Results

After training the GA optimization of Sec. 3.3 for 1000 iterations, we obtain the optimized weighting factors (α, β) = (0.856, 0.0826). With these two values, we can compare the GA-optimized result with those of Secs. 2.1
to 2.6 by verifying the five test databases in Table 2. We present the detailed results for the auto-mpg database in Table 3, the bupa database in Table 4, the iris database in Table 5, and the wine database in Table 6, respectively.

Table 3: Index values from 2 to 10 clusters in seven different schemes. Shaded blocks show the correct results for the auto-mpg database.

index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.866 | 0.80 | 0.787 | 0.764 | 0.726 | 0.76 | 0.73 | 0.704 | 0.700
PE | 0.330 | 0.522 | 0.596 | 0.679 | 0.804 | 0.847 | 0.869 | 0.95 | 0.944
XB | 0.056 | 0.073 | 0.083 | 0.067 | 0.45 | 0.2 | 0.2 | 0.04 | 0.23
K | 1.3 | 5.30 | 8.2 | 5.74 | 35.53 | 33.5 | 36.00 | 32.70 | 4.44
B_rit | 3.57 | 8.20 | 6.94 | 6.47 | 9.2 | 9.89 | . | 1.2 | 3.24
S | 1.000 | 0.633 | 0.466 | 0.45 | 0.548 | 0.592 | 0.705 | 0.77 | 1.000
GI | 0.523 | 0.487 | 0.52 | 0.536 | 0.780 | 0.854 | 0.960 | 0.974 | 1.48

Table 4: Index values from 2 to 10 clusters in seven different schemes. Shaded blocks show the correct results for the bupa database.

index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.882 | 0.664 | 0.562 | 0.476 | 0.4 | 0.383 | 0.346 | 0.328 | 0.295
PE | 0.304 | 0.809 | 1.36 | 1.435 | 1.676 | 1.826 | 2.0 | 2.3 | 2.309
XB | 0.065 | 0.5 | 0.587 | 0.623 | 1.480 | 1.307 | 1.073 | 1.395 | 1.407
K | 1.64 | 94.6 | 0.2 | 8.5 | 284.2 | 256. | 22.8 | 27.9 | 282.5
B_rit | 59.03 | 46.69 | 45.75 | 47.62 | 67.55 | 83.79 | 49.4 | 56.06 | 63.48
S | 1.000 | 0.78 | 0.67 | 0.555 | 0.702 | 0.786 | 0.699 | 0.882 | 1.000
GI | 1.088 | 1.225 | 1.286 | 1.328 | 1.754 | 1.787 | 1.690 | 1.903 | 1.98

The numerical values in Table 3 show the results for the auto-mpg database, which has three clusters. We can see that only the proposed GA-based index gives the correct result. For the bupa, cmc, iris, and wine databases, similar results are obtained, and detailed comparisons can be found in Tables 4 to 7, respectively. In addition, from Table 8 we see that the proposed GI yields the correct cluster number for four of the five test databases. Compared with the other six indices, each of which yields only one correct cluster number, our scheme achieves better performance. Regarding the cmc database, none of the seven indices finds the correct cluster number.
5. Conclusion

In this paper, we discussed data clustering schemes and proposed a new cluster validity index based on the GA. The GI index outperforms the six existing indices from the literature. However, the clustering results for some databases are still not correct even after GA training, and this motivates our future research.

Acknowledgments

This work is partially supported by the National Science Council (Taiwan) under grant NSC95-228-E-005-034.

Table 5: Index values from 2 to 10 clusters in seven different schemes. Shaded blocks show the correct results for the cmc database.

index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.809 | 0.704 | 0.597 | 0.528 | 0.474 | 0.423 | 0.378 | 0.342 | 0.32
PE | 0.459 | 0.773 | 1.089 | 1.323 | 1.523 | 1.723 | 1.905 | 2.066 | 2.89
XB | 0.096 | 0.25 | 0.97 | 0.222 | 0.23 | 0.296 | 0.388 | 0.604 | 0.539
K | 70.86 | 92.96 | 46.9 | 65.7 | 73.6 | 223. | 293.0 | 458.3 | 40.4
B_rit | 8.57 | 3.26 | 1.96 | 3.35 | 3.37 | 7.26 | 6.77 | 9.7 | 22.55
S | 1.000 | 0.580 | 0.452 | 0.428 | 0.440 | 0.54 | 0.664 | 0.935 | 1.000
GI | 0.67 | 0.595 | 0.660 | 0.72 | 0.77 | 0.866 | 0.990 | 1.24 | 1.206

Table 6: Index values from 2 to 10 clusters in seven different schemes. Shaded blocks show the correct results for the iris database.

index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.888 | 0.790 | 0.738 | 0.678 | 0.60 | 0.584 | 0.562 | 0.538 | 0.535
PE | 0.290 | 0.559 | 0.736 | 0.933 | 1.08 | 1.26 | 1.337 | 1.435 | 1.486
XB | 0.058 | 0.5 | 0.60 | 0.265 | 0.36 | 0.549 | 0.239 | 0.227 | 0.289
K | 4.622 | 9.920 | 4.72 | 25.25 | 33.2 | 6.43 | 26.73 | 28.05 | 36.45
B_rit | 8.46 | 2.3 | 0.40 | 0.77 | 7.03 | 2.2 | 6.08 | 6.80 | 6.53
S | 1.000 | 0.724 | 0.598 | 0.628 | 0.695 | 0.907 | 0.700 | 0.832 | 1.000
GI | 0.442 | 0.50 | 0.602 | 0.755 | 0.887 | 1.47 | 0.88 | 0.953 | 1.09

Table 7: Index values from 2 to 10 clusters in seven different schemes. Shaded blocks show the correct results for the wine database.

index | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10
PC | 0.868 | 0.783 | 0.772 | 0.746 | 0.75 | 0.784 | 0.786 | 0.760 | 0.738
PE | 0.328 | 0.572 | 0.636 | 0.720 | 0.738 | 0.663 | 0.677 | 0.764 | 0.830
XB | 0.067 | 0.4 | 0.0 | 0.08 | 0.23 | 0.07 | 0.097 | 0.209 | 0.26
K | 6.264 | 3.8 | 1.28 | 1.00 | 8.96 | 4.83 | 22.75 | 50.47 | 67.97
B_rit | 22.85 | 4.7 | 1.64 | 9.69 | 1.0 | 0.06 | 2.82 | 9.8 | 22.89
S | 1.000 | 0.672 | 0.569 | 0.43 | 0.406 | 0.357 | 0.46 | 0.772 | 1.000
GI | 0.570 | 0.566 | 0.605 | 0.64 | 0.84 | 0.828 | 1.06 | 1.594 | 1.896

Table 8: Comparison of the cluster numbers found by the seven indices on the five test databases. Our scheme performs best.

Database | Original clusters | PC | PE | XB | K | B_rit | S | GA (GI)
auto-mpg | 3 | 2 | 2 | 2 | 2 | 5 | 5 | 3
bupa | 2 | 2 | 2 | 2 | 2 | 5 | 5 | 2
cmc | 3 | 2 | 2 | 2 | 2 | 4 | 4 | 2
iris | 3 | 2 | 2 | 2 | 2 | 4 | 5 | 3
wine | 3 | 2 | 2 | 2 | 2 | 5 | 7 | 3

References

[1] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Boston, MA: Kluwer, 1989.
[2] J.C. Bezdek, R. Ehrlich, and W. Full, "FCM: The Fuzzy C-Means Clustering Algorithm", Computers and Geosciences, Vol. 10, No. 2-3, 1984, pp. 191-203.
[3] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York, NY: Plenum, 1981.
[4] X.L. Xie and G. Beni, "A validity measure for fuzzy clustering", IEEE Trans. Pattern Anal. Machine Intell., Vol. 13, No. 8, 1991, pp. 841-847.
[5] S.H. Kwon, "Cluster validity index for fuzzy clustering", Electronics Letters, Vol. 34, No. 22, 1998, pp. 2176-2177.
[6] A.-O. Boudraa, "Dynamic estimation of number of clusters in data sets", Electronics Letters, Vol. 35, No. 19, 1999, pp. 1606-1608.
[7] D.-J. Kim, Y.-W. Park, and D.-J. Park, "A novel validity index for determination of the optimal number of clusters", IEICE Trans. Inf. & Syst., Vol. E84-D, No. 2, 2001, pp. 281-285.
[8] R. Quinlan, "Auto-mpg data", ftp://ftp.ics.uci.edu/pub/machine-learning-databases/auto-mpg/, 1993.
[9] BUPA Medical Research Ltd, "BUPA liver disorders", ftp://ftp.ics.uci.edu/pub/machine-learning-databases/liver-disorders/, 1990.
[10] T.S. Lim, "Contraceptive method choice", ftp://ftp.ics.uci.edu/pub/machine-learning-databases/cmc/, 1999.
[11] R.A. Fisher, "Iris plants database", ftp://ftp.ics.uci.edu/pub/machine-learning-databases/iris/, 1988.
[12] S. Aeberhard, "Wine recognition data", ftp://ftp.ics.uci.edu/pub/machine-learning-databases/wine/, 1998.