A Novel Validity Index for Determination of the Optimal Number of Clusters

IEICE TRANS. INF. & SYST., VOL.E84-D, NO.2 FEBRUARY 2001, p.281

LETTER

Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

SUMMARY The structural characteristics of clusters are investigated in the partitioning process. Two partition functions, which show opposite properties around the optimal cluster number, are found, and a new cluster validity index is presented based on the combination of these functions. Some properties of the index function are discussed and numerical examples are presented.

key words: clustering, validity index, optimal cluster number

1. Introduction

The hard c-means algorithm (HCMA) and the fuzzy c-means algorithm (FCMA) are well known for their efficiency in clustering large data sets [2]. Although these algorithms require several parameters, the one that affects performance most is the number of clusters $c$. Different choices of $c$ may lead to different clustering results. Thus, the estimation of the optimal cluster number ($c^*$) during the clustering process is a prime concern. Many functions, called cluster validity indices or validity criteria, have been proposed in the literature to find the optimal number of clusters. The partition coefficient ($v_{PC}$) and the partition entropy ($v_{PE}$), which use the partition matrix, were introduced by Bezdek [2]. Other criteria, which take into account the geometric properties of the input data, were proposed by Fukayama and Sugeno ($v_{FS}$) [3] and Xie and Beni ($v_{XB}$) [4]. The indices $v_{PC}$ and $v_{PE}$ are sensitive to noise and to the weighting exponent $m$, and $v_{FS}$ is sensitive to both high and low values of $m$. Moreover, $v_{PC}$, $v_{PE}$, and $v_{FS}$ are of no use for HCMA. According to Pal and Bezdek's analysis [5], the index $v_{XB}$ provided a good response over a wide range of choices, both for $c = 2$ to 10 and for $m = 1.01$ to 7. However, $v_{XB}$ decreases monotonically as the number of clusters becomes very large and close to the number of data $n$.
To eliminate the monotonically decreasing tendency, Kweon [6] added an ad hoc punishing term and proposed a new validity index ($v_K$). Recently, another approach based on a dynamic estimation method ($v_{crit}$) was presented by Boudraa [7]. To overcome some limitations of the previous studies, we present a new validity criterion that considers the structural characteristics around the optimal cluster number $c^*$ in the partitioning process. It shows a clear valley at $c = c^*$ and eliminates the decreasing tendency for large $c$. It is also applicable to both HCMA and FCMA.

Manuscript received July 31, 2000. Manuscript revised October 10, 2000.
The authors are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, Republic of Korea.
The authors are with the Agency for Defence Development, Yusong P.O.Box 35-1, Taejon, Republic of Korea.

2. Cluster Structure Related Functions

Clustering algorithms based on an objective function (FCMA and HCMA) minimize the sum of the intra-cluster distances during optimization. Figure 1 shows a simple partitioning process for $c = 2$ to 4. The data consist of three compact classes ($c^* = 3$), $v_i$ is the prototype associated with the $i$-th cluster, and each cluster is distinguished by a different marker. In this paper, clusters are said to be in the under-partitioned state when $c < c^*$ and in the over-partitioned state when $c > c^*$. In addition, the mean intra-cluster distance (MICD) of the $i$-th cluster is defined as $MD_i = \sum_{x \in \chi_i} \|x - v_i\| / n_i$, where $\chi_i$ is the set of data in the $i$-th cluster and $n_i$ is the number of data in the $i$-th cluster. When the data are structurally under-partitioned, as shown in Fig. 1(a), at least one cluster maintains a large MICD. As the partition state moves to the optimal and over-partitioned ones ($c \ge c^*$), the large MICD abruptly decreases.
On the other hand, the inter-cluster minimum distance (ICMD), defined as $d_{min} = \min_{i \neq j} \|v_i - v_j\|$ [4], is large when the data are under-partitioned or optimally partitioned. As the state enters the over-partitioned one, ICMD becomes very small because at least one of the compact classes is subdivided, as shown in Fig. 1(c). Therefore, it is possible to find the optimal cluster number $c^*$ by using two measures, MICD and ICMD, each of which varies differently around $c^*$. That is, at least one of the MICDs changes abruptly at $c^*$, and so does ICMD at $c^* + 1$. A simple illustration of this tendency is shown in Fig. 2(a).

Fig. 1 Example of partitioning process. (a) $c = 2$. (b) $c = 3$. (c) $c = 4$.

Fig. 2 Illustrations of cluster related functions w.r.t. $c$. (a) MICD and ICMD. (b) $v_u(c)$ and $v_o(c)$.

Let $X = [x_1, x_2, \ldots, x_n]^T$ be a finite data set of a $p$-dimensional feature space, where $x_i$ is a $1 \times p$ vector. Let $V = [v_1, v_2, \ldots, v_c]^T$ be a $c \times p$ prototype matrix, where each $v_i$ is a $1 \times p$ vector that characterizes one of the clusters.

2.1 Under-Partition Measure Function

To detect the under-partitioned status, we define $v_u(c, V; X)$ as an under-partition measure function:

$$v_u(c, V; X) = \frac{1}{c} \sum_{i=1}^{c} MD_i, \qquad 2 \le c \le c_{max}. \tag{1}$$

It is the mean of the MICD over the $c$ clusters and measures the structural compactness of every class. When the data are optimally or over-partitioned, every class becomes compact, which makes $v_u(c)$ small. Furthermore, as the cluster number becomes very large and close to the number of data points $n$, the mean distance approaches 0. In the under-partitioned state, however, $v_u(c)$ becomes relatively large because some of the compact classes may be grouped into a single cluster. Therefore, this function produces a break point at the optimal cluster number: it has very small values for $c \ge c^*$ and relatively large values for $c < c^*$, as shown in Fig. 2(b). Thus, it plays a key role in determining whether under-partitioning has occurred in the partitioning process.

2.2 Over-Partition Measure Function

An over-partition measure function $v_o(c, V)$ is defined as follows:

$$v_o(c, V) = \frac{c}{d_{min}}, \qquad 2 \le c \le c_{max}. \tag{2}$$

The denominator of this function, $d_{min}$, which is the minimum distance between cluster centers, measures inter-cluster separation. When the data are optimally or under-partitioned, $d_{min}$ is large; hence $v_o(c)$ yields a small value.
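The two measure functions of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the use of hard labels to form each $\chi_i$ (for FCMA one would presumably assign each datum to its maximum-membership cluster), the treatment of empty clusters, and the toy data are all assumptions.

```python
import numpy as np


def under_partition_measure(X, labels, V):
    """v_u of Eq. (1): the mean, over the c clusters, of the MICD values MD_i."""
    c = V.shape[0]
    md = np.zeros(c)
    for i in range(c):
        members = X[labels == i]          # chi_i: data assigned to cluster i
        if len(members) > 0:              # assumption: an empty cluster contributes MD_i = 0
            md[i] = np.linalg.norm(members - V[i], axis=1).mean()
    return md.mean()


def over_partition_measure(V):
    """v_o of Eq. (2): c divided by d_min, the smallest distance between prototypes."""
    c = V.shape[0]
    pairwise = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)
    d_min = pairwise[np.triu_indices(c, k=1)].min()   # only pairs with i < j
    return c / d_min


# Hypothetical toy data: two compact groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
V = np.array([[0.0, 1.0], [10.0, 1.0]])

v_u = under_partition_measure(X, labels, V)   # every point lies 1 unit from its prototype
v_o = over_partition_measure(V)               # c = 2 and d_min = 10
print(v_u, v_o)
```

Splitting the second group into two prototypes one unit apart would shrink $d_{min}$ to 1 and raise $v_o$ to 3, which is the over-partition behavior the section describes.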
However, when the data are over-partitioned, $d_{min}$ becomes very small because some of the compact classes may be subdivided into several clusters. Therefore, this function also produces a break point at the optimal cluster number $c^*$: it has very large values for $c > c^*$ and relatively small values for $c \le c^*$, as shown in Fig. 2(b). Thus, it plays a key role in determining whether over-partitioning has occurred in the partitioning process.

3. Validity Index

As described in the previous section, both partition measure functions have break points at the optimal cluster number: $v_u(c)$ becomes small for $c \ge c^*$ and $v_o(c)$ becomes small for $c \le c^*$. Since both functions have small values only at $c = c^*$, an appropriate combination of the two readily yields the optimal number of clusters. However, the two functions have different scales, depending on the structure and number of the data. To accommodate this mismatch, we apply a normalization process. Let us define the partition measure vectors as

$$\mathbf{v}_u = [v_u(2, V; X), \ldots, v_u(c_{max}, V; X)], \tag{3}$$

$$\mathbf{v}_o = [v_o(2, V), \ldots, v_o(c_{max}, V)]. \tag{4}$$

For each vector, the maximum and minimum values are computed as

$$v_{max} = \max_{c} v_u(c, V; X), \quad v_{min} = \min_{c} v_u(c, V; X), \quad c = 2, 3, \ldots, c_{max}, \tag{5}$$

and each element is then normalized as

$$v_{un}(c, V; X) = \frac{v_u(c, V; X) - v_{min}}{v_{max} - v_{min}}. \tag{6}$$

Thus, $v_{un}(c)$ always lies between 0 and 1. The elements of $\mathbf{v}_o$ are normalized in the same way to give $v_{on}(c, V)$. Consequently, the normalized partition measure vectors are written as

$$\mathbf{v}_{un} = [v_{un}(2, V; X), \ldots, v_{un}(c_{max}, V; X)], \tag{7}$$

$$\mathbf{v}_{on} = [v_{on}(2, V), \ldots, v_{on}(c_{max}, V)]. \tag{8}$$

By adding the two normalized partition measure functions, a new validity index, $v_{SV}$, is formulated as follows:

$$v_{SV}(c, V; X) = v_{un}(c, V; X) + v_{on}(c, V). \tag{9}$$

The optimal cluster number is the $c$ with the smallest value of $v_{SV}(c)$ for $c = 2$ to $c_{max}$.

4. Experimental Results

Two kinds of experiments were performed to verify the proposed method. The first is intended to determine the optimal number of clusters for two noisy data sets, Data 1 and Data 2. The second shows how effectively $v_{SV}$ works for object extraction in real images. The proposed index, $v_{SV}$, is also compared with the other ones: $v_{PC}$, $v_{PE}$, $v_{FS}$, $v_{XB}$, $v_K$, and $v_{crit}$.

Fig. 3 Data 1 and results. (a) Input data. (b) Index functions.

Fig. 4 Data 2 and results. (a) Input data. (b) Index functions.

For each of the data sets, FCMA was performed with the weighting exponent $m = 2$ and a terminating condition $\epsilon = 10^{-5}$. An alternating optimization technique is used for the clustering, and the fuzzy partition matrix, rather than the prototypes, is chosen as the initial value ($U_0$). The partition matrix is randomly generated such that the fuzzy membership values satisfy the following conditions: $u_{ik} \in [0, 1]$ for all $i, k$, and $\sum_i u_{ik} = 1$ for all $k$.

In the first experiment, the cluster number is varied from 2 to 10 for both the low-noise data and the relatively high-noise data to find the optimal number of clusters. As shown in Figs. 3(a) and 4(a), the expected optimal cluster numbers are $c^* = 6$ for Data 1 and $c^* = 5$ for Data 2. Figures 3(b) and 4(b) show the validity related functions with respect to the cluster number. For the low-noise case (Data 1), the two functions $v_{un}(c)$ and $v_{on}(c)$ show a fast gradient change around $c = 6$. Naturally, the index function $v_{SV}(c)$ shows a steep valley at $c = 6$. Similar results are obtained for the relatively high-noise case (Data 2). We also observed that more noise in the data produces a less steep valley at the optimal cluster number. In other words, the steepness of the valley at $c = c^*$ indicates how compact and well separated the clusters are.

Table 1 shows the values of the seven validity indices for $c = 2$ to 10. We highlighted the optimal value of $c$ chosen by each index. For the noisy data sets, $v_{PC}$ or $v_{PE}$ gives incorrect numbers. Although $v_{FS}$ works fairly well for the two data sets, it is known that this index becomes unreliable for large or small $m$ [5].
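The normalization of Eqs. (5) and (6) and the selection rule attached to Eq. (9) can be sketched as follows. The function names, the handling of a constant vector, and the toy measure vectors are assumptions made for illustration; the two lists of $v_{SV}$ values are transcribed from Table 1.

```python
import numpy as np


def min_max_normalize(v):
    """Eqs. (5)-(6): rescale a partition measure vector to the range [0, 1]."""
    v = np.asarray(v, dtype=float)
    v_min, v_max = v.min(), v.max()
    if v_max == v_min:                    # assumption: a constant vector maps to zeros
        return np.zeros_like(v)
    return (v - v_min) / (v_max - v_min)


def sv_index(v_u, v_o):
    """Eq. (9): v_SV(c) = v_un(c) + v_on(c) for c = 2, ..., c_max."""
    return min_max_normalize(v_u) + min_max_normalize(v_o)


def optimal_cluster_number(v_sv, c_start=2):
    """Return the c with the smallest v_SV; entry 0 corresponds to c = c_start."""
    return c_start + int(np.argmin(v_sv))


# Hypothetical measures for c = 2, 3, 4: v_u drops at c = 3, v_o jumps at c = 4.
v_sv_toy = sv_index([4.0, 1.0, 1.0], [1.0, 1.0, 5.0])
c_toy = optimal_cluster_number(v_sv_toy)

# v_SV columns transcribed from Table 1 (c = 2, ..., 10).
V_SV_DATA1 = [1.00, 0.52, 0.32, 0.21, 0.09, 0.60, 0.46, 0.51, 1.00]
V_SV_DATA2 = [1.00, 0.54, 0.26, 0.16, 0.25, 0.29, 0.35, 0.81, 1.00]
c_data1 = optimal_cluster_number(V_SV_DATA1)
c_data2 = optimal_cluster_number(V_SV_DATA2)
```

Applied to the Table 1 values, the rule selects $c = 6$ for Data 1 and $c = 5$ for Data 2, in agreement with the cluster numbers expected from Figs. 3(a) and 4(a).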
$v_{XB}$ and $v_K$ also work well in these experiments; however, they have a decreasing tendency when the number of clusters becomes very large. The punishing term added by Kweon [6] is not effective in eliminating the decreasing tendency, since this ad hoc term becomes relatively small for a large data set according to the following relation:

$$\sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{2} \, \|x_j - v_i\| \gg \frac{1}{c} \sum_{i=1}^{c} \|v_i - \bar{v}\|. \tag{10}$$

Both $v_{SV}$ and $v_{crit}$ indicate the optimal cluster number correctly and provide two advantages over the other methods. First, the index functions do not decrease when $c$ becomes large. Second, these indices show a steep valley at the optimal cluster number. We also observed that the proposed index shows a steeper valley at the optimal cluster number than $v_{crit}$. Moreover, its values remain roughly within 0 to 1 regardless of the number and structure of the data.

The second experiment shows how the proposed index works when applied to object extraction from a real image. The image is taken from the Hamburg Taxi sequence, as shown in Fig. 5(a). The image contains two cars: one, a taxi, is moving toward the upper left, and the other is parked near the taxi. The objective is to extract each car from the background image using some of the features. We choose a segmented image as the first feature and motion vectors as the second, as shown in Figs. 5(b) and (c). Otsu's method [1] was used for the image segmentation, and the result is a binary image that has 1 for the pixels classified as object and 0 for the background. The motion vectors are computed by the block matching algorithm with a block size of $4 \times 4$. The selected features in this experiment are expressed as

$$X = [x_1, x_2, \ldots, x_n]^T, \tag{11}$$

where $x_i = [s_i, m_{xi}, m_{yi}]$, and $s_i$, $m_{xi}$, and $m_{yi}$ are the segmentation label and the $x$- and $y$-directional motions of the $i$-th pixel, respectively.

The optimal cluster numbers found by the seven validity indices for $c = 2$ to 5 are shown in Table 2. Only two indices, $v_{crit}$ and $v_{SV}$, find the correct cluster number $c = 3$.
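The behavior of the seven indices on the image data can be checked directly from the values in Table 2. The sketch below tests each column for strict monotonicity over $c = 2$ to 5 and reports where the non-monotonic columns reach their minimum. The helper name and the dictionary layout are assumptions; the numbers are transcribed from Table 2.

```python
def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


# Columns of Table 2, listed for c = 2, 3, 4, 5.
TABLE2 = {
    "v_PC":   [0.9639, 0.9821, 0.9955, 0.9997],
    "v_PE":   [0.0833, 0.0550, 0.0133, 0.0008],
    "v_FS":   [-1412.1284, -1912.4026, -2192.3681, -2250.0552],
    "v_XB":   [0.0284, 0.0088, 0.0021, 0.0001],
    "v_K":    [198.2599, 62.7172, 16.8974, 3.2486],
    "v_crit": [13.0161, 7.4593, 11.9804, 12.0756],
    "v_SV":   [1.0000, 0.4959, 0.7985, 1.0000],
}

monotonic = {name: is_monotonic(col) for name, col in TABLE2.items()}

# For the non-monotonic indices, the c at which the column is smallest.
valley = {name: 2 + col.index(min(col))
          for name, col in TABLE2.items() if not monotonic[name]}
print(monotonic)
print(valley)
```

Only the $v_{crit}$ and $v_{SV}$ columns fail the monotonicity test, and both reach their minimum at $c = 3$.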
The other indices decrease or increase monotonically for $c = 2$ to 5. Hence, the clustering algorithm with $c = 3$ will provide the best results, in that all clusters are structurally compact and well separated. Figures 6(a)-(c) are the clustering results, which show the objects extracted from the input image.

Fig. 5 Input image and features.

Fig. 6 Clustering results.

Table 1 Performance comparison.

Data 1
 c    v_PC   v_PE    v_FS    v_XB    v_K     v_crit   v_SV
 2    0.68   0.69    32.78   0.15   147.76   66.64    1.00
 3    0.61   0.97   -24.14   0.07    72.14   38.92    0.52
 4    0.63   1.03   -57.02   0.04    46.22   28.08    0.32
 5    0.67   0.99   -80.34   0.03    36.50   22.45    0.21
 6    0.69   1.01   -83.45   0.02    27.28   16.72    0.09
 7    0.63   1.18   -78.92   0.07    71.55   58.46    0.60
 8    0.60   1.31   -76.49   0.05    51.58   43.19    0.46
 9    0.57   1.40   -75.58   0.04    44.32   44.22    0.51
10    0.54   1.51   -70.08   0.05    54.45   74.05    1.00

Data 2
 c    v_PC   v_PE    v_FS    v_XB    v_K     v_crit   v_SV
 2    0.69   0.67    36.07   0.14   179.31   48.46    1.00
 3    0.62   0.96   -26.01   0.08   100.63   30.39    0.54
 4    0.66   0.94   -92.67   0.04    52.20   19.98    0.26
 5    0.64   1.05   -99.00   0.03    47.36   17.41    0.16
 6    0.59   1.26   -95.24   0.04    58.49   22.69    0.25
 7    0.55   1.41   -91.80   0.04    53.14   23.75    0.29
 8    0.53   1.52   -89.80   0.03    49.68   25.17    0.35
 9    0.50   1.59   -87.49   0.04    60.47   49.58    0.81
10    0.49   1.68   -85.88   0.04    56.51   55.61    1.00

Table 2 Test results of the image.

 c    v_PC     v_PE      v_FS         v_XB      v_K        v_crit    v_SV
 2    0.9639   0.0833   -1412.1284    0.0284   198.2599   13.0161   1.0000
 3    0.9821   0.0550   -1912.4026    0.0088    62.7172    7.4593   0.4959
 4    0.9955   0.0133   -2192.3681    0.0021    16.8974   11.9804   0.7985
 5    0.9997   0.0008   -2250.0552    0.0001     3.2486   12.0756   1.0000

5. Conclusion

We investigated the structural characteristics of the clusters in the partitioning process and developed an efficient validity index by combining the two partition functions, i.e., an over-partition measure function and an under-partition measure function. The proposed index was successfully applied to two numerical data sets and a real image. It provided enhanced performance compared with the previous studies. Above all, $v_{SV}$ showed the steepest valley at the optimal cluster number for the three different data sets and did not decrease for large $c$. In addition, since only the structural characteristics are used instead of the partition matrix, this validity index works effectively not only for HCMA but also for FCMA.

References

[1] N. Otsu, "A threshold selection method from gray-level histogram," IEEE Trans. Syst., Man. & Cybern., vol.SMC-9, no.1, pp.62-66, 1979.
[2] J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, New York, 1981.
[3] Y. Fukayama and M. Sugeno, "A new method of choosing the number of clusters for the fuzzy c-means method," Proc. 5th Fuzzy Syst. Symp., pp.247-250, 1989.
[4] N.L. Xie and G.A. Beni, "A validity measure for fuzzy clustering," IEEE Trans. PAMI, vol.13, no.8, pp.841-847, 1991.
[5] N.R. Pal and J.C. Bezdek, "On cluster validity for the fuzzy c-means model," IEEE Trans. Fuzzy Syst., vol.3, no.3, pp.370-379, 1995.
[6] S.H. Kweon, "Cluster validity index for fuzzy clustering," Electron. Lett., vol.34, no.22, pp.2176-2177, 1999.
[7] A.O. Boudraa, "Dynamic estimation of number of clusters in data sets," Electron. Lett., vol.35, no.19, pp.1606-1607, 1999.