Rough Set based Cluster Ensemble Selection


Xueen Wang, Deqiang Han, Chongzhao Han
Ministry of Education Key Lab for Intelligent Networks and Network Security (MOE KLINNS Lab), Institute of Integrated Automation, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China

Abstract: Ensemble clustering, which combines several base data partitions into a single consensus partition, has been attracting much attention for its improved stability and robustness. Diversity is critical to the success of ensemble clustering. To enhance this characteristic, a subset of the cluster ensemble is selected by removing redundant partitions. Combined with ranking and forward-selection search strategies, the significance of attribute defined in rough set theory is employed as a heuristic to find the subset of the cluster ensemble. Experimental results on data sets from the UCI machine learning repository demonstrate that the proposed algorithms are feasible and effective.

Keywords: ensemble clustering; rough set; feature selection; attribute significance

I. INTRODUCTION

Typically, a cluster ensemble framework produces a large set of clustering results and then combines them with a consensus function to create a single consensus partition with improved stability and robustness. Ensemble clustering has proved to be a good alternative when facing cluster analysis problems [1]. Many studies have been done to improve the performance of ensemble clustering, focusing on the generation of multiple clusterings/partitions [2], the consensus function [3-7], and the selection of the cluster ensemble [8-11]. The technique of cluster ensemble selection forms an ensemble from a subset of the base partitions. Much research has demonstrated that choosing a subset of base partitions to form a smaller cluster ensemble can perform as well as, or better than, using all available partitions [8, 12, 13].
By treating each base partition as a feature/attribute, cluster ensemble selection is transformed into an unsupervised feature selection problem. The diversity among different partitions is crucial for ensemble clustering: it is a necessary condition for improving performance, and it is also critical for selecting the partitions to be combined [11]. However, diversity is hard to define clearly in unsupervised learning. The normalized mutual information (NMI) and the adjusted Rand index (ARI) are commonly employed in the literature to measure the diversity or quality of partitions. Zhou and Tang proposed a selection approach using NMI weights [12]. Hadjitodorov et al. selected the ensemble corresponding to the median ARI-based diversity [13]. Fern and Lin designed three ensemble selection methods based on quality and diversity [14]. Hong et al. introduced a selective clustering ensemble method using a resampling technique [9]. Jia et al. proposed a selective spectral clustering ensemble method based on the bagging technique [8]. A direct way to enlarge the diversity of a cluster ensemble is to remove redundant features, and rough set theory is an effective tool for doing so. Generally, a significance measure of attributes is defined and used as the heuristic in rough set based attribute reduction. The significance of an attribute represents its discernibility power in both supervised and unsupervised cases. In this paper, we employ the significance of attribute as a heuristic for unsupervised feature selection and propose two algorithms: a ranking based feature selection algorithm and a forward feature selection algorithm. Redundant features may be retained in the subset obtained by the ranking based method, whereas the forward method selects a subset with no redundant features.
The remainder of this paper is organized as follows. Background knowledge of ensemble clustering and fundamentals of rough set theory are presented in Section II. The proposed cluster ensemble selection algorithms are presented in Section III. In Section IV, the experimental results on the UCI data repository are given. The last section concludes this paper.

II. BACKGROUND AND RELATED WORKS

A. Basics of Ensemble Clustering

The representation of ensemble clustering is reviewed in this section. Let X = {x_i}, i = 1, ..., n, denote a set of n objects/samples/points in R^p, where p is the dimension of the attributes (features) associated with each object; let P = {P_1, ..., P_l} be a cluster ensemble with l component (base) partitions/clusterings generated by individual clustering procedures; and let P_X be the set of all possible partitions of the object set X (P ⊂ P_X). The goal of ensemble clustering is to combine the multiple partitions in P into a single data partition (i.e., a consensus partition) P* ∈ P_X, which can better represent the properties of each partition in P [1]. Generally, there are two crucial steps in developing an ensemble clustering algorithm: 1) generating multiple partitions of the data; 2) producing the consensus partition (i.e., the consensus function). The individual clusterings can be generated in many ways, such as changing the initialization or other parameters of the clustering algorithm [4, 6], randomly selecting the number of clusters for each component [4, 15], or using different feature/sample subsets for clustering [2, 16, 17]. For the second step, finding the best consensus partition has been proved to be an NP-hard problem [18]. Many approaches have been proposed from various perspectives (e.g., cluster-based, statistical, and combinatorial) [4, 6, 19].

B. Basics of Rough Set Theory

The basic definitions and notions of rough set theory are given in this subsection [20]. An information system, defined by Pawlak for data representation, is a tuple

    S = (U, C, D, V, f)    (1)

where U, the universe of discourse, is a non-empty finite set of objects; C is a finite set of condition attributes; D is a finite set of decision attributes; V = ∪ V_a over a ∈ C ∪ D, where V_a is the domain of attribute a; and f : U × (C ∪ D) → V is the total decision function such that f(x, a) = v_a ∈ V_a for x ∈ U, a ∈ C ∪ D. For any subset A ⊆ C ∪ D, an equivalence relation (also called an indiscernibility relation) can be defined as

    IND(A) = {(x, y) ∈ U² : f(x, a) = f(y, a), ∀a ∈ A}    (2)

A partition of U can be obtained from IND(A), denoted simply U/IND(A). Every block of the partition is an equivalence class, denoted [x]_A = {y ∈ U : (x, y) ∈ IND(A)}.

For every X ⊆ U, the lower and upper approximations of X with respect to A are

    A_(X) = {x ∈ U : [x]_A ⊆ X}      (lower approximation)
    A‾(X) = {x ∈ U : [x]_A ∩ X ≠ ∅}  (upper approximation)    (3)

The objects in the lower approximation can be certainly classified into X using A; the objects in the upper approximation can only be possibly classified into X using A. The pair (A_(X), A‾(X)) is referred to as the Pawlak rough set of X with respect to A. Rough set theory has been studied from the algebra viewpoint and the information viewpoint respectively [21]. An attribute subset R ⊆ C is an information reduct of C iff it satisfies the following conditions:

    H(R) = H(C)
    H(R - {a}) ≠ H(C), ∀a ∈ R    (4)

where H(A) is the information entropy of the partition U/IND(A) produced by attribute set A.
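These constructions (the partition U/IND(A), the lower and upper approximations, and the entropy used to define a reduct) can be sketched in Python; the toy information system below is invented for illustration and is not data from the paper.

```python
from collections import defaultdict
from math import log2

# Toy information system: each object is a dict of attribute -> value.
U = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
]

def partition(objects, attrs):
    """U/IND(A): group object indices by their values on attrs."""
    blocks = defaultdict(list)
    for i, x in enumerate(objects):
        blocks[tuple(x[a] for a in attrs)].append(i)
    return [set(b) for b in blocks.values()]

def approximations(objects, attrs, X):
    """Pawlak lower/upper approximations of X with respect to IND(attrs)."""
    lower, upper = set(), set()
    for block in partition(objects, attrs):
        if block <= X:       # block entirely inside X -> certainly in X
            lower |= block
        if block & X:        # block overlaps X -> possibly in X
            upper |= block
    return lower, upper

def entropy(objects, attrs):
    """H(A) = -sum p(X_i) log2 p(X_i) over the blocks of U/IND(A)."""
    n = len(objects)
    return -sum((len(b) / n) * log2(len(b) / n)
                for b in partition(objects, attrs))

low, up = approximations(U, ["a"], {0, 1})
print(low, up, entropy(U, ["a", "b"]))
```

On this toy system the concept {0, 1} coincides with an equivalence class of attribute "a", so its lower and upper approximations agree, and H({a, b}) exceeds H({a}) because adding "b" refines the partition.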
The probability distribution over U/IND(A) can be denoted as

    [U/A : p] = ( X_1      X_2     ...  X_{n_A}
                  p(X_1)   p(X_2)  ...  p(X_{n_A}) )    (5)

where p(X_i) = |X_i| / |U| for i = 1, ..., n_A. The information entropy of A can be defined as [22]

    H(A) = - Σ_{i=1}^{n_A} p(X_i) log₂ p(X_i)    (6)

From the information viewpoint, the significance of an attribute a in A can be defined as [23]

    Sig(a, A) = H(A) - H(A - {a})    (7)

The significance measures the increment of discernibility power introduced by attribute a, and it also works in unsupervised feature selection.

III. ROUGH SET BASED CLUSTER ENSEMBLE SELECTION

In this section, we first give the framework of the selective ensemble clustering method in Fig. 1. There are three steps for obtaining a consensus partition: generating a cluster ensemble that contains a number of diverse component (base) partitions; selecting a subset of all component partitions; and combining the selected partitions to generate a consensus partition (consensus function). In this paper, we focus on the selection of component partitions.

Fig. 1. The framework of selective ensemble clustering

A. Generation of the cluster ensemble

Diversity is crucial for improving the performance of ensemble learning. In this paper, the k-means (KM) clustering algorithm is the base learner. Diverse component partitions are generated in two ways: 1) random initialization of k-means: the KM algorithm is sensitive to initialization, which helps generate diverse component partitions; 2) random selection of feature subsets: each feature subset describes the data from a partial view, which enlarges the diversity of the component partitions.

B. Consensus functions

The consensus function is the main step in any ensemble clustering algorithm. In the experiments, we select three popular consensus functions to combine the selected partitions: the cluster-based similarity partitioning algorithm (CSPA), the meta-clustering algorithm (MCLA), and the evidence accumulation method (EAC). CSPA and MCLA are graph-based methods proposed by Strehl and Ghosh [3]; EAC is a hierarchical agglomerative approach based on the co-association matrix, proposed by Fred and Jain [4].

C. Selection of component partitions

The task of ensemble selection is to select a subset of partitions that forms a smaller yet better-performing cluster ensemble than using all available partitions [14]. Having obtained the cluster ensemble, each component partition can be treated as a nominal attribute (feature), and the problem turns into an unsupervised feature selection problem. There are two basic issues in any approach to feature selection: the evaluation measure and the search strategy. Diversity and quality are claimed to be critical for selecting the partitions to be combined [11]; however, both are loosely defined concepts in clustering. In rough set theory, the significance of attribute(s) represents the corresponding discernibility power, which can be treated as a measure of quality for clustering; high classification accuracy is often associated with high attribute significance. In this paper, significance is employed as the evaluation measure, and we give two feature selection algorithms based on different search strategies.

First, we give a ranking based feature selection algorithm that employs the significance of each single feature with respect to P. Feature ranking (weighting) is a simple and widely used search strategy which ranks all features in descending or ascending order of the evaluation measure; a user-defined parameter Num specifies the number of features to be selected. The framework of the approach is given in Fig. 2.

Input: the generated collection of partitions P; the number of features to be selected, Num.
Output: the selected subset of partitions
1: Compute the partition U/IND(P) and its information entropy H(P)
2: For i = 1 : l
3:   Compute the partition U/IND(P - P_i) and its information entropy H(P - P_i)
4:   Calculate the significance Sig(P_i, P) of P_i with respect to P
5: End For
6: Sort the partitions in descending order of Sig(P_i, P)
7: Select the first Num features as the reduced subset

Fig. 2. Ranking based feature selection algorithm

As is well known, ranking based methods cannot handle feature redundancy: redundant features may be retained in the selected subset, which reduces its diversity. In supervised feature selection, a forward selection method is generally employed, which can select a feature subset with no redundant features. Likewise, we give a forward selection method using the significance of attribute(s) as a heuristic. The forward selection approach starts with an empty subset and incrementally adds the features that yield an increase in the significance value. The procedure stops when adding features no longer increases the significance, i.e., when H(R) = H(P); an information reduct has then been found. There is an alternative stopping criterion to end the procedure, defined as

    H(R) ≥ H(P) + log₂(1 - ε)    (8)

where R is the selected subset of features and the parameter ε ∈ [0, 1) can be defined as the discernibility error rate of R with respect to P. Such an R is called an (H, ε)-approximate information reduct [24]. An information reduct is obtained when ε is set to zero, i.e., there is no error between P and R. By setting ε to a small value, we can discard features that carry little or no information beyond the selected subset. The framework of the forward selection approach is shown in Fig. 3.
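Driven by the entropy-based significance of Eq. (7), both search strategies (the ranking of Fig. 2 and the forward selection just described) can be sketched in Python; the toy ensemble below is invented for illustration.

```python
from collections import defaultdict
from math import log2

def blocks(partitions, idx, n):
    """Block sizes of U/IND(P'): objects grouped by their joint cluster
    labels across the base partitions indexed by idx."""
    groups = defaultdict(int)
    for x in range(n):
        groups[tuple(partitions[i][x] for i in idx)] += 1
    return groups.values()

def H(partitions, idx, n):
    """Information entropy of the partition induced by the sub-ensemble."""
    return -sum((c / n) * log2(c / n) for c in blocks(partitions, idx, n))

def sig_ranking(partitions, num):
    """Fig. 2 sketch: rank partitions by Sig(P_i, P) = H(P) - H(P - P_i)."""
    n, l = len(partitions[0]), len(partitions)
    h_all = H(partitions, range(l), n)
    sig = [h_all - H(partitions, [j for j in range(l) if j != i], n)
           for i in range(l)]
    return sorted(range(l), key=lambda i: sig[i], reverse=True)[:num]

def sig_ufs(partitions, eps=0.05):
    """Fig. 3 sketch: forward selection until H(R) >= H(P) + log2(1 - eps)."""
    n, l = len(partitions[0]), len(partitions)
    target = H(partitions, range(l), n) + log2(1 - eps)
    R = []
    while H(partitions, R, n) < target:
        rest = [i for i in range(l) if i not in R]
        if not rest:
            break
        best = max(rest, key=lambda i: H(partitions, R + [i], n))
        if H(partitions, R + [best], n) <= H(partitions, R, n):
            break  # no remaining partition increases the significance
        R.append(best)
    return R

# Toy ensemble: 3 base partitions of 6 objects (entries are cluster labels);
# the third duplicates the first and is therefore redundant.
P = [[0, 0, 1, 1, 2, 2],
     [0, 1, 1, 0, 2, 2],
     [0, 0, 1, 1, 2, 2]]
print(sig_ranking(P, 2), sig_ufs(P))
```

On this toy ensemble the forward search stops after two partitions, never adding the duplicate, while the ranking strategy still requires Num to be supplied.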
Input: the generated collection of partitions P and ε
Output: the reduct of partitions
1: R ← ∅
2: do
3:   For each P ∈ (P - R), compute the partition U/IND(R ∪ {P})
4:   Compute the information entropy H(R ∪ {P})
5:   Select the partition P* with the maximal Sig(R ∪ {P*}, P)
6:   if Sig(R ∪ {P*}, P) > Sig(R, P)
7:     P ← P - {P*}
8:     R ← R ∪ {P*}
9:   end if
10: until H(R) ≥ H(P) + log₂(1 - ε)
11: Return R

Fig. 3. Forward unsupervised feature selection algorithm

IV. EXPERIMENTAL RESULTS

In this section, the following experiment was carried out to test whether the proposed selection algorithms work. Many authors have used real benchmark data sets with known class labels to evaluate clustering algorithms, and we follow this tradition here [15]. We selected five real data sets from the UCI machine learning repository [28]; their names and characteristics are given in Table I. The Rand Index (RI) is employed to evaluate the agreement between the consensus partitions and the real partition. The Rand Index between two partitions can be defined as [25]

    RI(P_i, P_j) = 2 (n_00 + n_11) / (n (n - 1))    (9)

where n_11 is the number of pairs of objects placed in the same group in both P_i and P_j, and n_00 is the number of pairs of objects placed in different groups in both P_i and P_j.
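Eq. (9) counts the object pairs on which two partitions agree (grouped together in both, or separated in both); a direct implementation:

```python
from itertools import combinations

def rand_index(p, q):
    """RI(P_i, P_j) = 2 (n00 + n11) / (n (n - 1)): the fraction of object
    pairs on which partitions p and q agree (same-same or different-different)."""
    n, agree = len(p), 0
    for i, j in combinations(range(n), 2):
        if (p[i] == p[j]) == (q[i] == q[j]):
            agree += 1
    return 2 * agree / (n * (n - 1))

# Identical partitions, and partitions identical up to label renaming, score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because RI compares pairwise co-membership rather than the labels themselves, it is invariant to permutations of cluster labels, which is what makes it usable for comparing partitions produced independently.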

TABLE I. SUMMARY OF UCI DATA SETS

Data sets   Samples   Dimension   Clusters
Glass
WPBC
Seeds
Ecoli
Lung

The experiment is set up as follows. A cluster ensemble is generated using the scheme given in Part A of Section III. The ensemble size is set to 100 (l = 100), and the number of clusters is set to the true number of classes. For generation with random feature subsets, the size of each feature subset varies from 0.3 to 0.8 of the original feature set size. Once the cluster ensemble has been generated, we run three selection algorithms to select different subsets of the generated partitions. Diversity is important in ensemble selection and has been employed in many algorithms [8, 10, 13, 26, 27]. In our experiments, the baseline is a ranking based feature selection method that measures the diversities of the partitions with NMI and ranks the partitions accordingly [10]; this method is denoted NMI-Ranking. The proposed ranking based method, which ranks the partitions according to their significance, is denoted Sig-Ranking. The proposed forward unsupervised feature selection method based on the significance of partition(s) is denoted Sig-UFS. For Sig-UFS, we select a small error rate ε = 0.05. For the ranking based methods, the selected number Num varies from 10 to 90 in steps of 10, yielding nine subsets of the generated partitions for each ranking based method. Sig-UFS selects a single subset whose size is determined automatically. For each selected subset of the cluster ensemble, we obtain consensus partitions using CSPA, MCLA, and EAC individually. For the EAC algorithm, we use average-linkage hierarchical agglomerative clustering. The experiment is executed for 20 runs, and the average values are reported in Fig. 4.
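The EAC consensus used in the experiments accumulates co-association evidence across the base partitions before agglomerating. The sketch below builds the co-association matrix and, for brevity, substitutes a simple threshold cut (connected components) for the average-linkage step, so it illustrates the evidence-accumulation idea rather than the exact EAC procedure.

```python
def co_association(partitions, n):
    """EAC evidence accumulation: C[i][j] is the fraction of base
    partitions that place objects i and j in the same cluster."""
    m = len(partitions)
    C = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    C[i][j] += 1.0 / m
    return C

def cut(C, threshold):
    """Simplified consensus: connected components of the graph linking
    i, j whenever C[i][j] >= threshold (a single-link cut, not the
    average-linkage agglomeration used in the paper's experiments)."""
    n = len(C)
    label, next_label = [-1] * n, 0
    for s in range(n):
        if label[s] != -1:
            continue
        stack, label[s] = [s], next_label
        while stack:                      # flood-fill one component
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and C[i][j] >= threshold:
                    label[j] = next_label
                    stack.append(j)
        next_label += 1
    return label

# Toy ensemble of 3 base partitions over 4 objects.
P = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
C = co_association(P, 4)
print(cut(C, 0.5))
```

Pairs grouped together by most base partitions accumulate high co-association and survive the cut, which is the intuition behind combining EAC with hierarchical agglomeration on 1 - C as a distance.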
The results for Sig-UFS in the figure form a straight line because Sig-UFS does not depend on Num; the FULL ensemble clustering result corresponds to Num = 100. From the results, we can see that both ranking based methods, Sig-Ranking and NMI-Ranking, can obtain better performance than FULL by selecting a suitable Num (the size of the selected subset). The subset selected by Sig-UFS achieves better results than FULL except when employing CSPA on the data sets Glass and WPBC. The results for Sig-UFS and FULL based ensemble clustering with the different consensus functions are collected in Tables II to IV. For Sig-Ranking and NMI-Ranking based selective ensemble clustering, the best results over the different values of Num are also reported in Tables II to IV. It can be seen that the performance of a selection method is related to the consensus function employed in the ensemble clustering. With CSPA as the consensus function, the Sig-Ranking and NMI-Ranking based selection methods achieve almost the same average accuracies, which are better than those of the Sig-UFS based method. With MCLA and EAC, the Sig-UFS based selection method outperforms the other methods. As shown in Fig. 4, the accuracy of the ranking based selection methods varies with Num. For Sig-Ranking and NMI-Ranking, the subset sizes that give the best performance with each consensus function are shown in Table V, together with the sizes of the subsets selected by Sig-UFS. It can be observed that, on average, the Sig-Ranking based selection finds a smaller best subset than the NMI-Ranking based method while achieving results as good as or better than it, and the best results of the Sig-Ranking based selection also outperform the Sig-UFS method. However, determining the best Num is a challenge for the ranking based selection methods: the best sizes range widely, from 10 to 90, and there is no crisp relation between Num and the accuracy.
The best Num is related to both the data set and the employed consensus function. Generally, the ranking based selection methods are unstable with respect to the selected number of features, since redundant features are retained in the selected subset. Based on the experimental results, the Sig-Ranking method is suggested for selecting a subset of the generated partitions when Num can be determined in advance; otherwise, the Sig-UFS method, which obtains better performance than FULL ensemble clustering, is suggested.

TABLE II. COMPARISON OF SELECTIVE ENSEMBLES FOR CSPA

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE III. COMPARISON OF SELECTIVE ENSEMBLES FOR MCLA

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE IV. COMPARISON OF SELECTIVE ENSEMBLES FOR EAC

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE V. SIZE OF SELECTED FEATURE SUBSET

            Sig-Ranking             NMI-Ranking
Data sets   CSPA   MCLA   EAC       CSPA   MCLA   EAC
Glass
WPBC
Seeds
Ecoli
Lung
Average

V. CONCLUSION

In this paper, we proposed two kinds of cluster ensemble selection methods based on attribute significance: Sig-Ranking and Sig-UFS. Experimental results on UCI data sets have demonstrated that the significance of attribute(s) is an effective criterion for cluster ensemble selection. Selective ensemble clustering based on the two proposed selection methods can achieve higher accuracies than ensemble clustering with all generated partitions.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China and by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China. We thank Prof. Dr. Alexander Strehl, who graciously shared his code for comparison purposes.

REFERENCES

[1] S. Vega-Pons and J. Ruiz-Shulcloper, "A Survey of Clustering Ensemble Algorithms," International Journal of Pattern Recognition and Artificial Intelligence, vol. 25, May.
[2] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27.
[3] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3.
[4] A. Fred and A. Jain, "Combining multiple clusterings using evidence accumulation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27.
[5] N. Iam-On, T. Boongoen, S. Garrett, and C. Price, "A Link-Based Approach to the Cluster Ensemble Problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33.
[6] A. Topchy, A. K. Jain, and W. Punch, "A Mixture Model for Clustering Ensembles," in Proceedings of the SIAM International Conference on Data Mining, 2004.
[7] H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognition, vol. 43, May.
[8] J. H. Jia, X. Xiao, B. X. Liu, and L. C. Jiao, "Bagging-based spectral clustering ensemble selection," Pattern Recognition Letters, vol. 32.
[9] Y. Hong, S. Kwong, H. Wang, and Q. Ren, "Resampling-based selective clustering ensembles," Pattern Recognition Letters, vol. 30.
[10] J. Azimi and X. Fern, "Adaptive cluster ensemble selection," in Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA.
[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, pp. 1-31.
[12] Z. H. Zhou and W. Tang, "Clusterer ensemble," Knowledge-Based Systems, vol. 19.
[13] S. T. Hadjitodorov, L. I. Kuncheva, and L. P. Todorova, "Moderate diversity for better cluster ensembles," Information Fusion, vol. 7.
[14] X. Z. Fern and W. Lin, "Cluster Ensemble Selection," Statistical Analysis and Data Mining, vol. 1.
[15] L. I. Kuncheva and D. P. Vetrov, "Evaluation of Stability of k-means Cluster Ensembles with Respect to Random Initialization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28.
[16] C. Domeniconi and M. Al-Razgan, "Weighted cluster ensembles: Methods and analysis," ACM Transactions on Knowledge Discovery from Data, vol. 2, pp. 1-40.
[17] B. Fischer and J. M. Buhmann, "Bagging for path-based clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25.
[18] A. Goder and V. Filkov, "Consensus Clustering Algorithms: Comparison and Refinement," in Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments (ALENEX), San Francisco, 2008.
[19] J. P. Barthelemy and B. Leclerc, "The median procedure for partitions," in Partitioning Data Sets, AMS DIMACS Series in Discrete Mathematics, vol. 19.
[20] Z. Pawlak, "Rough sets," International Journal of Parallel Programming, vol. 11, Oct.
[21] G. Wang, J. Zhao, J. An, and Y. Wu, "A Comparative Study of Algebra Viewpoint and Information Viewpoint in Attribute Reduction," Fundamenta Informaticae, vol. 68, Sep.
[22] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27.
[24] D. Slezak, "Approximate entropy reducts," Fundamenta Informaticae, vol. 53, Dec.
[25] L. Hubert and P. Arabie, "Comparing Partitions," Journal of Classification, vol. 2.
[26] L. I. Kuncheva and S. T. Hadjitodorov, "Using diversity in cluster ensembles," in 2004 IEEE International Conference on Systems, Man and Cybernetics, vol. 2.
[27] A. Tsymbal, M. Pechenizkiy, and P. Cunningham, "Diversity in search strategies for ensemble feature selection," Information Fusion, vol. 6.
[28] A. Frank and A. Asuncion, UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Fig. 4. Clustering accuracy on the UCI data sets with different numbers of selected partitions. Panels (data set : consensus function): Glass, WPBC, Seeds, Ecoli, and Lung, each shown for CSPA, MCLA, and EAC.


More information

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632

More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning Cluster Validation Ke Chen Reading: [5.., KPM], [Wang et al., 9], [Yang & Chen, ] COMP4 Machine Learning Outline Motivation and Background Internal index Motivation and general ideas Variance-based internal

More information

Survey on Rough Set Feature Selection Using Evolutionary Algorithm

Survey on Rough Set Feature Selection Using Evolutionary Algorithm Survey on Rough Set Feature Selection Using Evolutionary Algorithm M.Gayathri 1, Dr.C.Yamini 2 Research Scholar 1, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women,

More information

Advances in Fuzzy Clustering and Its Applications. J. Valente de Oliveira and W. Pedrycz (Editors)

Advances in Fuzzy Clustering and Its Applications. J. Valente de Oliveira and W. Pedrycz (Editors) Advances in Fuzzy Clustering and Its Applications J. Valente de Oliveira and W. Pedrycz (Editors) Contents Preface 3 1 Soft Cluster Ensembles 1 1.1 Introduction................................... 1 1.1.1

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue:

More information

Fuzzy Entropy based feature selection for classification of hyperspectral data

Fuzzy Entropy based feature selection for classification of hyperspectral data Fuzzy Entropy based feature selection for classification of hyperspectral data Mahesh Pal Department of Civil Engineering NIT Kurukshetra, 136119 mpce_pal@yahoo.co.uk Abstract: This paper proposes to use

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Improving Classifier Performance by Imputing Missing Values using Discretization Method

Improving Classifier Performance by Imputing Missing Values using Discretization Method Improving Classifier Performance by Imputing Missing Values using Discretization Method E. CHANDRA BLESSIE Assistant Professor, Department of Computer Science, D.J.Academy for Managerial Excellence, Coimbatore,

More information

RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY

RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY MAJDI MAFARJA 1,2, SALWANI ABDULLAH 1 1 Data Mining and Optimization Research Group (DMO), Center for Artificial Intelligence

More information

American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

More information

K-modes Clustering Algorithm for Categorical Data

K-modes Clustering Algorithm for Categorical Data K-modes Clustering Algorithm for Categorical Data Neha Sharma Samrat Ashok Technological Institute Department of Information Technology, Vidisha, India Nirmal Gaud Samrat Ashok Technological Institute

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Robust Ensemble Clustering by Matrix Completion

Robust Ensemble Clustering by Matrix Completion Robust Ensemble Clustering by Matrix Completion Jinfeng Yi, Tianbao Yang, Rong Jin, Anil K. Jain, Mehrdad Mahdavi Department of Computer Science and Engineering, Michigan State University Machine Learning

More information

DISCRETIZATION BASED ON CLUSTERING METHODS. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

DISCRETIZATION BASED ON CLUSTERING METHODS. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania DISCRETIZATION BASED ON CLUSTERING METHODS Daniela Joiţa Titu Maiorescu University, Bucharest, Romania daniela.oita@utm.ro Abstract. Many data mining algorithms require as a pre-processing step the discretization

More information

Some questions of consensus building using co-association

Some questions of consensus building using co-association Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper

More information

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING Daniela Joiţa Titu Maiorescu University, Bucharest, Romania danielajoita@utmro Abstract Discretization of real-valued data is often used as a pre-processing

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

A New Clustering Algorithm On Nominal Data Sets

A New Clustering Algorithm On Nominal Data Sets A New Clustering Algorithm On Nominal Data Sets Bin Wang Abstract This paper presents a new clustering technique named as the Olary algorithm, which is suitable to cluster nominal data sets. This algorithm

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Wisdom of Crowds Cluster Ensemble

Wisdom of Crowds Cluster Ensemble Wisdom of Crowds Cluster Ensemble Hosein Alizadeh 1, Muhammad Yousefnezhad 2 and Behrouz Minaei Bidgoli 3 Abstract: The Wisdom of Crowds is a phenomenon described in social science that suggests four criteria

More information

Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction

Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction Hongbo Liu 1, Ajith Abraham 2, Hong Ye 1 1 School of Computer Science, Dalian Maritime University, Dalian 116026, China

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY Ramadevi Yellasiri, C.R.Rao 2,Vivekchan Reddy Dept. of CSE, Chaitanya Bharathi Institute of Technology, Hyderabad, INDIA. 2 DCIS, School

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Meta-Clustering. Parasaran Raman PhD Candidate School of Computing

Meta-Clustering. Parasaran Raman PhD Candidate School of Computing Meta-Clustering Parasaran Raman PhD Candidate School of Computing What is Clustering? Goal: Group similar items together Unsupervised No labeling effort Popular choice for large-scale exploratory data

More information

Cluster quality assessment by the modified Renyi-ClipX algorithm

Cluster quality assessment by the modified Renyi-ClipX algorithm Issue 3, Volume 4, 2010 51 Cluster quality assessment by the modified Renyi-ClipX algorithm Dalia Baziuk, Aleksas Narščius Abstract This paper presents the modified Renyi-CLIPx clustering algorithm and

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software August 2010, Volume 36, Issue 9. http://www.jstatsoft.org/ LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles Natthakan Iam-on Aberystwyth University Simon

More information

Multiple Classifier Fusion using k-nearest Localized Templates

Multiple Classifier Fusion using k-nearest Localized Templates Multiple Classifier Fusion using k-nearest Localized Templates Jun-Ki Min and Sung-Bae Cho Department of Computer Science, Yonsei University Biometrics Engineering Research Center 134 Shinchon-dong, Sudaemoon-ku,

More information

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 4, April 2013 pp. 1593 1601 LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values Patrick G. Clark Department of Electrical Eng. and Computer Sci. University of Kansas Lawrence,

More information

Weighted Clustering Ensembles

Weighted Clustering Ensembles Weighted Clustering Ensembles Muna Al-Razgan Carlotta Domeniconi Abstract Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can

More information

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

A Model of Machine Learning Based on User Preference of Attributes

A Model of Machine Learning Based on User Preference of Attributes 1 A Model of Machine Learning Based on User Preference of Attributes Yiyu Yao 1, Yan Zhao 1, Jue Wang 2 and Suqing Han 2 1 Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

Weighted-Object Ensemble Clustering

Weighted-Object Ensemble Clustering 213 IEEE 13th International Conference on Data Mining Weighted-Object Ensemble Clustering Yazhou Ren School of Computer Science and Engineering South China University of Technology Guangzhou, 516, China

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Consensus Clusterings

Consensus Clusterings Consensus Clusterings Nam Nguyen, Rich Caruana Department of Computer Science, Cornell University Ithaca, New York 14853 {nhnguyen,caruana}@cs.cornell.edu Abstract In this paper we address the problem

More information

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering Nghiem Van Tinh 1, Vu Viet Vu 1, Tran Thi Ngoc Linh 1 1 Thai Nguyen University of

More information

Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings

Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings Hanan Ayad and Mohamed Kamel Pattern Analysis and Machine Intelligence Lab, Systems Design Engineering, University of Waterloo,

More information

Relative Constraints as Features

Relative Constraints as Features Relative Constraints as Features Piotr Lasek 1 and Krzysztof Lasek 2 1 Chair of Computer Science, University of Rzeszow, ul. Prof. Pigonia 1, 35-510 Rzeszow, Poland, lasek@ur.edu.pl 2 Institute of Computer

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

Measurement of Similarity Using Cluster Ensemble Approach for Categorical Data

Measurement of Similarity Using Cluster Ensemble Approach for Categorical Data American Journal Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-5, Issue-12, pp-282-298 www.ajer.org Research Paper Open Access Measurement Similarity Using Approach for Categorical

More information

Lorentzian Distance Classifier for Multiple Features

Lorentzian Distance Classifier for Multiple Features Yerzhan Kerimbekov 1 and Hasan Şakir Bilge 2 1 Department of Computer Engineering, Ahmet Yesevi University, Ankara, Turkey 2 Department of Electrical-Electronics Engineering, Gazi University, Ankara, Turkey

More information

Reconstruction-based Classification Rule Hiding through Controlled Data Modification

Reconstruction-based Classification Rule Hiding through Controlled Data Modification Reconstruction-based Classification Rule Hiding through Controlled Data Modification Aliki Katsarou, Aris Gkoulalas-Divanis, and Vassilios S. Verykios Abstract In this paper, we propose a reconstruction

More information

Active Sampling for Constrained Clustering

Active Sampling for Constrained Clustering Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:

More information

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images Sensors & Transducers 04 by IFSA Publishing, S. L. http://www.sensorsportal.com Cluster Validity ification Approaches Based on Geometric Probability and Application in the ification of Remotely Sensed

More information

Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier

Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier , pp.34-38 http://dx.doi.org/10.14257/astl.2015.117.08 Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier Dong-Min Woo 1 and Viet Dung Do 1 1 Department

More information

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Ann. Data. Sci. (2015) 2(3):293 300 DOI 10.1007/s40745-015-0060-x Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Li-min Du 1,2 Yang Xu 1 Hua Zhu 1 Received: 30 November

More information

Agglomerative clustering on vertically partitioned data

Agglomerative clustering on vertically partitioned data Agglomerative clustering on vertically partitioned data R.Senkamalavalli Research Scholar, Department of Computer Science and Engg., SCSVMV University, Enathur, Kanchipuram 631 561 sengu_cool@yahoo.com

More information

An ICA-Based Multivariate Discretization Algorithm

An ICA-Based Multivariate Discretization Algorithm An ICA-Based Multivariate Discretization Algorithm Ye Kang 1,2, Shanshan Wang 1,2, Xiaoyan Liu 1, Hokyin Lai 1, Huaiqing Wang 1, and Baiqi Miao 2 1 Department of Information Systems, City University of

More information

Iteration Reduction K Means Clustering Algorithm

Iteration Reduction K Means Clustering Algorithm Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department

More information

Color-Based Classification of Natural Rock Images Using Classifier Combinations

Color-Based Classification of Natural Rock Images Using Classifier Combinations Color-Based Classification of Natural Rock Images Using Classifier Combinations Leena Lepistö, Iivari Kunttu, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553,

More information

THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION

THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION Helena Aidos, Robert P.W. Duin and Ana Fred Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal Pattern Recognition

More information

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering IJIRST International Journal for Innovative Research in Science & Technology Volume 3 Issue 07 December 2016 ISSN (online): 2349-6010 Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

More information

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Madhu.G 1, Rajinikanth.T.V 2, Govardhan.A 3 1 Dept of Information Technology, VNRVJIET, Hyderabad-90, INDIA,

More information

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE sbsridevi89@gmail.com 287 ABSTRACT Fingerprint identification is the most prominent method of biometric

More information

An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters

An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters Akhtar Sabzi Department of Information Technology Qom University, Qom, Iran asabzii@gmail.com Yaghoub Farjami Department

More information

Combining Multiple Clustering Systems

Combining Multiple Clustering Systems Combining Multiple Clustering Systems Constantinos Boulis and Mari Ostendorf Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA boulis,mo@ee.washington.edu Abstract.

More information

arxiv: v1 [cs.ai] 25 Sep 2012

arxiv: v1 [cs.ai] 25 Sep 2012 Feature selection with test cost constraint Fan Min a,, Qinghua Hu b, William Zhu a a Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China b Tianjin University, Tianjin 300072,

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Classification with Diffuse or Incomplete Information

Classification with Diffuse or Incomplete Information Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication

More information

CLASSIFICATION FOR SCALING METHODS IN DATA MINING

CLASSIFICATION FOR SCALING METHODS IN DATA MINING CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department

More information