Rough Set based Cluster Ensemble Selection


Xueen Wang, Deqiang Han, Chongzhao Han
Ministry of Education Key Lab for Intelligent Networks and Network Security (MOE KLINNS Lab), Institute of Integrated Automation, School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China

Abstract: Ensemble clustering, which combines several base data partitions into a single consensus partition, has been attracting much attention for its improved stability and robustness. Diversity is critical to the success of ensemble clustering. To enhance this characteristic, a subset of the cluster ensemble is selected by removing redundant partitions. Combined with ranking and forward-selection search strategies, the significance of attribute defined in rough set theory is employed as a heuristic to find the subset of the cluster ensemble. Experimental results on data sets from the UCI machine learning repository demonstrate that the proposed algorithms are feasible and effective.

Keywords: ensemble clustering; rough set; feature selection; attribute significance

I. INTRODUCTION

Typically, a cluster ensemble framework produces a large set of clustering results and then combines them with a consensus function to create a single consensus partition with improved stability and robustness. Ensemble clustering has proved to be a good alternative when facing cluster analysis problems [1]. Many studies have been done to improve the performance of ensemble clustering, focusing on the generation of multiple clusterings/partitions [2], the consensus function [3-7], and the selection of the cluster ensemble [8-11]. The technique of cluster ensemble selection forms an ensemble from a subset of the base partitions. Much research has demonstrated that choosing a subset of base partitions to form a smaller cluster ensemble can perform as well as, or better than, using all available partitions [8, 12, 13].
By treating each base partition as a feature/attribute, cluster ensemble selection is transformed into an unsupervised feature selection problem. The diversity among different partitions is crucial for ensemble clustering: it is a necessary condition for improving performance, and it is also critical for selecting the partitions to be combined [11]. However, diversity is hard to define clearly in unsupervised learning. The normalized mutual information (NMI) and the adjusted Rand index (ARI) are commonly employed in the literature to measure the diversity or quality of partitions. Zhou and Tang proposed a selection approach using NMI weights [12]. Hadjitodorov et al. selected the ensemble corresponding to the median ARI-based diversity [13]. Fern and Lin designed three ensemble selection methods based on quality and diversity [14]. Hong et al. introduced a selective clustering ensemble method using a resampling technique [9]. Jia et al. proposed a selective spectral clustering ensemble method based on the bagging technique [8]. A direct way to enlarge the diversity of a cluster ensemble is to remove redundant features, and rough set theory is an effective tool for doing so. Generally, a significance measure of attributes is defined and used as the heuristic in rough set based attribute reduction. The significance of an attribute represents its discernibility power in both supervised and unsupervised cases. In this paper, we employ the significance of attribute as a heuristic for unsupervised feature selection and propose two algorithms: a ranking based feature selection algorithm and a forward feature selection algorithm. Redundant features may be retained in the subset obtained by the ranking based method, whereas the forward method selects a subset with no redundant features.
The remainder of this paper is organized as follows. Background knowledge of ensemble clustering and fundamentals of rough set theory are presented in Section II. The proposed cluster ensemble selection algorithms are presented in Section III. In Section IV, the experimental results on the UCI data repository are given. The last section concludes this paper.

II. BACKGROUND AND RELATED WORKS

A. Basics of Ensemble Clustering

The representation of ensemble clustering is reviewed in this section. Let X = {x_i}, i = 1, ..., n, denote a set of n objects/samples/points in R^p, where p is the dimension of the attributes (features) associated with each object; let P = {P_1, ..., P_l} be a cluster ensemble with l component (base) partitions/clusterings generated by individual clustering procedures; and let P_X be the set of all possible partitions of the object set X (P ⊂ P_X). The goal of ensemble clustering is to combine the multiple partitions in P into a single data partition (i.e., a consensus partition) P* ∈ P_X, which can better represent the properties of each partition in P [1]. Generally, there are two crucial steps in developing an ensemble clustering algorithm: 1) generating multiple partitions of the data; 2) producing the consensus partition (i.e., the consensus function). The individual clusterings can be generated in many ways, such as changing the initialization or other parameters of the clustering algorithm [4, 6], randomly selecting the number of clusters for each component [4, 15], or using different feature/sample subsets for clustering [2, 16, 17]. For the second step, finding the best consensus partition has been proved to be an NP-hard problem [18]. Many approaches have been proposed from various perspectives (e.g., cluster-based, statistical, and combinatorial) [4, 6, 19].

B. Basics of Rough Set Theory

The basic definitions and notions of rough set theory are given in this subsection [20]. An information system, defined by Pawlak for data representation, is a tuple

    S = (U, C, D, V, f)    (1)

where U, the universe of discourse, is a non-empty finite set of objects; C is a finite set of condition attributes; D is a finite set of decision attributes; V = ∪ V_a over a ∈ C ∪ D, where V_a is the domain of attribute a; and f : U × (C ∪ D) → V is the total decision function such that f(x, a) = v_a ∈ V_a for x ∈ U, a ∈ C ∪ D. For any subset A ⊆ C ∪ D, an equivalence relation (also called an indiscernibility relation) can be defined as

    IND(A) = {(x, y) ∈ U² : f(x, a) = f(y, a), ∀a ∈ A}    (2)

A partition of U can be obtained from IND(A), denoted simply U/IND(A). Every block of the partition is an equivalence class, denoted [x]_A = {y ∈ U : (x, y) ∈ IND(A)}.

For every X ⊆ U, the lower and upper approximations of X with respect to A are

    A_(X) = {x ∈ U : [x]_A ⊆ X}      (lower approximation)
    A‾(X) = {x ∈ U : [x]_A ∩ X ≠ ∅}  (upper approximation)    (3)

The objects in the lower approximation can be certainly classified into X using A; the objects in the upper approximation can only be possibly classified into X using A. The pair (A_(X), A‾(X)) is referred to as the Pawlak rough set of X with respect to A. Rough set theory has been studied from the algebra viewpoint and the information viewpoint respectively [21]. An attribute subset R ⊆ C is an information reduct of C iff it satisfies the following conditions:

    H(R) = H(C)
    H(R - {a}) ≠ H(C), ∀a ∈ R    (4)

where H(A) is the information entropy of the partition U/IND(A) produced by attribute set A.
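These constructions (the partition U/IND(A), the lower and upper approximations, and the entropy used to define a reduct) can be sketched in Python; the toy information system below is invented for illustration and is not data from the paper.

```python
from collections import defaultdict
from math import log2

# Toy information system: each object is a dict of attribute -> value.
U = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
]

def partition(objects, attrs):
    """U/IND(A): group object indices by their values on attrs."""
    blocks = defaultdict(list)
    for i, x in enumerate(objects):
        blocks[tuple(x[a] for a in attrs)].append(i)
    return [set(b) for b in blocks.values()]

def approximations(objects, attrs, X):
    """Pawlak lower/upper approximations of X with respect to IND(attrs)."""
    lower, upper = set(), set()
    for block in partition(objects, attrs):
        if block <= X:       # block entirely inside X -> certainly in X
            lower |= block
        if block & X:        # block overlaps X -> possibly in X
            upper |= block
    return lower, upper

def entropy(objects, attrs):
    """H(A) = -sum p(X_i) log2 p(X_i) over the blocks of U/IND(A)."""
    n = len(objects)
    return -sum((len(b) / n) * log2(len(b) / n)
                for b in partition(objects, attrs))

low, up = approximations(U, ["a"], {0, 1})
print(low, up, entropy(U, ["a", "b"]))
```

On this toy system the concept {0, 1} coincides with an equivalence class of attribute "a", so its lower and upper approximations agree, and H({a, b}) exceeds H({a}) because adding "b" refines the partition.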
The probability distribution over U/IND(A) can be denoted as

    [U/A : p] = ( X_1      X_2     ...  X_{n_A}
                  p(X_1)   p(X_2)  ...  p(X_{n_A}) )    (5)

where p(X_i) = |X_i| / |U| for i = 1, ..., n_A. The information entropy of A can be defined as [22]

    H(A) = - Σ_{i=1}^{n_A} p(X_i) log₂ p(X_i)    (6)

From the information viewpoint, the significance of an attribute a in A can be defined as [23]

    Sig(a, A) = H(A) - H(A - {a})    (7)

The significance measures the increment of discernibility power introduced by attribute a, and it also works in unsupervised feature selection.

III. ROUGH SET BASED CLUSTER ENSEMBLE SELECTION

In this section, we first give the framework of the selective ensemble clustering method in Fig. 1. There are three steps for obtaining a consensus partition: generating a cluster ensemble that contains a number of diverse component (base) partitions; selecting a subset of all component partitions; and combining the selected partitions to generate a consensus partition (consensus function). In this paper, we focus on the selection of component partitions.

Fig. 1. The framework of selective ensemble clustering

A. Generation of the cluster ensemble

Diversity is crucial for improving the performance of ensemble learning. In this paper, the k-means (KM) clustering algorithm is the base learner. Diverse component partitions are generated in two ways: 1) random initialization of k-means: the KM algorithm is sensitive to initialization, which helps generate diverse component partitions; 2) random selection of feature subsets: each feature subset describes the data from a partial view, which enlarges the diversity of the component partitions.

B. Consensus functions

The consensus function is the main step in any ensemble clustering algorithm. In the experiments, we select three popular consensus functions to combine the selected partitions: the cluster-based similarity partitioning algorithm (CSPA), the meta-clustering algorithm (MCLA), and the evidence accumulation method (EAC). CSPA and MCLA are graph-based methods proposed by Strehl and Ghosh [3]; EAC is a hierarchical agglomerative approach based on the co-association matrix, proposed by Fred and Jain [4].

C. Selection of component partitions

The task of ensemble selection is to select a subset of partitions that forms a smaller yet better-performing cluster ensemble than using all available partitions [14]. Having obtained the cluster ensemble, each component partition can be treated as a nominal attribute (feature), and the problem turns into an unsupervised feature selection problem. There are two basic issues in any approach to feature selection: the evaluation measure and the search strategy. Diversity and quality are claimed to be critical for selecting the partitions to be combined [11]; however, both are loosely defined concepts in clustering. In rough set theory, the significance of attribute(s) represents the corresponding discernibility power, which can be treated as a measure of quality for clustering; high classification accuracy is often associated with high attribute significance. In this paper, significance is employed as the evaluation measure, and we give two feature selection algorithms based on different search strategies.

First, we give a ranking based feature selection algorithm that employs the significance of each single feature with respect to P. Feature ranking (weighting) is a simple and widely used search strategy which ranks all features in descending or ascending order of the evaluation measure; a user-defined parameter Num specifies the number of features to be selected. The framework of the approach is given in Fig. 2.

Input: the generated collection of partitions P; the number of features to be selected, Num.
Output: the selected subset of partitions
1: Compute the partition U/IND(P) and its information entropy H(P)
2: For i = 1 : l
3:   Compute the partition U/IND(P - P_i) and its information entropy H(P - P_i)
4:   Calculate the significance Sig(P_i, P) of P_i with respect to P
5: End For
6: Sort the partitions in descending order of Sig(P_i, P)
7: Select the first Num features as the reduced subset

Fig. 2. Ranking based feature selection algorithm

As is well known, ranking based methods cannot handle feature redundancy: redundant features may be retained in the selected subset, which reduces its diversity. In supervised feature selection, a forward selection method is generally employed, which can select a feature subset with no redundant features. Likewise, we give a forward selection method using the significance of attribute(s) as a heuristic. The forward selection approach starts with an empty subset and incrementally adds the features that yield an increase in the significance value. The procedure stops when adding features no longer increases the significance, i.e., when H(R) = H(P); an information reduct has then been found. There is an alternative stopping criterion to end the procedure, defined as

    H(R) ≥ H(P) + log₂(1 - ε)    (8)

where R is the selected subset of features and the parameter ε ∈ [0, 1) can be defined as the discernibility error rate of R with respect to P. Such an R is called an (H, ε)-approximate information reduct [24]. An information reduct is obtained when ε is set to zero, i.e., there is no error between P and R. By setting ε to a small value, we can discard features that carry little or no information beyond the selected subset. The framework of the forward selection approach is shown in Fig. 3.
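Driven by the entropy-based significance of Eq. (7), both search strategies (the ranking of Fig. 2 and the forward selection just described) can be sketched in Python; the toy ensemble below is invented for illustration.

```python
from collections import defaultdict
from math import log2

def blocks(partitions, idx, n):
    """Block sizes of U/IND(P'): objects grouped by their joint cluster
    labels across the base partitions indexed by idx."""
    groups = defaultdict(int)
    for x in range(n):
        groups[tuple(partitions[i][x] for i in idx)] += 1
    return groups.values()

def H(partitions, idx, n):
    """Information entropy of the partition induced by the sub-ensemble."""
    return -sum((c / n) * log2(c / n) for c in blocks(partitions, idx, n))

def sig_ranking(partitions, num):
    """Fig. 2 sketch: rank partitions by Sig(P_i, P) = H(P) - H(P - P_i)."""
    n, l = len(partitions[0]), len(partitions)
    h_all = H(partitions, range(l), n)
    sig = [h_all - H(partitions, [j for j in range(l) if j != i], n)
           for i in range(l)]
    return sorted(range(l), key=lambda i: sig[i], reverse=True)[:num]

def sig_ufs(partitions, eps=0.05):
    """Fig. 3 sketch: forward selection until H(R) >= H(P) + log2(1 - eps)."""
    n, l = len(partitions[0]), len(partitions)
    target = H(partitions, range(l), n) + log2(1 - eps)
    R = []
    while H(partitions, R, n) < target:
        rest = [i for i in range(l) if i not in R]
        if not rest:
            break
        best = max(rest, key=lambda i: H(partitions, R + [i], n))
        if H(partitions, R + [best], n) <= H(partitions, R, n):
            break  # no remaining partition increases the significance
        R.append(best)
    return R

# Toy ensemble: 3 base partitions of 6 objects (entries are cluster labels);
# the third duplicates the first and is therefore redundant.
P = [[0, 0, 1, 1, 2, 2],
     [0, 1, 1, 0, 2, 2],
     [0, 0, 1, 1, 2, 2]]
print(sig_ranking(P, 2), sig_ufs(P))
```

On this toy ensemble the forward search stops after two partitions, never adding the duplicate, while the ranking strategy still requires Num to be supplied.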
Input: the generated collection of partitions P and ε
Output: the reduct of partitions
1: R ← ∅
2: do
3:   For each P ∈ (P - R), compute the partition U/IND(R ∪ {P})
4:   Compute the information entropy H(R ∪ {P})
5:   Select the partition P* with the maximal Sig(R ∪ {P*}, P)
6:   if Sig(R ∪ {P*}, P) > Sig(R, P)
7:     P ← P - {P*}
8:     R ← R ∪ {P*}
9:   end if
10: until H(R) ≥ H(P) + log₂(1 - ε)
11: Return R

Fig. 3. Forward unsupervised feature selection algorithm

IV. EXPERIMENTAL RESULTS

In this section, the following experiment was carried out to test whether the proposed selection algorithms work. Many authors have used real benchmark data sets with known class labels to evaluate clustering algorithms, and we follow this tradition here [15]. We selected five real data sets from the UCI machine learning repository [28]; their names and characteristics are given in Table I. The Rand Index (RI) is employed to evaluate the agreement between the consensus partitions and the real partition. The Rand Index between two partitions can be defined as [25]

    RI(P_i, P_j) = 2 (n_00 + n_11) / (n (n - 1))    (9)

where n_11 is the number of pairs of objects placed in the same group in both P_i and P_j, and n_00 is the number of pairs of objects placed in different groups in both P_i and P_j.
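Eq. (9) counts the object pairs on which two partitions agree (grouped together in both, or separated in both); a direct implementation:

```python
from itertools import combinations

def rand_index(p, q):
    """RI(P_i, P_j) = 2 (n00 + n11) / (n (n - 1)): the fraction of object
    pairs on which partitions p and q agree (same-same or different-different)."""
    n, agree = len(p), 0
    for i, j in combinations(range(n), 2):
        if (p[i] == p[j]) == (q[i] == q[j]):
            agree += 1
    return 2 * agree / (n * (n - 1))

# Identical partitions, and partitions identical up to label renaming, score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because RI compares pairwise co-membership rather than the labels themselves, it is invariant to permutations of cluster labels, which is what makes it usable for comparing partitions produced independently.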

TABLE I. SUMMARY OF UCI DATA SETS

Data sets   Samples   Dimension   Clusters
Glass
WPBC
Seeds
Ecoli
Lung

The experiment is set up as follows. A cluster ensemble is generated using the scheme given in Part A of Section III. The ensemble size is set to 100 (l = 100), and the number of clusters is set to the true number of classes. For generation with random feature subsets, the size of each feature subset varies from 0.3 to 0.8 of the original feature set size. Once the cluster ensemble has been generated, we run three selection algorithms to select different subsets of the generated partitions. Diversity is important in ensemble selection and has been employed in many algorithms [8, 10, 13, 26, 27]. In our experiments, the baseline is a ranking based feature selection method that measures the diversities of the partitions with NMI and ranks the partitions accordingly [10]; this method is denoted NMI-Ranking. The proposed ranking based method, which ranks the partitions according to their significance, is denoted Sig-Ranking. The proposed forward unsupervised feature selection method based on the significance of partition(s) is denoted Sig-UFS. For Sig-UFS, we select a small error rate ε = 0.05. For the ranking based methods, the selected number Num varies from 10 to 90 in steps of 10, yielding nine subsets of the generated partitions for each ranking based method. Sig-UFS selects a single subset whose size is determined automatically. For each selected subset of the cluster ensemble, we obtain consensus partitions using CSPA, MCLA, and EAC individually. For the EAC algorithm, we use average-linkage hierarchical agglomerative clustering. The experiment is executed for 20 runs, and the average values are reported in Fig. 4.
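The EAC consensus used in the experiments accumulates co-association evidence across the base partitions before agglomerating. The sketch below builds the co-association matrix and, for brevity, substitutes a simple threshold cut (connected components) for the average-linkage step, so it illustrates the evidence-accumulation idea rather than the exact EAC procedure.

```python
def co_association(partitions, n):
    """EAC evidence accumulation: C[i][j] is the fraction of base
    partitions that place objects i and j in the same cluster."""
    m = len(partitions)
    C = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    C[i][j] += 1.0 / m
    return C

def cut(C, threshold):
    """Simplified consensus: connected components of the graph linking
    i, j whenever C[i][j] >= threshold (a single-link cut, not the
    average-linkage agglomeration used in the paper's experiments)."""
    n = len(C)
    label, next_label = [-1] * n, 0
    for s in range(n):
        if label[s] != -1:
            continue
        stack, label[s] = [s], next_label
        while stack:                      # flood-fill one component
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and C[i][j] >= threshold:
                    label[j] = next_label
                    stack.append(j)
        next_label += 1
    return label

# Toy ensemble of 3 base partitions over 4 objects.
P = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
C = co_association(P, 4)
print(cut(C, 0.5))
```

Pairs grouped together by most base partitions accumulate high co-association and survive the cut, which is the intuition behind combining EAC with hierarchical agglomeration on 1 - C as a distance.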
The results for Sig-UFS in the figure form a straight line because Sig-UFS does not depend on Num; the FULL ensemble clustering result corresponds to Num = 100. From the results, we can see that both ranking based methods, Sig-Ranking and NMI-Ranking, can obtain better performance than FULL by selecting a suitable Num (the size of the selected subset). The subset selected by Sig-UFS achieves better results than FULL except when employing CSPA on the data sets Glass and WPBC. The results for Sig-UFS and FULL based ensemble clustering with the different consensus functions are collected in Tables II to IV. For Sig-Ranking and NMI-Ranking based selective ensemble clustering, the best results over the different values of Num are also reported in Tables II to IV. It can be seen that the performance of a selection method is related to the consensus function employed in the ensemble clustering. With CSPA as the consensus function, the Sig-Ranking and NMI-Ranking based selection methods achieve almost the same average accuracies, which are better than those of the Sig-UFS based method. With MCLA and EAC, the Sig-UFS based selection method outperforms the other methods. As shown in Fig. 4, the accuracy of the ranking based selection methods varies with Num. For Sig-Ranking and NMI-Ranking, the subset sizes that give the best performance with each consensus function are shown in Table V, together with the sizes of the subsets selected by Sig-UFS. It can be observed that, on average, the Sig-Ranking based selection finds a smaller best subset than the NMI-Ranking based method while achieving results as good as or better than it, and the best results of the Sig-Ranking based selection also outperform the Sig-UFS method. However, determining the best Num is a challenge for the ranking based selection methods: the best sizes range widely, from 10 to 90, and there is no crisp relation between Num and the accuracy.
The best Num is related to both the data set and the employed consensus function. Generally, the ranking based selection methods are unstable with respect to the selected number of features, since redundant features are retained in the selected subset. Based on the experimental results, the Sig-Ranking method is suggested for selecting a subset of the generated partitions when Num can be determined in advance; otherwise, the Sig-UFS method, which obtains better performance than FULL ensemble clustering, is suggested.

TABLE II. COMPARISON OF SELECTIVE ENSEMBLES FOR CSPA

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE III. COMPARISON OF SELECTIVE ENSEMBLES FOR MCLA

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE IV. COMPARISON OF SELECTIVE ENSEMBLES FOR EAC

Data sets   Sig-UFS   Sig-Ranking   NMI-Ranking   FULL
Glass
WPBC
Seeds
Ecoli
Lung
Average

TABLE V. SIZE OF SELECTED FEATURE SUBSET

            Sig-Ranking             NMI-Ranking
Data sets   CSPA   MCLA   EAC       CSPA   MCLA   EAC
Glass
WPBC
Seeds
Ecoli
Lung
Average

V. CONCLUSION

In this paper, we proposed two kinds of cluster ensemble selection methods based on attribute significance: Sig-Ranking and Sig-UFS. Experimental results on UCI data sets have demonstrated that the significance of attribute(s) is an effective criterion for cluster ensemble selection. Selective ensemble clustering based on the two proposed selection methods can achieve higher accuracies than ensemble clustering with all generated partitions.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China and by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China. We thank Prof. Dr. Alexander Strehl, who graciously shared his code for comparison purposes.

REFERENCES

[1] S. Vega-Pons and J. Ruiz-Shulcloper, "A Survey of Clustering Ensemble Algorithms," International Journal of Pattern Recognition and Artificial Intelligence, vol. 25, May.
[2] A. Topchy, A. K. Jain, and W. Punch, "Clustering ensembles: models of consensus and weak partitions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27.
[3] A. Strehl and J. Ghosh, "Cluster ensembles - a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3.
[4] A. Fred and A. Jain, "Combining multiple clusterings using evidence accumulation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27.
[5] N. Iam-On, T. Boongoen, S. Garrett, and C. Price, "A Link-Based Approach to the Cluster Ensemble Problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33.
[6] A. Topchy, A. K. Jain, and W. Punch, "A Mixture Model for Clustering Ensembles," in Proceedings of the SIAM International Conference on Data Mining, 2004.
[7] H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognition, vol. 43, May.
[8] J. H. Jia, X. Xiao, B. X. Liu, and L. C. Jiao, "Bagging-based spectral clustering ensemble selection," Pattern Recognition Letters, vol. 32.
[9] Y. Hong, S. Kwong, H. Wang, and Q. Ren, "Resampling-based selective clustering ensembles," Pattern Recognition Letters, vol. 30.
[10] J. Azimi and X. Fern, "Adaptive cluster ensemble selection," in Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA.
[11] M. C. Naldi, A. C. P. L. F. Carvalho, and R. J. G. B. Campello, "Cluster ensemble selection based on relative validity indexes," Data Mining and Knowledge Discovery, pp. 1-31.
[12] Z. H. Zhou and W. Tang, "Clusterer ensemble," Knowledge-Based Systems, vol. 19.
[13] S. T. Hadjitodorov, L. I. Kuncheva, and L. P. Todorova, "Moderate diversity for better cluster ensembles," Information Fusion, vol. 7.
[14] X. Z. Fern and W. Lin, "Cluster Ensemble Selection," Statistical Analysis and Data Mining, vol. 1.
[15] L. I. Kuncheva and D. P. Vetrov, "Evaluation of Stability of k-means Cluster Ensembles with Respect to Random Initialization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28.
[16] C. Domeniconi and M. Al-Razgan, "Weighted cluster ensembles: Methods and analysis," ACM Transactions on Knowledge Discovery from Data, vol. 2, pp. 1-40.
[17] B. Fischer and J. M. Buhmann, "Bagging for path-based clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25.
[18] A. Goder and V. Filkov, "Consensus Clustering Algorithms: Comparison and Refinement," in Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments (ALENEX), San Francisco, 2008.
[19] J. P. Barthelemy and B. Leclerc, "The median procedure for partitions," in Partitioning Data Sets, AMS DIMACS Series in Discrete Mathematics, vol. 19.
[20] Z. Pawlak, "Rough sets," International Journal of Parallel Programming, vol. 11, Oct.
[21] G. Wang, J. Zhao, J. An, and Y. Wu, "A Comparative Study of Algebra Viewpoint and Information Viewpoint in Attribute Reduction," Fundamenta Informaticae, vol. 68, Sep.
[22] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal.
[23] Q. Hu, D. Yu, and Z. Xie, "Information-preserving hybrid data reduction based on fuzzy-rough techniques," Pattern Recognition Letters, vol. 27.
[24] D. Slezak, "Approximate entropy reducts," Fundamenta Informaticae, vol. 53, Dec.
[25] L. Hubert and P. Arabie, "Comparing Partitions," Journal of Classification, vol. 2.
[26] L. I. Kuncheva and S. T. Hadjitodorov, "Using diversity in cluster ensembles," in 2004 IEEE International Conference on Systems, Man and Cybernetics, vol. 2.
[27] A. Tsymbal, M. Pechenizkiy, and P. Cunningham, "Diversity in search strategies for ensemble feature selection," Information Fusion, vol. 6.
[28] A. Frank and A. Asuncion, UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Fig. 4. Clustering accuracy on the UCI data sets with different numbers of selected partitions. Panels (data set : consensus function): Glass, WPBC, Seeds, Ecoli, and Lung, each shown for CSPA, MCLA, and EAC.


More information

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632

More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning Cluster Validation Ke Chen Reading: [5.., KPM], [Wang et al., 9], [Yang & Chen, ] COMP4 Machine Learning Outline Motivation and Background Internal index Motivation and general ideas Variance-based internal

More information

Survey on Rough Set Feature Selection Using Evolutionary Algorithm

Survey on Rough Set Feature Selection Using Evolutionary Algorithm Survey on Rough Set Feature Selection Using Evolutionary Algorithm M.Gayathri 1, Dr.C.Yamini 2 Research Scholar 1, Department of Computer Science, Sri Ramakrishna College of Arts and Science for Women,

More information

Advances in Fuzzy Clustering and Its Applications. J. Valente de Oliveira and W. Pedrycz (Editors)

Advances in Fuzzy Clustering and Its Applications. J. Valente de Oliveira and W. Pedrycz (Editors) Advances in Fuzzy Clustering and Its Applications J. Valente de Oliveira and W. Pedrycz (Editors) Contents Preface 3 1 Soft Cluster Ensembles 1 1.1 Introduction................................... 1 1.1.1

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue:

More information

Fuzzy Entropy based feature selection for classification of hyperspectral data

Fuzzy Entropy based feature selection for classification of hyperspectral data Fuzzy Entropy based feature selection for classification of hyperspectral data Mahesh Pal Department of Civil Engineering NIT Kurukshetra, 136119 mpce_pal@yahoo.co.uk Abstract: This paper proposes to use

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Improving Classifier Performance by Imputing Missing Values using Discretization Method

Improving Classifier Performance by Imputing Missing Values using Discretization Method Improving Classifier Performance by Imputing Missing Values using Discretization Method E. CHANDRA BLESSIE Assistant Professor, Department of Computer Science, D.J.Academy for Managerial Excellence, Coimbatore,

More information

RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY

RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY RECORD-TO-RECORD TRAVEL ALGORITHM FOR ATTRIBUTE REDUCTION IN ROUGH SET THEORY MAJDI MAFARJA 1,2, SALWANI ABDULLAH 1 1 Data Mining and Optimization Research Group (DMO), Center for Artificial Intelligence

More information

American International Journal of Research in Science, Technology, Engineering & Mathematics

American International Journal of Research in Science, Technology, Engineering & Mathematics American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629

More information

K-modes Clustering Algorithm for Categorical Data

K-modes Clustering Algorithm for Categorical Data K-modes Clustering Algorithm for Categorical Data Neha Sharma Samrat Ashok Technological Institute Department of Information Technology, Vidisha, India Nirmal Gaud Samrat Ashok Technological Institute

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Robust Ensemble Clustering by Matrix Completion

Robust Ensemble Clustering by Matrix Completion Robust Ensemble Clustering by Matrix Completion Jinfeng Yi, Tianbao Yang, Rong Jin, Anil K. Jain, Mehrdad Mahdavi Department of Computer Science and Engineering, Michigan State University Machine Learning

More information

DISCRETIZATION BASED ON CLUSTERING METHODS. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

DISCRETIZATION BASED ON CLUSTERING METHODS. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania DISCRETIZATION BASED ON CLUSTERING METHODS Daniela Joiţa Titu Maiorescu University, Bucharest, Romania daniela.oita@utm.ro Abstract. Many data mining algorithms require as a pre-processing step the discretization

More information

Some questions of consensus building using co-association

Some questions of consensus building using co-association Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper

More information

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING Daniela Joiţa Titu Maiorescu University, Bucharest, Romania danielajoita@utmro Abstract Discretization of real-valued data is often used as a pre-processing

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

A New Clustering Algorithm On Nominal Data Sets

A New Clustering Algorithm On Nominal Data Sets A New Clustering Algorithm On Nominal Data Sets Bin Wang Abstract This paper presents a new clustering technique named as the Olary algorithm, which is suitable to cluster nominal data sets. This algorithm

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Wisdom of Crowds Cluster Ensemble

Wisdom of Crowds Cluster Ensemble Wisdom of Crowds Cluster Ensemble Hosein Alizadeh 1, Muhammad Yousefnezhad 2 and Behrouz Minaei Bidgoli 3 Abstract: The Wisdom of Crowds is a phenomenon described in social science that suggests four criteria

More information

Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction

Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction Extracting Multi-Knowledge from fmri Data through Swarm-based Rough Set Reduction Hongbo Liu 1, Ajith Abraham 2, Hong Ye 1 1 School of Computer Science, Dalian Maritime University, Dalian 116026, China

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY Ramadevi Yellasiri, C.R.Rao 2,Vivekchan Reddy Dept. of CSE, Chaitanya Bharathi Institute of Technology, Hyderabad, INDIA. 2 DCIS, School

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Meta-Clustering. Parasaran Raman PhD Candidate School of Computing

Meta-Clustering. Parasaran Raman PhD Candidate School of Computing Meta-Clustering Parasaran Raman PhD Candidate School of Computing What is Clustering? Goal: Group similar items together Unsupervised No labeling effort Popular choice for large-scale exploratory data

More information

Cluster quality assessment by the modified Renyi-ClipX algorithm

Cluster quality assessment by the modified Renyi-ClipX algorithm Issue 3, Volume 4, 2010 51 Cluster quality assessment by the modified Renyi-ClipX algorithm Dalia Baziuk, Aleksas Narščius Abstract This paper presents the modified Renyi-CLIPx clustering algorithm and

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software August 2010, Volume 36, Issue 9. http://www.jstatsoft.org/ LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles Natthakan Iam-on Aberystwyth University Simon

More information

Multiple Classifier Fusion using k-nearest Localized Templates

Multiple Classifier Fusion using k-nearest Localized Templates Multiple Classifier Fusion using k-nearest Localized Templates Jun-Ki Min and Sung-Bae Cho Department of Computer Science, Yonsei University Biometrics Engineering Research Center 134 Shinchon-dong, Sudaemoon-ku,

More information

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM

LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL SEARCH ALGORITHM International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 4, April 2013 pp. 1593 1601 LEARNING WEIGHTS OF FUZZY RULES BY USING GRAVITATIONAL

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values Patrick G. Clark Department of Electrical Eng. and Computer Sci. University of Kansas Lawrence,

More information

Weighted Clustering Ensembles

Weighted Clustering Ensembles Weighted Clustering Ensembles Muna Al-Razgan Carlotta Domeniconi Abstract Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can

More information

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices Syracuse University SURFACE School of Information Studies: Faculty Scholarship School of Information Studies (ischool) 12-2002 Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

A Model of Machine Learning Based on User Preference of Attributes

A Model of Machine Learning Based on User Preference of Attributes 1 A Model of Machine Learning Based on User Preference of Attributes Yiyu Yao 1, Yan Zhao 1, Jue Wang 2 and Suqing Han 2 1 Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

Weighted-Object Ensemble Clustering

Weighted-Object Ensemble Clustering 213 IEEE 13th International Conference on Data Mining Weighted-Object Ensemble Clustering Yazhou Ren School of Computer Science and Engineering South China University of Technology Guangzhou, 516, China

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Consensus Clusterings

Consensus Clusterings Consensus Clusterings Nam Nguyen, Rich Caruana Department of Computer Science, Cornell University Ithaca, New York 14853 {nhnguyen,caruana}@cs.cornell.edu Abstract In this paper we address the problem

More information

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering

A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering Nghiem Van Tinh 1, Vu Viet Vu 1, Tran Thi Ngoc Linh 1 1 Thai Nguyen University of

More information

Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings

Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings Refined Shared Nearest Neighbors Graph for Combining Multiple Data Clusterings Hanan Ayad and Mohamed Kamel Pattern Analysis and Machine Intelligence Lab, Systems Design Engineering, University of Waterloo,

More information

Relative Constraints as Features

Relative Constraints as Features Relative Constraints as Features Piotr Lasek 1 and Krzysztof Lasek 2 1 Chair of Computer Science, University of Rzeszow, ul. Prof. Pigonia 1, 35-510 Rzeszow, Poland, lasek@ur.edu.pl 2 Institute of Computer

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

Measurement of Similarity Using Cluster Ensemble Approach for Categorical Data

Measurement of Similarity Using Cluster Ensemble Approach for Categorical Data American Journal Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-5, Issue-12, pp-282-298 www.ajer.org Research Paper Open Access Measurement Similarity Using Approach for Categorical

More information

Lorentzian Distance Classifier for Multiple Features

Lorentzian Distance Classifier for Multiple Features Yerzhan Kerimbekov 1 and Hasan Şakir Bilge 2 1 Department of Computer Engineering, Ahmet Yesevi University, Ankara, Turkey 2 Department of Electrical-Electronics Engineering, Gazi University, Ankara, Turkey

More information

Reconstruction-based Classification Rule Hiding through Controlled Data Modification

Reconstruction-based Classification Rule Hiding through Controlled Data Modification Reconstruction-based Classification Rule Hiding through Controlled Data Modification Aliki Katsarou, Aris Gkoulalas-Divanis, and Vassilios S. Verykios Abstract In this paper, we propose a reconstruction

More information

Active Sampling for Constrained Clustering

Active Sampling for Constrained Clustering Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:

More information

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images Sensors & Transducers 04 by IFSA Publishing, S. L. http://www.sensorsportal.com Cluster Validity ification Approaches Based on Geometric Probability and Application in the ification of Remotely Sensed

More information

Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier

Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier , pp.34-38 http://dx.doi.org/10.14257/astl.2015.117.08 Post-Classification Change Detection of High Resolution Satellite Images Using AdaBoost Classifier Dong-Min Woo 1 and Viet Dung Do 1 1 Department

More information

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Ann. Data. Sci. (2015) 2(3):293 300 DOI 10.1007/s40745-015-0060-x Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Li-min Du 1,2 Yang Xu 1 Hua Zhu 1 Received: 30 November

More information

Agglomerative clustering on vertically partitioned data

Agglomerative clustering on vertically partitioned data Agglomerative clustering on vertically partitioned data R.Senkamalavalli Research Scholar, Department of Computer Science and Engg., SCSVMV University, Enathur, Kanchipuram 631 561 sengu_cool@yahoo.com

More information

An ICA-Based Multivariate Discretization Algorithm

An ICA-Based Multivariate Discretization Algorithm An ICA-Based Multivariate Discretization Algorithm Ye Kang 1,2, Shanshan Wang 1,2, Xiaoyan Liu 1, Hokyin Lai 1, Huaiqing Wang 1, and Baiqi Miao 2 1 Department of Information Systems, City University of

More information

Iteration Reduction K Means Clustering Algorithm

Iteration Reduction K Means Clustering Algorithm Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department

More information

Color-Based Classification of Natural Rock Images Using Classifier Combinations

Color-Based Classification of Natural Rock Images Using Classifier Combinations Color-Based Classification of Natural Rock Images Using Classifier Combinations Leena Lepistö, Iivari Kunttu, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553,

More information

THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION

THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION THE AREA UNDER THE ROC CURVE AS A CRITERION FOR CLUSTERING EVALUATION Helena Aidos, Robert P.W. Duin and Ana Fred Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal Pattern Recognition

More information

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering IJIRST International Journal for Innovative Research in Science & Technology Volume 3 Issue 07 December 2016 ISSN (online): 2349-6010 Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

More information

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes

Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Feature Selection Algorithm with Discretization and PSO Search Methods for Continuous Attributes Madhu.G 1, Rajinikanth.T.V 2, Govardhan.A 3 1 Dept of Information Technology, VNRVJIET, Hyderabad-90, INDIA,

More information

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE

AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE AN EFFICIENT BINARIZATION TECHNIQUE FOR FINGERPRINT IMAGES S. B. SRIDEVI M.Tech., Department of ECE sbsridevi89@gmail.com 287 ABSTRACT Fingerprint identification is the most prominent method of biometric

More information

An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters

An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters An Improved Fuzzy K-Medoids Clustering Algorithm with Optimized Number of Clusters Akhtar Sabzi Department of Information Technology Qom University, Qom, Iran asabzii@gmail.com Yaghoub Farjami Department

More information

Combining Multiple Clustering Systems

Combining Multiple Clustering Systems Combining Multiple Clustering Systems Constantinos Boulis and Mari Ostendorf Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA boulis,mo@ee.washington.edu Abstract.

More information

arxiv: v1 [cs.ai] 25 Sep 2012

arxiv: v1 [cs.ai] 25 Sep 2012 Feature selection with test cost constraint Fan Min a,, Qinghua Hu b, William Zhu a a Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China b Tianjin University, Tianjin 300072,

More information

Automatic Group-Outlier Detection

Automatic Group-Outlier Detection Automatic Group-Outlier Detection Amine Chaibi and Mustapha Lebbah and Hanane Azzag LIPN-UMR 7030 Université Paris 13 - CNRS 99, av. J-B Clément - F-93430 Villetaneuse {firstname.secondname}@lipn.univ-paris13.fr

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Classification with Diffuse or Incomplete Information

Classification with Diffuse or Incomplete Information Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication

More information

CLASSIFICATION FOR SCALING METHODS IN DATA MINING

CLASSIFICATION FOR SCALING METHODS IN DATA MINING CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department

More information