Cluster Ensemble Algorithm using the Binary k-means and Spectral Clustering
Journal of Computational Information Systems 10: 12 (2014)

Cluster Ensemble Algorithm using the Binary k-means and Spectral Clustering

Ye TIAN 1, Peng YANG 2

1 Key Laboratory of Photonic and Electronic Bandgap Materials, Ministry of Education, School of Physics and Electronic Engineering, Harbin Normal University, Harbin, China
2 College of Information and Communication Engineering, Harbin Engineering University, Harbin, China

Abstract

Cluster ensemble has been shown to be an effective way of improving the accuracy and stability of single clustering algorithms. It consists of generating a set of partition results from the same data set and combining them into a final one. In this paper, we develop a novel cluster ensemble method named Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC). By combining the binary k-means algorithm with spectral clustering, the proposed method has low computational complexity and is therefore well suited to large text data sets. It first uses the binary k-means algorithm to create a set of partition results, and then integrates these results using spectral clustering. In addition, we introduce a matrix transformation technique to lower the computational cost of the spectral clustering step. Experiments show that the proposed method achieves better clustering quality and runs faster than several other cluster ensemble methods.

Keywords: Cluster Ensemble; Binary k-means; Spectral Clustering; Matrix Transformation

1 Introduction

Clustering analysis, an unsupervised pattern recognition problem, can be viewed as the process of grouping unlabeled data objects into k groups (we denote by k the number of desired classes) under clustering criteria such that the intracluster dissimilarity is minimized while the intercluster dissimilarity is maximized [1].
It is an essential technique in research areas that involve analyzing multivariate data, such as pattern classification, data mining, taxonomy, text retrieval and image segmentation [2]. Over the past half century, a large variety of clustering algorithms has been proposed. Traditional clustering algorithms such as k-means and its variants impose a convex spherical sample space on the data sets; when the sample space is not convex, these algorithms tend to get trapped in local optima.

Project supported by the Science and Technology Research Projects of Heilongjiang Education Department (No ). Corresponding author. Email address: yangpeng@163.com (Peng YANG). Copyright 2014 Binary Information Press. DOI: /jcis10617. June 15, 2014.
Recently, many studies have shown that cluster ensemble methods can provide consistent, robust, novel and stable solutions [3, 4]. In a cluster ensemble, the design of the consensus function plays a significant role: it computes a new partition that integrates all the clustering results obtained in the generation step, and it directly affects the clustering quality of the ensemble. In this paper, we use spectral clustering to combine all the partition results obtained in the generation step.

Spectral clustering [5-7], which exploits the pairwise similarities of data objects, has been shown to be more effective than traditional clustering algorithms at finding clusters. Because of this advantage, spectral clustering is now widely used in areas such as computer vision and information retrieval [8-10]. However, when the number of data objects (denoted by n) is large, spectral clustering encounters a quadratic resource bottleneck in computing the pairwise similarities among the n data objects [11]. Furthermore, it is sensitive to the scaling parameter used when constructing the similarity matrix. To lower the computational complexity of the eigenvalue decomposition (EVD) of the similarity matrix, we adopt a matrix transformation technique that equivalently transforms the EVD of the graph Laplacian matrix into that of a much smaller matrix, and we use the cosine similarity, which requires no scaling parameter, instead of measures such as the Gaussian kernel to compute the pairwise similarities of data objects.

The rest of the paper is organized as follows: Section 2 surveys the contributions upon which this paper builds.
Section 3 gives the detailed steps of the Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC). Section 4 presents the main results. Section 5 gives conclusions and future work.

2 Related Works

2.1 Cluster ensemble

Given a set of data objects, the cluster ensemble method consists of two principal steps:
- Generation: producing a set of partition results of these objects.
- Ensemble (also called integration or combination): combining these results into a final one.

Generation

Generation is the first step of a clustering ensemble method, in which a set of partition results is generated. In general, there are no constraints on how these results should be generated: different clustering algorithms, or the same algorithm with different initialization parameters, can be used. However, it is advisable to use clustering algorithms with linear computational complexity to generate the partition results. Therefore, the k-means method using the binary thought is applied in this paper.
Ensemble

The ensemble step is critical in any clustering ensemble algorithm; indeed, the great challenge in clustering ensemble is the design of an appropriate ensemble method. In this step the final consensus partition, which is the output of the clustering ensemble algorithm, is computed. However, the consensus among a set of clustering results is not obtained in the same way in all cases. There are two main ensemble approaches: points co-association and median partition. The basic idea of the first approach is to avoid the label correspondence problem by using a coincidence matrix over all pairs of data objects. The matrices of the clustering results are used to construct a new matrix (the co-association matrix), and a final result is obtained either by applying an agglomerative clustering algorithm such as single-link or complete-link [4], or by using a graph partitioning algorithm such as METIS [12], as in the Cluster-based Similarity Partitioning Algorithm (CSPA) proposed in [3]. In the second approach, the consensus partition is obtained by solving an optimization problem that finds the median partition of the cluster ensemble, i.e., the partition maximizing the similarity to all partitions in the ensemble.

2.2 Binary k-means

When the k-means method is applied to data with the number of classes k = 2, it is fast and stable. It is therefore natural to expect that we can also obtain stable partition results when clustering a data set with more than two classes using the k-means method, if we adopt the following binary thought.

Fig. 1: The diagram of binary thought

The binary thought can be described as follows: data objects are first partitioned into two clusters, each cluster is then partitioned into two, and so on. Fig. 1 depicts the diagram of the binary thought, and Algorithm 1 shows the k-means method using it.
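As a concrete illustration, the binary thought might be sketched in pure Python as follows. This is our own sketch, not the authors' code: in particular, the rule used to merge surplus leaves back to k clusters (merge the smallest leaf into the cluster with the nearest centroid) is an assumption, since the paper only states that the leaves are merged.

```python
import random

def two_means(points, iters=20, seed=0):
    """Plain k-means with k = 2 on a list of feature vectors."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, 2)
    groups = (points, [])
    for _ in range(iters):
        new = ([], [])
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, centers[1]))
            new[0 if d0 <= d1 else 1].append(p)
        if not new[0] or not new[1]:
            break                      # degenerate split: keep last assignment
        groups = new
        centers = [[sum(c) / len(g) for c in zip(*g)] for g in groups]
    return [g for g in groups if g]

def centroid(g):
    return [sum(c) / len(g) for c in zip(*g)]

def binary_kmeans(points, k):
    """Split every cluster in two for R rounds, then merge leaves
    until exactly k clusters remain (merge rule is our assumption)."""
    R = k.bit_length()                 # equals int(log2 k) + 1 of Algorithm 1
    clusters = [list(points)]
    for _ in range(R):                 # grow up to 2**R leaves
        clusters = [half for c in clusters
                    for half in (two_means(c) if len(c) > 1 else [c])]
    while len(clusters) > k:           # merge smallest leaf into its nearest
        clusters.sort(key=len)
        small, rest = clusters[0], clusters[1:]
        cs = centroid(small)
        j = min(range(len(rest)), key=lambda i: sum(
            (a - b) ** 2 for a, b in zip(cs, centroid(rest[i]))))
        rest[j] = rest[j] + small
        clusters = rest
    return clusters

pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
out = binary_kmeans(pts, 2)            # two well-separated blobs
```

Each round only ever runs 2-means on small clusters, which is what keeps the generation step cheap compared with running full k-means directly.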
Algorithm 1 Binary k-means algorithm (BKA)
Input: Data objects {x_1, x_2, ..., x_n}, number of desired classes k.
Step 1: Compute the number of iterations R = int(log2 k) + 1.
Step 2: for r = 1 to R do
            Compute and renew the number of clusters M = 2^(r-1).
            Compute and renew the size of the clusters n_d = n / 2^(r-1).
            for m = 1 to M do
                Call kmeans(n_d, 2) to partition the cluster into two.
            end for
        end for
Step 3: Compute the number of leaves M = 2^R.
Step 4: Merge the M leaves into k clusters.
Output: The cluster membership for each data object.

3 Cluster Ensemble using Binary k-means and Spectral Clustering

Given a text data set X = {x_1, x_2, ..., x_n}, let P = {p_1, p_2, ..., p_r} represent a set of partition results of X. We generate a hypergraph H = {h_1, h_2, ..., h_r} of P with n vertices and t = rk (t << n) hyperedges, using the hypergraph construction proposed in [3]. Because the computational complexity of the EVD of the similarity matrix S is O(n^3), we adopt a matrix transformation technique to lower it. The procedure is as follows. The eigensystem of the similarity matrix S takes the form

    S x = λ x.    (1)

Substituting S = H H^T, Eq. (1) becomes

    H H^T x = λ x.    (2)

Without loss of generality, suppose X ∈ R^{n×m} (n >> m) and c = rank(X). Compute the singular value decomposition (SVD) of X, X = U Σ V^T, with U^T U = V^T V = I_m and Σ = diag(σ_1, σ_2, ..., σ_m), where I_m is the identity matrix and the singular values in Σ are in descending order. Since the EVD of X X^T is X X^T = U Σ^2 U^T and the EVD of X^T X is X^T X = V Σ^2 V^T, the left singular vectors U can be obtained from the EVD of X^T X (see Theorem 3.1). Thus, the main computational cost of computing the left singular vectors U is only O(m^3).

Theorem 3.1. Assume X ∈ R^{n×m} (n >> m) and c = rank(X).
If there exists a matrix V = {v_1, v_2, ..., v_c} consisting of the linearly independent eigenvectors of X^T X such that

    V^T (X^T X) V = diag(σ_1^2, σ_2^2, ..., σ_c^2, 0, ..., 0),

where σ_i is the i-th nonzero singular value of X corresponding to the right singular vector v_i and the left singular vector u_i, then the relationship between the two singular vectors takes the form [13]:

    u_i = X v_i / σ_i.    (3)
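The matrix transformation can be demonstrated numerically. The sketch below (our own illustration using numpy, not the authors' code) recovers the top-k left singular vectors of a tall matrix H from the EVD of the small matrix H^T H via Eq. (3), and checks them against a direct SVD:

```python
import numpy as np

def left_singular_vectors(H, k):
    """Top-k left singular vectors of H (n x t, t << n) obtained from
    the EVD of the small t x t matrix H^T H, then u_i = H v_i / sigma_i
    (Eq. (3)); assumes k <= rank(H)."""
    w, V = np.linalg.eigh(H.T @ H)     # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]    # keep the k largest
    sigma = np.sqrt(np.clip(w[order], 0.0, None))
    U = (H @ V[:, order]) / sigma      # column-wise division by sigma_i
    return U, sigma

rng = np.random.default_rng(0)
H = rng.random((500, 8))               # n = 500 vertices, t = 8 hyperedges
U, s = left_singular_vectors(H, 3)
Ud, sd, _ = np.linalg.svd(H, full_matrices=False)
```

Only an 8 x 8 eigenproblem is solved instead of a 500 x 500 one, which is the O(m^3) versus O(n^3) saving the text describes; the recovered vectors agree with the direct SVD up to sign.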
Theorem 3.2. Assume X = U Σ V^T ∈ R^{n×m} (n >> m) and k < c = rank(X). Let X_k = Σ_{i=1}^{k} u_i σ_i v_i^T = U_k Σ_k V_k^T be the best rank-k approximation to X, with U_k = [u_1, u_2, ..., u_k], V_k = [v_1, v_2, ..., v_k] and Σ_k = diag(σ_1, σ_2, ..., σ_k), where the singular values in Σ are in descending order. Then [13]:

    min_{rank(Y) = k} ||X - Y||_F^2 = ||X - X_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + ... + σ_c^2.    (4)

Theorem 3.2, which is the theoretical basis for techniques such as image enhancement and data reduction, shows that we can use the first k columns of the eigenvectors U to perform clustering. Algorithm 2 summarizes the above process.

Algorithm 2 Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC)
Input: n × m text-term coincidence matrix X, number of desired classes k.
Step 1: Call BKA to cluster the n texts into k groups; run BKA r times to generate the partition results P.
Step 2: Construct the hypergraph H and compute the similarity matrix S = H H^T.
Step 3: Compute the first k eigenvectors v_1, v_2, ..., v_k of the matrix H^T H.
Step 4: Compute the eigenvectors u_i = H v_i / σ_i.
Step 5: Let Z ∈ R^{n×k} be the matrix consisting of the vectors {u_1, u_2, ..., u_k}.
Step 6: Use the k-means algorithm to cluster the n rows of Z into k groups.
Output: Cluster membership for each text object.

4 Experiment and Results Analysis

We design an experiment to investigate the performance of our proposed algorithm, comparing five different clustering algorithms:

1) Cluster-based Similarity Partitioning Algorithm (CSPA). It uses METIS to obtain the consensus partition from a similarity (co-association) matrix.

2) HyperGraph Partitioning Algorithm (HGPA). In this algorithm, HMETIS is applied to partition a hypergraph.

3) Meta-CLustering Algorithm (MCLA). In this algorithm, METIS is used to partition a similarity matrix between clusters.
4) Hybrid Bipartite Graph Formulation (HBGF). We apply spectral clustering to partition the bipartite graph.

5) Cluster Ensemble based on the Binary k-means and Spectral Clustering (CEBKSC). We use both binary k-means and spectral clustering to solve the text cluster ensemble problem.

All of the above algorithms use the MATLAB built-in k-means function with 10 replications and a maximum of 100 iterations.
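The hypergraph construction of [3] used in Step 2 of Algorithm 2 amounts to stacking one-hot cluster-indicator blocks, one block per base partition. A small numpy sketch (the function name is ours):

```python
import numpy as np

def hypergraph(partitions, k):
    """n x (r*k) indicator matrix: one column (hyperedge) per cluster of
    each base partition; S = H @ H.T then counts co-associations."""
    n = len(partitions[0])
    blocks = []
    for labels in partitions:
        B = np.zeros((n, k))
        B[np.arange(n), labels] = 1.0  # object i belongs to cluster labels[i]
        blocks.append(B)
    return np.hstack(blocks)

# r = 2 base partitions of n = 4 objects into k = 2 clusters each
P = [np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0])]
H = hypergraph(P, 2)                   # shape (4, 4): t = r*k = 4 hyperedges
S = H @ H.T
```

Entry S[i, j] counts in how many base partitions objects i and j fall in the same cluster, which is exactly the co-association similarity that the spectral step then embeds.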
The description of the experimental data sets

Our experiment uses six data sets:

1) tr31 and tr41. They are derived from the TREC-6 and TREC-7 collections. The real categories of the two data sets correspond to the queries of the particular categories.

2) re0 and re1. They are selected from the Reuters text categorization test collection Distribution 1.0. We divide the labels into two subsets, and for each subset we select the texts with a single label.

3) reviews and hitech. They are derived from the San Jose Mercury newspaper articles distributed as part of the TREC collection (TIPSTER Vol. 3). reviews contains texts about food, movies, music, radio, and restaurants; hitech contains texts about computers, electronics, health, medical, research, and technology. No two texts share the same DESCRIPT tag, which may contain multiple categories.

Table 1: The description of experimental data sets

Data sets | Instances | Features | Classes
tr31 | | |
tr41 | | |
re0 | | |
re1 | | |
reviews | | |
hitech | | |

The verification of the effectiveness of our method

We measure quality via the Normalized Mutual Information (NMI), an information-theoretic measure quantifying the match between the category labels and the cluster labels, and the Average Normalized Mutual Information (ANMI), which measures the average NMI between a set of r labelings and the final labeling.

Note: The highest scores in Tables 2 and 3, and the shortest times in Table 4, are marked in bold.

Table 2: NMI comparisons of five cluster ensemble methods

Data sets | CSPA | HGPA | MCLA | HBGF | CEBKSC
tr31 | | | | |
tr41 | | | | |
re0 | | | | |
re1 | | | | |
reviews | | | | |
hitech | | | | |
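NMI with the geometric-mean normalization of [3] can be computed from scratch as sketched below (our own illustration); ANMI is then simply the mean of the NMI between each of the r base labelings and the final labeling.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """NMI(A, B) = I(A; B) / sqrt(H(A) * H(B)), computed from counts."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    # mutual information from the joint and marginal cluster counts
    mi = sum(nij / n * math.log(n * nij / (ca[a] * cb[b]))
             for (a, b), nij in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    # a single-cluster labeling has zero entropy; define NMI as 0 then
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

score = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # same partition up to relabeling
```

Because NMI depends only on the co-occurrence counts of labels, it is invariant to permuting cluster names, which is what makes it suitable for comparing a clustering with the ground-truth categories.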
Table 3: ANMI comparisons of five cluster ensemble methods

Data sets | CSPA | HGPA | MCLA | HBGF | CEBKSC
tr31 | | | | |
tr41 | | | | |
re0 | | | | |
re1 | | | | |
reviews | | | | |
hitech | | | | |

Tables 2 and 3 present the comparison results; each result is an average over 10 runs. The results show that:

1) The clustering quality of CSPA is better than that of HGPA and MCLA on all data sets. CSPA calls the efficient graph partitioning method METIS and is stable.

2) On all of the experimental data sets, the clustering quality of the two spectral cluster ensemble methods, HBGF and CEBKSC, is better than that of the three graph- and hypergraph-based methods, CSPA, HGPA and MCLA.

3) CEBKSC slightly outperforms HBGF: it obtains higher NMI values than HBGF on all data sets, and the highest ANMI values on all data sets except re0.

Table 4: Runtime comparisons of five cluster ensemble methods at the ensemble step

Data sets | CSPA | HGPA | MCLA | HBGF | CEBKSC
tr31 | | | | |
tr41 | | | | |
re0 | | | | |
re1 | | | | |
reviews | | | | |
hitech | | | | |

From Table 4, we make the following observations:

1) CSPA is the slowest of the cluster ensemble methods, followed by HBGF. CSPA has a computational and storage complexity of O(m k n^2), quadratic in the number of texts, and HBGF calls a time-consuming spectral clustering to partition a bipartite graph.

2) MCLA is slightly slower than CEBKSC and HGPA; it has a computational complexity of O(m^2 k^2 n).

3) CEBKSC and HGPA are the fastest methods. CEBKSC requires significantly reduced computation thanks to the matrix transformation technique, and the computational complexity of HGPA is only O(m k n).

5 Conclusions and Future Work

In this paper, we develop a cluster ensemble method using the binary k-means and spectral clustering. The proposed algorithm takes advantage of both the binary k-means method and the spectral clustering method while their shortcomings are avoided.
On one hand, the usage of the binary k-means method permits the formation of partitions that differ from each other. On the other hand, applying the spectral clustering method to the partition results rather than directly to the texts yields superior clustering performance. Moreover, a matrix transformation technique is adopted to address the computational and memory problems of spectral clustering. In future work, we will research techniques to avoid the bottlenecks of our method, including accelerating the binary k-means method, and we will investigate the possibility of SAR image segmentation using our method.

References

[1] Bai Xue, Luo Si-wei, Yin Hui, Ni Wei-yuan, Multi-Feature Similarity Measures Under Information-Based Clustering Framework for Image Segmentation, Journal of Computational Information Systems, 2012, 8 (15).
[2] Vega-Pons S, Ruiz-Shulcloper J, A Survey of Clustering Ensemble Algorithms, International Journal of Pattern Recognition and Artificial Intelligence, 2011, 25 (3).
[3] Strehl A, Ghosh J, Cluster Ensembles - A Knowledge Reuse Framework for Combining Partitionings, In Proc. Conference on Artificial Intelligence (AAAI 2002), Edmonton, AAAI/MIT Press, 2002.
[4] Fred A L, Jain A K, Combining Multiple Clusterings Using Evidence Accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27 (6).
[5] Meila M, Shi J, Learning Segmentation by Random Walks, Proc. Conf. Neural Information Processing Systems, 2000.
[6] Shi J, Malik J, Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22 (8).
[7] Fowlkes C, Belongie S, Chung F, Malik J, Spectral Grouping Using the Nyström Method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26 (2).
[8] Dhillon I S, Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2001.
[9] Xu Wei, Liu Xin, Gong Yi-hong, Document Clustering Based on Non-negative Matrix Factorization, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2003.
[10] Yu S X, Shi Jian-bo, Multiclass Spectral Clustering, Proceedings of the Ninth IEEE International Conference on Computer Vision, IEEE, 2003.
[11] Liu Rong, Zhang Hao, Segmentation of 3D Meshes Through Spectral Clustering, Proceedings of the 12th Pacific Conference on Computer Graphics and Applications, IEEE, 2004.
[12] Karypis G, Kumar V, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, 1998, 20 (1).
[13] Berry M W, Large-Scale Sparse Singular Value Computations, International Journal of Supercomputer Applications, 1992, 6 (1).
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-6) Online Cross-Modal Hashing for Web Image Retrieval Liang ie Department of Mathematics Wuhan University of Technology, China
More informationGlobal Fuzzy C-Means with Kernels
Global Fuzzy C-Means with Kernels Gyeongyong Heo Hun Choi Jihong Kim Department of Electronic Engineering Dong-eui University Busan, Korea Abstract Fuzzy c-means (FCM) is a simple but powerful clustering
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationSTUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES
25-29 JATIT. All rights reserved. STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES DR.S.V.KASMIR RAJA, 2 A.SHAIK ABDUL KHADIR, 3 DR.S.S.RIAZ AHAMED. Dean (Research),
More informationNon-negative Matrix Factorization for Multimodal Image Retrieval
Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia
More informationInvestigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text
Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text Amir Hossein Jadidinejad Mitra Mohtarami Hadi Amiri Computer Engineering Department, Islamic Azad University,
More informationRobust Kernel Methods in Clustering and Dimensionality Reduction Problems
Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust
More informationBipartite Edge Prediction via Transductive Learning over Product Graphs
Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationImproving the Efficiency of Fast Using Semantic Similarity Algorithm
International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year
More informationAn Improvement of Centroid-Based Classification Algorithm for Text Classification
An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,
More informationAccumulation. Instituto Superior Técnico / Instituto de Telecomunicações. Av. Rovisco Pais, Lisboa, Portugal.
Combining Multiple Clusterings Using Evidence Accumulation Ana L.N. Fred and Anil K. Jain + Instituto Superior Técnico / Instituto de Telecomunicações Av. Rovisco Pais, 149-1 Lisboa, Portugal email: afred@lx.it.pt
More informationHIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH
HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH Yi Yang, Haitao Li, Yanshun Han, Haiyan Gu Key Laboratory of Geo-informatics of State Bureau of
More informationClustering via Random Walk Hitting Time on Directed Graphs
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (8) Clustering via Random Walk Hitting Time on Directed Graphs Mo Chen Jianzhuang Liu Xiaoou Tang, Dept. of Information Engineering
More informationIJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:
Semi Automatic Annotation Exploitation Similarity of Pics in i Personal Photo Albums P. Subashree Kasi Thangam 1 and R. Rosy Angel 2 1 Assistant Professor, Department of Computer Science Engineering College,
More informationNikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas. Aristotle University of Thessaloniki
KERNEL MATRIX TRIMMING FOR IMPROVED KERNEL K-MEANS CLUSTERING Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas Aristotle University of Thessaloniki ABSTRACT The Kernel k-means
More informationClustering of Data with Mixed Attributes based on Unified Similarity Metric
Clustering of Data with Mixed Attributes based on Unified Similarity Metric M.Soundaryadevi 1, Dr.L.S.Jayashree 2 Dept of CSE, RVS College of Engineering and Technology, Coimbatore, Tamilnadu, India 1
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationAn Approach for Reduction of Rain Streaks from a Single Image
An Approach for Reduction of Rain Streaks from a Single Image Vijayakumar Majjagi 1, Netravati U M 2 1 4 th Semester, M. Tech, Digital Electronics, Department of Electronics and Communication G M Institute
More informationActive Sampling for Constrained Clustering
Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:
More informationImage-Space-Parallel Direct Volume Rendering on a Cluster of PCs
Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr
More informationA Patent Retrieval Method Using a Hierarchy of Clusters at TUT
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan
More informationApproximate Nearest Centroid Embedding for Kernel k-means
Approximate Nearest Centroid Embedding for Kernel k-means Ahmed Elgohary, Ahmed K. Farahat, Mohamed S. Kamel, and Fakhri Karray University of Waterloo, Waterloo, Canada N2L 3G1 {aelgohary,afarahat,mkamel,karray}@uwaterloo.ca
More informationA REVIEW ON IMAGE RETRIEVAL USING HYPERGRAPH
A REVIEW ON IMAGE RETRIEVAL USING HYPERGRAPH Sandhya V. Kawale Prof. Dr. S. M. Kamalapur M.E. Student Associate Professor Deparment of Computer Engineering, Deparment of Computer Engineering, K. K. Wagh
More informationA Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data
Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data
More informationRobust Lossless Image Watermarking in Integer Wavelet Domain using SVD
Robust Lossless Image Watermarking in Integer Domain using SVD 1 A. Kala 1 PG scholar, Department of CSE, Sri Venkateswara College of Engineering, Chennai 1 akala@svce.ac.in 2 K. haiyalnayaki 2 Associate
More informationPrincipal Coordinate Clustering
Principal Coordinate Clustering Ali Sekmen, Akram Aldroubi, Ahmet Bugra Koku, Keaton Hamm Department of Computer Science, Tennessee State University Department of Mathematics, Vanderbilt University Department
More informationMinoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University
Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University
More informationClustering Ensembles Based on Normalized Edges
Clustering Ensembles Based on Normalized Edges Yan Li 1,JianYu 2, Pengwei Hao 1,3, and Zhulin Li 1 1 Center for Information Science, Peking University, Beijing, 100871, China {yanli, lizhulin}@cis.pku.edu.cn
More informationClustering Documents in Large Text Corpora
Clustering Documents in Large Text Corpora Bin He Faculty of Computer Science Dalhousie University Halifax, Canada B3H 1W5 bhe@cs.dal.ca http://www.cs.dal.ca/ bhe Yongzheng Zhang Faculty of Computer Science
More informationFeature Selection Using Modified-MCA Based Scoring Metric for Classification
2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification
More informationAlgebraic Techniques for Analysis of Large Discrete-Valued Datasets
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets Mehmet Koyutürk 1,AnanthGrama 1, and Naren Ramakrishnan 2 1 Dept. of Computer Sciences, Purdue University W. Lafayette, IN, 47907, USA
More informationRobust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma
Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected
More informationImproving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationGraph and Hypergraph Partitioning for Parallel Computing
Graph and Hypergraph Partitioning for Parallel Computing Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology June 29, 2016 Graph and hypergraph partitioning References:
More informationCLASSIFICATION FOR SCALING METHODS IN DATA MINING
CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department
More informationEfficient FM Algorithm for VLSI Circuit Partitioning
Efficient FM Algorithm for VLSI Circuit Partitioning M.RAJESH #1, R.MANIKANDAN #2 #1 School Of Comuting, Sastra University, Thanjavur-613401. #2 Senior Assistant Professer, School Of Comuting, Sastra University,
More informationGeneralized trace ratio optimization and applications
Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO
More informationObservational Learning with Modular Networks
Observational Learning with Modular Networks Hyunjung Shin, Hyoungjoo Lee and Sungzoon Cho {hjshin72, impatton, zoon}@snu.ac.kr Department of Industrial Engineering, Seoul National University, San56-1,
More informationCOMBINING MULTIPLE PARTITIONS CREATED WITH A GRAPH-BASED CONSTRUCTION FOR DATA CLUSTERING
Author manuscript, published in "IEEE International Workshop on Machine Learning for Signal Processing, Grenoble : France (29)" COMBINING MULTIPLE PARTITIONS CREATED WITH A GRAPH-BASED CONSTRUCTION FOR
More informationText Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering
Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationAN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING
AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING Irina Bernst, Patrick Bouillon, Jörg Frochte *, Christof Kaufmann Dept. of Electrical Engineering
More informationGlobally and Locally Consistent Unsupervised Projection
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Globally and Locally Consistent Unsupervised Projection Hua Wang, Feiping Nie, Heng Huang Department of Electrical Engineering
More informationText clustering based on a divide and merge strategy
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 55 (2015 ) 825 832 Information Technology and Quantitative Management (ITQM 2015) Text clustering based on a divide and
More informationLarge-Scale Face Manifold Learning
Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random
More information