Cluster Ensemble Algorithm using the Binary k-means and Spectral Clustering


Journal of Computational Information Systems 10: 12 (2014)

Ye TIAN 1, Peng YANG 2

1 Key Laboratory of Photonic and Electronic Bandgap Materials, Ministry of Education, School of Physics and Electronic Engineering, Harbin Normal University, Harbin, China
2 College of Information and Communication Engineering, Harbin Engineering University, Harbin, China

Project supported by the Science and Technology Research Projects of Heilongjiang Education Department (No ). Corresponding author. Email address: yangpeng @163.com (Peng YANG). Copyright 2014 Binary Information Press. DOI: /jcis10617. June 15, 2014.

Abstract

Cluster ensembles have been shown to be an effective way to improve the accuracy and stability of single clustering algorithms. A cluster ensemble generates a set of partition results from the same data set and combines them into a final one. In this paper, we develop a novel cluster ensemble method named Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC). By using the binary k-means algorithm and the spectral clustering method, the proposed method has low computational complexity and is therefore well suited to large text data sets. It first uses the binary k-means algorithm to create a set of partition results and then integrates these results by spectral clustering. In addition, we introduce a matrix transformation technique to lower the computational cost of the spectral clustering. Experiments show that the proposed method has better clustering quality and is faster than several other cluster ensemble methods.

Keywords: Cluster Ensemble; Binary k-means; Spectral Clustering; Matrix Transformation

1 Introduction

Clustering analysis, which is an unsupervised pattern recognition problem, can be viewed as the process of partitioning unlabeled data objects into k groups (k denotes the number of desired classes) according to clustering criteria such that the intracluster dissimilarity is minimized while the intercluster dissimilarity is maximized [1]. It is an essential technique in research areas that involve analyzing multivariate data, such as pattern classification, data mining, taxonomy, text retrieval and image segmentation [2].

Over the past half century, a large variety of clustering algorithms has been proposed. Traditional clustering algorithms such as k-means and its variants impose a convex spherical sample space on the data sets. When the sample space is not convex, these algorithms tend to converge to a local optimum.

Recently, many studies have shown that cluster ensemble methods can provide consistent, robust, novel, and stable solutions [3, 4]. In a cluster ensemble, the design of the consensus function plays a significant role: it computes a new partition that integrates all the clustering results obtained in the generation step, and it directly affects the clustering quality of the ensemble. In this paper, we use spectral clustering to combine all the partition results obtained in the generation step.

Spectral clustering [5-7], which exploits the pairwise similarities of data objects, has been shown to be more effective than traditional clustering algorithms in finding clusters. Because of this advantage, spectral clustering is now widely used in areas such as computer vision and information retrieval [8-10]. However, when the number of data objects (denoted by n) is large, spectral clustering encounters a quadratic resource bottleneck in computing the pairwise similarities among the n data objects [11]. Furthermore, it is sensitive to the scaling parameter used when constructing the similarity matrix.

To lower the computational complexity of the eigenvalue decomposition (EVD) of the similarity matrix in spectral clustering, we adopt a matrix transformation technique that equivalently transforms the EVD of the graph Laplacian matrix into that of a much smaller matrix. To compute the pairwise similarities of data objects, we use a cosine function, which does not require any scaling parameter, instead of other similarity measures such as the Gaussian kernel.

The rest of the paper is organized as follows. Section 2 surveys the contributions upon which this paper builds. Section 3 gives the detailed steps of the Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC). Section 4 presents the main results. Section 5 gives conclusions and an outlook.

2 Related Works

2.1 Cluster ensemble

Given a set of data objects, the cluster ensemble method consists of two principal steps:

Generation, in which a set of partition results of these objects is generated.
Ensemble (Integration or Combination), in which these results are combined into a final one.

2.1.1 Generation

Generation is the first step of the cluster ensemble method, in which a set of partition results is generated. In general, there are no constraints on how these results should be generated. Therefore, different clustering algorithms, or the same algorithm with different initialization parameters, can be used in this step. However, it is advisable to use clustering algorithms with linear computational complexity to generate the partition results. For this reason, a k-means method based on a binary (recursive bisection) scheme is applied in this paper.

2.1.2 Ensemble

The ensemble step is the core of any cluster ensemble algorithm. In fact, the great challenge in cluster ensembles is the design of an appropriate ensemble method. In this step, the final consensus partition, which is the result of the cluster ensemble algorithm, is computed. However, the consensus among a set of clustering results is not obtained in the same way in all cases. There are two main ensemble approaches: points co-association and median partition.

The basic idea of the first approach is to avoid the label correspondence problem by using a coincidence matrix between all pairs of data objects. The matrices of the individual clustering results are then used to construct a new matrix (the co-association matrix), and a final result is obtained by applying an agglomerative clustering algorithm such as single-link or complete-link [4], or by using a graph partitioning algorithm such as METIS [12], as in the Cluster-based Similarity Partitioning Algorithm (CSPA) proposed in [3].

In the second approach, the consensus partition is obtained by solving an optimization problem that finds the median partition of the cluster ensemble. The median partition is the partition that maximizes the similarity to all partitions in the cluster ensemble.

2.2 Binary k-means

When the k-means method is applied to data with k = 2 classes, it is fast and stable. It is therefore easy to imagine that stable partition results can also be obtained for data sets with more than two classes if k-means is applied with the following binary scheme.

Fig. 1: The diagram of the binary scheme

The binary scheme can be described as follows. The data objects are first partitioned into two clusters, then each cluster is partitioned into two, and the process is repeated. Fig. 1 depicts the binary scheme. Algorithm 1 shows the k-means method built on it.
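The repeated bisection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `two_means` and `bisect_rounds`, the plain Lloyd iteration, the fixed iteration count and the random demo data are all assumptions made for illustration, and the final merging of leaves into k clusters is left out because the paper does not specify how it is done.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Plain Lloyd iterations with k = 2; returns a 0/1 label per row of X."""
    # Assumption: initial centers are two distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to both centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bisect_rounds(X, rounds, seed=0):
    """Split every current cluster in two, `rounds` times."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(rounds):
        new_labels = np.empty_like(labels)
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            if len(idx) < 2:
                new_labels[idx] = 2 * c      # too small to split further
            else:
                new_labels[idx] = 2 * c + two_means(X[idx], rng)
        labels = new_labels
    return labels

# Hypothetical demo data: 40 random points in the plane, k = 3 desired classes.
data = np.random.default_rng(1).normal(size=(40, 2))
k = 3
R = int(np.log2(k)) + 1                      # number of rounds, as in Step 1
leaf_labels = bisect_rounds(data, R)
print(len(np.unique(leaf_labels)), "leaves after", R, "rounds")
```

Each round doubles the number of clusters at most, so R rounds yield at most 2^R leaves, which then have to be merged into k clusters.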
Algorithm 1 Binary k-means algorithm (BKA)
Input: Data objects $\{x_1, x_2, \ldots, x_n\}$, number of desired classes $k$.
Step 1 Compute the number of iterations $R = \mathrm{int}(\log_2 k) + 1$.
Step 2 for $r = 1$ to $R$ do
    Compute and update the number of clusters $M = 2^{r-1}$.
    Compute and update the size of the clusters $n_d = n / 2^{r-1}$.
    for $m = 1$ to $M$ do
        Call kmeans($n_d$, 2) to partition the $m$-th cluster into two.
    end for
end for
Step 3 Compute the number of leaves $M = 2^R$.
Step 4 Merge the $M$ leaves into $k$ clusters.
Output: The cluster membership of each data object.

3 Cluster Ensemble using Binary k-means and Spectral Clustering

Given a text data set $X = \{x_1, x_2, \ldots, x_n\}$, let $P = \{p_1, p_2, \ldots, p_r\}$ represent a set of partition results of $X$. We generate a hypergraph $H = \{h_1, h_2, \ldots, h_r\}$ of $P$ with $n$ vertices and $t = rk$ ($t \ll n$) hyperedges, following the hypergraph construction proposed in [3]. Because the computational complexity of the EVD of the $n \times n$ similarity matrix $S$ is $O(n^3)$, we adopt a matrix transformation technique to lower it. The procedure is as follows.

The eigensystem of the similarity matrix $S$ takes the form

$$S x = \lambda x. \tag{1}$$

If we substitute $S = H H^T$, Eq. (1) takes a different form:

$$H H^T x = \lambda x. \tag{2}$$

Without loss of generality, suppose $X \in \mathbb{R}^{n \times m}$ ($n \ge m$) is a generic matrix (in our setting it plays the role of $H$, with $m = t$) and $c = \mathrm{rank}(X)$. Compute the singular value decomposition (SVD) of $X$, $X = U \Sigma V^T$ with $U^T U = V^T V = I_m$ and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m)$, where $I$ is an identity matrix and the singular values in $\Sigma$ are in descending order. Since the EVD of $X X^T$ is $X X^T = U \Sigma^2 U^T$ and the EVD of $X^T X$ is $X^T X = V \Sigma^2 V^T$, the left singular vectors $U$ can be obtained from the EVD of the $m \times m$ matrix $X^T X$, which yields $V$ and $\Sigma$, together with the relation in Theorem 3.1. Thus, the main computational cost of computing the left singular vectors $U$ is only $O(m^3)$.

Theorem 3.1 Assume $X \in \mathbb{R}^{n \times m}$ ($n \ge m$) and $c = \mathrm{rank}(X)$. If there exists a matrix $V = \{v_1, v_2, \ldots, v_c\}$ consisting of linearly independent eigenvectors of $X^T X$ such that $V^T (X^T X) V = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_c^2, 0, \ldots, 0)$, where $\sigma_i$ is the $i$-th nonzero singular value of $X$ corresponding to the right singular vector $v_i$ and the left singular vector $u_i$, then the relationship between the two singular vectors takes the form [13]:

$$u_i = X v_i / \sigma_i. \tag{3}$$
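To make Eq. (3) concrete: if $X^T X v_i = \sigma_i^2 v_i$, then multiplying by $X$ gives $X X^T (X v_i) = \sigma_i^2 (X v_i)$, and $\|X v_i\|^2 = v_i^T X^T X v_i = \sigma_i^2$ for a unit-norm $v_i$, so $X v_i / \sigma_i$ is a unit eigenvector of $X X^T$. The NumPy sketch below checks this numerically on a random tall matrix. The matrix sizes and the random test matrix are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 6, 3                       # assumed sizes with n much larger than m
X = rng.random((n, m))                    # random tall matrix standing in for H

# EVD of the small m x m matrix X^T X (eigh returns eigenvalues in ascending order).
evals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
sigma = np.sqrt(evals[order])             # leading singular values of X
V_k = V[:, order]

# Eq. (3): u_i = X v_i / sigma_i, computed for the k leading vectors at once.
U_k = (X @ V_k) / sigma

# Reference values from a direct (thin) SVD of X, used only for comparison.
U_ref, s_ref, _ = np.linalg.svd(X, full_matrices=False)
print(np.allclose(sigma, s_ref[:k]))
```

The only decomposition performed on the route through Eq. (3) is on an m x m matrix. The n x n matrix $X X^T$ is never formed.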

Theorem 3.2 Assume $X = U \Sigma V^T \in \mathbb{R}^{n \times m}$ ($n \ge m$) and $k < c = \mathrm{rank}(X)$. Let $X_k = \sum_{i=1}^{k} u_i \sigma_i v_i^T = U_k \Sigma_k V_k^T$ represent the best rank-$k$ approximation to $X$, with $U_k = [u_1, u_2, \ldots, u_k]$, $V_k = [v_1, v_2, \ldots, v_k]$ and $\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k)$, where the singular values in $\Sigma$ are in descending order. Then the following equation holds [13]:

$$\min_{\mathrm{rank}(Y) = k} \|X - Y\|_F^2 = \|X - X_k\|_F^2 = \sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_c^2. \tag{4}$$

Theorem 3.2, which is the theoretical basis for techniques such as image enhancement and data reduction, shows that we can use the first $k$ columns of $U$ to perform clustering. Algorithm 2 summarizes the above process.

Algorithm 2 Cluster Ensemble algorithm using the Binary k-means and Spectral Clustering (CEBKSC)
Input: $n \times m$ text-term coincidence matrix $X$, number of desired classes $k$.
Step 1 Call the BKA to cluster the $n$ texts into $k$ groups. Run the BKA $r$ times to generate the partition results $P$.
Step 2 Construct the hypergraph $H$ and compute the similarity matrix $S = H H^T$.
Step 3 Compute the first $k$ eigenvectors $v_1, v_2, \ldots, v_k$ of the matrix $H^T H$.
Step 4 Compute the eigenvectors $u_i = H v_i / \sigma_i$.
Step 5 Let $Z \in \mathbb{R}^{n \times k}$ be the matrix consisting of the vectors $\{u_1, u_2, \ldots, u_k\}$.
Step 6 Use the k-means algorithm to cluster the $n$ rows of $Z$ into $k$ groups.
Output: The cluster membership of each text object.

4 Experiment and Results Analysis

We design an experiment to investigate the performance of the proposed algorithm, and we compare five different clustering algorithms:

1) Cluster-based Similarity Partitioning Algorithm, CSPA. It uses METIS to obtain the consensus partition of a similarity matrix (co-association matrix).
2) HyperGraph Partitioning Algorithm, HGPA. In this algorithm, HMETIS is applied to obtain the partition of a hypergraph.
3) Meta-CLustering Algorithm, MCLA. In this algorithm, METIS is used to partition a similarity matrix between clusters.
4) Hybrid Bipartite Graph Formulation, HBGF. We apply spectral clustering to partition the bipartite graph.
5) Cluster Ensemble based on the Binary k-means and Spectral Clustering, CEBKSC. We use both binary k-means and spectral clustering to solve the text cluster ensemble problem.

All of the above algorithms use the MATLAB built-in k-means function with 10 replications and a maximum of 100 iterations.
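As an illustration of the ensemble step of CEBKSC (item 5 above, Steps 2 to 5 of Algorithm 2), the following NumPy sketch builds a binary hypergraph indicator matrix from a few toy partitions and computes the embedding Z through the small matrix $H^T H$. The function names `hypergraph` and `ensemble_embedding` and the three toy partitions are hypothetical, the indicator-column encoding of hyperedges is an assumption based on the construction in [3], and the final k-means on the rows of Z (Step 6) is omitted.

```python
import numpy as np

def hypergraph(partitions):
    """Stack one binary indicator column per cluster of every partition (n x t)."""
    cols = []
    for labels in partitions:
        labels = np.asarray(labels)
        for c in np.unique(labels):
            cols.append((labels == c).astype(float))
    return np.column_stack(cols)

def ensemble_embedding(partitions, k):
    """Return H, the n x k embedding Z and the k leading singular values of H."""
    H = hypergraph(partitions)
    evals, V = np.linalg.eigh(H.T @ H)        # t x t eigenproblem (Step 3)
    order = np.argsort(evals)[::-1][:k]       # k largest eigenvalues
    sigma = np.sqrt(evals[order])
    Z = (H @ V[:, order]) / sigma             # u_i = H v_i / sigma_i (Step 4)
    return H, Z, sigma

# Three toy partitions of six objects (hypothetical data).
toy_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
H, Z, sigma = ensemble_embedding(toy_partitions, k=2)
S = H @ H.T    # co-association counts, formed here only to check the result
print(Z.shape)
```

Objects that are grouped together in every partition have identical rows in H and therefore identical rows in Z, which is what makes the subsequent k-means on Z straightforward.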

4.1 The description of the experimental data sets

Our experiment uses six data sets:

1) tr31 and tr41. They are derived from the TREC-6 and TREC-7 collections. The real categories of the two data sets correspond to the queries of the particular categories.
2) re0 and re1. They are selected from the Reuters text categorization test collection Distribution 1.0. We divide the labels into two subsets, and for each subset we select the texts with a single label.
3) reviews and hitech. They are derived from the San Jose Mercury newspaper articles distributed as part of the TREC collection (TIPSTER Vol. 3). reviews contains texts about food, movies, music, radio, and restaurants. hitech contains texts about computers, electronics, health, medical, research, and technology. No two texts in these data sets share the same DESCRIPT tag, which may contain multiple categories.

Table 1: The description of experimental data sets. Columns: Data sets, Instances, Features, Classes. Rows: tr31, tr41, re0, re1, reviews, hitech. (The numeric entries are missing from this transcription.)

4.2 The verification of the effectiveness of our method

We measure clustering quality with two criteria. The Normalized Mutual Information (NMI) is an information-theoretic measure that quantifies the match between the category labels and the cluster labels. The Average Normalized Mutual Information (ANMI) measures the average normalized mutual information between a set of r labelings and the final labeling.

Note: The highest scores in Tables 2 and 3, and the shortest times in Table 4, are marked in bold.

Table 2: NMI comparisons of five cluster ensemble methods. Columns: Data sets, CSPA, HGPA, MCLA, HBGF, CEBKSC. Rows: tr31, tr41, re0, re1, reviews, hitech. (The numeric entries are missing from this transcription.)
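The NMI and ANMI measures described above can be sketched in plain Python as follows. The paper does not state which normalization it uses, so this sketch assumes the geometric-mean normalization $I(a;b) / \sqrt{H(a) H(b)}$ of Strehl and Ghosh [3]. The function names and the zero-entropy convention are likewise assumptions made for illustration.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nij / n) * log(n * nij / (ca[i] * cb[j]))
               for (i, j), nij in cab.items())

def nmi(a, b):
    """NMI with geometric-mean normalization (assumed, following [3])."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        return 0.0   # convention chosen here for a single-cluster labeling
    return mutual_information(a, b) / sqrt(ha * hb)

def anmi(partitions, final):
    """Average NMI between each of the r partitions and the final labeling."""
    return sum(nmi(p, final) for p in partitions) / len(partitions)

# Relabeling the clusters does not change the score.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because NMI depends only on how objects are grouped, not on the label names, it sidesteps the label correspondence problem when comparing a clustering with the true categories.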

Table 3: ANMI comparisons of five cluster ensemble methods. Columns: Data sets, CSPA, HGPA, MCLA, HBGF, CEBKSC. Rows: tr31, tr41, re0, re1, reviews, hitech. (The numeric entries are missing from this transcription.)

Tables 2 and 3 present the comparison results. Each result is an average over 10 runs. The results show that:

1) The clustering quality of CSPA is better than that of HGPA and MCLA on all data sets. CSPA calls the efficient graph partitioning method METIS and is stable.
2) On all of the experimental data sets, the clustering quality of the two spectral cluster ensemble methods, HBGF and CEBKSC, is better than that of the three graph and hypergraph based methods, CSPA, HGPA and MCLA.
3) CEBKSC slightly outperforms HBGF. It obtains higher NMI values than HBGF on all of the data sets, and the highest ANMI values on all data sets except re0.

Table 4: Runtime comparisons of five cluster ensemble methods at the ensemble step. Columns: Data sets, CSPA, HGPA, MCLA, HBGF, CEBKSC. Rows: tr31, tr41, re0, re1, reviews, hitech. (The numeric entries are missing from this transcription.)

From Table 4, we can make the following observations:

1) CSPA is the slowest of the cluster ensemble methods, followed by HBGF. CSPA has a computational and storage complexity of $O(mkn^2)$, which is quadratic in the number of texts. HBGF calls a time-consuming spectral clustering to partition a bipartite graph.
2) MCLA is slightly slower than CEBKSC and HGPA. MCLA has a computational complexity of $O(m^2 k^2 n)$.
3) CEBKSC and HGPA are the fastest methods. CEBKSC has a significantly reduced computational complexity because it applies the matrix transformation technique, and the computational complexity of HGPA is only $O(mkn)$.

5 Conclusions and Outlook

In this paper, we develop a cluster ensemble method using the binary k-means and spectral clustering. The proposed algorithm combines the advantages of the binary k-means method and the spectral clustering method while avoiding their shortcomings.
On the one hand, the use of the binary k-means method permits the formation of partitions that differ from each other. On the other hand, applying the spectral clustering method to the partition results rather than directly to the texts yields superior clustering performance. Moreover, a matrix transformation technique is adopted to address the computational and memory problems of spectral clustering. In the future, we will study techniques to remove the bottlenecks of our method, including the acceleration of the binary k-means method. We will also investigate the feasibility of SAR image segmentation using our method.

References

[1] Bai Xue, Luo Si-wei, Yin Hui, Ni Wei-yuan, Multi-Feature Similarity Measures Under Information-Based Clustering Framework for Image Segmentation, Journal of Computational Information Systems, 2012, 8 (15).
[2] Vega-Pons S, Ruiz-Shulcloper J, A Survey of Clustering Ensemble Algorithms, International Journal of Pattern Recognition and Artificial Intelligence, 2011, 25 (3).
[3] Strehl A, Ghosh J, Cluster Ensembles - A Knowledge Reuse Framework for Combining Partitionings, In Proc. Conference on Artificial Intelligence (AAAI 2002), Edmonton, AAAI/MIT Press, 2002.
[4] Fred A L, Jain A K, Combining Multiple Clusterings using Evidence Accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27 (6).
[5] Meila M, Shi J, Learning Segmentation by Random Walks, Proc. Conf. Neural Information Processing Systems, 2000.
[6] Shi J, Malik J, Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22 (8).
[7] Fowlkes C, Belongie S, Chung F, Malik J, Spectral Grouping using the Nyström Method, IEEE Trans. Pattern Analysis and Machine Intelligence, 2004, 26 (2).
[8] Dhillon I S, Co-clustering Documents and Words using Bipartite Spectral Graph Partitioning, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2001.
[9] Xu Wei, Liu Xin, Gong Yi-hong, Document Clustering Based on Non-negative Matrix Factorization, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2003.
[10] Yu S X, Shi Jian-bo, Multiclass spectral clustering, Computer Vision, Proceedings. Ninth IEEE International Conference on, IEEE, 2003.
[11] Liu Rong, Zhang Hao, Segmentation of 3D Meshes Through Spectral Clustering, Computer Graphics and Applications, PG Proceedings. 12th Pacific Conference on, IEEE, 2004.
[12] Karypis G, Kumar V, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, 1998, 20 (1).
[13] Berry M W, Large-Scale Sparse Singular Value Computations, International Journal of Supercomputer Applications, 1992, 6 (1).


More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

An ICA-Based Multivariate Discretization Algorithm

An ICA-Based Multivariate Discretization Algorithm An ICA-Based Multivariate Discretization Algorithm Ye Kang 1,2, Shanshan Wang 1,2, Xiaoyan Liu 1, Hokyin Lai 1, Huaiqing Wang 1, and Baiqi Miao 2 1 Department of Information Systems, City University of

More information

Face recognition based on improved BP neural network

Face recognition based on improved BP neural network Face recognition based on improved BP neural network Gaili Yue, Lei Lu a, College of Electrical and Control Engineering, Xi an University of Science and Technology, Xi an 710043, China Abstract. In order

More information

Improving Image Segmentation Quality Via Graph Theory

Improving Image Segmentation Quality Via Graph Theory International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,

More information

A Graph Clustering Algorithm Based on Minimum and Normalized Cut

A Graph Clustering Algorithm Based on Minimum and Normalized Cut A Graph Clustering Algorithm Based on Minimum and Normalized Cut Jiabing Wang 1, Hong Peng 1, Jingsong Hu 1, and Chuangxin Yang 1, 1 School of Computer Science and Engineering, South China University of

More information

DOCUMENT CLUSTERING USING HIERARCHICAL METHODS. 1. Dr.R.V.Krishnaiah 2. Katta Sharath Kumar. 3. P.Praveen Kumar. achieved.

DOCUMENT CLUSTERING USING HIERARCHICAL METHODS. 1. Dr.R.V.Krishnaiah 2. Katta Sharath Kumar. 3. P.Praveen Kumar. achieved. DOCUMENT CLUSTERING USING HIERARCHICAL METHODS 1. Dr.R.V.Krishnaiah 2. Katta Sharath Kumar 3. P.Praveen Kumar ABSTRACT: Cluster is a term used regularly in our life is nothing but a group. In the view

More information

Introduction to spectral clustering

Introduction to spectral clustering Introduction to spectral clustering Denis Hamad LASL ULCO Denis.Hamad@lasl.univ-littoral.fr Philippe Biela HEI LAGIS Philippe.Biela@hei.fr Data Clustering Data clustering Data clustering is an important

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

DATA clustering is an unsupervised learning technique

DATA clustering is an unsupervised learning technique IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS 1 Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities Dong Huang, Member, IEEE, Chang-Dong Wang, Member, IEEE, Hongxing

More information

Online Cross-Modal Hashing for Web Image Retrieval

Online Cross-Modal Hashing for Web Image Retrieval Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-6) Online Cross-Modal Hashing for Web Image Retrieval Liang ie Department of Mathematics Wuhan University of Technology, China

More information

Global Fuzzy C-Means with Kernels

Global Fuzzy C-Means with Kernels Global Fuzzy C-Means with Kernels Gyeongyong Heo Hun Choi Jihong Kim Department of Electronic Engineering Dong-eui University Busan, Korea Abstract Fuzzy c-means (FCM) is a simple but powerful clustering

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

Lecture 27: Fast Laplacian Solvers

Lecture 27: Fast Laplacian Solvers Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall

More information

STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES

STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES 25-29 JATIT. All rights reserved. STUDYING THE FEASIBILITY AND IMPORTANCE OF GRAPH-BASED IMAGE SEGMENTATION TECHNIQUES DR.S.V.KASMIR RAJA, 2 A.SHAIK ABDUL KHADIR, 3 DR.S.S.RIAZ AHAMED. Dean (Research),

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia

More information

Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text

Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text Amir Hossein Jadidinejad Mitra Mohtarami Hadi Amiri Computer Engineering Department, Islamic Azad University,

More information

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Robust Kernel Methods in Clustering and Dimensionality Reduction Problems Jian Guo, Debadyuti Roy, Jing Wang University of Michigan, Department of Statistics Introduction In this report we propose robust

More information

Bipartite Edge Prediction via Transductive Learning over Product Graphs

Bipartite Edge Prediction via Transductive Learning over Product Graphs Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction

More information

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

An Improvement of Centroid-Based Classification Algorithm for Text Classification

An Improvement of Centroid-Based Classification Algorithm for Text Classification An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,

More information

Accumulation. Instituto Superior Técnico / Instituto de Telecomunicações. Av. Rovisco Pais, Lisboa, Portugal.

Accumulation. Instituto Superior Técnico / Instituto de Telecomunicações. Av. Rovisco Pais, Lisboa, Portugal. Combining Multiple Clusterings Using Evidence Accumulation Ana L.N. Fred and Anil K. Jain + Instituto Superior Técnico / Instituto de Telecomunicações Av. Rovisco Pais, 149-1 Lisboa, Portugal email: afred@lx.it.pt

More information

HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH

HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH HIGH RESOLUTION REMOTE SENSING IMAGE SEGMENTATION BASED ON GRAPH THEORY AND FRACTAL NET EVOLUTION APPROACH Yi Yang, Haitao Li, Yanshun Han, Haiyan Gu Key Laboratory of Geo-informatics of State Bureau of

More information

Clustering via Random Walk Hitting Time on Directed Graphs

Clustering via Random Walk Hitting Time on Directed Graphs Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (8) Clustering via Random Walk Hitting Time on Directed Graphs Mo Chen Jianzhuang Liu Xiaoou Tang, Dept. of Information Engineering

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN: Semi Automatic Annotation Exploitation Similarity of Pics in i Personal Photo Albums P. Subashree Kasi Thangam 1 and R. Rosy Angel 2 1 Assistant Professor, Department of Computer Science Engineering College,

More information

Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas. Aristotle University of Thessaloniki

Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas. Aristotle University of Thessaloniki KERNEL MATRIX TRIMMING FOR IMPROVED KERNEL K-MEANS CLUSTERING Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas Aristotle University of Thessaloniki ABSTRACT The Kernel k-means

More information

Clustering of Data with Mixed Attributes based on Unified Similarity Metric

Clustering of Data with Mixed Attributes based on Unified Similarity Metric Clustering of Data with Mixed Attributes based on Unified Similarity Metric M.Soundaryadevi 1, Dr.L.S.Jayashree 2 Dept of CSE, RVS College of Engineering and Technology, Coimbatore, Tamilnadu, India 1

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

An Approach for Reduction of Rain Streaks from a Single Image

An Approach for Reduction of Rain Streaks from a Single Image An Approach for Reduction of Rain Streaks from a Single Image Vijayakumar Majjagi 1, Netravati U M 2 1 4 th Semester, M. Tech, Digital Electronics, Department of Electronics and Communication G M Institute

More information

Active Sampling for Constrained Clustering

Active Sampling for Constrained Clustering Paper: Active Sampling for Constrained Clustering Masayuki Okabe and Seiji Yamada Information and Media Center, Toyohashi University of Technology 1-1 Tempaku, Toyohashi, Aichi 441-8580, Japan E-mail:

More information

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr

More information

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan

More information

Approximate Nearest Centroid Embedding for Kernel k-means

Approximate Nearest Centroid Embedding for Kernel k-means Approximate Nearest Centroid Embedding for Kernel k-means Ahmed Elgohary, Ahmed K. Farahat, Mohamed S. Kamel, and Fakhri Karray University of Waterloo, Waterloo, Canada N2L 3G1 {aelgohary,afarahat,mkamel,karray}@uwaterloo.ca

More information

A REVIEW ON IMAGE RETRIEVAL USING HYPERGRAPH

A REVIEW ON IMAGE RETRIEVAL USING HYPERGRAPH A REVIEW ON IMAGE RETRIEVAL USING HYPERGRAPH Sandhya V. Kawale Prof. Dr. S. M. Kamalapur M.E. Student Associate Professor Deparment of Computer Engineering, Deparment of Computer Engineering, K. K. Wagh

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

Robust Lossless Image Watermarking in Integer Wavelet Domain using SVD

Robust Lossless Image Watermarking in Integer Wavelet Domain using SVD Robust Lossless Image Watermarking in Integer Domain using SVD 1 A. Kala 1 PG scholar, Department of CSE, Sri Venkateswara College of Engineering, Chennai 1 akala@svce.ac.in 2 K. haiyalnayaki 2 Associate

More information

Principal Coordinate Clustering

Principal Coordinate Clustering Principal Coordinate Clustering Ali Sekmen, Akram Aldroubi, Ahmet Bugra Koku, Keaton Hamm Department of Computer Science, Tennessee State University Department of Mathematics, Vanderbilt University Department

More information

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University

More information

Clustering Ensembles Based on Normalized Edges

Clustering Ensembles Based on Normalized Edges Clustering Ensembles Based on Normalized Edges Yan Li 1,JianYu 2, Pengwei Hao 1,3, and Zhulin Li 1 1 Center for Information Science, Peking University, Beijing, 100871, China {yanli, lizhulin}@cis.pku.edu.cn

More information

Clustering Documents in Large Text Corpora

Clustering Documents in Large Text Corpora Clustering Documents in Large Text Corpora Bin He Faculty of Computer Science Dalhousie University Halifax, Canada B3H 1W5 bhe@cs.dal.ca http://www.cs.dal.ca/ bhe Yongzheng Zhang Faculty of Computer Science

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Algebraic Techniques for Analysis of Large Discrete-Valued Datasets

Algebraic Techniques for Analysis of Large Discrete-Valued Datasets Algebraic Techniques for Analysis of Large Discrete-Valued Datasets Mehmet Koyutürk 1,AnanthGrama 1, and Naren Ramakrishnan 2 1 Dept. of Computer Sciences, Purdue University W. Lafayette, IN, 47907, USA

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Graph and Hypergraph Partitioning for Parallel Computing

Graph and Hypergraph Partitioning for Parallel Computing Graph and Hypergraph Partitioning for Parallel Computing Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology June 29, 2016 Graph and hypergraph partitioning References:

More information

CLASSIFICATION FOR SCALING METHODS IN DATA MINING

CLASSIFICATION FOR SCALING METHODS IN DATA MINING CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department

More information

Efficient FM Algorithm for VLSI Circuit Partitioning

Efficient FM Algorithm for VLSI Circuit Partitioning Efficient FM Algorithm for VLSI Circuit Partitioning M.RAJESH #1, R.MANIKANDAN #2 #1 School Of Comuting, Sastra University, Thanjavur-613401. #2 Senior Assistant Professer, School Of Comuting, Sastra University,

More information

Generalized trace ratio optimization and applications

Generalized trace ratio optimization and applications Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO

More information

Observational Learning with Modular Networks

Observational Learning with Modular Networks Observational Learning with Modular Networks Hyunjung Shin, Hyoungjoo Lee and Sungzoon Cho {hjshin72, impatton, zoon}@snu.ac.kr Department of Industrial Engineering, Seoul National University, San56-1,

More information

COMBINING MULTIPLE PARTITIONS CREATED WITH A GRAPH-BASED CONSTRUCTION FOR DATA CLUSTERING

COMBINING MULTIPLE PARTITIONS CREATED WITH A GRAPH-BASED CONSTRUCTION FOR DATA CLUSTERING Author manuscript, published in "IEEE International Workshop on Machine Learning for Signal Processing, Grenoble : France (29)" COMBINING MULTIPLE PARTITIONS CREATED WITH A GRAPH-BASED CONSTRUCTION FOR

More information

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering A. Anil Kumar Dept of CSE Sri Sivani College of Engineering Srikakulam, India S.Chandrasekhar Dept of CSE Sri Sivani

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING

AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING AN APPROACH FOR LOAD BALANCING FOR SIMULATION IN HETEROGENEOUS DISTRIBUTED SYSTEMS USING SIMULATION DATA MINING Irina Bernst, Patrick Bouillon, Jörg Frochte *, Christof Kaufmann Dept. of Electrical Engineering

More information

Globally and Locally Consistent Unsupervised Projection

Globally and Locally Consistent Unsupervised Projection Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Globally and Locally Consistent Unsupervised Projection Hua Wang, Feiping Nie, Heng Huang Department of Electrical Engineering

More information

Text clustering based on a divide and merge strategy

Text clustering based on a divide and merge strategy Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 55 (2015 ) 825 832 Information Technology and Quantitative Management (ITQM 2015) Text clustering based on a divide and

More information

Large-Scale Face Manifold Learning

Large-Scale Face Manifold Learning Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random

More information