
Spectral Clustering: A Graph Partitioning Point of View

Yangzihao Wang
Computer Science Department, University of California, Davis
(Second-year Ph.D. student working with Prof. John Owens)

Abstract

This course project provides the basic theory of spectral clustering from a graph partitioning point of view. It also derives two typical spectral clustering algorithms: ratio cuts and normalized cuts. We present experiments on large web graphs and discuss and analyse the results. Finally, we summarize the algorithms we have used and discuss the possibility, and the possible issues, of using parallel computing to improve performance.

I. Introduction

Clustering in general is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups (clusters). Spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The term refers to a set of heuristic algorithms, all based on the overall idea of computing the first few singular vectors and then clustering in a low-dimensional (in certain cases simply one-dimensional) subspace.

I.1 Problem Statement

In this course project I focus on using spectral clustering as an approximate solution to the k-way graph partitioning problem, and I try to solve the graph partitioning problem for large-scale undirected social network graphs using two common spectral clustering techniques: ratio cuts and normalized cuts.

II. Basic Theory of Spectral Clustering

In this section I discuss the mathematical objects used in spectral clustering and the link between spectral clustering and graph partitioning. I then show how spectral clustering can be derived as an approximation to such graph partitioning problems.

II.1 Graph Laplacians

Suppose we have an undirected weighted graph G = (V, E). In the spectral clustering algorithm, the vertices of G are the objects that need to be clustered into k clusters. Various ways can be used to compute the edge weight between each pair of vertices. These weights form the weight matrix W, where $w_{ij} = w_{ji} \geq 0$. The degree of a vertex $v_i \in V$ is defined as:

$$d_i = \sum_{j=1}^{n} w_{ij}$$

We can thus define the degree matrix D as the diagonal matrix with the degrees $d_1, d_2, \dots, d_n$ on the diagonal. The unnormalized graph Laplacian matrix is defined as

$$L = D - W,$$

and there are two matrices which are called normalized graph Laplacians:

$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$$

$$L_{rw} = D^{-1} L = I - D^{-1} W$$

Von Luxburg's tutorial [3] covers several properties of Laplacian matrices. With these definitions, we can view the spectral clustering problem from the perspective of graph partitioning.

II.2 Graph partitioning point of view

Given a similarity graph with adjacency matrix W, the simplest and most direct way to construct a partition is to solve the mincut problem. This consists of choosing the partition $A_1, A_2, \dots, A_k$ which minimizes

$$\mathrm{cut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \mathrm{cut}(A_i, \bar{A}_i),$$

where $\bar{A}_i$ is the complement of $A_i$ and $\mathrm{cut}(A_i, \bar{A}_i)$ is the total weight of the edges between $A_i$ and $\bar{A}_i$. Since mincut often produces unbalanced partitions (e.g. one part containing only a single vertex), the objective function needs to be improved to guarantee that the sets $A_1, \dots, A_k$ are "reasonably large". The two most common objective functions which encode this are RatioCut and the normalized cut Ncut:

$$\mathrm{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{|A_i|}$$

$$\mathrm{Ncut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}$$

In RatioCut, the size of a subset A of a graph is measured by its number of vertices $|A|$, while in Ncut the size is measured by the weights of its edges, $\mathrm{vol}(A) = \sum_{i \in A} d_i$. Now we look at RatioCut and Ncut separately.

First, the relaxation of the RatioCut minimization problem for a general value of k looks like this. Given a partition of V into k sets $A_1, \dots, A_k$, define k indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})^T$ by:

$$h_{i,j} = \begin{cases} 1/\sqrt{|A_j|} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases}$$

Then we set the matrix $H \in \mathbb{R}^{n \times k}$ as the matrix containing those k indicator vectors as columns.
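The definitions above can be tried out on a small graph. The following is a minimal Python sketch, not the project's code: it assumes that cut(A, Ā) is the total weight of the edges between A and its complement and that vol(A) is the sum of the degrees in A. All function names and the toy graph (two triangles joined by one edge) are illustrative.

```python
from math import sqrt

def degrees(W):
    """Vertex degrees d_i = sum_j w_ij."""
    return [sum(row) for row in W]

def unnormalized_laplacian(W):
    """L = D - W as a dense list of lists."""
    d, n = degrees(W), len(W)
    return [[(d[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def symmetric_laplacian(W):
    """L_sym = I - D^(-1/2) W D^(-1/2); assumes every degree is positive."""
    d, n = degrees(W), len(W)
    return [[(1.0 if i == j else 0.0) - W[i][j] / sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]

def cut(W, A):
    """Total weight of the edges between vertex set A and its complement."""
    inside = set(A)
    return sum(W[i][j] for i in inside
               for j in range(len(W)) if j not in inside)

def ratio_cut(W, parts):
    """Sum over parts of cut(A, complement) / |A|."""
    return sum(cut(W, A) / len(A) for A in parts)

def ncut(W, parts):
    """Sum over parts of cut(A, complement) / vol(A)."""
    d = degrees(W)
    return sum(cut(W, A) / sum(d[i] for i in A) for A in parts)

def indicator_matrix(parts, n):
    """H[i][j] = 1/sqrt(|A_j|) if vertex i is in A_j, else 0."""
    H = [[0.0] * len(parts) for _ in range(n)]
    for j, A in enumerate(parts):
        for i in A:
            H[i][j] = 1.0 / sqrt(len(A))
    return H

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
parts = [[0, 1, 2], [3, 4, 5]]

L = unnormalized_laplacian(W)
H = indicator_matrix(parts, len(W))
print(ratio_cut(W, parts), ncut(W, parts))
```

For this toy graph, cutting the single bridge edge gives RatioCut = 1/3 + 1/3 = 2/3 and Ncut = 1/7 + 1/7 = 2/7, because each triangle has 3 vertices and volume 7.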
Observe that the columns of H are orthonormal to each other, that is, $H^T H = I$. For each column $h_j$ of H we have

$$h_j^T L h_j = \frac{1}{2} \sum_{u,v=1}^{n} w_{uv} (h_{u,j} - h_{v,j})^2 = \frac{\mathrm{cut}(A_j, \bar{A}_j)}{|A_j|}, \qquad h_j^T L h_j = (H^T L H)_{jj}.$$

Thus we have:

$$\mathrm{RatioCut}(A_1, \dots, A_k) = \sum_{j=1}^{k} (H^T L H)_{jj} = \mathrm{Tr}(H^T L H).$$

Following Dhillon et al. [1], the ratio-cut problem can be expressed as a trace optimization problem:

$$\min_{A_1, \dots, A_k} \mathrm{Tr}(H^T L H) \quad \text{subject to } H^T H = I,\; H_{ij} = h_{i,j}.$$

Since $\mathrm{Tr}(H^T (cI - L) H) = ck - \mathrm{Tr}(H^T L H)$ whenever $H^T H = I$, this minimization is equivalent to a trace maximization of the kind used in [1]. If we relax the problem by allowing the entries of the matrix H to take arbitrary real values, then the relaxed problem becomes:

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H) \quad \text{subject to } H^T H = I.$$

Von Luxburg's tutorial [3] shows that the same relaxation also works for normalized cuts. In normalized cuts we use a different Laplacian matrix and the problem looks like this:

$$\min_{U \in \mathbb{R}^{n \times k}} \mathrm{Tr}(U^T D^{-1/2} L D^{-1/2} U) \quad \text{subject to } U^T U = I,$$

where $U = D^{1/2} H$. A well-known solution to such problems is obtained by computing the k eigenvectors of the corresponding Laplacian matrix with the smallest eigenvalues; for normalized cuts these are the top k eigenvectors of $D^{-1/2} W D^{-1/2}$, since $L_{sym} = I - D^{-1/2} W D^{-1/2}$. These eigenvectors are then used to compute a discrete partitioning of the points.

III. Algorithm

In this section, we specify the algorithms we use for this course project. The normalized spectral clustering algorithm is taken from the papers of Ng et al. [2] and Dhillon et al. [1]; the unnormalized spectral clustering algorithm is taken from von Luxburg's tutorial [3].

Algorithm 1 Unnormalized spectral clustering algorithm
 1: procedure RatioCut(W, k)    ▷ Take a weight matrix and cut the graph into k parts
 2:     Construct the diagonal degree matrix D
 3:     L ← D − W    ▷ Compute the unnormalized Laplacian L
 4:     Compute the k eigenvectors v_1, ..., v_k of L with the smallest eigenvalues
 5:     Let V ∈ R^{n×k} be the matrix containing the vectors v_1, ..., v_k as columns
 6:     for i ← 1 to n do
 7:         y_i ← V(i, :)    ▷ y_i ∈ R^k is the vector corresponding to the i-th row of V
 8:     end for
 9:     Cluster the points (y_i), i = 1, ..., n, in R^k with the k-means algorithm into clusters C_1, ..., C_k
10:     return clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }
11: end procedure

The normalized spectral clustering algorithm (Algorithm 2) uses a different Laplacian matrix L and normalizes the rows of the eigenvector matrix before using k-means to cluster them into k partitions. Both algorithms use the same framework with two different graph Laplacians.
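To make the framework concrete, here is a minimal Python sketch of the k = 2 special case only. It is a simplification and not the project's implementation: instead of calling an eigensolver and k-means, it runs power iteration on the shifted matrix cI − L (with the constant eigenvector projected out) to approximate the eigenvector of the second-smallest eigenvalue of L, then splits the vertices by sign. The function name, the shift c, the iteration count and the toy graph are all assumptions made for illustration, and the graph is assumed to be connected.

```python
from math import sqrt

def fiedler_bisection(W, iters=500):
    """Split a connected graph in two by the sign of an approximate
    eigenvector for the second-smallest eigenvalue of L = D - W."""
    n = len(W)
    d = [sum(row) for row in W]
    # All eigenvalues of L lie in [0, 2 * max degree], so c*I - L is
    # positive definite and its largest eigenvalues are L's smallest.
    c = 2.0 * max(d) + 1.0

    def shifted_matvec(x):
        # (c*I - L) x = c*x - D*x + W*x
        return [c * x[i] - d[i] * x[i]
                + sum(W[i][j] * x[j] for j in range(n))
                for i in range(n)]

    x = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]   # remove the constant eigenvector
        x = shifted_matvec(x)
        norm = sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]

    left = [i for i in range(n) if x[i] < 0]
    right = [i for i in range(n) if x[i] >= 0]
    return left, right

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
left, right = fiedler_bisection(W)
print(left, right)
```

For general k, the sign split is replaced by k-means on the rows of the n-by-k eigenvector matrix, as in the algorithms of this section.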
Algorithm 2 Normalized spectral clustering algorithm
 1: procedure NCut(W, k)    ▷ Take a weight matrix and cut the graph into k parts
 2:     Construct the diagonal degree matrix D
 3:     L ← D^{-1/2} W D^{-1/2}    ▷ Compute the normalized matrix L (equal to I − L_sym)
 4:     Compute the top k eigenvectors v_1, ..., v_k of L
 5:     Let V ∈ R^{n×k} be the matrix containing the vectors v_1, ..., v_k as columns
 6:     for i ← 1 to n do
 7:         y_i ← V(i, :) / ||V(i, :)||    ▷ y_i ∈ R^k is the normalized vector corresponding to the i-th row of V
 8:     end for
 9:     Cluster the points (y_i), i = 1, ..., n, in R^k with the k-means algorithm into clusters C_1, ..., C_k
10:     return clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }
11: end procedure

The main trick is to change the representation of each vertex i to the point y_i ∈ R^k given by the i-th row of the eigenvector matrix V. This change of representation enhances the cluster properties in the data, so that the clusters can be detected easily in the new representation.

IV. Implementation

The implementation of the algorithms is straightforward in MATLAB. Before running either algorithm, we use a simple Python script to convert the graph data into Matrix Market format. In both algorithms the matrix passed to the eigensolver is symmetric (the graph Laplacians L and L_sym are symmetric positive semi-definite), so we use the MATLAB function eigs() to compute the required k eigenvectors. For the k-means phase we use the MATLAB function kmeans().

One issue during the implementation is that eigs() sometimes fails to converge for all requested eigenvalues. Also, because the result of k-means depends heavily on the initial condition, the solution of our algorithm is not deterministic. In the worst case, empty clusters are formed.

V. Experiment Results

V.1 Experiment Environment

The machine we use in this course project has a 1.70 GHz Intel(R) Core(TM) i7-2637M CPU with 8 GB of RAM. The code runs under MATLAB R2012b.

V.2 Data Sources

For this course project, we use four graphs in our experiments, shown in Figure 1.
The first one is a generated block diagonal graph which contains 4 connected components; it is used to test the correctness of our implementation of the two algorithms. The other three are real-world graphs. The second and third are undirected graphs: one is the Enron graph, in which each node denotes a person and an edge between two persons implies communication between them; the other is one category of the Arxiv collaboration graph, in which each node is an author in the Arxiv network and there is an edge between every two co-authors. The last one is the bipartite CharityNet graph, which records how multiple donors donate to different charities. The Enron and Arxiv graphs are unweighted, while the CharityNet graph is weighted by the amount each donor donates. From the matrix view we can see that both the test block diagonal graph and the CharityNet graph consist of distinct disconnected components, which makes clustering/partitioning much easier, while the other two graphs have either only a few connected components (Enron) or multiple uniformly distributed connected components (Arxiv), which makes clustering/partitioning a relatively difficult task.

[Figure 1: Matrix view of the graph topology of the four graphs we use in the course project. (a) Test block diagonal graph (b) Enron graph (c) Arxiv collaboration graph (d) CharityNet graph]

V.3 Result and Discussion

We first show the result of partitioning the test block diagonal graph. The graph has 4 connected components, and running the ratio cut algorithm with k = 4 yields the optimal partitioning, with a cut of 0.

The following table shows the results of performing ratio cut on the Enron graph and the Arxiv collaboration graph with k set to 6, 8 and 16. The results show a great reduction of the total cut compared with randomly picking edges as cuts in the graph. However, we can also see that the partition sizes are poorly balanced for the Arxiv collaboration and Enron graphs. The reason behind this is the topology of the graphs. As stated above, the Enron graph contains only a few connected components, and the Arxiv collaboration graph contains multiple uniformly distributed connected components.

[Table: Stdev of cut number for partition numbers 6, 8 and 16; rows: Arxiv Collaboration, Enron. Values not preserved.]

[Figure 2: Ratio of cut edges to total edges for the Enron and Arxiv collaboration graphs using the ratio cuts algorithm. (a) Enron graph (b) Arxiv collaboration graph]

The second experiment is on the weighted bipartite CharityNet graph. Thanks to the favorable topology of the graph itself (27 connected components), we get a better result in terms of partition balance when applying the normalized cuts algorithm to it, as Figure 3 shows.

[Figure 3: Size of the 16 partitions when applying the ncut algorithm to the CharityNet graph vs. average partition size.]

The experiment results show that both ratio cut and ncut can give a good approximation of the min-cut graph partitioning problem in terms of reducing the cut size. However, even the ncut algorithm run on a weighted graph with multiple connected components still cannot give a perfect result in terms of load balancing of partition sizes. I also find that, when dealing with exascale graph data, using a clustering or community detection algorithm as a method for graph partitioning is unnecessary for basic primitives such as breadth-first search, shortest path and connected component labeling.

VI. Conclusion

In this course project I learned about the idea of spectral clustering: constructing graph partitions based on eigenvectors of a graph Laplacian derived from the adjacency matrix. I realized that this beautiful theory has been extended to data mining, machine learning and many more fields. The success of spectral clustering is based on its ability to solve very general problems without any assumptions on the form of the clusters. It is also easy to implement once the similarity graph is defined. The main step is a standard linear algebra problem (an eigenvector computation), which, unlike the final k-means step, raises no issues such as convergence criteria or restarting the algorithm with different initializations. However, from the experiments I have done, I find that defining a good similarity graph and choosing the right Laplacian matrix are important and have a great influence on the stability of the algorithm.

Also, clustering/graph partitioning as an irregular problem is an interesting topic in general-purpose GPU computing. Parallel computing is one promising way to overcome the computational challenge of large-scale clustering problems. With linear algebra libraries such as cuSPARSE and cuBLAS, both building blocks of the spectral clustering algorithm, eigenvector computation and k-means, can be implemented on the GPU. This raises the following issues: 1) an efficient data layout for graphs that improves the performance of irregular memory accesses; 2) ways to decouple dependencies between graph nodes/edges during the computation.

References

[1] Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, New York, NY, USA. ACM.
[2] Andrew Y. Ng, Michael I. Jordan, Yair Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems.
[3] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4).

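Appendix

Listing 1: Normalized Laplacian Matrix Code

    function [ L ] = laplacian( A )
    %LAPLACIAN Normalized matrix D^(-1/2) * A * D^(-1/2).
    %   A is a symmetric (sparse) weight matrix without isolated vertices.
    row = sum(A, 2);                 % vertex degrees
    row = 1 ./ sqrt(row);            % d_i^(-1/2)
    n = length(row);
    D = spdiags(row(:), 0, n, n);    % sparse diagonal matrix D^(-1/2)
    L = D * A * D;
    end

Listing 2: Unnormalized Laplacian Matrix Code

    function [ L ] = laplacian2( A )
    %LAPLACIAN2 Unnormalized Laplacian D - A with a flipped spectrum.
    row = sum(A, 2);                 % vertex degrees
    n = length(row);
    D = spdiags(row(:), 0, n, n);    % sparse degree matrix
    L = D - A;
    % D - A is singular (one zero eigenvalue per connected component),
    % so it cannot be inverted. c*I - L has the same eigenvectors, and
    % its largest eigenvalues are the smallest eigenvalues of D - A.
    L = 2 * max(row) * speye(n) - L;
    end

Listing 3: Spectral Clustering Code

    function [ idx, c, sumd ] = cluster( L, k )
    %CLUSTER k-means on the row-normalized top-k eigenvectors of L.
    [V, ~] = eigs(L, k);             % k largest-magnitude eigenpairs
    n1 = size(V, 1);                 % number of vertices
    for i = 1:n1
        V(i, :) = V(i, :) / norm(V(i, :));   % normalize each row
    end
    [idx, c, sumd] = kmeans(V, k);
    fid = fopen('partition.txt', 'wt');
    for i = 1:n1
        fprintf(fid, '%d\n', idx(i));        % one cluster label per vertex
    end
    fclose(fid);
    end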
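Section IV mentions a Python script that converts the graph data into Matrix Market format, but the script is not among the listings. A minimal sketch of such a converter is given below. It is an assumption about what the script might look like, not the author's code: it takes a hypothetical in-memory edge list of (u, v, w) triples with 0-based vertex ids and writes the coordinate format with symmetric storage, in which only entries on or below the diagonal are stored and indices are 1-based.

```python
import io

def write_matrix_market(edges, n, out):
    """Write an undirected weighted edge list as a symmetric
    Matrix Market coordinate file.

    edges: iterable of (u, v, w) with 0-based vertex ids
    n:     number of vertices
    out:   writable text stream
    """
    entries = {}
    for u, v, w in edges:
        if u == v:
            continue  # drop self-loops
        # 1-based indices, stored in the lower triangle (row >= column);
        # duplicate or reversed edges overwrite each other.
        i, j = max(u, v) + 1, min(u, v) + 1
        entries[(i, j)] = float(w)

    out.write("%%MatrixMarket matrix coordinate real symmetric\n")
    out.write(f"{n} {n} {len(entries)}\n")
    for (i, j), w in sorted(entries.items()):
        out.write(f"{i} {j} {w:g}\n")

# Example: edge 0-1 listed in both directions, plus a weighted edge 1-2.
buf = io.StringIO()
write_matrix_market([(0, 1, 1.0), (1, 0, 1.0), (2, 1, 2.5)], 3, buf)
print(buf.getvalue(), end="")
```

Passing an open file instead of the StringIO buffer writes the same content to disk.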

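The quantities discussed in Section V.3, the share of edges that are cut and the balance of the partition sizes, can be computed from the labels that Listing 3 writes to partition.txt. The sketch below is one hypothetical way to do this and is not taken from the project: the function name, the in-memory inputs and the choice of the population standard deviation of the cluster sizes as the balance measure are assumptions.

```python
from collections import Counter
from statistics import pstdev

def partition_report(labels, edges):
    """Summarize a partition.

    labels: labels[v] is the cluster label of vertex v (0-based vertex ids)
    edges:  list of undirected edges (u, v), each listed once
    """
    sizes = Counter(labels)  # counts only clusters that are non-empty
    cut_edges = sum(1 for u, v in edges if labels[u] != labels[v])
    return {
        "sizes": dict(sizes),
        "size_stdev": pstdev(list(sizes.values())),
        "cut_ratio": cut_edges / len(edges),
    }

# Two triangles joined by the edge 2-3, split at the bridge.
labels = [1, 1, 1, 2, 2, 2]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
report = partition_report(labels, edges)
print(report)
```

A perfectly balanced partition has a size standard deviation of 0; larger values indicate the kind of imbalance reported for the Enron and Arxiv graphs.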
Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,

Spectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6, Spectral Clustering XIAO ZENG + ELHAM TABASSI CSE 902 CLASS PRESENTATION MARCH 16, 2017 1 Presentation based on 1. Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4

More information

Spectral Clustering on Handwritten Digits Database

Spectral Clustering on Handwritten Digits Database October 6, 2015 Spectral Clustering on Handwritten Digits Database Danielle Advisor: Kasso Okoudjou Department of Mathematics University of Maryland- College Park Advance

More information

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 Spectral Clustering Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014 What are we going to talk about? Introduction Clustering and

More information

Visual Representations for Machine Learning

Visual Representations for Machine Learning Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information


APPROXIMATE SPECTRAL LEARNING USING NYSTROM METHOD. Aleksandar Trokicić FACTA UNIVERSITATIS (NIŠ) Ser. Math. Inform. Vol. 31, No 2 (2016), 569 578 APPROXIMATE SPECTRAL LEARNING USING NYSTROM METHOD Aleksandar Trokicić Abstract. Constrained clustering algorithms as an input

More information

Application of Spectral Clustering Algorithm

Application of Spectral Clustering Algorithm 1/27 Application of Spectral Clustering Algorithm Danielle Middlebrooks Advisor: Kasso Okoudjou Department of Mathematics University of Maryland- College Park Advance

More information

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.

More information

Clustering in Networks

Clustering in Networks Clustering in Networks (Spectral Clustering with the Graph Laplacian... a brief introduction) Tom Carter Computer Science CSU Stanislaus tom/clustering April 1, 2012 1 Our general

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Clustering Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA Chandola@UB CSE 474/574 1 / 19 Outline

More information

Aarti Singh. Machine Learning / Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg

Aarti Singh. Machine Learning / Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg Spectral Clustering Aarti Singh Machine Learning 10-701/15-781 Apr 7, 2010 Slides Courtesy: Eric Xing, M. Hein & U.V. Luxburg 1 Data Clustering Graph Clustering Goal: Given data points X1,, Xn and similarities

More information

Introduction to spectral clustering

Introduction to spectral clustering Introduction to spectral clustering Vasileios Zografos Klas Nordberg What this course is Basic introduction into the core ideas of spectral clustering Sufficient to

More information

Machine Learning for Data Science (CS4786) Lecture 11

Machine Learning for Data Science (CS4786) Lecture 11 Machine Learning for Data Science (CS4786) Lecture 11 Spectral Clustering Course Webpage : Survey Survey Survey Competition I Out! Preliminary report of

More information

A Weighted Kernel PCA Approach to Graph-Based Image Segmentation

A Weighted Kernel PCA Approach to Graph-Based Image Segmentation A Weighted Kernel PCA Approach to Graph-Based Image Segmentation Carlos Alzate Johan A. K. Suykens ESAT-SCD-SISTA Katholieke Universiteit Leuven Leuven, Belgium January 25, 2007 International Conference

More information

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 11

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 11 Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 11 Fei Wang Associate Professor Department of Computer Science and Engineering Clustering II Spectral

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

Graph drawing in spectral layout

Graph drawing in spectral layout Graph drawing in spectral layout Maureen Gallagher Colleen Tygh John Urschel Ludmil Zikatanov Beginning: July 8, 203; Today is: October 2, 203 Introduction Our research focuses on the use of spectral graph

More information

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2 So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal

More information

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the

More information

Clustering Lecture 8. David Sontag New York University. Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein

Clustering Lecture 8. David Sontag New York University. Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein Clustering Lecture 8 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, Carlos Guestrin, Andrew Moore, Dan Klein Clustering: Unsupervised learning Clustering Requires

More information

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015

Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015 Clustering Subhransu Maji CMPSCI 689: Machine Learning 2 April 2015 7 April 2015 So far in the course Supervised learning: learning with a teacher You had training data which was (feature, label) pairs

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva Note to other teachers and users of these slides: We would be

More information

My favorite application using eigenvalues: partitioning and community detection in social networks

My favorite application using eigenvalues: partitioning and community detection in social networks My favorite application using eigenvalues: partitioning and community detection in social networks Will Hobbs February 17, 2013 Abstract Social networks are often organized into families, friendship groups,

More information

Lesson 2 7 Graph Partitioning

Lesson 2 7 Graph Partitioning Lesson 2 7 Graph Partitioning The Graph Partitioning Problem Look at the problem from a different angle: Let s multiply a sparse matrix A by a vector X. Recall the duality between matrices and graphs:

More information

Non Overlapping Communities

Non Overlapping Communities Non Overlapping Communities Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 Acknowledgements Most of this lecture is taken from:

More information

MATH 567: Mathematical Techniques in Data

MATH 567: Mathematical Techniques in Data Supervised and unsupervised learning Supervised learning problems: MATH 567: Mathematical Techniques in Data (X, Y ) P (X, Y ). Data Science Clustering I is labelled (input/output) with joint density We

More information

Spectral Clustering and Community Detection in Labeled Graphs

Spectral Clustering and Community Detection in Labeled Graphs Spectral Clustering and Community Detection in Labeled Graphs Brandon Fain, Stavros Sintos, Nisarg Raval Machine Learning (CompSci 571D / STA 561D) December 7, 2015 {btfain, nisarg, ssintos} at

More information

Segmentation: Clustering, Graph Cut and EM

Segmentation: Clustering, Graph Cut and EM Segmentation: Clustering, Graph Cut and EM Ying Wu Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208

More information

Algebraic Graph Theory- Adjacency Matrix and Spectrum

Algebraic Graph Theory- Adjacency Matrix and Spectrum Algebraic Graph Theory- Adjacency Matrix and Spectrum Michael Levet December 24, 2013 Introduction This tutorial will introduce the adjacency matrix, as well as spectral graph theory. For those familiar

More information

Social-Network Graphs

Social-Network Graphs Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

Spectral Graph Clustering

Spectral Graph Clustering Spectral Graph Clustering Benjamin Auffarth Universitat de Barcelona course report for Técnicas Avanzadas de Aprendizaje at Universitat Politècnica de Catalunya January 15, 2007 Abstract Spectral clustering

More information

Introduction aux Systèmes Collaboratifs Multi-Agents

Introduction aux Systèmes Collaboratifs Multi-Agents M1 EEAII - Découverte de la Recherche (ViRob) Introduction aux Systèmes Collaboratifs Multi-Agents UPJV, Département EEA Fabio MORBIDI Laboratoire MIS Équipe Perception et Robotique E-mail:

More information

Application of Graph Clustering on Scientific Papers Subject Classification

Application of Graph Clustering on Scientific Papers Subject Classification Application of Graph Clustering on Scientific Papers Subject Classification Yutian Liu, Zhefei Yu, Qi Zeng In this paper, we realize the equivalence between k-means clustering and graph spectrum clustering,

More information

Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas. Aristotle University of Thessaloniki

Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas. Aristotle University of Thessaloniki KERNEL MATRIX TRIMMING FOR IMPROVED KERNEL K-MEANS CLUSTERING Nikolaos Tsapanos, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas Aristotle University of Thessaloniki ABSTRACT The Kernel k-means

More information

Introduction to spectral clustering

Introduction to spectral clustering Introduction to spectral clustering Denis Hamad LASL ULCO Philippe Biela HEI LAGIS Data Clustering Data clustering Data clustering is an important

More information

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static

More information

Social Network Analysis

Social Network Analysis Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix

More information

Course Introduction / Review of Fundamentals of Graph Theory

Course Introduction / Review of Fundamentals of Graph Theory Course Introduction / Review of Fundamentals of Graph Theory Hiroki Sayama Rise of Network Science (From Barabasi 2010) 2 Network models Many discrete parts involved Classic mean-field

More information


REGULAR GRAPHS OF GIVEN GIRTH. Contents REGULAR GRAPHS OF GIVEN GIRTH BROOKE ULLERY Contents 1. Introduction This paper gives an introduction to the area of graph theory dealing with properties of regular graphs of given girth. A large portion

More information

Constrained Clustering with Interactive Similarity Learning

Constrained Clustering with Interactive Similarity Learning SCIS & ISIS 2010, Dec. 8-12, 2010, Okayama Convention Center, Okayama, Japan Constrained Clustering with Interactive Similarity Learning Masayuki Okabe Toyohashi University of Technology Tenpaku 1-1, Toyohashi,

More information

Principal Coordinate Clustering

Principal Coordinate Clustering Principal Coordinate Clustering Ali Sekmen, Akram Aldroubi, Ahmet Bugra Koku, Keaton Hamm Department of Computer Science, Tennessee State University Department of Mathematics, Vanderbilt University Department

More information

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. Graph Databases Basic

More information

Community Detection. Community
