Spectral Clustering: A Graph Partitioning Point of View

Yangzihao Wang
Computer Science Department, University of California, Davis
yzhwang@ucdavis.edu
Second-year Ph.D. student working with Prof. John Owens

Abstract

This course project presents the basic theory of spectral clustering from a graph partitioning point of view. It also derives two typical spectral clustering algorithms: ratio cuts and normalized cuts. We run experiments on large social network graphs and discuss and analyze the results. Finally, we summarize the algorithms we have used and discuss the possibility of, and potential issues with, using parallel computing to improve their performance.

I. Introduction

Clustering in general is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups (clusters). Spectral clustering techniques use the spectrum (eigenvalues) of a similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The term refers to a family of heuristic algorithms, all based on the overall idea of computing the first few eigenvectors and then clustering in a low-dimensional (in certain cases simply one-dimensional) subspace.

I.1 Problem Statement

In this course project I focus on using spectral clustering as an approximate solution to the k-way graph partitioning problem, and try to solve the graph partitioning problem for large-scale undirected social network graphs using two common spectral clustering techniques: ratio cuts and normalized cuts.

II. Basic Theory of Spectral Clustering

In this section I discuss the mathematical objects used in spectral clustering and the link between spectral clustering and graph partitioning. I then show how spectral clustering can be derived as an approximation to such graph partitioning problems.

II.1 Graph Laplacians

Suppose we have an undirected weighted graph $G = (V, E)$ whose vertices need to be clustered into $k$ clusters. Various ways can be used to compute the edge weight between each pair of vertices. These weights form the weight matrix $W$, where $w_{ij} = w_{ji} \ge 0$. The degree of a vertex $v_i \in V$ is defined as

$$d_i = \sum_{j=1}^{n} w_{ij}.$$

We can thus define the degree matrix $D$ as the diagonal matrix with the degrees $d_1, d_2, \ldots, d_n$ on the diagonal. The unnormalized graph Laplacian matrix is defined as

$$L = D - W,$$

and there are two matrices which are called normalized graph Laplacians:

$$L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}, \qquad L_{\mathrm{rw}} = D^{-1} L = I - D^{-1} W.$$

Von Luxburg's tutorial [3] covers several properties of Laplacian matrices.
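To make these definitions concrete, here is a minimal MATLAB sketch that builds $D$, $L$, $L_{\mathrm{sym}}$, and $L_{\mathrm{rw}}$ from a weight matrix. The 4-vertex weight matrix W below is a made-up example, not one of the graphs used later in this report.

% Constructing the degree matrix and the three Laplacians from a weight
% matrix W. The 4-vertex graph (two weakly connected pairs) is made up.
W = [0   1   0.1 0;
     1   0   0   0.1;
     0.1 0   0   1;
     0   0.1 1   0];

d = sum(W, 2);                       % vertex degrees d_i = sum_j w_ij
D = diag(d);                         % degree matrix
L = D - W;                           % unnormalized Laplacian
Dhalf = diag(1 ./ sqrt(d));          % D^{-1/2}; assumes no isolated vertices
Lsym = eye(4) - Dhalf * W * Dhalf;   % symmetric normalized Laplacian
Lrw  = eye(4) - diag(1 ./ d) * W;    % random-walk normalized Laplacian

% Sanity check: L is positive semi-definite with L * ones(4,1) = 0,
% so its smallest eigenvalue is (numerically) zero.
disp(sort(eig(L)).')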
Now with these definitions, we can view the spectral clustering problem from the perspective of graph partitioning.

II.2 Graph Partitioning Point of View

Given a similarity graph with adjacency matrix $W$, the simplest and most direct way to construct a partition is to solve the mincut problem. This consists of choosing the partition $A_1, A_2, \ldots, A_k$ which minimizes

$$\mathrm{cut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \mathrm{cut}(A_i, \bar{A}_i).$$

Since mincut often produces imbalanced partitions (e.g., a partition containing only one vertex), the objective function needs to be modified to guarantee that the sets $A_1, \ldots, A_k$ are "reasonably large". The two most common objective functions which encode this are RatioCut and the normalized cut Ncut:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{|A_i|}, \qquad \mathrm{Ncut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}.$$

In RatioCut, the size of a subset $A$ of the graph is measured by its number of vertices $|A|$; in Ncut the size is measured by the total weight of its edges, $\mathrm{vol}(A)$.

Now we look at RatioCut and Ncut separately. First, the relaxation of the RatioCut minimization problem for a general value of $k$ looks like this. Given a partition of $V$ into $k$ sets $A_1, \ldots, A_k$, define $k$ indicator vectors $h_j = (h_{1,j}, \ldots, h_{n,j})^\top$ by

$$h_{i,j} = \begin{cases} 1/\sqrt{|A_j|} & \text{if } v_i \in A_j \\ 0 & \text{otherwise.} \end{cases}$$

Then we set $H \in \mathbb{R}^{n \times k}$ to be the matrix containing those $k$ indicator vectors as columns. Observe that the columns of $H$ are orthonormal to each other, that is, $H^\top H = I$. Using the identity

$$h^\top L h = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (h_i - h_j)^2,$$

which holds for any vector $h$, we have

$$h_j^\top L h_j = \frac{\mathrm{cut}(A_j, \bar{A}_j)}{|A_j|} = (H^\top L H)_{jj},$$

and thus

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \sum_{j=1}^{k} (H^\top L H)_{jj} = \mathrm{Tr}(H^\top L H).$$

Following Dhillon's paper [1], the ratio cut problem can therefore be expressed as a trace optimization problem:

$$\min_{A_1, \ldots, A_k} \mathrm{Tr}(H^\top L H), \quad \text{subject to } H^\top H = I, \; H \text{ as defined above}.$$

If we relax the problem by allowing the entries of the matrix $H$ to take arbitrary real values, the relaxed problem becomes

$$\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^\top L H), \quad H^\top H = I.$$

Von Luxburg's tutorial [3] shows that the same relaxation also works for normalized cuts. In normalized cuts we use the normalized Laplacian matrix, and the problem looks like this:

$$\min_{U \in \mathbb{R}^{n \times k}} \mathrm{Tr}(U^\top D^{-1/2} L D^{-1/2} U), \quad U^\top U = I,$$

where $U = D^{1/2} H$. A well-known solution to such a problem is obtained by computing the $k$ eigenvectors of the (normalized) Laplacian matrix corresponding to the $k$ smallest eigenvalues; in the normalized case these are exactly the top $k$ eigenvectors of $D^{-1/2} W D^{-1/2}$. These eigenvectors are then used to compute a discrete partitioning of the points.

III. Algorithm

In this section, we specify the algorithms used in this course project. The normalized spectral clustering algorithm is taken from the papers of Ng et al. [2] and Dhillon et al. [1]; the unnormalized spectral clustering algorithm is taken from Von Luxburg's tutorial [3].

Algorithm 1 Unnormalized spectral clustering
1:  procedure RATIOCUT(W, k)   ▷ take a weight matrix and cut the graph into k parts
2:    Construct the diagonal degree matrix D
3:    L ← D − W   ▷ compute the unnormalized Laplacian L
4:    Compute the eigenvectors v_1, ..., v_k of L belonging to the k smallest eigenvalues
5:    Form the matrix V ∈ R^{n×k} containing v_1, ..., v_k as columns
6:    for i ← 1 to n do
7:      y_i ← V(i, :)   ▷ let y_i ∈ R^k be the i-th row of V
8:    end for
9:    Cluster the points (y_i), i = 1, ..., n, in R^k with the k-means algorithm into clusters C_1, ..., C_k
10:   return clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }
11: end procedure

The normalized spectral clustering algorithm uses a different Laplacian matrix and normalizes the rows of the eigenvector matrix before using k-means to cluster them into k partitions:
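To make the RatioCut identity above concrete, the following MATLAB sketch builds the indicator matrix H for a toy partition and checks numerically that RatioCut(A_1, A_2) = Tr(H'LH). The graph and the partition are made up for illustration.

% Numeric check of RatioCut(A_1,...,A_k) = Tr(H' * L * H) on a made-up
% 4-vertex graph with partition A_1 = {1,2}, A_2 = {3,4}.
W = [0 1 0.1 0; 1 0 0 0.1; 0.1 0 0 1; 0 0.1 1 0];
L = diag(sum(W, 2)) - W;            % unnormalized Laplacian

A = {[1 2], [3 4]};                 % the two clusters
n = size(W, 1); k = numel(A);
H = zeros(n, k);                    % indicator matrix, one column per cluster
for j = 1:k
    H(A{j}, j) = 1 / sqrt(numel(A{j}));
end

% RatioCut computed directly from the cut weight between A_1 and A_2:
cut12 = sum(sum(W(A{1}, A{2})));    % total weight of edges crossing the cut
ratiocut = cut12 / numel(A{1}) + cut12 / numel(A{2});

% The two quantities agree (up to floating point): both print 0.2000 here.
fprintf('RatioCut = %.4f, Tr(H''LH) = %.4f\n', ratiocut, trace(H' * L * H));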
Algorithm 2 Normalized spectral clustering
1:  procedure NCUT(W, k)   ▷ take a weight matrix and cut the graph into k parts
2:    Construct the diagonal degree matrix D
3:    L ← D^{−1/2} W D^{−1/2}   ▷ the top eigenvectors of this matrix are the eigenvectors of L_sym with the smallest eigenvalues
4:    Compute the top k eigenvectors v_1, ..., v_k of L
5:    Form the matrix V ∈ R^{n×k} containing v_1, ..., v_k as columns
6:    for i ← 1 to n do
7:      y_{i,j} ← V_{i,j} / ‖V(i, :)‖   ▷ let y_i ∈ R^k be the normalized i-th row of V
8:    end for
9:    Cluster the points (y_i), i = 1, ..., n, in R^k with the k-means algorithm into clusters C_1, ..., C_k
10:   return clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }
11: end procedure

Both algorithms use the same framework with two different graph Laplacians. The main trick is to change the representation of each data point to the point y_i ∈ R^k built from the eigenvectors. This change of representation enhances the cluster properties in the data, so that the clusters can be trivially detected in the new representation.

IV. Implementation

The implementation of both algorithms in MATLAB is straightforward. Before running each algorithm, we use a simple Python script to convert the graph data into Matrix Market format. In both algorithms the matrix passed to the eigensolver is symmetric (and the graph Laplacians themselves are symmetric positive semi-definite), so we use the MATLAB function eigs() to compute the required k eigenvectors. For the k-means phase we use the MATLAB function kmeans().

One issue encountered during the implementation is that eigs() sometimes fails to converge for all requested eigenvalues. Also, because the result of k-means depends heavily on its initialization, the output of our algorithm is not deterministic; in the worst case, empty clusters are formed.
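Both issues can be detected and mitigated with options that eigs() and kmeans() already expose. The snippet below is a minimal sketch of such guards, where L and k stand for whatever matrix and cluster count are in use.

% Sketch: guarding against eigs() non-convergence and empty k-means clusters.
% L is assumed to be a sparse symmetric matrix, k the number of clusters.
[V, ~, flag] = eigs(L, k);          % flag == 0 means all k eigenvalues converged
if flag ~= 0
    warning('eigs: not all eigenvalues converged; results may be unreliable');
end

% Restart k-means several times and keep the best run; turn empty clusters
% into singleton clusters instead of erroring out.
idx = kmeans(V, k, 'Replicates', 10, 'EmptyAction', 'singleton');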
V. Experiment Results

V.1 Experiment Environment

The machine used in this course project has a 1.70 GHz Intel(R) Core(TM) i7-2637M CPU with 8 GB of RAM. The code runs under MATLAB R2012b.

V.2 Data Sources

For this course project, we use four graphs in our experiments, shown in the figure below. The first is a generated block diagonal graph containing 4 connected components; it is used to test the correctness of our implementation of the two algorithms. The second and third are undirected graphs: the Enron email graph, in which each node denotes a person and an edge between two persons implies email communication between them; and one category of the Arxiv collaboration graph, in which each node is an author in the Arxiv network and there is an edge between every pair of co-authors. The last is a bipartite CharityNet graph which records how multiple donors donate to different charities. The Enron and Arxiv graphs are unweighted, while the CharityNet graph is weighted by the amount of each donation. From the matrix view we can see that both the test block diagonal graph and the CharityNet graph have several disconnected components, which makes clustering/partitioning much easier, while the other two graphs have either only a few connected components (Enron email) or multiple uniformly distributed connected components (Arxiv collaboration), which makes clustering/partitioning a relatively difficult task.

Figure 1: Matrix view of the graph topology of the four graphs used in the course project: (a) test block diagonal graph, (b) Enron email graph, (c) Arxiv collaboration graph, (d) CharityNet graph.

V.3 Results and Discussion

We first show the result of partitioning the test block diagonal graph. The graph has 4 connected components, and running the ratio cut algorithm with k = 4 successfully finds the optimal partitioning with 0 cut edges. The following table shows the results of performing ratio cut on the Enron email graph and the Arxiv collaboration graph with k set to 6, 8, and 16. Compared with randomly picking edges as cuts, the algorithm achieves a great reduction in the total number of cut edges. However, the table also shows that the partition sizes are ill-balanced for the Arxiv collaboration and Enron email graphs. The reason is the topology of the graphs: as stated above, the Enron email graph contains only a few connected components, while the Arxiv collaboration graph contains multiple uniformly distributed connected components.
Stdev of cut number   | Partition Number 6 | Partition Number 8 | Partition Number 16
Arxiv Collaboration   | 2116.6             | 1832.4             | 1294.3
Enron Email           | 14975              | 12225              | 9165.2

Figure 2: Ratio of cut edges to total edges for the (a) Enron email and (b) Arxiv collaboration graphs, using the ratio cut algorithm.

The second experiment is on the weighted bipartite CharityNet graph. Owing to the nice topology of this graph (27 connected components), we get a better result in terms of partition balance when applying the normalized cuts algorithm to it, as the following figure shows.

Figure 3: Sizes of the 16 partitions produced by the ncut algorithm on the CharityNet graph, compared with the average partition size.

The experiment results show that both ratio cut and ncut give a good approximation of the min-cut graph partitioning problem in terms of reducing the cut size. However, even the ncut algorithm applied to a weighted graph with multiple connected components still cannot give a perfect result in terms of load balancing of partition sizes. Also, I find that when dealing with exascale graph data, using a clustering or community detection algorithm as a graph partitioning method is unnecessary for basic primitives such as breadth-first search, shortest path, and connected component labeling.
VI. Conclusion

In this course project I learned about the idea of spectral clustering: constructing graph partitions based on eigenvectors of matrices derived from the adjacency matrix. I realized this beautiful theory has been extended to data mining, machine learning, and many more fields. The success of spectral clustering rests on its ability to solve very general problems without any assumptions on the form of the clusters. It is also easy to implement once the similarity graph is defined: the main step is a standard linear algebra problem, which in itself involves no convergence criteria or restarts with different initializations. However, from the experiments I have done, I find that defining a good similarity graph and choosing the right Laplacian matrix are important and have a great influence on the stability of the algorithm.

Also, clustering/graph partitioning, as an irregular problem, is an interesting topic in general-purpose GPU computing. Parallel computing is one promising way to overcome the computational challenge of large-scale clustering problems. With linear algebra tools such as cuSPARSE and cuBLAS, both building blocks of the spectral clustering algorithm, eigenvector computation and the k-means algorithm, can be implemented on the GPU. The issues that come with this are the following: 1) an efficient data layout for the graph that improves the performance of irregular memory accesses, and 2) ways to decouple dependencies between graph nodes/edges during the computation.

References

[1] Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 551–556, New York, NY, USA, 2004. ACM.

[2] Andrew Y. Ng, Michael I. Jordan, Yair Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.

[3] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
Appendix

Listing 1: Normalized Laplacian matrix code

function [ L ] = laplacian( A )
%LAPLACIAN Build the normalized matrix D^{-1/2} * A * D^{-1/2} from a
% (sparse) weight matrix A. Its top k eigenvectors are the eigenvectors
% of the normalized Laplacian L_sym with the k smallest eigenvalues.
row = sum(A, 2);                 % vertex degrees
row = sqrt(row);
one = ones(size(row), 1);
row = one ./ row;                % 1/sqrt(d_i); assumes no isolated vertices
n = length(row);
D = spdiags(row(:), 0, n, n);    % D^{-1/2} as a sparse diagonal matrix
L = D * A * D;

Listing 2: Unnormalized Laplacian matrix code

function [ L ] = laplacian2( A )
%LAPLACIAN2 Build the unnormalized Laplacian L = D - A from a (sparse)
% weight matrix A, then invert it so that the top k eigenvectors of the
% returned matrix correspond to the k smallest eigenvalues of L.
% Note: L = D - A always has eigenvalue 0, so the explicit inverse is
% numerically fragile; eigs(D - A, k, 'sm') would avoid the inversion.
row = sum(A, 2);                 % vertex degrees
n = length(row);
D = spdiags(row(:), 0, n, n);    % degree matrix
L = D - A;
L = inv(L);

Listing 3: Spectral clustering code

function [ idx, C, sumd ] = cluster( L, k )
%CLUSTER Compute the top k eigenvectors of L, normalize each row to unit
% length, cluster the rows with k-means, and write the resulting
% partition to partition.txt.
[V, ~] = eigs(L, k);             % top k eigenvectors as columns of V
n1 = size(V, 1);                 % number of vertices
for i = 1:n1                     % normalize each row of V
    V(i, :) = V(i, :) / norm(V(i, :));
end
[idx, C, sumd] = kmeans(V, k);   % cluster the embedded points
fid = fopen('partition.txt', 'wt');
for i = 1:n1
    fprintf(fid, '%d\n', idx(i));
end
fclose(fid);
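For completeness, here is a hedged sketch of how the three functions above might be combined into the two pipelines. The file name graph.mtx and the helper mmread are assumptions: mmread is a commonly used third-party Matrix Market reader, not a built-in MATLAB function.

% A minimal driver sketch, assuming the three functions above are on the
% MATLAB path and that a Matrix Market reader (mmread, third-party) is
% available for the converted graph file.
A = mmread('graph.mtx');         % sparse symmetric weight matrix
k = 16;                          % desired number of partitions

% Normalized cut: top eigenvectors of D^{-1/2} A D^{-1/2}.
Lncut = laplacian(A);
idx_ncut = cluster(Lncut, k);

% Ratio cut: top eigenvectors of inv(D - A), which correspond to the
% smallest eigenvalues of the unnormalized Laplacian.
Lratio = laplacian2(A);
idx_ratio = cluster(Lratio, k);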