Local Graph Clustering and Applications to Graph Partitioning


Veronika Strnadova

June 12

1 Introduction

Today there exist innumerable applications which require the analysis of some type of large graph. Social networks useful to social scientists, protein-interaction and neural networks arising in biological problems, and geometric meshes used in many diverse scientific applications have grown, thanks to the availability and abundance of data, to tremendous proportions. Besides the sheer size of some of these datasets (the world wide web, for example, is estimated to contain at least 4.75 billion pages [11]), the analysis of even moderately large networks, on the order of tens of thousands of vertices, poses a significant challenge when researchers need information beyond basic metrics such as the degree distribution. The NP-complete problem of finding an optimal clustering of the vertices of such a network, for example, has been under investigation for over a decade by various scientific communities, including data mining, computational biology, and combinatorics.

One approach to these problems is to partition the network, cutting it into smaller, more manageable pieces which may be processed in parallel. However, partitioning large graphs is a computationally intensive problem in itself: few methods exist which can partition a graph with n vertices and m edges in time close to even O(n^2) or O(m). A breakthrough in recent years has been the advent of local methods for graph partitioning, which achieve a time complexity close to linear in the number of edges. The first of these methods was made possible by a local clustering algorithm called Nibble, described in Spielman and Teng's "A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning" [1]. Nibble attempts to minimize a clustering quality metric known as the conductance, a ratio of the number of edges leaving a cluster to the total degree of the cluster, and, given a starting vertex, provably finds a cluster containing that vertex in time proportional to the size of the cluster. Finding a cluster in time proportional to its size is an extremely valuable routine in itself, and the authors show how Nibble can be used as a subroutine to repeatedly remove small clusters from a large graph, obtaining a nearly-linear time graph partitioning algorithm. Before diving into the details of local clustering methods, I will first give a background summary of some traditional clustering and partitioning approaches that have been popular in the past decade.

2 Background

To properly appreciate local clustering algorithms, we must consider the competition. Popular clustering algorithms in recent years have generally taken on one of several different flavors, depending on both the type of data available and the quality of the clusters desired. In all cases, we are given a graph G(V, E) and we would like to find clusters in the graph. Cluster quality may be defined in a number of different ways, but it is often a ratio relating the number of edges which join two vertices within a cluster to the number of edges joining one vertex inside the cluster and one vertex outside it.

If we are given geometric coordinates for the vertices, we can rely on algorithms such as k-means, which produce reliable clusters by finding sets of vertices that are close together in some geometric space. The k-means algorithm finds k clusters by iteratively computing the geometric mean of each current cluster and then reassigning points to their closest means; it converges when there are no (or few) changes to the cluster assignments. k-means has many variants, and today there exist randomized algorithms that reduce the number of iterations to convergence and even avoid explicitly specifying k at the outset. However, no known k-means-like algorithm can compute just one of these clusters quickly: it is a global algorithm which looks at all the vertices in the graph at once. Finding the optimal k clusters of n points in a geometric space is already NP-hard, and thus the traditional k-means algorithm can take exponential time.

To make the problem harder, if we do not have geometric information about the vertices (data points), even the simple k-means algorithm cannot be applied. In this case, a form of spectral clustering is often useful. Spectral clustering is so called because it uses the eigenvectors of the Laplacian matrix of a graph to approximate the optimal cut of that graph, separating the vertices into clusters which share few edges with other clusters but contain many intra-cluster edges. The objective of spectral clustering is to minimize a quantity known as the RatioCut or NCut of k subsets of vertices A_1, ..., A_k. These quantities are defined in [13] and reprinted below:

\mathrm{RatioCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}

\mathrm{NCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}

Here vol(A_i) is the sum of the degrees of the vertices in subset A_i, and W(A_i, \bar{A}_i) is the number of edges with one endpoint in A_i and one endpoint outside of A_i, weighted by the edge weights. We can see that minimizing either quantity attempts to separate the vertices into clusters which are approximately balanced (the number of vertices or edges within each A_i should be high) and which do not share many edges with other clusters. Again, finding such a k-partitioning of a graph is NP-hard, and spectral clustering only approximates the optimal cuts, using the fact that the eigenvectors of the graph Laplacian L(G) corresponding to the k smallest eigenvalues are solutions to a similar but relaxed version of the min-RatioCut or min-NCut problem (for a nice explanation of this fact using the Rayleigh quotient, see [13]). These eigenvectors are used to project the vertices into a k-dimensional space, in which we can now use k-means to find clusters.
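To make this pipeline concrete, the sketch below (my illustration, not code from this survey) builds the unnormalized Laplacian of a small graph, takes the eigenvectors of its k smallest eigenvalues, and runs k-means on the resulting embedding. The example graph and the library choices (numpy, scipy, scikit-learn) are assumptions of mine.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    # Unnormalized Laplacian L = D - A corresponds to the RatioCut relaxation.
    d = A.sum(axis=1)
    L = np.diag(d) - A
    # Eigenvectors of the k smallest eigenvalues solve the relaxed problem.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Each row of U is the k-dimensional embedding of one vertex;
    # k-means on the rows recovers a discrete clustering.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Two triangles joined by a single edge: the natural 2-clustering.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(spectral_clustering(A, 2))   # one label for {0,1,2}, another for {3,4,5}

For the NCut objective one would instead use the normalized Laplacian D^{-1/2} L D^{-1/2}, as described in [13].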

This approach has proven to be quite powerful, finding good clusters in practice, despite the lack of a method to guarantee that the solution of the relaxed min-RatioCut or min-NCut problem will be close to the true solution. Due to the computation involved in finding k eigenvectors of a large matrix, however, it is not very efficient. Furthermore, it is another example of a global algorithm, which uses all the edges in the graph (in the form of the graph Laplacian) to arrive at a clustering solution.

Besides the spectral and geometric approaches, another general approach to graph clustering is the multilevel or hierarchical approach. In this case, the initial, unwieldy graph is first coarsened into a more manageable graph or hypergraph. The vertices of the smaller graph often represent sets of vertices and edges, which we may think of as mini-clusters. The vertices of the small graph are then clustered using some reliable clustering algorithm, and the small graph is expanded back to its original size, using heuristics to assign vertices at the higher resolution to one of the already existing clusters. In this way, multilevel schemes attempt to circumvent the long running times of standard clustering algorithms on very large graphs, in the hope that cluster quality will not suffer much. However, the use of heuristics in the coarsening and expanding phases can significantly hurt cluster quality. Additionally, we again have the problem that finding just one cluster efficiently would prove very difficult using this scheme.

A simple and efficient implementation of a local clustering method would undoubtedly become the go-to clustering algorithm in any application requiring the discovery of a good cluster in a large dataset. Often, just one or a few clusters are sufficient to analyze data: we may not need to know the cluster assignment of every data point, but we would be interested in finding a few clusters of very high quality. Not only is this possible with a local clustering algorithm, but Spielman and Teng show that if we do indeed desire a good partitioning of the entire graph, a local clustering algorithm can be used as a subroutine to find a good partition in nearly-linear time.

In the remainder of this paper, I will summarize the theory leading up to the local clustering idea, and I will outline the most important pieces of the Nibble algorithm. Starting with the definition of the conductance of a graph, I will relate this concept to random walks and mixing times, and finish with a summary of how the Nibble algorithm exploits the properties of graphs with low conductance to quickly find high-quality clusters in these graphs. Although the following is not a detailed explanation of the local clustering algorithm proposed by Spielman and Teng, I hope it provides the intuition behind Nibble and emphasizes that the short running time of this clustering algorithm has exciting future applications.

3 Conductance and Random Walks

As mentioned previously, several different metrics are useful in determining cluster quality. The conductance of a cluster is one such metric, explained below. It is not only closely related to the RatioCut and NCut quantities which spectral clustering attempts to minimize; the conductance of a graph can also be used to bound the time it takes a random walk on the graph to converge. In the subsections that follow, I define conductance and random walks on graphs (Sections 3.1 and 3.2), and show how the two are related by discussing advances in the research on random walk mixing times (Section 3.3).

3.1 Conductance

The conductance φ(S) of a subset S of the vertices V, where G(V, E) is the graph defined by the vertices V and edges E, is defined as:

\phi(S) = \frac{|\partial S|}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))} = \frac{|E(S, V \setminus S)|}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))} = \frac{|\{\{v_i, v_j\} \in E : v_i \in S,\ v_j \in V \setminus S\}|}{\min\left(\sum_{v_i \in S} d_i,\ \sum_{v_j \in V \setminus S} d_j\right)}

where ∂S = E(S, V \ S), the set of edges with one endpoint in S and one endpoint in V \ S, is known as the boundary set of S. In other words, the conductance is the ratio of the number of edges with one end in S and one end outside of S to the size of the smaller of the two sides (subsets of vertices) of the cut defined by ∂S. The size of S or V \ S is measured by its volume: the sum of the degrees of its vertices. An example showing the conductance of two different subsets of the vertices of a small graph is given in Figure 1. The conductance of a graph, φ_G, is defined as the minimum conductance over all possible subsets S of the vertices V:

\phi_G = \min_{S \subset V} \phi(S)

A low conductance indicates good cluster quality, because it means that few edges connect the cluster to the rest of the graph while many edges connect vertices inside the cluster to each other. Finding the conductance of a graph is equivalent to finding the best cluster in the graph, if we rank clusters by their conductance. Unfortunately, this problem is NP-hard; even finding a subset S with a target conductance φ is an NP-complete problem [1]. This motivates the use of randomized algorithms, such as those employed by Spielman and Teng [1], Andersen, Chung, and Lang [2], and Andersen and Peres [4], which look for subsets with low conductance with high probability. Spielman and Teng's algorithm Nibble was the first to find a subset with low conductance in time proportional to the size of the subset. They use a random walk on the graph starting at an arbitrary vertex, and show that if the vertex is in a subset with low conductance, then with high probability the random walk will visit a large number of the vertices in that subset within a small number of timesteps. Before moving on to elaborate on the Nibble algorithm, I will discuss random walks on graphs and their convergence properties.
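First, though, note that the definitions of this subsection translate directly into code. The sketch below (mine, not the survey's) computes φ(S) from an adjacency matrix, finds φ_G by brute force, and reproduces the numbers in the Figure 1 caption below. Since the figure itself is not reproduced here, the edge list is my reconstruction from the degrees and boundary edges reported in the caption, and the exhaustive search for φ_G is feasible only on tiny graphs, which is exactly the NP-hardness point made above.

import numpy as np
from itertools import combinations

def conductance(A, S):
    # phi(S) = |boundary(S)| / min(vol(S), vol(V \ S)) for a 0/1 adjacency matrix A.
    n = A.shape[0]
    S = np.asarray(sorted(S))
    T = np.setdiff1d(np.arange(n), S)        # the complement V \ S
    boundary = A[np.ix_(S, T)].sum()         # edges crossing the cut
    vol = A.sum(axis=1)                      # vertex degrees
    return boundary / min(vol[S].sum(), vol[T].sum())

def graph_conductance(A):
    # phi_G by exhaustive search over subsets: exponential time, tiny graphs only.
    n = A.shape[0]
    return min(conductance(A, S)
               for size in range(1, n // 2 + 1)
               for S in combinations(range(n), size))

# Edge list consistent with the Figure 1 caption (vertices 1..6, shifted to 0..5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (3, 6), (4, 5), (5, 6)]
F = np.zeros((6, 6))
for i, j in edges:
    F[i - 1, j - 1] = F[j - 1, i - 1] = 1

print(conductance(F, [0, 1, 2, 3]))   # S1 = {1,2,3,4}: 2/4 = 0.5
print(conductance(F, [0, 1, 2]))      # S2 = {1,2,3}:   3/7 ~ 0.4286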

Figure 1: Example of conductance (suppose all edge weights are 1).
If S_1 = {1, 2, 3, 4}, the boundary edges are {{3, 6}, {4, 5}}, so
φ(S_1) = |{{3, 6}, {4, 5}}| / min(d_1 + d_2 + d_3 + d_4, d_5 + d_6) = 2 / min(3 + 2 + 4 + 3, 2 + 2) = 2/4 = 1/2.
If S_2 = {1, 2, 3}, the boundary edges are {{1, 4}, {3, 4}, {3, 6}}, so
φ(S_2) = |{{1, 4}, {3, 4}, {3, 6}}| / min(d_1 + d_2 + d_3, d_4 + d_5 + d_6) = 3 / min(3 + 2 + 4, 3 + 2 + 2) = 3/7.
Thus φ(S_1) > φ(S_2): S_2 is the better cluster, since lower conductance indicates higher quality, while a random walk currently on one of the vertices in S_1 will approach the limiting distribution faster than a random walk that starts in S_2.

3.2 Random Walks

A random walk on a graph is a process which begins at some vertex v_i ∈ V with initial probability p_0(i), and at each successive timestep moves to a new vertex with a probability defined by the transition probability matrix M. At each step p_t = M p_{t-1}, where p_t is a column vector of size |V| defining the probability distribution over the vertices v_i ∈ V at time t, and M_{ji} denotes the probability that in one timestep the random walk moves from vertex i to vertex j. M^t_{ij} gives the probability that after exactly t timesteps the random walk has moved from vertex i to vertex j. In [1], Spielman and Teng define the transition matrix in the following way:

M_{ij} = \begin{cases} \frac{1}{2}, & i = j \\ \frac{1}{2 d_j}, & \{v_i, v_j\} \in E \\ 0, & \text{otherwise} \end{cases}

In other words, for an unweighted, undirected graph G, M = (A D^{-1} + I)/2, where A is the adjacency matrix, D is the diagonal degree matrix such that D_{ii} = d(v_i), and I is the identity matrix. Note that multiplying A on the right by D^{-1} is a column-normalization of the adjacency matrix, and adding I to this product introduces self-loops, forcing the probability that a walk stays at the same vertex in any timestep to be positive.

Some authors refer to this type of random walk as a lazy random walk [5], because at each timestep the walk remains at the same vertex with probability 1/2. An important property of random walks on undirected graphs (more specifically, this property holds for aperiodic^1, irreducible^2 Markov chains, which can be represented by such graphs) is that the probability distribution p_t approaches a limit known as the stationary distribution as t → ∞ [10]. The stationary distribution, which I will call π, is defined by the following rule:

M \pi = \pi

Therefore, once p_t has become the stationary distribution, it will remain the same distribution forever: if we have the probability distribution π at time t, then the probability π_i that we are at vertex v_i will remain exactly the same for all times s ≥ t. For an example illustrating a random walk on a graph and its stationary distribution, refer to Figure 2.

Although bounds on the convergence rate of p_t to π do exist for graphs with a specific structure or with some nice properties, much research in the past decade has been devoted to the study of the convergence rate for more general graphs. Various texts refer to the mixing rate of a Markov chain when describing the convergence rate of p_t. The mixing rate is defined either as the rate at which p_t approaches π, or as the number of steps before p_t is within a distance ε of π; the rate at which ||p_t − π|| goes to 0 as t → ∞ is often used to quantify it. Perhaps not surprisingly, the mixing rate has been found to be inversely proportional to the conductance of a graph.

^1 The period of a vertex, or in general a state, in a Markov chain is the greatest common divisor of the set of times at which it is possible for the chain to return to its original state, i.e. the set of t for which M^t_{ii} > 0 for state (vertex) v_i. If all states have period 1, which is true in our case since M_{ii} = 1/2 to begin with, then the Markov chain is aperiodic.
^2 A Markov chain is irreducible if for any two states/vertices v_i and v_j there exists a time t such that M^t_{ij} > 0; in other words, it is possible to reach any state from any other state at some point in time [5].
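These definitions can be seen in action with a short simulation (a sketch of mine, not from the survey): it builds the lazy transition matrix M = (A D^{-1} + I)/2 and iterates p_t = M p_{t-1} until the change is negligible, at which point p_t is close to the stationary distribution, which for this walk is proportional to the vertex degrees. The example graph and tolerance are illustrative assumptions.

import numpy as np

def lazy_walk_matrix(A):
    # M = (A D^{-1} + I)/2; dividing by the degree vector column-normalizes A.
    d = A.sum(axis=0)
    return (A / d + np.eye(len(d))) / 2.0

def walk_to_stationary(A, start, tol=1e-10):
    M = lazy_walk_matrix(A)
    p = np.zeros(A.shape[0])
    p[start] = 1.0                     # p_0 concentrated on the start vertex
    steps = 0
    while True:
        q = M @ p
        steps += 1
        if np.abs(q - p).sum() < tol:  # p_t has (numerically) stopped changing
            return q, steps
        p = q

# Path graph 0-1-2-3: the walk converges to pi_i = d_i / vol(G).
P = np.zeros((4, 4))
for i in range(3):
    P[i, i + 1] = P[i + 1, i] = 1
pi, steps = walk_to_stationary(P, start=0)
print(steps, np.allclose(pi, P.sum(axis=0) / P.sum()))   # e.g. '... True'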

3.3 Mixing Time and Conductance

We may conjecture that the probability distribution p_t of a random walk on a graph containing a subset of vertices with small conductance converges more slowly to the stationary distribution than a random walk on a graph with no such set of vertices. Intuitively, we presume that a random walk on a graph with high conductance φ_G is unlikely to get stuck in a cluster for a long period of time, ensuring that it quickly reaches the stationary probability distribution. A high φ_G means that for all sets S ⊂ V, either the number of boundary edges of S (those crossing from S to V \ S) is high, or the size of one of the pieces of the cut is small: there are no highly clustered sets S ⊂ V of relatively large size. Conversely, we might assume that if a random walk is started from a vertex in a subset S of low conductance, it will likely remain in the set for a time before moving to a vertex outside the subset, thus slowing convergence to the stationary distribution. These assumptions are indeed correct, proven by various methods and authors since the late 1980s, and the proofs linking conductance to the mixing rate of random walks ultimately led Spielman and Teng to the idea of local clustering.

The first intuition, that random walks on graphs with high conductance converge quickly to their limiting distributions, has been known for over 20 years. In 1970, Jeff Cheeger introduced an inequality that relates an isoperimetric constant of an n-dimensional closed Riemannian manifold M to the smallest positive eigenvalue of the Laplacian on M:

\lambda_2(L(M)) \geq \frac{1}{4} \left( \inf_E \frac{S(E)}{\min(V(A), V(B))} \right)^2

Discrete versions of this Cheeger inequality were proven independently by several different authors in the late 1980s [8]. The discrete versions relate the conductance φ_G of a graph to the smallest non-zero eigenvalue λ_2 of the normalized Laplacian D^{-1/2} L D^{-1/2} of the graph, revealing that a small λ_2 implies that the graph contains a set of vertices with low conductance, and that a high λ_2 indicates that the graph is an expander graph with high conductance:

2 \phi_G \geq \lambda_2(D^{-1/2} L D^{-1/2}) \geq \frac{\phi_G^2}{2}

If we have an undirected graph, the symmetric matrix L possesses an orthogonal basis of eigenvectors, and thus we can use λ_2 to bound the convergence of p_t to π if the error can be written in terms of this basis.
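The discrete Cheeger inequality is easy to check numerically on a small graph, reusing the brute-force graph_conductance from the Section 3.1 sketch; the code and example graph below are mine, and the assertion is guaranteed by the inequality itself.

import numpy as np

def normalized_laplacian(A):
    # D^{-1/2} (D - A) D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ (np.diag(d) - A) @ D_inv_sqrt

# Two triangles joined by one edge: a clear low-conductance cut, phi_G = 1/7.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

lam2 = np.linalg.eigvalsh(normalized_laplacian(A))[1]  # smallest non-zero eigenvalue
phi = graph_conductance(A)                             # brute force, Section 3.1 sketch
assert phi**2 / 2 <= lam2 <= 2 * phi                   # 2*phi >= lambda_2 >= phi^2/2
print(phi, lam2)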

A nice combinatorial argument strengthening this fact was given by Milena Mihail [3]. Mihail first shows that the distance of the distribution p_t of a random walk from the stationary distribution decreases proportionally to the cutset expansion α of the graph. More precisely, taking e_t to be the error vector e_t = p_t − π at step t of the random walk, Mihail proves that for any initial distribution p_0,

\|e_t\| - \|e_{t+1}\| \geq \frac{\alpha^2}{2} \|e_t\| \quad (1)

where

\alpha = \frac{1}{2d} \min_{S \subset V : |S| \leq |V|/2} \frac{|\partial S|}{|S|}

Note that the cutset expansion α differs from the conductance φ_G by taking a minimum over the fraction of boundary-set edges divided by the cardinality of S (the number of vertices) instead of by the volume (the sum of the degrees of the vertices) of the set. Mihail considers the effect of a charge f distributed over the vertices: a charge is simply an assignment of a real number to each vertex, and a probability charge is one such that the charges sum to 0. She shows that the difference between the squared norm of a probability charge and that of the same charge multiplied by M, the transition probability matrix of the Markov chain, is bounded below by a term proportional to the total squared difference of the charges across the edges:

\|f\|_2^2 - \|Mf\|_2^2 \geq \frac{1}{2d} \sum_{(i,j) \in E} (f_i - f_j)^2

She then shows that, since

e_{t+1} = p_{t+1} - \pi = M p_t - \pi = M p_t - M \pi = M e_t

and since the error can be viewed as a probability charge because Σ_i e^t_i = 0, the convergence of the error e_t is bounded by this charge argument as well. She proves that for graphs with a high cutset expansion α, any placement of charges on the vertices will result in a significant proportion of edges with a large difference in the charges at their endpoints:

\frac{1}{4d} \sum_{(i,j) \in E} (f_i - f_j)^2 \geq \frac{\alpha^2}{2} \|f\|_2^2

proving inequality (1). Mihail calls the conductance the weighted analogue of the cutset expansion, defines a merging conductance φ_M, and, using an argument very similar to that given for the proof of (1), shows that

\|e_t\|_2^2 \leq \left(1 - \frac{\phi_G^2}{2}\right)^t \|e_0\|_2^2

thereby showing that the convergence rate of p_t to π depends on the conductance of the graph.

Finally, we arrive at our second intuition: a random walk on a graph which is started inside a moderately large set S ⊂ V with low conductance should linger inside S for some time before moving to other vertices in the rest of the graph. In 1990, Lovász and Simonovits [6] defined the µ-conductance φ_µ in order to prove that the mixing rate of a random walk on a graph is fast even when small sets of vertices with low conductance exist within the graph. The µ-conductance is defined as:

\phi_\mu = \min_{S \subseteq V} \frac{\sum_{v_i \in S,\ v_j \in V \setminus S} \pi_i M_{ij}}{\min(\pi(S) - \mu,\ \pi(V \setminus S) - \mu)}

where π is the stationary distribution of the random walk, π(S) = Σ_{v_i∈S} π_i, and 0 ≤ µ ≤ 1/2. Instead of looking at the 2-norm of the error, ‖e_t‖₂², Lovász and Simonovits consider the mixing rate in the l_1 metric, and define the rate in terms of π(S). They introduce a function h_t(x) to place an upper bound on the rate of convergence of the probability on a set of vertices to its value under the stationary distribution. This function is defined as:

h_t(x) = \max_{w : 0 \leq w_i \leq 1,\ \sum_i w_i \pi_i = x} \sum_{v_i \in V} (p^t_i - \pi_i) w_i

and it is used to prove [12] that for every set of vertices S, with x = Σ_{v_i∈S} d_i = vol(S),

\sum_{v_i \in S} (p^t_i - \pi_i) \leq \min\left(\sqrt{x}, \sqrt{\mathrm{vol}(V) - x}\right) \left(1 - \frac{\phi_G^2}{2}\right)^t

The implication of their analysis is that if φ_G is large, then every random walk, started in any subset S, will converge quickly. Furthermore, it implies that if the random walk stagnates, then some set in S has low conductance. This is the starting point for Spielman and Teng's local clustering algorithm: because a walk started at a vertex in a subset with poor conductance will exhibit a slow convergence rate, "one can find a cut with small conductance from the distributions of the steps of the random walk starting at any vertex from which the walk does not mix rapidly" [1]. The authors use a truncated random walk to quickly find such a cluster.
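This lingering behavior can be observed directly. The sketch below (mine; the graph and step counts are illustrative) starts a lazy random walk inside one triangle of the two-triangle graph used above and tracks how much probability mass remains in that low-conductance set over time; the mass drains out slowly toward its stationary value vol(S)/vol(V) = 1/2.

import numpy as np

# Two triangles joined by a single edge; S = {0, 1, 2} has conductance 1/7.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
M = (A / A.sum(axis=0) + np.eye(6)) / 2.0   # lazy walk matrix (A D^{-1} + I)/2

p = np.zeros(6)
p[0] = 1.0                                  # start the walk inside S
for t in range(1, 21):
    p = M @ p
    if t % 5 == 0:
        print(t, round(p[:3].sum(), 4))     # probability mass still inside S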

4 Local Clustering Using Random Walks and Conductance

The ideas brought forth in Section 3 fit nicely together to provide an intuition for the local clustering algorithm of Spielman and Teng. Using the notions of conductance, random walks, and mixing time, we can now look at the Nibble algorithm in greater detail and understand why it succeeds at finding a cluster with good conductance while looking only at a small neighborhood of an input vertex in a graph.

4.1 The Nibble Algorithm

Spielman and Teng find a cluster meeting a maximum input conductance and prove that the size of this cluster, µ(C) = Σ_{v_i∈C} d_i, will be at most 5/6 of the size of the vertex set, µ(V) = Σ_{v_j∈V} d_j, taking size to mean volume as defined previously. More importantly, Spielman and Teng show that this cluster can be found in time proportional to the size of the cluster. It may then be used as a piece of a randomized graph partitioning algorithm which achieves a balanced (all partitions are of approximately equal size), sparse (a low number of edges are cut) cut of an input graph G in nearly linear time, a feat which was unheard of at the time of their publication.

The input to Nibble is a graph G, a starting vertex v_0, a conductance threshold φ, and a parameter b controlling the minimum size of the output cluster. Roughly, the algorithm works as follows. Start a random walk from the vertex v_0 in G. At each step of the walk, truncate to zero all entries of p_t that are below a threshold (close to 0), where p_t is the distribution over the vertices at step t of the random walk. Call the truncated probability distribution r_t, and in the next step set the new probability distribution q_t to be:

q_t = M r_t

Order the vertices by decreasing probability-to-degree ratio q^t_i / d_i; that is, consider the sets S_j(q_t) consisting of the first j vertices w_1, ..., w_j in a permutation w of the vertex numbers such that q^t_{w_i}/d_{w_i} ≥ q^t_{w_{i+1}}/d_{w_{i+1}}. If there exists a j such that:

- the set S_j(q_t) containing the first j vertices in this order has a conductance lower than the input φ,
- the size of the set is not too small, 2^b ≤ Σ_{v_i ∈ S_j(q_t)} d_i, or too large, Σ_{v_i ∈ S_j(q_t)} d_i ≤ (5/6) Σ_{v_i ∈ V} d_i, and
- the set contains a large probability mass (here we use a partial derivative with respect to x of a function very closely related to the h_t(x) defined by Lovász and Simonovits),

then return this set of j vertices as the output. The output is a cluster which is proven, as stated above, to have a low conductance and to be found in time O(2^b (log^6 m)/φ^4). The main idea is to start a random walk from a vertex in a set with low conductance; the walk will then most likely not converge rapidly but will instead linger on the vertices inside the low-conductance set, and the most highly clustered vertices in this set will be found in a few steps of the random walk. The authors prove that truncating the random walk in this way will not cause q_t to deviate too much from p_t, and that if the walk is started at a vertex v_0 inside a cluster with low conductance, then the output of Nibble will be a set which intersects a true set with the input conductance on at least 2^{b-1} vertices [1]. A simplified sketch of this truncated-walk procedure is given at the end of this subsection.

To better understand the probability mass, I will refer to an example that Spielman [12] uses to show the relationship between φ_G and the mixing rate of a random walk. Consider a random walk on a graph with an initial probability distribution p_0 on the set S ⊂ V with minimal conductance (φ_G = φ(S)):

p^0_i = \begin{cases} \frac{d_i}{\sum_{v_j \in S} d_j}, & v_i \in S \\ 0, & v_i \notin S \end{cases}

Taking A to be the adjacency matrix of the graph and, for simplicity, the non-lazy transition matrix M = A D^{-1}, we have that at the first timestep the probability that the walk travels outside of S is:

\sum_{v_i \in S,\ v_j \notin S} p^0_i M_{ji} = \frac{\sum_{v_i \in S,\ v_j \notin S} A_{ij}}{\sum_{v_i \in S} d_i} = \frac{|\partial S|}{\mathrm{vol}(S)} = \phi(S)

Thus, a low φ(S) indicates that less probability mass will be distributed on vertices lying outside of S. Spielman states that in successive steps even less probability mass will land outside of S. Therefore, more steps are required to reach all the vertices outside of S, and thus more time is required to reach the limiting distribution π. In the Nibble algorithm, requiring that the probability mass of the returned set is large is essentially checking that the random walk has stayed inside a set of low conductance with high probability.
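The following is a much simplified sketch of this truncated-walk idea, not Spielman and Teng's actual Nibble: their truncation threshold, stopping rule, and probability-mass test are more delicate, and the parameters here (step count, truncation ε, example graph) are illustrative assumptions of mine. It reuses the conductance function from the Section 3.1 sketch.

import numpy as np

def truncated_walk_cluster(A, v0, phi, steps=50, eps=1e-4):
    # Run a truncated lazy walk from v0; after each step, sweep over prefixes
    # of the vertices ordered by q_i / d_i and return the first prefix whose
    # conductance is below phi.
    n = A.shape[0]
    d = A.sum(axis=0)
    M = (A / d + np.eye(n)) / 2.0            # lazy walk matrix (A D^{-1} + I)/2
    r = np.zeros(n)
    r[v0] = 1.0
    for _ in range(steps):
        q = M @ r
        r = np.where(q >= eps, q, 0.0)       # truncate small entries to zero
        order = np.argsort(-r / d)           # sweep order: decreasing q_i / d_i
        for j in range(1, n):                # try each prefix S_j
            S = order[:j]
            if r[S].sum() > 0 and conductance(A, S) <= phi:
                return set(S.tolist())
    return None                               # no prefix met the target phi

# Two-triangle example graph again: a walk from vertex 0 finds S = {0, 1, 2}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(truncated_walk_cluster(A, v0=0, phi=0.2))   # -> {0, 1, 2}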

4.2 Application to Graph Partitioning

Spielman and Teng use Nibble as a procedure within a randomized algorithm which achieves a good partitioning of all the vertices in a graph in time O(m log(1/p) log^7(m)/θ^4), where m is the number of edges in the graph, θ is a bound on the conductance of the partition, and p dictates the probability that the partition is of a good size. Andersen, Chung, and Lang [2] closely follow Spielman and Teng's approach to derive a local clustering and fast partitioning algorithm, the notable difference being that they prove bounds on the convergence of PageRank vectors, instead of the probability distribution of a Markov chain, using the conductance of a graph. Andersen and Peres' local partitioning algorithm [4] uses an evolving set process, which is a Markov chain on sets of vertices, to quickly find sets of small conductance; their method is thus far the best local partitioning algorithm [8], with a runtime of O(φ^{-1/2} log^{O(1)} n) times the number of vertices in the output set.

5 Conclusion

In this survey, I have given an introduction to the theory and intuition underlying local clustering algorithms. Unlike traditional approaches to clustering, which attempt to minimize a global metric designed to compare different clusterings of the entire graph, these algorithms find a cluster near an input vertex by looking only at a small neighborhood of that vertex within the graph. It is rather remarkable that even with a very limited view of a graph we may find a good cluster of vertices. Spielman and Teng proved that we can do so by considering the relationship between a random walk on a graph and the quality of a cluster within the graph. By looking only at the vertices that a random walk from an initial vertex will visit with high probability, we can find a cluster which achieves a given conductance bound, because the random walk is likely to stay within or near the cluster for a number of timesteps before moving out toward the rest of the graph. Even better, we can use local clustering algorithms to quickly find good graph partitions. Given the advances in clustering and partitioning algorithms enabled by random walk techniques, I expect that we will soon see an easily implementable, random-walk-based algorithm becoming the clustering algorithm of choice for today's data-intensive applications.

References

[1] Daniel A. Spielman, Shang-Hua Teng. A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning. CoRR, abs/
[2] Reid Andersen, Fan Chung, Kevin Lang. Local Graph Partitioning using PageRank Vectors. In Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society Press, Washington, DC, USA.
[3] M. Mihail. Conductance and convergence of Markov chains: a combinatorial treatment of expanders. In Proc. of 30th FOCS, pp.
[4] Reid Andersen, Yuval Peres. Finding Sparse Cuts Locally Using Evolving Sets. In STOC '09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, New York, NY, USA.

[5] David A. Levin, Yuval Peres, Elizabeth L. Wilmer. Markov Chains and Mixing Times: Chapters 1 and 4.
[6] L. Lovász and M. Simonovits. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume. In FOCS, pp.
[7] Daniel A. Spielman and Shang-Hua Teng. Spectral Partitioning Works.
[8] Daniel A. Spielman. Algorithms, Graph Theory, and Linear Equations in Laplacian Matrices. In Proceedings of the International Congress of Mathematicians, Hyderabad, India.
[9] N. Alon. Eigenvalues and Expanders. In Combinatorica, Vol. 6(2), pp.
[10] László Lovász. Random Walks on Graphs: A Survey. In Combinatorics, Paul Erdős is Eighty, Vol. 2, pp.
[11] Maurice de Kunder. The Size of the World Wide Web. WorldWideWebSize.com. Accessed 10 June.
[12] Daniel A. Spielman. Spectral Graph Theory, Fall 2012 Course Notes. Accessed May-June.
[13] Ulrike von Luxburg. A Tutorial on Spectral Clustering. In Statistics and Computing, Vol. 17(4).

Figure 2: Example of a random walk on a graph started at vertex 1 (suppose all edge weights are 1). At the first timestep, we compute p_1 = M p_0, obtaining a probability distribution over the vertices, where p^1_i is the probability that the random walk is currently at vertex i and M = (A D^{-1} + I)/2 is the 10×10 transition probability matrix. The orange vertices in the top part of the figure illustrate all vertices reachable in the first step of the random walk, and the fractions on the edges represent the probabilities of moving to each vertex in the first timestep. Eventually, the probability distribution converges to the stationary distribution in the bottom part of the figure, where the walk has a positive probability of being at any vertex, but is always more likely to be on vertices 7, 8, 9, and 10 than on the remaining vertices.


Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,

More information

Gene expression & Clustering (Chapter 10)

Gene expression & Clustering (Chapter 10) Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching

More information

A Recommender System Based on Local Random Walks and Spectral Methods

A Recommender System Based on Local Random Walks and Spectral Methods A Recommender System Based on Local Random Walks and Spectral Methods ABSTRACT Zeinab Abbassi Department of Computer Science,UBC 201 2366 Main Mall Vancouver, Canada zeinab@cs.ubc.ca In this paper, we

More information

CS229 Final Project: k-means Algorithm

CS229 Final Project: k-means Algorithm CS229 Final Project: k-means Algorithm Colin Wei and Alfred Xue SUNet ID: colinwei axue December 11, 2014 1 Notes This project was done in conjuction with a similar project that explored k-means in CS

More information

REGULAR GRAPHS OF GIVEN GIRTH. Contents

REGULAR GRAPHS OF GIVEN GIRTH. Contents REGULAR GRAPHS OF GIVEN GIRTH BROOKE ULLERY Contents 1. Introduction This paper gives an introduction to the area of graph theory dealing with properties of regular graphs of given girth. A large portion

More information

12.1 Formulation of General Perfect Matching

12.1 Formulation of General Perfect Matching CSC5160: Combinatorial Optimization and Approximation Algorithms Topic: Perfect Matching Polytope Date: 22/02/2008 Lecturer: Lap Chi Lau Scribe: Yuk Hei Chan, Ling Ding and Xiaobing Wu In this lecture,

More information

Pebble Sets in Convex Polygons

Pebble Sets in Convex Polygons 2 1 Pebble Sets in Convex Polygons Kevin Iga, Randall Maddox June 15, 2005 Abstract Lukács and András posed the problem of showing the existence of a set of n 2 points in the interior of a convex n-gon

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering George Karypis and Vipin Kumar Brian Shi CSci 8314 03/09/2017 Outline Introduction Graph Partitioning Problem Multilevel

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Exact Algorithms Lecture 7: FPT Hardness and the ETH

Exact Algorithms Lecture 7: FPT Hardness and the ETH Exact Algorithms Lecture 7: FPT Hardness and the ETH February 12, 2016 Lecturer: Michael Lampis 1 Reminder: FPT algorithms Definition 1. A parameterized problem is a function from (χ, k) {0, 1} N to {0,

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

A Reduction of Conway s Thrackle Conjecture

A Reduction of Conway s Thrackle Conjecture A Reduction of Conway s Thrackle Conjecture Wei Li, Karen Daniels, and Konstantin Rybnikov Department of Computer Science and Department of Mathematical Sciences University of Massachusetts, Lowell 01854

More information

Disjoint directed cycles

Disjoint directed cycles Disjoint directed cycles Noga Alon Abstract It is shown that there exists a positive ɛ so that for any integer k, every directed graph with minimum outdegree at least k contains at least ɛk vertex disjoint

More information

6.856 Randomized Algorithms

6.856 Randomized Algorithms 6.856 Randomized Algorithms David Karger Handout #4, September 21, 2002 Homework 1 Solutions Problem 1 MR 1.8. (a) The min-cut algorithm given in class works because at each step it is very unlikely (probability

More information

Research Interests Optimization:

Research Interests Optimization: Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,

More information