Local Graph Clustering and Applications to Graph Partitioning


Veronika Strnadova

June 12

1 Introduction

Today there exist innumerable applications which require the analysis of some type of large graph. Social networks useful to social scientists, protein-interaction and neural networks arising in biological problems, and geometric meshes used in many diverse scientific applications have grown, thanks to the availability and abundance of data, to tremendous proportions. Besides the sheer size of some of these datasets (the world wide web, for example, is estimated to contain at least 4.75 billion pages [11]), the analysis of even moderately large networks, on the order of tens of thousands of vertices, poses a significant challenge when researchers need information beyond basic metrics such as the degree distribution. The NP-complete problem of finding an optimal clustering of the vertices of such a network, for example, has been under investigation for over a decade by various scientific communities, including data mining, computational biology, and combinatorics.

One approach to these problems is to partition the network, cutting it into smaller, more manageable pieces which may be processed in parallel. However, partitioning large graphs is a computationally intensive problem in itself: few methods exist which can partition a graph with n vertices and m edges in time close to even O(n^2) or O(m). A breakthrough in recent years has been the advent of local methods for graph partitioning, which achieve a time complexity close to linear in the number of edges. The first of these methods was made possible by a local clustering algorithm called Nibble, described in Spielman and Teng's "A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning" [1]. Nibble attempts to minimize a clustering quality metric known as the conductance, a ratio of the number of edges leaving a cluster to the total degree of the cluster, and, given a starting vertex, provably finds a cluster containing that vertex in time proportional to the size of the cluster. Finding a cluster in time proportional to its size is an extremely valuable routine in itself, and the authors show how Nibble can be used as a subroutine to repeatedly remove small clusters from a large graph, obtaining a nearly-linear time graph partitioning algorithm. Before diving into the details of local clustering methods, I will first give a background summary of some traditional clustering and partitioning approaches that have been popular in the past decade.

2 Background

To properly appreciate local clustering algorithms, we must consider the competition. Popular clustering algorithms in recent years have generally taken on one of several different flavors, depending on both the type of data available and the quality of the clusters desired. In all cases, we are given a graph G(V, E) and we would like to find clusters in the graph. Cluster quality may be defined in a number of different ways, but it is often a ratio relating the number of edges which join two vertices within a cluster to the number of edges joining one vertex inside the cluster and one vertex outside it.

If we are given geometric coordinates for the vertices, we can rely on algorithms such as k-means, which produce reliable clusters by finding sets of vertices that are close together in some geometric space. The k-means algorithm finds k clusters by iteratively computing the geometric mean of each current cluster and then reassigning points to their closest means; it converges when there are no (or few) changes to the cluster assignments. k-means has many variants, and today there exist randomized algorithms that reduce the number of iterations to convergence and even avoid explicitly specifying k at the outset. However, no known k-means-like algorithm can compute just one of these clusters quickly: it is a global algorithm which looks at all the vertices in the graph at once. Finding the optimal k clusters of n points in a geometric space is already NP-hard, and thus the traditional k-means algorithm can take exponential time.

To make the problem harder, if we do not have geometric information about the vertices (data points), even the simple k-means algorithm cannot be applied. In this case, a form of spectral clustering is often useful. Spectral clustering is so called because it uses the eigenvectors of the Laplacian matrix of a graph to approximate the optimal cut of that graph, separating the vertices into clusters which share few edges with other clusters but contain many intra-cluster edges. The objective of spectral clustering is to minimize a quantity known as the RatioCut or NCut of k subsets of vertices A_1, ..., A_k. These quantities are defined in [13] and reprinted below:

\mathrm{RatioCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}

\mathrm{NCut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}

Here vol(A_i) is the sum of the degrees of the vertices in subset A_i, and W(A_i, \bar{A}_i) is the number of edges with one endpoint in A_i and one endpoint outside of A_i, weighted by the edge weights. We can see that minimizing either quantity attempts to separate the vertices into clusters which are approximately balanced (the number of vertices or edges within each A_i should be high) and which do not share many edges with other clusters. Again, finding such a k-partitioning of a graph is NP-hard, and spectral clustering only approximates the optimal cuts, using the fact that the eigenvectors of the graph Laplacian L(G) corresponding to the k smallest eigenvalues are solutions to a similar but relaxed version of the min-RatioCut or min-NCut problem (for a nice explanation of this fact using the Rayleigh quotient, see [13]). These eigenvectors are used to project the vertices into a k-dimensional space, in which we can now use k-means to find clusters.
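To make this pipeline concrete, the sketch below (my illustration, not code from this survey) builds the unnormalized Laplacian of a small graph, takes the eigenvectors of its k smallest eigenvalues, and runs k-means on the resulting embedding. The example graph and the library choices (numpy, scipy, scikit-learn) are assumptions of mine.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    # Unnormalized Laplacian L = D - A corresponds to the RatioCut relaxation.
    d = A.sum(axis=1)
    L = np.diag(d) - A
    # Eigenvectors of the k smallest eigenvalues solve the relaxed problem.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Each row of U is the k-dimensional embedding of one vertex;
    # k-means on the rows recovers a discrete clustering.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Two triangles joined by a single edge: the natural 2-clustering.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(spectral_clustering(A, 2))   # one label for {0,1,2}, another for {3,4,5}

For the NCut objective one would instead use the normalized Laplacian D^{-1/2} L D^{-1/2}, as described in [13].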

This approach has proven to be quite powerful, finding good clusters in practice, despite the lack of a method to guarantee that the solution of the relaxed min-RatioCut or min-NCut problem will be close to the true solution. Due to the computation involved in finding k eigenvectors of a large matrix, however, it is not very efficient. Furthermore, it is another example of a global algorithm, which uses all the edges in the graph (in the form of the graph Laplacian) to arrive at a clustering solution.

Besides the spectral and geometric approaches, another general approach to graph clustering is the multilevel or hierarchical approach. In this case, the initial, unwieldy graph is first coarsened into a more manageable graph or hypergraph. The vertices of the smaller graph often represent sets of vertices and edges, which we may think of as mini-clusters. The vertices of the small graph are then clustered using some reliable clustering algorithm, and the small graph is expanded back to its original size, using heuristics to assign vertices at the higher resolution to one of the already existing clusters. In this way, multilevel schemes attempt to circumvent the long running times of standard clustering algorithms on very large graphs, in the hope that cluster quality will not suffer much. However, the use of heuristics in the coarsening and expanding phases can significantly hurt cluster quality. Additionally, we again have the problem that finding just one cluster efficiently would prove very difficult using this scheme.

A simple and efficient implementation of a local clustering method would undoubtedly become the go-to clustering algorithm in any application requiring the discovery of a good cluster in a large dataset. Often, just one or a few clusters are sufficient to analyze data: we may not need to know the cluster assignment of every data point, but we would be interested in finding a few clusters of very high quality. Not only is this possible with a local clustering algorithm, but Spielman and Teng show that if we do indeed desire a good partitioning of the entire graph, a local clustering algorithm can be used as a subroutine to find a good partition in nearly-linear time.

In the remainder of this paper, I will summarize the theory leading up to the local clustering idea, and I will outline the most important pieces of the Nibble algorithm. Starting with the definition of the conductance of a graph, I will relate this concept to random walks and mixing times, and finish with a summary of how the Nibble algorithm exploits the properties of graphs with low conductance to quickly find high-quality clusters in these graphs. Although the following is not a detailed explanation of the local clustering algorithm proposed by Spielman and Teng, I hope it provides the intuition behind Nibble and emphasizes that the short running time of this clustering algorithm has exciting future applications.

3 Conductance and Random Walks

As mentioned previously, several different metrics are useful in determining cluster quality. The conductance of a cluster is one such metric, explained below. It is not only closely related to the RatioCut and NCut quantities which spectral clustering attempts to minimize; the conductance of a graph can also be used to bound the time it takes a random walk on the graph to converge. In the subsections that follow, I define conductance and random walks on graphs (Sections 3.1 and 3.2), and show how the two are related by discussing advances in the research on random walk mixing times (Section 3.3).

3.1 Conductance

The conductance φ(S) of a subset S of the vertices V, where G(V, E) is the graph defined by the vertices V and edges E, is defined as:

\phi(S) = \frac{|\partial S|}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))} = \frac{|E(S, V \setminus S)|}{\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))} = \frac{|\{\{v_i, v_j\} \in E : v_i \in S,\ v_j \in V \setminus S\}|}{\min\left(\sum_{v_i \in S} d_i,\ \sum_{v_j \in V \setminus S} d_j\right)}

where ∂S = E(S, V \ S), the set of edges with one endpoint in S and one endpoint in V \ S, is known as the boundary set of S. In other words, the conductance is the ratio of the number of edges with one end in S and one end outside of S to the size of the smaller of the two sides (subsets of vertices) of the cut defined by ∂S. The size of S or V \ S is measured by its volume: the sum of the degrees of its vertices. An example showing the conductance of two different subsets of the vertices of a small graph is given in Figure 1. The conductance of a graph, φ_G, is defined as the minimum conductance over all possible subsets S of the vertices V:

\phi_G = \min_{S \subset V} \phi(S)

A low conductance indicates good cluster quality, because it means that few edges connect the cluster to the rest of the graph while many edges connect vertices inside the cluster to each other. Finding the conductance of a graph is equivalent to finding the best cluster in the graph, if we rank clusters by their conductance. Unfortunately, this problem is NP-hard; even finding a subset S with a target conductance φ is an NP-complete problem [1]. This motivates the use of randomized algorithms, such as those employed by Spielman and Teng [1], Andersen, Chung, and Lang [2], and Andersen and Peres [4], which look for subsets with low conductance with high probability. Spielman and Teng's algorithm Nibble was the first to find a subset with low conductance in time proportional to the size of the subset. They use a random walk on the graph starting at an arbitrary vertex, and show that if the vertex is in a subset with low conductance, then with high probability the random walk will visit a large number of the vertices in that subset within a small number of timesteps. Before moving on to elaborate on the Nibble algorithm, I will discuss random walks on graphs and their convergence properties.
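First, though, note that the definitions of this subsection translate directly into code. The sketch below (mine, not the survey's) computes φ(S) from an adjacency matrix, finds φ_G by brute force, and reproduces the numbers in the Figure 1 caption below. Since the figure itself is not reproduced here, the edge list is my reconstruction from the degrees and boundary edges reported in the caption, and the exhaustive search for φ_G is feasible only on tiny graphs, which is exactly the NP-hardness point made above.

import numpy as np
from itertools import combinations

def conductance(A, S):
    # phi(S) = |boundary(S)| / min(vol(S), vol(V \ S)) for a 0/1 adjacency matrix A.
    n = A.shape[0]
    S = np.asarray(sorted(S))
    T = np.setdiff1d(np.arange(n), S)        # the complement V \ S
    boundary = A[np.ix_(S, T)].sum()         # edges crossing the cut
    vol = A.sum(axis=1)                      # vertex degrees
    return boundary / min(vol[S].sum(), vol[T].sum())

def graph_conductance(A):
    # phi_G by exhaustive search over subsets: exponential time, tiny graphs only.
    n = A.shape[0]
    return min(conductance(A, S)
               for size in range(1, n // 2 + 1)
               for S in combinations(range(n), size))

# Edge list consistent with the Figure 1 caption (vertices 1..6, shifted to 0..5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (3, 6), (4, 5), (5, 6)]
F = np.zeros((6, 6))
for i, j in edges:
    F[i - 1, j - 1] = F[j - 1, i - 1] = 1

print(conductance(F, [0, 1, 2, 3]))   # S1 = {1,2,3,4}: 2/4 = 0.5
print(conductance(F, [0, 1, 2]))      # S2 = {1,2,3}:   3/7 ~ 0.4286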

Figure 1: Example of conductance (suppose all edge weights are 1).
If S_1 = {1, 2, 3, 4}, the boundary edges are {{3, 6}, {4, 5}}, so
φ(S_1) = |{{3, 6}, {4, 5}}| / min(d_1 + d_2 + d_3 + d_4, d_5 + d_6) = 2 / min(3 + 2 + 4 + 3, 2 + 2) = 2/4 = 1/2.
If S_2 = {1, 2, 3}, the boundary edges are {{1, 4}, {3, 4}, {3, 6}}, so
φ(S_2) = |{{1, 4}, {3, 4}, {3, 6}}| / min(d_1 + d_2 + d_3, d_4 + d_5 + d_6) = 3 / min(3 + 2 + 4, 3 + 2 + 2) = 3/7.
Thus φ(S_1) > φ(S_2): S_2 is the better cluster, since lower conductance indicates higher quality, while a random walk currently on one of the vertices in S_1 will approach the limiting distribution faster than a random walk that starts in S_2.

3.2 Random Walks

A random walk on a graph is a process which begins at some vertex v_i ∈ V with initial probability p_0(i), and at each successive timestep moves to a new vertex with a probability defined by the transition probability matrix M. At each step p_t = M p_{t-1}, where p_t is a column vector of size |V| defining the probability distribution over the vertices v_i ∈ V at time t, and M_{ji} denotes the probability that in one timestep the random walk moves from vertex i to vertex j. M^t_{ij} gives the probability that after exactly t timesteps the random walk has moved from vertex i to vertex j. In [1], Spielman and Teng define the transition matrix in the following way:

M_{ij} = \begin{cases} \frac{1}{2}, & i = j \\ \frac{1}{2 d_j}, & \{v_i, v_j\} \in E \\ 0, & \text{otherwise} \end{cases}

In other words, for an unweighted, undirected graph G, M = (A D^{-1} + I)/2, where A is the adjacency matrix, D is the diagonal degree matrix such that D_{ii} = d(v_i), and I is the identity matrix. Note that multiplying A on the right by D^{-1} is a column-normalization of the adjacency matrix, and adding I to this product introduces self-loops, forcing the probability that a walk stays at the same vertex in any timestep to be positive.

Some authors refer to this type of random walk as a lazy random walk [5], because at each timestep the walk remains at the same vertex with probability 1/2. An important property of random walks on undirected graphs (more specifically, this property holds for aperiodic^1, irreducible^2 Markov chains, which can be represented by such graphs) is that the probability distribution p_t approaches a limit known as the stationary distribution as t → ∞ [10]. The stationary distribution, which I will call π, is defined by the following rule:

M \pi = \pi

Therefore, once p_t has become the stationary distribution, it will remain the same distribution forever: if we have the probability distribution π at time t, then the probability π_i that we are at vertex v_i will remain exactly the same for all times s ≥ t. For an example illustrating a random walk on a graph and its stationary distribution, refer to Figure 2.

Although bounds on the convergence rate of p_t to π do exist for graphs with a specific structure or with some nice properties, much research in the past decade has been devoted to the study of the convergence rate for more general graphs. Various texts refer to the mixing rate of a Markov chain when describing the convergence rate of p_t. The mixing rate is defined either as the rate at which p_t approaches π, or as the number of steps before p_t is within a distance ε of π; the rate at which ||p_t − π|| goes to 0 as t → ∞ is often used to quantify it. Perhaps not surprisingly, the mixing rate has been found to be inversely proportional to the conductance of a graph.

^1 The period of a vertex, or in general a state, in a Markov chain is the greatest common divisor of the set of times at which it is possible for the chain to return to its original state, i.e. the set of t for which M^t_{ii} > 0 for state (vertex) v_i. If all states have period 1, which is true in our case since M_{ii} = 1/2 to begin with, then the Markov chain is aperiodic.
^2 A Markov chain is irreducible if for any two states/vertices v_i and v_j there exists a time t such that M^t_{ij} > 0; in other words, it is possible to reach any state from any other state at some point in time [5].
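These definitions can be seen in action with a short simulation (a sketch of mine, not from the survey): it builds the lazy transition matrix M = (A D^{-1} + I)/2 and iterates p_t = M p_{t-1} until the change is negligible, at which point p_t is close to the stationary distribution, which for this walk is proportional to the vertex degrees. The example graph and tolerance are illustrative assumptions.

import numpy as np

def lazy_walk_matrix(A):
    # M = (A D^{-1} + I)/2; dividing by the degree vector column-normalizes A.
    d = A.sum(axis=0)
    return (A / d + np.eye(len(d))) / 2.0

def walk_to_stationary(A, start, tol=1e-10):
    M = lazy_walk_matrix(A)
    p = np.zeros(A.shape[0])
    p[start] = 1.0                     # p_0 concentrated on the start vertex
    steps = 0
    while True:
        q = M @ p
        steps += 1
        if np.abs(q - p).sum() < tol:  # p_t has (numerically) stopped changing
            return q, steps
        p = q

# Path graph 0-1-2-3: the walk converges to pi_i = d_i / vol(G).
P = np.zeros((4, 4))
for i in range(3):
    P[i, i + 1] = P[i + 1, i] = 1
pi, steps = walk_to_stationary(P, start=0)
print(steps, np.allclose(pi, P.sum(axis=0) / P.sum()))   # e.g. '... True'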

3.3 Mixing Time and Conductance

We may conjecture that the probability distribution p_t of a random walk on a graph containing a subset of vertices with small conductance converges more slowly to the stationary distribution than a random walk on a graph with no such set of vertices. Intuitively, we presume that a random walk on a graph with high conductance φ_G is unlikely to get stuck in a cluster for a long period of time, ensuring that it quickly reaches the stationary probability distribution. A high φ_G means that for all sets S ⊂ V, either the number of boundary edges of S (those crossing from S to V \ S) is high, or the size of one of the pieces of the cut is small: there are no highly clustered sets S ⊂ V of relatively large size. Conversely, we might assume that if a random walk is started from a vertex in a subset S of low conductance, it will likely remain in the set for a time before moving to a vertex outside the subset, thus slowing convergence to the stationary distribution. These assumptions are indeed correct, proven by various methods and authors since the late 1980s, and the proofs linking conductance to the mixing rate of random walks ultimately led Spielman and Teng to the idea of local clustering.

The first intuition, that random walks on graphs with high conductance converge quickly to their limiting distributions, has been known for over 20 years. In 1970, Jeff Cheeger introduced an inequality that relates an isoperimetric constant of an n-dimensional closed Riemannian manifold M to the smallest positive eigenvalue of the Laplacian on M:

\lambda_2(L(M)) \geq \frac{1}{4} \left( \inf_E \frac{S(E)}{\min(V(A), V(B))} \right)^2

Discrete versions of this Cheeger inequality were proven independently by several different authors in the late 1980s [8]. The discrete versions relate the conductance φ_G of a graph to the smallest non-zero eigenvalue λ_2 of the normalized Laplacian D^{-1/2} L D^{-1/2} of the graph, revealing that a small λ_2 implies that the graph contains a set of vertices with low conductance, and that a high λ_2 indicates that the graph is an expander graph with high conductance:

2 \phi_G \geq \lambda_2(D^{-1/2} L D^{-1/2}) \geq \frac{\phi_G^2}{2}

If we have an undirected graph, the symmetric matrix L possesses an orthogonal basis of eigenvectors, and thus we can use λ_2 to bound the convergence of p_t to π if the error can be written in terms of this basis.
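The discrete Cheeger inequality is easy to check numerically on a small graph, reusing the brute-force graph_conductance from the Section 3.1 sketch; the code and example graph below are mine, and the assertion is guaranteed by the inequality itself.

import numpy as np

def normalized_laplacian(A):
    # D^{-1/2} (D - A) D^{-1/2}
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ (np.diag(d) - A) @ D_inv_sqrt

# Two triangles joined by one edge: a clear low-conductance cut, phi_G = 1/7.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

lam2 = np.linalg.eigvalsh(normalized_laplacian(A))[1]  # smallest non-zero eigenvalue
phi = graph_conductance(A)                             # brute force, Section 3.1 sketch
assert phi**2 / 2 <= lam2 <= 2 * phi                   # 2*phi >= lambda_2 >= phi^2/2
print(phi, lam2)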

A nice combinatorial argument strengthening this fact was given by Milena Mihail [3]. Mihail first shows that the distance of the distribution p_t of a random walk from the stationary distribution decreases proportionally to the cutset expansion α of the graph. More precisely, taking e_t to be the error vector e_t = p_t − π at step t of the random walk, Mihail proves that for any initial distribution p_0,

\|e_t\| - \|e_{t+1}\| \geq \frac{\alpha^2}{2} \|e_t\| \quad (1)

where

\alpha = \frac{1}{2d} \min_{S \subset V : |S| \leq |V|/2} \frac{|\partial S|}{|S|}

Note that the cutset expansion α differs from the conductance φ_G by taking a minimum over the fraction of boundary-set edges divided by the cardinality of S (the number of vertices) instead of by the volume (the sum of the degrees of the vertices) of the set. Mihail considers the effect of a charge f distributed over the vertices: a charge is simply an assignment of a real number to each vertex, and a probability charge is one such that the charges sum to 0. She shows that the difference between the squared norm of a probability charge and that of the same charge multiplied by M, the transition probability matrix of the Markov chain, is bounded below by a term proportional to the total squared difference of the charges across the edges:

\|f\|_2^2 - \|Mf\|_2^2 \geq \frac{1}{2d} \sum_{(i,j) \in E} (f_i - f_j)^2

She then shows that, since

e_{t+1} = p_{t+1} - \pi = M p_t - \pi = M p_t - M \pi = M e_t

and since the error can be viewed as a probability charge because Σ_i e^t_i = 0, the convergence of the error e_t is bounded by this charge argument as well. She proves that for graphs with a high cutset expansion α, any placement of charges on the vertices will result in a significant proportion of edges with a large difference in the charges at their endpoints:

\frac{1}{4d} \sum_{(i,j) \in E} (f_i - f_j)^2 \geq \frac{\alpha^2}{2} \|f\|_2^2

proving inequality (1). Mihail calls the conductance the weighted analogue of the cutset expansion, defines a merging conductance φ_M, and, using an argument very similar to that given for the proof of (1), shows that

\|e_t\|_2^2 \leq \left(1 - \frac{\phi_G^2}{2}\right)^t \|e_0\|_2^2

thereby showing that the convergence rate of p_t to π depends on the conductance of the graph.

Finally, we arrive at our second intuition: a random walk on a graph which is started inside a moderately large set S ⊂ V with low conductance should linger inside S for some time before moving to other vertices in the rest of the graph. In 1990, Lovász and Simonovits [6] defined the µ-conductance φ_µ in order to prove that the mixing rate of a random walk on a graph is fast even when small sets of vertices with low conductance exist within the graph. The µ-conductance is defined as:

\phi_\mu = \min_{S \subseteq V} \frac{\sum_{v_i \in S,\ v_j \in V \setminus S} \pi_i M_{ij}}{\min(\pi(S) - \mu,\ \pi(V \setminus S) - \mu)}

where π is the stationary distribution of the random walk, π(S) = Σ_{v_i∈S} π_i, and 0 ≤ µ ≤ 1/2. Instead of looking at the 2-norm of the error, ‖e_t‖₂², Lovász and Simonovits consider the mixing rate in the l_1 metric, and define the rate in terms of π(S). They introduce a function h_t(x) to place an upper bound on the rate of convergence of the probability on a set of vertices to its value under the stationary distribution. This function is defined as:

h_t(x) = \max_{w : 0 \leq w_i \leq 1,\ \sum_i w_i \pi_i = x} \sum_{v_i \in V} (p^t_i - \pi_i) w_i

and it is used to prove [12] that for every set of vertices S, with x = Σ_{v_i∈S} d_i = vol(S),

\sum_{v_i \in S} (p^t_i - \pi_i) \leq \min\left(\sqrt{x}, \sqrt{\mathrm{vol}(V) - x}\right) \left(1 - \frac{\phi_G^2}{2}\right)^t

The implication of their analysis is that if φ_G is large, then every random walk, started in any subset S, will converge quickly. Furthermore, it implies that if the random walk stagnates, then some set in S has low conductance. This is the starting point for Spielman and Teng's local clustering algorithm: because a walk started at a vertex in a subset with poor conductance will exhibit a slow convergence rate, "one can find a cut with small conductance from the distributions of the steps of the random walk starting at any vertex from which the walk does not mix rapidly" [1]. The authors use a truncated random walk to quickly find such a cluster.
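This lingering behavior can be observed directly. The sketch below (mine; the graph and step counts are illustrative) starts a lazy random walk inside one triangle of the two-triangle graph used above and tracks how much probability mass remains in that low-conductance set over time; the mass drains out slowly toward its stationary value vol(S)/vol(V) = 1/2.

import numpy as np

# Two triangles joined by a single edge; S = {0, 1, 2} has conductance 1/7.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
M = (A / A.sum(axis=0) + np.eye(6)) / 2.0   # lazy walk matrix (A D^{-1} + I)/2

p = np.zeros(6)
p[0] = 1.0                                  # start the walk inside S
for t in range(1, 21):
    p = M @ p
    if t % 5 == 0:
        print(t, round(p[:3].sum(), 4))     # probability mass still inside S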

4 Local Clustering Using Random Walks and Conductance

The ideas brought forth in Section 3 fit nicely together to provide an intuition for the local clustering algorithm of Spielman and Teng. Using the notions of conductance, random walks, and mixing time, we can now look at the Nibble algorithm in greater detail and understand why it succeeds at finding a cluster with good conductance while looking only at a small neighborhood of an input vertex in a graph.

4.1 The Nibble Algorithm

Spielman and Teng find a cluster meeting a maximum input conductance and prove that the size of this cluster, µ(C) = Σ_{v_i∈C} d_i, will be at most 5/6 of the size of the vertex set, µ(V) = Σ_{v_j∈V} d_j, taking size to mean volume as defined previously. More importantly, Spielman and Teng show that this cluster can be found in time proportional to the size of the cluster. It may then be used as a piece of a randomized graph partitioning algorithm which achieves a balanced (all partitions are of approximately equal size), sparse (a low number of edges are cut) cut of an input graph G in nearly linear time, a feat which was unheard of at the time of their publication.

The input to Nibble is a graph G, a starting vertex v_0, a conductance threshold φ, and a parameter b controlling the minimum size of the output cluster. Roughly, the algorithm works as follows. Start a random walk from the vertex v_0 in G. At each step of the walk, truncate to zero all entries of p_t that are below a threshold (close to 0), where p_t is the distribution over the vertices at step t of the random walk. Call the truncated probability distribution r_t, and in the next step set the new probability distribution q_t to be:

q_t = M r_t

Order the vertices by decreasing probability-to-degree ratio q^t_i / d_i; that is, consider the sets S_j(q_t) consisting of the first j vertices w_1, ..., w_j in a permutation w of the vertex numbers such that q^t_{w_i}/d_{w_i} ≥ q^t_{w_{i+1}}/d_{w_{i+1}}. If there exists a j such that:

- the set S_j(q_t) containing the first j vertices in this order has a conductance lower than the input φ,
- the size of the set is not too small, 2^b ≤ Σ_{v_i ∈ S_j(q_t)} d_i, or too large, Σ_{v_i ∈ S_j(q_t)} d_i ≤ (5/6) Σ_{v_i ∈ V} d_i, and
- the set contains a large probability mass (here we use a partial derivative with respect to x of a function very closely related to the h_t(x) defined by Lovász and Simonovits),

then return this set of j vertices as the output. The output is a cluster which is proven, as stated above, to have a low conductance and to be found in time O(2^b (log^6 m)/φ^4). The main idea is to start a random walk from a vertex in a set with low conductance; the walk will then most likely not converge rapidly but will instead linger on the vertices inside the low-conductance set, and the most highly clustered vertices in this set will be found in a few steps of the random walk. The authors prove that truncating the random walk in this way will not cause q_t to deviate too much from p_t, and that if the walk is started at a vertex v_0 inside a cluster with low conductance, then the output of Nibble will be a set which intersects a true set with the input conductance on at least 2^{b-1} vertices [1]. A simplified sketch of this truncated-walk procedure is given at the end of this subsection.

To better understand the probability mass, I will refer to an example that Spielman [12] uses to show the relationship between φ_G and the mixing rate of a random walk. Consider a random walk on a graph with an initial probability distribution p_0 on the set S ⊂ V with minimal conductance (φ_G = φ(S)):

p^0_i = \begin{cases} \frac{d_i}{\sum_{v_j \in S} d_j}, & v_i \in S \\ 0, & v_i \notin S \end{cases}

Taking A to be the adjacency matrix of the graph and, for simplicity, the non-lazy transition matrix M = A D^{-1}, we have that at the first timestep the probability that the walk travels outside of S is:

\sum_{v_i \in S,\ v_j \notin S} p^0_i M_{ji} = \frac{\sum_{v_i \in S,\ v_j \notin S} A_{ij}}{\sum_{v_i \in S} d_i} = \frac{|\partial S|}{\mathrm{vol}(S)} = \phi(S)

Thus, a low φ(S) indicates that less probability mass will be distributed on vertices lying outside of S. Spielman states that in successive steps even less probability mass will land outside of S. Therefore, more steps are required to reach all the vertices outside of S, and thus more time is required to reach the limiting distribution π. In the Nibble algorithm, requiring that the probability mass of the returned set is large is essentially checking that the random walk has stayed inside a set of low conductance with high probability.
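The following is a much simplified sketch of this truncated-walk idea, not Spielman and Teng's actual Nibble: their truncation threshold, stopping rule, and probability-mass test are more delicate, and the parameters here (step count, truncation ε, example graph) are illustrative assumptions of mine. It reuses the conductance function from the Section 3.1 sketch.

import numpy as np

def truncated_walk_cluster(A, v0, phi, steps=50, eps=1e-4):
    # Run a truncated lazy walk from v0; after each step, sweep over prefixes
    # of the vertices ordered by q_i / d_i and return the first prefix whose
    # conductance is below phi.
    n = A.shape[0]
    d = A.sum(axis=0)
    M = (A / d + np.eye(n)) / 2.0            # lazy walk matrix (A D^{-1} + I)/2
    r = np.zeros(n)
    r[v0] = 1.0
    for _ in range(steps):
        q = M @ r
        r = np.where(q >= eps, q, 0.0)       # truncate small entries to zero
        order = np.argsort(-r / d)           # sweep order: decreasing q_i / d_i
        for j in range(1, n):                # try each prefix S_j
            S = order[:j]
            if r[S].sum() > 0 and conductance(A, S) <= phi:
                return set(S.tolist())
    return None                               # no prefix met the target phi

# Two-triangle example graph again: a walk from vertex 0 finds S = {0, 1, 2}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(truncated_walk_cluster(A, v0=0, phi=0.2))   # -> {0, 1, 2}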

4.2 Application to Graph Partitioning

Spielman and Teng use Nibble as a procedure within a randomized algorithm which achieves a good partitioning of all the vertices in a graph in time O(m log(1/p) log^7(m)/θ^4), where m is the number of edges in the graph, θ is a bound on the conductance of the partition, and p dictates the probability that the partition is of a good size. Andersen, Chung, and Lang [2] closely follow Spielman and Teng's approach to derive a local clustering and fast partitioning algorithm, the notable difference being that they prove bounds on the convergence of PageRank vectors, instead of the probability distribution of a Markov chain, using the conductance of a graph. Andersen and Peres' local partitioning algorithm [4] uses an evolving set process, which is a Markov chain on sets of vertices, to quickly find sets of small conductance; their method is thus far the best local partitioning algorithm [8], with a runtime of O(φ^{-1/2} log^{O(1)} n) times the number of vertices in the output set.

5 Conclusion

In this survey, I have given an introduction to the theory and intuition underlying local clustering algorithms. Unlike traditional approaches to clustering, which attempt to minimize a global metric designed to compare different clusterings of the entire graph, these algorithms find a cluster near an input vertex by looking only at a small neighborhood of that vertex within the graph. It is rather remarkable that even with a very limited view of a graph we may find a good cluster of vertices. Spielman and Teng proved that we can do so by considering the relationship between a random walk on a graph and the quality of a cluster within the graph. By looking only at the vertices that a random walk from an initial vertex will visit with high probability, we can find a cluster which achieves a given conductance bound, because the random walk is likely to stay within or near the cluster for a number of timesteps before moving out toward the rest of the graph. Even better, we can use local clustering algorithms to quickly find good graph partitions. Given the advances in clustering and partitioning algorithms enabled by random walk techniques, I expect that we will soon see an easily implementable, random-walk-based algorithm becoming the clustering algorithm of choice for today's data-intensive applications.

References

[1] Daniel A. Spielman, Shang-Hua Teng. A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning. CoRR, abs/
[2] Reid Andersen, Fan Chung, Kevin Lang. Local Graph Partitioning using PageRank Vectors. In Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society Press, Washington, DC, USA.
[3] M. Mihail. Conductance and convergence of Markov chains: a combinatorial treatment of expanders. In Proc. of 30th FOCS, pp.
[4] Reid Andersen, Yuval Peres. Finding Sparse Cuts Locally Using Evolving Sets. In STOC '09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, New York, NY, USA.

[5] David A. Levin, Yuval Peres, Elizabeth L. Wilmer. Markov Chains and Mixing Times: Chapters 1 and 4.
[6] L. Lovász and M. Simonovits. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume. In FOCS, pp.
[7] Daniel A. Spielman and Shang-Hua Teng. Spectral Partitioning Works.
[8] Daniel A. Spielman. Algorithms, Graph Theory, and Linear Equations in Laplacian Matrices. In Proceedings of the International Congress of Mathematicians, Hyderabad, India.
[9] N. Alon. Eigenvalues and Expanders. In Combinatorica, Vol. 6(2), pp.
[10] László Lovász. Random Walks on Graphs: A Survey. In Combinatorics, Paul Erdős is Eighty, Vol. 2, pp.
[11] Maurice de Kunder. The Size of the World Wide Web. WorldWideWebSize.com. Accessed 10 June.
[12] Daniel A. Spielman. Spectral Graph Theory, Fall 2012 Course Notes. Accessed May-June.
[13] Ulrike von Luxburg. A Tutorial on Spectral Clustering. In Statistics and Computing, Vol. 17(4).

Figure 2: Example of a random walk on a graph started at vertex 1 (suppose all edge weights are 1). At the first timestep, we compute p_1 = M p_0, obtaining a probability distribution over the vertices, where p^1_i is the probability that the random walk is currently at vertex i and M = (A D^{-1} + I)/2 is the 10×10 transition probability matrix. The orange vertices in the top part of the figure illustrate all vertices reachable in the first step of the random walk, and the fractions on the edges represent the probabilities of moving to each vertex in the first timestep. Eventually, the probability distribution converges to the stationary distribution in the bottom part of the figure, where the walk has a positive probability of being at any vertex, but is always more likely to be on vertices 7, 8, 9, and 10 than on the remaining vertices.


Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,

More information

Gene expression & Clustering (Chapter 10)

Gene expression & Clustering (Chapter 10) Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching

More information

A Recommender System Based on Local Random Walks and Spectral Methods

A Recommender System Based on Local Random Walks and Spectral Methods A Recommender System Based on Local Random Walks and Spectral Methods ABSTRACT Zeinab Abbassi Department of Computer Science,UBC 201 2366 Main Mall Vancouver, Canada zeinab@cs.ubc.ca In this paper, we

More information

CS229 Final Project: k-means Algorithm

CS229 Final Project: k-means Algorithm CS229 Final Project: k-means Algorithm Colin Wei and Alfred Xue SUNet ID: colinwei axue December 11, 2014 1 Notes This project was done in conjuction with a similar project that explored k-means in CS

More information

REGULAR GRAPHS OF GIVEN GIRTH. Contents

REGULAR GRAPHS OF GIVEN GIRTH. Contents REGULAR GRAPHS OF GIVEN GIRTH BROOKE ULLERY Contents 1. Introduction This paper gives an introduction to the area of graph theory dealing with properties of regular graphs of given girth. A large portion

More information

12.1 Formulation of General Perfect Matching

12.1 Formulation of General Perfect Matching CSC5160: Combinatorial Optimization and Approximation Algorithms Topic: Perfect Matching Polytope Date: 22/02/2008 Lecturer: Lap Chi Lau Scribe: Yuk Hei Chan, Ling Ding and Xiaobing Wu In this lecture,

More information

Pebble Sets in Convex Polygons

Pebble Sets in Convex Polygons 2 1 Pebble Sets in Convex Polygons Kevin Iga, Randall Maddox June 15, 2005 Abstract Lukács and András posed the problem of showing the existence of a set of n 2 points in the interior of a convex n-gon

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering George Karypis and Vipin Kumar Brian Shi CSci 8314 03/09/2017 Outline Introduction Graph Partitioning Problem Multilevel

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Exact Algorithms Lecture 7: FPT Hardness and the ETH

Exact Algorithms Lecture 7: FPT Hardness and the ETH Exact Algorithms Lecture 7: FPT Hardness and the ETH February 12, 2016 Lecturer: Michael Lampis 1 Reminder: FPT algorithms Definition 1. A parameterized problem is a function from (χ, k) {0, 1} N to {0,

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

A Reduction of Conway s Thrackle Conjecture

A Reduction of Conway s Thrackle Conjecture A Reduction of Conway s Thrackle Conjecture Wei Li, Karen Daniels, and Konstantin Rybnikov Department of Computer Science and Department of Mathematical Sciences University of Massachusetts, Lowell 01854

More information

Disjoint directed cycles

Disjoint directed cycles Disjoint directed cycles Noga Alon Abstract It is shown that there exists a positive ɛ so that for any integer k, every directed graph with minimum outdegree at least k contains at least ɛk vertex disjoint

More information

6.856 Randomized Algorithms

6.856 Randomized Algorithms 6.856 Randomized Algorithms David Karger Handout #4, September 21, 2002 Homework 1 Solutions Problem 1 MR 1.8. (a) The min-cut algorithm given in class works because at each step it is very unlikely (probability

More information

Research Interests Optimization:

Research Interests Optimization: Mitchell: Research interests 1 Research Interests Optimization: looking for the best solution from among a number of candidates. Prototypical optimization problem: min f(x) subject to g(x) 0 x X IR n Here,

More information