On the Approximability of Modularity Clustering

Similar documents
Maximizing edge-ratio is NP-complete

On the Min-Max 2-Cluster Editing Problem

6 Randomized rounding of semidefinite programs

Approximate Correlation Clustering using Same-Cluster Queries

Lecture Note: Computation problems in social. network analysis

Modularity CMSC 858L

A Survey of Correlation Clustering

Algebraic method for Shortest Paths problems

Extracting Communities from Networks

Decision Problems. Observation: Many polynomial algorithms. Questions: Can we solve all problems in polynomial time? Answer: No, absolutely not.

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)

On Covering a Graph Optimally with Induced Subgraphs

Computing Largest Correcting Codes and Their Estimates Using Optimization on Specially Constructed Graphs p.1/30

GRAPH THEORY and APPLICATIONS. Factorization Domination Indepence Clique

Community Structure and Beyond

The complement of PATH is in NL

NP-complete Reductions

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

On Finding Dense Subgraphs

Graphs. Graph G = (V, E) Types of graphs E = O( V 2 ) V = set of vertices E = set of edges (V V)

Small Survey on Perfect Graphs

Introduction to Graph Theory

COMP 355 Advanced Algorithms Approximation Algorithms: VC and TSP Chapter 11 (KT) Section (CLRS)

Extremal Graph Theory. Ajit A. Diwan Department of Computer Science and Engineering, I. I. T. Bombay.

On Approximating Minimum Vertex Cover for Graphs with Perfect Matching

Community Detection. Community

Math 776 Graph Theory Lecture Note 1 Basic concepts

arxiv: v1 [physics.data-an] 27 Sep 2007

Chapter 11. Network Community Detection

1 More configuration model

Best known solution time is Ω(V!) Check every permutation of vertices to see if there is a graph edge between adjacent vertices

Coloring 3-colorable Graphs

Lecture 9. Semidefinite programming is linear programming where variables are entries in a positive semidefinite matrix.

The Ordered Covering Problem

The k-center problem Approximation Algorithms 2009 Petros Potikas

TELCOM2125: Network Science and Analysis

Some graph theory applications. communications networks

Notes for Lecture 24

Computability Theory

BOOLEAN MATRIX FACTORIZATIONS. with applications in data mining Pauli Miettinen

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.

Social-Network Graphs

The Set Cover with Pairs Problem

Maximum Betweenness Centrality: Approximability and Tractable Cases

CSC Linear Programming and Combinatorial Optimization Lecture 12: Semidefinite Programming(SDP) Relaxation

Algorithm and Complexity of Disjointed Connected Dominating Set Problem on Trees

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

Solutions for the Exam 6 January 2014

MCL. (and other clustering algorithms) 858L

Lecture 10 October 7, 2014

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

Spectral Methods for Network Community Detection and Graph Partitioning

CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh HW#3 Due at the beginning of class Thursday 02/26/15

An IPS for TQBF Intro to Approximability

MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS

Polynomial-Time Approximation Algorithms

Faster parameterized algorithms for Minimum Fill-In

Algorithm Design and Analysis

2 The Fractional Chromatic Gap

ACO Comprehensive Exam October 12 and 13, Computability, Complexity and Algorithms

Dense and Sparse Graph Partition

Social Data Management Communities

CSC 505, Fall 2000: Week 12

Cluster Editing with Locally Bounded Modifications Revisited

An exact algorithm for max-cut in sparse graphs

Clustering Using Graph Connectivity

Vertex Cover Approximations on Random Graphs.

Network Based Models For Analysis of SNPs Yalta Opt

Algorithmic complexity of two defence budget problems

CPSC 536N: Randomized Algorithms Term 2. Lecture 10

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

val(y, I) α (9.0.2) α (9.0.3)

Graphs: Introduction. Ali Shokoufandeh, Department of Computer Science, Drexel University

Fast Clustering using MapReduce

A Simple Acceleration Method for the Louvain Algorithm

1 More stochastic block model. Pr(G θ) G=(V, E) 1.1 Model definition. 1.2 Fitting the model to data. Prof. Aaron Clauset 7 November 2013

On the Max Coloring Problem

Big Data Management and NoSQL Databases

1 Better Approximation of the Traveling Salesman

Stanford University CS261: Optimization Handout 1 Luca Trevisan January 4, 2011

Theoretical Computer Science. Testing Eulerianity and connectivity in directed sparse graphs

Cooperative Computing for Autonomous Data Centers

Stanford University CS359G: Graph Partitioning and Expanders Handout 18 Luca Trevisan March 3, 2011

Approximation Algorithms for Item Pricing

1 The Traveling Salesperson Problem (TSP)

CS675: Convex and Combinatorial Optimization Spring 2018 Consequences of the Ellipsoid Algorithm. Instructor: Shaddin Dughmi

Approximation Algorithms

Decreasing the Diameter of Bounded Degree Graphs

Finding bipartite subgraphs efficiently

P vs. NP. Simpsons: Treehouse of Horror VI

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

Testing Eulerianity and Connectivity in Directed Sparse Graphs

Optimisation While Streaming

Problem 1. Which of the following is true of functions =100 +log and = + log? Problem 2. Which of the following is true of functions = 2 and =3?

Least Squares; Sequence Alignment

1 Random Graph Models for Networks

Correlation Clustering in General Weighted Graphs

Analysis of an ( )-Approximation Algorithm for the Maximum Edge-Disjoint Paths Problem with Congestion Two

Transcription:

On the Approximability of Modularity Clustering Newman s Community Finding Approach for Social Nets Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607, USA dasgupta@cs.uic.edu July 2, 2011 Joint work with Devendra Desai (Rutgers University) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 1 / 33

1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 2 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 3 / 33

Biological & Social Interaction Networks Clusters/Communities Interaction systems in biology and social science modeled as pairwise interaction graphs nodes are entities edges are interactions between entities Goal: partition nodes into communities or clusters of statistically significant interactions www.fmsasg.com/socialnetworkanalysis/ DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 4 / 33

Unsatisfactory Choices for Communities What are clusters of statistically significant interactions? Unsatisfactory choices in practical applications (too strict, computationally difficult,...) cliques dense subgraphs. DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 5 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 6 / 33

Model Based Clustering Model: define a null model G of a background random graph provides probability p i,j of edge between v i and v j (implicitly or explicitly) v j p i,j v i DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 7 / 33

Model Based Clustering Model: define a null model G of a background random graph provides probability p i,j of edge between v i and v j (implicitly or explicitly) Input graph G: 0 < w i,j 1 normalized weight v j p i,j v i DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 7 / 33

Model Based Clustering Model: define a null model G of a background random graph provides probability p i,j of edge between v i and v j (implicitly or explicitly) Input graph G: 0 < w i,j 1 normalized weight w i,j p i,j is large v j p i,j v i DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 7 / 33

Model Based Clustering Model: define a null model G of a background random graph provides probability p i,j of edge between v i and v j (implicitly or explicitly) Input graph G: 0 < w i,j 1 normalized weight w i,j p i,j is large v j v j p i,j w i,j v i v i {v i, v j } is statistically significant DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 7 / 33

Correlation Clustering as a Model Based Clustering {+, } }-correlation clustering Goal: maximize number of + edges minus number of edges inside clusters e.g. [Bansal, Blum, Chawla, 2002], [Charikar, Guruswami, Wirth, 2003], [Swamy, 2004] given input graph H with each edge labeled as + or DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 8 / 33

Correlation Clustering as a Model Based Clustering {+, } }-correlation clustering Goal: maximize number of + edges minus number of edges inside clusters e.g. [Bansal, Blum, Chawla, 2002], [Charikar, Guruswami, Wirth, 2003], [Swamy, 2004] given input graph H with each edge labeled as + or let G be the graph consisting of all edges labeled + in H a i,j : (i, j) th entry in adjacency matrix) (aa i,j DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 8 / 33

Correlation Clustering as a Model Based Clustering {+, } }-correlation clustering Goal: maximize number of + edges minus number of edges inside clusters e.g. [Bansal, Blum, Chawla, 2002], [Charikar, Guruswami, Wirth, 2003], [Swamy, 2004] given input graph H with each edge labeled as + or let G be the graph consisting of all edges labeled + in H (a a i,j : (i, j) th entry in adjacency matrix) { 0 if the edge was labeled + or missing null model G: p i,j = 1 otherwise i.e., labeled DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 8 / 33

Correlation Clustering as a Model Based Clustering {+, } }-correlation clustering Goal: maximize number of + edges minus number of edges inside clusters e.g. [Bansal, Blum, Chawla, 2002], [Charikar, Guruswami, Wirth, 2003], [Swamy, 2004] given input graph H with each edge labeled as + or let G be the graph consisting of all edges labeled + in H (a a i,j : (i, j) th entry in adjacency matrix) { 0 if the edge was labeled + or missing null model G: p i,j = 1 otherwise i.e., labeled contribution of an edge inside cluster to total score: a i,j p i,j DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 8 / 33

Correlation Clustering as a Model Based Clustering {+, } }-correlation clustering Goal: maximize number of + edges minus number of edges inside clusters e.g. [Bansal, Blum, Chawla, 2002], [Charikar, Guruswami, Wirth, 2003], [Swamy, 2004] given input graph H with each edge labeled as + or let G be the graph consisting of all edges labeled + in H (a a i,j : (i, j) th entry in adjacency matrix) { 0 if the edge was labeled + or missing null model G: p i,j = 1 otherwise i.e., labeled contribution of an edge inside cluster to total score: a i,j p i,j total score: appropriate function of individual scores of edges DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 8 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 9 / 33

Newman s Modularity Clustering Newman s Modularity Clustering A specific model based clustering Extremely popular in practice (in biology, social science, etc.) For example, see (Ravasz et al., Science, 2002) (Newman and Girvan, Physical Review E, 2004) (Newman, Physical Review E, 2004) (Newman, PNAS, 2006) (Guimera et al, Nature Physics, 2007) (Leicht and Newman, Physical Review Letters, 2008) null model dependent on the degree distribution of the input graph can be used for directed/undirected and weighted/unweighted graphs DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 10 / 33

Newman s Modularity Clustering Undirected Graphs Null Model G Input graph G = (V, E) has m edges degree(u) = 3 u u, v V: p u,v = degree(u) degree(v) 2m u = v allowed v degree(v) = 2 m=8 p u,v =3/8 p v,v = 1/4 Expected degree of a node v is precisely degree(v) and, thus, the expected number of edges is m degree(u) degree(v) = degree(u) 2m v V DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 11 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V S u 1 u 2 u 3 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V Contribution for an edge {u, v} E: a non-edge{u, v} E: 1 p u,v p u,v S u 1 u 2 u 3 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V Contribution for an edge {u, v} E: a non-edge{u, v} E: combining both cases: 1 p u,v p u,v S u 1 u 2 u 3 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V Contribution for an edge {u, v} E: a non-edge{u, v} E: combining both cases: 1 p u,v p u,v a u,v p u,v = a u,v degree(u) degree(v) 2m u 3 S u 1 u 2 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V Contribution for an edge {u, v} E: a non-edge{u, v} E: combining both cases: 1 p u,v p u,v a u,v p u,v = a u,v degree(u) degree(v) 2m Add for all pairs of nodes in S u 3 S u 1 u 2 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Newman s Modularity Clustering Undirected Graphs Fitness of a cluster (subset of nodes) S V Contribution for an edge {u, v} E: a non-edge{u, v} E: combining both cases: 1 p u,v p u,v a u,v p u,v = a u,v degree(u) degree(v) 2m Add for all pairs of nodes in S fitness of S M(S) = ( a u,v degree(u) degree(v) ) 2m u,v S u 3 S u 1 u 2 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 12 / 33

Modularity Clustering Undirected Graphs Modularity value of a clustering C C = {V 1, V 2,...,V k } is a partition of V modularity is sum of individual fitnesses (normalized by dividing by 2m to get a value between 0 and 1) M(C) = 1 k 2m M(V i ) i=1 Goal: find a clustering C to maximize M(C) (note: number of clusters k is unspecified) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 13 / 33

Modularity Clustering Undirected Graphs Equivalent Formula for Modularity value (via simple algebraic manipulation) Original modularity M(C) = 1 k 2m ( a u,v degree(u) degree(v) ) 2m i=1 u,v V i Equivalent formula M(C) = ( k i=1 m i m ( Di 2m ) ) 2 m i D i = number of edges whose both endpoints are in V i = sum of degrees of nodes in V i V i V i DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 14 / 33

Modularity Clustering Undirected Graphs Equivalent Formula for Modularity value (via simple algebraic manipulation) Original modularity M(C) = 1 k 2m ( a u,v degree(u) degree(v) ) 2m i=1 u,v V i Yet another equivalent formula M(C) = ( Di D j 2m 2 m ) i,j m V i,v j : i<j m i,j D i = number of edges with one endpoint in V i and another in V j = sum of degrees of nodes in V i V i V i V j DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 14 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 15 / 33

Modularity Clustering Generalization to other types of graphs Generalization to other types of graphs Undirected graphs ) M(C) = 1 u,v V i (a u,v degree(u) degree(v) 2m k i=1 2m DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 16 / 33

Modularity Clustering Generalization to other types of graphs Generalization to other types of graphs Directed graphs M(C) = 1 2m m out-degree k degree (u) in-degree degree (v) a u,v i=1 u,v V i 2m m DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 16 / 33

Modularity Clustering Generalization to other types of graphs Generalization to other types of graphs (Edge)-Weighted undirected graphs M(C) = 1 weighted-degree k 2m degree (u) weighted-degree degree (v) a u,v 2m i=1 u,v V i edge weights are non-negative weighted degree of v is sum of weights of edges incident on v a u,v is the weight of the edge{u, v} m is sum of edge weights DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 16 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 17 / 33

Previously known complexity results OPT = max { M(C)} denotes the maximum modularity value C Previously known complexity results computing OPT is NP-complete for sufficiently dense graphs (Brandes, Delling, Gaertler, Görke, Hoefer, Nikoloski and Wagner, 2007) the reduction roughly requiresω( n) degree for every node NP-completeness result holds even if any solution is constrained to contain no more than two clusters Many results on heuristics and their experimental evaluations As (Agarwal and Kempe, 2008) observed: In spite of its extreme popularity, not much is known about the computational complexity aspect of modularity clustering beyond NP-completeness DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 18 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 19 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 20 / 33

Our main results (undirected graphs) Inapproximability Our main inapproximability results (undirected graphs) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 21 / 33

Our main results (undirected graphs) Inapproximability Our main inapproximability results (undirected graphs) computing OPT is APX-hard for dense graphs (edge-complement of 3-regular graphs) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 21 / 33

Our main results (undirected graphs) Inapproximability Our main inapproximability results (undirected graphs) computing OPT is APX-hard for dense graphs (edge-complement of 3-regular graphs) optimally partitioning into 2 clusters is NP-complete even when the graph is sparse and regular (d-regular for any constant d 9) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 21 / 33

Our main results (undirected graphs) Approximation algorithms Our main approximability results (undirected graphs) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 22 / 33

Our main results (undirected graphs) Approximation algorithms Our main approximability results (undirected graphs) small number of clusters well-approximate OPT in particular, partitioning into two clusters achieves 1 2 OPT OPT 2 1 2 OPT DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 22 / 33

Our main results (undirected graphs) Approximation algorithms Our main approximability results (undirected graphs) small number of clusters well-approximate OPT in particular, partitioning into just two clusters achieves 1 2 OPT OPT 2 1 2 OPT An approximation algorithm whose approximation ratio is logarithmic in the maximum degree (provided, roughly speaking, maximum degree is o(n)) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 22 / 33

Our main results (undirected graphs) Approximation algorithms Our main approximability results (undirected graphs) small number of clusters well-approximate OPT in particular, partitioning into just two clusters achieves 1 2 OPT OPT 2 1 2 OPT An approximation algorithm whose approximation ratio is logarithmic in the average degree (provided, roughly speaking, average degree is o(n)) for locally-dense graphs (i.e., every node has a degree of Ω(n)) a solution within any constant additive error in polynomial time via use of regularity lemma DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 22 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 23 / 33

Some proof ideas for main results APX-hardness for dense graphs APX-hardness for dense graphs 3-MIS maximum independent set for 3-regular graphs δ l = 94 δ h = 95 194 194 L NP I L I / L [1] 3-MIS Modularity Clustering Ψ δ h n OPT 2 (4δ2 h δ h) Ψ δ l n [1] Chlebík and Chlebíková, 2006 n 4 OPT 4δ l 1 n 4 DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 24 / 33

Some proof ideas for main results logarithmic approximation algorithm Logarithmic approximation algorithm modularity function is neither monotone nor sub-modular, thus cannot use techniques from those domains we show that a natural LP-relaxation for modularity clustering has large integrality gap, so cannot use LP-based techniques standard algorithmic approaches such as greedy provably do not work well instead, we go via quadratic optimization and semi-definite programming (SDP) based approach DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 25 / 33

Some proof ideas for main results logarithmic approximation algorithm Logarithmic approximation algorithm Quadratic optimization and SDP-based approach OPT 2 OPT, thus suffices to partition into 2-clusters 2 express this 2-cluster partition problem as a quadratic integer program after some algebraic simplification w(u, v) = degree(u) degree(v) au,v 2m 4m, W = [w 4m u,v ] R n n maximize x T Wx subject to x { 1, 1} n But, but,..., the diagonal entries w u,u s of W are negative DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 26 / 33

Some proof ideas for main results logarithmic approximation algorithm Logarithmic approximation algorithm (continued) ignore diagonal entries; later show that it was OK to ignore maximize w u,v x u x v subject to u V: x u { 1, 1} (1) u v V obtain a lower bound on OPT using an explicit graph decomposition ( ) 1 OPT = Ω average degree ( ) 1 Approximate (1) within a factor of O by an log OPT appropriate adaptation of the algorithm of (Charikar & Wirth, FOCS 2004) DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 27 / 33

Outline 1 Introduction 2 Model Based Community Finding 3 Newman s Modularity Clustering Generalization to other types of graphs 4 Previously known complexity results 5 Our results Main results Some proof ideas for main results Other results DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 28 / 33

Our other results directed or weighted graphs Our other results for directed or weighted graphs all the algorithmic results can be generalized to directed and/or weighted graphs via appropriate modifications DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 29 / 33

Our other results alternate null model (undirected graphs) Idea of alternative null models has been explored before empirically (Gaertler, Görke, Wagner, 2007) (Karrer and Newman, 2009) We explore the classical Erdös-Rényi random graph null model G(n, p) each possible edge is selected uniformly and randomly with a probability of p set p = 2m n (n 1) such that the expected number of edges in G(n, p) is m Our observation This is same as computing ( Newman s modularity measure on a m ) n) -regular graph ( m n DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 30 / 33

Our other results alternate overall modularity (undirected graphs) Exact or approximate solutions to Newman s modularity measure may produce many trivial clusters of single nodes Example If the maximum degree is at most clustering such that 4 n 16 ln n, then there always exists a every cluster except one consists of a single node modularity value is at least 25% of the maximum A possible reason total modularity is sum of individual cluster modularities DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 31 / 33

Our other results alternate overall modularity (undirected graphs) New modularity equation total modularity is minimum of individual cluster modularities Results new objective indeed avoids generating trivial clusters its optimal value is precisely half of the optimal value of old objective DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 32 / 33

The End Any questions? DasGupta (UIC) Approximability of Modularity Clustering GA Workshop, ICALP, 2011 33 / 33