Network Science: Principles and Applications

1 Network Science: Principles and Applications CS Fall 2016 Amarda Shehu,Fei Li [amarda, lifei](at)gmu.edu Department of Computer Science George Mason University

2 Outline of Today's Class
1 Outline of Today's Class
2 Communities: Basics of Communities (Community Structure in Real Networks); Examples of Network Communities; Network Community Detection Algorithms (via Hierarchical Clustering); Clustering, Community Structure, and Optimality in Real Networks; Hierarchical Network Model; Clustering to Optimal Partition: Modularity; Modularity Maximization; Overlapping Communities; Comparison of Community Detection Algorithms
3 Advanced (more Theoretical) Topics/Treatments: Modularity Maximization; Spectral Graph Partitioning

3 Communities within Networks: Networks play the powerful role of bridging the local and the global. They explain how processes at the node/link level ripple through a population. We often think of (social) networks as having the structure shown in the figure. Q: Can we gain insight into what lies behind this conceptualization?

4 Context: In the 1960s, M. Granovetter interviewed people who had changed jobs and asked how they discovered their new jobs. Many learned about opportunities through personal contacts. Surprisingly, the contacts were often acquaintances rather than close friends, even though close friends likely have the most motivation to help. Q: Why do distant acquaintances convey the crucial information? Reference: M. Granovetter, Getting a job: A study of contacts and careers. University of Chicago Press, 1974

5 Granovetter's Answer and Impact: Linked two different perspectives on distant friendships. Structural: focus on how friendships span the network. Interpersonal: local consequences of a friendship being strong or weak. The structural and informational roles of an edge are intertwined: (1) structurally embedded edges within a community tend to be socially strong and are highly redundant in terms of information access; (2) long-range edges spanning different parts of the network tend to be socially weak and offer access to useful information (e.g., on a new job!). This is a general way of thinking about the architecture of social networks; the answer transcends the specific setting of job-seeking.

6 Central to Emergence of Communities: Triadic Closure. A basic principle of network formation is triadic closure: if two people have a friend in common, then there is an increased likelihood that they will become friends in the future. Emergent edges in a social network are therefore likely to close triangles. Figure: more likely to see the red edge than the blue one. The prevalence of triadic closure is measured by the clustering coefficient:
$cl(v) = \frac{\#\,\text{pairs of friends of } v \text{ that are connected}}{\#\,\text{pairs of friends of } v}$
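
A minimal sketch of the clustering coefficient defined above, assuming the network is stored as a dict mapping each node to the set of its neighbors. The function name, the toy graph, and the convention cl(v) = 0 for nodes with fewer than two friends are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's friends that are themselves connected."""
    friends = adj[v]
    k = len(friends)
    if k < 2:
        return 0.0  # assumed convention: no pairs of friends to compare
    connected = sum(1 for a, b in combinations(friends, 2) if b in adj[a])
    return connected / (k * (k - 1) / 2)

# Toy graph: triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(adj, "A"))
print(clustering_coefficient(adj, "B"))
```

In the toy graph, only one of A's three pairs of friends (B, C) is connected, while B's single pair of friends is connected.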

7 Reasons for Triadic Closure: Triadic closure is intuitively very natural. Suppose B and C both have A as a friend. Reasons why closure operates: (1) increased opportunity for B and C to meet, since both spend time with A; (2) a basis for mutual trust between B and C, since both have A as a common friend; (3) A may have an incentive to bring B and C together, since lack of friendship between them may become a source of latent stress. The premise is based on theories dating to early work in social psychology. Reference: F. Heider, The Psychology of Interpersonal Relations. Wiley, 1958

8 Bridges: E.g., consider the simple social network in the figure. A's links to C, D, and E connect her to a tightly knit group; A, C, D, and E are likely exposed to similar opinions. A's link to B seems to reach into a different part of the network and offers her access to views she would otherwise not hear about. The A-B edge is called a bridge: its removal disconnects the network. Giant components suggest that bridges are quite rare.

9 Local Bridges: In reality, the social network is larger and may look as in the figure: without A and B knowing it, there may be a longer path between them. Defn: The span of (u, v) is the u-v distance when the edge (u, v) is removed. Defn: A local bridge is an edge with span > 2. Ex: Edge (A, B) is a local bridge with span 3. Local bridges with large spans play a role similar to bridges, but less extreme. Link with triadic closure: local bridges are not part of any triangle.
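
The two definitions above can be sketched with a breadth-first search that ignores the edge in question. This is an illustrative sketch, assuming an undirected graph stored as a dict of neighbor sets; the names `span` and `is_local_bridge` and the toy graph are hypothetical. A true bridge shows up here as an edge with infinite span.

```python
from collections import deque

def span(adj, u, v):
    """Distance from u to v when the edge (u, v) itself is ignored."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # skip the edge whose span is being measured
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]
                queue.append(y)
    return float("inf")  # no alternative path: (u, v) is a bridge

def is_local_bridge(adj, u, v):
    return span(adj, u, v) > 2

# Toy graph: A-B has the detour A-C-D-B; A-C sits in triangle A-C-E; B-F is a bridge.
adj = {
    "A": {"B", "C", "E"},
    "B": {"A", "D", "F"},
    "C": {"A", "D", "E"},
    "D": {"B", "C"},
    "E": {"A", "C"},
    "F": {"B"},
}
print(span(adj, "A", "B"), is_local_bridge(adj, "A", "B"))
print(span(adj, "A", "C"), is_local_bridge(adj, "A", "C"))
```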

10 Strong Triadic Closure Property: Categorize all edges in the network according to their strength: strong ties correspond to friendship, weak ties to acquaintances. Opportunity, trust, and incentive act more powerfully for strong ties. This suggests a qualitative assumption termed strong triadic closure: if a node has strong ties to two other nodes, then an edge (strong or weak) exists between those two nodes, closing the triangle. This is an abstraction to reason about the consequences of strong/weak ties.

11 Local Bridges and Weak Ties: (a) Local, interpersonal distinction between edges: strong/weak ties. (b) Global, structural notion: local bridges present or absent. Theorem: If a node satisfies the strong triadic closure property and is involved in at least two strong ties, then any local bridge incident to it is a weak tie. The theorem links the structural and interpersonal perspectives on friendships. Back to job-seeking: local bridges connect to new information, and their conceptual span is related to their weakness as social ties. This surprising dual role suggests a "strength of weak ties".

12 Proof By Contradiction: Suppose node A has at least 2 strong ties and satisfies the strong triadic closure property. Suppose, for contradiction, that A-B is a local bridge as well as a strong tie. Since A has at least two strong ties, it has another strong tie to some node C other than B. Figure: edge B-C must then exist by strong triadic closure. But then A-B is part of the triangle A-B-C, which contradicts A-B being a local bridge (a local bridge cannot be part of a triangle).

13 Tie Strength and Structure in Large-Scale Data: Q: Can one test Granovetter's theory with real network data? This was hard for decades because of the lack of large-scale surveys of social interaction. Now we have who-calls-whom networks with both key ingredients: the network structure of communication among pairs of people, and the total talking time, i.e., a proxy for tie strength. E.g.: a cell-phone network spanning 20% of a country's population. Reference: J.-P. Onnela et al., Structure and tie strengths in mobile communication networks, PNAS, vol. 104, 2007. How can we generalize weak ties and local bridges?

15 Generalizing Weak Ties and Local Bridges: The model described so far imposes sharp dichotomies on the network: edges are either strong or weak, local bridges or not. It is convenient to have proxies exhibiting smoother gradations. Numerical tie strength: minutes spent in phone conversations; order edges by strength and report their percentile occupancy. Generalization of local bridges: the neighborhood overlap $O_{ij}$ of edge $(i, j)$,
$O_{ij} = \frac{|n(i) \cap n(j)|}{|n(i) \cup n(j)|}$, where $n(i) = \{j \in V : (i, j) \in E\}$.
Desirable property: $O_{ij} = 0$ if $(i, j)$ is a local bridge.
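
A short sketch of the neighborhood overlap as written above, assuming a dict-of-sets adjacency structure. The function name and toy graph are illustrative. Note that with n(i) defined as the full neighbor set, the endpoints i and j are counted in the union; some textbook treatments exclude them, which changes the value but not the property that local bridges have overlap 0.

```python
def neighborhood_overlap(adj, i, j):
    """|n(i) & n(j)| / |n(i) | n(j)|, using full neighbor sets as on the slide."""
    union = adj[i] | adj[j]
    if not union:
        return 0.0
    return len(adj[i] & adj[j]) / len(union)

# Toy graph: A-B is a local bridge (no common neighbors); A-C closes triangle A-C-E.
adj = {
    "A": {"B", "C", "E"},
    "B": {"A", "D", "F"},
    "C": {"A", "D", "E"},
    "D": {"B", "C"},
    "E": {"A", "C"},
    "F": {"B"},
}
print(neighborhood_overlap(adj, "A", "B"))
print(neighborhood_overlap(adj, "A", "C"))
```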

16 Empirical Results: Strength of weak ties prediction: $O_{ij}$ grows with tie strength. The dependence is borne out very cleanly by the data (o points in the plot). Control: randomly permute the tie strengths while keeping the network structure fixed (second set of points in the plot). This effectively removes the coupling between $O_{ij}$ and tie strength.

17 Phone Network and Tie Strength: Cell-phone network with color-coded tie strengths: (1) stronger ties are more structurally embedded (within communities); (2) weaker ties correlate with long-range edges joining communities.

18 Approximating Link Weights: If the link weights are driven by the need to transport information or materials, as is often the case in technological and biological systems, the weights are well approximated by betweenness centrality. Links connecting communities have high betweenness (red), whereas links within communities have low betweenness (green).

19 Weak Ties Linking Communities: Strength of weak ties prediction: long-range, weak ties bridge communities. Delete edges one at a time in weak-to-strong order (smallest overlap first): the giant component shrinks rapidly and eventually disappears. Repeat with strong-to-weak deletions: slower shrinkage is observed.

20 Closing the Loop: We often think of (social) networks as having the structure shown in the figure. This conceptual picture is supported by Granovetter's strength of weak ties.

21 Communities: Nodes in real-world networks organize into communities. E.g.: families, clubs, political organizations, proteins grouped by function, ... This is supported by Granovetter's strength of weak ties theory. Members of a community (a.k.a. group, cluster, module) are well connected among themselves, relatively well separated from the rest, and exhibit high cohesiveness w.r.t. the underlying relational patterns.

22 Zachary's Karate Club: Social interactions among members of a karate club in the 1970s. Zachary witnessed the club split in two during his study. A toy network, yet the canonical benchmark for community detection algorithms. Offers ground-truth community membership (a rare luxury).

24 Political Blogs: The political blogosphere for the US 2004 presidential election. The community structure of liberal and conservative blogs is apparent. People have a stronger tendency to interact with equals.

25 Electrical Power Grid: Split the power network into areas with minimum inter-area interactions. Applications: deciding control areas for distributed power system state estimation; parallel computation of power flow; controlled islanding to prevent the spreading of blackouts.

26 Scientific Collaboration Network: E.g., the coauthorship network of scientists at the Santa Fe Institute. The communities found can be traced to different disciplines.

27 Belgium Population: Call patterns of 2M customers of the largest Belgian mobile phone company. The color of each community on a red-green scale represents the language spoken in that community: red for French and green for Dutch. The bicultural structure of the society is apparent. Model for peaceful coexistence? Minimal cross-community contact.

28 High-school Social Network: Network of social interactions among high-school students. Strong assortative mixing, with race as the latent characteristic.

29 Co-authorship Network: Coauthorship network of physicists publishing in network science. Tightly knit subgroups are evident from the network structure.

30 College Football: Vertices are NCAA football teams; edges are games played during Fall 2000. Communities are the NCAA conferences and independent teams.

31 Social Network: Facebook egonet with 744 vertices and 30K edges. The ego was asked to identify the social circles to which friends belong: company, high school, basketball club, squash club, family.

32 Biological Network: The community structure of biological systems is central to understanding and treating human disease. Community: a group of molecules that form a functional module to carry out a specific cellular function. E.g.: proteins involved in the same disease tend to interact with one another. Disease module hypothesis: each disease can be linked to a well-defined neighborhood of the cellular network.

33 Unveiling Network Communities: Nodes in real-world networks organize into communities. E.g.: families, clubs, political organizations, proteins grouped by function, ... This is supported by Granovetter's strength of weak ties theory. Members of a community (a.k.a. group, cluster, module) are well connected among themselves, relatively well separated from the rest, and exhibit high cohesiveness w.r.t. the underlying relational patterns. Q: How can we automatically identify such cohesive subgroups?

35 Community Detection: What Makes a Community. Community detection is useful for exploratory analysis of network data, e.g., clues about social interactions, content-related web pages, disease, collective phenomena. It is a challenging clustering problem: (1) no consensus on the structural definition of a community; (2) node subset selection is often intractable; (3) lack of ground truth for validation. Connectedness and density hypothesis: Connectedness: every member of a community must be reachable through other members of the same community. Locally dense: nodes that belong to a community have a higher probability of linking to others in the same community than to nodes outside it. In short, a community is a locally dense connected subgraph in the network.

37 Communities as Special Subgraphs: Luce & Perry ("A method of matrix analysis of group structure", Psychometrika, 1949) defined a community as a complete subgraph, i.e., a clique. A clique automatically satisfies the criterion of a locally dense connected subgraph, but the definition misses legitimate, imperfect communities. Consider a node $i$ in a connected subgraph $C \subset G$ containing $N_C$ of the $N$ nodes. Let $k_i^{int}$ be the number of links that connect $i$ to nodes in $C$, and $k_i^{ext}$ the number of links that connect $i$ to nodes in $G \setminus C$.
$C$ is a strong community if each node within $C$ has more links within $C$ than outside: $\forall i \in C,\; k_i^{int} > k_i^{ext}$.
$C$ is a weak community if the inequality above does not hold for every node in $C$, but the total internal degree exceeds the total external degree: $\sum_{i \in C} k_i^{int} > \sum_{i \in C} k_i^{ext}$.
How many communities are there? To answer this, pose community detection as graph partitioning and start with a simple version of it, graph bisection (next).
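
The two inequalities above can be checked directly. The sketch below assumes a dict-of-sets adjacency structure; the function names and the toy graph are illustrative. For simplicity `is_weak_community` tests only the total-degree inequality, so every strong community also passes it.

```python
def internal_external_degrees(adj, community):
    """Map each node i in the community to (k_i_int, k_i_ext)."""
    return {i: (len(adj[i] & community), len(adj[i] - community))
            for i in community}

def is_strong_community(adj, community):
    degs = internal_external_degrees(adj, community)
    return all(k_int > k_ext for k_int, k_ext in degs.values())

def is_weak_community(adj, community):
    # Only the total-degree condition is tested here (assumption of this sketch).
    degs = internal_external_degrees(adj, community)
    return sum(k for k, _ in degs.values()) > sum(k for _, k in degs.values())

# Toy graph: triangles {0,1,2} and {3,4,5}; node 2 has two links to the other triangle.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(is_strong_community(adj, {0, 1, 2}), is_weak_community(adj, {0, 1, 2}))
print(is_strong_community(adj, {3, 4, 5}), is_weak_community(adj, {3, 4, 5}))
```

Node 2 has two internal and two external links, so {0, 1, 2} fails the strong test but still passes the weak one.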

38 Community Detection and Graph Partitioning: Graph partitioning: split V into a given number of non-overlapping groups of given sizes. Criterion: the number of edges between groups is minimized; this is known as minimizing the cut. E.g.: task-processor assignment for load balancing, chip design. In community detection, by contrast, the number and sizes of groups are unspecified: the goal is to identify the natural fault lines along which a network separates. A special version of the partitioning problem is graph bisection: partition the graph G into two non-overlapping subgraphs such that the number of edges/links between the subgraphs, called the cut size, is minimized.

39 Graph Partitioning is Hard: E.g., the graph bisection problem, i.e., partition V into two groups ($|V| = N$). Suppose the groups $V_1$ and $V_2$ are non-overlapping and have equal size, i.e., $|V_1| = |V_2| = N/2$. Minimize the number of edges running between vertices in different groups. Simple problem to describe, but hard to solve. Number of ways to partition V:
$\binom{N}{N/2} \sim \frac{2^{N+1}}{\sqrt{N}} = e^{(N+1)\ln 2 - \frac{1}{2}\ln N}$ (up to a constant factor),
using Stirling's formula $N! \approx \sqrt{2\pi N}\,(N/e)^N$.
Since the number of bisections increases exponentially with the size of the network, exhaustive/brute-force search is intractable beyond toy networks. No smart (i.e., polynomial-time) algorithm is known: the problem is NP-hard. Seek good heuristics, e.g., relaxations of natural criteria.
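
A quick numerical check of the growth estimate above, as a sketch using only the Python standard library (`math.comb` needs Python 3.8+). The helper names are illustrative. Logarithms are compared to avoid overflow; the ratio between the slide's estimate and the exact count settles near a constant of about 2.5, so the exponential growth is captured even though the prefactor is not exact.

```python
import math

def log_exact_bisections(n):
    """Natural log of C(n, n/2), the number of ways to pick one half of n nodes."""
    return math.log(math.comb(n, n // 2))

def log_slide_estimate(n):
    """Natural log of 2^(n+1) / sqrt(n), i.e. (n + 1) ln 2 - (1/2) ln n."""
    return (n + 1) * math.log(2) - 0.5 * math.log(n)

for n in (10, 100, 1000):
    ratio = math.exp(log_slide_estimate(n) - log_exact_bisections(n))
    print(n, round(log_exact_bisections(n), 1), round(ratio, 3))
```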

40 Community Detection and Graph Partitioning: In community detection, the size and number of communities are not known a priori. Call a partition a division of a network into an arbitrary number of groups such that each node belongs to one and only one group (non-overlapping). The number of possible partitions is the Bell number
$B_N = \frac{1}{e} \sum_{j=0}^{\infty} \frac{j^N}{j!}$.
We need to identify communities without the luxury of inspecting all possible partitions, but with some freedom in the definition of a community.
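
To get a feel for how quickly the number of partitions grows, the sketch below computes Bell numbers exactly with the Bell triangle recurrence and compares them with a truncated version of the series above. The function names and the truncation at 100 terms are illustrative choices.

```python
import math

def bell_number(n):
    """Exact Bell number B_n via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]           # each row starts with the last entry of the previous one
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[0]

def dobinski(n, terms=100):
    """Truncated series (1/e) * sum_j j^n / j! from the slide."""
    return sum(j ** n / math.factorial(j) for j in range(terms)) / math.e

for n in (5, 10, 20):
    print(n, bell_number(n), dobinski(n))
```

Already for 10 nodes there are more than one hundred thousand partitions, which is why exhaustive inspection is not an option.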

41 Community Detection and Graph Partitioning: We need algorithms whose running time grows polynomially with N. Clustering algorithms address this need: they identify communities either by growing communities or by dividing the graph. Many exploit the strength of weak ties. Short detour...

42 Strength of Weak Ties Motivation: Local bridges connect weakly interacting parts of the network. Q: What about removing those to reveal communities? Challenges: there may be multiple local bridges. Are some better than others? Which one should be removed first? There might also be no local bridge, yet an apparent natural division.

43 Edge Betweenness Centrality: Idea: use high edge betweenness centrality to identify weak ties. Edges with high $c_{Be}(e)$ carry a large traffic volume over shortest paths and are positioned at the interface between tightly knit groups. E.g.: cell-phone network with edges colored by strength and by betweenness.

44 Girvan-Newman's Method: Conceptually simple: find and remove the links spanning cohesive subgroups. Algorithm: repeat until there are no edges left: (1) calculate the betweenness centrality $c_{Be}(e)$ of all edges; (2) remove the edge(s) with the highest $c_{Be}(e)$. The connected components are the communities identified. Divisive method: the network falls apart into pieces as we go. Nested partition: larger communities potentially host denser groups. Edge betweenness is recomputed after each removal step. Reference: M. Girvan and M. Newman, Community structure in social and biological networks, PNAS, vol. 99, 2002

45 Girvan-Newman's Algorithm in Action (figure)

46 Hierarchical Clustering: Common Framework. Another way to look at Girvan-Newman's algorithm is as a divisive hierarchical clustering algorithm. We will cover two hierarchical clustering algorithms (the domain is very rich): the Ravasz algorithm (agglomerative) and Girvan-Newman's algorithm (divisive). In principle, any clustering algorithm can be adapted and applied to detect communities in a network. Common framework: exploit a similarity or dissimilarity matrix whose elements $x_{ij}$ indicate the similarity or distance of node i from node j, e.g., similarity extracted from the relative positions of i and j within the network. Agglomerative: merge nodes with high similarity into the same community. Divisive: isolate communities by removing low-similarity (high-dissimilarity), cross-community links.

47 Agglomerative Approach to Hierarchical Clustering: Take the Ravasz algorithm as an example; it was designed to detect functional modules in metabolic networks [E. Ravasz and A.-L. Barabasi. Hierarchical organization in complex networks. Physical Review E 2003]. Key idea: similarity should be high for nodes that belong to the same community and low for nodes residing in different communities. Proposal: nodes that connect to each other and share neighbors belong to the same community. This gives the topological similarity
$x_{ij} = \frac{J(i,j)}{\min(k_i, k_j) + 1 - \theta(A_{ij})}$,
where $\theta(x) = 0$ for $x \le 0$ and $1$ for $x > 0$ (Heaviside step function), $J(i, j)$ is the number of shared neighbors (+1 if $(i, j) \in E$), and $k_i$, $k_j$ are the degrees. So $x_{ij} = 1$ if nodes i and j are linked and share the same neighbors, $x_{ij} = 0$ if i and j are not linked and do not share any neighbors, and in general $x_{ij}$ takes a value in [0, 1].
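
A direct transcription of the similarity formula above, as a sketch assuming an unweighted, undirected graph stored as a dict of neighbor sets. The function name and toy graph are illustrative and are not taken from the Ravasz-Barabasi paper.

```python
def topological_overlap(adj, i, j):
    """x_ij = J(i, j) / (min(k_i, k_j) + 1 - theta(A_ij))."""
    linked = 1 if j in adj[i] else 0            # theta(A_ij)
    shared = len(adj[i] & adj[j]) + linked      # J(i, j): common neighbors, +1 if linked
    return shared / (min(len(adj[i]), len(adj[j])) + 1 - linked)

# Toy graph: A and B are linked and both connect to C and D; E-F is a separate pair.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"F"},
    "F": {"E"},
}
print(topological_overlap(adj, "A", "B"))  # linked, identical neighborhoods
print(topological_overlap(adj, "C", "D"))  # not linked, two shared neighbors
print(topological_overlap(adj, "A", "E"))  # not linked, nothing shared
```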

49 Ravasz-Barabasi Hierarchical Clustering Algorithm: What to do with this similarity matrix? Answer: neighbor joining.

50 Ravasz-Barabasi Hierarchical Clustering Algorithm: S0: Assign each node i to its own community {i}. S1: Evaluate $x_{ij}$ for all pairs of communities. S2: Find the community pair with the highest similarity and merge them into a single community. S3: If there is more than one community left, go to S1. How do we determine the similarity between communities of more than one node? Several options: single, complete, or average group/cluster similarity. Single-link clustering: the similarity of two clusters is the similarity of their most similar members. Complete-link clustering: the similarity of two clusters is the similarity of their most dissimilar members. In Ravasz-Barabasi, average cluster similarity is used: the similarity between all pairs of nodes residing in the two different clusters is computed and averaged to give the similarity between the two groups/clusters.
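
Steps S0-S3 with average-link similarity can be sketched as follows. This is a naive illustration that recomputes all cluster similarities in every round (so it is slower than the bookkeeping discussed on the complexity slide); the function names, the toy similarity values, and the choice to treat missing pairs as similarity 0 are assumptions of this sketch.

```python
from itertools import combinations

def average_similarity(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim.get(frozenset((u, v)), 0.0) for u in c1 for v in c2)
    return total / (len(c1) * len(c2))

def agglomerate(nodes, sim):
    """Return the merge events (cluster_a, cluster_b, similarity) in order."""
    clusters = [frozenset([n]) for n in nodes]                 # S0
    merges = []
    while len(clusters) > 1:                                   # S3
        a, b = max(combinations(clusters, 2),                  # S1 + S2
                   key=lambda pair: average_similarity(pair[0], pair[1], sim))
        merges.append((a, b, average_similarity(a, b, sim)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical pairwise similarities for four nodes.
sim = {
    frozenset("ab"): 0.9, frozenset("cd"): 0.8,
    frozenset("ac"): 0.1, frozenset("ad"): 0.2,
    frozenset("bc"): 0.3, frozenset("bd"): 0.2,
}
for a, b, score in agglomerate("abcd", sim):
    print(sorted(a), sorted(b), round(score, 3))
```

The list of merge events is exactly the information a dendrogram displays.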

51 Cluster Similarity: Figure: (a) Pairwise similarity matrix. (b) Single-link(age) clustering. (c) Complete-link clustering. (d) Average-link clustering.

52 Ravasz-Barabasi Hierarchical Clustering Algorithm: Time Complexity. S0: Assign each node i to its own community {i}: $O(N)$ worst-case running time. S1: Evaluate $x_{ij}$ for all pairs of communities: $O(N^2)$ the first time around. S2: Find the community pair with the highest similarity and merge them into a single community. S1 + S2 are repeated at most N times, each time determining the distance of the new cluster to the others: $O(N^2)$ time. Total time: $O(N^2)$. Where are the communities? In what order were smaller communities joined into larger ones? To answer, we need to maintain a tree of joining events in order, which adds $O(N \log N)$ to the running time.

54 Ravasz-Barabasi Hierarchical Clustering Algorithm: Partitions. The order in which communities are merged can be visualized via a dendrogram. Can you infer the order of events? Where are the communities? (more later)

55 Girvan-Newman Hierarchical Clustering Algorithm: A divisive algorithm that relies on a dissimilarity matrix $x_{ij}$. Proposal: exploit $x_{ij}$ to select node pairs that are in different communities; $x_{ij}$ should be high for nodes in different communities and low for nodes in the same community. Reasonable options: link betweenness and random-walk betweenness; link betweenness is the fastest to calculate. In Girvan-Newman, the link betweenness $x_{ij}$ is defined as the number of shortest paths that go through the link (i, j). If I am to remove links one at a time in high-to-low order of $x_{ij}$, does link betweenness make sense? What values of $x_{ij}$ do I expect for links that connect communities?

56 Girvan-Newman Hierarchical Clustering Algorithm: Algorithm: S1: Compute the centrality of each link. S2: Remove the link with the highest centrality. S3: Recalculate the centrality of each link in the altered network. S4: Repeat S2-S3 until all links are removed. Computational complexity: the most expensive step is the recalculation of centrality. If link betweenness is used, it takes $O(LN)$ worst-case running time. Step S3 adds a factor of L to this time. Total time: $O(L^2 N)$.
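
Steps S1-S4 can be sketched in plain Python. The betweenness routine below uses a standard BFS-based accumulation of shortest-path counts (in the style of Brandes' algorithm), which is one common way to obtain link betweenness in O(LN); it is an illustrative choice, and all function names and the toy graph are assumptions. The graph is assumed unweighted, undirected, and free of self-loops, stored as a dict of neighbor sets.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths through each edge (paths split evenly on ties)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}      # number of shortest s-v paths
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):          # accumulate from the farthest nodes back
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2.0 for e, b in bet.items()}  # each pair was counted from both ends

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def girvan_newman(adj):
    """Return the successive partitions produced as links are removed."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    levels = [components(adj)]
    while any(adj.values()):                      # S4: until no links remain
        bet = edge_betweenness(adj)               # S1 / S3
        u, v = max(bet, key=bet.get)              # S2: highest-centrality link
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels

# Toy graph: two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {n: set() for n in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
for level in girvan_newman(adj):
    print(sorted(sorted(c) for c in level))
```

The list of partitions returned corresponds to the levels of the dendrogram; choosing among them requires a quality score such as modularity.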

57 Girvan-Newman Hierarchical Clustering Algorithm: Figure: (a)-(d) Sequence of link removals. (e) Accompanying dendrogram. (f) Which partition is optimal? Modularity helps us determine that.

58 Partitions on the Dendrogram: Figure: (a) Dendrogram. (b)-(d) Different partitions.

59 Dendrograms Show Partitions, Not Final Partition! Hierarchical partitions are often represented with a dendrogram, which shows the groups found in the network at all algorithmic steps and splits the network at different resolutions. E.g.: Girvan-Newman's algorithm applied to Zachary's karate club. Q: Which of the divisions/partitions is the most useful/optimal in some sense? A: We need to define metrics to compare partitions!

62 Clustering, Optimality, and Community Structure in Real Networks: Hierarchical clustering assumes nested communities, as captured by the dendrograms these algorithms produce. The density hypothesis states that there are locally dense subgraphs weakly linked to other locally dense subgraphs. Q1: How do we know such organization exists in real networks? Hubs connect communities, so is there a conflict between the scale-free property and community structure? We need to develop a model that reconciles the two. Q2: If community structure exists, how can hierarchical clustering algorithms be used to reveal the optimal organization/partition? We need a way to rank/score partitions and report the optimal one.

63 Reconciling Scale-free Property and Community Structure: Left: Start with a module of 5 nodes, one at the center of a square, and connect all of them to one another. Middle: Create four identical replicas and connect the peripheral nodes of each replica to the central node of the original module. Right: Create four replicas of the 25-node module and again connect the peripheral nodes to the central node of the original module, obtaining a 125-node network. The process can be repeated ad infinitum.

64 Hierarchical Network Model: The hierarchical network model generates a scale-free network with degree exponent $\gamma = 1 + \frac{\ln 5}{\ln 4} \approx 2.161$. For the Erdos-Renyi and Barabasi-Albert models, the clustering coefficient C decreases with N; for the hierarchical network model, C is independent of N. An N-independent clustering coefficient is observed in metabolic networks. The quantitative signature of nested communities (hierarchical modularity) is the dependence of a node's clustering coefficient on its degree: $C(k) \sim k^{-1}$. The higher a node's degree, the smaller its clustering coefficient: small-degree nodes have high C because they reside in dense communities; high-degree nodes have low C because they connect to different communities. Inspecting C(k) allows determining whether a network is hierarchical. The power grid lacks hierarchical modularity (C(k) is independent of k). In many real networks, C(k) decreases with k. More detailed models predict $C(k) \sim k^{-\beta}$ with $\beta \in [0, 2]$.

65 Modularity: The size of communities is typically unknown and must be identified automatically. Devise a function that measures how well a network is partitioned into communities. Intuition: the density of edges within communities is higher than expected at random. Hypothesis: randomly wired networks lack an inherent community structure. By comparing the link density of a subgraph with the link density obtained for the same group of nodes when randomly wired, one can determine whether the subgraph found is a community. Systematic deviations from a random configuration allow defining a quantity called modularity, which measures the quality of a particular partition. Modularity offers a way to determine the optimal partition from the dendrogram. Modularity optimization also offers a novel approach to community detection via optimization algorithms.

66 Modularity of a Proposed Community
Suppose a subgraph $s$ has been identified and proposed to be a community
One can measure its modularity as: $Q(s) \propto (\text{number of edges within group } s) - E[\text{number of such edges}]$
This operationalizes the hypothesis that a true community should contain more edges than would be obtained in a null model
Proposal: $Q(s) = \frac{1}{2L} \sum_{(i,j) \in s} (A_{ij} - p_{ij})$
$L$ is the number of edges in the entire graph/network
$p_{ij}$ is the probability of an edge by chance, obtained by randomizing the original network $G$ while preserving the expected degree of each node
Under the degree-preserving null model: $p_{ij} = \frac{k_i k_j}{2L}$. Why?

68 Expected Connectivity Among Nodes
Null model: randomize edges while preserving the degree distribution in $G$
Random variable $A_{ij} := I\{(i,j) \in E\}$
Its expectation is $E[A_{ij}] = P((i,j) \in E)$
Suppose node $i$ has degree $k_i$ and node $j$ has degree $k_j$
Degree is the number of spokes (edge ends) per node, so there are $2L$ spokes in $G$
The probability that spoke $i_k$ of node $i$ is connected to $j$ is $\frac{k_j}{2L-1} \approx \frac{k_j}{2L}$
$P((i,j) \in E) = P\left(\bigcup_{i_k=1}^{k_i} \{\text{spoke } i_k \text{ connected to node } j\}\right) \approx \sum_{i_k=1}^{k_i} P(\text{spoke } i_k \text{ connected to node } j) = \frac{k_i k_j}{2L}$ (treating the per-spoke events as disjoint, which is accurate for sparse networks)
Back to modularity of a subgraph and modularity of a partition
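
As a quick sanity check on the null model, the following Python sketch verifies that $p_{ij} = k_i k_j / 2L$ preserves each node's expected degree, i.e., $\sum_j p_{ij} = k_i$. The toy graph (two triangles joined by one edge) and the helper names `degrees` and `null_model_probability` are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction

# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def null_model_probability(deg, i, j, num_edges):
    """p_ij = k_i * k_j / (2L), kept exact with Fraction."""
    return Fraction(deg[i] * deg[j], 2 * num_edges)


deg = degrees(edges)
L = len(edges)

# Summing p_ij over all j (including j = i, since this null model allows
# self-loops) should give back the degree k_i of node i.
expected_degree = {
    i: sum(null_model_probability(deg, i, j, L) for j in deg) for i in deg
}
print(expected_degree == deg)
```

The sum includes the term $j = i$; dropping it would make the expected degree fall slightly short of $k_i$, which is one reason the formula is usually read as an expected number of edges rather than a strict probability.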

69 From Modularity of a Subgraph to Modularity of a Partition
So, the modularity $Q(s)$ of a subgraph $s$ is: $Q(s) = \frac{L_s}{L} - \left(\frac{k_s}{2L}\right)^2$, where $L_s$ is the total number of edges in $s$ and $k_s$ is the total degree of the nodes in $s$
Now consider a partition $S$ of a graph $G$ into groups/subgraphs $s \in S$
Summing over all groups $s \in S$ gives the modularity score of a particular partition $S$: $Q(G, S) = \sum_{s \in S} Q(s)$
Formally, after normalization, $Q(G, S) \in [-1, 1]$
Can use modularity to rank partitions and pick the best one from the dendrogram
Can evaluate the modularity of each partition in a dendrogram
The maximum value gives the best community structure
How sensitive is modularity to changes to a partition?
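
The formula $Q(G, S) = \sum_{s \in S} \left[\frac{L_s}{L} - \left(\frac{k_s}{2L}\right)^2\right]$ can be sketched in a few lines of Python. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for illustration; the sketch assumes an undirected, unweighted graph without self-loops.

```python
def modularity(edges, communities):
    """Modularity of a partition: sum over groups of L_s/L - (k_s/2L)^2.

    edges: list of undirected edges (u, v).
    communities: list of disjoint node sets that together cover all nodes.
    """
    num_edges = len(edges)
    membership = {node: c for c, group in enumerate(communities) for node in group}
    internal_edges = [0] * len(communities)  # L_s for each group
    total_degree = [0] * len(communities)    # k_s for each group
    for u, v in edges:
        total_degree[membership[u]] += 1
        total_degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal_edges[membership[u]] += 1
    return sum(
        ls / num_edges - (ks / (2 * num_edges)) ** 2
        for ls, ks in zip(internal_edges, total_degree)
    )


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))    # 5/14, about 0.357
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))      # a single community gives 0
print(modularity(edges, [{0, 1}, {2, 3}, {4, 5}]))  # a worse split scores lower
```

On this graph the split into the two triangles scores highest of the three partitions, which matches the intuition that modularity rewards groups with more internal edges than the null model expects.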

70 Examples of Modularity Values

72 Using Modularity to Assess Clustering Quality
E.g.: the Girvan-Newman algorithm applied to Zachary's karate club
Q: Why not optimize $Q(G, S)$ directly over possible partitions $S$?
A: How complex is the modularity landscape of the partition space?

73 Modularity Landscape
A greedy algorithm seeking the maximally modular partition runs in $O(N \log^2 N)$ time
Modularity maximization forces small, weakly connected communities into larger ones
The optimal partition is difficult to identify among a large number of close-to-optimal partitions $\Rightarrow$ the modularity landscape is complex

74 Modularity Maximization Algorithms
If the goal is to find the partition with maximal modularity, this is an optimization problem
Searching for local maxima in a complex search space (fitness landscape) is a task well suited to evolutionary algorithms (EAs)
Shi, Wy, Zhong, A New Genetic Algorithm for Community Detection, Complex Sciences, Springer 2009
Gong, Ma, Zhang, Jiao, Community detection in networks by using multiobjective evolutionary algorithm with decomposition, Physica A: Statistical Mechanics and Its Applications 2012
Key concept: evolve a population of partitions to optimize modularity
EAs do not rely on a similarity or dissimilarity matrix and therefore scale to very large networks
Multi-objective EAs address two conflicting objectives: negative ratio association (divides into small communities) and ratio cut (divides into large communities)
Various reviews exist

75 Overlapping Communities
Vicsek and collaborators proposed the concept of overlapping communities
The clique percolation algorithm views a community as a union of overlapping cliques, based on the knowledge that $k$-cliques naturally emerge in sufficiently dense networks
Two $k$-cliques are adjacent if they share $k-1$ nodes
A $k$-clique community is the largest connected subgraph obtained by the union of all adjacent $k$-cliques
$k$-cliques that cannot be reached from a particular $k$-clique belong to other $k$-clique communities
Polynomial running time when cliques are small
The link clustering algorithm depends on the definition of link similarity
The similarity of two links is determined by the neighborhoods of the nodes they connect
Hierarchical clustering is applied to group links into communities
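
The clique percolation definitions above can be illustrated with a brute-force Python sketch: enumerate all $k$-cliques, link those sharing $k-1$ nodes, and take the union of each connected group of cliques. The function name `k_clique_communities` and the toy graph are assumptions for illustration; enumerating all node combinations is only practical for small graphs and small $k$, and is not how production implementations work.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force clique percolation; returns sorted lists of node ids."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Every k-subset of nodes whose members are pairwise connected is a k-clique.
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(nodes, 2))
    ]

    # Union-find over cliques: merge two cliques when they share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    # A community is the union of the nodes of all cliques in one group.
    groups = {}
    for index, clique in enumerate(cliques):
        groups.setdefault(find(index), set()).update(clique)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy graph: triangles 0-1-2 and 1-2-3 share an edge;
# triangle 3-4-5 touches them only at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
print(k_clique_communities(edges, 3))  # [[0, 1, 2, 3], [3, 4, 5]]
```

Node 3 appears in both communities, which is the overlap that partition-based methods such as modularity maximization cannot express.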

76 Testing Community Detection Algorithms
In the absence of ground truth, how is one to compare community detection algorithms?
Modularity-based comparison only goes so far
Other measures, defined to compare the quality of clusters produced by clustering algorithms, are also used (e.g., information gain)
Other measures compare the behavior of algorithms at various community sizes [Leskovec, Lang, Mahoney, Empirical Comparison of Algorithms for Network Community Detection, IW3C ]
Figure: Network community profile (NCP) characterizes the quality of network communities as a function of their size. The universal V shape of the NCP summarizes the large-scale community structure of real networks.

77 Advanced (more Theoretical) Topics/Treatments
Modularity maximization can be proved to be NP-hard
Relaxing modularity maximization via the Lagrange technique provides one approach to modularity maximization
Stochastic optimization algorithms (such as EAs) are more generally applicable and scalable on real networks
The relaxation uses eigendecomposition and specific properties of the dominant eigenvector
Graph partitioning is recast as a graph cut problem
The minimum-cut treatment opens the way to other algorithms that use the Laplacian to determine the connected components of the graph

78 Modularity Revisited
Recall our definition of modularity, now written with $N_e$ for the number of edges (previously $L$) and $d_i$ for the degree of vertex $i$ (previously $k_i$): $Q(G, S) = \frac{1}{2N_e} \sum_{s \in S} \sum_{i,j \in s} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right]$
Let $g_i$ be the group membership of vertex $i$ and rewrite: $Q(G, S) = \frac{1}{2N_e} \sum_{i,j \in V} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right] I\{g_i = g_j\}$
Define for convenience the summands $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$
Both marginal sums of $B_{ij}$ vanish, since e.g.: $\sum_j B_{ij} = \sum_j A_{ij} - \frac{d_i}{2N_e} \sum_j d_j = d_i - \frac{d_i}{2N_e} 2N_e = 0$

79 Graph Bisection
Consider (for simplicity) dividing the network into two groups
Binary community membership variables per vertex: $s_i = +1$ if vertex $i$ belongs to group 1, and $s_i = -1$ if vertex $i$ belongs to group 2
Using the identity $\frac{1}{2}(s_i s_j + 1) = I\{g_i = g_j\}$, the modularity is: $Q(G, S) = \frac{1}{2N_e} \sum_{i,j \in V} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right] I\{g_i = g_j\} = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij} (s_i s_j + 1)$
Recall $\sum_j B_{ij} = 0$ to obtain the simpler expression: $Q(G, S) = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij} s_i s_j$

80 Optimizing Modularity
Let $B \in \mathbb{R}^{N_v \times N_v}$ be the modularity matrix with entries $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$
Any partition $S$ is defined by the vector $s = [s_1, \ldots, s_{N_v}]^T$
Modularity is a quadratic form: $Q(G, S) = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij} s_i s_j = \frac{1}{4N_e} s^T B s$
Modularity as the criterion for graph bisection yields the formulation: $\hat{s} = \arg\max_{s \in \{-1,+1\}^{N_v}} s^T B s$
Nasty binary constraints $s \in \{-1,+1\}^{N_v}$ (hypercube vertices)
Modularity optimization is NP-hard [Brandes et al 2006]
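
A small numerical check can make the quadratic form concrete. The pure-Python sketch below builds the modularity matrix $B$ for a toy graph, confirms that its row sums vanish, and evaluates $\frac{1}{4N_e} s^T B s$ for one bisection. The graph, the vector `s`, and all variable names are illustrative assumptions.

```python
# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

# Adjacency matrix A and degrees d_i.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
d = [sum(row) for row in A]
two_ne = sum(d)  # equals 2 * N_e

# Modularity matrix B_ij = A_ij - d_i d_j / (2 N_e).
B = [[A[i][j] - d[i] * d[j] / two_ne for j in range(n)] for i in range(n)]
row_sums = [sum(row) for row in B]

# Bisection vector: +1 for the first triangle, -1 for the second.
s = [1, 1, 1, -1, -1, -1]

# Q = s^T B s / (4 N_e); note that 4 N_e = 2 * two_ne.
q_quadratic = sum(
    B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)
) / (2 * two_ne)

# Direct definition: (1 / 2 N_e) * sum of B_ij over pairs in the same group.
q_direct = sum(
    B[i][j] for i in range(n) for j in range(n) if s[i] == s[j]
) / two_ne

print(q_quadratic, q_direct)  # both about 0.357 (5/14)
```

The two values agree only because the row sums of $B$ are zero, which is the step that lets the constant term in $(s_i s_j + 1)$ be dropped.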

81 Relaxation
Relax the constraint $s \in \{-1,+1\}^{N_v}$ to $s \in \mathbb{R}^{N_v}$, $\|s\|_2 = 1$: $\hat{s} = \arg\max_s s^T B s$ s.t. $s^T s = 1$
Associate a Lagrange multiplier $\lambda$ to the constraint $s^T s = 1$
Optimality conditions yield: $\nabla_s [s^T B s + \lambda(1 - s^T s)] = 0 \Rightarrow Bs = \lambda s$
The conclusion is that $s$ is an eigenvector of $B$ with eigenvalue $\lambda$
Q: Which eigenvector should we pick?
At the optimum, $Bs = \lambda s$, so the objective becomes: $s^T B s = \lambda s^T s = \lambda$
A: To maximize modularity, pick the dominant eigenvector of $B$ (the one associated with the largest eigenvalue)

82 Spectral Modularity Maximization
Let $u_1$ be the dominant eigenvector of $B$, with $i$-th entry $[u_1]_i$
Cannot just set $s = u_1$ because $u_1 \notin \{-1,+1\}^{N_v}$
Best effort: maximize their similarity $s^T u_1$, which gives: $s_i = \mathrm{sign}([u_1]_i)$, i.e., $+1$ when $[u_1]_i > 0$ and $-1$ otherwise
Spectral modularity maximization algorithm
S1: Compute the modularity matrix $B$ with entries $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$
S2: Find a dominant eigenvector $u_1$ of $B$ (e.g., power method)
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([u_1]_i)$
Multiple ($> 2$) communities through, e.g., repeated graph bisection
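
Steps S1 to S3 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function name `spectral_modularity_bisection` is hypothetical, a dense symmetric eigensolver (`numpy.linalg.eigh`) is swapped in for the power method mentioned in S2, and the toy graph is not from the lecture.

```python
import numpy as np


def spectral_modularity_bisection(A):
    """Bisect a graph by the sign pattern of the dominant eigenvector of B."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                  # vertex degrees
    two_ne = d.sum()                   # 2 * N_e
    B = A - np.outer(d, d) / two_ne    # S1: modularity matrix
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    u1 = eigenvectors[:, -1]           # S2: eigh sorts eigenvalues ascending
    s = np.where(u1 > 0, 1, -1)        # S3: sign rule
    q = s @ B @ s / (2 * two_ne)       # Q = s^T B s / (4 N_e)
    return s, q


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1

s, q = spectral_modularity_bisection(A)
print(s, q)
```

The overall sign of an eigenvector is arbitrary, so which triangle receives the label $+1$ may differ between runs or libraries; only the grouping is meaningful.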

83 Example: Zachary's Karate Club
Spectral modularity maximization
Figure: Shapes of vertices indicate community membership. The dotted line indicates the partition found by the algorithm. Vertex colors indicate the strength of their membership.

84 Graph Bisection
Consider an undirected graph $G = (V, E)$
E.g.: Graph bisection problem, i.e., partition $V$ into two groups
Groups $V_1$ and $V_2 = V_1^C$ are non-overlapping
Groups have given sizes, i.e., $|V_1| = N_1$ and $|V_2| = N_2$
Q: What is a good criterion to partition the graph?
A: We have already seen modularity. Let's see a different one

85 Graph Cut
Desiderata: community members should be (i) well-connected among themselves and (ii) relatively well-separated from the rest of the nodes
Def: A cut $C$ is the number of edges between groups 1 and 2: $C = \mathrm{cut}(V_1, V_2) = \sum_{i \in V_1, j \in V_2} A_{ij}$
Natural criterion: minimize the cut, i.e., the edges across groups 1 and 2

86 From Graph Cuts
Binary community membership variable per vertex: $s_i = +1$ if vertex $i$ belongs to group 1, and $s_i = -1$ otherwise
Let $g_i$ be the group membership of vertex $i$, s.t.: $I\{g_i \neq g_j\} = \frac{1}{2}(1 - s_i s_j)$, which equals 1 if $i$ and $j$ are in different groups and 0 otherwise
The cut is expressible in terms of the variables $s_i$ as: $C = \sum_{i \in V_1, j \in V_2} A_{ij} = \frac{1}{4} \sum_{i,j \in V} A_{ij} (1 - s_i s_j)$
One factor of $\frac{1}{2}$ comes from the indicator; the other arises because the double sum over $i,j \in V$ visits every cut edge twice, once as $(i,j)$ and once as $(j,i)$

87 to the Graph Laplacian Matrix
The first summand in $C = \frac{1}{4} \sum_{i,j \in V} A_{ij} (1 - s_i s_j)$ is: $\sum_{i,j \in V} A_{ij} = \sum_{i \in V} d_i = \sum_{i \in V} d_i s_i^2 = \sum_{i,j \in V} d_i s_i s_j I\{i = j\}$
Used $s_i^2 = 1$ since $s_i \in \{-1,+1\}$. The cut becomes: $C = \frac{1}{4} \sum_{i,j \in V} (d_i I\{i = j\} - A_{ij}) s_i s_j = \frac{1}{4} \sum_{i,j \in V} L_{ij} s_i s_j$
Cut in terms of $L_{ij}$, the entries of the graph Laplacian $L = D - A$, i.e.: $C(s) = \frac{1}{4} s^T L s$, $s = [s_1, \ldots, s_{N_v}]^T$ (every cut edge contributes $(s_i - s_j)^2 = 4$ to $s^T L s$)
Maximize modularity $Q(s) \propto s^T B s$ vs. minimize cut $C(s) \propto s^T L s$
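
The relation between the cut and the Laplacian quadratic form can be checked numerically. The pure-Python sketch below counts cut edges directly and compares the count with $s^T L s$ for two label vectors; the function names `cut_size` and `laplacian_quadratic_form`, the toy graph, and the label vectors are illustrative assumptions.

```python
def cut_size(edges, s):
    """Number of edges whose endpoints carry different labels."""
    return sum(1 for u, v in edges if s[u] != s[v])


def laplacian_quadratic_form(edges, s):
    """Evaluate s^T L s with L = D - A built from the edge list."""
    n = len(s)
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    return sum(lap[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

good_split = [1, 1, 1, -1, -1, -1]  # separates the two triangles
poor_split = [1, 1, -1, -1, -1, 1]  # breaks both triangles apart

for s in (good_split, poor_split):
    print(cut_size(edges, s), laplacian_quadratic_form(edges, s))
```

In both cases the quadratic form equals four times the cut, because each cut edge contributes $(s_i - s_j)^2 = 4$ and each uncut edge contributes 0.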

88 Laplacian Matrix Properties Revisited
Smoothness: For any vector $x \in \mathbb{R}^{N_v}$ of vertex values, one has $x^T L x = \sum_{i,j \in V} L_{ij} x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2$, which can be minimized to enforce smoothness of functions on $G$
Positive semi-definiteness: Follows since $x^T L x \geq 0$ for all $x \in \mathbb{R}^{N_v}$
Spectrum: All eigenvalues of $L$ are real and non-negative; the eigenvectors form an orthonormal basis of $\mathbb{R}^{N_v}$
Rank deficiency: Since $L \mathbf{1} = \mathbf{0}$, $L$ is rank deficient
Spectrum and connectivity: The smallest eigenvalue $\lambda_1$ of $L$ is 0
If the second-smallest eigenvalue $\lambda_2 \neq 0$, then $G$ is connected
If $L$ has $n$ zero eigenvalues, $G$ has $n$ connected components
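
These properties are easy to probe numerically. The NumPy sketch below checks the smoothness identity, the rank deficiency $L\mathbf{1} = \mathbf{0}$, and the link between zero eigenvalues and connected components. The helper names `laplacian` and `count_components`, the tolerance `tol`, and the toy graph are assumptions made for illustration.

```python
import numpy as np


def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted edge list."""
    lap = np.zeros((n, n))
    for u, v in edges:
        lap[u, u] += 1
        lap[v, v] += 1
        lap[u, v] -= 1
        lap[v, u] -= 1
    return lap


def count_components(lap, tol=1e-9):
    """Count (numerically) zero eigenvalues of a Laplacian."""
    return int(np.sum(np.linalg.eigvalsh(lap) < tol))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
lap = laplacian(6, edges)

# Smoothness: x^T L x equals the sum of squared differences across edges.
x = np.array([0.5, -1.0, 2.0, 0.0, 3.0, -2.5])
quadratic = x @ lap @ x
edge_sum = sum((x[u] - x[v]) ** 2 for u, v in edges)

# Removing the bridge (2, 3) splits the graph into two components.
without_bridge = [e for e in edges if e != (2, 3)]

print(np.isclose(quadratic, edge_sum))
print(count_components(lap), count_components(laplacian(6, without_bridge)))
```

The tolerance is needed because eigenvalues that are exactly zero in theory typically come out as tiny positive or negative numbers in floating point.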

89 Graph Cut Minimization
Since $|V_1| = N_1$ and $|V_2| = N_2 = N_v - N_1$, we have the constraint: $\sum_{i \in V} s_i = \sum_{i \in V_1} (+1) + \sum_{i \in V_2} (-1) = N_1 - N_2 \Rightarrow \mathbf{1}^T s = N_1 - N_2$
The minimum-cut criterion for graph bisection yields the formulation: $\hat{s} = \arg\min_{s \in \{-1,+1\}^{N_v}} s^T L s$ s.t. $\mathbf{1}^T s = N_1 - N_2$
Again, the binary constraints $s \in \{-1,+1\}^{N_v}$ render cut minimization hard
Relax the binary constraints as with modularity maximization

90 Further Intuition
Since $s^T L s = \sum_{(i,j) \in E} (s_i - s_j)^2$, the minimum-cut formulation is: $\hat{s} = \arg\min_{s \in \{-1,+1\}^{N_v}} \sum_{(i,j) \in E} (s_i - s_j)^2$ s.t. $\mathbf{1}^T s = N_1 - N_2$
Q: Does this equivalent cost function make sense?
A: Absolutely!
Edges joining vertices in the same group do not add to the sum
Edges joining vertices in different groups add 4 to the sum
Minimize cut: assign values $s_i$ to nodes $i$ such that few edges cross 0, i.e., few edges join a positive value to a negative one

92 Minimum-Cut Relaxation
Relax the constraint $s \in \{-1,+1\}^{N_v}$ to $s \in \mathbb{R}^{N_v}$, $\|s\|_2 = 1$: $\hat{s} = \arg\min_s s^T L s$ s.t. $\mathbf{1}^T s = N_1 - N_2$ and $s^T s = 1$
Straightforward to solve using Lagrange multipliers
Characterization of the solution $\hat{s}$ [Fiedler 1973]: $\hat{s} = v_2 + \frac{N_1 - N_2}{N_v} \mathbf{1}$
The eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue satisfies $\mathbf{1}^T v_2 = 0$
Minimum cut: $C(\hat{s}) \propto \hat{s}^T L \hat{s} = v_2^T L v_2 \propto \lambda_2$
If the graph $G$ is disconnected, then we know $\lambda_2 = 0 = C(\hat{s})$
If $G$ is amenable to bisection, the cut is small and so is $\lambda_2$

93 Theoretical Guarantee
Consider a partition of $G$ into $V_1$ and $V_2$, where $|V_1| \leq |V_2|$
If $G$ is connected, then the Cheeger inequality asserts: $\frac{\alpha^2}{2 d_{\max}} \leq \lambda_2 \leq 2\alpha$, where $\alpha = \frac{C}{|V_1|}$ (evaluated at the partition that minimizes this ratio) and $d_{\max}$ is the maximum node degree
Certifies that $\lambda_2$ gives a useful bound
F. Chung, Four proofs for the Cheeger inequality and graph partition algorithms, Proc. of ICCM, 2007

94 Spectral Graph Bisection
Q: How to obtain the binary cluster labels $s \in \{-1,+1\}^{N_v}$ from $\hat{s} \in \mathbb{R}^{N_v}$?
Again, maximize the similarity measure $s^T \hat{s}$, where $s_i = \mathrm{sign}([v_2]_i)$ ($+1$ if $[v_2]_i > 0$ and $-1$ otherwise)
Spectral graph bisection algorithm
S1: Compute the Laplacian matrix $L$ with entries $L_{ij} = D_{ij} - A_{ij}$
S2: Find the eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([v_2]_i)$
Complexity: an efficient Lanczos algorithm variant runs in $O\left(\frac{N_e}{\lambda_3 - \lambda_2}\right)$
Nomenclature: $v_2$ is known as the Fiedler vector
The eigenvalue $\lambda_2$ is the Fiedler value, or algebraic connectivity of $G$
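
The spectral graph bisection steps S1 to S3 can be sketched with NumPy as below. The function name `spectral_bisection` and the toy graph are hypothetical, and a dense eigensolver (`numpy.linalg.eigh`) stands in for the Lanczos variant mentioned above, which would be the better choice for large sparse graphs. The sketch assumes a connected graph, so that the second column returned by `eigh` is the Fiedler vector.

```python
import numpy as np


def spectral_bisection(A):
    """Bisect a connected graph by the signs of its Fiedler vector."""
    A = np.asarray(A, dtype=float)
    lap = np.diag(A.sum(axis=1)) - A               # S1: L = D - A
    eigenvalues, eigenvectors = np.linalg.eigh(lap)
    fiedler_value = eigenvalues[1]                 # eigenvalues are ascending
    v2 = eigenvectors[:, 1]                        # S2: Fiedler vector
    s = np.where(v2 > 0, 1, -1)                    # S3: sign rule
    return s, fiedler_value


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1

s, fiedler_value = spectral_bisection(A)
print(s, fiedler_value)
```

For this graph the Fiedler value works out to $(5 - \sqrt{17})/2 \approx 0.438$, small compared with the remaining nonzero eigenvalues, which is consistent with the graph being easy to bisect at the bridge.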

95 Spectral Gap in Fiedler Vector Entries
Suppose $G$ is disconnected and has two connected components
$L$ is block diagonal, and the eigenvectors of the two smallest (zero) eigenvalues indicate the groups, i.e., $v_1 = [1, 1, \ldots, 1, 0, \ldots, 0]^T$ and $v_2 = [0, 0, \ldots, 0, 1, \ldots, 1]^T$
If $G$ is connected but amenable to bisection, $v_1 = \mathbf{1}$ and $\lambda_2 \approx 0$
Also, $\mathbf{1}^T v_2 = \sum_i [v_2]_i = 0 \Rightarrow$ positive and negative entries in $v_2$

96 Unknown Community Sizes
Consider the graph bisection problem with unknown group sizes
Minimizing the graph cut may no longer be meaningful!
Figure: The cost $C = \sum_{i \in V_1, j \in V_2} A_{ij}$ is agnostic to the groups' internal structures
A better criterion is the ratio cut $R$, defined as $R = \frac{C}{|V_1|} + \frac{C}{|V_2|}$
Balanced partitions: a small community is penalized by the cost

97 Ratio-Cut Minimization
Fix a bisection $S$ of $G$ into groups $V_1$ and $V_2$
Define $f: f(S) = [f_1, \ldots, f_{N_v}]^T \in \mathbb{R}^{N_v}$ with entries: $f_i = \sqrt{|V_2| / |V_1|}$ if vertex $i$ belongs to $V_1$ and $f_i = -\sqrt{|V_1| / |V_2|}$ if vertex $i$ belongs to $V_2$
One can establish the following properties:
P1: $f^T L f = N_v R(S)$
P2: $\sum_i f_i = 0$, i.e., $\mathbf{1}^T f = 0$
P3: $\|f\|^2 = N_v$
From P1-P3 it follows that ratio-cut minimization is equivalent to: $\min_f f^T L f$ s.t. $\mathbf{1}^T f = 0$ and $f^T f = N_v$, with $f$ restricted to vectors of the above form
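
Properties P1 to P3 can be verified on a small example. The pure-Python sketch below uses a deliberately unbalanced bisection and the identity $f^T L f = \sum_{(i,j) \in E} (f_i - f_j)^2$; the toy graph, the chosen groups, and the variable names are illustrative assumptions.

```python
import math

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

# A deliberately unbalanced bisection.
V1, V2 = {0, 1}, {2, 3, 4, 5}
n_v = len(V1) + len(V2)

# Cut size and ratio cut R = C/|V1| + C/|V2|.
cut = sum(1 for u, v in edges if (u in V1) != (v in V1))
ratio_cut = cut / len(V1) + cut / len(V2)

# Indicator-like vector f: positive on V1, negative on V2.
f = [
    math.sqrt(len(V2) / len(V1)) if i in V1 else -math.sqrt(len(V1) / len(V2))
    for i in range(n_v)
]

# f^T L f computed through the sum of squared differences across edges.
f_l_f = sum((f[u] - f[v]) ** 2 for u, v in edges)

print(f_l_f, n_v * ratio_cut)   # P1: both about 9
print(sum(f))                   # P2: about 0
print(sum(x * x for x in f))    # P3: about 6, i.e., N_v
```

Only edges crossing the cut contribute to $f^T L f$, since $f$ is constant within each group; this is why the quadratic form ends up proportional to the cut.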

98 Ratio Cut and Spectral Graph Bisection
Ratio-cut minimization is also NP-hard. Relax to obtain: $\hat{s} = \arg\min_{s \in \mathbb{R}^{N_v}} s^T L s$ s.t. $\mathbf{1}^T s = 0$ and $s^T s = N_v$
The partition $\hat{S}$ is also given by the spectral graph bisection algorithm
S1: Compute the Laplacian matrix $L$ with entries $L_{ij} = D_{ij} - A_{ij}$
S2: Find the eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([v_2]_i)$
An alternative criterion is the normalized cut $NC$, defined as: $NC = \frac{C}{\mathrm{vol}(V_1)} + \frac{C}{\mathrm{vol}(V_2)}$, where $\mathrm{vol}(V_i) = \sum_{v \in V_i} d_v$, $i \in \{1, 2\}$
Corresponds to using the normalized Laplacian $D^{-1} L$

Network Community Detection Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ March 20, 2018