Network Science: Principles and Applications
1 Network Science: Principles and Applications CS Fall 2016 Amarda Shehu,Fei Li [amarda, lifei](at)gmu.edu Department of Computer Science George Mason University
2 Outline of Today's Class. 1. Outline of Today's Class. 2. Communities: Basics of Communities: Community Structure in Real Networks; Examples of Network Communities; Network Community Detection Algorithms (via Hierarchical Clustering); Clustering, Community Structure, and Optimality in Real Networks; Hierarchical Network Model; Clustering to Optimal Partition: Modularity; Modularity Maximization; Overlapping Communities; Comparison of Community Detection Algorithms. 3. Advanced (More Theoretical) Topics/Treatments: Modularity Maximization; Spectral Graph Partitioning. Amarda Shehu, Fei Li
3 Communities within Networks. Networks play the powerful role of bridging the local and the global: they explain how processes at the node/link level ripple out to a population. We often think of (social) networks as having the following structure (figure). Q: Can we gain insight into what lies behind this conceptualization?
4 Context. In the 1960s, M. Granovetter interviewed people who had changed jobs and asked how they discovered their new jobs. Many learned about opportunities through personal contacts. Surprisingly, the contacts were often acquaintances rather than friends, even though close friends likely have the most motivation to help you out. Q: Why do distant acquaintances convey the crucial information? M. Granovetter, Getting a Job: A Study of Contacts and Careers. University of Chicago Press, 1974.
5 Granovetter's Answer and Impact. Granovetter linked two different perspectives on distant friendships. Structural: focus on how friendships span the network. Interpersonal: local consequences of a friendship being strong or weak. The structural and informational roles of an edge are intertwined: (1) structurally embedded edges within a community tend to be socially strong and are highly redundant in terms of information access; (2) long-range edges spanning different parts of the network tend to be socially weak and offer access to useful information (e.g., on a new job!). This is a general way of thinking about the architecture of social networks; the answer transcends the specific setting of job-seeking.
6 Central to Emergence of Communities: Triadic Closure. A basic principle of network formation is triadic closure: if two people have a friend in common, then there is an increased likelihood that they will become friends in the future. Emergent edges in a social network are therefore likely to close triangles. Figure: more likely to see the red edge than the blue one. The prevalence of triadic closure is measured by the clustering coefficient, $cl(v) = \frac{\#\text{pairs of friends of } v \text{ that are connected}}{\#\text{pairs of friends of } v}$.
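The clustering coefficient defined above can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is stored as a dictionary of neighbor sets; the function name, the toy network, and the convention of returning 0 for nodes with fewer than two friends are assumptions made here, not part of the lecture.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    if len(neighbors) < 2:
        return 0.0  # assumed convention: no pairs of friends means cl(v) = 0
    pairs = list(combinations(neighbors, 2))
    closed = sum(1 for a, b in pairs if b in adj[a])
    return closed / len(pairs)

# Hypothetical toy network: A is friends with B, C, D; only B and C are friends.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
cc_a = clustering_coefficient(adj, "A")  # one connected pair out of three
cc_b = clustering_coefficient(adj, "B")  # B's only pair of friends (A, C) is connected
```

In this toy network only one of the three pairs of A's friends is connected, so triadic closure around A is still incomplete.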
7 Reasons for Triadic Closure. Triadic closure is intuitively very natural. Reasons why it operates when B and C share a common friend A: increased opportunity for B and C to meet (both spend time with A); a basis for mutual trust between B and C (both have A as a common friend); A may have an incentive to bring B and C together (lack of friendship may become a source of latent stress). The premise is based on theories dating to early work in social psychology. F. Heider, The Psychology of Interpersonal Relations. Wiley, 1958.
8 Bridges. E.g.: consider the simple social network in the figure. A's links to C, D, and E connect her to a tightly knit group; A, C, D, and E are likely exposed to similar opinions. A's link to B seems to reach into a different part of the network and offers her access to views she would otherwise not hear about. The A-B edge is called a bridge: its removal disconnects the network. Giant components suggest that bridges are quite rare.
9 Local Bridges. E.g.: in reality, the social network is larger and may look as in the figure. Figure: without A and B knowing it, there may be a longer path between them. Defn: the span of (u, v) is the u-v distance when the edge is removed. Defn: a local bridge is an edge with span > 2. Ex: edge (A, B) is a local bridge with span 3. Local bridges with large spans behave like bridges, but are less extreme. Link with triadic closure: local bridges are not part of triangles.
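The two definitions above translate directly into a breadth-first search that ignores the edge in question. The sketch below assumes an undirected network stored as a dictionary of neighbor sets; the function names and the toy network are illustrative assumptions.

```python
from collections import deque

def span(adj, u, v):
    """Shortest u-v distance when edge (u, v) is ignored.

    Returns float('inf') if removing the edge disconnects u and v,
    i.e., if the edge is a bridge.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # skip the removed edge
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]
                queue.append(y)
    return float("inf")

def is_local_bridge(adj, u, v):
    """A local bridge is an edge whose span exceeds 2."""
    return span(adj, u, v) > 2

# Hypothetical toy network: triangle A-C-E plus the longer path A-C-D-B.
adj = {
    "A": {"B", "C", "E"},
    "B": {"A", "D"},
    "C": {"A", "D", "E"},
    "D": {"B", "C"},
    "E": {"A", "C"},
}
span_ab = span(adj, "A", "B")  # the detour A-C-D-B has length 3
span_ac = span(adj, "A", "C")  # the detour A-E-C has length 2
```

Edge (A, C) sits in a triangle and therefore has span 2, which is why a local bridge can never be part of a triangle.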
10 Strong Triadic Closure Property. Categorize all edges in the network according to their strength: strong ties correspond to friendship, weak ties to acquaintances. Opportunity, trust, and incentive act more powerfully for strong ties. This suggests a qualitative assumption termed strong triadic closure: two strong ties at a node imply that a third edge exists closing the triangle. It is an abstraction for reasoning about the consequences of strong/weak ties.
11 Local Bridges and Weak Ties. (a) Local, interpersonal distinction between edges ⇒ strong/weak ties. (b) Global, structural notion ⇒ local bridges present or absent. Theorem: If a node satisfies the strong triadic closure property and is involved in at least two strong ties, then any local bridge incident to it is a weak tie. The theorem links the structural and interpersonal perspectives on friendships. Back to job-seeking: local bridges connect to new information, and their conceptual span is related to their weakness as social ties. This surprising dual role suggests a strength of weak ties.
12 Proof by Contradiction. Proof: we argue by contradiction. Suppose node A satisfies the strong triadic closure property and has at least two strong ties. Let A-B be a local bridge that is also a strong tie, and let A-C be another strong tie of A. Figure: edge B-C must then exist by strong triadic closure. This contradicts A-B being a local bridge, since a local bridge cannot be part of a triangle.
13 Tie Strength and Structure in Large-Scale Data. Q: Can one test Granovetter's theory with real network data? This was hard for decades because of the lack of large-scale social interaction surveys. Now we have who-calls-whom networks with both key ingredients: the network structure of communication among pairs of people, and the total talking time, i.e., a proxy for tie strength. E.g.: a cell-phone network spanning 20% of a country's population. J.-P. Onnela et al., Structure and tie strengths in mobile communication networks, PNAS, vol. 104, 2007. How can we generalize weak ties and local bridges?
15 Generalizing Weak Ties and Local Bridges. The model described so far imposes sharp dichotomies on the network: edges are either strong or weak, local bridges or not. It is convenient to have proxies exhibiting smoother gradations. Numerical tie strength = minutes spent in phone conversations; order edges by strength and report their percentile. Generalize local bridges via the neighborhood overlap $O_{ij}$ of edge (i, j): $O_{ij} = \frac{|n(i) \cap n(j)|}{|n(i) \cup n(j)|}$, where $n(i) = \{j \in V : (i, j) \in E\}$. Desirable property: $O_{ij} = 0$ if (i, j) is a local bridge.
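A minimal sketch of the neighborhood overlap, following the formula above literally (so the union includes i and j themselves whenever they are linked; some texts exclude them from the denominator). The dictionary-of-sets representation and the toy network are assumptions for illustration.

```python
def neighborhood_overlap(adj, i, j):
    """O_ij = |n(i) & n(j)| / |n(i) | n(j)| for neighbor sets n(i), n(j)."""
    common = adj[i] & adj[j]
    union = adj[i] | adj[j]
    return len(common) / len(union) if union else 0.0

# Hypothetical toy network: triangle A-C-E plus the longer path A-C-D-B.
adj = {
    "A": {"B", "C", "E"},
    "B": {"A", "D"},
    "C": {"A", "D", "E"},
    "D": {"B", "C"},
    "E": {"A", "C"},
}
overlap_ab = neighborhood_overlap(adj, "A", "B")  # no common neighbor: local bridge
overlap_ac = neighborhood_overlap(adj, "A", "C")  # E is the one common neighbor
```

Edge (A, B) has no common neighbor, so its overlap is 0, matching the desirable property for local bridges.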
16 Empirical Results. Strength-of-weak-ties prediction: $O_{ij}$ grows with tie strength. The dependence is borne out very cleanly by the data (o points). Randomly permuting the tie strengths while keeping the network structure fixed (second set of points) effectively removes the coupling between $O_{ij}$ and tie strength.
17 Phone Network and Tie Strength. Cell-phone network with color-coded tie strengths: (1) stronger ties are more structurally embedded (within communities); (2) weaker ties correlate with long-range edges joining communities.
18 Approximating Link Weights. If the link weights are driven by the need to transport information or materials, as is often the case in technological and biological systems, the weights are well approximated by betweenness centrality: links connecting communities have high betweenness (red), whereas links within communities have low betweenness (green).
19 Weak Ties Linking Communities. Strength-of-weak-ties prediction: long-range, weak ties bridge communities. Delete edges one at a time, from the weakest (smallest overlap) to the strongest: the giant component shrinks rapidly and eventually disappears. Repeat with strong-to-weak tie deletions ⇒ slower shrinkage is observed.
20 Closing the Loop. We often think of (social) networks as having the following structure (figure). This conceptual picture is supported by Granovetter's strength of weak ties.
21 Communities. Nodes in real-world networks organize into communities, e.g., families, clubs, political organizations, proteins by function. This is supported by Granovetter's strength-of-weak-ties theory. Community (a.k.a. group, cluster, module) members are well connected among themselves, are relatively well separated from the rest, and exhibit high cohesiveness w.r.t. the underlying relational patterns.
22 Zachary's Karate Club. Social interactions among members of a karate club in the 1970s. Zachary witnessed the club split in two during his study. A toy network, yet a canonical benchmark for community detection algorithms. It offers ground-truth community membership (a rare luxury).
24 Political Blogs. The political blogosphere for the US 2004 presidential election. The community structure of liberal and conservative blogs is apparent. People have a stronger tendency to interact with equals.
25 Electrical Power Grid. Split the power network into areas with minimum inter-area interactions. Applications: deciding control areas for distributed power system state estimation; parallel computation of power flow; controlled islanding to prevent the spreading of blackouts.
26 Scientific Collaboration Network. E.g.: coauthorship network of scientists at the Santa Fe Institute. The communities found can be traced to different disciplines.
27 Belgium Population. Call patterns of 2M customers of the largest Belgian mobile phone company. The color of each community on a red-green scale represents the language spoken in that community: red for French and green for Dutch. The bicultural structure of the society is apparent. Model for peaceful coexistence? Minimal cross-community contact.
28 High-school Social Network. Network of social interactions among high-school students. Strong assortative mixing, with race as the latent characteristic.
29 Co-authorship Network. Coauthorship network of physicists publishing in network science. Tightly knit subgroups are evident from the network structure.
30 College Football. Vertices are NCAA football teams; edges are games during Fall 2000. Communities are the NCAA conferences and independent teams.
31 Social Network. Facebook egonet with 744 vertices and 30K edges. The ego was asked to identify the social circles to which friends belong: company, high school, basketball club, squash club, family.
32 Biological Network. The community structure of biological systems is central to understanding and treating human disease. Community: a group of molecules that form a functional module to carry out a specific cellular function. E.g.: proteins involved in the same disease tend to interact with one another. Disease module hypothesis: each disease can be linked to a well-defined neighborhood of the cellular network.
33 Unveiling Network Communities. Nodes in real-world networks organize into communities, e.g., families, clubs, political organizations, proteins by function. This is supported by Granovetter's strength-of-weak-ties theory. Community (a.k.a. group, cluster, module) members are well connected among themselves, are relatively well separated from the rest, and exhibit high cohesiveness w.r.t. the underlying relational patterns. Q: How can we automatically identify such cohesive subgroups?
35 Community Detection: What Makes a Community. Community detection is useful for exploratory analysis of network data, e.g., clues about social interactions, content-related web pages, disease, collective phenomena. It is a challenging clustering problem: (1) there is no consensus on the structural definition of a community; (2) node subset selection is often intractable; (3) ground truth for validation is lacking. Connectedness and Density Hypothesis: Connectedness: each member of a community must be reachable from every other member through members of the same community. Locally dense: nodes that belong to a community have a higher probability of linking to others in the same community than to nodes outside the community. A community is a locally dense connected subgraph in the network.
37 Communities as Special Subgraphs. Luce & Perry (A method of matrix analysis of group structure, Psychometrika, 1949) defined a community as a complete subgraph, i.e., a clique. A clique automatically satisfies the criterion of a locally dense connected subgraph, but the definition misses legitimate, imperfect communities. Consider a node i in a connected subgraph $C \subset G$ that has $N_c$ nodes (out of $N$). Let $k_i^{int}$ be the number of links that connect i to nodes in C, and let $k_i^{ext}$ be the number of links that connect i to nodes in $G \setminus C$. C is a strong community if each node within C has more links within C than outside: $\forall i \in C,\ k_i^{int} > k_i^{ext}$. C is a weak community if the inequality above does not hold for all nodes in C but the total internal degree exceeds the total external degree: $\sum_{i \in C} k_i^{int} > \sum_{i \in C} k_i^{ext}$. How many communities are there? To answer this, pose community detection as graph partitioning, starting with a simple version of it, graph bisection (next).
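The strong and weak community conditions above can be checked mechanically. The sketch below assumes a dictionary-of-sets network and does not verify that the candidate set is connected; the function name, the "neither" label, and the toy network are assumptions made for illustration.

```python
def community_type(adj, community):
    """Classify a node set as a 'strong' or 'weak' community, or 'neither'."""
    C = set(community)
    k_int = {i: len(adj[i] & C) for i in C}  # links from i to nodes inside C
    k_ext = {i: len(adj[i] - C) for i in C}  # links from i to nodes outside C
    if all(k_int[i] > k_ext[i] for i in C):
        return "strong"
    if sum(k_int.values()) > sum(k_ext.values()):
        return "weak"
    return "neither"

# Hypothetical toy network: a 4-clique {a, b, c, d}; d also links to x, y, z.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "x", "y", "z"},
    "x": {"d"},
    "y": {"d"},
    "z": {"d"},
}
clique_without_d = community_type(adj, {"a", "b", "c"})
clique_with_d = community_type(adj, {"a", "b", "c", "d"})
```

Node d has as many external as internal links, so the full clique is only a weak community, while {a, b, c} is strong.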
38 Community Detection and Graph Partitioning. Graph partitioning: split V into a given number of non-overlapping groups of given sizes. Criterion: the number of edges between groups is minimized (known as minimizing the cut). E.g.: task-processor assignment for load balancing, chip design. In community detection, the number and sizes of groups are unspecified: the goal is to identify the natural fault lines along which a network separates. A special version of this problem is graph bisection: partition the graph G into two non-overlapping subgraphs such that the number of edges/links between the subgraphs, called the cut size, is minimized.
39 Graph Partitioning is Hard. E.g.: the graph bisection problem, i.e., partition V into two groups ($|V| = N$). Suppose the groups $V_1$ and $V_2$ are non-overlapping and have equal size, i.e., $|V_1| = |V_2| = N/2$. Minimize the edges running between vertices in different groups. The problem is simple to describe but hard to solve. Number of ways to partition V: $\binom{N}{N/2} \approx \frac{2^{N+1}}{\sqrt{N}}$ (up to a constant factor), using Stirling's formula $N! \approx \sqrt{2\pi N}\,(N/e)^N$. In this form, $\binom{N}{N/2} \approx e^{(N+1)\ln 2 - \frac{1}{2}\ln N}$. Since the number of bisections increases exponentially with the size of the network, exhaustive (brute-force) search is intractable beyond toy networks. No smart (i.e., polynomial-time) algorithm is known; the problem is NP-hard. Seek good heuristics, e.g., relaxations of natural criteria.
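To get a feel for the growth, the exact binomial count can be compared with the estimate quoted above. This is a small sketch; the function names are illustrative. The estimate overshoots the exact count by a roughly constant factor (close to $\sqrt{2\pi}$), which does not affect the exponential growth.

```python
import math

def exact_bisections(n):
    """Number of ways to choose which N/2 nodes go into the first group."""
    return math.comb(n, n // 2)

def estimated_bisections(n):
    """The estimate 2^(N+1) / sqrt(N) quoted on the slide."""
    return 2 ** (n + 1) / math.sqrt(n)

# Ratio of estimate to exact count for a few network sizes.
ratios = {n: estimated_bisections(n) / exact_bisections(n) for n in (10, 100, 1000)}
count_10 = exact_bisections(10)
```

Even for N = 100 the exact count is on the order of 10^29, far beyond what brute-force enumeration can handle.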
40 Community Detection and Graph Partitioning. In community detection, the size and number of communities are not known a priori. Let's call a partition a division of a network into an arbitrary number of groups such that each node belongs to one and only one group (non-overlapping). The number of possible partitions (the Bell number) is $B_N = \frac{1}{e} \sum_{j=0}^{\infty} \frac{j^N}{j!}$. We need to identify communities without the luxury of inspecting all possible partitions, but with some freedom in the definition of a community.
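The Bell numbers can be computed exactly with the Bell triangle and compared against a truncated version of the series above. This is a sketch; the function names and the truncation at 60 terms are assumptions chosen for illustration.

```python
import math

def bell_number(n):
    """Exact Bell number B_n computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

def bell_series(n, terms=60):
    """Truncated series (1/e) * sum_j j^n / j! (assumed cutoff: 60 terms)."""
    return sum(j ** n / math.factorial(j) for j in range(terms)) / math.e

first_bell_numbers = [bell_number(n) for n in range(6)]
b10 = bell_number(10)
```

A network of only 10 nodes already admits more than one hundred thousand partitions.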
41 Community Detection and Graph Partitioning. We need algorithms whose running time grows polynomially with N. Clustering algorithms address this need: they identify communities by either growing communities or dividing the graph. Many exploit the strength of weak ties. Short detour...
42 Strength of Weak Ties Motivation. Local bridges connect weakly interacting parts of the network. Q: What about removing those to reveal communities? Challenges: there may be multiple local bridges. Are some better than others? Which one should be removed first? There might also be no local bridge, yet an apparent natural division.
43 Edge Betweenness Centrality. Idea: use high edge betweenness centrality to identify weak ties. Edges with high $c_{Be}(e)$ carry a large traffic volume over shortest paths and are positioned at the interface between tightly knit groups. E.g.: cell-phone network with edges colored by strength and by betweenness.
44 Girvan-Newman's Method. Girvan-Newman's method is conceptually simple: find and remove the links that span cohesive subgroups. Algorithm: repeat until there are no edges left: calculate the betweenness centrality $c_{Be}(e)$ of all edges, then remove the edge(s) with the highest $c_{Be}(e)$. The connected components are the communities identified. Divisive method: the network falls apart into pieces as we go. Nested partition: larger communities potentially host denser groups. Edge betweenness is recomputed after each removal step. M. Girvan and M. Newman, Community structure in social and biological networks, PNAS, vol. 99, 2002.
45 Girvan-Newman's Algorithm in Action (figure).
46 Hierarchical Clustering: Common Framework. Another way to look at Girvan-Newman's algorithm is as a divisive hierarchical clustering algorithm. We will cover two hierarchical clustering algorithms (the domain is very rich): the Ravasz algorithm (agglomerative) and Girvan-Newman's algorithm (divisive). In principle, any clustering algorithm can be adapted and applied to detect communities in a network. Common framework: exploitation of a similarity or dissimilarity matrix whose elements $x_{ij}$ indicate the similarity or distance of node i from node j, e.g., similarity extracted from the relative positions of i and j within the network. Agglomerative: merge nodes with high similarity into the same community. Divisive: isolate communities by removing low-similarity (high-dissimilarity), cross-community links.
47 Agglomerative Approach to Hierarchical Clustering. Take the Ravasz algorithm as an example; it was designed to detect functional modules in metabolic networks [E. Ravasz and A.-L. Barabasi. Hierarchical organization in complex networks. Physical Review E, 2003]. Key idea: similarity should be high for nodes that belong to the same community and low for nodes residing in different communities. Proposal: nodes that connect to each other and share neighbors belong to the same community ⇒ topological similarity $x_{ij} = \frac{J(i,j)}{\min(k_i, k_j) + 1 - \theta(A_{ij})}$, where $\theta(x) = 0$ for $x \le 0$ and 1 for $x > 0$ (Heaviside step function), $J(i, j)$ is the number of shared neighbors (+1 if $(i, j) \in E$), and $k_i$, $k_j$ are degrees. So $x_{ij} = 1$ if nodes i and j are linked and share the same neighbors, $x_{ij} = 0$ if i and j are not linked and do not share any neighbors, and otherwise $x_{ij}$ takes a value in [0, 1].
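The topological similarity above is straightforward to evaluate for one pair of nodes. The sketch below assumes a dictionary-of-sets network; the function name and the toy network (two triangles joined by one link) are illustrative assumptions.

```python
def topological_similarity(adj, i, j):
    """x_ij = J(i, j) / (min(k_i, k_j) + 1 - theta(A_ij))."""
    linked = 1 if j in adj[i] else 0           # theta(A_ij)
    shared = len(adj[i] & adj[j]) + linked     # J(i, j): common neighbors, +1 if linked
    denom = min(len(adj[i]), len(adj[j])) + 1 - linked
    return shared / denom

# Hypothetical toy network: triangles {1, 2, 3} and {4, 5, 6} joined by link 3-4.
adj = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}
x_12 = topological_similarity(adj, 1, 2)  # linked, same neighbors
x_15 = topological_similarity(adj, 1, 5)  # not linked, nothing shared
x_34 = topological_similarity(adj, 3, 4)  # linked, but no common neighbor
```

The cross-community link 3-4 receives a low similarity even though its endpoints are adjacent, which is what an agglomerative method needs in order to merge the triangles first.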
49 Ravasz-Barabasi Hierarchical Clustering Algorithm. What to do with this similarity matrix? ⇒ Neighbor joining.
50 Ravasz-Barabasi Hierarchical Clustering Algorithm. S0: Assign each node i to its own community {i}. S1: Evaluate $x_{ij}$ for all pairs of communities. S2: Find the community pair with the highest similarity and merge them into a single community. S3: If there is more than one community, go to S1. How do we determine the similarity between communities of more than one node? ⇒ Several options: single, complete, or average cluster similarity. Single-link clustering: the similarity of two clusters is the similarity of their most similar members. Complete-link clustering: the similarity of two clusters is the similarity of their most dissimilar members. In Ravasz-Barabasi, average cluster similarity is used: the similarity between all pairs of nodes residing in two different clusters is computed and averaged to report the similarity between the two clusters.
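Steps S0-S3 with average cluster similarity can be sketched as follows. This is a deliberately simple (cubic-time) illustration, not the implementation used by Ravasz and Barabasi; the function names, the similarity dictionary keyed by node pairs, and the toy similarities are assumptions.

```python
from itertools import combinations

def average_similarity(sim, a, b):
    """Average pairwise similarity between the members of clusters a and b."""
    total = sum(sim[frozenset((i, j))] for i in a for j in b)
    return total / (len(a) * len(b))

def average_linkage(sim, nodes):
    """Agglomerative clustering; returns the merged clusters in join order."""
    clusters = [frozenset([n]) for n in nodes]               # S0
    merges = []
    while len(clusters) > 1:                                 # S3
        a, b = max(combinations(clusters, 2),                # S1 + S2
                   key=lambda pair: average_similarity(sim, *pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges

# Hypothetical similarities: a-b and c-d are similar, everything else is not.
nodes = ["a", "b", "c", "d"]
sim = {frozenset(pair): 0.1 for pair in combinations(nodes, 2)}
sim[frozenset(("a", "b"))] = 0.9
sim[frozenset(("c", "d"))] = 0.8
merges = average_linkage(sim, nodes)
```

The returned list of merges is exactly the information a dendrogram displays: which clusters were joined, and in what order.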
51 Cluster Similarity. Figure: (a) Pairwise similarity matrix. (b) Single-link(age) clustering. (c) Complete-link clustering. (d) Average-link clustering.
52 Ravasz-Barabasi Hierarchical Clustering Algorithm: Time Complexity. S0: Assign each node i to its own community {i} ⇒ O(N) worst-case running time. S1: Evaluate $x_{ij}$ for all pairs of communities ⇒ $O(N^2)$ the first time around. S2: Find the community pair with the highest similarity and merge them into a single community. S1 + S2: repeated at most N times, each time determining the similarity of the new cluster to the others ⇒ $O(N^2)$ time. Total time: $O(N^2)$. Where are the communities? What was the order in which smaller communities were joined into larger ones? ⇒ Need to maintain a tree recording the order of joining events, which adds $O(N \log N)$ to the running time.
54 Ravasz-Barabasi Hierarchical Clustering Algorithm: Partitions. The order in which communities are merged can be visualized via a dendrogram tree. Can you infer the order of events? Where are the communities? (more later)
55 Girvan-Newman Hierarchical Clustering Algorithm. A divisive algorithm that relies on a dissimilarity matrix $x_{ij}$. Proposal: exploit $x_{ij}$ to select node pairs that are in different communities; $x_{ij}$ should be high for nodes in different communities and low for nodes in the same community. Reasonable options: link betweenness and random-walk betweenness; link betweenness is the fastest to calculate. In Girvan-Newman, the link betweenness $x_{ij}$ is defined as the number of shortest paths that go through the link (i, j). If links are removed one at a time in order of high-to-low $x_{ij}$, does link betweenness make sense? What values of $x_{ij}$ should we expect for nodes that connect communities?
56 Girvan-Newman Hierarchical Clustering Algorithm. Algorithm: S1: Compute the centrality of each link. S2: Remove the link with the highest centrality. S3: Recalculate the centrality of each link in the altered network. S4: Repeat S2-S3 until all links are removed. Computational complexity: the most expensive step is the recalculation of centrality. If link betweenness is used, each calculation takes O(LN) worst-case running time; step S3 adds a factor of L to this time. Total time: $O(L^2 N)$.
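The steps above can be sketched in plain Python. The block below computes link betweenness with a Brandes-style breadth-first accumulation (an implementation choice assumed here, for unweighted undirected networks) and, to stay short, stops at the first split instead of removing all links; the function names and the toy network are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths through each link (unweighted, undirected)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        sigma, dist, preds, order = {s: 1}, {s: 0}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS: count shortest paths and record predecessors
            x = queue.popleft()
            order.append(x)
            for y in adj[x]:
                if y not in dist:
                    dist[y], sigma[y], preds[y] = dist[x] + 1, 0, []
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    sigma[y] += sigma[x]
                    preds[y].append(x)
        delta = {x: 0.0 for x in order}
        for y in reversed(order):  # push path counts back toward the source
            for x in preds[y]:
                share = sigma[x] / sigma[y] * (1 + delta[y])
                score[frozenset((x, y))] += share
                delta[x] += share
    return {e: c / 2 for e, c in score.items()}  # each path was seen from both ends

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def first_split(adj):
    """Apply S1-S3 until the number of connected components increases."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)           # S1 / S3: (re)compute centrality
        if not scores:
            break                                # no links left to remove
        u, v = max(scores, key=scores.get)       # S2: link with highest centrality
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy network: triangles {1, 2, 3} and {4, 5, 6} joined by link 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
bridge_score = edge_betweenness(adj)[frozenset((3, 4))]
split = sorted(sorted(c) for c in first_split(adj))
```

Continuing the loop until no links remain, and recording each split, would produce the full dendrogram.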
57 Girvan-Newman Hierarchical Clustering Algorithm. Figure: (a)-(d) sequence of link removals. (e) Accompanying dendrogram. (f) Which partition is optimal? Modularity helps us determine that.
58 Partitions on the Dendrogram. Figure: (a) dendrogram. (b)-(d) different partitions.
59 Dendrograms Show Partitions, Not the Final Partition! Hierarchical partitions are often represented with a dendrogram, which shows the groups found in the network at all algorithmic steps and splits the network at different resolutions. E.g.: Girvan-Newman's algorithm for Zachary's karate club. Q: Which of the divisions/partitions is the most useful/optimal in some sense? A: Need to define metrics to compare partitions!
62 Clustering, Optimality, and Community Structure in Real Networks. Hierarchical clustering assumes nested communities, as captured by the dendrograms these algorithms produce. The density hypothesis states that there are locally dense subgraphs weakly linked to other locally dense subgraphs. Q1: How do we know such organization exists in real networks? Hubs connect communities ⇒ is there a conflict between the scale-free property and community structure? Need to develop a model that reconciles the two. Q2: If community structure exists, then how can hierarchical clustering algorithms be used to reveal the optimal organization/partition? Need a way to rank/score partitions and report the optimal one.
63 Reconciling Scale-free Property and Community Structure. Left: start with a module of 5 nodes, one at the center of a square, and connect all of them to one another. Middle: create four identical replicas and connect the peripheral nodes of each replica to the central node of the original module. Right: create four replicas of the 25-node module and again connect the peripheral nodes to the central node of the original module, obtaining a 125-node network. The process can be repeated ad infinitum.
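The construction above can be sketched as a short generator. One detail is an assumption made here rather than something stated on the slide: at each step, the "peripheral nodes" of a replica are taken to be the copies of the previous step's peripheral nodes (the four corner nodes in the first module). The function name and the node numbering are likewise illustrative.

```python
def hierarchical_network(levels):
    """Build the hierarchical model; returns (number of nodes, set of edges).

    Node 0 is the central node of the original module. Assumption: the
    peripheral nodes of a replica are the copies of the previous step's
    peripheral nodes.
    """
    edges = {(i, j) for i in range(5) for j in range(i + 1, 5)}  # 5-node module
    n = 5
    peripheral = [1, 2, 3, 4]
    for _ in range(levels - 1):
        new_edges = set(edges)
        new_peripheral = []
        for r in range(1, 5):                      # four replicas
            offset = r * n
            new_edges |= {(u + offset, v + offset) for u, v in edges}
            for p in peripheral:
                new_edges.add((0, p + offset))     # periphery -> original center
                new_peripheral.append(p + offset)
        edges, peripheral, n = new_edges, new_peripheral, 5 * n
    return n, edges

n_nodes, edges = hierarchical_network(3)
hub_degree = sum(1 for u, v in edges if 0 in (u, v))
```

Under this assumption the original central node becomes a hub whose degree grows with every iteration, while the corner nodes keep their small, densely connected neighborhoods.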
64 Hierarchical Network Model. The hierarchical network model generates a scale-free network with degree exponent $\gamma = 1 + \frac{\ln 5}{\ln 4} \approx 2.161$. For the Erdos-Renyi and Barabasi-Albert models, the clustering coefficient C decreases with N, whereas C is independent of N for the hierarchical network model. An N-independent clustering coefficient is observed in metabolic networks. The quantitative signature of nested communities (hierarchical modularity) is the dependence of a node's clustering coefficient on its degree: $C(k) \sim k^{-1}$. The higher a node's degree, the smaller its clustering coefficient: small-degree nodes have high C because they reside in dense communities, and high-degree nodes have low C because they connect to different communities. Inspecting C(k) allows determining whether a network is hierarchical. The power grid lacks hierarchical modularity (C(k) is independent of k). In many real networks, C(k) decreases with k. More detailed models predict $C(k) \sim k^{-\beta}$ with $\beta \in [0, 2]$.
65 Modularity. The size of communities is typically unknown ⇒ identify them automatically. Devise a function that measures how well a network is partitioned into communities. Intuition: the density of edges in communities is higher than expected at random. Hypothesis: randomly wired networks lack an inherent community structure. By comparing the link density of a subgraph with the link density obtained for the same group of nodes when randomly wired, one can determine whether the subgraph is a community. Systematic deviations from a random configuration allow defining a quantity called modularity, which measures the quality of a particular partition. Modularity offers a way to determine the optimal partition from the dendrogram and so predict a partition. Modularity optimization also offers a novel approach to community detection via optimization algorithms.
66 Modularity of a Proposed Community. Suppose a subgraph s has been identified and proposed to be a community. One can measure its modularity as $Q(s) \propto (\#\text{ of edges within group } s) - E[\#\text{ of such edges}]$. This operationalizes the hypothesis that there should be more edges within a true community than would be obtained in a null model. Proposal: $Q(s) = \frac{1}{2L} \sum_{(i,j) \in s} (A_{ij} - p_{ij})$, where the sum runs over all ordered pairs of nodes in s, L is the number of edges in the entire network, and $p_{ij}$ is the probability of an edge by chance, obtained by randomizing the original network G while preserving the expected degree of each node. Under the degree-preserving null model, $p_{ij} = \frac{k_i k_j}{2L}$. Why?
68 Expected Connectivity Among Nodes. Null model: randomize the edges while preserving the degree distribution in G. Random variable $A_{ij} := \mathbb{I}\{(i, j) \in E\}$; its expectation is $E[A_{ij}] = P((i, j) \in E)$. Suppose node i has degree $k_i$ and node j has degree $k_j$. Degree is the number of spokes per node; there are 2L spokes in G. The probability that spoke $i_k$ is connected to j is $\frac{k_j}{2L - 1} \approx \frac{k_j}{2L}$. Hence $P((i, j) \in E) = P\left(\bigcup_{i_k = 1}^{k_i} \{\text{spoke } i_k \text{ connected to node } j\}\right) \approx \sum_{i_k = 1}^{k_i} P(\text{spoke } i_k \text{ connected to node } j) = \frac{k_i k_j}{2L}$ (treating the events as approximately disjoint). Back to the modularity of a subgraph and the modularity of a partition.
69 From Modularity of a Subgraph to Modularity of a Partition
So, the modularity $Q(s)$ of a subgraph $s$ is: $Q(s) = \frac{L_s}{L} - \left(\frac{k_s}{2L}\right)^2$, where $L_s$ is the total number of edges in $s$ and $k_s$ is the total degree of the nodes in $s$.
Now consider a partition $S$ of a graph $G$ into groups/subgraphs $s \in S$.
Summing over all groups $s \in S$ gives the modularity score of a particular partition $S$: $Q(G, S) = \sum_{s \in S} Q(s)$
Formally, after normalization, $Q(G, S) \in [-1, 1]$.
One can evaluate the modularity of each partition in a dendrogram, rank the partitions, and pick the best one.
The maximum value gives the best community structure.
How sensitive is modularity to changes to a partition?

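
The partition score above can be sketched in a few lines of Python. This is a minimal illustration of $Q(G,S) = \sum_s [L_s/L - (k_s/2L)^2]$, not an optimized implementation; the function name `modularity` and the toy graph are assumptions introduced here.

```python
def modularity(edges, partition):
    """Q(G, S) = sum over groups s of L_s / L - (k_s / (2L))**2.

    edges: list of undirected edges (u, v); partition: list of node sets.
    """
    L = len(edges)
    group_of = {node: g for g, nodes in enumerate(partition) for node in nodes}
    L_s = [0] * len(partition)  # edges with both ends inside group s
    k_s = [0] * len(partition)  # total degree of the nodes in group s
    for u, v in edges:
        k_s[group_of[u]] += 1
        k_s[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            L_s[group_of[u]] += 1
    return sum(L_s[g] / L - (k_s[g] / (2 * L)) ** 2 for g in range(len(partition)))


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))   # 5/14, about 0.357
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))     # 0.0: one big group
```

Placing every node in a single group always yields $Q = 0$, which is why only partitions with positive modularity are taken as evidence of community structure.
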
70 Examples of Modularity Values

72 Using Modularity to Assess Clustering Quality
E.g.: the Girvan-Newman algorithm applied to Zachary's karate club.
Q: Why not optimize $Q(G, S)$ directly over the possible partitions $S$?
A: How complex is the modularity landscape of the partition space?

73 Modularity Landscape
A greedy algorithm seeking the maximally modular partition runs in $O(N \log^2 N)$ time.
Modularity maximization forces small, weakly connected communities into larger ones.
The optimal partition is difficult to identify among a large number of close-to-optimal partitions, so the modularity landscape is complex.

74 Modularity Maximization Algorithms
If the goal is to find the partition with maximal modularity, this is an optimization problem.
Searching for local maxima in a complex search space (fitness landscape) is a task well suited to evolutionary algorithms (EAs).
Shi, Wy, Zhong, A New Genetic Algorithm for Community Detection, Complex Sciences, Springer 2009
Gong, Ma, Zhang, Jiao, Community detection in networks by using multiobjective evolutionary algorithm with decomposition, Physica A: Statistical Mechanics and Its Applications 2012
Key concept: evolve a population of partitions so as to optimize modularity.
EAs do not rely on a similarity or dissimilarity matrix and therefore scale to very large networks.
Multi-objective EAs address two conflicting objectives: negative ratio association (divides into small communities) and ratio cut (divides into large communities).
Various reviews exist.

75 Overlapping Communities
Vicsek and collaborators proposed the concept of overlapping communities.
The clique percolation algorithm views a community as a union of overlapping cliques, based on the observation that $k$-cliques naturally emerge in sufficiently dense networks.
Two $k$-cliques are adjacent if they share $k-1$ nodes.
A $k$-clique community is the largest connected subgraph obtained by the union of all adjacent $k$-cliques.
$k$-cliques that cannot be reached from a particular $k$-clique belong to other $k$-clique communities.
Polynomial running time when cliques are small.
The link clustering algorithm depends on the definition of link similarity.
The similarity of two links is determined by the neighborhoods of the nodes they connect.
Hierarchical clustering is then applied to group links into communities.

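
The clique percolation definitions above can be sketched as follows. This is a brute-force illustration for tiny graphs only (it enumerates all $k$-subsets of nodes), not the algorithm as published; the function name `k_clique_communities` and the toy graph are assumptions made here.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Union of adjacent k-cliques, where adjacent means sharing k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Brute-force enumeration of k-cliques (only viable for small graphs).
    cliques = [c for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(set(cliques[a]) & set(cliques[b])) == k - 1:
            parent[find(a)] = find(b)

    # Each connected group of cliques yields one community (a set of nodes).
    communities = {}
    for idx, clique in enumerate(cliques):
        communities.setdefault(find(idx), set()).update(clique)
    return sorted(communities.values(), key=sorted)


# Toy graph (an assumption): triangles {0,1,2} and {1,2,3} share an edge,
# while triangle {3,4,5} shares only node 3 with them.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
print(k_clique_communities(edges, 3))  # node 3 belongs to both communities
```

Node 3 appears in both communities, which is exactly the overlap that partition-based methods cannot represent.
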
76 Testing Community Detection Algorithms
In the absence of ground truth, how is one to compare community detection algorithms?
Modularity-based comparison only goes so far.
Other measures are used that compare the quality of the clusters produced by clustering algorithms (e.g., information gain).
Other measures compare the behavior of algorithms at various community sizes [Leskovec, Lang, Mahoney, Empirical Comparison of Algorithms for Network Community Detection, IW3C ]
Figure: Network community profile (NCP) characterizes the quality of network communities as a function of their size. The universal V shape of the NCP summarizes the large-scale community structure of real networks.

77 Advanced (more Theoretical) Topics/Treatments
Modularity maximization can be proved to be NP-hard.
Relaxing modularity maximization via the Lagrange technique provides one approach to modularity maximization.
Stochastic optimization algorithms (such as EAs) are more generally applicable and scalable on real networks.
The relaxation uses eigendecomposition and specific properties of the dominant eigenvector.
Graph partitioning can be recast as a graph cut problem.
The minimum-cut treatment opens the way to other algorithms that use the Laplacian to determine the connected components of the graph.

78 Modularity Revisited
Recall our definition of modularity, now written with $N_e$ for the number of edges and $d_i$ for the degree of vertex $i$ (previously $L$ and $k_i$): $Q(G, S) = \frac{1}{2N_e} \sum_{s \in S} \sum_{i,j \in s} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right]$
Let $g_i$ be the group membership of vertex $i$ and rewrite: $Q(G, S) = \frac{1}{2N_e} \sum_{i,j \in V} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right] I\{g_i = g_j\}$
Define for convenience the summands $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$.
Both marginal sums of $B_{ij}$ vanish, since e.g.: $\sum_j B_{ij} = \sum_j A_{ij} - \frac{d_i}{2N_e} \sum_j d_j = d_i - \frac{d_i}{2N_e} 2N_e = 0$

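
The vanishing marginal sums can be checked numerically. The sketch below builds $B_{ij} = A_{ij} - d_i d_j / 2N_e$ for a small toy graph (an assumption for illustration, as are the variable names) and prints its row and column sums.

```python
# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

# Adjacency matrix A and degrees d_i.
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1.0
d = [sum(row) for row in A]
two_Ne = sum(d)  # sum of degrees equals twice the number of edges

# Modularity matrix B_ij = A_ij - d_i * d_j / (2 N_e).
B = [[A[i][j] - d[i] * d[j] / two_Ne for j in range(n)] for i in range(n)]

row_sums = [sum(B[i]) for i in range(n)]
col_sums = [sum(B[i][j] for i in range(n)) for j in range(n)]
print(row_sums)  # all zero up to floating-point rounding
print(col_sums)  # likewise, since B is symmetric
```
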
79 Graph Bisection
Consider (for simplicity) dividing the network into two groups.
Binary community membership variables per vertex: $s_i = +1$ if vertex $i$ belongs to group 1, and $s_i = -1$ if vertex $i$ belongs to group 2.
Using the identity $\frac{1}{2}(s_i s_j + 1) = I\{g_i = g_j\}$, the modularity is: $Q(G, S) = \frac{1}{2N_e} \sum_{i,j \in V} \left[A_{ij} - \frac{d_i d_j}{2N_e}\right] I\{g_i = g_j\} = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij}(s_i s_j + 1)$
Recall $\sum_j B_{ij} = 0$ to obtain the simpler expression: $Q(G, S) = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij} s_i s_j$

80 Optimizing Modularity
Let $B \in \mathbb{R}^{N_v \times N_v}$ be the modularity matrix with entries $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$.
Any partition $S$ is defined by the vector $s = [s_1, \ldots, s_{N_v}]^T$.
Modularity is a quadratic form: $Q(G, S) = \frac{1}{4N_e} \sum_{i,j \in V} B_{ij} s_i s_j = \frac{1}{4N_e} s^T B s$
Modularity as a criterion for graph bisection yields the formulation: $\hat{s} = \arg\max_{s \in \{-1,+1\}^{N_v}} s^T B s$
The binary constraints $s \in \{-1,+1\}^{N_v}$ (hypercube vertices) are what make the problem hard.
Modularity optimization is NP-hard [Brandes et al 2006].

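
To see that the quadratic form really reproduces the indicator-based definition, the following sketch evaluates both expressions for two bisections of a toy graph. The helper names (`modularity_matrix`, `q_quadratic`, `q_indicator`) and the graph are assumptions introduced for this illustration.

```python
def modularity_matrix(edges, n):
    """B_ij = A_ij - d_i * d_j / (2 N_e) for an undirected simple graph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    d = [sum(row) for row in A]
    two_Ne = sum(d)
    return [[A[i][j] - d[i] * d[j] / two_Ne for j in range(n)] for i in range(n)]


def q_quadratic(B, s, n_edges):
    """Q = s^T B s / (4 N_e), valid for a bisection with s_i in {-1, +1}."""
    n = len(s)
    return sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (4 * n_edges)


def q_indicator(B, s, n_edges):
    """Q = (1 / 2 N_e) * sum of B_ij over pairs placed in the same group."""
    n = len(s)
    return sum(B[i][j] for i in range(n) for j in range(n) if s[i] == s[j]) / (2 * n_edges)


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
B = modularity_matrix(edges, 6)

for s in ([1, 1, 1, -1, -1, -1], [1, 1, -1, -1, -1, -1]):
    print(s, q_quadratic(B, s, len(edges)), q_indicator(B, s, len(edges)))
```

The two columns agree for every bisection; the natural split into the two triangles gives $Q = 5/14$.
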
81 Relaxation
Relax the constraint $s \in \{-1,+1\}^{N_v}$ to $s \in \mathbb{R}^{N_v}$, $\|s\|_2 = 1$: $\hat{s} = \arg\max_s s^T B s$ s.t. $s^T s = 1$
Associate a Lagrange multiplier $\lambda$ with the constraint $s^T s = 1$.
Optimality conditions yield: $\nabla_s [s^T B s + \lambda(1 - s^T s)] = 0 \Rightarrow Bs = \lambda s$
Conclusion: $s$ is an eigenvector of $B$ with eigenvalue $\lambda$.
Q: Which eigenvector should we pick?
At the optimum $Bs = \lambda s$, so the objective becomes: $s^T B s = \lambda s^T s = \lambda$
A: To maximize modularity, pick the dominant eigenvector of $B$, i.e., the one associated with the largest eigenvalue.

82 Spectral Modularity Maximization
Let $u_1$ be the dominant eigenvector of $B$, with $i$-th entry $[u_1]_i$.
We cannot just set $s = u_1$ because in general $u_1 \notin \{-1,+1\}^{N_v}$.
Best effort: maximize their similarity $s^T u_1$, which gives $s_i = \mathrm{sign}([u_1]_i)$, i.e., $+1$ when $[u_1]_i > 0$ and $-1$ otherwise.
Spectral modularity maximization algorithm
S1: Compute the modularity matrix $B$ with entries $B_{ij} = A_{ij} - \frac{d_i d_j}{2N_e}$
S2: Find a dominant eigenvector $u_1$ of $B$ (e.g., power method)
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([u_1]_i)$
Multiple (> 2) communities can be obtained through, e.g., repeated graph bisection.

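
Steps S1 to S3 can be sketched in plain Python as below. One detail is an addition of this sketch rather than part of the slide: the power method converges to the eigenvalue of largest magnitude, which for $B$ may be negative, so the iteration is run on $B + cI$ with $c$ an upper bound on the eigenvalue magnitudes (the largest absolute row sum). The function name, the starting vector, the iteration count, and the toy graph are likewise assumptions.

```python
def spectral_modularity_bisection(edges, n, iters=500):
    """Bisect a graph by the sign pattern of the dominant eigenvector of B."""
    # S1: modularity matrix B_ij = A_ij - d_i * d_j / (2 N_e).
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    d = [sum(row) for row in A]
    two_Ne = sum(d)
    B = [[A[i][j] - d[i] * d[j] / two_Ne for j in range(n)] for i in range(n)]

    # S2: power method on B + c*I. The shift c makes every eigenvalue
    # non-negative, so the iteration finds the most positive eigenvalue of B.
    c = max(sum(abs(x) for x in row) for row in B)
    u = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(B[i][j] * u[j] for j in range(n)) + c * u[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        u = [x / norm for x in w]

    # S3: community label is the sign of the eigenvector entry.
    return [1 if x > 0 else -1 for x in u]


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(spectral_modularity_bisection(edges, 6))
```

On this graph the labels separate the two triangles; which triangle receives $+1$ depends on the arbitrary sign of the eigenvector. If $B$ has no positive eigenvalue, the iteration converges to the all-ones direction and every node lands in one group, which can be read as "no split improves modularity".
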
83 Example: Zachary's Karate Club
Spectral modularity maximization.
Shapes of vertices indicate community membership.
The dotted line indicates the partition found by the algorithm.
Vertex colors indicate the strength of their membership.

84 Graph Bisection
Consider an undirected graph $G = (V, E)$.
E.g.: the graph bisection problem, i.e., partition $V$ into two groups.
Groups $V_1$ and $V_2 = V_1^C$ are non-overlapping.
Groups have given sizes, i.e., $|V_1| = N_1$ and $|V_2| = N_2$.
Q: What is a good criterion to partition the graph?
A: We have already seen modularity. Let's see a different one.

85 Graph Cut
Desiderata: community members should be well connected among themselves and relatively well separated from the rest of the nodes.
Def: The cut $C$ is the number of edges between groups 1 and 2: $C = \mathrm{cut}(V_1, V_2) = \sum_{i \in V_1, j \in V_2} A_{ij}$
Natural criterion: minimize the cut, i.e., the edges across groups 1 and 2.

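86 From Graph Cuts
Binary community membership variable per vertex: $s_i = +1$ if vertex $i$ belongs to group 1, and $s_i = -1$ otherwise.
Let $g_i$ be the group membership of vertex $i$, s.t.: $I\{g_i \neq g_j\} = \frac{1}{2}(1 - s_i s_j)$, which is 1 if $i$ and $j$ are in different groups and 0 otherwise.
The cut is expressible in terms of the variables $s_i$ as: $C = \sum_{i \in V_1, j \in V_2} A_{ij} = \frac{1}{2} \sum_{i,j \in V} A_{ij} I\{g_i \neq g_j\} = \frac{1}{4} \sum_{i,j \in V} A_{ij}(1 - s_i s_j)$
The factor $\frac{1}{2}$ in the middle expression compensates for each cross-group edge appearing twice, as $(i,j)$ and as $(j,i)$, in the sum over all ordered pairs.
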
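87 to the Graph Laplacian Matrix
The first summand in $C = \frac{1}{4} \sum_{i,j \in V} A_{ij}(1 - s_i s_j)$ is: $\sum_{i,j \in V} A_{ij} = \sum_{i \in V} d_i = \sum_{i \in V} d_i s_i^2 = \sum_{i,j \in V} d_i s_i s_j I\{i = j\}$
We used $s_i^2 = 1$ since $s_i \in \{-1,+1\}$. The cut becomes: $C = \frac{1}{4} \sum_{i,j \in V} (d_i I\{i = j\} - A_{ij}) s_i s_j = \frac{1}{4} \sum_{i,j \in V} L_{ij} s_i s_j$
This is the cut in terms of $L_{ij}$, the entries of the graph Laplacian $L = D - A$, i.e.: $C(s) = \frac{1}{4} s^T L s$, $s = [s_1, \ldots, s_{N_v}]^T$
The constant $\frac{1}{4}$ is consistent with each cross-group edge contributing $(s_i - s_j)^2 = 4$ to $s^T L s$.
Maximize modularity $Q(s) \propto s^T B s$ vs. minimize cut $C(s) \propto s^T L s$.
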
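
The relation between the cut and the Laplacian quadratic form is easy to check by counting. The sketch below compares the number of cross-group edges with $s^T L s / 4$ for a few label vectors; the helper names and the toy graph are assumptions made for illustration.

```python
def laplacian(edges, n):
    """Graph Laplacian L = D - A of an undirected simple graph."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def quad_form(M, x):
    """x^T M x for a square matrix M given as a list of rows."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def cut_size(edges, s):
    """Number of edges whose endpoints carry different labels."""
    return sum(1 for u, v in edges if s[u] != s[v])


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
L = laplacian(edges, 6)

for s in ([1, 1, 1, -1, -1, -1], [1, 1, -1, -1, -1, -1], [1, -1, 1, -1, 1, -1]):
    print(s, "cut =", cut_size(edges, s), " s^T L s / 4 =", quad_form(L, s) / 4)
```
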
88 Laplacian Matrix Properties Revisited
Smoothness: For any vector $x \in \mathbb{R}^{N_v}$ of vertex values, one has $x^T L x = \sum_{i,j \in V} L_{ij} x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2$, which can be minimized to enforce smoothness of functions on $G$.
Positive semi-definiteness: Follows since $x^T L x \geq 0$ for all $x \in \mathbb{R}^{N_v}$.
Spectrum: All eigenvalues of $L$ are real and non-negative. Eigenvectors form an orthonormal basis of $\mathbb{R}^{N_v}$.
Rank deficiency: Since $L\mathbf{1} = \mathbf{0}$, $L$ is rank deficient.
Spectrum and connectivity: The smallest eigenvalue $\lambda_1$ of $L$ is 0. If the second-smallest eigenvalue $\lambda_2 \neq 0$, then $G$ is connected. If $L$ has $n$ zero eigenvalues, $G$ has $n$ connected components.

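
Two of these properties can be illustrated without an eigensolver. The sketch below checks the smoothness identity for an arbitrary real vector and shows that, on a disconnected graph, the indicator vector of each connected component is mapped to zero by $L$, so each component contributes one zero eigenvalue. The graph (two disjoint triangles), the test vector, and the helper names are assumptions for illustration.

```python
def laplacian(edges, n):
    """Graph Laplacian L = D - A of an undirected simple graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


# Disconnected toy graph (an assumption): two triangles with no bridge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
n = 6
L = laplacian(edges, n)

# Smoothness: x^T L x equals the sum of squared differences across edges.
x = [0.5, -1.0, 2.0, 0.0, 3.5, -2.0]
quad = sum(xi * yi for xi, yi in zip(x, matvec(L, x)))
smooth = sum((x[u] - x[v]) ** 2 for u, v in edges)
print(quad, smooth)

# Null space: L maps the all-ones vector and each component indicator to zero.
ones = [1.0] * n
comp1 = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
comp2 = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(matvec(L, ones), matvec(L, comp1), matvec(L, comp2))
```
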
89 Graph Cut Minimization
Since $|V_1| = N_1$ and $|V_2| = N_2 = N_v - N_1$, we have the constraint: $\sum_{i \in V} s_i = \sum_{i \in V_1} (+1) + \sum_{i \in V_2} (-1) = N_1 - N_2 \Rightarrow \mathbf{1}^T s = N_1 - N_2$
The minimum-cut criterion for graph bisection yields the formulation: $\hat{s} = \arg\min_{s \in \{-1,+1\}^{N_v}} s^T L s$ s.t. $\mathbf{1}^T s = N_1 - N_2$
Again, the binary constraints $s \in \{-1,+1\}^{N_v}$ render cut minimization hard.
Relax the binary constraints as with modularity maximization.

90 Further Intuition
Since $s^T L s = \sum_{(i,j) \in E} (s_i - s_j)^2$, the minimum-cut formulation is: $\hat{s} = \arg\min_{s \in \{-1,+1\}^{N_v}} \sum_{(i,j) \in E} (s_i - s_j)^2$ s.t. $\mathbf{1}^T s = N_1 - N_2$
Q: Does this equivalent cost function make sense?
A: Absolutely!
Edges joining vertices in the same group do not add to the sum.
Edges joining vertices in different groups add 4 to the sum.
Minimize cut: assign values $s_i$ to nodes $i$ such that few edges cross 0, i.e., few edges join a node with a positive value to a node with a negative value.

92 Minimum-Cut Relaxation
Relax the constraint $s \in \{-1,+1\}^{N_v}$ to $s \in \mathbb{R}^{N_v}$ with fixed norm $s^T s = N_v$, the value attained by every binary $s$: $\hat{s} = \arg\min_s s^T L s$ s.t. $\mathbf{1}^T s = N_1 - N_2$, $s^T s = N_v$
Straightforward to solve using Lagrange multipliers.
Characterization of the solution $\hat{s}$ [Fiedler 1973]: $\hat{s} = v_2 + \frac{N_1 - N_2}{N_v} \mathbf{1}$, with $v_2$ scaled so that the norm constraint holds.
The eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue satisfies $\mathbf{1}^T v_2 = 0$.
Minimum cut: $C(\hat{s}) \propto \hat{s}^T L \hat{s} = v_2^T L v_2 \propto \lambda_2$
If the graph $G$ is disconnected, then we know $\lambda_2 = 0 = C(\hat{s})$.
If $G$ is amenable to bisection, the cut is small and so is $\lambda_2$.

93 Theoretical Guarantee
Consider a partition of $G$ into $V_1$ and $V_2$, where $|V_1| \leq |V_2|$.
If $G$ is connected, then the Cheeger inequality asserts: $\frac{\alpha^2}{2 d_{\max}} \leq \lambda_2 \leq 2\alpha$, where $\alpha = \frac{C}{|V_1|}$ is evaluated at the partition that minimizes this ratio and $d_{\max}$ is the maximum node degree.
This certifies that $\lambda_2$ gives a useful bound on the quality of the best cut.
F. Chung, Four proofs for the Cheeger inequality and graph partition algorithms, Proc. of ICCM, 2007

94 Spectral Graph Bisection
Q: How to obtain the binary cluster labels $s \in \{-1,+1\}^{N_v}$ from $\hat{s} \in \mathbb{R}^{N_v}$?
Again, maximize the similarity measure $s^T \hat{s}$, where $s_i = \mathrm{sign}([v_2]_i)$ ($+1$ if $[v_2]_i > 0$ and $-1$ otherwise).
Spectral graph bisection algorithm
S1: Compute the Laplacian matrix $L$ with entries $L_{ij} = D_{ij} - A_{ij}$
S2: Find the eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([v_2]_i)$
Complexity: an efficient Lanczos algorithm variant runs in $O\left(\frac{N_e}{\lambda_3 - \lambda_2}\right)$.
Nomenclature: $v_2$ is known as the Fiedler vector. The eigenvalue $\lambda_2$ is the Fiedler value, or algebraic connectivity of $G$.

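
The three steps can be sketched without a linear algebra library. Instead of the Lanczos variant mentioned above, this illustration swaps in a plain power iteration on $cI - L$ with $c = 2 d_{\max}$ (an upper bound on the eigenvalues of $L$), removing the all-ones component at every step so that the iteration converges to the Fiedler vector. That substitution, the function name, the starting vector, and the toy graph are all assumptions of this sketch, and the code assumes a connected graph with at least one edge.

```python
from math import sqrt


def fiedler_bisection(edges, n, iters=1000):
    """Return (labels, lambda_2) from an approximate Fiedler vector."""
    # S1: Laplacian L = D - A.
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0

    # S2: power iteration on c*I - L restricted to vectors orthogonal to 1.
    c = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the all-ones eigenvector
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit vector v estimates lambda_2.
    lam2 = sum(v[i] * sum(L[i][j] * v[j] for j in range(n)) for i in range(n))

    # S3: cluster label is the sign of the Fiedler vector entry.
    labels = [1 if x > 0 else -1 for x in v]
    return labels, lam2


# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
labels, lam2 = fiedler_bisection(edges, 6)
print(labels, lam2)
```

For this graph the labels separate the two triangles and the estimate of $\lambda_2$ is close to $(5 - \sqrt{17})/2 \approx 0.438$, a small value reflecting that a single edge holds the two halves together.
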
95 Spectral Gap in Fiedler Vector Entries
Suppose $G$ is disconnected and has two connected components.
$L$ is block diagonal, and the eigenvectors of the two smallest eigenvalues indicate the groups, i.e., $v_1 = [1, 1, \ldots, 1, 0, \ldots, 0]^T$ and $v_2 = [0, 0, \ldots, 0, 1, \ldots, 1]^T$
If $G$ is connected but amenable to bisection, $v_1 = \mathbf{1}$ and $\lambda_2 \approx 0$.
Also, $\mathbf{1}^T v_2 = \sum_i [v_2]_i = 0 \Rightarrow$ there are positive and negative entries in $v_2$.

96 Unknown Community Sizes
Consider the graph bisection problem with unknown group sizes.
Minimizing the graph cut may no longer be meaningful!
Figure: The cost $C = \sum_{i \in V_1, j \in V_2} A_{ij}$ is agnostic to the groups' internal structures.
A better criterion is the ratio cut $R$, defined as $R = \frac{C}{|V_1|} + \frac{C}{|V_2|}$
Balanced partitions: a small community is penalized by the cost.

97 Ratio-Cut Minimization
Fix a bisection $S$ of $G$ into groups $V_1$ and $V_2$.
Define $f: f(S) = [f_1, \ldots, f_{N_v}]^T \in \mathbb{R}^{N_v}$ with entries: $f_i = \sqrt{|V_2| / |V_1|}$ if vertex $i$ belongs to $V_1$, and $f_i = -\sqrt{|V_1| / |V_2|}$ if vertex $i$ belongs to $V_2$.
One can establish the following properties:
P1: $f^T L f = N_v R(S)$
P2: $\sum_i f_i = 0$, i.e., $\mathbf{1}^T f = 0$
P3: $\|f\|^2 = N_v$
From P1-P3 it follows that ratio-cut minimization is equivalent to: $\min_f f^T L f$ s.t. $\mathbf{1}^T f = 0$ and $f^T f = N_v$, with $f$ restricted to vectors of the form above.

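
Properties P1 to P3 can be verified numerically for any fixed bisection. The sketch below does so for an unbalanced split of a toy graph; the graph, the chosen split, and the variable names are assumptions for illustration.

```python
from math import sqrt

# Toy graph (an assumption): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6

# An unbalanced bisection chosen for illustration.
V1 = {0, 1}
V2 = set(range(n)) - V1

# f_i = sqrt(|V2|/|V1|) on V1 and -sqrt(|V1|/|V2|) on V2.
f = [sqrt(len(V2) / len(V1)) if i in V1 else -sqrt(len(V1) / len(V2))
     for i in range(n)]

# Graph Laplacian L = D - A.
L = [[0.0] * n for _ in range(n)]
for u, v in edges:
    L[u][u] += 1.0
    L[v][v] += 1.0
    L[u][v] -= 1.0
    L[v][u] -= 1.0

# Cut size C and ratio cut R = C/|V1| + C/|V2|.
C = sum(1 for u, v in edges if (u in V1) != (v in V1))
R = C / len(V1) + C / len(V2)

fLf = sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))
print("P1:", fLf, "vs N_v * R =", n * R)
print("P2:", sum(f))
print("P3:", sum(x * x for x in f), "vs N_v =", n)
```

Here $C = 2$, so $R = 2/2 + 2/4 = 1.5$ and both sides of P1 equal 9 up to rounding.
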
98 Ratio Cut and Spectral Graph Bisection
Ratio-cut minimization is also NP-hard. Relax to obtain: $\hat{s} = \arg\min_{s \in \mathbb{R}^{N_v}} s^T L s$ s.t. $\mathbf{1}^T s = 0$ and $s^T s = N_v$
The partition $\hat{S}$ is also given by the spectral graph bisection algorithm
S1: Compute the Laplacian matrix $L$ with entries $L_{ij} = D_{ij} - A_{ij}$
S2: Find the eigenvector $v_2$ of $L$ associated with the second-smallest eigenvalue
S3: Cluster membership of vertex $i$ is $s_i = \mathrm{sign}([v_2]_i)$
An alternative criterion is the normalized cut $NC$, defined as: $NC = \frac{C}{\mathrm{vol}(V_1)} + \frac{C}{\mathrm{vol}(V_2)}$, where $\mathrm{vol}(V_i) = \sum_{v \in V_i} d_v$, $i \in \{1, 2\}$
This corresponds to using the normalized Laplacian $D^{-1} L$.
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationUnsupervised Learning and Clustering