CS224W: Analysis of Networks Jure Leskovec, Stanford University

Similar documents
Non Overlapping Communities

Mining Social Network Graphs

Tie strength, social capital, betweenness and homophily. Rik Sarkar

CSE 258 Lecture 12. Web Mining and Recommender Systems. Social networks

CSE 158 Lecture 11. Web Mining and Recommender Systems. Social networks

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #21: Graph Mining 2

Network Community Detection

Structure of Social Networks

CSE 255 Lecture 13. Data Mining and Predictive Analytics. Triadic closure; strong & weak ties

Economics of Information Networks

Online Social Networks and Media. Community detection

CSE 158 Lecture 13. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

CSE 158 Lecture 11. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

Web 2.0 Social Data Analysis

Extracting Information from Complex Networks

THE KNOWLEDGE MANAGEMENT STRATEGY IN ORGANIZATIONS. Summer semester, 2016/2017

Social-Network Graphs

My favorite application using eigenvalues: partitioning and community detection in social networks

Distribution-Free Models of Social and Information Networks

Social Networks, Social Interaction, and Outcomes

Recap! CMSC 498J: Social Media Computing. Department of Computer Science University of Maryland Spring Hadi Amiri

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Clusters and Communities

Web 2.0 Social Data Analysis

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

MCL. (and other clustering algorithms) 858L

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Network Science: Principles and Applications

TELCOM2125: Network Science and Analysis

Modularity CMSC 858L

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

CAIM: Cerca i Anàlisi d Informació Massiva

Jure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Networks in economics and finance. Lecture 1 - Measuring networks

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

CS224W: Analysis of Network Jure Leskovec, Stanford University

Social & Information Network Analysis CS 224W

Social Network Analysis

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data

Social Data Management Communities

Introduction to network metrics

Clustering Algorithms on Graphs Community Detection 6CCS3WSN-7CCSMWAL

Online Social Networks and Media. Positive and Negative Edges Strong and Weak Edges Strong Triadic Closure

Graph Theory and Network Measurment

Topic mash II: assortativity, resilience, link prediction CS224W

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Social Networks. Slides by : I. Koutsopoulos (AUEB), Source:L. Adamic, SN Analysis, Coursera course

Graph Theory for Network Science

Basics of Network Analysis

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Signal Processing for Big Data

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates

Online Social Networks and Media

Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior

Basic Network Concepts

Social and Information Network Analysis Positive and Negative Relationships

Extracting Communities from Networks

1 Homophily and assortative mixing

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

V4 Matrix algorithms and graph partitioning

Betweenness Measures and Graph Partitioning

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Oh Pott, Oh Pott! or how to detect community structure in complex networks

CS 124/LINGUIST 180 From Languages to Information. Social Networks: Small Worlds, Weak Ties, and Power Laws. Dan Jurafsky Stanford University

Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion

TELCOM2125: Network Science and Analysis

Spectral Methods for Network Community Detection and Graph Partitioning

RANDOM-REAL NETWORKS

An Empirical Analysis of Communities in Real-World Networks

Chapter 11. Network Community Detection

Overlapping Communities

Web Structure Mining Community Detection and Evaluation

An Optimal Allocation Approach to Influence Maximization Problem on Modular Social Network. Tianyu Cao, Xindong Wu, Song Wang, Xiaohua Hu

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Centrality Book. cohesion.

Community Structure and Beyond

beyond social networks

Lecture 2: From Structured Data to Graphs and Spectral Analysis

Graph Theory. Network Science: Graph theory. Graph theory Terminology and notation. Graph theory Graph visualization

Community Detection. Community

Clustering CS 550: Machine Learning

Supplementary material to Epidemic spreading on complex networks with community structures

2. CONNECTIVITY Connectivity

Visual Representations for Machine Learning

Community Detection in Social Networks

ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT

Transcription:

CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models Algorithms Small diameter, Edge clustering Patterns of signed edge creation Viral Marketing, Blogosphere, Memetracking Scale-Free Densification power law, Shrinking diameters Strength of weak ties, Core-periphery Erdös-Renyi model, Small-world model Structural balance, Theory of status Independent cascade model, Game theoretic model Preferential attachment, Copying model Microscopic model of evolving networks Kronecker Graphs Decentralized search Models for predicting edge signs Influence maximization, Outbreak detection, LIM PageRank, Hubs and authorities Link prediction, Supervised random walks Community detection: Girvan-Newman, Modularity

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 3 We often think of networks looking like this: What led to such a conceptual picture?

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 4 How does information flow through the network? What structurally distinct roles do nodes play? What roles do different links (short vs. long) play? How do people find out about new jobs? Mark Granovetter, part of his PhD in 1960s People find the information through personal contacts But: Contacts were often acquaintances rather than close friends This is surprising: One would expect your friends to help you out more than casual acquaintances Why is it that acquaintances are most helpful?

[Granovetter 73] 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 5 Two perspectives on friendships: Structural: Friendships span different parts of the network Interpersonal: Friendship between two people is either strong or weak Structural role: Triadic Closure a b Which edge is more likely, a-b or a-c? c If two people in a network have a friend in common, then there is an increased likelihood they will become friends themselves.

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 6 Granovetter makes a connection between social and structural role of an edge First point: Structure Structurally embedded edges are also socially strong Long-range edges spanning different parts of the network are socially weak Second point: Information Long-range edges allow you to gather information from different parts of the network and get a job Structurally embedded edges are heavily redundant in terms of information access S S S a Weak W b S Strong

Triadic closure == High clustering coefficient Reasons for triadic closure: If B and C have a friend A in common, then: B is more likely to meet C (since they both spend time with A) B and C trust each other (since they have a friend in common) A has incentive to bring B and C together (as it is hard for A to maintain two disjoint relationships) Empirical study by Bearman and Moody: Teenage girls with low clustering coefficient are more likely to contemplate suicide 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 7 B A C

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 8 Define: Bridge edge If removed, it disconnects the graph Define: Local bridge Edge of Span > 2 (Span of an edge is the distance of the edge endpoints if the edge is deleted. Local bridges with long span are like real bridges) Define: Two types of edges: Strong (friend), Weak (acquaintance) Define: Strong triadic closure: Two strong ties imply a third edge Fact: If strong triadic closure is satisfied then local bridges are weak ties! S Bridge a a b Local bridge S S a S W W b Edge: W or S b S S S

Claim: If node A satisfies Strong Triadic Closure and is involved in at least two strong ties, then any local bridge adjacent to A must be a weak tie. Proof by contradiction: Assume A satisfies Strong Triadic Closure and has 2 strong ties Let A B be local bridge and a strong tie Then B C must exist because of Strong Triadic Closure But then A B is not a bridge! (since B-C must be connected due to Strong Triadic Closure property) 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 9 C S S S A A S S B

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 10 For many years Granovetter s theory was not tested But, today we have large who-talks-to-whom graphs: Email, Messenger, Cell phones, Facebook Onnela et al. 2007: Cell-phone network of 20% of country s population Edge strength: # phone calls

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 11 Edge overlap: O &' = N(i) N j N(i) N j N(i) a set of neighbors of node i Overlap = 0 when an edge is a local bridge

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 12 Cell-phone network Observation: Highly used links have high overlap! Legend: True: The data Permuted strengths: Keep the network structure but randomly reassign edge strengths Neighborhood overlap Permuted strengths True Edge strength (#calls)

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 13 Real edge strengths in mobile call graph Strong ties are more embedded (have higher overlap)

Same network, same set of edge strengths but now strengths are randomly shuffled 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 14

Size of largest component Low disconnects the network sooner Removing links by strength (#calls) Low to high High to low Fraction of removed links Conceptual picture of network structure 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 15

Size of largest component Low disconnects the network sooner Removing links based on overlap Low to high High to low Fraction of removed links Conceptual picture of network structure 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 16

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 17 Granovetter s theory leads to the following conceptual picture of networks Strong ties Weak ties

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 19 Granovetter s theory suggest that networks are composed of tightly connected sets of nodes Network communities: Communities, clusters, groups, modules Sets of nodes with lots of connections inside and few to outside (the rest of the network)

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 20 How to automatically find such densely connected groups of nodes? Ideally such automatically detected clusters would then correspond to real groups For example: Communities, clusters, groups, modules

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 21 Zachary s Karate club network: Observe social ties and rivalries in a university karate club During his observation, conflicts led the group to split Split could be explained by a minimum cut in the network

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 22 Find micro-markets by partitioning the query-to-advertiser graph in web search: query advertiser

Can we identify node groups? (communities, modules, clusters) 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu Nodes: Teams Edges: Games played 23

11/13/17 Nodes: Teams Edges: Games played Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 24 NCAA conferences

Can we identify social communities? 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu Nodes: Users Edges: Friendships 25

High school Company Stanford (Squash) Stanford (Basketball) Social communities 11/13/17 Nodes: Users Edges: Friendships Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 26

Can we identify functional modules? 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu Nodes: Proteins Edges: Interactions 27

11/13/17 Nodes: Proteins Edges: Interactions Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 28 Functional modules

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 29 How to find communities? We will work with undirected (unweighted) networks

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 30 Edge betweenness: Number of shortest paths passing over the edge Intuition: b=16 b=7.5 Edge strengths (call volume) in a real network Edge betweenness in a real network

[Girvan-Newman 02] 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 31 Divisive hierarchical clustering based on the notion of edge betweenness: Number of shortest paths passing through the edge Girvan-Newman Algorithm: Undirected unweighted networks Repeat until no edges are left: Calculate betweennessof edges Remove edges with highest betweenness Connected components are communities Gives a hierarchical decomposition of the network

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 32 12 1 33 49 Need to re-compute betweenness at every step

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 33 Step 1: Step 2: Step 3: Hierarchical network decomposition:

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 34 1. How to compute betweenness? 2. How to select the number of clusters?

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 35 Want to compute betweenness of paths starting at node A Breadth first search starting from A: 0 1 2 3 4

Forward step: Count the number of shortest paths from A to all other nodes of the network 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 36

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 37 Backward step: Compute betweenness: If there are multiple paths count them fractionally The algorithm: Add edge flows: -- node flow = 1+ child edges -- split the flow up based on the parent value Repeat the BFS procedure for each starting node U 1 path to K. Split evenly 1+0.5 paths to J Split 1:2 1+1 paths to H Split evenly

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 38 Backward step: Compute betweenness: If there are multiple paths count them fractionally The algorithm: Add edge flows: -- node flow = 1+ child edges -- split the flow up based on the parent value Repeat the BFS procedure for each starting node U 1 path to K. Split evenly 1+0.5 paths to J Split 1:2 1+1 paths to H Split evenly

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 39 1. How to compute betweenness? 2. How to select the number of clusters?

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 40 Communities: sets of tightly connected nodes Define: Modularity Q A measure of how well a network is partitioned into communities Given a partitioning of the network into groups s S: Q µ sî S [ (# edges within group s) (expected # edges within group s) ] Need a null model!

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 41 Given real G on n nodes and m edges, construct rewired network G Same degree distribution but random connections Consider G as a multigraph The expected number of edges between nodes i and j of degrees k i and k j equals to: k i k j = k ik j 2m 2m The expected number of edges in (multigraph) G : = 1 k i k j 2 i N j N = 1 1 k 2m 2 2m i j N k j = 1 2m 2m = m 4m i N = i j Note: F k H H I = 2m

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 42 Modularity of partitioning S of graph G: Q µ sî S [ (# edges within group s) (expected # edges within group s) ] Q G, S = 1 A 2m ij k ik j s S i s j s 2m Normalizing cost.: -1<Q<1 A ij = 1 if i j, 0 else Modularity values take range [ 1,1] It is positive if the number of edges within groups exceeds the expected number 0.3-0.7<Q means significant community structure

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 43 Modularity is useful for selecting the number of clusters: Q Why not optimize Modularity directly?

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 45 Let s split the graph into 2 communities! Want to directly optimize modularity! arg max R Q G, S = V A WX &' Z [Z \ ] R & ] ' ] WX Community membership vector s: s i = 1 if node i is in community 1-1 if node i is in community -1 Q G, s = V A WX &' Z [Z \ & I ' I WX = V A `X &' Z [Z \ &,' I s WX &s ' s & s ' + 1 2 ] [ ] \ _V W = 1.. if s i=s j 0.. else

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 46 Define: Modularity matrix: B ij = A ij k ik j 2m Membership: s = { 1, +1} Then: Q G, s = V A `X &' Z [Z \ & I ' I WX = V B `X &,' I &' s & s ' = V s `X & & ' B &' s ' = V `X sf Bs = B i s Note: each row/col of B sums to 0: A ij = k i j, k ik j j = k 2m i k j j Task: Find sî{-1,+1} n that maximizes Q(G,s) 2m = k i s &s '

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 47 Symmetric matrix A That is positive semi-definite: A = U U T Then solutions λ, x to equation A x = λ x : Eigenvectors x i ordered by the magnitude of their corresponding eigenvalues λ & (λ V λ W λ n ) x i are orthonormal (orthogonal and unit length) In other words, x i form a coordinate system (basis) If A is positive-semidefinite: λ & 0 (and they always exist) Eigen Decomposition theorem: Can rewrite matrix A in terms of its eigenvectors and eigenvalues: A = T i x i λ & x i

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 48 Rewrite: Q G, s = V `X sq Bs in terms of its eigenvectors and eigenvalues: n = s q F x & λ & x & f &tv n s = F s f x & λ & x & f s &tv n = F s f x & W λ & &tv So, if there would be no other constraints on s then to maximize Q, we make s = x n x Why? Because λ n λ nu1 2 Remember s has fixed euclidean length! Assigns all weight in the sum to λ n (largest eigenvalue) All other s T x i terms are zero because of orthonormality s x 1

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 49 Let s consider only the first term in the summation (because λ n is the largest): n arg max Q G, s = &tv s f x W & λ & s f x W n λ n ] Let s maximize: s T x n where s j Î{-1,+1} To do this, we set: s j = x +1 1 if x n,j 0 (j th coordinate of x n 0) if x n,j < 0 (j th coordinate of x n < 0) Continue the bisection hierarchically

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 50 Fast Modularity Optimization Algorithm: Find leading eigenvector x n of modularity matrix B Divide the nodes by the signs of the elements of x n Repeat hierarchically until: If a proposed split does not cause modularity to increase, declare community indivisible and do not split it If all communities are indivisible, stop How to find x n? Power method! Start with random v (0), repeat : When converged (v (t) v (t+1) ), set x n = v (t) ( t+ 1) v = Bv Bv ( t) ( t)

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 51 Girvan-Newman: Based on the strength of weak ties Remove edge of highest betweenness Modularity: Overall quality of the partitioning of a graph Use to determine the number of communities Fast modularity optimization: Transform the modularity optimization to a eigenvalue problem

[Ron Burt] 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 53 Who is better off, Robert or James?

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 54 Few structural holes Many structural holes Structural Holes provide ego with access to novel information, power, freedom

The network constraint measure [Burt]: c To what extent are person s contacts redundant Low: disconnected contacts High: contacts that are close or strongly tied = å i c p ik = å k p ij é ê p ë + å( p ) ik pkj i ij ij j j k p uv prop. of u s energy invested in relationship with v j p uv = 1/d u 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 55 ú û ù 2 2 p 12 =¼ 2 i 1 4 k p 15 =¼ p 25 =½ p uv 1 2 3 4 5 1.00.25.25.25.25 2.50.00.00.00.50 3 1.0.00.00.00.00 4.50.00.00.00.50 5.33.33.00.33.00 5 j

11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 56 Network constraint: James: c J = 0.309 Robert: c R = 0.148 Constraint: To what extent are person s contacts redundant Low: disconnected contacts High: contacts that are close or strongly tied

[Ron Burt] 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 57