Graph Theory and Network Measurement
Social and Economic Networks
MohammadAmin Fazli
ToC
- Network Representation
- Basic Graph Theory Definitions (SE)
- Network Statistics and Characteristics
- Some Graph Theory
Readings: Chapter 2 of the Jackson book; Chapter 2 of the Kleinberg book.
Network Representation
$N = \{1, 2, \ldots, n\}$ is the set of nodes (vertices).
A graph $(N, g)$ is described by an $n \times n$ matrix $g = [g_{ij}]$, where $g_{ij}$ represents a link (relation, edge) between node $i$ and node $j$.
- Weighted network: $g_{ij} \in \mathbb{R}$
- Unweighted network: $g_{ij} \in \{0, 1\}$
- Undirected network: $g_{ij} = g_{ji}$ for all $i, j$
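Network Representation
Edge-list representation: $g = \{12, 23\}$, where $ij$ denotes the link between $i$ and $j$.
Edge addition and deletion: $g + ij$ and $g - ij$.
Network isomorphism: $(N, g)$ and $(N', g')$ are isomorphic if there is a bijection $f: N \to N'$ such that $g_{ij} = g'_{f(i)f(j)}$ for all $i, j \in N$.
Subnetwork: $(N', g')$ is a subnetwork of $(N, g)$ if $N' \subseteq N$ and $g' \subseteq g$.
Induced (restricted) network: for $S \subseteq N$, $(g|_S)_{ij} = g_{ij}$ if $i \in S$ and $j \in S$, and $0$ otherwise.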
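The notation above maps directly onto Python sets. The sketch below is a minimal illustration, assuming an undirected, unweighted network without self-links; a link $ij$ is stored as a `frozenset` so that $ij$ and $ji$ coincide, and all function names are illustrative, not part of any library. The isomorphism test simply tries every bijection, so it is only usable on tiny graphs.

```python
from itertools import permutations


def link(i, j):
    """The undirected link ij, stored without order so that ij == ji."""
    return frozenset((i, j))


def add_link(g, i, j):
    """g + ij: a new link set with ij added."""
    return g | {link(i, j)}


def delete_link(g, i, j):
    """g - ij: a new link set with ij removed."""
    return g - {link(i, j)}


def restrict(g, S):
    """g|S: keep only the links whose two endpoints are both in S."""
    S = set(S)
    return {e for e in g if e <= S}


def adjacency_matrix(N, g):
    """0/1 matrix [g_ij]; rows and columns follow the order of the node list N."""
    return [[1 if link(i, j) in g else 0 for j in N] for i in N]


def isomorphic(N, g, N2, g2):
    """Try every bijection f: N -> N2 (tiny graphs only); ij in g iff f(i)f(j) in g2."""
    if len(N) != len(N2) or len(g) != len(g2):
        return False
    for image in permutations(N2):
        f = dict(zip(N, image))
        # f is a bijection and the link counts match, so "f(g) == g2" is the iff condition
        if {link(f[i], f[j]) for i, j in map(tuple, g)} == g2:
            return True
    return False


N = [1, 2, 3]
g = {link(1, 2), link(2, 3)}      # the edge list g = {12, 23}
print(adjacency_matrix(N, g))     # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Storing links as unordered sets makes $g + ij$, $g - ij$ and $g|_S$ one-line set operations, at the cost of not representing directed or weighted networks.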
Paths and Cycles
A walk is a sequence of links connecting a sequence of nodes: $W = (i_1 i_2, i_2 i_3, \ldots, i_{K-1} i_K)$ with $i_p i_{p+1} \in g$ for every $p$.
A path is a walk in which no node repeats.
A cycle is a walk that starts and ends at the same node ($i_K = i_1$) and in which no other node repeats.
The number of walks of length $k$ between nodes $i$ and $j$ is $(g^k)_{ij}$, the $ij$ entry of the $k$-th power of the (unweighted) adjacency matrix.
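The walk-counting rule can be checked numerically. A minimal pure-Python sketch, assuming an unweighted adjacency matrix given as a list of lists with nodes indexed from 0 (the helper names are hypothetical):

```python
def mat_mult(A, B):
    """Product of two matrices given as lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_power(g, k):
    """g^k by repeated multiplication, starting from the identity g^0 = I."""
    n = len(g)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mult(result, g)
    return result


def count_walks(g, i, j, k):
    """Number of walks of length k from node i to node j: the ij entry of g^k."""
    return mat_power(g, k)[i][j]


triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
print(count_walks(triangle, 0, 0, 3))   # 2: around the triangle in either direction
print(count_walks(triangle, 0, 1, 3))   # 3
```

Note that walks, unlike paths, may revisit nodes: two of the three length-3 walks from node 0 to node 1 pass through a node twice.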
Components & Connectedness
$(N, g)$ is connected if every two nodes in $g$ are connected by some path.
A component of a network $(N, g)$ is a non-empty subnetwork $(N', g')$ such that:
- $(N', g')$ is connected, and
- if $i \in N'$ and $ij \in g$, then $j \in N'$ and $ij \in g'$.
Strong connectivity and strongly connected components are defined analogously for directed graphs.
$C(N, g) = C(g)$ denotes the set of connected components of $g$.
The link $ij$ is a bridge if and only if $g - ij$ has more components than $g$.
A giant component is a component that contains a significant fraction of the nodes. There is usually at most one giant component.
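Components can be found with breadth-first search, and the bridge definition can be tested literally by deleting the link and recounting. A minimal sketch, assuming an undirected network given as a node list and an edge list of pairs; the function names are illustrative and the bridge test is deliberately naive (it recomputes all components on every call).

```python
from collections import deque


def components(nodes, edges):
    """Connected components of an undirected network, as a list of node sets (BFS)."""
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def is_bridge(nodes, edges, i, j):
    """ij is a bridge iff g - ij has more components than g."""
    rest = [e for e in edges if set(e) != {i, j}]
    return len(components(nodes, rest)) > len(components(nodes, edges))


nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(components(nodes, edges))         # two components: {1, 2, 3, 4} and {5}
print(is_bridge(nodes, edges, 3, 4))    # True
```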
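Special Kinds of Graphs
Star: one central node is linked to each of the other $n-1$ nodes, and there are no other links.
Complete graph: every pair of distinct nodes is linked, giving $n(n-1)/2$ links.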
Special Kinds of Graphs
Tree: a connected network with no cycle.
- A connected network is a tree if and only if it has $n-1$ links.
- A tree with at least two nodes has at least two leaves (nodes of degree 1).
- In a tree, there is a unique path between any pair of nodes.
Forest: a network whose components are all trees (a disjoint union of trees).
Cycle: a connected graph with $n$ nodes and $n$ links in which every node has degree 2.
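The characterization "connected with $n-1$ links" gives a direct tree test. A minimal sketch, assuming an undirected network given as a node list and an edge list of pairs; the function names are hypothetical.

```python
def is_tree(nodes, edges):
    """A connected network is a tree iff it has n - 1 links; check both conditions."""
    nodes = list(nodes)
    if len(edges) != len(nodes) - 1:
        return False
    adj = {v: [] for v in nodes}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    # Depth-first search from any node: the network is connected iff all nodes are reached
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def leaves(nodes, edges):
    """Nodes of degree 1."""
    deg = {v: 0 for v in nodes}
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [v for v in deg if deg[v] == 1]


star = [(1, 2), (1, 3), (1, 4)]
print(is_tree([1, 2, 3, 4], star), leaves([1, 2, 3, 4], star))   # True [2, 3, 4]
```

The link count alone is not enough: a triangle plus an isolated node has $n - 1 = 3$ links but is not connected, so `is_tree` rejects it.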
Neighborhood
$N_i(g) = \{j : g_{ij} = 1\}$
$N_i^2(g) = N_i(g) \cup \bigcup_{j \in N_i(g)} N_j(g)$
$N_i^k(g) = N_i(g) \cup \bigcup_{j \in N_i(g)} N_j^{k-1}(g)$
$N_S^k(g) = \bigcup_{i \in S} N_i^k(g)$
Degree: $d_i(g) = \# N_i(g)$.
For directed graphs, out-degree and in-degree are defined separately.
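The recursive definition of $N_i^k(g)$ can be transcribed almost literally. This sketch assumes the network is given as a dict mapping each node to the set of its neighbors and that $k \ge 1$; the names are illustrative. The recursion is exponential in $k$ and is meant only to mirror the formulas.

```python
def k_neighborhood(adj, i, k):
    """N_i^k(g) = N_i(g) united with N_j^{k-1}(g) for every neighbor j of i (k >= 1)."""
    if k == 1:
        return set(adj[i])
    result = set(adj[i])
    for j in adj[i]:
        result |= k_neighborhood(adj, j, k - 1)
    return result


def set_neighborhood(adj, S, k):
    """N_S^k(g): union of N_i^k(g) over i in S."""
    return set().union(*(k_neighborhood(adj, i, k) for i in S))


def degree(adj, i):
    """d_i(g) = #N_i(g)."""
    return len(adj[i])


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}   # the path 1-2-3-4
print(k_neighborhood(path, 1, 2))                # {1, 2, 3}
```

As the output shows, $N_i^k(g)$ contains $i$ itself once $k \ge 2$ and $i$ has a neighbor, because $i \in N_j(g)$ for each neighbor $j$.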
Degree Distribution
The degree distribution of a network describes the relative frequencies of nodes with different degrees: $P(d)$ is the fraction of nodes that have degree $d$ under the degree distribution $P$.
Most social and economic networks have (approximately) scale-free degree distributions.
A scale-free (power-law) distribution satisfies $P(d) = c\,d^{-\gamma}$ for constants $c > 0$ and $\gamma > 0$.
Free of scale: $P(2)/P(1) = P(20)/P(10)$; more generally, $P(kd)/P(d) = k^{-\gamma}$ depends only on the ratio $k$, not on the scale $d$.
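A minimal sketch of the empirical degree distribution and of the scale-free ratio property. The constants `c` and `gamma` in `power_law` are illustrative defaults, not values from the slides, and the function names are hypothetical.

```python
from collections import Counter


def degree_distribution(degrees):
    """P(d): the fraction of nodes with degree d, from a list of node degrees."""
    n = len(degrees)
    return {d: count / n for d, count in sorted(Counter(degrees).items())}


def power_law(d, c=1.0, gamma=2.5):
    """P(d) = c * d**(-gamma); c and gamma are illustrative, not fitted values."""
    return c * d ** (-gamma)


print(degree_distribution([1, 1, 2, 3]))   # {1: 0.5, 2: 0.25, 3: 0.25}
# Scale freeness: both ratios equal 2**-2.5 (up to rounding), since 2/1 == 20/10
print(power_law(2) / power_law(1), power_law(20) / power_law(10))
```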
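Degree Distribution
Scale-free distributions have fat tails: for large degrees, the fraction of nodes with that degree is much larger than in random graphs with the same average degree.
Taking logs of $P(d) = c\,d^{-\gamma}$ gives $\log P(d) = \log c - \gamma \log d$, so a power law appears as a straight line with slope $-\gamma$ on a log-log plot.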
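Because $\log P(d)$ is linear in $\log d$, a rough estimate of $\gamma$ is the negated slope of a least-squares line through the points $(\log d, \log P(d))$. The sketch below assumes `P` maps each degree $d > 0$ to $P(d) > 0$, with at least two distinct degrees; the function name is hypothetical. It is a diagnostic only: log-log regression on noisy empirical data is known to be biased, and maximum-likelihood estimators are usually preferred.

```python
import math


def fit_power_law(P):
    """Least-squares fit of log P(d) = log c - gamma * log d; returns (c, gamma)."""
    xs = [math.log(d) for d in P]
    ys = [math.log(p) for p in P.values()]   # same key order as xs
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope


# Exact power-law values (illustrative c = 0.3, gamma = 2.2) lie on a straight line
P = {d: 0.3 * d ** -2.2 for d in range(1, 50)}
print(fit_power_law(P))   # approximately (0.3, 2.2)
```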
Diameter & Average Path Length
The distance between two nodes is the length of the shortest path between them.
The diameter of a network is the largest distance between any two nodes.
Diameter is not a good measure of typical path lengths, since a single long shortest path determines it, but it does serve as an upper bound.
Average path length is a better measure.
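Distances in an unweighted network come from breadth-first search. A minimal sketch, assuming a connected, undirected network given as a dict of neighbor sets (on a disconnected network the lookup `d[t]` raises `KeyError`); the names are illustrative.

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances from s to every node reachable from it (unweighted network)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_avg_path_length(adj):
    """Largest and mean distance over all ordered pairs of distinct nodes."""
    nodes = list(adj)
    dists = []
    for s in nodes:
        d = bfs_distances(adj, s)
        dists.extend(d[t] for t in nodes if t != s)
    return max(dists), sum(dists) / len(dists)


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}   # the path 1-2-3-4
print(diameter_and_avg_path_length(path))        # diameter 3, average 20/12
```

Even on this tiny path the average distance ($5/3$) sits well below the diameter (3).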
Diameter & Average Path Length
The tale of six degrees of separation: "the diameter of SENs (social and economic networks) is 6!", based on Milgram's experiment.
The true story:
- The diameter of SENs may be large.
- The average path length is small, $O(\log n)$.
Diameter & Average Path Length
[Figure: the distance distribution in the graph of all active Microsoft Instant Messenger user accounts.]
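Cliquishness & Clustering
A clique is a maximal complete subgraph of a given network: a set $S \subseteq N$ such that $g|_S$ is a complete network and, for any $i \in N \setminus S$, $g|_{S \cup \{i\}}$ is not complete.
Removing a single link can destroy the whole clique structure: for example, removing the link $ij$ from a complete graph replaces its single clique $N$ with the two cliques $N \setminus \{i\}$ and $N \setminus \{j\}$.
An approximation is the clustering coefficient:
$Cl(g) = \frac{\sum_i \#\{jk \in g : k \ne j,\ j \in N_i(g),\ k \in N_i(g)\}}{\sum_i \#\{jk : k \ne j,\ j \in N_i(g),\ k \in N_i(g)\}}$
This is the overall clustering coefficient: the fraction of pairs of neighbors of a common node that are themselves linked.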
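Cliquishness & Clustering
Individual clustering coefficient for node $i$:
$Cl_i(g) = \frac{\#\{jk \in g : k \ne j,\ j \in N_i(g),\ k \in N_i(g)\}}{d_i(g)\,(d_i(g) - 1)/2}$
Average clustering coefficient: $Cl^{avg}(g) = \frac{1}{n} \sum_i Cl_i(g)$
The overall and average clustering coefficients may differ substantially.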
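The three clustering measures can be computed side by side. A minimal sketch, assuming an undirected network given as a dict of neighbor sets; treating nodes with fewer than two neighbors as having clustering 0 is a convention chosen here, since $Cl_i(g)$ is undefined for them. `windmill(m)` is a hypothetical helper that builds a center linked to $2m$ nodes matched into linked pairs.

```python
from itertools import combinations


def linked_pairs(adj, i):
    """Number of pairs of neighbors of i that are themselves linked."""
    return sum(1 for j, k in combinations(adj[i], 2) if k in adj[j])


def individual_clustering(adj, i):
    """Cl_i(g); returns 0.0 when d_i(g) < 2 (a convention, the ratio is undefined)."""
    d = len(adj[i])
    return linked_pairs(adj, i) / (d * (d - 1) / 2) if d >= 2 else 0.0


def average_clustering(adj):
    return sum(individual_clustering(adj, i) for i in adj) / len(adj)


def overall_clustering(adj):
    num = sum(linked_pairs(adj, i) for i in adj)
    den = sum(len(adj[i]) * (len(adj[i]) - 1) / 2 for i in adj)
    return num / den


def windmill(m):
    """Hypothetical example: center 0 linked to nodes 1..2m, matched into linked pairs."""
    adj = {0: set(range(1, 2 * m + 1))}
    for p in range(1, 2 * m + 1, 2):
        adj[p] = {0, p + 1}
        adj[p + 1] = {0, p}
    return adj


w = windmill(10)                                   # n = 21 nodes
print(overall_clustering(w), average_clustering(w))
```

For `windmill(10)` the overall clustering is $3/21$ while the average clustering is about 0.95, so the two measures tell very different stories about the same network.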
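Cliquishness & Clustering
The two measures can diverge completely: in some families of networks, average clustering goes to 1 while overall clustering goes to 0. For example, link a center node to all other $n-1$ nodes ($n$ odd) and match those nodes into linked pairs. Each non-center node has clustering 1 and the center has clustering $1/(n-2)$, so average clustering goes to 1, while overall clustering equals $3/n$ and goes to 0.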
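Transitivity
For a directed graph $g$, one can keep track of the percentage of transitive triples:
$Cl^{TT}(g) = \frac{\sum_{i;\ j \ne i;\ k \ne j,\, k \ne i} g_{ij} g_{jk} g_{ik}}{\sum_{i;\ j \ne i;\ k \ne j,\, k \ne i} g_{ij} g_{jk}}$
that is, the fraction of two-step paths $i \to j \to k$ for which the link $i \to k$ is also present.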
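A direct transcription of the transitive-triples ratio. The sketch assumes the directed network is a dict mapping each node to the set of nodes it links to (`j in out[i]` meaning $g_{ij} = 1$); the function name is hypothetical, and returning 0 when there are no two-step paths is a convention chosen here.

```python
def transitive_fraction(out):
    """Share of two-step paths i -> j -> k (i, j, k distinct) that also have i -> k."""
    num = den = 0
    for i in out:
        for j in out[i]:
            if j == i:
                continue
            for k in out.get(j, ()):
                if k in (i, j):
                    continue
                den += 1                 # g_ij g_jk = 1
                num += k in out[i]       # adds 1 when g_ik = 1 as well
    return num / den if den else 0.0


print(transitive_fraction({1: {2, 3}, 2: {3}, 3: set()}))   # 1.0
print(transitive_fraction({1: {2}, 2: {3}, 3: {1}}))        # 0.0 (a directed 3-cycle)
```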
Centrality
Centrality measures show how central a node is. Many different measures of centrality have been developed; they fall into four general categories:
- Degree: how connected a node is
- Closeness: how easily a node can reach other nodes
- Betweenness: how important a node is in connecting other nodes
- Neighbors' characteristics: how important, central, or influential a node's neighbors are
Degree Centrality
A simple measure: $\frac{d_i(g)}{n-1}$, the fraction of the other nodes to which $i$ is linked.
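Closeness Centrality
A simple measure, the inverse of the average distance from $i$ to the other nodes: $\left(\frac{\sum_{j \ne i} \ell(i,j)}{n-1}\right)^{-1}$, where $\ell(i,j)$ is the distance between $i$ and $j$.
Another measure (decay centrality): $\sum_{j \ne i} \delta^{\ell(i,j)}$, where $0 < \delta < 1$.
What does it measure for $\delta = 1$?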
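Degree, closeness, and decay centrality all follow from breadth-first distances. A minimal sketch, assuming an undirected network given as a dict of neighbor sets; closeness additionally assumes the network is connected, while decay centrality simply skips unreachable nodes. Function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, s):
    """Distances l(s, j) to every node j reachable from s (unweighted network)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj, i):
    """d_i(g) / (n - 1)."""
    return len(adj[i]) / (len(adj) - 1)


def closeness_centrality(adj, i):
    """(n - 1) / sum of l(i, j) over j != i; assumes a connected network."""
    dist = bfs_distances(adj, i)
    return (len(adj) - 1) / sum(d for j, d in dist.items() if j != i)


def decay_centrality(adj, i, delta):
    """Sum of delta**l(i, j) over nodes j != i reachable from i, with 0 < delta < 1."""
    dist = bfs_distances(adj, i)
    return sum(delta ** d for j, d in dist.items() if j != i)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(closeness_centrality(star, 0), closeness_centrality(star, 1))   # 1.0 and 4/7
```

In this star, with $\delta = 0.5$, decay centrality is 2 for the center and 1.25 for each leaf.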
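Betweenness Centrality
A simple measure, summing over unordered pairs $\{k, j\}$ of distinct nodes other than $i$:
$\sum_{\{k, j\}:\ k \ne j,\ i \notin \{k, j\}} \frac{P_i(kj)/P(kj)}{(n-1)(n-2)/2}$
where $P(kj)$ is the number of shortest paths between $k$ and $j$, and $P_i(kj)$ is the number of those that pass through $i$.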
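Betweenness needs the number of shortest paths between pairs. The sketch counts them with breadth-first search from every node and uses the fact that a shortest path from $k$ to $j$ passes through $i$ exactly when $\ell(k,i) + \ell(i,j) = \ell(k,j)$, in which case there are $P(ki) \cdot P(ij)$ such paths. It assumes an undirected network with $n \ge 3$ given as a dict of neighbor sets, and treats disconnected pairs as contributing 0. Names are illustrative; this is a simple illustration rather than an optimized method such as Brandes' algorithm.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, s):
    """BFS from s: distance to each reachable node and the number of shortest paths to it."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]     # every shortest path to v extends to w
    return dist, sigma


def betweenness(adj, i):
    """Sum over pairs {k, j} not containing i of P_i(kj)/P(kj), divided by (n-1)(n-2)/2."""
    n = len(adj)
    info = {v: shortest_path_counts(adj, v) for v in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for k, j in combinations([v for v in adj if v != i], 2):
        dist_k, sigma_k = info[k]
        if j not in dist_k or i not in dist_k:
            continue                     # no k-j path, or none that can reach i
        if dist_k[i] + dist_i[j] == dist_k[j]:
            total += sigma_k[i] * sigma_i[j] / sigma_k[j]
    return total / ((n - 1) * (n - 2) / 2)


cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
print(betweenness(cycle4, 1))   # 1/6: node 1 lies on one of the two shortest 2-4 paths
```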
Neighbor-Related Measures
Katz prestige: $P_i^K(g) = \sum_{j \ne i} g_{ij} \frac{P_j^K(g)}{d_j(g)}$
If we define $\hat g_{ij} = g_{ij} / d_j(g)$, we have $P^K(g) = \hat g P^K(g)$, or equivalently $(I - \hat g) P^K(g) = 0$.
Calculating Katz prestige therefore reduces to finding the eigenvector of $\hat g$ associated with the unit eigenvalue $\lambda = 1$.
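The eigenvector of $\hat g$ with eigenvalue 1 can be approximated by power iteration. A minimal sketch, assuming a connected, undirected network given as a 0/1 adjacency matrix with no isolated nodes (so every $d_j(g) > 0$). Plain iteration $P \leftarrow \hat g P$ can oscillate on bipartite networks, so the sketch uses the lazy update $P \leftarrow (P + \hat g P)/2$, which has the same fixed points; this damping is a choice made here, not part of the definition. The function name is hypothetical.

```python
def katz_prestige(g, iterations=1000):
    """Approximate P = g_hat P with g_hat[i][j] = g[i][j] / d_j, scaled to sum to 1.

    Assumes an undirected 0/1 adjacency matrix g with no isolated nodes.
    The lazy update P <- (P + g_hat P) / 2 has the same fixed points as
    P <- g_hat P but does not oscillate on bipartite networks.
    """
    n = len(g)
    deg = [sum(row) for row in g]                     # d_j(g)
    g_hat = [[g[i][j] / deg[j] for j in range(n)] for i in range(n)]
    p = [1.0 / n] * n                                 # start from the uniform vector
    for _ in range(iterations):
        gp = [sum(g_hat[i][j] * p[j] for j in range(n)) for i in range(n)]
        # columns of g_hat sum to 1, so sum(p) stays 1 throughout
        p = [(a + b) / 2 for a, b in zip(p, gp)]
    return p


star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
print(katz_prestige(star))   # approximately [0.5, 1/6, 1/6, 1/6]
```

On this star the result is proportional to degree (3, 1, 1, 1), which is what happens on any connected undirected network.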
Eigenvectors & Eigenvalues
For an $n \times n$ matrix $T$, a (right-hand) eigenvector $v$ is a non-zero $n \times 1$ vector for which $Tv = \lambda v$ for some scalar $\lambda$, its eigenvalue.
Left-hand eigenvector: a non-zero $1 \times n$ vector $v$ with $vT = \lambda v$.
Perron-Frobenius Theorem: if $T$ is a non-negative column-stochastic matrix (the entries in each column sum to one), then there exists a non-negative right-hand eigenvector $v$ with corresponding eigenvalue $\lambda = 1$. The same is true for left-hand eigenvectors and row-stochastic matrices.
Eigenvectors & Eigenvalues
How to calculate: $Tv = \lambda v$ is equivalent to $(T - \lambda I)v = 0$.
For this equation to have a non-zero solution $v$, $T - \lambda I$ must be singular (non-invertible): $\det(T - \lambda I) = 0$.
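For a $2 \times 2$ matrix, $\det(T - \lambda I) = \lambda^2 - \mathrm{tr}(T)\,\lambda + \det(T)$, so the eigenvalues are the roots of a quadratic, and any non-zero solution of $(T - \lambda I)v = 0$ is an eigenvector. A minimal sketch with illustrative names, using `cmath` so that complex eigenvalues are handled too:

```python
import cmath


def eigenvalues_2x2(T):
    """Roots of det(T - lam I) = lam^2 - tr(T) lam + det(T) = 0."""
    (a, b), (c, d) = T
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def eigenvector_2x2(T, lam):
    """A non-zero solution v of (T - lam I) v = 0, for an eigenvalue lam of T."""
    (a, b), (c, d) = T
    if abs(b) > 1e-12:
        return (b, lam - a)          # solves the first row: (a - lam) b + b (lam - a) = 0
    if abs(c) > 1e-12:
        return (lam - d, c)          # solves the second row: c (lam - d) + (d - lam) c = 0
    return (1, 0) if abs(a - lam) < 1e-12 else (0, 1)   # T is diagonal


T = [[0.5, 0.2],
     [0.5, 0.8]]                     # column-stochastic: each column sums to 1
lam1, lam2 = eigenvalues_2x2(T)
print(lam1, lam2, eigenvector_2x2(T, lam1))
```

Up to floating-point rounding, the eigenvalues are 1 and 0.3, and the eigenvector for $\lambda = 1$ is proportional to $(0.2, 0.5)$, non-negative as the Perron-Frobenius theorem guarantees for a column-stochastic matrix.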
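Neighbor-Related Measures
Computing Katz prestige for an example network [figure omitted] gives Katz prestige proportional to degree! This holds on every undirected network without isolated nodes: setting $P_j^K(g) = d_j(g)$ gives $\sum_j g_{ij} \frac{d_j(g)}{d_j(g)} = d_i(g)$.
So Katz prestige is not interesting on undirected networks, but it is interesting on directed networks.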
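Neighbor-Related Measures
To solve the problem:
Eigenvector centrality: $\lambda C_i^e(g) = \sum_j g_{ij} C_j^e(g)$, i.e. $\lambda C^e(g) = g\, C^e(g)$.
Katz2: $P^{K2}(g, a) = ag\mathbf{1} + a^2 g^2 \mathbf{1} + a^3 g^3 \mathbf{1} + \cdots = (I + ag + a^2 g^2 + \cdots)\, ag\mathbf{1} = (I - ag)^{-1} ag\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector and the series converges when $a$ is smaller than the reciprocal of the largest eigenvalue of $g$.
Bonacich: $Ce^B(g, a, b) = (I - bg)^{-1} ag\mathbf{1}$.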
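Both measures can be approximated iteratively. A minimal sketch, assuming an undirected network given as a 0/1 adjacency matrix. For eigenvector centrality it iterates with $g + I$, which has the same leading eigenvector as $g$ but does not oscillate on bipartite networks (for a connected network); for Bonacich centrality it iterates the fixed point $x = ag\mathbf{1} + bgx$, whose solution is $(I - bg)^{-1}ag\mathbf{1}$ and which converges when $b$ is below the reciprocal of the largest eigenvalue of $g$. Katz2 is the special case $b = a$. Names and iteration counts are illustrative.

```python
def mat_vec(g, x):
    """Matrix-vector product g x."""
    return [sum(gij * xj for gij, xj in zip(row, x)) for row in g]


def eigenvector_centrality(g, iterations=1000):
    """Power iteration with g + I, scaled so the entries sum to 1."""
    x = [1.0] * len(g)
    for _ in range(iterations):
        y = [a + b for a, b in zip(mat_vec(g, x), x)]   # (g + I) x
        total = sum(y)
        x = [v / total for v in y]
    return x


def bonacich(g, a, b, iterations=1000):
    """(I - b g)^{-1} a g 1 via the fixed point x = a g 1 + b g x.

    Converges only when b < 1 / (largest eigenvalue of g); Katz2 is bonacich(g, a, a).
    """
    base = [a * v for v in mat_vec(g, [1.0] * len(g))]   # a g 1
    x = list(base)
    for _ in range(iterations):
        x = [u + b * v for u, v in zip(base, mat_vec(g, x))]
    return x


path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]                    # largest eigenvalue sqrt(2), so b = 0.5 is admissible
print(bonacich(path, 1.0, 0.5))       # close to [4.0, 6.0, 4.0]
```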
Final Discussion about Centrality Measures
Matching
A matching is a subset of links no two of which share an endpoint.
Finding a maximum matching is an interesting problem, especially in bipartite graphs (recall Matching Markets).
A bipartite network $(N, g)$ is one whose node set $N$ can be partitioned into two sets $A$ and $B$ such that every link in $g$ has one endpoint in $A$ and the other in $B$.
A perfect matching covers all vertices.
Philip Hall's Theorem: for a bipartite graph $(N, g)$, there exists a matching that covers a set $C \subseteq A$ if and only if $|N_S(g)| \ge |S|$ for every $S \subseteq C$.
Proof: see the whiteboard.
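Maximum bipartite matchings are commonly computed with augmenting paths. The sketch below uses Kuhn's augmenting-path algorithm (a standard technique, not necessarily the one used in the course) together with a brute-force check of Hall's condition, so the theorem can be tested on tiny examples. It assumes the bipartite network is given as a dict mapping each node of $A$ to the set of its neighbors in $B$; names are illustrative.

```python
from itertools import combinations


def max_bipartite_matching(A, adj):
    """Kuhn's augmenting-path algorithm; returns a dict {b: a} of matched pairs."""
    match = {}                                   # node in B -> node in A

    def try_augment(a, visited):
        for b in adj[a]:
            if b in visited:
                continue
            visited.add(b)
            # b is free, or its current partner can be re-matched elsewhere
            if b not in match or try_augment(match[b], visited):
                match[b] = a
                return True
        return False

    for a in A:
        try_augment(a, set())
    return match


def hall_condition(C, adj):
    """|N_S(g)| >= |S| for every non-empty S subset of C (exponential; tiny examples only)."""
    for r in range(1, len(C) + 1):
        for S in combinations(C, r):
            if len(set().union(*(adj[a] for a in S))) < r:
                return False
    return True


adj = {"a1": {"b1", "b2"}, "a2": {"b1"}, "a3": {"b2"}}
print(len(max_bipartite_matching(["a1", "a2", "a3"], adj)))   # 2
print(hall_condition(["a1", "a2", "a3"], adj))               # False: 3 nodes, 2 neighbors
```

Running the matching on $C$ alone, `len(max_bipartite_matching(C, adj)) == len(C)` holds exactly when `hall_condition(C, adj)` does, which is Hall's theorem in executable form.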
Set Covering and Independent Set
Independent set: a subset of nodes $A \subseteq N$ such that $ij \notin g$ for all $i, j \in A$.
Consider two graphs $(N, g)$ and $(N, g')$ such that $g' \subseteq g$. Any independent set of $g$ is an independent set of $g'$. If $g' \subsetneq g$, there exist independent sets of $g'$ that are not independent sets of $g$.
Free-rider game on networks: each player either buys the book or borrows it for free from a neighbor who owns it. Indirect borrowing is not permitted. Each player prefers paying for the book to not having it, and prefers borrowing to paying. The equilibria are exactly the profiles in which the buyers form a maximal independent set.
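A greedy scan produces a maximal independent set, and the equilibrium condition of the free-rider game can be checked node by node. A minimal sketch, assuming a dict of neighbor sets; the names are illustrative.

```python
def greedy_maximal_independent_set(adj, order=None):
    """Scan nodes in order; add a node if none of its neighbors was already chosen."""
    chosen = set()
    for v in (list(adj) if order is None else order):
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen


def is_equilibrium(adj, buyers):
    """Free-rider game: no buyer has a buying neighbor (else it would rather borrow),
    and every non-buyer has a buying neighbor (else it would rather buy)."""
    for v in adj:
        has_buying_neighbor = bool(adj[v] & buyers)
        if v in buyers and has_buying_neighbor:
            return False
        if v not in buyers and not has_buying_neighbor:
            return False
    return True


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}   # the path 1-2-3-4
buyers = greedy_maximal_independent_set(path)
print(buyers, is_equilibrium(path, buyers))      # {1, 3} True
```

On this path the buyer sets {1, 3}, {2, 4} and {1, 4} are all equilibria, while {1} (nodes 3 and 4 have no book within reach) and {1, 2} (both would rather borrow) are not.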
Coloring
Example: we have a network of researchers in which a link between nodes $i$ and $j$ means that $i$ or $j$ wants to attend the other's presentation. How many time slots are needed to schedule all the presentations?
Presentations in the same time slot must not be linked, so assigning slots is the same as coloring the vertices so that no two neighboring nodes get the same color: the coloring problem.
The minimum number of colors needed is the chromatic number.
There are many results; the most famous is the Four Color Theorem: every planar graph can be colored with 4 colors. A planar graph is a graph that can be drawn so that no two edges cross each other.
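Greedy coloring always yields a valid schedule, though not necessarily one with the minimum number of slots. A minimal sketch, assuming a dict of neighbor sets and colors numbered 0, 1, 2, ...; the names are illustrative.

```python
def greedy_coloring(adj, order=None):
    """Give each node the smallest color not used by an already-colored neighbor.

    Always proper, but may use more colors than the chromatic number.
    """
    color = {}
    for v in (list(adj) if order is None else order):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def is_proper(adj, color):
    """No two neighbors share a color (no two linked presentations share a slot)."""
    return all(color[v] != color[w] for v in adj for w in adj[v])


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
slots = greedy_coloring(cycle5)
print(slots, len(set(slots.values())))   # 3 colors
```

On a 5-cycle scanned in order 0 to 4 it uses 3 colors, which here equals the chromatic number.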
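Coloring
Intuition: the six-color theorem: any planar graph can be colored with 6 colors.
Proof sketch:
- Euler's formula for connected planar graphs: $v + f = e + 2$.
- It implies $e \le 3v - 6$ for simple planar graphs with $v \ge 3$, so the average degree $2e/v$ is below 6 and the minimum degree satisfies $\delta \le 5$.
- Recursive coloring: remove a node of degree at most 5, color the rest, then give the removed node a color not used by its at most 5 neighbors.
Four colors are needed: for example, the complete graph on 4 nodes is planar and needs 4 colors.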
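The recursive argument translates into a smallest-last ordering: repeatedly remove a node of minimum degree, then color the nodes in reverse removal order. A minimal sketch, assuming a simple graph given as a dict of neighbor sets; on a non-planar input six colors may not suffice, in which case `min` receives no candidates and raises `ValueError`. Names are illustrative.

```python
def six_color_planar(adj):
    """Color a simple planar graph with at most 6 colors (0..5)."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    stack = []
    while remaining:
        # a node of minimum degree in what is left (at most 5 if the graph is planar)
        v = min(remaining, key=lambda u: len(remaining[u]))
        stack.append(v)
        for w in remaining.pop(v):
            remaining[w].discard(v)
    color = {}
    for v in reversed(stack):
        # already-colored neighbors are exactly v's neighbors when v was removed
        used = {color[w] for w in adj[v] if w in color}
        color[v] = min(c for c in range(6) if c not in used)
    return color


def is_proper(adj, color):
    return all(color[v] != color[w] for v in adj for w in adj[v])


# Octahedron: planar and 4-regular; nodes 2k and 2k+1 are the non-adjacent opposite pairs
octahedron = {v: {w for w in range(6) if w != v and w // 2 != v // 2} for v in range(6)}
colors = six_color_planar(octahedron)
print(colors, is_proper(octahedron, colors))
```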
Eulerian Tours & Hamilton Cycles
Euler tour: a closed walk that passes through every link exactly once.
Euler's theorem: a connected network $g$ has a closed walk that involves each link exactly once if and only if the degree of each node is even.
Proof sketch: induction on the number of links.
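An Euler tour can be constructed with Hierholzer's algorithm, a standard technique that is different from the induction proof but relies on the same even-degree condition. A minimal sketch, assuming an undirected multigraph given as a list of edges (pairs); it returns the tour as a list of nodes, or `None` if there are no edges, some degree is odd, or the edges do not form a single connected piece. Names are illustrative.

```python
from collections import defaultdict


def euler_tour(edges):
    """Hierholzer's algorithm: a closed walk using every edge exactly once, or None."""
    if not edges:
        return None
    adj = defaultdict(list)                   # node -> list of (neighbor, edge index)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    if any(len(incident) % 2 for incident in adj.values()):
        return None                           # Euler's condition fails
    used = [False] * len(edges)
    stack, tour = [edges[0][0]], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()                      # discard edges already traversed from the other end
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)                   # extend the current trail
        else:
            tour.append(stack.pop())          # dead end: v is final in the tour
    return tour[::-1] if all(used) else None  # unused edges mean more than one piece


bowtie = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)]   # two triangles sharing node 3
print(euler_tour(bowtie))
```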
Eulerian Tours & Hamilton Cycles
Hamilton cycle: a cycle that passes through every vertex.
Dirac's theorem: if a network has $n \ge 3$ nodes and each node has degree at least $n/2$, then the network has a Hamilton cycle.
Proof sketch:
- The graph is connected.
- Consider the longest path and prove that it can be closed into a cycle.
- Consider a node outside this cycle: by connectivity it could be attached to the cycle to form a longer path, a contradiction.
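Dirac's condition is easy to check, whereas finding a Hamilton cycle is hard in general; the sketch below pairs the condition with a brute-force search over node orderings, usable only on tiny graphs. It assumes a dict of neighbor sets; the names are illustrative.

```python
from itertools import permutations


def satisfies_dirac(adj):
    """n >= 3 and every node has degree at least n/2."""
    n = len(adj)
    return n >= 3 and all(len(adj[v]) >= n / 2 for v in adj)


def hamilton_cycle(adj):
    """Brute force over orderings of the nodes; returns a Hamilton cycle or None."""
    nodes = list(adj)
    n = len(nodes)
    if n < 3:
        return None
    first, rest = nodes[0], nodes[1:]          # a cycle through all nodes may start anywhere
    for perm in permutations(rest):
        cycle = [first, *perm]
        if all(cycle[(k + 1) % n] in adj[cycle[k]] for k in range(n)):
            return cycle
    return None


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(satisfies_dirac(cycle5), hamilton_cycle(cycle5))
```

The 5-cycle fails Dirac's condition (degree $2 < 5/2$) yet is itself a Hamilton cycle: the condition is sufficient, not necessary.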
Eulerian Tours & Hamilton Cycles
Chvátal's theorem: order the nodes of a network with $n \ge 3$ nodes in increasing order of degree, so that node 1 has the lowest degree and node $n$ the highest. If, for every $i < n/2$, $d_i \le i$ implies $d_{n-i} \ge n - i$, then the network has a Hamilton cycle.
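Chvátal's condition depends only on the degree sequence. A minimal sketch (the function name is illustrative) that uses the 1-indexed $d_i$ of the statement, stored 0-indexed in a Python list:

```python
def satisfies_chvatal(degrees):
    """For every i < n/2: d_i <= i implies d_{n-i} >= n - i (degrees sorted increasingly)."""
    d = sorted(degrees)          # d[i - 1] is d_i in the 1-indexed statement
    n = len(d)
    if n < 3:
        return False
    for i in range(1, n):
        if i >= n / 2:
            break
        if d[i - 1] <= i and d[n - i - 1] < n - i:
            return False         # found an i where the implication fails
    return True


print(satisfies_chvatal([3, 3, 3, 3]))      # True: the complete graph on 4 nodes
print(satisfies_chvatal([2, 2, 2, 2, 2]))   # False: the 5-cycle
```

Every degree sequence meeting Dirac's condition also meets this one, since $d_i \ge n/2 > i$ for all $i < n/2$; the 5-cycle fails both even though it is Hamiltonian, so these conditions are only sufficient.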