Network Analysis. 1. Large and Complex Networks. Scale-free Networks (Albert Barabási)

COMP4048 Information Visualisation, 2011, 2nd semester
Seokhee Hong

Network Analysis
1. Large and Complex Networks
Scale-free Networks: Albert Barabási, http://www.nd.edu/~networks/

Kevin Bacon Number
[Figure: chain of actors and films linked to Kevin Bacon. Labels: Austin Powers: The Spy Who Shagged Me; Robert Wagner; Wild Things; Let's Make It Legal; What Price Glory; Barry Norton; A Few Good Men; Monsieur Verdoux]

Society
- Nodes: individuals
- Links: social relationships (family, work, friendship, etc.)
- S. Milgram (1967); John Guare, Six Degrees of Separation
- Social networks: many individuals with diverse social interactions between them.

Communication networks
The Earth is developing an electronic nervous system, a network with diverse nodes and links:
- computers
- phone lines
- routers
- TV cables
- satellites
- EM waves
Communication networks: many non-identical components with diverse connections between them.

Complex systems
Made of many non-identical elements connected by diverse interactions: a NETWORK.

Erdős-Rényi model (1960)
Pál Erdős (1913-1996)
- Democratic
- Random
- Connect each pair of nodes with probability p
- Example: $p = 1/6$, $N = 10$, $\langle k \rangle \sim 1.5$
- Degree distribution: Poisson distribution

Cluster Coefficient
Clustering: my friends will likely know each other!
- Probability to be connected: $C \gg p$
- For a node with n neighbours:
  $C = \frac{\text{\# of links between the } 1, 2, \ldots, n \text{ neighbours}}{n(n-1)/2}$

Network        C          C_rand    L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Metabolic      0.32       0.026     2.9        282
Foodweb        0.22       0.06      2.43       134
C. elegans     0.28       0.05      2.65       282

Watts-Strogatz Model: Small World Networks
- C(p): clustering coefficient
- L(p): average path length
- Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)].
- (Watts and Strogatz, Nature 393, 440 (1998))

World Wide Web: scale-free networks
- Nodes: WWW documents
- Links: URL links
- 800 million documents (S. Lawrence, 1999)
- ROBOT: collects all URLs found in a document and follows them recursively
- R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999)

Degree distribution of the WWW
$P_{out}(k) \sim k^{-\gamma_{out}}$, $P_{in}(k) \sim k^{-\gamma_{in}}$
- What did they expect? $\langle k \rangle \sim 6$, $P(k=500) \sim 10^{-99}$, $N_{WWW} \sim 10^{9}$, so $N(k=500) \sim 10^{-90}$
- They find: $\gamma_{out} = 2.45$, $\gamma_{in} = 2.1$, $P(k=500) \sim 10^{-6}$, $N_{WWW} \sim 10^{9}$, so $N(k=500) \sim 10^{3}$

19 degrees of separation
- Finite size scaling: create a network with N nodes with $P_{in}(k)$ and $P_{out}(k)$
- $\langle l \rangle = 0.35 + 2.06 \log(N)$
- R. Albert et al Nature (99), based on 800 million webpages [S. Lawrence et al Nature (99)]
- IBM: A. Broder et al WWW9 (00)
[Figure: example web graph on nd.edu with nodes 1-7; $l_{15} = 2$ via path [1, 2, 5]; $l_{17} = 4$ via path [1, 3, 4, 6, 7]; $\langle l \rangle = ??$]
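Example (illustrative sketch, not part of the original slides): the cluster coefficient formula above can be evaluated directly from an adjacency structure. The toy graph, the function names, and the convention that C = 0 for a node with fewer than two neighbours are all assumptions made for this example.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    n = len(nbrs)
    if n < 2:
        # Assumed convention: C = 0 when there are no neighbour pairs.
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean of the local cluster coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

for v in sorted(toy):
    print(v, local_clustering(toy, v))
print("average C:", average_clustering(toy))
```

Node 2 has three neighbours but only one of the three possible links among them exists, so its coefficient is 1/3, while the two other triangle nodes score 1.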

What does it mean?
- Poisson distribution: Exponential Network
- Power-law distribution: Scale-free Network

INTERNET BACKBONE
- Nodes: computers, routers
- Links: physical lines
- (Faloutsos, Faloutsos and Faloutsos, 1999)

ACTOR CONNECTIVITIES
- Nodes: actors
- Links: cast jointly
- Example films: Days of Thunder (1990), Far and Away (1992), Eyes Wide Shut (1999)
- $N = 212{,}250$ actors, $\langle k \rangle = 28.78$
- $P(k) \sim k^{-\gamma}$, $\gamma = 2.3$

SCIENCE CITATION INDEX
- Nodes: papers
- Links: citations
- $P(k) \sim k^{-\gamma}$ ($\gamma = 3$) (S. Redner, 1998)
[Figure labels: Witten-Sander PRL 1981; 1736 PRL papers (1988); 25; 2212]

SCIENCE COAUTHORSHIP
- Nodes: scientists (authors)
- Links: write paper together
- (Newman, 2000, H. Jeong et al 2001)

SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed.
    Networks continuously expand by the addition of new nodes.
    Examples:
    - WWW: addition of new documents
    - Citation: publication of new papers
(2) The attachment is NOT uniform.
    A node is linked with higher probability to a node that already has a large number of links.
    Examples:
    - WWW: new documents link to well known sites (CNN, YAHOO, New York Times, etc.)
    - Citation: well cited papers are more likely to be cited again

Scale-free model
(1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability $\Pi$ that a new node will be connected to node i depends on the connectivity $k_i$ of that node:
    $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
Result: $P(k) \sim k^{-3}$
A.-L. Barabási, R. Albert, Science 286, 509 (1999)

Biological networks
- GENOME: protein-gene interactions
- PROTEOME: protein-protein interactions
- METABOLISM: bio-chemical reactions (e.g. Citrate Cycle)

Metabolic Network
- Nodes: chemicals (substrates)
- Links: bio-chemical reactions
- Archaea, Bacteria, Eukaryotes: organisms from all three domains of life are scale-free networks!
- H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)
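Example (illustrative sketch, not part of the original slides): the two ingredients of the scale-free model, growth and preferential attachment, can be simulated in a few lines. The function name, the choice of a small clique as the starting graph, and the rule that the m targets of a new node must be distinct are assumptions of this sketch; the slides do not specify them.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches m edges preferentially."""
    rng = random.Random(seed)
    # Assumed starting graph: a clique on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform draw
    # from this list picks node i with probability k_i / sum_j k_j.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(2000, 2, seed=1)
degree = Counter(v for e in edges for v in e)

# Degree histogram: many low-degree nodes, a few highly connected hubs.
histogram = Counter(degree.values())
for k in sorted(histogram)[:5]:
    print("k =", k, "nodes:", histogram[k])
print("largest degree:", max(degree.values()))
```

Drawing from the list of edge endpoints avoids computing the normalising sum explicitly; redrawing duplicates makes the attachment only approximately proportional to degree, which is usually acceptable for illustration.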

Yeast protein network
- Nodes: proteins
- Links: physical interactions (binding)
- P. Uetz, et al. Nature 403, 623-7 (2000).

Topology of the protein network
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)

p53 network (mammals)
Nature 408 307 (2000)
"One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a scale-free network."

UNCOVERING ORDER HIDDEN WITHIN COMPLEX SYSTEMS
[Figure labels: Complexity; Network; Scale-free network; Science collaboration; WWW; Food Web; Citation pattern; Internet; Cell]

Traditional modeling: Network as a static graph
- Given a network with N nodes and L links
- Create a graph with statistically identical topology
- RESULT: model the static network topology
- PROBLEM: Real networks are dynamical systems!

Evolving networks
- OBJECTIVE: capture the network dynamics
- METHOD:
  - identify the processes that contribute to the network topology
  - develop dynamical models that capture these processes
- BONUS: get the topology correctly.

Large and Complex Network
Scale-free networks
(a) Network modeling
    - Exponential growth
    - Preferential attachment
(b) Properties
    - Power-law degree distribution
    - High clustering coefficient
    - Ultra-small avg. path length: O(log log n)
Other networks
- Small-world networks
- Power-law random sparse networks (graphs): Fan Chung

2. Social Network Analysis
Basic Concepts and Terminology
- Social Network Analysis: Methods and Applications [Wasserman and Faust 94]
- Network Analysis: Methodological Foundations, LNCS 3418 Tutorial [Brandes and Erlebach eds. 04]

Social Network Analysis
Methodological approach using graph theoretic concepts to describe, understand, and explain social structure.
Network Model
- node, link, attribute
- weighted graph
- directed graph
Purpose/level of interest
(1) Centrality: important actors / crucial links
(2) Cohesive subgroups: components, cores, cliques
(3) Structural roles: positions, roles, clusters
(4) Network measures/statistics

(1) Centrality
- Degree: local measure
- Distance measures (global)
  - Betweenness [Freeman 79]: a vertex Y with low degree can still be important (broker, gatekeeper, intermediary); measured as the proportion of shortest paths connecting each pair X and Z which pass through Y
  - Closeness: based on the sum of shortest-path lengths to all the other vertices (a smaller sum means a more central vertex)
  - Eccentricity: length of the longest shortest path from the vertex
- Feedback measures
  - Status / Hub / authority / eigenvector
[Figure: example graph with labelled nodes A, B, C, G, M. Highest degree: A, B, C. Highest betweenness: B]
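Example (illustrative sketch, not part of the original slides): the distance-based measures above can be computed with breadth-first search on an unweighted, undirected graph. The toy "broker" graph and all function names are assumptions for this example; betweenness is computed with Brandes' accumulation scheme, a standard algorithm that the slides do not name, and is reported as an unnormalised count of shortest paths.

```python
from collections import deque


def bfs_distances(adj, s):
    """Shortest-path lengths from s to every reachable vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree(adj, v):
    return len(adj[v])


def farness(adj, v):
    """Sum of shortest-path lengths to all other vertices (lower = closer)."""
    return sum(bfs_distances(adj, v).values())


def eccentricity(adj, v):
    """Length of the longest shortest path starting at v."""
    return max(bfs_distances(adj, v).values())


def betweenness(adj):
    """Unnormalised betweenness via Brandes' dependency accumulation."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical graph: triangles a-b-c and e-f-g joined only through broker d.
g = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
bc = betweenness(g)
for v in sorted(g):
    print(v, degree(g, v), farness(g, v), eccentricity(g, v), bc[v])
```

In this toy graph the broker d has degree 2, lower than c and e, yet it lies on every shortest path between the two triangles and therefore has the highest betweenness.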

Displaying Centralities
- Node size, edge weight
- Radial drawing
- Hierarchical drawing

(2) Cohesive Subgroup
Meaningful social group
- Component: strong component / weak component
- Cycles / cyclic component
- Connected (k-connectivity) / isolated
- Cut vertex / separation pair
- Core
  - k-core: maximal subgraph in which each vertex is adjacent to at least k other vertices (vertex degree >= k)
- Clique: complete subgraph (strong/weak clique)
[Figure: example of a 3-core]

n-clique
- extension of clique
- n: maximum path length between members of the clique
- two limitations:
  1. n > 2: sociologically difficult to interpret
  2. the path may go through other, non-member vertices
[Figure: a 2-clique whose diameter is 3]

n-clan
- extension of n-clique, a more useful concept
- also requires that the diameter of the clique be no greater than n
[Figure: examples of a 1-clique, 2-clique and 3-clique; a 2-clique that is not a 2-clan; 2-clans]

k-plex
- Set of n vertices in which each vertex is adjacent to all except k of the other vertices (connected to n-k vertices)
- 1-plex = 1-clique: each vertex is connected to n-1 vertices
[Figure: a 3-clique that is not a 3-plex; a 2-clique that is a 3-plex]
- n-clique / n-clan: defined by reachability (path length)
- k-core / k-plex: defined by degree

(3) Network Positions and Structural Equivalence
- Positions / Roles
- Structural equivalence
- Block model (or image matrix): reduction of a complex network
- Clusters: cliques, distance, similarity
[Figure: adjacency matrix with rows 1..n (blocks n1, n2) and columns 1..m (blocks m1, m2) reduced to an image matrix]
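Example (illustrative sketch, not part of the original slides): a k-core can be found by repeatedly deleting vertices whose remaining degree is below k. This peeling procedure is a standard technique that the slides do not spell out; the function name and toy graph are assumptions.

```python
def k_core(adj, k):
    """Return the vertex set of the k-core (empty set if there is none)."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] < k:
                    stack.append(w)
    return alive


# Hypothetical graph: a complete graph on 1-4 with a tail 4-5-6.
g = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5},
}
for k in range(1, 5):
    print(k, sorted(k_core(g, k)))
```

Removing vertex 6 lowers the degree of vertex 5 below 2, so the tail unravels and both the 2-core and the 3-core reduce to the complete subgraph on vertices 1 to 4.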

Network Positions
Conceptualising similarity of social positions:
- Structural equivalence
- Automorphic equivalence
- Regular equivalence
- Outdegree and indegree equivalence
- Blockmodelling
- Generalised blockmodelling

Structural Equivalence
- The two green nodes are connected to exactly the same alters and are said to be structurally equivalent.
- They hold identical positions in the network.
- Formally, nodes a and b are structurally equivalent if, whenever (a,x) is an edge in G, then so is (b,x), and conversely ($x \neq a, b$).
- We can allocate nodes that are structurally equivalent to a (structural equivalence) class, or block, represented by colour in the diagram.

Blockmodel
For a structural equivalence relation:
- If a node in block x is connected to a node in block y, then every node in block x is connected to every node in block y.
- In this case we define an edge (x,y) between block x and block y, and the result is a reduced graph, or blockmodel, for the network.
[Figure: graph and its blockmodel]

Structural Equivalence for Directed Graphs
In a directed graph, nodes a and b are structurally equivalent if
(a) whenever (a,x) is an arc in G, then so is (b,x), and conversely, and
(b) whenever (x,a) is an arc in G, then so is (x,b), and conversely.
[Figure: directed graph and its blockmodel]

Automorphic Equivalence
- A permutation α of the node set V of a graph is a re-labelling of the nodes: we write α(a) for the node to which the label a is attached by the relabelling.
- A permutation α is an automorphism of the graph G if, whenever (a,x) is an edge in G, then (α(a),α(x)) is an edge in G, and conversely.
- Nodes a and b are automorphically equivalent if b = α(a) for some automorphism α.
- Example: nodes with the same colour in the graph are automorphically equivalent.

Why is automorphic equivalence interesting?
- Automorphically equivalent nodes have the same position in a network in a more abstract sense than structurally equivalent nodes: they are not connected to the exact same nodes, but to nodes that play analogous roles in the network.
- It is a potential representation for roles: leader, principal, broker, loner, clown, etc.
- Note: if nodes are structurally equivalent, then they are also automorphically equivalent, but the converse does not hold.
- Generalisation to directed and multiple graphs is straightforward, as for structural equivalence.
[Figure: graph and its blockmodel. Note: a thick edge on a node indicates a loop (self-tie)]
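Example (illustrative sketch, not part of the original slides): for an undirected graph without self-ties, the formal definition of structural equivalence amounts to comparing neighbour sets while ignoring a and b themselves. The function names and the toy graph are assumptions; grouping nodes greedily against one representative per class relies on the relation being transitive, which holds for simple graphs.

```python
def structurally_equivalent(adj, a, b):
    """True if a and b have the same neighbours, ignoring a and b themselves."""
    return adj[a] - {b} == adj[b] - {a}


def equivalence_classes(adj):
    """Partition the nodes into structural equivalence classes (blocks)."""
    classes = []
    for v in sorted(adj):
        for block in classes:
            if structurally_equivalent(adj, block[0], v):
                block.append(v)
                break
        else:
            classes.append([v])
    return classes


# Hypothetical graph: g1 and g2 are both tied to exactly x and y; z hangs off x.
g = {
    "g1": {"x", "y"}, "g2": {"x", "y"},
    "x": {"g1", "g2", "z"}, "y": {"g1", "g2"}, "z": {"x"},
}
print(equivalence_classes(g))
```

Only g1 and g2 share a block here: x and y both connect to g1 and g2, but x has the extra tie to z, so they are not structurally equivalent.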

Regular Equivalence
- Two nodes a and b are regularly equivalent if, whenever (a,x) is an edge in G, then there is some node y that is regularly equivalent to x for which (b,y) is an edge in G, and likewise with the roles of a and b exchanged.
- In other words, if regularly equivalent nodes have the same colour, then each node in a class of regularly equivalent nodes is connected to other nodes of exactly the same set of colours (e.g. red to yellow, green).
[Figure: graph (regular equivalence) and its blockmodel]

Why are these forms of equivalence of interest?
All of these forms of equivalence, and several others [Pattison 1993], have the property that there is a path at the block level (assuming a block-to-block tie whenever there is at least one node-level tie) if and only if there is at least one path at the node level.

Blockmodels and Generalised Blockmodels
- A blockmodel represents the relations among social positions, and comprises:
  - an assignment of nodes to blocks, or positions (classes of equivalent nodes), and
  - a specification of relations between blocks
- Different forms of equivalence are associated with different requirements for submatrix patterns of inter-block relations.
- The most common practical applications to date have involved structural equivalence (vast majority) and regular equivalence (niche applications), but software availability has played an important part.
- In a generalised blockmodel (Doreian et al., 2004, 2005), each submatrix may correspond to a different form of equivalence.

(4) Network Measures/Statistics
- Degree distribution
- Clustering coefficient
- Diameter
- Average path length
- Connected component
- Density

3. Network Analysis using Pajek
Vladimir Batagelj
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
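Example (illustrative sketch, not part of the original slides): given an assignment of nodes to blocks, the reduced graph can be built with the rule stated above, a block-to-block tie whenever there is at least one node-level tie. Density, one of the listed network measures, is included as the fraction of possible undirected links that are present. The function names, block labels, and toy graph are assumptions for this example.

```python
def blockmodel(adj, block_of):
    """Set of block-level ties: (X, Y) if some node in X is tied to some node in Y."""
    image = set()
    for v, nbrs in adj.items():
        for w in nbrs:
            image.add((block_of[v], block_of[w]))
    return image


def density(adj):
    """Number of links divided by the number of possible links n(n-1)/2."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each link is stored twice
    return m / (n * (n - 1) / 2)


# Hypothetical graph and block assignment: g1 and g2 share block "G".
g = {
    "g1": {"x", "y"}, "g2": {"x", "y"},
    "x": {"g1", "g2", "z"}, "y": {"g1", "g2"}, "z": {"x"},
}
block_of = {"g1": "G", "g2": "G", "x": "X", "y": "Y", "z": "Z"}

image = blockmodel(g, block_of)
print(sorted(image))
print("density:", density(g))
```

Because the graph is undirected, every block-level tie appears in both directions; a tie from a block to itself would indicate links inside that block.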

References
- Scale-free Networks: http://www.nd.edu/~networks/
- Social Network Analysis: INSNA http://www.insna.org
- SNA Tools
  - UCINET
  - Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
  - NetMiner
  - Visone
- Visual analysis tool
  - GEOMI: http://www.cs.usyd.edu.au/~visual/valacon/geomi/
  - Wilmascope

GEOMI (GEOmetry for Maximum Insight)
- Network Analysis Plug-ins
- Graph Layout Plug-ins
- Interaction Plug-ins

Analysis Methods
- Graph theory
  - Tree/planar/directed graph algorithms
  - Graph partitioning
  - Graph clustering
  - Graph decomposition
- Social Network Analysis
  - Centrality
  - Cohesive subgroups
  - Structural position/role
  - Block modeling

Graph/Network Models
- Graph models
  - Tree/planar graphs
  - Clustered graphs
  - Hierarchical graphs
  - Directed graphs
  - Hyper-Graphs
  - Hi-graphs
- Network models
  - Scale-free networks
  - Evolution networks
  - Dynamic networks
  - Temporal networks
  - Random sparse networks

Application Domains
- Social networks
  - Citation network
  - Collaboration network
  - Telephone call network
  - Policy network
- Biological networks
  - Phylogenetic networks
  - Metabolic pathways
  - Protein-Protein Interaction networks
  - Gene regulatory networks
  - Signaling networks
- Webgraphs / AS graphs
- Software engineering

2.5D Graph/Network Layouts
- Graph models
  - Trees
  - Planar graphs
  - Clustered graphs
  - Hierarchical graphs
  - Directed graphs
- Network models
  - Scale-free networks
  - Evolution networks
  - Dynamic networks
  - Temporal networks
  - Overlapping networks

Assignment 2: Programming assignment
1. Form a group of 2 people
2. Send me an email: group name and members
* Flexibility:
  - 1 person: fewer requirements
  - no exam option: more requirements

Homework
Find Visualisation of Social networks displaying
- Centrality analysis
- k-core analysis
- Structural equivalence
- Network Motifs
- Scale-free networks
- Small world networks