Erdős-Rényi Model for network formation

Similar documents
Models of Network Formation. Networked Life NETS 112 Fall 2017 Prof. Michael Kearns

How Do Real Networks Look? Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

Constructing a G(N, p) Network

Constructing a G(N, p) Network

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Social, Information, and Routing Networks: Models, Algorithms, and Strategic Behavior

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

CAIM: Cerca i Anàlisi d Informació Massiva

1 Random Graph Models for Networks

1 More configuration model

CSCI5070 Advanced Topics in Social Computing

Economic Networks. Theory and Empirics. Giorgio Fagiolo Laboratory of Economics and Management (LEM) Sant Anna School of Advanced Studies, Pisa, Italy

Properties of Biological Networks

A Generating Function Approach to Analyze Random Graphs

RANDOM-REAL NETWORKS

UNIVERSITA DEGLI STUDI DI CATANIA FACOLTA DI INGEGNERIA

Graph Theory. Network Science: Graph theory. Graph theory Terminology and notation. Graph theory Graph visualization

Small-World Models and Network Growth Models. Anastassia Semjonova Roman Tekhov

Online Social Networks 2


CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

CS-E5740. Complex Networks. Scale-free networks

Critical Phenomena in Complex Networks

6.207/14.15: Networks Lecture 5: Generalized Random Graphs and Small-World Model

MIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns

M.E.J. Newman: Models of the Small World

Random Graph Model; parameterization 2

Overlay (and P2P) Networks

γ : constant Goett 2 P(k) = k γ k : degree

An Evolving Network Model With Local-World Structure

CSE 258 Lecture 12. Web Mining and Recommender Systems. Social networks

Random Simplicial Complexes

TELCOM2125: Network Science and Analysis

Mathematics of networks. Artem S. Novozhilov

Modelling data networks research summary and modelling tools

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

CSE 158 Lecture 11. Web Mining and Recommender Systems. Social networks

Distances in power-law random graphs

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

Analytical reasoning task reveals limits of social learning in networks

E6885 Network Science Lecture 5: Network Estimation and Modeling

1 Counting triangles and cliques

Higher order clustering coecients in Barabasi Albert networks

Network Thinking. Complexity: A Guided Tour, Chapters 15-16

Summary: What We Have Learned So Far

Wednesday, March 8, Complex Networks. Presenter: Jirakhom Ruttanavakul. CS 790R, University of Nevada, Reno

Introduction to network metrics

Complex Networks. Structure and Dynamics

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

Introduction to Networks and Business Intelligence

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies

Intro to Random Graphs and Exponential Random Graph Models

Small-world networks

MODELS FOR EVOLUTION AND JOINING OF SMALL WORLD NETWORKS

Randomized Rumor Spreading in Social Networks

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

An introduction to the physics of complex networks

SLANG Session 4. Jason Quinley Roland Mühlenbernd Seminar für Sprachwissenschaft University of Tübingen

A Study of Random Duplication Graphs and Degree Distribution Pattern of Protein-Protein Interaction Networks

Using graph theoretic measures to predict the performance of associative memory models

The Mathematical Description of Networks

Epidemic spreading on networks

Erdös-Rényi Graphs, Part 2

Networks in economics and finance. Lecture 1 - Measuring networks

On Complex Dynamical Networks. G. Ron Chen Centre for Chaos Control and Synchronization City University of Hong Kong

Social Networks. Slides by : I. Koutsopoulos (AUEB), Source:L. Adamic, SN Analysis, Coursera course

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Graph-theoretic Properties of Networks

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao

CPSC 536N: Randomized Algorithms Term 2. Lecture 4

TELCOM2125: Network Science and Analysis

CSE 158 Lecture 11. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

Towardsunderstanding: Astudy ofthe SourceForge.net community using modeling and simulation

An Investigation into the Free/Open Source Software Phenomenon using Data Mining, Social Network Theory, and Agent-Based

Exercise set #2 (29 pts)

Characteristics of Preferentially Attached Network Grown from. Small World

Hyperbolic Geometry of Complex Network Data

arxiv: v1 [physics.soc-ph] 13 Jan 2011

Randomized algorithms have several advantages over deterministic ones. We discuss them here:

Nick Hamilton Institute for Molecular Bioscience. Essential Graph Theory for Biologists. Image: Matt Moores, The Visible Cell

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

Network Mathematics - Why is it a Small World? Oskar Sandberg

Examples of Complex Networks

Heuristics for the Critical Node Detection Problem in Large Complex Networks

Midterm 2 Solutions. CS70 Discrete Mathematics and Probability Theory, Spring 2009

2010 SMT Power Round

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Network Science with Netlogo Tutorial

Signal Processing for Big Data

Graph Contraction. Graph Contraction CSE341T/CSE549T 10/20/2014. Lecture 14

1 Degree Distributions

Random Graphs CS224W

Branching Distributional Equations and their Applications

Computational complexity

Graph theoretic concepts. Devika Subramanian Comp 140 Fall 2008

Algorithms and Data Structures

CPSC 536N: Randomized Algorithms Term 2. Lecture 5

Transcription:

Network Science: Erdős-Rényi Model for network formation Ozalp Babaoglu Dipartimento di Informatica Scienza e Ingegneria Università di Bologna www.cs.unibo.it/babaoglu/ Why model? Simpler representation of possibly very complex structures Can gain insight into how networks form and how they grow May allow mathematical derivation of certain properties Can serve to explain certain properties observed in real networks Can predict new properties or outcomes for networks that do not even exist Can serve as benchmarks for evaluating real networks 2 Modeling approaches Random models choices independent of current network structure Erdős-Rényi (ER) Watts-Strogatz (clustered) Strategic models choices depend on current network structure Barabási-Albert (preferential attachment) Limited knowledge models choices based on local information only Newscast Cyclone Network is undirected Start with all isolated nodes (no edges) and add edges between pairs of nodes one at a time randomly Perhaps the simplest (dumbest) possible model Very unlikely that real networks actually form like this (certainly not social networks) Yet, can predict a surprising number of interesting properties Two possible choices for adding edges randomly: Randomize edge presence or absence Randomize node pairs 3 4

Randomize edge presence/absence Two parameters Number of nodes: n Probability that an edge is present: p For each of the n(n 1)/2 possible edges in the network, flip a (biased) coin that comes up heads with probability p If coin flip is heads, then add the edge to the network If coin flip is tails, then don t add the edge to the network Also known as the G(n, p) model (graph on n nodes with probability p) Randomize edge presence/absence Example: n=5, p=0.6 Number of possible edges: n(n 1)/2=5 4/2=10 Ten flips of a coin that comes up heads 60%, tails 40% Add the edges corresponding to the heads outcomes 5 6 Randomize edge presence/absence Degree distribution 5 2 4 Frequency 3 2 1 0 0 1 2 3 4 2 3 3 0 Degree Average node degree: p(n 1) What about node degree distribution? Expected average node degree: p(n 1)=0.6 4=2.4 Actual average node degree: (3+3+2+2+0)/5=2.0 Distribution 7 8

Degree distribution Need to quantify the probability that a node has degree k for all 0 k (n 1) A node has degree zero if all coin flips are tails A node has degree (n 1) if all coin flips are heads For a node to have degree k, the (n 1) coin flips must have resulted in k heads and (n 1 k) tails Since the probability of a heads is p, the probability of a tails is (1 p) Degree distribution The outcome k heads and (n 1 k) tails occurs with probability But there are (n 1) choose k ways in which this outcome can occur (the order of the flip results does not matter) Thus, the probability that a given node has degree k is given by the Binomial distribution 9 10 Binomial distribution Binomial distribution approximations for p small Binomial for n large Poisson n=8, p=0.5 n=8, p=0.1 Normal (Gaussian) Mean of the binomial distribution is µ=p(n 1) (which is also the average node degree we saw earlier) 11 12

Binomial distribution Binomial distribution Random network with n=50, p=0.08 p=.08, 50 nodes Degree distribution of random network with n=50, p=0.08 Actual data Poisson approximation 13 14 Binomial distribution Randomize node pairs Poisson distribution with different means µ Exponential decay Normal distribution with different means and standard deviations Alternative method for adding edges randomly Two parameters Number of nodes: n Number of edges: m Pick a pair of nodes at random among the n nodes and add an edge between them if not already present Repeat until exactly m edges have been added Also known as the G(n, m) model (graph on n nodes with m edges) For large n, the two versions of ER are equivalent 15 16

Example: n=5, m=4 Randomize node pairs vs real networks Degree distribution The two versions of the model are related through the equation for the number of edges: m=pn(n 1)/2 In the first case we pick p, and m is established by the model In the second case we pick m, and p is established by the model The above example corresponds to the second case where p=2m/n(n 1)=2 4/(5 4)=0.4 17 The ER model is a poor predictor of degree distribution compared to real networks The model results in Poisson degree distributions that have exponential decay Whereas most real networks exhibit power-law degree distributions that decay much slower than exponential 18 Erdős-Rényi diameter Recall that the diameter of a network is the longest shortest path between pairs of nodes Equivalently, the average distance between two randomly selected nodes In a connected network with n nodes, the diameter is in the range 1 (completely connected) to n 1 (linear chain) For a given n as we vary the model parameter p from 0 to 1, at some critical value of p, the diameter becomes finite (network becomes connected) and continues to decrease, becoming 1 when p=1 What is the relation between the diameter and p in the region where the network is connected? Erdős-Rényi diameter Suppose the model results in a tree-structured network of nodes with identical degrees, all equal to the average z=p(n 1) Starting from a given node, how many nodes can we reach in l steps? At step 1, reach z nodes then, reach z(z 1) new nodes then, reach z(z 1) 2 new nodes the number of new nodes reached grows exponentially with steps 19 20

Erdős-Rényi diameter Erdős-Rényi diameter After l steps, we have reached a total of z + z(z 1) + z(z 1) 2 + + z(z 1) l 1 nodes, which is z((z 1) l 1) / (z 2) which is roughly (z 1) l How many steps to reach (n 1) nodes? (z 1) l = (n 1) Roughly, l has to be on the order of log(n)/log(z) The diameter will be roughly twice log(n)/log(z) Confirms the empirical data we observed in real networks Can be shown to hold for the general ER model without the strong assumptions In reality, not all nodes have the same degree In reality, not tree-structured (there could be backwards edges) Proof based on a weaker set of conditions n large z (1 ε)log(n) for some ε>0 (connected) z/n 0 (but not too connected) 21 22 vs real networks Diameter Erdős-Rényi clustering coefficient Recall clustering coefficient of a node: probability that two randomly selected friends of it are friends themselves The ER model is a good predictor of diameter and average path length compared to real networks The model results in networks with small diameters, capturing very well the small-world property observed in many real networks Is the edge present? In the ER model, an edge between any two nodes is present with probability p (independent of their context) So, the clustering coefficient of the ER random network is equal to p 23 24

Erdős-Rényi clustering coefficient Erdős-Rényi clustering coefficient Example: n=5, p=0.6 1 1 0 Recall edge density of a network: actual number of edges in proportion to the maximum possible number of edges In the ER model, on average, pn(n 1)/2 edges are added, thus m=pn(n 1)/2 Edge density of ER network: 2/3 2/3 CC=(0+1+1+2/3+2/3)/5=0.6667 Compare with p which is 0.6 Since the edge density is exactly equal to the background probability of triangles being closed, the networks produced by the ER model cannot be considered highly clustered 25 26 vs real networks Clustering coefficient Erdős-Rényi giant component The ER model is a poor predictor of clustering compared to real networks The model results in clustering coefficients that are too small and too close to the edge density Whereas most real networks are often highly clustered with clustering coefficients that are much greater (sometimes several orders of magnitude) than their edge densities Suppose we add edges randomly with probability p If p=0, no edges added, so edge density of the network is 0 As p tends towards 1, the edge density tends towards 1 In fact, for the ER model, edge density follows the edge probability exactly What structural properties are likely at a given density #? When do certain structures emerge as a function of #? Many interesting properties occur at small densities And they occur very suddenly (tipping points) 27 28

Erdős-Rényi giant component Note that at edge density #, the expected node degree is #(n 1)#n for large n Run the NetLogo GiantComponent simulation In the ER model, giant components start forming at very low values of edge density For large n, we can show that If # < 1/n, the probability of a giant component tends to 0 If # > 1/n, the probability of a giant component tends to 1 and all other components have size at most log(n) At the tipping point #=1/n, the average node degree is #n=1 Network is very sparse but ER uses edges very efficiently Erdős-Rényi giant component Why is it very unlikely that two large components form? Run the NetLogo ErdosRenyiTwoComponents simulation Suppose two large components containing roughly half the nodes each do form in the ER model missing edges n/2 nodes n/2 nodes 29 30 Erdős-Rényi giant component Erdős-Rényi giant component How many potential edges are missing? The number of cross component edges is n/2 n/2=n 2 /4 Compare to the total number of possible edges: n(n 1)/2 In other words, more than half of all possible edges are missing Selecting a new edge to add that is not one of the missing cross edges becomes increasingly more unlikely Imagine enrolling 10,000 friends to Facebook asking them to keep their friendships strictly among themselves Impossible to maintain since all it takes is just one of the 10,000 to make one external friendship In those rare cases where two giant components have co-existed for a long time, their merger is sudden and often dramatic Imagine the arrival of the first Europeans in the Americas some 500 years ago Until then, the global socio-economic-technological network likely consisted of two giant components one for the Americas, another for Europe-Asia In the two components, not only technology, but also human diseases developed independently When they came in contact, the results were disastrous 31 32

Erdős-Rényi diameter Erdős-Rényi Other tipping points In the ER model, emergence of small diameter is also sudden and has a tipping point For large n, we can show that If # < n 5/6, the probability of the network having diameter 6 or less tends to 0 If # > n 5/6, the probability of the network having diameter 6 or less tends to 1 For the US, n=300m and the tipping point is #n25.8 For the world, n=7b and the tipping point is #n43.7 In fact, we can prove a much more general result In the ER model, any monotone property of networks has a tipping point In networks, a property is monotone if it continues to hold as we add more edges to the network Examples: The network has a giant component The diameter of the network is at most k The network contains a cycle of length at most k The network contains at most k isolated nodes The network contains at least k triangles 33 34 Erdős-Rényi Summary The ER model is able explain Small diameter, path lengths Giant components The ER model is not able explain Degree distributions Clustering 35