Random Graph Models


Agenda

- Random graphs: recap
- The giant component and the small-world phenomenon
- Statistics with random graphs
- Problems: degree distributions and triangles

Recall that a graph $G = (V, E)$ consists of a set of vertices $V$ and a set of edges $E \subseteq V \times V$. We can represent the graph with an adjacency matrix $A$, where $A_{ij} = 1$ if $(i, j) \in E$ and $A_{ij} = 0$ otherwise.

Random Graph Model; parameterization 1

The simplest model we can have for a graph is the random graph model $G(n, p)$, where $n$ is the number of nodes, $p$ is the probability of an edge, and all edges form independently with probability $p$. In the directed version, the $A_{ij}$ are independent $\mathrm{Bernoulli}(p)$ random variables, while in the undirected case $A_{ij} = A_{ji} \sim \mathrm{Bernoulli}(p)$; that is, you perform one Bernoulli trial for each pair of nodes and record the result in both $A_{ij}$ and $A_{ji}$. There are no dependencies whatsoever in the random graph model, and we only have to specify one parameter (the probability $p$). All edges are independent and equiprobable.

In a graph of $n$ nodes, each node can have edges to up to $n - 1$ other nodes. The degree of a node $i$ is thus the sum of $n - 1$ independent $\mathrm{Bernoulli}(p)$ variables, and so is $\mathrm{Binomial}(n - 1, p)$. Thus, with a constant $p$, the expected degree diverges to infinity as $n \to \infty$. A minimal sampler for this model is sketched below.
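As a concrete illustration (a minimal sketch of my own, not part of the original notes; the name sample_gnp and the parameter values are hypothetical), here is one way to sample an undirected $G(n, p)$ adjacency matrix with NumPy:

```python
import numpy as np

def sample_gnp(n, p, rng=None):
    """Sample the adjacency matrix of an undirected G(n, p) graph.

    One Bernoulli(p) trial per unordered pair {i, j}; the outcome is
    recorded in both A[i, j] and A[j, i], with no self-loops.
    """
    rng = np.random.default_rng() if rng is None else rng
    upper = np.triu(rng.random((n, n)) < p, k=1)  # trials on pairs i < j
    return (upper | upper.T).astype(int)

if __name__ == "__main__":
    n, p = 2000, 0.01
    A = sample_gnp(n, p, rng=np.random.default_rng(0))
    degrees = A.sum(axis=1)
    # Each degree is Binomial(n - 1, p), so the sample mean should be
    # close to (n - 1) * p = 19.99.
    print("mean degree:", degrees.mean())
```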

Random Graph Model; parameterization 2

We can also use an alternate parameterization for the random graph model, called parameterizing by mean degree. Let $\lambda = (n - 1)p$ (the mean degree). In this version, we hold $\lambda$ constant as $n \to \infty$ (so $p$ is decreasing).

The different parameterizations lead to different interpretations. In the first case, we have the dense graph (sequence) limit, in which the expected degree increases with $n$. In the second case, we have the sparse graph (sequence) limit, in which the expected degree stays constant, so the graph gets sparser and sparser. In the mean-degree parameterization, $\mathrm{Degree}(i)$ is approximately $\mathrm{Poisson}(\lambda)$ for large $n$; for this reason, random graphs with the mean-degree parameterization are sometimes called Poisson random graphs. A quick empirical check of this approximation is sketched below.
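This check is my own sketch (it assumes NumPy and SciPy; all names and values are hypothetical): sample a graph under the mean-degree parameterization and compare the empirical degree frequencies with the $\mathrm{Poisson}(\lambda)$ pmf.

```python
import numpy as np
from scipy.stats import poisson

lam, n = 5.0, 2000
p = lam / (n - 1)                      # mean-degree parameterization
rng = np.random.default_rng(1)
upper = np.triu(rng.random((n, n)) < p, k=1)
degrees = (upper | upper.T).sum(axis=1)

# Degrees are Binomial(n - 1, p), which is close to Poisson(lam)
# when n is large and p is small.
for k in range(10):
    print(f"k={k}: empirical {(degrees == k).mean():.4f}  "
          f"Poisson {poisson.pmf(k, lam):.4f}")
```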

How many connected components?

The number of connected components depends on $p$: at $p = 1$, clearly there is just one connected component, while at $p = 0$, each node is its own connected component. What is interesting is that at some magic intermediate value, there is a phase shift from many small connected components to one connected component of size $O(n)$ plus a few small ones. There are several arguments for this claim.

1. Self-consistency argument: Suppose that such a giant component exists, and say that a fraction $s > 0$ of all nodes are in this giant component (GC). For any node $i$, the probability that it is in the GC is $s$. Node $i$ is not in the GC if, for each of the other $n - 1$ nodes, either $i$ is not connected to that node (probability $1 - p$) or $i$ is connected to the node but that node is not in the GC (probability $p(1 - s)$). The probability that $i$ is not in the GC is thus

$$1 - s = (1 - p + p(1 - s))^{n-1} = (1 - ps)^{n-1} = \left(1 - \frac{s\lambda}{n - 1}\right)^{n-1}.$$

Hence, taking the log of both sides,

$$\log(1 - s) = (n - 1)\log\left(1 - \frac{s\lambda}{n - 1}\right).$$

As long as $x$ is small, $\log(1 + x) \approx x$, so

$$\log(1 - s) \approx -(n - 1)\frac{\lambda s}{n - 1} = -\lambda s,$$

and so

$$s = 1 - e^{-\lambda s}.$$

So what are the solutions for $s$? Clearly $s = 0$ is always a solution, but this is not a very helpful one. Can we get a solution where $s > 0$? If we plot $1 - e^{-\lambda s}$ for a range of $s$ and see where it intersects the line $y = s$, we can find the solutions. If $\lambda \leq 1$, it turns out that there is only the trivial solution; if $\lambda > 1$, there is also a non-trivial solution $s > 0$. The solution $s$ increases as $\lambda$ increases, and it is always bounded above by 1, since $s = 1 - e^{-\lambda s} \leq 1$. A numerical solver and a simulation check are sketched below.
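A minimal fixed-point solver for $s = 1 - e^{-\lambda s}$ (my own sketch; the function name giant_component_fraction is hypothetical). Iterating $s \leftarrow 1 - e^{-\lambda s}$ from $s = 1$ converges to the non-trivial root when $\lambda > 1$:

```python
import numpy as np

def giant_component_fraction(lam, tol=1e-12, max_iter=10_000):
    """Non-trivial root of s = 1 - exp(-lam * s), found by fixed-point
    iteration from s = 1; returns 0.0 when lam <= 1, where only the
    trivial solution exists."""
    if lam <= 1:
        return 0.0
    s = 1.0
    for _ in range(max_iter):
        s_new = 1.0 - np.exp(-lam * s)
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

for lam in [0.5, 1.5, 2.0, 4.0]:
    print(f"lambda={lam}: s = {giant_component_fraction(lam):.4f}")
```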

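To check the fixed-point prediction against data, this sketch (again my own; it assumes the networkx library) sweeps $\lambda$ and compares the largest component's share of nodes with the solved $s$:

```python
import numpy as np
import networkx as nx

def solve_s(lam, iters=200):
    """Fixed-point iteration for s = 1 - exp(-lam * s), from s = 1."""
    if lam <= 1:
        return 0.0  # only the trivial solution
    s = 1.0
    for _ in range(iters):
        s = 1.0 - np.exp(-lam * s)
    return s

n = 20_000
for lam in [0.5, 1.5, 2.0, 4.0]:
    G = nx.fast_gnp_random_graph(n, lam / (n - 1), seed=0)
    frac = max(len(c) for c in nx.connected_components(G)) / n
    print(f"lambda={lam}: simulated {frac:.3f}, theory {solve_s(lam):.3f}")
```

For $\lambda$ well above 1 the simulated fraction tracks the fixed-point solution closely.
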
2. Epidemic picture of the giant component: Consider a node $i$ in the graph. It has $Z_1$ first neighbors, where $Z_1 \sim \mathrm{Poisson}(\lambda)$. Each first neighbor has a random number of neighbors, as do their neighbors, and so on. The expected number of first neighbors is $E[Z_1] = \lambda$. The number of second neighbors is the sum, over the first neighbors, of their further neighbors, not including the initial vertex (we neglect overlaps because $\lambda$ is fixed and $n$ is large, so the probability of overlap is small):

$$Z_2 = \sum_{j=1}^{Z_1} \left(\mathrm{Poisson}(\lambda) - 1\right).$$

The expectation of each summand is $\lambda - 1$, so $E[Z_2] = E[Z_1](\lambda - 1) = \lambda(\lambda - 1)$. Hence, $E[Z_1 + Z_2] = \lambda + \lambda(\lambda - 1) = \lambda^2$. After $k$ steps, we expect about $\lambda^k$ nodes: $E[Z_1 + Z_2 + \cdots + Z_k] \approx \lambda^k$. If $\lambda < 1$, this tends to a finite number; if $\lambda > 1$, it tends to infinity.

This argument maps the connected component onto a branching process: each object $i$ produces $Z_i$ branches to new objects, and does this independently of the other objects. If $E[Z_i] \leq 1$, we get subcritical branching: with probability 1, the population will go extinct in finite time. If $E[Z_i] > 1$, we get supercritical branching: there is a positive probability that the population never goes extinct. The branching process lets us see how the giant component happens: as long as $E[Z] > 1$ (that is, $\lambda > 1$), it will keep growing, while if $\lambda \leq 1$, it will fizzle out. (A small simulation of this process is sketched at the end of this section.)

Because of the giant connected component, we get the small-world phenomenon in random graphs. The number of nodes reached after $k$ steps is roughly $O(\lambda^k)$, and the size of the giant component is about $O(n)$. How big can $k$ get inside the GC, i.e., how big can the diameter be?

$$O(\lambda^k) = O(n) \;\Rightarrow\; k\log\lambda = \log n \;\Rightarrow\; k = O\!\left(\frac{\log n}{\log\lambda}\right)$$

Thus we get the small-world phenomenon (the diameter is $O(\log n)$) in random graphs when $\lambda > 1$.
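The branching-process picture is easy to simulate. Below is a rough sketch of my own (the population cap max_pop is a hypothetical stand-in for "escaping to infinity") that estimates the extinction probability for several values of $\lambda$:

```python
import numpy as np

def goes_extinct(lam, rng, max_pop=10_000):
    """Run one branching process with Poisson(lam) offspring; return
    True if it dies out, False once the population exceeds max_pop."""
    pop = 1
    while 0 < pop <= max_pop:
        pop = rng.poisson(lam, size=pop).sum()  # next generation
    return pop == 0

rng = np.random.default_rng(2)
for lam in [0.8, 1.2, 2.0]:
    est = np.mean([goes_extinct(lam, rng) for _ in range(1000)])
    # For Poisson offspring, the survival probability solves
    # s = 1 - exp(-lam * s), the same giant-component equation as above.
    print(f"lambda={lam}: extinction probability ~ {est:.3f}")
```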

Statistics w/ Random Graphs

We can show that the likelihood is

$$L(p) = P(A; n, p) = \prod_{i<j} p^{A_{ij}} (1 - p)^{1 - A_{ij}},$$

and so the log-likelihood is

$$\ell(p) = \sum_{i<j} \left[\log(1 - p) + A_{ij} \log\frac{p}{1 - p}\right].$$

This produces the MLE for $p$:

$$\hat{p}_{\mathrm{MLE}} = \frac{\sum_{i<j} A_{ij}}{\binom{n}{2}};$$

that is, the most likely value for the density is the observed density. Furthermore, the likelihood belongs to an exponential family, so we know that the MLE is strongly consistent (it converges almost surely) and efficient (which gives Gaussian confidence intervals for large $n$). These are all great properties; what could go wrong?

Problems with the Random Graph Model

It turns out that the random graph model is a horrible model of any real network known to science. There are two big things that rule it out:

1. Degree distributions in real networks are not binomial/Poisson: real degree distributions are heavy-tailed and skewed, while the binomial/Poisson is much too light-tailed and symmetric.

2. Because edges form independently in the random graph model, there are too few triangles.

Triangles and transitivity

The probability that a given triple of nodes forms a triangle is $p^3$ (an independent probability $p$ for each of the three edges). Also, $P(\text{triangle} \mid \text{two edges connected}) = p$. But in real social networks, $P(\text{triangle} \mid \text{two edges connected}) > p$, while in some applications (like electric grids) $P(\text{triangle} \mid \text{two edges connected}) < p$. This gives us a lesson for model criticism: find quantities where (1) the model makes predictions and (2) you didn't fit to them, and then (3) check those predictions against data.

The problem with the random graph model is that everything is independent and exchangeable: there are always independent, equal probabilities of an edge between two nodes, so there is nothing intrinsically different about nodes. In other words, the model is too symmetric. To fix the problems, we have to introduce some sort of asymmetry. One thing we can do is make some edges dependent on each other; this produces latent-space models, exponential-family models, and stochastic block models. We could also let the probabilities vary; this gives inhomogeneous random graphs, block models, $p_1$ models, configuration models, and $\beta$-models. Finally, we could differentiate the nodes, producing covariate-based link prediction, block models, and some sorts of degree correction (where we have an idea that some nodes are more popular than others). The model-criticism recipe is illustrated below.
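As a small illustration of that recipe (my own sketch; it assumes networkx and uses Zachary's karate club network, which ships with the library), fit $\hat{p}$ by the observed density and compare the model's predicted transitivity, which is just $\hat{p}$, with the observed transitivity:

```python
import networkx as nx

G = nx.karate_club_graph()              # a small real social network
n, m = G.number_of_nodes(), G.number_of_edges()

# MLE under G(n, p): the observed density, sum_{i<j} A_ij / C(n, 2).
p_hat = m / (n * (n - 1) / 2)

# Under G(n, p), P(triangle | two edges connected) = p, so the model
# predicts transitivity close to p_hat; real social networks are far
# more transitive.
print(f"p_hat (observed density): {p_hat:.3f}")
print(f"observed transitivity:    {nx.transitivity(G):.3f}")
```

For the karate club, the observed transitivity is noticeably larger than the fitted density, exactly the failure mode described above.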