Random graph models with fixed degree sequences: choices, consequences and irreducibility proofs for sampling. Joel Nishimura (1), Bailey K. Fosdick (2), Daniel B. Larremore (3) and Johan Ugander (4). (1) Arizona State Univ., (2) Colorado State, (3) Univ. of Colorado, (4) Stanford Univ. ASU Discrete Math Seminar 2018
See the paper for the numerous connections to the literature.
What is notable about a graph?
Interpretation requires a Null Model. [Figure: karate club network.]
Interpretation requires a Null Model. [Figure: null models placed by model error against implementation difficulty and/or required understanding: Erdős-Rényi, fixed degree sequence, application-specific simulation, replicated experiments.]
Stub Matching: convert edges to stubs, join pairs of stubs, then drop the stub labels. [Figure panels: vertex-labeled and stub-labeled graphs.]
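A minimal sketch of stub matching, assuming degrees are given as a list indexed by vertex. Shuffling the list of stubs and pairing neighbors yields a uniformly random perfect matching of stubs; dropping the stub labels leaves a vertex-labeled multigraph that may contain self-loops and multiedges. The names `stub_matching` and `degree_sequence` are illustrative, not from the paper.

```python
import random


def stub_matching(degrees, rng=None):
    """Sample a multigraph by uniformly pairing stubs.

    Returns a list of edges (u, v) with u <= v; self-loops and
    multiedges are allowed, exactly as stub matching produces them.
    """
    rng = rng or random.Random()
    if sum(degrees) % 2:
        raise ValueError("degree sum must be even")
    # One stub per unit of degree, labeled only by its vertex.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    # A uniform shuffle induces a uniform perfect matching of the stubs.
    rng.shuffle(stubs)
    # Join consecutive stubs; keeping only vertex ids drops the stub labels.
    return [tuple(sorted(stubs[k:k + 2])) for k in range(0, len(stubs), 2)]


def degree_sequence(edges, n):
    """Degrees implied by an edge list; a self-loop adds 2 to its vertex."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```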
Self-loops: stub matching creates a self-loop whenever two stubs of the same vertex are joined (convert edges to stubs, join 2 stubs, drop stub labels).
Self-loops and Multiedges: Self-loops and multiedges are asymptotically rare (for reasonable degree sequences) and have frequently been ignored or simply deleted. BUT they can also have large impacts on finite-sized null models, AND in null models that allow self-loops or multiedges, stub matching does not sample adjacency matrices uniformly at random.
Interpretation requires a Null Model. [Figure: model error against implementation difficulty and/or required understanding for Erdős-Rényi, fixed degree sequence???, application-specific simulation, and replicated experiments.]
[Figure: spaces of simple graphs, graphs with multiedges, and graphs with self-loops, each either vertex-labeled or stub-labeled.] Stub-labeled graphs are biased against multiedges and self-loops.
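The bias can be made concrete with the standard count of how many stub matchings collapse to one vertex-labeled graph: prod_i d_i! divided by m_ij! for each multiedge of multiplicity m_ij and by l_i! 2^(l_i) for l_i self-loops at vertex i. Every simple graph receives the full prod_i d_i!, so graphs with multiedges or self-loops are down-weighted. A sketch; `stub_labelings` is a hypothetical helper name.

```python
from collections import Counter
from math import factorial, prod


def stub_labelings(edges, degrees):
    """Number of stub matchings that collapse to the vertex-labeled
    multigraph given by `edges` (a list of (u, v) pairs)."""
    mult = Counter(tuple(sorted(e)) for e in edges)
    numerator = prod(factorial(d) for d in degrees)  # all stub permutations
    denominator = 1
    for (u, v), m in mult.items():
        if u == v:
            # m self-loops at u: reorder them (m!) and flip each loop (2 each)
            denominator *= factorial(m) * 2 ** m
        else:
            # m parallel u-v edges can be permuted among themselves
            denominator *= factorial(m)
    return numerator // denominator
```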
Consider the degree sequence (1, 2, 2, 1). Uniform samples from (d) and (e) are the same; the other uniform samples are different.
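For (1, 2, 2, 1) the space is small enough to enumerate all 5!! = 15 stub matchings by brute force. The sketch below (helper names are illustrative) groups them by the vertex-labeled multigraph they produce, with vertices 0 to 3 standing for the four degrees in order.

```python
from collections import Counter


def perfect_matchings(items):
    """Yield every perfect matching of a list of distinct stubs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        pair = (first, rest[k])
        for tail in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [pair] + tail


def stub_matching_counts(degrees):
    """Map each vertex-labeled multigraph to the number of stub matchings
    that produce it."""
    # Stubs are labeled (vertex, stub index) so that they are distinct.
    stubs = [(v, s) for v, d in enumerate(degrees) for s in range(d)]
    counts = Counter()
    for matching in perfect_matchings(stubs):
        # Drop stub labels and sort edges to get a canonical graph.
        graph = tuple(sorted(tuple(sorted((a[0], b[0]))) for a, b in matching))
        counts[graph] += 1
    return counts


counts = stub_matching_counts([1, 2, 2, 1])
for graph, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(graph, c)
```

Under uniform stub matching, each of the two simple paths arises from 4 of the 15 matchings, while the graph with one edge and two self-loops arises from only 1; a uniform sample over the six vertex-labeled graphs would weight all six equally.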
There's a choice of graphs, and it matters.
Example 1: Geometers' collaboration graph (n = 9,072, m = 22,577). Nodes: computational geometry researchers. Edges: collaboration on a book or paper. Degree assortativity: do high-productivity authors coauthor with other high-productivity authors?
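Degree assortativity is commonly measured as the Pearson correlation between the degrees at the two ends of an edge (Newman's coefficient). A minimal sketch over an edge list, counting each edge in both orientations; how the paper treats self-loops and multiedges in this statistic is not specified here, so this version simply includes them as listed.

```python
def degree_assortativity(edges, n):
    """Pearson correlation of endpoint degrees, each edge counted both ways."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    # xs and ys hold the same multiset, so they share mean and variance.
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var  # undefined (ZeroDivisionError) for regular graphs


print(degree_assortativity([(0, 1), (0, 2), (0, 3)], 4))  # a star graph
```

Comparing an observed value against values computed on graphs sampled from a fixed-degree null model is where the choice of graph space would enter.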
Multiedges, self-loops, and labeling?
Stub-labeling isn't causal. Consider a collaboration network and two potential stub labelings: [Figure: in both labelings, node i has Stub 1: first paper, Stub 2: second paper, and node j has Stub 1: first paper, Stub 2: second paper.]
Vertex-Labeling is Causal. Consider a collaboration network. Nodes: authors. Edges: papers/books with unique titles. Suppose you order each edge's arrival. Each vertex-labeled graph has m! edge orderings, i.e., all adjacency matrices correspond to the same number of timelines in which the papers were produced in different orders.
Example 2: Swallow graph (n = 17). Nodes: barn swallows. Edges: bird-bird interactions. Trait assortativity (based on bird color): do birds of a similar color interact with each other?
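Trait assortativity for a categorical trait can be sketched with Newman's coefficient r = (sum_c e_cc - sum_c a_c^2) / (1 - sum_c a_c^2), where e_cd is the fraction of edge ends joining trait c to trait d and a_c = sum_d e_cd. This assumes bird color has been binned into categories; for a numeric color score, a Pearson correlation of the trait values at the two ends of each edge would be the analogue. The function name is illustrative.

```python
from collections import Counter


def trait_assortativity(edges, trait):
    """Newman's assortativity coefficient for a categorical vertex trait.

    `trait` maps each vertex to its category (an assumed input format).
    """
    e = Counter()
    for u, v in edges:
        # Count each undirected edge in both orientations.
        e[(trait[u], trait[v])] += 1
        e[(trait[v], trait[u])] += 1
    total = sum(e.values())
    cats = set(trait.values())
    same = sum(e[(c, c)] for c in cats) / total
    a = {c: sum(e[(c, d)] for d in cats) / total for c in cats}
    expected = sum(a[c] ** 2 for c in cats)
    # Undefined when every vertex shares one category (expected == 1).
    return (same - expected) / (1 - expected)
```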
Example 3: South Indian village social support network (n = 782). Nodes: villagers. Edges: reported social support. Community detection via modularity maximization: modularity has a built-in stub-labeled Chung-Lu null model. Do results change if we use a vertex-labeled model? [Figure: Chung-Lu estimation; # of edges observed in configuration models.]
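Modularity compares the observed within-community edges to the expectation k_i k_j / (2m) under the stub-labeled (Chung-Lu-style) null model, which for a partition reduces to Q = sum_c [L_c / m - (D_c / 2m)^2], where L_c counts edges inside community c and D_c is its total degree. A minimal sketch; the `community` mapping is an assumed input, not the output of any particular maximization algorithm.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an edge list."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # D_c: total degree of community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Observed fraction of internal edges minus the null-model expectation.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two disjoint triangles split into their natural communities this gives Q = 2 (3/6 - (6/12)^2) = 0.5.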
Sampling graphs uniformly at random is surprisingly difficult (except for pseudographs).
Sampling graphs uniformly at random
Sampling via Markov chain Monte Carlo: G_0 → G_1 → G_2 → G_3 → G_4 → G_5. Goal: a sequence of degree-constrained graphs such that subsampling from this sequence approximates a set of graphs drawn uniformly at random.
Double Edge Swaps
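A double edge swap removes two edges (a, b) and (c, d) and replaces them with either (a, c), (b, d) or (a, d), (b, c), so every vertex keeps its degree. A sketch on an edge list; the function name and the `flip` parameter are illustrative. It does not check constraints, so a sampler restricted to, say, simple graphs would still have to reject results containing self-loops or multiedges.

```python
def double_edge_swap(edges, i, j, flip=False):
    """Rewire edges[i] = (a, b) and edges[j] = (c, d) into (a, c), (b, d),
    or into (a, d), (b, c) when flip is True. Degrees are preserved."""
    (a, b), (c, d) = edges[i], edges[j]
    new1, new2 = ((a, d), (b, c)) if flip else ((a, c), (b, d))
    out = [e for k, e in enumerate(edges) if k not in (i, j)]
    # Store new edges as sorted pairs so that (u, v) and (v, u) coincide.
    out += [tuple(sorted(new1)), tuple(sorted(new2))]
    return out


print(double_edge_swap([(0, 1), (0, 2)], 0, 1))  # can create a self-loop
```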
The Graph of Graphs (GoG)
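For tiny degree sequences the graph of graphs can be built by brute force: enumerate every vertex-labeled graph in the chosen space, connect two graphs when one double edge swap turns one into the other, and count connected components. A sketch with hypothetical helper names; it ignores edge weights and directions in the GoG, which is enough for reachability because every swap can be undone by another swap. The running time is exponential, so it is only meant for a handful of vertices.

```python
from collections import deque
from itertools import combinations


def enumerate_graphs(degrees, loops, multi):
    """Every vertex-labeled graph with the given degrees, as a sorted edge tuple."""
    n = len(degrees)
    pairs = [(i, j) for i in range(n) for j in range(i, n)]  # lexicographic
    found = []

    def extend(k, remaining, edges):
        if k == len(pairs):
            if not any(remaining):
                found.append(tuple(edges))  # edges were added in sorted order
            return
        i, j = pairs[k]
        if i == j:
            cap = remaining[i] // 2 if loops else 0  # a self-loop uses two stubs
        else:
            cap = min(remaining[i], remaining[j])
        if not multi:
            cap = min(cap, 1)
        for m in range(cap + 1):
            rem = list(remaining)
            rem[i] -= m
            rem[j] -= m  # for a self-loop this subtracts 2m from vertex i
            extend(k + 1, rem, edges + [(i, j)] * m)

    extend(0, list(degrees), [])
    return found


def swap_neighbors(graph):
    """Graphs one double edge swap away (the caller applies constraints)."""
    out = set()
    for p, q in combinations(range(len(graph)), 2):
        (a, b), (c, d) = graph[p], graph[q]
        rest = [e for k, e in enumerate(graph) if k not in (p, q)]
        for e1, e2 in (((a, c), (b, d)), ((a, d), (b, c))):
            new = tuple(sorted(rest + [tuple(sorted(e1)), tuple(sorted(e2))]))
            if new != graph:
                out.add(new)
    return out


def gog_components(degrees, loops=False, multi=False):
    """Return (number of graphs, number of connected components of the GoG)."""
    graphs = set(enumerate_graphs(degrees, loops, multi))
    unseen, components = set(graphs), 0
    while unseen:
        components += 1
        queue = deque([unseen.pop()])
        while queue:
            g = queue.popleft()
            for h in swap_neighbors(g) & unseen:  # stay inside the graph space
                unseen.discard(h)
                queue.append(h)
    return len(graphs), components


print(gog_components([1, 1, 1, 1]))
print(gog_components([2, 2, 2], loops=True))
print(gog_components([1, 2, 2, 1], loops=True, multi=True))
```

For (1, 1, 1, 1) the three simple graphs form one component; for (2, 2, 2) with self-loops but no multiedges, the triangle and the three-self-loop graph are cut off from each other; and for (1, 2, 2, 1) the six pseudographs are connected.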
Dealing with Constraints
Dealing with Constraints: no self-loops
MCMC requirements: 1. Random walks can reach any graph (irreducibility, i.e., the GoG is connected). 2. Balanced transition probabilities: P(G_i → G_j) = P(G_j → G_i), i.e., edges will be weighted but undirected. 3. The Markov chain is aperiodic; otherwise subsampling can be biased. NOTE: There are mixing-time results for some degree sequences, and there are also numerical methods to gauge convergence. I will not discuss either.
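One common way to meet the symmetry and aperiodicity requirements for simple graphs: propose a swap of two uniformly chosen edges with a uniformly chosen rewiring, and if the result would contain a self-loop or multiedge, count the step but stay at the current graph. A sketch under those assumptions, for simple graphs only (the function name is illustrative); it says nothing about mixing time, and for spaces with multiedges or self-loops the transitions between vertex-labeled graphs are not automatically balanced, so it does not apply there as-is.

```python
import random


def sample_simple_graph(edges, steps, rng=None):
    """Double-edge-swap MCMC over simple graphs with a fixed degree sequence.

    Rejected proposals leave the chain where it is (a self-loop in the GoG),
    so every accepted move G -> G' is proposed with the same probability as
    its reverse G' -> G.
    """
    rng = rng or random.Random()
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        e1, e2 = tuple(sorted((a, c))), tuple(sorted((b, d)))
        if a == c or b == d or e1 in present or e2 in present:
            continue  # self-loop, multiedge, or trivial swap: stay put
        present -= {edges[i], edges[j]}
        present |= {e1, e2}
        edges[i], edges[j] = e1, e2
    return edges
```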
Is the GoG periodic? Nope! Or
Are transition probabilities balanced? Stub-labeled GoG: the GoG is an undirected simple graph. Vertex-labeled GoG: the GoG is a directed pseudograph.
Stub-labeled GoG: the GoG is an undirected simple graph.
Is the GoG connected? This is the most difficult of the 3 questions and needs a special proof for each choice of self-loops/multiedges. The stub-labeled GoG is connected iff the vertex-labeled GoG is connected, because the following swap permutes stubs:
Connectivity of the Graph of Pseudographs. [Figure: start, target, and their difference (diff).] In the difference, the # of stubs per node satisfies # gold = # maroon.
Connectivity of the Graph of Pseudographs. [Figure: start, swap, target.] We can always find a graph one edge closer to the target.
Connectivity on other GoGs?
Disconnectivity of loopy graphs. Consider graphs with self-loops but no multiedges: there are no swaps between these graphs. Two directions for generalizations: cycles and cliques.
Degree sequence: {2, 2, ..., 2}. Swaps can: 1. Merge two cycles into a larger cycle (or do the reverse). 2. Swap two edges inside a cycle, preserving cycle length. 3. Make a self-loop and reduce cycle length by 1 (or do the reverse), but only for cycles of length 4 or more. Swaps cannot make every edge a self-loop. This can be further generalized.
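For the {2, 2, ..., 2} case with self-loops allowed and multiedges forbidden, a small checker makes the swap rules concrete. The sketch (hypothetical helper `valid_swaps`, assuming edges are given as sorted (u, v) tuples) lists the graphs one swap away: on a 4-cycle it finds a swap producing a triangle plus a self-loop, while a triangle on its own and an all-self-loop graph admit no swaps at all.

```python
from itertools import combinations


def valid_swaps(edges):
    """Distinct graphs reachable by one double edge swap when self-loops are
    allowed but multiedges are not. Input edges are sorted (u, v) tuples."""
    edges = sorted(edges)
    results = set()
    for p, q in combinations(range(len(edges)), 2):
        (a, b), (c, d) = edges[p], edges[q]
        rest = set(edges) - {edges[p], edges[q]}
        for e1, e2 in (((a, c), (b, d)), ((a, d), (b, c))):
            e1, e2 = tuple(sorted(e1)), tuple(sorted(e2))
            if e1 == e2 or e1 in rest or e2 in rest:
                continue  # the swap would create a multiedge
            new = tuple(sorted(rest | {e1, e2}))
            if new != tuple(edges):  # skip swaps that change nothing
                results.add(new)
    return results


four_cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(valid_swaps(four_cycle))
```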
A taxonomy of V: 1) Let V_0 be the vertices without a self-loop. 2) Vertices in V_1 have a neighbor in V_0. 3) V_k are the vertices at distance k from a vertex in V_0.
Let V_k be the vertices k hops from a vertex without a self-loop.
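The layers V_k can be computed with a multi-source breadth-first search that starts from every vertex without a self-loop. A sketch, assuming an edge list with self-loops written as (v, v); the function name is illustrative, and vertices in a component where every vertex has a self-loop are never reached and are reported as None.

```python
from collections import deque


def loop_layers(n, edges):
    """dist[v] = k when v is in V_k, or None if v is not reached."""
    adj = [set() for _ in range(n)]
    has_loop = [False] * n
    for u, v in edges:
        if u == v:
            has_loop[u] = True
        else:
            adj[u].add(v)
            adj[v].add(u)
    dist = [None] * n
    queue = deque()
    for v in range(n):
        if not has_loop[v]:  # V_0: vertices without a self-loop
            dist[v] = 0
            queue.append(v)
    while queue:  # multi-source breadth-first search outward from V_0
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist
```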
Degree sequence: n+1, n+1, n-1, ..., n-1. No swaps are possible.
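The claim can be checked exhaustively for small n. Assuming, as in the surrounding slides, the space of graphs with self-loops but no multiedges, the sketch below brute-forces n = 5 (degrees 6, 6, 4, 4, 4) over all 2^15 subsets of possible edges and tests every resulting graph for a valid degree-preserving swap. Helper names are illustrative.

```python
from itertools import combinations, product


def loopy_graphs(degrees):
    """All graphs with self-loops allowed but no multiedges (brute force)."""
    n = len(degrees)
    slots = [(i, j) for i in range(n) for j in range(i, n)]
    for mask in product((0, 1), repeat=len(slots)):
        edges = [s for s, bit in zip(slots, mask) if bit]
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1  # a self-loop adds 2 to its vertex
        if deg == list(degrees):
            yield edges


def has_valid_swap(edges):
    """True if some double edge swap yields a different loopy graph."""
    present = set(edges)
    for (a, b), (c, d) in combinations(edges, 2):
        rest = present - {(a, b), (c, d)}
        for e1, e2 in (((a, c), (b, d)), ((a, d), (b, c))):
            e1, e2 = tuple(sorted(e1)), tuple(sorted(e2))
            if (e1 != e2 and e1 not in rest and e2 not in rest
                    and {e1, e2} != {(a, b), (c, d)}):
                return True
    return False


n = 5
degrees = [n + 1, n + 1] + [n - 1] * (n - 2)
graphs = list(loopy_graphs(degrees))
print(len(graphs), [has_valid_swap(g) for g in graphs])
```

For n = 5 it finds two graphs (the complete graph with self-loops on the two high-degree vertices, and a variant where the three low-degree vertices carry self-loops instead of forming a triangle), and neither admits a swap, so each is an isolated vertex of the GoG.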
Q_1 and Q_2 are exactly the problems.
Proof outline for 4.20. [Figure: graphs with a fixed degree sequence, arranged by increasing number of self-loops and grouped into connected components; graphs with the most self-loops shown in yellow.]
m*-loopy graphs: graphs with the most self-loops.
Note: connectivity of follows from connectivity of simple graphs and an exchange lemma.
The GoG is disconnected iff there is some component where: U
Zooming into
Easy case: Harder case:
What do we know about? The maximum number of self-loops implies no open wedges in V_0, and no sequence of swaps can create a net gain of open wedges in V_0.
Example: V_4 is empty in any [Figure: open wedge]
Q_1, Q_2
is m*-loopy. Decreasing any degree in K_0 leaves V_{u1} with excess degree.
is also m*-loopy, by an alternating cycle/path argument. Thus, Q: can a different swap connect loopy graphs?
Triangle swaps connect the GoG
Bonus: other constraints. Connected graphs: the GoG is known to be connected, but algorithms require complicated data structures to track the effect of edge changes. Graphs with the same clustering coefficients, or triangle constraints.
Triangle MCMC constraints: the total number of triangles; the number of triangles incident at each node. Do these affect connectedness in simple graphs?
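The two triangle constraints refer to statistics of a simple graph: the number of triangles incident at each node and their total. A sketch (illustrative function name) that counts each triangle exactly once by visiting vertex triples in increasing order; a double edge swap generally changes both numbers, which is why constraining them restricts which swaps are allowed.

```python
def triangle_counts(n, edges):
    """Triangles incident at each node of a simple graph, and the total."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    per_node = [0] * n
    for u in range(n):
        for v in adj[u]:
            if v > u:
                # Common neighbors w > v close a triangle u < v < w exactly once.
                for w in adj[u] & adj[v]:
                    if w > v:
                        per_node[u] += 1
                        per_node[v] += 1
                        per_node[w] += 1
    return per_node, sum(per_node) // 3
```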
Can we constrain the number of triangles?
How about the triangle sequence?
And more!
Thanks for listening!