Random graph models with fixed degree sequences: choices, consequences and irreducibility proofs for sampling

Joel Nishimura (1), Bailey K. Fosdick (2), Daniel B. Larremore (3) and Johan Ugander (4)
(1) Arizona State Univ., (2) Colorado State, (3) Univ. of Colorado, (4) Stanford Univ.
ASU Discrete Math Seminar 2018

See the paper for the numerous connections to the literature.

What is notable about a graph?

Interpretation requires a Null Model. [Figure: karate club network.]

Interpretation requires a Null Model. [Figure: null models arranged by model error versus implementation difficulty and/or required understanding: Erdős–Rényi, fixed degree sequence, application-specific simulation, replicated experiments.]

Stub matching: convert edges to stubs, join 2 stubs, drop stub labels. Joining stubs gives a stub-labeled graph; dropping the stub labels gives a vertex-labeled graph.
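The three stub-matching steps can be sketched as follows. This is a minimal sketch, assuming the degree sequence is a list indexed by vertex; `stub_matching` is a hypothetical name, and self-loops and multiedges are kept rather than deleted. Shuffling the stubs and pairing consecutive entries gives a uniformly random perfect matching of the stubs.

```python
import random

def stub_matching(degrees, rng=random):
    """Sample one graph by uniform stub matching (hypothetical helper).

    degrees: list of non-negative ints with an even sum.
    Returns a list of (u, v) vertex pairs; self-loops (u, u) and
    repeated pairs (multiedges) are kept, not deleted.
    """
    if sum(degrees) % 2:
        raise ValueError("degree sum must be even")
    # "Edges to stubs": vertex v contributes degrees[v] stubs.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    # "Join 2 stubs": a uniform shuffle paired consecutively is a
    # uniformly random perfect matching of the stubs.
    rng.shuffle(stubs)
    # "Drop stub labels": keep only the vertex each stub belongs to.
    return [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]

edges = stub_matching([1, 2, 2, 1], random.Random(0))
print(edges)
```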

Self-loops: the same steps (convert edges to stubs, join 2 stubs, drop stub labels) can join two stubs of the same vertex, producing a self-loop.

Self-loops and Multiedges: Self-loops and multiedges are asymptotically rare (for reasonable degree sequences). They have frequently been ignored, or simply deleted, BUT they can also have large impacts on finite-sized null models. AND in null models which allow self-loops or multiedges, stub matching does not sample adjacency matrices uniformly at random.

Interpretation requires a Null Model. [Figure: as before, with the fixed degree sequence model now marked "???".]

[Figure: graph spaces by allowed features (simple, multiedges, self-loops) and labeling (vertex-labeled, stub-labeled).] Stub-labeled graphs are biased against multiedges and self-loops.
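The bias can be made precise with a standard counting argument, stated here under the assumed convention that $A_{ii}$ counts the self-loops at vertex $i$ (some texts store twice that number on the diagonal). Permuting the stubs within each vertex maps stub matchings to stub matchings, and the permutations that leave a given matching unchanged are exactly the reorderings of parallel edges and of loops plus the flips of each loop's two stubs. So a vertex-labeled graph with adjacency matrix $A$ and degrees $d_i$ corresponds to

```latex
N_{\mathrm{stub}}(A) \;=\; \frac{\prod_{i} d_i!}{\prod_{i<j} A_{ij}! \;\prod_{i} 2^{A_{ii}}\, A_{ii}!}
```

stub-labeled graphs. For a simple graph the denominator is 1, so every graph receives the same weight $\prod_i d_i!$ and uniform stub-labeled sampling coincides with uniform vertex-labeled sampling. Each multiedge or self-loop shrinks the weight.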

Consider the degree sequence {1, 2, 2, 1}. Uniform samples from d and e are the same; the other uniform samples are different.
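For {1, 2, 2, 1} the difference can be checked by brute force. The sketch below (hypothetical helpers `perfect_matchings` and `stub_counts_per_adjacency`) enumerates every matching of distinguishable stubs, drops the stub labels, and counts how many stub-labeled graphs map to each vertex-labeled graph (adjacency matrix).

```python
from collections import Counter

def perfect_matchings(items):
    """Yield every perfect matching of `items` as a list of pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        pair = (first, rest[k])
        remaining = rest[:k] + rest[k + 1:]
        for m in perfect_matchings(remaining):
            yield [pair] + m

def stub_counts_per_adjacency(degrees):
    """Count stub-labeled graphs (stub matchings) per vertex-labeled graph."""
    # Label each stub as (vertex, stub index) so stubs are distinguishable.
    stubs = [(v, s) for v, d in enumerate(degrees) for s in range(d)]
    counts = Counter()
    for matching in perfect_matchings(stubs):
        # Dropping stub labels leaves a multiset of vertex pairs,
        # which carries exactly the information in the adjacency matrix.
        key = tuple(sorted(tuple(sorted((a[0], b[0]))) for a, b in matching))
        counts[key] += 1
    return counts

counts = stub_counts_per_adjacency([1, 2, 2, 1])
for graph, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(c, graph)
```

There are 15 stub matchings but only 6 vertex-labeled graphs. The two simple paths each account for 4 matchings, while the graph with two self-loops accounts for 1, so uniform stub matching returns a simple path with probability 4/15 each and the two-loop graph with probability 1/15. A uniform sample over vertex-labeled graphs would give each of the 6 graphs probability 1/6.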

There's a choice of graphs, and it matters.

Example 1: Geometer's collaboration graph (n = 9,072, m = 22,577). Nodes: computational geometry researchers. Edges: collaboration on a book or paper. Degree assortativity: do high-productivity authors coauthor with other high-productivity authors?
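Degree assortativity is commonly computed as the Pearson correlation between the values at the two ends of each edge, counting every edge in both directions. A minimal sketch, assuming an edge-list input; `assortativity` and `degrees` are hypothetical helpers, and passing a numeric trait instead of degree gives trait assortativity. The observed value would then be compared against its distribution over samples from the chosen null model.

```python
from statistics import mean

def degrees(edges):
    """Degree of each vertex; a self-loop (u, u) adds 2 to deg[u]."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def assortativity(edges, value):
    """Pearson correlation of `value` across edge endpoints.

    Each undirected edge (u, v) is counted as (u, v) and (v, u), so the
    two coordinate lists have identical mean and variance.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return float("nan")  # undefined when all endpoint values are equal
    return cov / var

path = [(0, 1), (1, 2), (2, 3)]
print(assortativity(path, degrees(path)))
```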

Multiedges, self-loops, and labeling?

Stub-labeling isn't causal. Consider a collaboration network and two potential stub labelings. [Figure: two labelings of the stubs of node i and node j; in each, stub 1 is the first paper and stub 2 is the second paper.]

Vertex-Labeling is Causal. Consider a collaboration network. Nodes: authors. Edges: papers/books with unique titles. Suppose you order the edges by arrival. Each vertex-labeled graph has m! edge orderings, i.e. all adjacency matrices correspond to the same number of timelines in which the papers were produced in different orders.

Example 2: Swallow graph (n = 17). Nodes: barn swallows. Edges: bird-bird interactions. Trait assortativity (based on bird color): do birds of a similar color interact together?

Example 3: South Indian village social support network (n = 782). Nodes: villagers. Edges: reported social support. Community detection via modularity maximization: modularity has a built-in stub-labeled Chung–Lu null model. Do results change if we use a vertex-labeled model? [Figure: Chung–Lu estimation vs. # of edges observed in configuration models.]
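For reference, the standard Newman–Girvan modularity of a partition, with $A$ the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, $c_i$ the community of node $i$ and $\delta$ the Kronecker delta, is

```latex
Q \;=\; \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

The subtracted term $k_i k_j / 2m$ is the Chung–Lu expected edge count, which approximates the expected number of edges between $i$ and $j$ under stub matching. This is where the stub-labeled null model enters; replacing it with expectations under a vertex-labeled model changes the baseline that communities are measured against.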

Sampling graphs uniformly at random is surprisingly difficult (except for pseudographs).

Sampling graphs uniformly at random

Sampling via Markov chain Monte Carlo: G_0 → G_1 → G_2 → G_3 → G_4 → G_5. Goal: a sequence of degree-constrained graphs such that subsampling from this sequence approximates a set of graphs drawn uniformly at random.

Double Edge Swaps
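A double edge swap replaces two edges (u, v), (x, y) with either (u, x), (v, y) or (u, y), (v, x), which preserves every degree. A minimal sketch of the resulting MCMC for simple graphs, assuming invalid swaps (self-loops or multiedges) are rejected and counted as a step that stays put, and that the chain is subsampled every `spacing` proposals; `double_edge_swap_chain` and `spacing` are hypothetical names, not the authors' implementation.

```python
import random

def double_edge_swap_chain(edges, n_samples, spacing, rng=random):
    """Double edge swap MCMC over simple graphs with a fixed degree sequence.

    edges: list of (u, v) pairs forming a simple graph with >= 2 edges.
    Returns `n_samples` edge lists, one every `spacing` proposals.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    samples = []
    for step in range(1, n_samples * spacing + 1):
        # Propose: two distinct edges and one of the two rewirings.
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            new1, new2 = (u, x), (v, y)
        else:
            new1, new2 = (u, y), (v, x)
        # Reject swaps that would create a self-loop or a multiedge.
        ok = (new1[0] != new1[1] and new2[0] != new2[1]
              and frozenset(new1) not in present
              and frozenset(new2) not in present
              and frozenset(new1) != frozenset(new2))
        if ok:
            present -= {frozenset(edges[i]), frozenset(edges[j])}
            present |= {frozenset(new1), frozenset(new2)}
            edges[i], edges[j] = new1, new2
        # A rejected proposal still counts as a step (a GoG self-loop).
        if step % spacing == 0:
            samples.append(list(edges))
    return samples
```

Counting rejections as steps is what keeps the chain from favoring graphs that happen to have many valid swaps: between any two simple graphs one swap apart, the move and its reverse are proposed with the same probability.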

The Graph of Graphs (GoG)

Dealing with Constraints

Dealing with Constraints: no self-loops


MCMC requirements
1. Random walks can reach any graph: irreducibility, i.e. the GoG is connected.
2. Balanced transition probabilities: P(G_i → G_j) = P(G_j → G_i), i.e. edges will be weighted but undirected.
3. The Markov chain is aperiodic; otherwise subsampling can be biased.
NOTE: There are mixing time results for some degree sequences, and there are also numerical methods to gauge convergence. I will not discuss either.

Is the GoG periodic? Nope! Or

Are transition probabilities balanced? Stub-labeled GoG: the GoG is an undirected simple graph. Vertex-labeled GoG: the GoG is a directed pseudograph.


Is the GoG connected? This is the most difficult of the 3 questions, and it needs a special proof for each choice of self-loops/multiedges. The stub-labeled GoG is connected iff the vertex-labeled GoG is connected, because the following swap permutes stubs:

Connectivity of the Graph of Pseudographs. [Figure: start graph, target graph, and their difference; in the difference, the # of stubs per node satisfies # gold = # maroon.]

Connectivity of the Graph of Pseudographs: starting from the start graph, a swap can always find a graph one edge closer to the target.

Connectivity on other GoGs?

Disconnectivity of loopy graphs. Consider graphs with self-loops but no multiedges. There are no swaps between these graphs. Two directions for generalization: cycles and cliques.

Degree sequence: {2, 2, …, 2}. Swaps can:
1. Merge two cycles into a larger cycle (or do the reverse).
2. Swap two edges inside a cycle, preserving the cycle length.
3. Make a self-loop and reduce a cycle's length by 1 (or do the reverse), but only for cycles of length 4 or more.
Swaps cannot make every edge a self-loop. This can be further generalized.
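These claims can be checked by brute force on the smallest interesting case, {2, 2, 2, 2}. The sketch below (hypothetical helpers `swap_neighbors` and `reachable`, practical only for a handful of edges) finds the GoG component containing a starting graph by breadth-first search over double edge swaps, with flags for whether self-loops and multiedges are allowed.

```python
from collections import deque
from itertools import combinations

def swap_neighbors(graph, allow_loops, allow_multi):
    """All graphs one double edge swap away from `graph`.

    `graph` is a sorted tuple of sorted (u, v) pairs; (u, u) is a self-loop.
    """
    out = set()
    for i, j in combinations(range(len(graph)), 2):
        (u, v), (x, y) = graph[i], graph[j]
        rest = graph[:i] + graph[i + 1:j] + graph[j + 1:]
        for a, b in (((u, x), (v, y)), ((u, y), (v, x))):
            new_edges = [tuple(sorted(a)), tuple(sorted(b))]
            if not allow_loops and any(p == q for p, q in new_edges):
                continue
            candidate = tuple(sorted(rest + tuple(new_edges)))
            # A repeated pair in the multiset is a multiedge.
            if not allow_multi and len(set(candidate)) < len(candidate):
                continue
            if candidate != graph:
                out.add(candidate)
    return out

def reachable(graph, allow_loops=True, allow_multi=False):
    """Breadth-first search of the graph of graphs (GoG) from `graph`."""
    start = tuple(sorted(tuple(sorted(e)) for e in graph))
    seen, queue = {start}, deque([start])
    while queue:
        g = queue.popleft()
        for h in swap_neighbors(g, allow_loops, allow_multi):
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
loops = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(len(reachable(cycle)), len(reachable(loops)))
```

With self-loops allowed and multiedges forbidden, the search from the 4-cycle reaches 7 graphs (the three 4-cycles and the four triangle-plus-loop graphs), while the search from the all-loops graph reaches only itself, so this GoG is disconnected. Allowing multiedges as well puts both graphs back in one component.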

A taxonomy of V:
1) Let V_0 be the vertices without a self-loop.
2) Vertices in V_1 have a neighbor in V_0.
3) V_k are the vertices at distance k from a vertex in V_0.

Let V_k be the vertices k hops from a vertex without a self-loop.
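The layers V_k can be computed with a multi-source breadth-first search started from every loop-free vertex at once. A minimal sketch, assuming vertices 0..n-1 and an edge list in which (u, u) is a self-loop; `loop_distance_layers` is a hypothetical name, and vertices that cannot reach any loop-free vertex are given no layer (`None`).

```python
from collections import deque

def loop_distance_layers(n, edges):
    """Return layer[v] = k if v is in V_k, or None if no V_0 vertex is reachable."""
    has_loop = [False] * n
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u == v:
            has_loop[u] = True
        else:
            adj[u].add(v)
            adj[v].add(u)
    layer = [None] * n
    queue = deque()
    # V_0: every vertex without a self-loop starts the search at distance 0.
    for v in range(n):
        if not has_loop[v]:
            layer[v] = 0
            queue.append(v)
    # Multi-source BFS: the first time a vertex is reached fixes its distance.
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if layer[w] is None:
                layer[w] = layer[u] + 1
                queue.append(w)
    return layer

print(loop_distance_layers(5, [(0, 1), (1, 2), (2, 3), (1, 1), (2, 2), (3, 3), (4, 4)]))
```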

Degree sequence: {n+1, n+1, n−1, …, n−1}. No swaps are possible.

Q_1 and Q_2 are exactly the problems.

Proof outline of 4.20. [Figure: graphs with a fixed degree sequence, arranged by increasing number of self-loops and grouped into connected components.]

m*-loopy graphs: the graphs with the most self-loops (shown in yellow).

Note: connectivity of … follows from connectivity of simple graphs and an exchange lemma.

The GoG is disconnected iff there is some component where: U

Zooming into

Easy case: Harder case:

What do we know about? The maximum number of self-loops implies no open wedges in V_0, and no sequence of swaps can create a net open wedge in V_0.

Example: V_4 is empty in any … [Figure: open wedge.]

[Figure: Q_1 and Q_2.]

… is m*-loopy. Decreasing any degree in K_0 leaves V_{u1} with excess degree.

… is also m*-loopy, by an alternating cycle/path argument. Thus the question: can a different swap connect loopy graphs?

Triangle swaps connect the GoG

Bonus: other constraints
- Connected graphs: the GoG is known to be connected, but algorithms require complicated data structures to track the effect of edge changes.
- Graphs with the same clustering coefficients, or triangle constraints.

Triangle MCMC constraints
- Total number of triangles
- Number of triangles incident at each node
Do these affect connectedness in simple graphs?
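Both triangle constraints are straightforward to compute for a simple graph. A minimal sketch with hypothetical helpers `triangle_sequence` (triangles incident at each node) and `triangle_count` (total triangles), assuming vertices 0..n-1:

```python
from itertools import combinations

def triangle_sequence(n, edges):
    """Number of triangles incident at each vertex of a simple graph."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    tri = [0] * n
    for v in range(n):
        # Each adjacent pair of neighbors closes one triangle at v.
        tri[v] = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return tri

def triangle_count(n, edges):
    """Total triangles: each triangle is incident at exactly three vertices."""
    return sum(triangle_sequence(n, edges)) // 3

before = [(0, 1), (1, 2), (2, 0), (3, 4)]
after = [(0, 3), (1, 4), (1, 2), (2, 0)]  # swap (0, 1), (3, 4) -> (0, 3), (1, 4)
print(triangle_count(5, before), triangle_count(5, after))
```

The usage lines show why these constraints are restrictive: a single degree-preserving double edge swap turns a triangle plus a disjoint edge into a path, dropping the triangle count from 1 to 0, so a triangle-constrained chain must reject such moves.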

Can we constrain the number of triangles?

How about the triangle sequence?

And more!

Thanks for listening!