Transitivity and Triads

Similar documents
Statistical Methods for Network Analysis: Exponential Random Graph Models

Exponential Random Graph (p ) Models for Affiliation Networks

Statistical Modeling of Complex Networks Within the ERGM family

Exponential Random Graph (p*) Models for Social Networks

An Introduction to Exponential Random Graph (p*) Models for Social Networks

Structural Balance and Transitivity. Social Network Analysis, Chapter 6 Wasserman and Faust

Instability, Sensitivity, and Degeneracy of Discrete Exponential Families

Exponential Random Graph Models for Social Networks

Technical report. Network degree distributions

Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data

Statistical network modeling: challenges and perspectives

Fitting Social Network Models Using the Varying Truncation S. Truncation Stochastic Approximation MCMC Algorithm

STATISTICAL METHODS FOR NETWORK ANALYSIS

Chapter 4 Graphs and Matrices. PAD637 Week 3 Presentation Prepared by Weijia Ran & Alessandro Del Ponte

Social networks, social space, social structure

Tom A.B. Snijders Christian E.G. Steglich Michael Schweinberger Mark Huisman

Small and Other Worlds: Global Network Structures from Local Processes 1

Subgraph Frequencies:

Tom A.B. Snijders Christian E.G. Steglich Michael Schweinberger Mark Huisman

Snowball sampling for estimating exponential random graph models for large networks I

University of Groningen

CHAPTER 3. BUILDING A USEFUL EXPONENTIAL RANDOM GRAPH MODEL

Package hergm. R topics documented: January 10, Version Date

Tom A.B. Snijders Christian Steglich Michael Schweinberger Mark Huisman

This online supplement includes four parts: (1) an introduction to SAB models; (2) an

Networks and Algebraic Statistics

STATNET WEB THE EASY WAY TO LEARN (OR TEACH) STATISTICAL MODELING OF NETWORK DATA WITH ERGMS

arxiv: v2 [stat.co] 17 Jan 2013

Computational Issues with ERGM: Pseudo-likelihood for constrained degree models

Extending ERGM Functionality within statnet: Building Custom User Terms. David R. Hunter Steven M. Goodreau Statnet Development Team

The Network Analysis Five-Number Summary

Chapter 1. Introduction

Hierarchical Mixture Models for Nested Data Structures

Exponential Random Graph Models Under Measurement Error

An open software system for the advanced statistical analysis of social networks

Exponential Random Graph Models for Social Networks

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Network Structure and Diffusion

Structure of Social Networks

Tie strength, social capital, betweenness and homophily. Rik Sarkar

Bayesian Inference for Exponential Random Graph Models

Tied Kronecker Product Graph Models to Capture Variance in Network Populations

Algebraic statistics for network models

Monte Carlo Maximum Likelihood for Exponential Random Graph Models: From Snowballs to Umbrella Densities

A New Measure of Linkage Between Two Sub-networks 1

Dynamic organizations. How to measure evolution and change in organizations by analyzing communication networks

Unit 2: Graphs and Matrices. ICPSR University of Michigan, Ann Arbor Summer 2015 Instructor: Ann McCranie

Specific Communication Network Measure Distribution Estimation

Tied Kronecker Product Graph Models to Capture Variance in Network Populations

Short-Cut MCMC: An Alternative to Adaptation

4. Simplicial Complexes and Simplicial Homology

Network Data Sampling and Estimation

Models for Network Graphs

Package Bergm. R topics documented: September 25, Type Package

1 Homophily and assortative mixing

Recovering Temporally Rewiring Networks: A Model-based Approach

arxiv: v1 [cs.ds] 3 Sep 2010

Statistical Analysis and Data Mining, 5 (4):

Tom A.B. Snijders Christian E.G. Steglich Michael Schweinberger Mark Huisman

Discrete Mathematics Lecture 4. Harper Langston New York University

Auxiliary Parameter MCMC for Exponential Random Graph Models *

Modeling the Variance of Network Populations with Mixed Kronecker Product Graph Models

Lecture 3: Conditional Independence - Undirected

Specific Communication Network Measure Distribution Estimation

CHAPTER 4 WEIGHT-BASED DESIRABILITY METHOD TO SOLVE MULTI-RESPONSE PROBLEMS

Convexization in Markov Chain Monte Carlo

EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks

EVALUATION OF THE NORMAL APPROXIMATION FOR THE PAIRED TWO SAMPLE PROBLEM WITH MISSING DATA. Shang-Lin Yang. B.S., National Taiwan University, 1996

Math 443/543 Graph Theory Notes 10: Small world phenomenon and decentralized search

Computer vision: models, learning and inference. Chapter 10 Graphical Models

Image analysis. Computer Vision and Classification Image Segmentation. 7 Image analysis

A noninformative Bayesian approach to small area estimation

Assessing Degeneracy in Statistical Models of Social Networks 1

Chapter 3. Set Theory. 3.1 What is a Set?

The Structure of Bull-Free Perfect Graphs

Approximate Bayesian Computation. Alireza Shafaei - April 2016

PNet for Dummies. An introduction to estimating exponential random graph (p*) models with PNet. Version Nicholas Harrigan

Conditional Volatility Estimation by. Conditional Quantile Autoregression

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

MCMC Methods for data modeling

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

Markov Logic: Representation

Characterization of Super Strongly Perfect Graphs in Chordal and Strongly Chordal Graphs

Networks in economics and finance. Lecture 1 - Measuring networks

arxiv: v3 [stat.co] 4 May 2017

Web Structure Mining Community Detection and Evaluation

K-Means and Gaussian Mixture Models

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400

Loopy Belief Propagation

Computing Regular Equivalence: Practical and Theoretical Issues

Centrality Book. cohesion.

Web 2.0 Social Data Analysis

Markov Random Fields and Gibbs Sampling for Image Denoising

Hidden Markov Models in the context of genetic analysis

Statistical Matching using Fractional Imputation

Turing Workshop on Statistics of Network Analysis

Trust in the Internet of Things From Personal Experience to Global Reputation. 1 Nguyen Truong PhD student, Liverpool John Moores University

FUZZY SPECIFICATION IN SOFTWARE ENGINEERING

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Universal Properties of Mythological Networks Midterm report: Math 485

Transcription:

1 / 32 Tom A.B. Snijders University of Oxford May 14, 2012

2 / 32 Outline 1 Local Structure Transitivity 2

3 / 32 Local Structure in Social Networks From the standpoint of structural individualism, one of the basic questions in modeling social networks is, how the global properties of networks can be understood from local properties.

3 / 32 Local Structure in Social Networks From the standpoint of structural individualism, one of the basic questions in modeling social networks is, how the global properties of networks can be understood from local properties. A major example of this is the theory of clusterability of balanced signed graphs.

3 / 32 Local Structure in Social Networks From the standpoint of structural individualism, one of the basic questions in modeling social networks is, how the global properties of networks can be understood from local properties. A major example of this is the theory of clusterability of balanced signed graphs. Harary s theorem says that a complete signed graph is balanced if and only if the nodes can be partitioned into two sets so that all ties within sets are positive, and all ties between sets are negative.

4 / 32 This was generalized by Davis and Leinhardt to conditions for clusterability of signed graphs and structures of ranked clusters; see Chapter 6 in Wasserman and Faust (1994). These theories are about the question, how triadic properties of signed graphs, i.e., aggregate properties of all subgraphs of 3 nodes, can determine global properties of signed graphs. This presentation is about such questions for graphs without signs.

5 / 32 Transitivity Transitivity of a relation means that when there is a tie from i to j, and also from j to h, then there is also a tie from i to h: friends of my friends are my friends. Transitivity depends on triads, subgraphs formed by 3 nodes. i i i j? j j h Potentially transitive h Intransitive h Transitive

6 / 32 Transitive graphs One example of a (completely) transitive graph is evident: the complete graph K n, which has n nodes and density 1. (The K is in honor of Kuratowski, a pioneer in graph theory.)

6 / 32 Transitive graphs One example of a (completely) transitive graph is evident: the complete graph K n, which has n nodes and density 1. (The K is in honor of Kuratowski, a pioneer in graph theory.) Is the empty graph transitive?

6 / 32 Transitive graphs One example of a (completely) transitive graph is evident: the complete graph K n, which has n nodes and density 1. (The K is in honor of Kuratowski, a pioneer in graph theory.) Is the empty graph transitive? Try to find out for yourself, what other graphs exist that are completely transitive!

7 / 32 Measure for transitivity A measure for transitivity is the (global) transitivity index, defined as the ratio Transitivity Index = Transitive triads Potentially transitive triads. (Note that A means the number of elements in the set A.) This also is sometimes called a clustering index.

7 / 32 Measure for transitivity A measure for transitivity is the (global) transitivity index, defined as the ratio Transitivity Index = Transitive triads Potentially transitive triads. (Note that A means the number of elements in the set A.) This also is sometimes called a clustering index. This is between 0 and 1; it is 1 for a transitive graph.

7 / 32 Measure for transitivity A measure for transitivity is the (global) transitivity index, defined as the ratio Transitivity Index = Transitive triads Potentially transitive triads. (Note that A means the number of elements in the set A.) This also is sometimes called a clustering index. This is between 0 and 1; it is 1 for a transitive graph. For random graphs, the expected value of the transitivity index is close to the density of the graph (why?); for actual social networks, values between 0.3 and 0.6 are quite usual.

8 / 32 Local structure and triad counts The studies about transitivity in social networks led Holland and Leinhardt (1975) to propose that the local structure in social networks can be expressed by the triad census or triad count, the numbers of triads of any kinds. For (nondirected) graphs, there are four triad types: h h h h i j i j i j i j Empty One edge Two-path / Two-star Triangle

9 / 32 A simple example graph with 5 nodes. 3 2 4 1 5 i j h triad type 1 2 3 triangle 1 2 4 one edge 1 2 5 one edge 1 3 4 two-star 1 3 5 one edge 1 4 5 empty 2 3 4 two-star 2 3 5 one edge 3 4 5 one edge In this graph, the triad census is (1, 5, 2, 1) (ordered as: empty one edge two-star triangle).

10 / 32 It is more convenient to work with triplets instead of triads: triplets are like triads, but they refer only to the presence of the edges, and do not require the absence of edges. E.g., the number of two-star triplets is the number of potentially transitive triads.

10 / 32 It is more convenient to work with triplets instead of triads: triplets are like triads, but they refer only to the presence of the edges, and do not require the absence of edges. E.g., the number of two-star triplets is the number of potentially transitive triads. The triplet count for a non-directed graph is defined by the number of edges, the total number of two-stars (irrespective of whether they are embedded in a triangle), and the number of triangles.

11 / 32 In the 5-node example graph, the triplet-based summary is: L = 4 edges: (1 2); (2 3); (1 3); (3 4). S 2 = 5 two-stars: ( ) ( ) 1 (2, 3) ; 2 (1, 3) ; T = 1 triangle: (1, 2, 3). ( ) ( ) ( ) 3 (1, 2) ; 3 (1, 4) ; 3 (2, 4).

11 / 32 In the 5-node example graph, the triplet-based summary is: L = 4 edges: (1 2); (2 3); (1 3); (3 4). S 2 = 5 two-stars: ( ) ( ) 1 (2, 3) ; 2 (1, 3) ; T = 1 triangle: (1, 2, 3). ( ) ( ) ( ) 3 (1, 2) ; 3 (1, 4) ; 3 (2, 4). (The fourth degree of freedom: ( ) 5 for n = 5 nodes there are = 10 triads.) 3

12 / 32 Formulae Triplet counts can be defined by more simple formulae than triad counts. If the edge indicator (or tie variable) from i to j is denoted Y ij (1 if there is an edge, 0 otherwise) then the formulae are: L = 1 2 S 2 = 1 2 T = 1 6 i,j Y ij i,j,k Y ij Y ik i,j,k Y ij Y ik Y jk edges two-stars triangles

13 / 32 Some algebraic manipulations can be used to show that the degree variance, i.e., the variance of the degrees Y i+, can be expressed as var(y i+ ) = 2 n S 2 + 1 n L 1 n 2 L2. This shows that for non-directed graphs, the triad census gives information equivalent to: density, degree variance, and transitivity index. This can be regarded as a basic set of descriptive statistics for a non-directed network.

14 / 32 Holland and Leinhardt s (1975) proposition was, that many important theories about social relations can be tested by means of hypotheses about the triad census. They focused on directed rather than non-directed graphs.

14 / 32 Holland and Leinhardt s (1975) proposition was, that many important theories about social relations can be tested by means of hypotheses about the triad census. They focused on directed rather than non-directed graphs. The following picture gives the 16 different triads for directed graphs. The coding refers to the numbers of mutual, asymmetric, and null dyads, with a further identifying letter: Up, Down, Cyclical, Transitive. E.g., 120D has 1 mutual, 2 asymmetric, 0 null dyads, and the Down orientation.

15 / 32

16 / 32 Probability models for networks The statistical approach proposed by Holland and Leinhardt now is obsolete. Since 1986, statistical methods have been proposed for probability distributions of graphs depending primarily on the triad or triplet counts, complemented with star counts and nodal variables.

16 / 32 Probability models for networks The statistical approach proposed by Holland and Leinhardt now is obsolete. Since 1986, statistical methods have been proposed for probability distributions of graphs depending primarily on the triad or triplet counts, complemented with star counts and nodal variables. It has been established recently that, in addition, inclusion of higher-order configurations (subgraphs with more nodes) is essential for adequate modeling of empirical network data.

17 / 32 In the statistical approach to network analysis, the use of probability models is model based instead of sampling based. If we are analyzing one network, then the statistical inference is about this network only, and it is supposed that the network observed between these actors could have been different:

17 / 32 In the statistical approach to network analysis, the use of probability models is model based instead of sampling based. If we are analyzing one network, then the statistical inference is about this network only, and it is supposed that the network observed between these actors could have been different: the ties are regarded as the realization of a probabilistic social process where probability comes in as a result of influences not represented by nodal or dyadic variables ( covariates ) and of measurement errors.

18 / 32 Markov graphs In probability models for graphs, usually the set of nodes is fixed and the set of edges (or arcs) is random. Frank and Strauss (1986) defined that a probabilistic graph is a Markov graph if for each set of 4 distinct actors i, j, h, k, the tie indicators Y ij and Y hk are independent, conditionally on all the other ties. This generalizes the concept of Markov dependence for time series, where random variables are ordered by time, to graphs where the random edge indicators are ordered by pairs of nodes.

19 / 32 Frank and Strauss (1986) proved that a probability distribution for graphs, under the assumption that the distribution does not depend on the labeling of nodes, is Markov if and only if it can be expressed as P{Y = y} = exp ( θl(y) + n 1 k=2 σ ks k (y) + τt (y) ) κ(θ, σ, τ) where L is the edge count, T is the triangle count, S k is the k-star count, and κ(θ, σ, τ) is a normalization constant to let the probabilities sum to 1. a 6-star

20 / 32 It is in practice not necessary to use all k-star parameters, but only parameters for lower-order stars, like 2-stars and 3-stars. Varying the parameters leads to quite different distributions. E.g., when using k-stars up to order 3, we have: higher θ gives more edges higher density; higher σ 2 gives more 2-stars more degree dispersion; higher σ 3 gives more 3-stars more degree skewness; higher τ gives more triangles more transitivity. But note that having more triangles and more k-stars also implies a higher density!

21 / 32 Small and other worlds Robins, Woolcock and Pattison (2005) studied these distributions in detail and investigated their potential to generate small world networks (Watts, 1999) defined as networks with many nodes, limited average degrees, low geodesic distances and high transitivity. (Note that high transitivity in itself will lead to long geodesics.) They varied in the first place the parameters τ, σ k and then adjusted θ to give a reasonable average degree. All graphs have 100 nodes.

22 / 32 Bernoulli graph: random

23 / 32 (θ, σ 2, σ 3, τ) = ( 4, 0.1, 0.05, 1.0) small-world graph: high transitivity, short geodesics

24 / 32 (θ, σ 2, σ 3, τ) = ( 1.2, 0.05, 1.0, 1.0) long paths; few high-order stars

25 / 32 (θ, σ 2, σ 3, τ) = ( 2.0, 0.05, 2.0, 1.0) long paths low transitivity

26 / 32 (θ, σ 2, σ 3, τ) = ( 3.2, 1.0, 0.3, 3.0) caveman world

27 / 32 (θ, σ 2, σ 3, τ) = ( 0.533, 0.167, 0.05, 0.5) heated caveman world (all parameters divided by 6)

28 / 32 Thus we see that by varying the parameters, many different graphs can be obtained. This suggests that the Markov graphs will provide a good statistical model for modeling observed social networks.

28 / 32 Thus we see that by varying the parameters, many different graphs can be obtained. This suggests that the Markov graphs will provide a good statistical model for modeling observed social networks. For some time, so-called pseudo-likelihood methods were used for parameter estimation; but these were shown to be inadequate. Snijders (2002) and Handcock (2003) elaborated maximum likelihood estimation procedures using the Markov chain Monte Carlo (MCMC) approach. These are now implemented in the software packages SIENA, statnet, and pnet.

29 / 32 More general specifications Markov graph models, however, turn out to be not flexible enough to represent the degree of transitivity observed in social networks. It is usually necessary for a good representation of empirical data to generalize the Markov model and include in the exponent also higher-order subgraph counts.

29 / 32 More general specifications Markov graph models, however, turn out to be not flexible enough to represent the degree of transitivity observed in social networks. It is usually necessary for a good representation of empirical data to generalize the Markov model and include in the exponent also higher-order subgraph counts. This means that the Markov dependence assumption of Frank and Strauss is too strong, and less strict conditional independence assumptions must be made.

30 / 32 The new models still remain in the framework of so-called exponential random graph models (ERGMs), P θ {Y = y} = exp ( k θ ks k (y) ) κ(θ) also called p models, see Frank (1991), Wasserman and Pattison (1996), Snijders, Pattison, Robins, and Handcock (2006). Here the s k (y) are arbitrary statistics of the network, including covariates, counts of edges, k-stars, and triangles, but also counts of higher-order configurations. Tutorials: both papers Robins et al. (2007).

31 / 32 Literature Frank, Ove, and David Strauss. 1986.." Journal of the American Statistical Association, 81: 832 842. Holland, P.W., and Leinhardt, S. 1975. Local structure in social networks." In D. Heise (ed.), Sociological Methodology. San Francisco: Jossey-Bass. Snijders, Tom A.B. 2002. Markov Chain Monte Carlo Estimation of Exponential Random Graph Models." Journal of Social Structure, 3.2. Snijders, T.A.B., Pattison, P., Robins, G.L., and Handock, M. 2006. New specifications for exponential random graph models." Sociological Methodology, 99 153.

32 / 32 Robins, G., Pattison, P., Kalish, Y., and Lusher, D. 2007. An introduction to exponential random graph (p ) models for social networks." Social Networks, 29, 173 191. Robins, G., Snijders, T., Wang, P., Handcock, M., and Pattison, P. 2007. Recent developments in Exponential Random Graph (p ) Models for Social Networks." Social Networks, 29, 192 215. Robins, G.L., Woolcock, J., and Pattison, P. 2005. Small and other worlds: Global network structures from local processes." American Journal of Sociology, 110, 894 936. Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. New York and Cambridge: Cambridge University Press. Wasserman, Stanley, and Philippa E. Pattison. 1996. Logit Models and Logistic Regression for Social Networks: I. An Introduction to and p." Psychometrika, 61: 401 425.