Exponential Random Graph (p*) Models for Affiliation Networks

A thesis submitted in partial fulfillment of the requirements of the Postgraduate Diploma in Science (Mathematics and Statistics)
The University of Melbourne
by Peng Wang
Supervisor: Dr Ken Sharpe
October 31, 2006

Abstract

Statistical modeling of social networks as complex systems has always been, and remains, a challenge for social scientists. Exponential family models give us a convenient way of expressing local network structures whose counts are sufficient statistics for the corresponding parameters. Models of this kind, known as Exponential Random Graph Models (ERGMs) or p* models, have been developed since the 1980s. However, because the normalizing constant is intractable, pseudo-likelihood estimation methods have been applied in most studies. Recently, simulation-based MCMC maximum likelihood estimation techniques have been developed. Furthermore, recent advances in ERGM specification provide a much better chance of model convergence for large networks than the traditional Markov models. To date most work on ERGMs has focused on one-mode networks, and little has been done on applying maximum likelihood estimation to affiliation networks with two or more modes.

This paper considers the application of MCMC maximum likelihood estimation to affiliation networks. Techniques similar to those in the latest specification for one-mode networks are applied to affiliation networks. We investigate features of the model by simulation, and compare the goodness of fit results obtained using the maximum likelihood and pseudolikelihood approaches. Examples used in this paper show that the ERGM with the newly specified statistics is a powerful tool for statistical analysis of affiliation networks.

Key words: exponential random graph (p*) models, affiliation networks, MCMC MLE, realization dependence assumption.

Acknowledgements

I wish to give my sincere thanks to my supervisor Dr Ken Sharpe for guiding me through the two years of my postgraduate diploma study. Thanks to Dr Malcolm Alexander for providing me with the interlocking director data used in this thesis. I would also like to thank Associate Professor Garry Robins and Professor Philippa Pattison. Thank you for introducing me to this field, and for providing me with generous support and valuable advice for my study. It has been an honour and a great pleasure to work with you. Thanks to all my friends in the honours year and the members of the MelNet social network analysis group. Finally, to Mum and Dad, and my beloved partner Yan.

Contents

1 Introduction
    Network Representations
    Local measures
        Nodal Degrees
        Local Clustering Coefficient
    Global measures
        Density
        Global Clustering Coefficient
        Degree Distribution
        Geodesic distribution
2 Exponential Random Graph (p*) Models
    Unconditional ERGMs
    Conditional ERGMs
    Model Interpretations
3 Model Specifications
    Bernoulli Random Graphs
    Markov Model
        Markov Model for Non-directed Graphs
        Markov Model for Directed Graphs
        Model for Bipartite Graphs
4 Simulation
5 Estimation
    Maximum Pseudolikelihood Estimation
    Markov Chain Monte Carlo Maximum Likelihood Estimation
6 New Specifications
    Limitations of Markov models
    Alternating k-stars
        Alternating k-stars for one-mode networks
        Alternating k-stars for bipartite networks
        Simulation with alternating k-stars
    Alternating k-triangles
        Simulation with alternating k-triangles
    Alternating k-two-paths
        Alternating k-two-paths for one-mode networks
        Alternating k-two-paths for bipartite networks
        Simulation with alternating k-two-paths
    Limitations of the new specifications
7 Goodness of fit
8 Modeling Examples
    Southern Women
        Pseudo-Likelihood and Maximum-Likelihood Estimation Results
        Model Selection
    Interlocking directors
        Top 50 Financial Institutions (1996)
        Largest interlocked component from the top 500 (1996)
Conclusion & Discussion

1 Introduction

1.1 Network Representations

A network consists of nodes, and ties representing the relationships between the nodes. A pair of nodes that is either linked, or not linked, by a tie is referred to as a dyad; a set of three nodes, linked or not linked, is called a triad. Depending on the context, we can have different kinds of nodes and different kinds of relationships defined on them, and therefore different kinds of networks. In a business network, nodes can be suppliers and consumers, and ties can be purchasing activities; in a computer network, computer terminals and servers can be the nodes, and the network connections can be the ties. We can also have different networks among the same set of nodes, for example a friendship network and an advice network among people working in the same organization.

An affiliation network represents the association between two or more sets of nodes, where each set is a different kind of social entity. For example, in an interlocking director network, one set of nodes is the directors and the other is the companies, with ties representing directors sitting on company boards. The number of distinct sets of entities in the network is the mode of the network. This paper will focus on two-mode networks, also called bipartite networks.

Networks can be represented by binary adjacency matrices. A one-mode network with $n$ nodes is an $(n, n)$ square matrix $X$; if there is a tie from node $i$ to node $j$, then the cell $X_{ij} = 1$, otherwise $X_{ij} = 0$. The values of cells $X_{ij}$ and $X_{ji}$ indicate the direction of the ties between $i$ and $j$; if the network is non-directed, the matrix is symmetric, i.e. $X_{ij} = X_{ji}$ for all $i, j \le n$. Ties in non-directed networks are often referred to as edges, and ties in directed networks are often called arcs. Figure 1 shows an example of the matrix representation of a non-directed network with 10 nodes.

Figure 1: Adjacency matrix representation of a network of size 10.
In the matrix format, a bipartite network with $n$ nodes of type A and $m$ nodes of type B gives an $(n, m)$ rectangular matrix, with the numbers of rows and columns equal to the numbers of nodes in the two sets. If node $i$ in one set is associated with node $j$ in the other set, $X_{ij} = 1$, otherwise $X_{ij} = 0$. Figure 2 shows an example of the matrix representation of a network with 4 nodes in set A and 6 nodes in set B.

Figure 2: Adjacency matrix representation of a (4, 6) bipartite network.

Two one-mode networks can be derived from a bipartite network. For example, from a club membership bipartite network (person $i$ is connected with club $j$ if $i$ is a member of $j$), we can derive a person-to-person network in which two people are tied if they are in the same club; a club-to-club network can be constructed in a similar way to form a member-sharing network. Figure 3 shows an example of transforming the bipartite network in Figure 2 into two one-mode networks with four and six nodes. Because such a transformation can be carried out, one may analyze a bipartite network by looking at the two one-mode networks. However, if we omit the values of the ties when transforming a bipartite network into two one-mode networks, we lose the information about the number of nodes of the other type that act as ties between connected pairs in the one-mode networks.

Figure 3: Transformation of a (4, 6) bipartite network into two one-mode networks.

One aspect of the statistical analysis of social networks focuses on the formation of network ties, and investigates the impact of local interactive processes on the global network structure. A good statistical model should capture the significance of various kinds of local processes and still be able to reproduce the observed network at the global level. Several local and global network properties that are often used as network measures are listed below. If our model generates a distribution of networks whose measures are consistent with those of the observed network, we may say it is a good model. These network measures can be calculated from the adjacency matrices.
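The transformation described above can be sketched in a few lines. This is only an illustrative sketch: the function names `project` and `binarize` and the 3 x 4 membership matrix are hypothetical, not taken from the thesis. The valued projection keeps the number of shared nodes of the other type; binarizing it discards exactly the tie-strength information discussed above.

```python
def project(x):
    """Valued one-mode projections of a bipartite (rows x columns) 0/1 matrix.

    Returns (row_net, col_net); entry (i, k) of row_net counts the column
    nodes shared by row nodes i and k. Diagonals are set to 0.
    """
    n, m = len(x), len(x[0])
    row_net = [[sum(x[i][j] * x[k][j] for j in range(m)) if i != k else 0
                for k in range(n)] for i in range(n)]
    col_net = [[sum(x[i][j] * x[i][l] for i in range(n)) if j != l else 0
                for l in range(m)] for j in range(m)]
    return row_net, col_net


def binarize(w):
    """Drop tie values, keeping only presence or absence of a tie."""
    return [[1 if v > 0 else 0 for v in row] for row in w]


# Hypothetical membership matrix: 3 people (rows) by 4 clubs (columns).
x = [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 1]]
people, clubs = project(x)
```

Here persons 0 and 1 share two clubs, so `people[0][1]` is 2, while `binarize(people)[0][1]` is only 1.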

1.2 Local measures

1.2.1 Nodal Degrees

The degree of a node is the number of ties incident with the node. It ranges from 0 to $(n - 1)$ for a network of size $n$. A node with degree 0 is referred to as an isolate. The nodes in a directed graph have both in-degrees and out-degrees. The in-degrees are measures of the popularity or receptivity of a node, and the out-degrees are measures of expansiveness, or the extent to which a node sends out ties. They are also the basic measurements for node degree centrality, i.e. an active node in the center of the network should have high degree. The degree of node $i$, denoted $d(i)$, can be calculated from the adjacency matrix $X$ by taking row or column sums.

For non-directed graphs,

$$d(i) = \sum_{j=1}^{n} x_{ji} = x_{+i} = \sum_{j=1}^{n} x_{ij} = x_{i+} \tag{1.1}$$

For directed graphs,

$$d_{in}(i) = \sum_{j=1}^{n} x_{ji} = x_{+i}, \qquad d_{out}(i) = \sum_{j=1}^{n} x_{ij} = x_{i+} \tag{1.2}$$

For bipartite graphs with $(N, M)$ nodes,

$$d_n(i) = \sum_{j=1}^{M} x_{ij} = x_{i+}, \qquad d_m(j) = \sum_{i=1}^{N} x_{ij} = x_{+j} \tag{1.3}$$

1.2.2 Local Clustering Coefficient

For a node $i$, the local clustering coefficient is defined as the proportion of pairs of nodes connected with node $i$ that are themselves connected. Let $\tau(i)$ denote the number of triangles (complete subgraphs of size three) that node $i$ is involved in, and $s_2(i)$ denote the number of two-stars or two-paths (three nodes connected by two ties) centered on node $i$. The local clustering coefficient $C_l(X)$ for non-directed graphs is then calculated as

$$C_l(X) = \frac{1}{n} \sum_{i=1}^{n} \frac{\tau(i)}{s_2(i)} \tag{1.4}$$

where

$$\tau(i) = \sum_{j=1}^{n} \sum_{k>j} x_{ij} x_{ik} x_{jk}, \qquad i \neq j,\ i \neq k \tag{1.5}$$

$$s_2(i) = \binom{d(i)}{2}, \qquad d(i) > 1 \tag{1.6}$$

For directed graphs, with the direction of ties involved, a triad can form seven different triangles and six different two-stars, so that different local clustering coefficients can be obtained.
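As a small check on equations (1.1) and (1.4) to (1.6), the sketch below computes degrees and the averaged local clustering coefficient for a non-directed graph. The function names and the example graph (a triangle with one pendant node) are illustrative assumptions. Nodes with degree below 2 are assumed to contribute 0 to the average, which the text does not state explicitly.

```python
from math import comb


def degrees(x):
    """Row sums of a symmetric 0/1 adjacency matrix, as in (1.1)."""
    return [sum(row) for row in x]


def triangles_at(x, i):
    """tau(i) from (1.5): triangles that include node i."""
    n = len(x)
    return sum(x[i][j] * x[i][k] * x[j][k]
               for j in range(n) for k in range(j + 1, n)
               if j != i and k != i)


def local_clustering(x):
    """C_l(X) from (1.4); nodes with s_2(i) = 0 contribute 0 (assumption)."""
    d = degrees(x)
    total = 0.0
    for i in range(len(x)):
        s2 = comb(d[i], 2)          # two-stars centered on i, as in (1.6)
        if s2 > 0:
            total += triangles_at(x, i) / s2
    return total / len(x)


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
c_l = local_clustering(x)   # (1 + 1 + 1/3 + 0) / 4
```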

In bipartite graphs, the smallest closure that is more than a dyad is a diamond, which is a three-path closed by a tie. Let $C_4(i)$ represent the number of four-cycles node $i$ is part of, and $L_3(i)$ be the number of three-paths that involve $i$ but in which $i$ is not at either end. The local clustering coefficient for bipartite graphs can then be calculated as

$$C_l(X) = \frac{1}{n} \sum_{i=1}^{n} \frac{C_4(i)}{L_3(i)} \tag{1.7}$$

1.3 Global measures

1.3.1 Density

Density is the ratio of the number of ties that are present to the maximum possible number of ties for a network of given size. Let $t$ denote the number of ties in a network $X$ of size $n$. The density $D(X)$ of a non-directed network is

$$D(X) = \frac{t}{n(n-1)/2} = \frac{2t}{n(n-1)} \tag{1.8}$$

If the network is directed, then $D(X)$ is

$$D(X) = \frac{t}{n(n-1)} \tag{1.9}$$

For a bipartite network $X$ with $(n, m)$ nodes and $t$ ties, the density $D(X)$ is

$$D(X) = \frac{t}{nm} \tag{1.10}$$

1.3.2 Global Clustering Coefficient

The global clustering coefficient is a measure of the overall clustering of a network. As shown in expression (1.11), it is defined as the ratio between the number of triangles $\tau(X)$ and the number of two-stars $s_2(X)$. Since each triangle contains three two-stars, the factor of three gives the coefficient a range of 0 to 1.

$$C_g(X) = \frac{3\,\tau(X)}{s_2(X)} \tag{1.11}$$

For bipartite networks the global clustering coefficient is defined as the ratio between the number of four-cycles $C_4(X)$ and the number of three-paths $L_3(X)$, as shown in expression (1.12).

$$C_g(X) = \frac{4\,C_4(X)}{L_3(X)} \tag{1.12}$$

1.3.3 Degree Distribution

For a network of size $n$, the degree distribution is the distribution of the degrees of the nodes in the network over the range $[0, \ldots, n-1]$. Directed networks have both in-degree and out-degree distributions. An $(n, m)$ bipartite network has separate degree distributions for the two sets of nodes.
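The global measures (1.8) to (1.12) can be computed directly from the adjacency matrices. The following is a sketch with hypothetical function names. The counting shortcuts are my own and are not stated in the text: four-cycles are counted by choosing two rows and two of their shared columns, and three-paths by treating each tie $(i, j)$ as the middle tie of $(d_i - 1)(d_j - 1)$ three-paths.

```python
from math import comb


def density(t, n, m=None, directed=False):
    """Density from (1.8)-(1.10); pass m for an (n, m) bipartite network."""
    if m is not None:
        return t / (n * m)
    return t / (n * (n - 1)) if directed else 2 * t / (n * (n - 1))


def global_clustering(x):
    """C_g(X) = 3 * triangles / two-stars for a non-directed graph, (1.11)."""
    n = len(x)
    tri = sum(x[i][j] * x[j][k] * x[i][k]
              for i in range(n) for j in range(i + 1, n)
              for k in range(j + 1, n))
    two_stars = sum(comb(sum(row), 2) for row in x)
    return 3 * tri / two_stars if two_stars else 0.0


def bipartite_global_clustering(x):
    """C_g(X) = 4 * four-cycles / three-paths for a bipartite matrix, (1.12)."""
    n, m = len(x), len(x[0])
    row_deg = [sum(r) for r in x]
    col_deg = [sum(x[i][j] for i in range(n)) for j in range(m)]
    # A four-cycle is two rows together with two columns tied to both.
    c4 = sum(comb(sum(x[i][j] * x[k][j] for j in range(m)), 2)
             for i in range(n) for k in range(i + 1, n))
    # Each tie (i, j) is the middle tie of (d_i - 1) * (d_j - 1) three-paths.
    l3 = sum((row_deg[i] - 1) * (col_deg[j] - 1)
             for i in range(n) for j in range(m) if x[i][j])
    return 4 * c4 / l3 if l3 else 0.0


one_mode = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
two_mode = [[1, 1, 0], [1, 1, 1]]
```

For `one_mode` (a triangle plus a pendant node) there are one triangle and five two-stars; for `two_mode` there are one four-cycle and six three-paths.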

1.3.4 Geodesic distribution

Define a path as a sequence of consecutive ties in a network; the geodesic is then the shortest path between two nodes. The geodesic distribution is the distribution of geodesic lengths over all pairs of nodes in the network; it has range $[1, \ldots, \infty)$, where $\infty$ means the pair of nodes cannot reach each other.

2 Exponential Random Graph (p*) Models

2.1 Unconditional ERGMs

The exponential random graph models, also called p* models, are a class of stochastic models which use local network structures to model the formation of network ties for a network with a fixed number of nodes. We define a network space $\mathcal{X}$, which contains all networks with a given number of nodes $n$. The network with $n$ nodes can then be represented by a random variable $X$, which is itself a set of $n(n-1)$ tie variables $X_{ij}$, i.e. $X = \{X_{ij}\}$. A realization of $X$ is denoted by $x = \{x_{ij}\}$.

Given the values of all other tie variables, two network tie variables are defined as neighbours if they are conditionally dependent, i.e. one tie's existence depends on the other tie's existence. A neighbourhood of mutually conditionally dependent tie variables then forms a local network configuration. Various local interaction processes can be represented by these network configurations, based on different tie dependency (neighbourhood) assumptions. From the Hammersley-Clifford theorem (Besag, 1974)[3], a model for $X$ has a form determined by its neighbourhood. This approach leads to ERGMs, or p* models, introduced by Frank and Strauss (1986)[5] and Wasserman and Pattison (1996)[31]. Depending on the underlying neighbourhood assumptions, an ERGM assigns probabilities to $X$ based on a set of counts of regular local configurations, which are sufficient statistics for their parameters.

ERGMs have the following general form

$$\Pr(X = x) = \frac{1}{\kappa(\theta)} \exp\Big\{ \sum_{p} \theta_p z_p(x) \Big\} \tag{2.1}$$

where

$z_p(x)$ is the network statistic of neighbourhood type $p$.
$\theta_p$ is the parameter associated with $z_p(x)$.
$\kappa(\theta) = \sum_{x \in \mathcal{X}} \exp\big\{ \sum_{p} \theta_p z_p(x) \big\}$ is a normalizing constant.

The network statistic $z_p(x)$ has the typical form $z_p(x) = \prod_{X_{ij} \in p} x_{ij}$. The normalizing constant $\kappa(\theta)$ is a sum over the entire graph space $\mathcal{X}$, which contains $2^{n(n-1)}$ possible graphs. Without Monte Carlo simulation, the intractable normalizing constant $\kappa(\theta)$ makes maximum likelihood estimation of the model very difficult, even for networks with a small number of nodes.
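To make the size of the graph space concrete, the sketch below evaluates $\kappa(\theta)$ by brute-force enumeration for a tiny non-directed graph with an edge and a triangle statistic. The function name and the choice of statistics are illustrative assumptions. For non-directed graphs the space has $2^{n(n-1)/2}$ members rather than the $2^{n(n-1)}$ quoted for directed graphs; even so, $n = 10$ already gives about $3.5 \times 10^{13}$ graphs, so this approach is usable only for very small $n$.

```python
from itertools import combinations, product
from math import exp


def kappa(n, theta, tau):
    """Brute-force normalizing constant of an edge-triangle model.

    Enumerates all 2**(n*(n-1)/2) non-directed graphs on n nodes, so it is
    only feasible for very small n.
    """
    dyads = list(combinations(range(n), 2))
    triads = list(combinations(range(n), 3))
    total = 0.0
    for ties in product((0, 1), repeat=len(dyads)):
        x = dict(zip(dyads, ties))
        edges = sum(ties)
        triangles = sum(x[(i, j)] * x[(j, k)] * x[(i, k)]
                        for i, j, k in triads)
        total += exp(theta * edges + tau * triangles)
    return total


# With tau = 0 the ties are independent and kappa = (1 + e^theta)^(number of dyads).
k4 = kappa(4, -1.0, 0.0)
```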

2.2 Conditional ERGMs

The probability of a graph under given conditions can be modeled using conditional ERGMs. For example, we may have a network with a fixed number of ties; the nodes may have a maximum degree; or the nodes in the network may be kept fully connected, i.e. the network is kept as one component. These conditions arise from the properties of the network data, or from how the network data are collected. For example, if a friendship network is derived from the survey question "List up to five people whom you consider as your friends", then the maximum out-degree a node can have is five. Let $Q$ denote the condition that the network must satisfy. The conditional ERGM can then be expressed as

$$\Pr(X_Q = x_Q) = \Pr(X = x \mid Q) = \frac{1}{\kappa(\theta)} \exp\Big\{ \sum_{p} \theta_p z_p(x) \Big\} \tag{2.2}$$

The graph distribution generated by such a model is a conditional graph distribution. The sample graph space for such a distribution is reduced from $\mathcal{X}$ to $(\mathcal{X} \mid Q)$. With a smaller graph space, it may be easier to obtain the maximum likelihood estimates of the parameters.

2.3 Model Interpretations

For a tie variable $X_{ij}$ in a given network $X$, let $C_{ij}$ denote the complement of $\{X_{ij}\}$, $x^{+}$ denote the graph with $x_{ij} = 1$, and $x^{-}$ denote the graph with $x_{ij} = 0$. The conditional distribution of tie variable $X_{ij}$ is then given by

$$\operatorname{logit}\{\Pr(X_{ij} = 1 \mid C_{ij})\} = \sum_{p} \theta_p u_p(x_{ij}) \tag{2.3}$$

where $u_p(x_{ij})$ is the change statistic of type $p$, obtained by changing $x_{ij}$ from 0 to 1, such that

$$u_p(x_{ij}) = z_p(x^{+}) - z_p(x^{-}) \tag{2.4}$$

Expression (2.3) gives the log-odds of forming a tie between nodes $i$ and $j$, conditional on the rest of the network.

3 Model Specifications

Different ERGM specifications have been developed based on different neighbourhood assumptions, from the simplest Bernoulli random graph assumption of Holland and Leinhardt (1981)[11] to the most recent realization-dependent random graph assumption of Pattison and Robins (2002)[13].
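Expressions (2.3) and (2.4) can be illustrated for a non-directed graph with just an edge and a triangle statistic, an illustrative choice rather than a model fitted in this thesis. Toggling $x_{ij}$ from 0 to 1 adds one edge and one triangle per common neighbour of $i$ and $j$. The function names and parameter values below are hypothetical.

```python
from math import exp


def change_stats(x, i, j):
    """Edge and triangle change statistics u_p(x_ij) = z_p(x+) - z_p(x-)."""
    n = len(x)
    common = sum(x[i][k] * x[j][k] for k in range(n) if k != i and k != j)
    return {"edge": 1, "triangle": common}


def tie_probability(x, i, j, params):
    """Pr(X_ij = 1 | rest of the graph), from the conditional logit (2.3)."""
    u = change_stats(x, i, j)
    log_odds = sum(params[p] * u[p] for p in params)
    return 1.0 / (1.0 + exp(-log_odds))


# Two-path 0-1-2: a tie between 0 and 2 would close one triangle.
x = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
params = {"edge": -2.0, "triangle": 1.0}   # hypothetical parameter values
p_02 = tie_probability(x, 0, 2, params)    # log-odds = -2 + 1 = -1
```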

3.1 Bernoulli Random Graphs

The simplest ERGM is the Bernoulli model, which has only a density effect for a non-directed graph. It is based on the simplest neighbourhood assumption, namely that all tie variables $X_{ij}$ are independent. Ties are equiprobable in a graph; there is only one parameter, for the edge effect, which controls the density of the network. We assume homogeneity, i.e. that the edge effect is the same across the entire network.

$$\Pr(X = x) = \frac{1}{\kappa} \exp\{\theta z_e(x)\} \tag{3.1}$$

where $z_e(x)$ is the total number of edges in the graph, and $\theta$ is the density parameter. Graphs generated by a Bernoulli model are called Bernoulli random graphs, and feature low clustering and short path lengths. Figure 4 shows an example of a Bernoulli graph on 30 nodes in which the probability of an edge being present is $\varphi = \Pr(X_{ij} = 1) = 0.1$.

Figure 4: A Bernoulli graph with $\varphi = 0.1$.

The relationship between $\theta$ and $\varphi$ is

$$\theta = \operatorname{logit}(\varphi) \tag{3.2}$$

The ML estimate of $\theta$ can be obtained from the density of the graph $D(x)$:

$$\hat\theta = \log\left( \frac{D(x)}{1 - D(x)} \right) \tag{3.3}$$

A negative value of $\theta$ produces graphs with expected density less than 0.5.

3.2 Markov Model

The Bernoulli model is not particularly interesting and is not adequate for representing social networks, as evidence in the social science literature shows that there is much more to social networks than density. The Markov neighbourhood assumption, introduced by Frank and Strauss (1986)[5], states that all ties sharing a node are conditionally dependent on each other. Markov models are based on this assumption.
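Equations (3.2) and (3.3) are easy to check numerically. In this sketch the function name and example graph are hypothetical; the estimate is undefined for empty or complete graphs, where the density is 0 or 1.

```python
from math import exp, log


def bernoulli_mle(x):
    """ML estimate of theta for a non-directed Bernoulli graph, as in (3.3)."""
    n = len(x)
    t = sum(x[i][j] for i in range(n) for j in range(i + 1, n))
    d = 2 * t / (n * (n - 1))                 # density, as in (1.8)
    if d in (0.0, 1.0):
        raise ValueError("estimate is infinite for empty or complete graphs")
    return log(d / (1 - d))


# 4 nodes, 2 of the 6 possible edges: density 1/3, so theta_hat = log(1/2).
x = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
theta_hat = bernoulli_mle(x)
phi_hat = 1 / (1 + exp(-theta_hat))           # inverse logit recovers the density
```

With $\varphi = 0.1$ as in Figure 4, the same relationship gives $\theta = \log(0.1/0.9) \approx -2.197$.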

3.2.1 Markov Model for Non-directed Graphs

For non-directed graphs, the Markov dependency assumption implies graph statistics that include stars of different sizes and triangles; hence the Markov model has parameters for the configurations shown in Figure 5.

Figure 5: Configurations for Markov Models.

The Markov model for non-directed graphs is

$$\Pr(X = x) = \frac{1}{\kappa(\theta)} \exp\Big( \theta z_e(x) + \sum_{k=2}^{n-1} \sigma_k z_{s_k}(x) + \tau z_t(x) \Big) \tag{3.4}$$

where $\theta$ is the parameter for the edge statistic

$$z_e(x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \tag{3.5}$$

$\sigma_k$ is the parameter for the statistic for stars of size $k$, or k-stars,

$$z_{s_k}(x) = \sum_{i=1}^{n} \binom{x_{i+}}{k} \tag{3.6}$$

and $\tau$ is the parameter for the triangle statistic

$$z_t(x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{h=j+1}^{n} x_{ij} x_{jh} x_{ih} \tag{3.7}$$

3.2.2 Markov Model for Directed Graphs

A Markov model for directed graphs would include statistics and corresponding parameters for arcs, reciprocal ties, in-stars and out-stars, and other triad configurations, as shown in Figure 6. The labels for the triads in the figure give, in order, the numbers of reciprocal ties, non-reciprocal ties and empty ties. The additional character further distinguishes configurations: T means transitive, C means cyclic, and U or D means upward from the reciprocal tie or downward to the reciprocal tie. For example, 120U stands for triads with one reciprocal tie, two non-reciprocal ties and zero empty ties, where the triangle is upward.

Figure 6: Configurations for Directed Markov Models.

With Markov models, one can explicitly capture the tendency to form stars, which relates to the popularity and expansiveness of nodes, as well as the clustering and balance effects of social networks. Simulation studies on Markov models by Robins, Pattison and Woolcock (2005)[22] show that Markov graphs have a much higher clustering effect than Bernoulli graphs when a positive triangle parameter is included. Figure 7 shows an example Markov graph on 30 nodes with a positive triangle parameter. Notice that there are many more triangles in this network than in the Bernoulli random network of Figure 4, which has the same density.

Figure 7: A Markov random graph.

3.2.3 Model for Bipartite Graphs

As bipartite graphs cannot form triangles, models satisfying the Markov assumption have only density and star configurations. The stars are of two different types, corresponding to the two sets of nodes; we label them $S_P$ for People-Stars and $S_A$ for Association-Stars. The Markov assumption has the limitation that it cannot capture the basic closure in bipartite networks, the four-cycle. A simulation study on interlocking directors by Robins and Alexander (2004)[17] introduced two further configurations, three-paths $L_3$ and four-cycles $C_4$, to reflect features of bipartite networks. Four-cycles are the simplest local closures, and represent the strength of ties when transforming to one-mode networks. However, information about tie strength cannot be captured by binary adjacency matrices. Hence $C_4$ should be included in the ERGM for bipartite networks. Three-paths represent a local structure that could potentially be closed by another tie to form a four-cycle. For a bipartite network with given density, more three-paths and fewer four-cycles could shorten the average path length. A typical ERGM for bipartite networks will include the local configurations shown in Figure 8.

Figure 8: Configurations for Bipartite Graph Models.

The four-cycle and three-path configurations satisfy the realization dependence assumption of Pattison and Robins (2004)[14]. This assumption is discussed in detail in Section 6 (New Specifications), as the new specifications for ERGMs are based on it.

4 Simulation

There are many different strategies for simulating exponential random graph models (Snijders 2002)[25]. The strategy used here is based on the Metropolis-Hastings sampling algorithm, conducted as follows:

1. Start with a given graph $x$, which can be any graph within the state space $\mathcal{X}$ of the graph distribution.

2. Select a pair of nodes $i$ and $j$ at random, and either add or remove the tie $x_{ij}$ between them to form a candidate graph $x'$, such that $x'_{ij} = 1 - x_{ij}$.

3. Using the change statistic of type $p$ defined in equation (2.4), denoted by $u_p$,

$$u_p(x_{ij}) = z_p(x^{+}) - z_p(x^{-}) = z_p(x') - z_p(x) \tag{4.1}$$

where the last equality holds when the candidate adds a tie (the sign is reversed when the candidate removes a tie), accept the candidate graph $x'$ with probability $\min(1, r)$, where $r$ is defined as

$$r = \frac{\Pr(X = x')}{\Pr(X = x)} = \exp\Big\{ \sum_{p} \theta_p u_p(x_{ij}) \Big\} \tag{4.2}$$
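The three steps above can be sketched for an edge-triangle Markov model on a non-directed graph. This is a minimal sketch under stated assumptions: the function name, the choice of statistics and the seeded generator are mine, and this is not the implementation used in PNet or BPNet. When the proposal removes an existing tie, the exponent in (4.2) changes sign, which the code handles explicitly.

```python
import random
from math import exp


def simulate(n, theta, tau, steps, seed=0):
    """Metropolis-Hastings sampler for an edge-triangle model (sketch)."""
    rng = random.Random(seed)
    x = [[0] * n for _ in range(n)]            # step 1: start from the empty graph
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)         # step 2: pick a dyad at random
        # Triangle change statistic: common neighbours of i and j.
        common = sum(x[i][k] * x[j][k] for k in range(n))
        u = theta + tau * common               # sum_p theta_p u_p for adding the tie
        log_r = u if x[i][j] == 0 else -u      # removal reverses the sign
        # Step 3: accept with probability min(1, r), where r = exp(log_r).
        if log_r >= 0 or rng.random() < exp(log_r):
            x[i][j] = x[j][i] = 1 - x[i][j]
    return x


sample = simulate(n=10, theta=-1.0, tau=0.2, steps=5000)
```

Only the change statistic for the proposed dyad is computed at each step, so neither $\kappa$ nor the full graph statistics are ever evaluated.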

The simulation establishes a Markov chain through the state space of graphs with a given number of nodes. This strategy has the advantage that the normalizing constant $\kappa$ cancels in the ratio, and calculating the change statistics $u_p$ consumes much less computing power than recalculating the graph statistics for every candidate graph.

To generate a graph distribution that is independent of the starting graph, an initial burn-in period is required, and the graphs generated during the burn-in should be discarded. The length of the burn-in depends on how different the starting graph is from the true graph distribution defined by the model.

The simulation method forms the basis for exploring the properties of various model specifications and the effect of changes in parameter values for a specified model. Markov chain Monte Carlo maximum likelihood estimation relies on simulation, and simulation is also used to test the goodness of fit of the model.

5 Estimation

5.1 Maximum Pseudolikelihood Estimation

Maximum likelihood estimation is difficult for exponential random graph models, as calculation of the normalizing constant is intractable. To avoid the need to calculate the constant, a pseudolikelihood estimation method was proposed by Strauss and Ikeda (1990)[29]. Instead of maximizing the original likelihood function, a logit model for each tie is fitted conditional on the rest of the network, using standard logistic regression methods. The maximum pseudolikelihood estimator (MPE) is the value of $\theta$ that maximizes the pseudolikelihood function

$$PL(\theta) = \prod_{i \neq j} \Pr(x_{ij} \mid C_{ij}) \tag{5.1}$$

where $C_{ij}$ denotes the complement of $x_{ij}$, which includes all $x_{kl}$ such that $(k, l) \neq (i, j)$. By changing each dyad $x_{ij}$ to $(1 - x_{ij})$ in the given network, the logistic regression is performed on the change statistics of the various local configurations included in the model. If our observed graph has size $n$, we will have $n(n-1)$ sets of change statistics.
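The pseudolikelihood (5.1) can be written down directly for an edge-triangle model on a non-directed graph, where each of the $n(n-1)/2$ dyads contributes one logistic term (rather than the $n(n-1)$ terms of the directed case). This sketch only evaluates the log pseudolikelihood; maximizing it would normally be handed to a logistic regression routine. The function name and the example graph are hypothetical.

```python
from math import exp, log


def log_pseudolikelihood(x, theta, tau):
    """Log of (5.1) for an edge-triangle model on a non-directed graph."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Change statistics: 1 for the edge, common neighbours for triangles.
            common = sum(x[i][k] * x[j][k] for k in range(n))
            eta = theta + tau * common          # conditional log-odds, as in (2.3)
            p = 1.0 / (1.0 + exp(-eta))
            total += log(p) if x[i][j] else log(1.0 - p)
    return total


# Triangle 0-1-2 plus pendant node 3: 4 of 6 possible edges, density 2/3.
x = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
theta_hat = log(2.0)   # logit(2/3): the maximizer when tau is fixed at 0
```

With `tau = 0` the dyads are independent, so the pseudolikelihood coincides with the likelihood and is maximized at the logit of the density.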
The MPE is the same as the MLE if the dyads of the network are conditionally independent. However, this assumption is rarely satisfied in the case of social networks; hence the standard errors from pseudolikelihood estimation (PLE) do not apply, and the estimates can be quite different from the MLE. This can be assessed by comparing the observed network with the graph distribution simulated from the PLE result. The modeling examples later in this thesis include an example of PLE and its goodness of fit on an observed network.

5.2 Markov Chain Monte Carlo Maximum Likelihood Estimation

Maximum likelihood estimation procedures for exponential random graph (p*) models were proposed by Snijders (2002)[25], based on the stochastic approximation method of Robbins and Monro (1951)[16]. The maximum likelihood estimate (MLE) $\hat\theta$ generates a graph distribution $X$ whose expected graph statistics equal the observed graph statistics,

$$\mu(\hat\theta) = E[z(X) \mid \hat\theta] = z(y), \tag{5.2}$$

where $z(X)$ is a vector of graph statistics, and $y$ is the observed graph. The moment equation (5.2) can be solved using the Newton-Raphson iterative approximation,

$$\hat\theta_{n+1} = \hat\theta_n - \operatorname{cov}_\theta^{-1}(\hat\theta_n)\,\big(\mu(\hat\theta_n) - z(y)\big), \tag{5.3}$$

where

$$\operatorname{cov}_\theta(\hat\theta_n) = \operatorname{cov}\big(z(X) \mid \hat\theta_n\big) \tag{5.4}$$

is the covariance matrix of the graph statistics, whose inverse is the asymptotic covariance matrix of the ML estimator. Both $\mu(\theta)$ and $\operatorname{cov}_\theta(\hat\theta_n)$, as given in equations (5.2) and (5.4), are intractable for big networks; hence the means of sample statistics from Monte Carlo simulations are used to approximate these values. The original Robbins-Monro algorithm proposed an iterative parameter updating strategy given by

$$\hat\theta_{n+1} = \hat\theta_n - a_n D_{\hat\theta_n}^{-1} \big( Z_{\hat\theta_n} - Z(y) \big) \tag{5.5}$$

where $a_n$ is a gain sequence that converges to 0, $Z_{\hat\theta_n}$ has the conditional distribution of $Z_\theta$ given $\hat\theta_n$, and $D_{\hat\theta_n}$ is a consistent estimator for $D_\theta$.

The overall estimation algorithm consists of three phases.

1. Starting with an initial guess $\theta_0$, the first phase simulates a small number $N_1$ of sample graphs. Denote the sample graph distribution by $X_0$. If there are $m$ parameters in the parameter vector $\theta$, then the $m \times m$ variance-covariance (derivative) matrix $D_{\theta_0}$ can be estimated as

$$\hat D_{\theta_0} = \frac{1}{N_1} \big[z(y) - E(z(X_0) \mid \theta_0)\big]\big[z(y) - E(z(X_0) \mid \theta_0)\big]^{T} \tag{5.6}$$

2. The second phase contains $L$ subphases. Within each subphase $l$, parameter values are updated using the formula

$$\hat\theta_{n+1} = \hat\theta_n - a_l \hat D^{-1} \big( Z_{\hat\theta_n} - Z(y) \big) \tag{5.7}$$

where $Z_{\hat\theta_n}$ is based on $S$ simulated graph samples with the parameter value $\hat\theta_n$.

Maximum likelihood estimation requires independent samples. To make the simulated samples close to independent, $w$ simulation iterations are run before each sample is collected. This can be computationally costly. From experience, adequate performance is achieved with

$$w = c\, D(y)\,\big(1 - D(y)\big)\, o^2, \tag{5.8}$$

where $c$ is a constant, $o$ is the number of nodes in the observed network $y$, and $D(y)$ is the density of the network.

Within each subphase, the number of simulated graph samples $S$ must be greater than a lower bound $N_l^{-}$. The subphase is terminated when the sum of successive products $Q_l$ becomes negative, where $Q_l$ is defined as

$$Q_l = \sum_{s=2}^{S} \big[z(x_s) - z(y)\big]\big[z(x_{s-1}) - z(y)\big] \tag{5.9}$$

A negative $Q_l$ indicates that the model is converging. An upper bound $N_l^{+}$ also enforces subphase termination, since $Q_l$ may never become negative. From experience,

$$N_l^{-} = (7 + p)\, 2^{4l/3}, \qquad N_l^{+} = N_l^{-} + 200, \tag{5.10}$$

where $p$ is the number of parameters, have been found to lead to adequate performance. At the end of each subphase, the mean of all updated parameter values is taken as the starting parameter value for the next subphase $(l + 1)$. The next subphase simulates more samples, and the gain factor $a_{l+1}$ is reduced.

3. The third phase repeats the simulation of phase one, but based on the final estimated parameters $\hat\theta$ from the second phase, and a large number of simulation iterations are carried out to check whether $\hat\theta$ generates the expected graph distribution, centered at the observed network. For each of the statistics, a t-ratio is calculated as

$$t_p = \frac{z_p(y) - \hat E(z_p(X) \mid \hat\theta)}{\hat\sigma(z_p(X) \mid \hat\theta)} \tag{5.11}$$

where $X$ is the graph distribution simulated with parameter $\hat\theta$, and $y$ is the observed network. $\hat\sigma$ is the estimated standard error, calculated as the square root of the corresponding diagonal entry of the estimated covariance matrix $D_{\hat\theta}$. If $|t_p| \le 0.1$, the approximation may be considered to have converged.

This estimation algorithm has been implemented in the program SIENA (Snijders, Steglich, Schweinberger and Huisman 2005[26]) and the program PNet (Wang, Robins and Pattison (2006)[30]). Another program, statnet (Handcock, Butts, Hunter, Goodreau and Morris (2006)[9]), is implemented in the R environment and uses a different algorithm, based on Geyer and Thompson (1992)[7], to estimate similar models. For the purpose of MLE of p* models for bipartite networks, the program BPNet was implemented as an extension to PNet.
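The phase-three convergence check (5.11) and the subphase bounds (5.10) are simple to compute once simulated statistics are available. The sketch below uses hypothetical function names and made-up numbers. It assumes the sample standard deviation (divisor $N - 1$) as the estimate $\hat\sigma$, which is a choice the text does not specify, and it does not reproduce the SIENA, PNet or BPNet code.

```python
from statistics import mean, stdev


def t_ratios(observed, simulated):
    """t-ratio (5.11) per statistic; simulated is a list of statistic vectors."""
    ratios = []
    for p, obs in enumerate(observed):
        column = [s[p] for s in simulated]
        ratios.append((obs - mean(column)) / stdev(column))
    return ratios


def converged(ratios, tol=0.1):
    """Convergence criterion |t_p| <= 0.1 for every statistic."""
    return all(abs(t) <= tol for t in ratios)


def subphase_bounds(p, l):
    """Lower and upper bounds on samples in subphase l for p parameters, (5.10)."""
    lower = (7 + p) * 2 ** (4 * l / 3)
    return lower, lower + 200


# Made-up example: two statistics (say edges and triangles), three simulated graphs.
observed = [10, 3]
simulated = [[9, 2], [11, 4], [10, 3]]
ratios = t_ratios(observed, simulated)
```

For this example the simulated means equal the observed values, so both t-ratios are 0 and `converged(ratios)` is true.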

6 New Specifications

6.1 Limitations of Markov models

The Markov models described in section 3.2 have problems achieving convergence. If the parameters associated with the numbers of triangles and k-stars ($k \ge 2$) are positive, then changing some of the ties $x_{ij}$ may lead to large increases in the change statistics for other tie variables $x_{kl}$. As simulation proceeds, this can lead to a near-complete graph with very little probability of getting back to sparse networks. If we change the parameter values to negative, then the model generates graphs that are near empty.

To illustrate this behavior, a simulation was carried out on a non-directed network with 50 nodes. We simulated the edge-triangle Markov model with the edge parameter fixed at $\theta = 3.0$, and the triangle parameter $\tau$ varied from $-1$ to 2 in equal steps. All star parameters in this simulation were kept at 0. For each parameter set $(\theta, \tau)$, the first 100,000 simulated graphs were discarded as the initial burn-in, and every 10,000th graph was taken from another 100,000 simulated graphs, so there were 10 graphs for each set of parameters representing the corresponding graph distribution. The number of edges $z_e$ in each simulated graph is plotted against the triangle parameter in Figure 9. The blue diamonds are from simulations started with an empty graph, and the red crosses are from simulations started with a complete graph. The plot shows that when $\tau \in (0^{+}, 1)$, and depending on the starting density, the model generates a two-region graph distribution that is close to either empty graphs or complete graphs.

Figure 9: Simulation: θ and τ Markov model.

The model is near degenerate, since it puts too much weight on near-complete and near-empty graphs. Most human social networks are denser than empty networks and sparser than complete networks, and an edge-triangle Markov model is seen to be a poor one for such contexts. Handcock (2003)[10] gives a theoretical analysis of this issue, and Robins, Pattison and Woolcock (2005)[22] show some simulated degenerate graph examples using Markov models.

For bipartite networks, as described in section 3.2.3, the Markov assumption is not capable of capturing three-paths $L_3$ and the basic closure, four-cycles $C_4$. By including parameters for $L_3$ and $C_4$ in the model, we have a model that captures the closure effect. However, this does not solve the degeneracy issue: the model still puts large weight on near-complete or near-empty bipartite graphs, since the change statistics for $C_4$, $L_3$ or large stars can be big.

To avoid large changes in triangle and k-star ($k \ge 2$) statistics, a set of newly specified ERGMs was proposed by Snijders, Pattison, Robins and Handcock (2006)[27]. Robins, Pattison, Kalish and Lusher (2006)[20], Robins, Snijders, Wang, Handcock and Pattison (2006)[23], Hunter (2006)[12] and Goodreau (2006)[8] provide further discussion and modeling examples using the new specifications.

The new specifications model all $(n - 2)$ parameters for k-stars ($k \ge 2$) as a function of a single parameter. Since all k-stars up to size $(n - 1)$ are modeled by this single parameter, it is effectively a parameter for the degree distribution. The new specifications also introduce two new graph statistics, k-triangles and k-two-paths, based on a more general type of dependence assumption introduced by Pattison and Robins (2002)[13] and further discussed in Pattison and Robins (2004)[14], called the partial conditional dependency assumption and also known as the realization dependence assumption, following Baddeley and Möller (1986)[2]. The following sections give a detailed description of the new specifications and their extensions to bipartite networks.

6.2 Alternating k-stars

6.2.1 Alternating k-stars for one-mode networks

With the Markov model defined in equation (3.4), one can model stars up to size $(n - 1)$. The model puts large weights on big stars, or nodes with high degree, which causes the degeneracy problem.
The new specification uses a single parameter for the entire degree distribution by introducing a weight parameter $\lambda_s$, $\lambda_s \ge 1$, which dampens the effect of large changes in the statistics of large stars. The weights of the stars also have alternating signs, so that the positive weights of the even k-stars are balanced by the negative weights of the odd k-stars. The new statistic, known as alternating k-stars with parameter $\lambda_s$, can be expressed as

$$
z_s(\lambda_s, x) = z_{s_2}(x) - \frac{z_{s_3}(x)}{\lambda_s} + \frac{z_{s_4}(x)}{\lambda_s^2} - \cdots + (-1)^{n-1} \frac{z_{s_{n-1}}(x)}{\lambda_s^{n-3}}
= \sum_{k=2}^{n-1} (-1)^k \frac{z_{s_k}(x)}{\lambda_s^{k-2}} \tag{6.1}
$$

Denote the degree of node $i$ in network $x$ by $d_x(i)$. Then each of the statistics for stars of size $k$, as defined in equation (3.6), can be expressed as

$$
z_{s_k}(x) = \sum_{i=1}^{n} \binom{x_{i+}}{k} = \sum_{i=1}^{n} \binom{d_x(i)}{k} \tag{6.2}
$$

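Expression (6.1) can then be written as

$$
\begin{aligned}
z_s(\lambda_s, x) &= \sum_{k=2}^{n-1} (-1)^k \frac{z_{s_k}(x)}{\lambda_s^{k-2}}
= \lambda_s^2 \sum_{k=2}^{n-1} \left(-\frac{1}{\lambda_s}\right)^{k} \sum_{i=1}^{n} \binom{d_x(i)}{k} \\
&= \lambda_s^2 \sum_{i=1}^{n} \sum_{k=2}^{n-1} \left(-\frac{1}{\lambda_s}\right)^{k} \binom{d_x(i)}{k} \\
&= \lambda_s^2 \sum_{i=1}^{n} \left\{ \sum_{k=0}^{n-1} \left(-\frac{1}{\lambda_s}\right)^{k} \binom{d_x(i)}{k} - 1 + \frac{d_x(i)}{\lambda_s} \right\}
\end{aligned}
\tag{6.3}
$$

Applying the binomial formula then gives

$$
z_s(\lambda_s, x) = \lambda_s^2 \sum_{i=1}^{n} \left\{ \left(1 - \frac{1}{\lambda_s}\right)^{d_x(i)} + \frac{d_x(i)}{\lambda_s} - 1 \right\} \tag{6.4}
$$

When $\lambda_s = 1.0$, the power term in (6.4) equals 1 only for nodes of degree 0, and since the degrees sum to $2 z_e(x)$, expression (6.4) simplifies to

$$
z_s(\lambda_s, x) = 2 z_e(x) - n + \sum_{i=1}^{n} I\{d_x(i) = 0\} \tag{6.5}
$$

where $z_e(x)$ is the number of edges, and $I$ is a binary indicator function such that

$$
I\{d_x(i) = 0\} =
\begin{cases}
1, & \text{if } d_x(i) = 0 \\
0, & \text{otherwise}
\end{cases}
\tag{6.6}
$$

As defined in equation (2.4), the change statistic for alternating k-stars is calculated based on the number of alternating (k-1)-stars in the reduced graph $x^-$, that is, the graph $x$ with the tie $x_{ij} = 0$. For node $i$ the change statistic for the alternating k-stars is

$$
u_{s_i}(\lambda_s, x_{ij}) = \sum_{k=1}^{n-2} \left(-\frac{1}{\lambda_s}\right)^{k-1} \binom{d_{x^-}(i)}{k}
= \lambda_s \left\{ 1 - \left(1 - \frac{1}{\lambda_s}\right)^{d_{x^-}(i)} \right\} \tag{6.7}
$$

Similarly, for node $j$,

$$
u_{s_j}(\lambda_s, x_{ij}) = \lambda_s \left\{ 1 - \left(1 - \frac{1}{\lambda_s}\right)^{d_{x^-}(j)} \right\} \tag{6.8}
$$

Combining (6.7) and (6.8) gives the following formula for the alternating k-star change statistic,

$$
u_s(\lambda_s, x_{ij}) = \lambda_s \left\{ 2 - \left(1 - \frac{1}{\lambda_s}\right)^{d_{x^-}(i)} - \left(1 - \frac{1}{\lambda_s}\right)^{d_{x^-}(j)} \right\} \tag{6.9}
$$

When $\lambda_s = 1.0$, (6.9) simplifies to

$$
u_s(\lambda_s, x_{ij}) = I\{d_{x^-}(i) > 0\} + I\{d_{x^-}(j) > 0\}, \tag{6.10}
$$

where $I$ is a binary indicator function.

By assigning alternating signs to the stars, we assume that the parameters for stars of different sizes also have alternating signs. Let $\sigma$ denote the parameter for the alternating k-stars statistic. The parameters for each individual star of size $k$, denoted by $\sigma_k$, can be derived from $\sigma$ by

$$
\sigma_{k+1} = -\frac{\sigma_k}{\lambda_s}, \quad \text{where } \sigma_2 = \sigma, \; k \ge 2 \tag{6.11}
$$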
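The closed form (6.4) and the change statistic (6.9) can be checked numerically against the direct sum (6.1). The following is a minimal Python sketch under stated assumptions, not the implementation used in the thesis: the graph is stored as a symmetric 0/1 adjacency matrix, and the function names (`make_graph`, `alt_kstar_sum`, `alt_kstar_closed`, `alt_kstar_change`) are hypothetical.

```python
from math import comb


def make_graph(n, edges):
    """Build a symmetric 0/1 adjacency matrix for a nondirected graph."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def degrees(adj):
    """Degree of every node."""
    return [sum(row) for row in adj]


def alt_kstar_sum(adj, lam):
    """Alternating k-star statistic by direct summation over k, as in (6.1)."""
    n = len(adj)
    deg = degrees(adj)
    total = 0.0
    for k in range(2, n):  # k = 2, ..., n - 1
        z_k = sum(comb(d, k) for d in deg)  # number of k-stars, as in (6.2)
        total += (-1) ** k * z_k / lam ** (k - 2)
    return total


def alt_kstar_closed(adj, lam):
    """Alternating k-star statistic from the closed form (6.4)."""
    r = 1.0 - 1.0 / lam  # Python evaluates 0.0 ** 0 as 1.0, so lam = 1 also works
    return lam ** 2 * sum(r ** d + d / lam - 1.0 for d in degrees(adj))


def alt_kstar_change(adj, i, j, lam):
    """Change statistic (6.9); degrees are taken in the graph without tie (i, j)."""
    deg = degrees(adj)
    d_i = deg[i] - adj[i][j]
    d_j = deg[j] - adj[j][i]
    r = 1.0 - 1.0 / lam
    return lam * (2.0 - r ** d_i - r ** d_j)


if __name__ == "__main__":
    g = make_graph(5, [(0, 1), (0, 2), (0, 3), (1, 2)])
    print(alt_kstar_sum(g, 2.0), alt_kstar_closed(g, 2.0))
    print(alt_kstar_change(g, 3, 4, 2.0))
```

In this example graph the degrees are 3, 2, 2, 1 and 0, so there are five 2-stars and one 3-star; with a weight of 2 both routes give 5 - 1/2 = 4.5, and adding the tie between nodes 3 and 4 changes the statistic by 1.0.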

When $\lambda_s = 1$, the alternating k-star parameter models the number of isolated nodes distinctly. When $\lambda_s = 2$, the difference in the change statistics of 5-stars and 6-stars is less than 0.02, hence the model treats nodes with degree higher than five almost equivalently. As $\lambda_s \to \infty$, the alternating k-star is equivalent to a Markov two-star.

6.2.2 Alternating k-stars for bipartite networks

For a bipartite network $x$ of size $(n, m)$, since there are two sets of nodes, $P$ and $A$, two separate alternating k-star statistics are defined as

$$
z_{S_P}(\lambda_s, x) = \sum_{k=2}^{n-1} (-1)^k \frac{z_{S_{P_k}}(x)}{\lambda_s^{k-2}}, \qquad
z_{S_A}(\lambda_s, x) = \sum_{k=2}^{m-1} (-1)^k \frac{z_{S_{A_k}}(x)}{\lambda_s^{k-2}} \tag{6.12}
$$

The corresponding change statistics have the same form as for alternating k-stars in one-mode networks, as derived in equation (6.9).

6.2.3 Simulation with alternating k-stars

Simulations comparing the edge and two-star model with the edge and alternating k-star model show that the edge and alternating k-star model gives better coverage of the graph space, hence a better chance of achieving model convergence in the MCMC MLE.

Figure 10 shows simulation plots of an edge $L$ and two-star $S_{P2}$ model that simulates bipartite graphs with (30, 20) nodes. The $L$ parameter is fixed at $\theta = 3.0$, and the parameter $\sigma^{(P)}_2$ changes from -1 to 1 in steps of 0.1. For each $\sigma^{(P)}_2$, every 100,000th simulated graph is picked from 1,000,000 simulated graphs, and the number of ties $L$ for the sample graphs is plotted against the $\sigma^{(P)}_2$ parameter value. The results show that the $L$ and $S_{P2}$ model is more consistent, in that there are no multiple regions for one set of parameters. However, it still puts too much weight on graphs with very low or very high densities.

Results from simulations conducted using the same simulation strategy for an edge and alternating k-star (K-$S_P$) model on the same sized (30, 20) bipartite network are plotted in Figure 11.
From the results we can see that as the alternating k-star parameter increases, the density of the network increases slowly from the empty to the complete graph. There are reasonable numbers of simulated graphs with densities over the entire range of 0 to 1. The edge and alternating k-star model could potentially fit any observed bipartite network of the same size.

Figure 10: Simulation: $L$ and $\sigma^{(P)}_2$ model

6.3 Alternating k-triangles

The Markov assumption models network transitivity by a single triangle parameter. The previous simulations show that the Markov edge and triangle model has the problem of degeneracy, since it only covers the near empty or near complete regions of the network space. The Markov assumption restricts the tie dependence structure such that tie variables must share a node to be considered conditionally dependent. However, according to the realization dependence assumption described below, ties in a network may well be conditionally dependent even if they do not share a node. The Markov assumption is too restrictive, and a simple single triangle is not sufficient to capture all the complete structures involved in human social networks.

The realization dependence assumption expands the dependency structure to subgraphs of four nodes. The assumption states that two edge variables $X_{ij}$ and $X_{kl}$ are conditionally dependent, given the rest of the network, only if one of the two following conditions is satisfied:

1. $X_{ij}$ and $X_{kl}$ share a node, i.e. $\{i, j\} \cap \{k, l\} \neq \emptyset$, which is the condition needed to satisfy the Markov assumption.

2. $x_{ik} = x_{jl} = 1$, i.e. if the tie between nodes $i$ and $k$ and the tie between $j$ and $l$ exist, then $X_{ij}$ and $X_{kl}$ would be part of a four-cycle as shown in Figure 12.

Under realization dependence, the formation of a tie $x_{ij}$ is affected not only by other ties that nodes $i$ and $j$ have, but also by other ties that do not directly involve nodes $i$ or $j$, so that the probability of forming a tie is assumed to depend on whether the dyad is part of a social circuit (four-cycle). Graphs generated from a realization dependence model are called realization dependent graphs.

From experience, triangles in social networks tend to form clique-like structures (a clique is a complete subgraph), where many triangles are formed within a small group of nodes. The new specification proposed a new graph statistic called the k-triangle, which is defined as k triangles sharing a common edge, as shown in Figure 13. A k-triangle is a further specification that satisfies the realization dependence assumption; it represents connected dyads having multiple shared partners.

Figure 11: Simulation: $L$ and $\sigma^{(P)}_k$ model

Figure 12: Realization dependence assumption when a four-cycle is created

Let $L_{2ij}(x)$ denote the number of two-paths between nodes $i$ and $j$ in network $x$,

$$
L_{2ij}(x) = \sum_{h=1}^{n} x_{ih} x_{jh}, \quad h \neq i, j \tag{6.13}
$$

The k-triangle statistic for a nondirected graph $x$ of size $n$ can then be expressed as

$$
z_{t_k}(x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \binom{L_{2ij}(x)}{k}, \quad 2 \le k \le (n - 2) \tag{6.14}
$$

To avoid the problem that the model puts large weight on large k-triangles, in analogy to alternating k-stars, the parameters for all $(n - 2)$ k-triangles are modeled as a function of a single parameter $\tau$. The k-triangles also have a weight parameter $\lambda_t$ and alternating signs such that $\tau_k = -\tau_{k-1}/\lambda_t$, which leads to the alternating k-triangle statistic, which can be simplified by the binomial formula. When $\lambda_t > 1$,

$$
\begin{aligned}
z_t(\lambda_t, x) &= 3 z_{t_1}(x) - \frac{z_{t_2}(x)}{\lambda_t} + \frac{z_{t_3}(x)}{\lambda_t^2} - \cdots + (-1)^{n-3} \frac{z_{t_{n-2}}(x)}{\lambda_t^{n-3}} \\
&= \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \sum_{k=1}^{n-2} \left(-\frac{1}{\lambda_t}\right)^{k-1} \binom{L_{2ij}(x)}{k} \\
&= \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \left\{ (-\lambda_t) \sum_{k=0}^{n-2} \left(-\frac{1}{\lambda_t}\right)^{k} \binom{L_{2ij}(x)}{k} + \lambda_t \right\} \\
&= \lambda_t \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \left\{ 1 - \left(1 - \frac{1}{\lambda_t}\right)^{L_{2ij}(x)} \right\}
\end{aligned}
\tag{6.15}
$$

Figure 13: K-triangles

To calculate the change statistic for alternating k-triangles, a tie $x_{ij}$ is removed from the network $x$. The formula involves two terms, since the tie $x_{ij}$ can either be the base of the alternating k-triangle that closes multiple two-paths, or form part of the multiple two-paths. Let $x^-$ denote the graph without the tie $x_{ij}$. The change statistic for $x_{ij}$ as the base is

$$
u_{t_b}(\lambda_t, x_{ij}) = \sum_{k=1}^{n-2} \left(-\frac{1}{\lambda_t}\right)^{k-1} \binom{L_{2ij}(x^-)}{k}
= \lambda_t \left\{ 1 - \left(1 - \frac{1}{\lambda_t}\right)^{L_{2ij}(x^-)} \right\}
$$

Let $h$ be another node that is connected to both $i$ and $j$, such that $x_{ih} x_{jh} = 1$. For $x_{ij}$ to be part of the multiple two-paths in an alternating k-triangle, the change statistic is the number of $(k - 1)$ two-paths that the dyads $x_{ih}$ and $x_{jh}$ have, for all $h$:

$$
\begin{aligned}
u_{t_s}(\lambda_t, x_{ij}) &= \sum_{h=1}^{n} \left\{ x_{ih} x_{jh} \sum_{k=0}^{n-3} \left(-\frac{1}{\lambda_t}\right)^{k} \binom{L_{2ih}(x^-)}{k} + x_{ih} x_{jh} \sum_{k=0}^{n-3} \left(-\frac{1}{\lambda_t}\right)^{k} \binom{L_{2jh}(x^-)}{k} \right\} \\
&= \sum_{h=1}^{n} \left\{ x_{ih} x_{jh} \left(1 - \frac{1}{\lambda_t}\right)^{L_{2ih}(x^-)} + x_{ih} x_{jh} \left(1 - \frac{1}{\lambda_t}\right)^{L_{2jh}(x^-)} \right\}
\end{aligned}
\tag{6.16}
$$

Therefore, the change statistic for the alternating k-triangles is

$$
u_t(\lambda_t, x_{ij}) = \lambda_t \left\{ 1 - \left(1 - \frac{1}{\lambda_t}\right)^{L_{2ij}(x^-)} \right\}
+ \sum_{h=1}^{n} \left\{ x_{ih} x_{jh} \left(1 - \frac{1}{\lambda_t}\right)^{L_{2ih}(x^-)} + x_{ih} x_{jh} \left(1 - \frac{1}{\lambda_t}\right)^{L_{2jh}(x^-)} \right\}
\tag{6.17}
$$
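The final form of (6.15) and the change statistic (6.17) can be verified on a small graph by comparing the change statistic with the difference of the statistic before and after adding a tie. The sketch below is illustrative only and rests on assumptions not in the thesis: an adjacency-matrix representation and hypothetical function names (`make_graph`, `two_paths`, `alt_ktriangle`, `alt_ktriangle_change`).

```python
def make_graph(n, edges):
    """Build a symmetric 0/1 adjacency matrix for a nondirected graph."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def two_paths(adj, i, j):
    """Number of two-paths between i and j, as in (6.13)."""
    return sum(adj[i][h] * adj[j][h] for h in range(len(adj)) if h not in (i, j))


def alt_ktriangle(adj, lam):
    """Alternating k-triangle statistic, last line of (6.15)."""
    n = len(adj)
    r = 1.0 - 1.0 / lam
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j]:
                total += 1.0 - r ** two_paths(adj, i, j)
    return lam * total


def alt_ktriangle_change(adj, i, j, lam):
    """Change statistic (6.17), evaluated on the graph without tie (i, j)."""
    reduced = [row[:] for row in adj]
    reduced[i][j] = reduced[j][i] = 0
    r = 1.0 - 1.0 / lam
    # Tie (i, j) acting as the base that closes its two-paths.
    base = lam * (1.0 - r ** two_paths(reduced, i, j))
    # Tie (i, j) acting as a side of k-triangles based on (i, h) and (j, h).
    side = 0.0
    for h in range(len(adj)):
        if h in (i, j):
            continue
        if reduced[i][h] and reduced[j][h]:
            side += r ** two_paths(reduced, i, h) + r ** two_paths(reduced, j, h)
    return base + side


if __name__ == "__main__":
    # Two triangles sharing the edge (0, 1): a 2-triangle.
    g = make_graph(4, [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3)])
    print(alt_ktriangle(g, 2.0))
    print(alt_ktriangle_change(g, 2, 3, 2.0))
```

For the 2-triangle in the example, the alternating sum is 3 x 2 triangles minus one 2-triangle divided by 2, that is 5.5, and adding the tie between nodes 2 and 3 completes the four-node clique, raising the statistic by 3.5.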

When $\lambda_t = 1.0$ the alternating k-triangle statistic and the corresponding change statistic are

$$
\begin{aligned}
z_t(\lambda_t, x) &= \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij} \, I\{L_{2ij}(x) > 0\}, \\
u_t(\lambda_t, x_{ij}) &= I\{L_{2ij}(x^-) > 0\} + \sum_{h=1}^{n} \left\{ x_{ih} x_{jh} \, I\{L_{2ih}(x^-) = 0\} + x_{ih} x_{jh} \, I\{L_{2jh}(x^-) = 0\} \right\}
\end{aligned}
\tag{6.18}
$$

where $x^-$ denotes the graph without the tie $x_{ij}$ and $I$ is a binary indicator function.

6.3.1 Simulation with alternating k-triangles

The edge and triangle Markov model can only model networks that are either of very low density or complete. The edge and alternating k-triangle model gives better coverage of the network space, as shown in Figure 14, which is the result of applying the same simulation strategy.

Figure 14: Simulation: θ and $\tau_k$ model

The alternating k-triangle statistic is a measure of clustering based on the dependency between the formation of a tie between two nodes and whether they share multiple partners. A positive value for the alternating k-triangle parameter indicates that people sharing multiple partners are likely to be connected.

In bipartite networks, triangles cannot be formed, as ties within one of the two sets of actors are not defined. Hence the alternating k-triangle statistic does not apply. The smallest local closure is a four-cycle, which is a two-path closed by another two-path. If we apply the same technique as for the alternating k-triangles, we obtain a new statistic, alternating k-two-paths, which is a measure of how likely nodes with multiple shared partners are to be closed by another two-path.

6.4 Alternating k-two-paths

6.4.1 Alternating k-two-paths for one-mode networks

A two-path is the same as a two-star; four nodes with two two-paths form a four-cycle, or a 2-two-path. We define a k-two-path as a structure in which two nodes are connected by k two-paths, as shown in Figure 15. The k-two-path structure also satisfies the realization dependence assumption.

Figure 15: K-two-paths

The number of k-two-paths can be expressed as

$$
z_{v_k}(x) =
\begin{cases}
\displaystyle \sum_{i=1}^{n} \sum_{j=i+1}^{n} \binom{L_{2ij}(x)}{k}, & \text{if } k \neq 2 \\[2ex]
\displaystyle \frac{1}{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \binom{L_{2ij}(x)}{2}, & \text{if } k = 2, \text{ due to symmetry}
\end{cases}
\tag{6.19}
$$

A tie that is part of a k-triangle can either be the base that closes the k-two-paths, or a side that forms part of a two-path. Including the k-two-path in the model distinguishes between the effect of closure and the effect of forming the prerequisites for closure. Applying a weight parameter $\lambda_v$ and alternating signs, as for k-stars and k-triangles, we form the alternating k-two-path statistic. When $\lambda_v > 1$,

$$
z_v(\lambda_v, x) = z_{v_1}(x) - \frac{2 z_{v_2}(x)}{\lambda_v} + \sum_{k=3}^{n-2} \left(-\frac{1}{\lambda_v}\right)^{k-1} z_{v_k}(x)
= \lambda_v \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left\{ 1 - \left(1 - \frac{1}{\lambda_v}\right)^{L_{2ij}(x)} \right\}
\tag{6.20}
$$

When $\lambda_v = 1$, the statistic reduces to the number of dyads that are indirectly connected by at least one two-path,

$$
z_v(\lambda_v, x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} I\{L_{2ij}(x) > 0\} \tag{6.21}
$$

The change statistic for alternating k-two-paths is similar to the formula for the alternating k-triangles, except that there is no base tie involved. With $x^-$ denoting the graph without the tie $x_{ij}$,

$$
\begin{aligned}
u_v(\lambda_v, x_{ij}) &= \sum_{h=1}^{n} \left\{ x_{jh} \left(1 - \frac{1}{\lambda_v}\right)^{L_{2ih}(x^-)} + x_{ih} \left(1 - \frac{1}{\lambda_v}\right)^{L_{2jh}(x^-)} \right\}, && \text{if } \lambda_v > 1 \\
u_v(\lambda_v, x_{ij}) &= \sum_{h=1}^{n} \left\{ x_{jh} \, I\{L_{2ih}(x^-) = 0\} + x_{ih} \, I\{L_{2jh}(x^-) = 0\} \right\}, && \text{if } \lambda_v = 1
\end{aligned}
\tag{6.22}
$$
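As a numerical check of (6.20) to (6.22), the following sketch computes the alternating k-two-path statistic and its change statistic for a small one-mode graph. It is a hedged illustration rather than the thesis's code; the adjacency-matrix representation and the names `make_graph`, `two_paths`, `alt_two_paths` and `alt_two_paths_change` are assumptions.

```python
def make_graph(n, edges):
    """Build a symmetric 0/1 adjacency matrix for a nondirected graph."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def two_paths(adj, i, j):
    """Number of two-paths between nodes i and j."""
    return sum(adj[i][h] * adj[j][h] for h in range(len(adj)) if h not in (i, j))


def alt_two_paths(adj, lam):
    """Alternating k-two-path statistic, closed form in (6.20).

    With lam = 1 the factor (1 - 1/lam) is 0.0 and Python's 0.0 ** 0 == 1.0,
    so the same expression reproduces the count in (6.21).
    """
    n = len(adj)
    r = 1.0 - 1.0 / lam
    return lam * sum(
        1.0 - r ** two_paths(adj, i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )


def alt_two_paths_change(adj, i, j, lam):
    """Change statistic (6.22), evaluated on the graph without tie (i, j)."""
    reduced = [row[:] for row in adj]
    reduced[i][j] = reduced[j][i] = 0
    r = 1.0 - 1.0 / lam
    total = 0.0
    for h in range(len(adj)):
        if h in (i, j):
            continue
        # Tie (i, j) adds a two-path i-j-h when j is tied to h ...
        total += reduced[j][h] * r ** two_paths(reduced, i, h)
        # ... and a two-path j-i-h when i is tied to h.
        total += reduced[i][h] * r ** two_paths(reduced, j, h)
    return total


if __name__ == "__main__":
    cycle = make_graph(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
    print(alt_two_paths(cycle, 2.0), alt_two_paths(cycle, 1.0))
    print(alt_two_paths_change(cycle, 0, 2, 2.0))
```

In the four-cycle only the two diagonal dyads are joined by two-paths (two each), which gives 3.0 with a weight of 2 and a count of 2 with a weight of 1.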

6.4.2 Alternating k-two-paths for bipartite networks

The alternating k-two-paths in bipartite networks can be understood in two different ways. By analogy to triangles in one-mode networks, a four-cycle is the smallest closure that is not a dyad, hence the parameter value for alternating k-two-paths is an indication of the likelihood of forming a social circuit. As bipartite networks have two sets of nodes, $P$ and $A$, two different k-two-path structures, k-$C_P$ and k-$C_A$, can be formed as shown in Figure 16.

Figure 16: k-$C_P$s and k-$C_A$s

The change statistics for the alternating k-$C_P$ and alternating k-$C_A$ can be derived in a similar way as for the alternating k-two-paths for one-mode networks expressed in formula (6.22); the only difference is in the definition of $L_{2ij}(x)$. For bipartite networks, $L_{2ij}(x)$ is defined as the number of two-paths between nodes $i$ and $j$, where both $i$ and $j$ belong to the same set of nodes.

6.4.3 Simulation with alternating k-two-paths

Simulations were carried out on bipartite graphs with (30, 20) nodes, starting from both empty and complete graphs. The parameter $\theta$ was fixed at $\theta = 3.0$, and the parameter $\beta^{(P)}_k$ for k-$C_P$ varied from -1 to 10. The result is plotted in Figure 17, and shows good coverage of the graph space.

Figure 17: Simulation: θ and $\beta^{(P)}_k$ model
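The simulations described in this chapter all follow the same pattern: propose toggling one tie, evaluate the change statistics, and accept or reject the proposal. The sketch below shows one way this could look for a bipartite graph with a tie parameter and an alternating k-two-path parameter. It is a plain Metropolis sampler written for illustration, not the BPNet algorithm; the function names, the row-by-column matrix layout, and the choice to count two-paths between pairs of row nodes are all assumptions.

```python
import math
import random


def shared(x, i, h):
    """Number of two-paths between row nodes i and h (shared column partners)."""
    return sum(a * b for a, b in zip(x[i], x[h]))


def alt_two_path_stat(x, lam):
    """Alternating k-two-path statistic over pairs of row nodes, as in (6.20)."""
    n = len(x)
    r = 1.0 - 1.0 / lam
    return lam * sum(
        1.0 - r ** shared(x, i, h) for i in range(n) for h in range(i + 1, n)
    )


def change_stats(x, i, a, lam):
    """Change statistics (ties, alternating k-two-paths) for the tie (i, a).

    Both are evaluated with the tie absent, following the form of (6.22).
    """
    r = 1.0 - 1.0 / lam
    old = x[i][a]
    x[i][a] = 0  # work on the reduced graph
    delta = sum(
        r ** shared(x, i, h) for h in range(len(x)) if h != i and x[h][a]
    )
    x[i][a] = old  # restore the original state
    return 1.0, delta


def metropolis(n, m, theta, beta, lam, steps, rng, start_full=False):
    """Simulate an (n, m) bipartite graph by single-tie Metropolis updates."""
    x = [[1 if start_full else 0] * m for _ in range(n)]
    for _ in range(steps):
        i, a = rng.randrange(n), rng.randrange(m)
        d_ties, d_paths = change_stats(x, i, a, lam)
        log_odds = theta * d_ties + beta * d_paths  # tie present versus absent
        log_ratio = -log_odds if x[i][a] else log_odds
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x[i][a] = 1 - x[i][a]
    return x


if __name__ == "__main__":
    graph = metropolis(6, 4, theta=-1.0, beta=0.2, lam=2.0, steps=5000,
                       rng=random.Random(0))
    print(sum(map(sum, graph)), "ties;", alt_two_path_stat(graph, 2.0))
```

Running the sampler from both an empty and a complete start (`start_full=True`) for the same parameters mirrors the comparison of starting points used in the figures of this chapter.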

6.5 Limitations of the new specifications

The new specifications provide a much higher chance of obtaining maximum likelihood parameter estimates. Based on the realization dependence assumption, which is a weaker assumption than the Markov assumption, the model has wider coverage of the graph space and is less likely to be degenerate. However, different combinations of k-stars or k-two-paths of different sizes can produce the same number of alternating k-stars or alternating k-two-paths. Hence we may have a converged model that fits the newly specified statistics well, but not each individual k-star or k-two-path, such that the underlying graph distribution would be different from the observed network. Section [...] is an example of such a case. Further investigation of possible dependency assumptions and graph statistics may help resolve this issue.

7 Goodness of fit

The goodness of fit of an ERGM can be assessed by simulation, where various statistics from the observed network are compared with the statistics collected from the simulated network distribution to see whether the simulated graph distribution is centered at the observed network. The statistics should not be limited to the ones that are modeled in the given ERGM, as these should already have been found to fit very well during the third phase of the MCMC maximum likelihood estimation algorithm, where model convergence is tested. Instead they should include all possible network statistics and other local and global network measurements like the ones described in sections 1.2 and 1.3.

A simple goodness of fit statistic is the t-ratio as defined in equation (5.11). Small t-ratios indicate good model fit. For statistics that are modeled in a given ERGM, the absolute value of the t-ratios should be less than 0.1 to indicate that the model has converged. For other network statistics, t-ratios that are smaller than 2.0 are considered as indicating a good fit.
The t-ratios assess goodness of fit on each network statistic independently. To test the overall fit of the model, we need to take into account correlations among these statistics. The Mahalanobis distance, introduced by P. C. Mahalanobis in 1936, gives us a way of testing how similar the observed network is to a distribution of networks generated by a p* model. Let $Z(x) = [z_1(x), z_2(x), \ldots, z_p(x)]$ be the vector of observed network statistics, $\mu = (\mu_1, \mu_2, \ldots, \mu_p)$ be the vector of means of network statistics from the simulated graph distribution, and $\Sigma$ be the covariance matrix. The Mahalanobis distance $d_M$ is calculated as

$$
d_M = \sqrt{(Z(x) - \mu)^T \, \Sigma^{-1} \, (Z(x) - \mu)} \tag{7.1}
$$

If the distribution of $Z(x)$ is multivariate normal, then $d_M^2$ follows a $\chi^2_{p-k}$ distribution, where $k$ is the number of parameters in the given model. However, there is evidence that in quite a lot of cases the distributions of graph statistics are not normal, hence $Z(x)$ is not multivariate normal. Appropriate transformations are needed to perform a more valid $\chi^2$-test on model goodness of fit.
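The distance in (7.1) can be computed directly from a sample of simulated statistics. The sketch below uses only the Python standard library and solves the linear system by Gaussian elimination instead of forming the inverse explicitly. It is an illustration under assumptions: the function names (`mean_and_cov`, `solve`, `mahalanobis`) are hypothetical, and the sample covariance with divisor (number of samples - 1) is one common choice that the thesis does not specify.

```python
import math


def mean_and_cov(samples):
    """Sample mean vector and covariance matrix (divisor: number of samples - 1)."""
    count = len(samples)
    p = len(samples[0])
    mu = [sum(s[j] for s in samples) / count for j in range(p)]
    cov = [
        [
            sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (count - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]
    return mu, cov


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [row[:] + [rhs[r]] for r, row in enumerate(matrix)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, p):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, p + 1):
                aug[r][k] -= factor * aug[c][k]
    y = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(aug[r][k] * y[k] for k in range(r + 1, p))
        y[r] = (aug[r][p] - tail) / aug[r][r]
    return y


def mahalanobis(observed, mu, cov):
    """Mahalanobis distance of the observed statistics from the sample, as in (7.1)."""
    diff = [o - m for o, m in zip(observed, mu)]
    y = solve(cov, diff)  # y = cov^{-1} (observed - mu)
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))


if __name__ == "__main__":
    simulated = [[10.0, 3.0], [12.0, 4.0], [9.0, 2.0], [11.0, 5.0]]
    mu, cov = mean_and_cov(simulated)
    print(mahalanobis([10.5, 3.5], mu, cov))
```

In practice each row of `simulated` would hold the (possibly transformed) statistics of one simulated graph, and `observed` the same statistics of the observed network.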

Table 1: Transformations of Graph Statistics

z(x)          KS p-value (z)   z' = f(z)              KS p-value (z')
L             > [...]          f(z) = z               > [...]
S_P2          [...]            f(z) = z^(1/4)         > [...]
S_P3          < [...]          f(z) = z^(1/5)         > [...]
S_A2          [...]            f(z) = z^(1/4)         > [...]
S_A3          < [...]          f(z) = z^(1/5)         > [...]
L_3           < [...]          f(z) = z^(1/5)         > [...]
C_4           < [...]          f(z) = z^(1/6)         > [...]
K-S_P         > [...]          f(z) = z               > [...]
K-S_A         > [...]          f(z) = z               > [...]
K-C_P         [...]            f(z) = z^(1/2)         > [...]
K-C_A         > [...]          f(z) = z               > [...]
stddev D_P    > [...]          f(z) = z               > [...]
skew D_P      > [...]          f(z) = z               > [...]
stddev D_A    < [...]          f(z) = z^(1/2)         > [...]
skew D_A      < [...]          f(z) = (z + 2)^(1/2)   > [...]
Clust.Coef.   > [...]          f(z) = z               > [...]

To assess normality, a simulation with a Bernoulli model (7.2) on bipartite networks of size (18, 14) was carried out, and all available graph statistics that are implemented in BPNet were collected from every 1,000th graph of the 1,000,000 simulated graphs. The collected statistics were then examined using Normal Q-Q plots. The Kolmogorov-Smirnov p-values were used to indicate departure from normality.

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left[-0.605 \, z_L(x)\right] \tag{7.2}
$$

Graph statistics with significant departure from normality (p-value < 0.1) were transformed using various forms of power transformations. The p-values were then recomputed on the transformed statistics. The transformation functions and p-values are listed in Table 1, where $D_P$ and $D_A$ are the degree distributions for nodes of type $P$ and $A$. As each individual graph has a degree distribution for each type of node, the standard deviations and skewness of these degree distributions are used as graph statistics for the distribution of graphs.

Using the graph statistic $C_4$ as an example, before the transformation it is highly skewed, and the Normal Q-Q plot shows a huge departure from normality. However, after a one-sixth power transformation, the transformed statistic passed the normality test, as shown in Figure 18.

Figure 18: Normality test of $C_4$ raw and transformed data.

8 Modeling Examples

In this section, two bipartite data sets are analyzed using the newly proposed ERGM for bipartite networks. The first data set, known as the Southern Women data set, is a classic affiliation network data set collected by Davis, Gardner and Gardner (1941)[4]. It records the participation of 18 women in 14 informal social events in Natchez, Mississippi over nine months. The second data set, collected by the Social Networks Research Group at the Netherlands Institute for Advanced Study (NIAS) in [...], and analysed by Robins and Alexander (2004)[17], has two affiliation networks describing how directors were interlocked among the top 500 companies in Australia in [...]. Various models with different parameters were fitted and assessed using the goodness of fit strategy described in section 7.

8.1 Southern Women

Since it was first published in the 1940s, the Southern Women data has been analysed using several social network analysis techniques, including some early specifications of ERGMs for affiliation networks by Skvoretz and Faust (1999)[24]. Freeman (2003)[6] gives an overview of various analyses that have been conducted on this data set. A plot of the network is shown in Figure 19, where circles represent women and squares represent events.

Figure 19: Southern Women Data

There are some interesting features of this network: the women painted in yellow attended blue and pink events, women in green attended white and pink events, and women in red only attended pink events. From the display of the data, we see that most of the women can reach most of the events within three steps; pink, green and white nodes, or pink, yellow and blue nodes, form a lot of four-cycles, hence one may conclude that three-paths and four-cycles are important local configurations. Are three-paths and four-cycles really the building blocks for this network? We need to fit models to get a statistical answer.

8.1.1 Pseudo-Likelihood and Maximum-Likelihood Estimation Results

Skvoretz and Faust (1999)[24] explored some possible p* models on this data set, including some network statistics that satisfy the Markov assumption. However, the maximum likelihood (ML) estimation method was not available at that time, and pseudo-likelihood (PL) estimation was used. Table 2 shows both the PL estimates (PLE) from Skvoretz and Faust (1999) and the ML estimates (MLE) with estimated standard errors obtained using BPNet for the same Markov model,

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{P2} z_{S_{P2}}(x) + \sigma_{A2} z_{S_{A2}}(x) \right\} \tag{8.1}
$$

Table 2: PLE and MLE of Model 8.1 for the Southern Women Data

Effect                    PLE     MLE (S.E.)       t-ratio*
Choice (L)                [...]   [...] (0.314)    [...]
Woman 2-Stars (S_P2)      [...]   [...] (0.059)    [...]
Event 2-Stars (S_A2)      [...]   [...] (0.039)    [...]
*t-ratio for convergence

The t-ratios from the MLE are less than 0.1, indicating good model convergence, and the standard errors for $\theta$ and $\sigma_{A2}$ are less than half their corresponding parameter estimates, suggesting that both parameters are significantly different from 0, whereas $\sigma_{P2}$ has a big standard error, indicating that the parameter is not significant and may be removed from the model.
We keep it here to provide a direct comparison with the PLE; a more detailed discussion of model selection is presented in the Model Selection section below. Comparing the PLE and MLE, we see that the estimates are similar for event two-stars $\sigma_{A2}$. However, the parameter for the choice effect $\theta_{PLE}$ is less than $\theta_{MLE}$ by more than one standard error, and the woman two-star parameter $\sigma_{P2}$ is over-estimated by more than one standard error in the PL estimation. These differences in parameter estimates will cause large differences in the graph distributions represented by the model.

To demonstrate the difference in graph distributions, simulations were carried out and model goodness of fit was assessed using the simulated graph distributions. In the simulation, the first 100,000 simulated graphs were used as initial burn-in; 1,000 graphs were then taken out of another 1,000,000 simulated graphs by selecting every 1,000th graph. The means ($\mu$) of various statistics collected through the simulation were used to test the observed graph statistics (obs); standard deviations (S.D.) and the values of the t-ratios are shown in Table 3, where $S_P$ denotes woman stars, $S_A$ denotes event stars, K-$C_P$ denotes alternating k-two-paths expanded by two-paths centered on a woman, K-$C_A$ denotes alternating k-two-paths expanded by two-paths centered on an event, $D_P$ denotes the degree distribution of woman nodes and $D_A$ denotes the degree distribution of event nodes. The differences in t-ratios of the estimated parameters between Table 2 and Table 3 are due to randomness of the simulations.

Table 3: Goodness of Fit for the PLE and MLE of Model 8.1

Columns: z(x), obs., then Mean, S.D. and t-ratio for the PLE, then Mean, S.D. and t-ratio for the MLE.
Rows: L, S_P2, S_P3, S_A2, S_A3, L_3, C_4, K-S_P, K-S_A, K-C_P, K-C_A, stddev D_P, skew D_P, stddev D_A, skew D_A, Clust.Coef., Mahalanobis Distance.

The PLE provide a poor fit to the data, as most of the t-ratios are greater than 2.0. The huge Mahalanobis distance also indicates poor model fit. In contrast, the MLE give a very good fit to each individual network statistic, where the largest t-ratio is for the skewness of the event degree distribution (t = [...] < 2.0), and the Mahalanobis distance is much smaller. Hence the MLE does provide a much better model than the PLE.

The advantage of the PLE is that it will always produce parameter estimates, and quite often the estimated parameters are consistent with the MLE. The PLE, however, can be misleading, as illustrated here by the over-estimation of the parameter for woman two-stars ($\sigma_{P2}$). Therefore, we should consider the MLE, if possible, before using the PLE.

Table 4: Parameter Estimates of Models from (8.2) to (8.7), MLE (standard error)

Effect                   Model (8.2)      Model (8.3)      Model (8.4)
Choice (L)               [...] (0.127)    [...] (0.408)    [...] (0.267)
Woman 2-Stars (S_P2)                      [...] (0.084)
Event 2-Stars (S_A2)                                       [...] (0.039)

Effect                   Model (8.5)      Model (8.6)      Model (8.7)
Choice (L)               [...] (0.314)    [...] (0.413)    [...] (0.491)
Woman 2-Stars (S_P2)     [...] (0.059)    [...] (0.176)    [...] (0.175)
Event 2-Stars (S_A2)     [...] (0.039)    [...] (0.131)    [...] (0.160)
3-Paths (L_3)                             [...] (0.018)    [...] (0.019)
Alt. 2-Paths (K-C_P)                                       [...] (0.208)

8.1.2 Model Selection

For an observed network, one may fit several p* models with different numbers of parameters according to the underlying neighborhood assumption. However, not all fitted, or converged, models will give a good fit to the observations. An ideal model should converge, provide a good fit to the original network, and be easy to interpret. In our case, to find the best model for the Southern Women data, six different p* models were fitted. Starting with the simplest Bernoulli model (8.2), and ending with a model involving alternating two-paths (8.7), they all successfully converged during estimation. The parameter estimates and their estimated standard errors are shown in Table 4.

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) \right\} \tag{8.2}
$$

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{P2} z_{S_{P2}}(x) \right\} \tag{8.3}
$$

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{A2} z_{S_{A2}}(x) \right\} \tag{8.4}
$$

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{P2} z_{S_{P2}}(x) + \sigma_{A2} z_{S_{A2}}(x) \right\} \tag{8.5}
$$

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{P2} z_{S_{P2}}(x) + \sigma_{A2} z_{S_{A2}}(x) + \alpha z_{L_3}(x) \right\} \tag{8.6}
$$

$$
\Pr(X = x) = \frac{1}{\kappa} \exp\left\{ \theta z_L(x) + \sigma_{P2} z_{S_{P2}}(x) + \sigma_{A2} z_{S_{A2}}(x) + \alpha z_{L_3}(x) + \beta_{KC_P} z_{KC_P}(x, \lambda) \right\}, \quad \lambda = 1.1 \tag{8.7}
$$

To select the best model out of the six, the goodness of fit strategy was carried out, where graph distributions are simulated from each of the models and tested against the original data. The goodness of fit simulation used 100,000 graphs as burn-in, then 1,000 graphs taken from 1,000,000 simulated graphs using a step size of 1,000. The t-ratios of various statistics and the Mahalanobis distances are listed in Table 5. The Mahalanobis distances were calculated after transforming the graph statistics using the transformation functions in Table 1. Note that model (8.5) is the same as model (8.1), which was used for the comparison with the pseudo-likelihood estimates.

Table 5: Goodness of Fit of Models from (8.2) to (8.7)

Columns: Statistics, then the t-ratios for Models (8.2), (8.3), (8.4), (8.5), (8.6) and (8.7).
Rows: L, S_P2, S_P3, S_A2, S_A3, L_3, C_4, K-S_P, K-S_A, K-C_P, K-C_A, stddev D_P, skew D_P, stddev D_A, skew D_A, Clust.Coef., d_M.

For the Bernoulli model (8.2), the MCMC MLE of the parameter, $\theta = -0.605$ (0.127), agrees with the MLE obtained from the density of the graph as in equation (3.3), where for the Southern Women data, $D(x)$ = [...], and $\hat{\theta} = \log\left(\frac{D(x)}{1 - D(x)}\right)$ = [...]. The model gives a good fit on the density of the network, but not on the event stars ($S_{A2}$ and $S_{A3}$), the event degree distribution ($D_A$), the closure effect ($C_4$) or the global clustering coefficient. The large Mahalanobis distance also indicates that this is not a good model for the data.

Adding the woman two-star effect ($S_{P2}$), as in model (8.3), does not improve the goodness of fit of the model. Also notice that the parameter estimate for $\sigma_{P2}$ is not significantly different from 0.

Both model (8.4) and model (8.5) fitted the data quite well, as all t-ratios in both cases are less than 2.0. In both models, the event two-star parameter $\sigma_{A2}$ is significant. Also notice that $\sigma_{P2}$ is not significant in model (8.5); however, inclusion of $\sigma_{P2}$ in the model gives a better fit

to the data, especially on $C_4$ and the standard deviation of the event degree distribution.

For ERGMs, it is not always the case that the goodness of fit is improved by including more parameters in the model. Model (8.6) fitted the three-paths ($L_3$) explicitly, and the estimation results show that all parameters in this model are significant. However, compared with the simpler model (8.5), the model has a greater Mahalanobis distance, it provides a worse fit on $C_4$, and the clustering coefficient is not fitted well.

The model with a parameter for $C_4$ did not converge due to the degenerate behavior of the model. Instead, model (8.7), with an alternating k-women-two-paths parameter, did converge with a small damping parameter $\lambda = 1.1$. This model gives a reasonable fit to the data, but it is more complicated than model (8.5), which also provided a good fit to the data.

If we compare the squares of the Mahalanobis distances to the $\chi^2$-distribution with the corresponding degrees of freedom, all models have p-values less than 0.01, i.e. the observed network is not in the centre of any of the graph distributions generated from the models considered, when correlations between the statistics are taken into account. However, given that we are testing 16 different statistics using models that have fewer than 6 parameters, the Mahalanobis distance test is a very powerful test. The differences in Mahalanobis distance between models give us a good indication as to which model fits the network relatively better. Models (8.4), (8.5) and (8.7) have similar Mahalanobis distances, with model (8.5) having the smallest ($d_M = 8.068$). This result is consistent with the results we get from the t-ratios for the various statistics.

From the discussion above, we conclude that model (8.5) is the best of the models we have considered for the Southern Women data set.
The conditional log-odds of woman $i$ attending event $j$ is given by

$$
2.031 \, u_L(x_{ij}) \;[...]\; u_{S_{P2}}(x_{ij}) \;[...]\; u_{S_{A2}}(x_{ij}) \tag{8.8}
$$

The model tells us that the three-path and four-cycle structures all happened just by chance given the density and two-star effects. However, we know from the Mahalanobis distance that there are correlations between the graph statistics that are not controlled well by the model. Further investigation of other graph configurations may result in a better model that gives more interesting interpretations of this data set.

8.2 Interlocking directors

In the simulation study conducted by Robins and Alexander (2004)[17], the interlocking company director network structures of the US and Australia in [...] were compared. The observed network statistics were compared with simulated random network distributions with the same density as the observed network, and Z-scores were then used as indications of the level of difference between the observed networks and the random network distribution.

The modeling examples we are going to use here are based on the same data source. The first example is the data from the top fifty financial institutions in Australia (1996); the second

37 is the largest interlocked component from the top 500 companies from both the financial and industrial sectors Top 50 Financial Institutions (1996) From the data collected, there are 366 directors working for the top 50 financial institutions in Australia in The network plot is shown in Figure 20, the blue squares are companies and the red circles are directors. If a director is sitting on the board of a company, there is a tie between them. There are 395 ties which gives a density of Figure 20: Top 50 Financial Institutions, Australia (1996) There are thirty separate components in this network, and the largest interlocked component has fourteen companies and eighty directors. The greatest number of directors one company has is fourteen, and there is one such company. The largest director degree is of size four, and there is one such director. The (366, 50) node network is much larger then the (18, 14) node southern women data, we use this example to show how the new specifications perform on big networks. Using a Markov model with L 3 and C 4, it is almost impossible to get convergence. The New Specification Model (8.9) does converge with a damping factor λ = 2.0. The estimation results are shown in Table 6; the parameter estimates for Choice and Company Alternating- K-Star are not significantly different from 0. There is a strong negative tendency for forming Director Alternating-K-Stars and Alternating-two-paths centered at a company (K CA, where 36

38 Table 6: Parameter Estimates for Model (8.9) on 366 directors Effect MLE S.E. t-ratio* Choice (L) Director Alternating-K-Star (K-S P ) Company Alternating-K-Star (K-S A ) Alternating-Two-Path (K-C A ) *t-ratio for convergence two directors are linked by one company). Pr(X = x) = 1 κ exp {θz L(x) + σ P z SP (x, λ) + σ A z SA (x, λ) + β KCA z KCA (x, λ)}, λ = 2.0 (8.9) The goodness of fit test used 3,000 out of 5,000,000 simulated graphs as a representation of the underlining graph distribution from the estimated model. The test results are shown in Table 7. From the goodness of fit results we can see that the model gives a good fit on most of the graph statistics, except the director three-star (S P3 ) and the skewness of the director degree distribution (D P ), as indicated by the large t-ratios, and hence the large Mahalanobis distance. This can be explained by looking at the observed network, as mentioned before, there is one director that has degree four, and this director is a four-star that makes up four out of the five observed director three-stars. The model does not include a director four-star parameter, and even if we included such a parameter, it will still be difficult to ask the model to keep a single four-star throughout the distribution of graphs, and very unlikely to achieve model convergence on a network of this size. A way of solving this problem is to treat the two directors (one has degree three; the other has degree four) as special cases, and we can fit a model for the network without them, hence there is no director three stars in the observed network. The model converged with the same set of parameters as in Model 8.9 on this (364, 50) network, and the parameter estimates are listed in Table 8. The new estimation result gives similar interpretations as the result for the 366 directors. Although the parameter estimates for company alternating-k-stars has changed sign from negative to positive, it is still not significant. 
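The goodness of fit comparison described above can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: it assumes each t-ratio is (observed − simulated mean) / simulated standard deviation, and that the Mahalanobis distance is the squared form d' Σ⁻¹ d computed from the simulated covariance matrix, which is the quantity one would compare against a χ² distribution. The function names and the toy input are hypothetical.

```python
from statistics import mean, stdev


def t_ratios(observed, simulated):
    """t-ratio per statistic: (observed - simulated mean) / simulated S.D.

    observed  : list of p observed graph statistics
    simulated : list of samples, each a list of the same p statistics
    """
    columns = list(zip(*simulated))
    return [(obs - mean(col)) / stdev(col) for obs, col in zip(observed, columns)]


def covariance_matrix(simulated):
    """Sample covariance matrix (divisor n - 1) of the simulated statistics."""
    n = len(simulated)
    columns = list(zip(*simulated))
    means = [mean(col) for col in columns]
    p = len(columns)
    return [
        [
            sum((columns[a][s] - means[a]) * (columns[b][s] - means[b]) for s in range(n))
            / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting.

    Assumes the matrix is nonsingular.
    """
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis_distance(observed, simulated):
    """Squared Mahalanobis distance between observed and simulated statistics."""
    columns = list(zip(*simulated))
    diff = [obs - mean(col) for obs, col in zip(observed, columns)]
    weighted = solve(covariance_matrix(simulated), diff)
    return sum(d * w for d, w in zip(diff, weighted))
```

In practice the `simulated` argument would hold the statistics of the sampled graphs (3,000 samples in the example above), and a large t-ratio on a single statistic, such as the director three-star count, inflates the overall distance.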
The goodness of fit results for the 364 directors are listed in Table 9. Without the three- and four-star directors, the model fits the data very well, as indicated by the small t-ratios, the small Mahalanobis distance, and a p-value from the χ² test with 12 degrees of freedom that is not significant at the 0.1 level. We can conclude that, after removing the two high-degree directors, Model (8.9) fits most of the graph statistics very well, and hence it is a good model for this network.

Table 7: Goodness of Fit of Model (8.9) on 366 directors

z(x)                    obs.    Mean    S.D.    t-ratio
L
S_P2
S_P3
S_A2
S_A3
L_3
C_4
K-S_P
K-S_A
K-C_P
K-C_A
stddev D_P
skew D_P
stddev D_A
skew D_A
Clust.Coef
Mahalanobis Distance

Table 8: Parameter Estimates for Model (8.9) on 364 directors

Effect                                        MLE    S.E.    t-ratio*
Choice (θ)
Director Alternating-K-Star (σ_P)
Company Alternating-K-Star (σ_A)
Alternating-Two-Path K-C_A (β_KCA)
*t-ratio for convergence

Table 9: Goodness of Fit of Model (8.9) on 364 directors

z(x)                    obs.    Mean    S.D.    t-ratio
L
S_P2
S_P3
S_A2
S_A3
L_3
C_4
K-S_P
K-S_A
K-C_P
K-C_A
stddev D_P
skew D_P
stddev D_A
skew D_A
Clust.Coef
Mahalanobis Distance
p-value (χ²_12)

A direct interpretation is that the conditional log-odds of director i sitting on the board of company j is given by

\[ 4.806\,u_L(x_{ij}) \;[\ldots]\; 6.485\,u_{KSP}(x_{ij}, \lambda) \;[\ldots]\; u_{KSA}(x_{ij}, \lambda) \;[\ldots]\; 5.806\,u_{KCA}(x_{ij}, \lambda), \quad \lambda = 2.0 \tag{8.10} \]

Given the rest of the model, there are not many popular directors, that is, directors with high degrees, and directors tend not to be linked by multiple companies.

8.2.2 Largest interlocked component from the top 500 (1996)

The largest interlocked component network, taken from the top 500 listed companies of both the financial and industrial sectors in Australia in 1996, has 198 companies interlocked by 255 directors with 675 ties. The network is displayed in Figure 21, where squares are companies and circles are directors. We use this large network as an example to show how robust the new specification is in terms of obtaining model convergence. At the same time, it also shows the limitations of the model for large networks.

Figure 21: Largest interlocked component, Australia (1996)

The Markov model with L_3 and C_4 parameters is far from convergence because of its degeneracy. Two different new specification models converged successfully. Model (8.12) has parameters for the choice effect and alternating company and director k-stars, plus company alternating k-two-paths, while Model (8.11) uses the same choice and star effects, plus a director alternating two-path parameter. Both models use a damping parameter of λ = 2.0.

\[ \Pr(X = x) = \frac{1}{\kappa} \exp\{\theta z_L(x) + \sigma_P z_{SP}(x, \lambda) + \sigma_A z_{SA}(x, \lambda) + \beta_{KCP} z_{KCP}(x, \lambda)\}, \quad \lambda = 2.0 \tag{8.11} \]

\[ \Pr(X = x) = \frac{1}{\kappa} \exp\{\theta z_L(x) + \sigma_P z_{SP}(x, \lambda) + \sigma_A z_{SA}(x, \lambda) + \beta_{KCA} z_{KCA}(x, \lambda)\}, \quad \lambda = 2.0 \tag{8.12} \]
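The role of the damping parameter λ in the alternating star terms can be illustrated with a short sketch. It assumes the alternating k-star statistic takes the standard form used in the one-mode new specification, z(x, λ) = Σ_{k≥2} (−1)^k S_k / λ^(k−2), where S_k is the number of k-stars centered on one node set (director degrees for z_SP, company degrees for z_SA). The function names and the example degree sequence are hypothetical and are not taken from the thesis.

```python
from math import comb


def alternating_k_star(degrees, lam=2.0):
    """Alternating k-star statistic from the degrees of one node set.

    S_k = sum over nodes of C(degree, k); the signs alternate and higher-order
    stars are down-weighted by successive powers of the damping parameter lam.
    """
    max_degree = max(degrees, default=0)
    total = 0.0
    for k in range(2, max_degree + 1):
        s_k = sum(comb(d, k) for d in degrees)
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total


def alternating_k_star_closed_form(degrees, lam=2.0):
    """Equivalent closed form: lam^2 * sum((1 - 1/lam)^d + d/lam - 1)."""
    return lam ** 2 * sum((1 - 1 / lam) ** d + d / lam - 1 for d in degrees)


# Hypothetical director degrees (number of boards each director sits on).
director_degrees = [1, 2, 4, 3]
print(alternating_k_star(director_degrees, lam=2.0))
print(alternating_k_star_closed_form(director_degrees, lam=2.0))
```

The closed form follows from the binomial theorem and shows why a single high-degree node contributes far less to this statistic than it would to raw two-star and three-star counts, which is one reason such terms tend to converge where Markov star parameters do not.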

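A conditional log-odds expression such as (8.10) can be turned into a conditional tie probability with the logistic function. The sketch below assumes each u term is a change statistic, that is, the change in the corresponding graph statistic when the tie x_ij is switched from 0 to 1 with the rest of the network held fixed. The parameter values and change statistics are hypothetical placeholders, not the thesis estimates.

```python
from math import exp


def conditional_tie_probability(parameters, change_statistics):
    """Probability that a tie is present, given the rest of the network.

    parameters        : dict mapping effect name -> parameter estimate
    change_statistics : dict mapping effect name -> change statistic for the tie
    """
    log_odds = sum(parameters[name] * change_statistics[name] for name in parameters)
    return 1.0 / (1.0 + exp(-log_odds))


# Hypothetical values for illustration only.
parameters = {"L": -1.0, "KSP": -0.5}
change_statistics = {"L": 1.0, "KSP": 2.0}
print(conditional_tie_probability(parameters, change_statistics))
```

A negative parameter lowers the probability of any tie that would create more of the corresponding configuration, which is how the negative director star and two-path effects are read in the interpretation of (8.10).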

More information

Assessing Degeneracy in Statistical Models of Social Networks 1

Assessing Degeneracy in Statistical Models of Social Networks 1 Assessing Degeneracy in Statistical Models of Social Networks 1 Mark S. Handcock University of Washington, Seattle Working Paper no. 39 Center for Statistics and the Social Sciences University of Washington

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning

More information

CS6702 GRAPH THEORY AND APPLICATIONS QUESTION BANK

CS6702 GRAPH THEORY AND APPLICATIONS QUESTION BANK CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1 UNIT I INTRODUCTION CS6702 GRAPH THEORY AND APPLICATIONS QUESTION BANK 1. Define Graph. 2. Define Simple graph. 3. Write few problems

More information

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks

More information

Markov Random Fields and Gibbs Sampling for Image Denoising

Markov Random Fields and Gibbs Sampling for Image Denoising Markov Random Fields and Gibbs Sampling for Image Denoising Chang Yue Electrical Engineering Stanford University changyue@stanfoed.edu Abstract This project applies Gibbs Sampling based on different Markov

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Kosuke Imai Princeton University Joint work with Graeme Blair October 29, 2010 Blair and Imai (Princeton) List Experiments NJIT (Mathematics) 1 / 26 Motivation

More information

Digital Image Processing Laboratory: Markov Random Fields and MAP Image Segmentation

Digital Image Processing Laboratory: Markov Random Fields and MAP Image Segmentation Purdue University: Digital Image Processing Laboratories Digital Image Processing Laboratory: Markov Random Fields and MAP Image Segmentation December, 205 Introduction This laboratory explores the use

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

Introduction to Machine Learning

Introduction to Machine Learning Department of Computer Science, University of Helsinki Autumn 2009, second term Session 8, November 27 th, 2009 1 2 3 Multiplicative Updates for L1-Regularized Linear and Logistic Last time I gave you

More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-

More information