STATISTICAL METHODS FOR NETWORK ANALYSIS


1 NME Workshop 1 DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness, Ph.D. Supported by the US National Institutes of Health

2 Today we will cover
Three classes of statistical models for networks:
  Simple null models (morning)
  Generative models for static networks (morning)
  Generative models for dynamic networks (afternoon)
Ending with an example: how we can use these models to answer questions about epidemic dynamics and interventions

3 Outline for the morning session
Basic null hypothesis statistical tests
  Does your network differ from a simple random graph?
  Simple null models to test against: CUG and BRG
ERGMs: generative models to test for multiple structural properties simultaneously in static networks
  Components of an ERG model
  Interpretation of coefficients
  Estimation algorithms (and when they fail)
  Model diagnostics (estimation and goodness of fit)
  Simulation from fitted models
  Network data requirements
This is what we'll use ERGMs for in Epi modeling

4 Note
We'll cover a lot of ground here; some of the material and vocabulary may be unfamiliar.
Don't worry if you don't understand everything. Focus on getting the big picture, not the details.
EpiModel puts a lot of this behind the curtain, so you don't have to deal with it, for the most part.
The details do matter when you have a problematic model.
And don't be afraid to ask questions.

5 Getting started
Recall the two ways to access statnetweb
library(statnetweb); run_sw();
Open statnetweb and load the faux.mesa.high network

6 Statistical Testing: Basics
How do you know if your network is significantly different from a simple random graph?

7 Description vs. Inference in statistics
So far we have been using descriptive statistics to explore our network data:
  Density
  Degree and geodesic distributions
  Mixing matrices
  Component size distributions
Next, we might want to compare these statistics to what we would expect by chance.
What do we mean by chance? Is there a natural null hypothesis test in this context?

8 Recap
Does the structure of our social network differ from a simple random graph?
[Figure panels: faux.mesa.high network; simple random graph with the same tie probability]
What are some structural differences you can see?

9 Consider triangles
Suppose kids have a tendency to become friends with their friends' friends, and this is the only generative process occurring.
Presumably, this would mean that you would observe more triangles in the graph than expected by chance.
How would you test this for a specific network?

10 A basic statistical test for triangles
Begin by counting the number of triangles in your network. Say this is T, your test statistic.
Then determine the probability of observing T or more triangles in this network, and see if it is less than 5%.
But how do you determine that probability? For that you need a null hypothesis of some sort.

11 What is the natural null hypothesis?
It turns out there's more than one.
But they all get used the same way when constructing a statistical test:
  To create a sampling distribution consistent with the null
  And compare your observed value to that distribution

12 Null probability distribution (1)
Unconditional: for a network of this size (size = # nodes),
  enumerate all possible networks for a fixed number of nodes,
  count the number of triangles in each network,
  and construct the frequency distribution of these counts.
Where does the number of triangles in your network lie in this distribution? Top 5%? Bottom 5%? Near the middle?

13 Null probability distribution (1)
For example, take a network with 3 nodes.
How many dyads are there? $\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = 3$
How many different networks on these dyads? Every dyad has 2 possible values, and there are 3 dyads, so the number of possible networks is $2^3 = 8$.
What is the distribution of triangle counts?
  7 networks have 0 triangles
  1 network has 1 triangle
[Figure: triangle distribution for 3-node networks, over triangle counts 0 and 1]
So if your network has 1 triangle, what do you think?
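The 3-node enumeration can be checked by brute force. The following is a minimal sketch, not part of the workshop materials; the function name `triangle_distribution` is made up for illustration, and the approach only works for very small node sets.

```python
from collections import Counter
from itertools import combinations


def triangle_distribution(n):
    """Enumerate every undirected graph on n nodes and tally triangle counts."""
    dyads = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    tally = Counter()
    for mask in range(2 ** len(dyads)):
        # Bit k of mask records whether dyad k is tied in this graph.
        ties = {d for k, d in enumerate(dyads) if (mask >> k) & 1}
        triangles = sum(
            1 for a, b, c in triples
            if (a, b) in ties and (a, c) in ties and (b, c) in ties
        )
        tally[triangles] += 1
    return dict(tally)


print(triangle_distribution(3))  # {0: 7, 1: 1}
```

With 3 nodes only the complete graph contains a triangle, so observing one triangle puts the network in the top 1/8 of the unconditional null distribution.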

14 Null probability distribution (1)
One problem with the unconditional null distribution: "enumerate all possible networks for a fixed number of nodes" is not so easy in practice.
  For 4 nodes: # of dyads is 4*3/2 = 6; # of possible networks = $2^6$ = 64
  For 10 nodes: # of dyads is 10*9/2 = 45; # of possible networks = $2^{45}$, about 35 trillion
  For 20 nodes: # of dyads is 20*19/2 = 190; # of possible networks = $2^{190}$, about $1.6 \times 10^{57}$
We can solve this problem by sampling from the space of networks.
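To see how quickly the space of networks grows, the dyad and network counts can be computed directly. This is an illustrative sketch; `network_space` is a hypothetical helper name.

```python
from math import comb


def network_space(n):
    """Return (# of dyads, # of possible undirected networks) for n nodes."""
    dyads = comb(n, 2)
    return dyads, 2 ** dyads


for n in (4, 10, 20):
    dyads, graphs = network_space(n)
    print(f"{n} nodes: {dyads} dyads, {graphs:.3g} possible networks")
```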

15 Null probability distribution (1)
A more important question for the unconditional null distribution:
Do you really care about comparing your network to networks with zero ties? Or with all possible ties?
Or does it make more sense to compare your network to other networks with the same number of ties?
Controlling for density, does your network have more or fewer triangles than expected?

16 Null probability distribution (2)
Condition on the density: the number of nodes and links.
This is the Conditional Uniform Graph (CUG) test:
  enumerate all possible networks for a fixed number of nodes and links,
  count the number of triangles in each network,
  construct the frequency distribution of the counts,
  compare the value in your network.
This also reduces the sample space, but it's still a lot of graphs. For n nodes and e links:
$\binom{\binom{n}{2}}{e} = \frac{\binom{n}{2}!}{e!\,\left(\binom{n}{2} - e\right)!}$
So we will still need to sample from this space in practice.

17 The CUG is implemented as a permutation test
Since full enumeration is typically not possible, we sample the enumeration space by permutation:
  Randomly choose a tied dyad and a dyad without a tie
  Permute the tie and the non-tie (this preserves the exact density in the network)
  Count the number of triangles in the new network
  Repeat until you have the desired sample size
Permutation tests are often used in statistics when the distribution of a sample statistic is not known.
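A CUG-style test for triangles can be sketched in a few lines. Note the substitution: instead of the tie/non-tie swaps described on the slide, this sketch draws each null network directly by sampling a tie set of the observed size, which is also uniform over graphs with that many ties. The function names and the toy network are hypothetical, and this is not how statnetweb implements the test.

```python
import random
from itertools import combinations


def count_triangles(n, ties):
    """Count closed triples; ties is a set of (i, j) pairs with i < j."""
    return sum(
        1 for a, b, c in combinations(range(n), 3)
        if (a, b) in ties and (a, c) in ties and (b, c) in ties
    )


def cug_triangle_pvalue(n, observed_ties, n_sims=1000, seed=0):
    """Share of same-density random graphs with at least as many triangles."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n), 2))
    observed = count_triangles(n, observed_ties)
    at_least = 0
    for _ in range(n_sims):
        # Uniform draw from all graphs with exactly len(observed_ties) ties.
        sample = set(rng.sample(dyads, len(observed_ties)))
        if count_triangles(n, sample) >= observed:
            at_least += 1
    return at_least / n_sims


# Toy network: two separate triangles among 8 nodes (nodes 6 and 7 isolated).
toy = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}
print(cug_triangle_pvalue(8, toy))
```

A small value says that few networks of the same size and density have as many triangles as the observed one.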

18 Null probability distribution (3)
Condition on the probability of a tie.
This is the Bernoulli Random Graph (BRG) test.
Similar to the CUG, but treats density as a random variable.
Implemented via Markov Chain Monte Carlo (MCMC):
  Randomly choose a dyad
  Flip a coin with probability(tie) = density of the network
  This will not preserve the exact density for each network, but will preserve it on average
  Repeat many times, then count the number of triangles in the final network
  Repeat until you have a sample of the desired size
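The difference from the CUG is easiest to see by looking at the tie counts of simulated networks. The sketch below is a simplification: it flips one coin per dyad to draw each network directly rather than running the dyad-at-a-time chain described on the slide, which targets the same Bernoulli distribution. The function names are hypothetical.

```python
import random
from itertools import combinations


def brg_sample(n, tie_prob, rng):
    """Draw one Bernoulli random graph: each dyad is tied with prob tie_prob."""
    return {d for d in combinations(range(n), 2) if rng.random() < tie_prob}


def brg_edge_counts(n, tie_prob, n_sims=1000, seed=0):
    """Tie counts across simulated graphs; they vary around tie_prob * # dyads."""
    rng = random.Random(seed)
    return [len(brg_sample(n, tie_prob, rng)) for _ in range(n_sims)]


counts = brg_edge_counts(10, 0.2)
print(min(counts), sum(counts) / len(counts), max(counts))
```

With 10 nodes (45 dyads) and tie probability 0.2, the counts scatter around 9 ties rather than being fixed at 9, which is what "preserve it on average" means.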

19 Null models in statnetweb
Select a summary measure for the observed data.
Compare it to the distribution simulated from a null model.
In statnetweb:
  We can plot null distribution overlays on degree and geodesic distributions
  And plot the CUG and BRG distributions for selected network summary measures

20 In statnetweb: Degree distribution
Compare the degree distribution in faux.mesa.high to what we would expect by chance.
Go to Network Descriptives, then Degree Distribution.
Select CUG and BRG null models.
This overlays the mean and 95% confidence intervals from 100 simulations.
What do you see now?

21 Test for the number of isolates
Compare the number of isolates in faux.mesa.high to what we would expect by chance.
Go to Network Descriptives, then More, then Conditional uniform graph tests.

22 CUG test for triangles
Are there more triangles in the observed network?
Choose the triangle term from the dropdown menu and run 100 simulations to see how our network compares to the two null models, CUG and BRG.

23 Indeed

24 Yes, the observed triangle count is high
But why? A simple null hypothesis test doesn't provide any insight about that.

25 Limitations of simple null hypotheses
If we are only interested in whether the triangle counts are different than expected given the density of the graph, we can use these simple null hypothesis tests.
But if we want to understand the underlying generative process, quantify the impact of each process on our network, and control for other network features, we need a more general statistical modeling framework.

26 Statistical Testing: Beyond Basics
Can you control for more than just density?
What if you want to test more than one network feature?
And you want a model grounded in generative social theory?
That's when you turn to ERGMs.

27 Motivation
Why are there so many more triangles?
What do you see when color-coding the nodes by their attributes?
[Figure panels: faux.mesa.high network; simple random graph with the same tie probability]

28 Friend of a friend, or birds of a feather?
(At least) two theories about the process that generates triangles:
  1. Homophily: People tend to choose friends who are like them, in terms of grade, race, etc. ("birds of a feather"); triad closure is a by-product
  2. Transitivity: People who have friends in common tend to become friends ("friend of a friend"); triad closure is the key process
So, for three actors in the same grade:
  A cycle-closing tie may form due to transitivity
  But it may be due instead to homophily

29 Transitivity and homophily are partially confounded
But not completely. Any tie may be classified by whether it is triangle forming and whether it is within grade:

                     Triangle forming: Yes    Triangle forming: No
Within grade: Yes    Both                     Homophily
Within grade: No     Transitivity             Neither

The cells represent how the processes jointly influence that tie, so the distribution of ties in this table is informative.
This suggests we should be able to disentangle the two processes statistically.

30 ERGMs: Basic idea
We want to model the probability of a tie as a function of:
  Nodal attributes (that influence degree and mixing)
  The propensity for certain configurations (like triangles)
The dyads may be dependent:
  Nodal attribute effects do not induce dyad dependence
  But triad closure does
So we model the joint distribution directly.

31 Exponential Random Graph Model (ERGM)
Probability of observing a graph (set of relationships) y on a fixed set of nodes:
$P(Y = y) = \frac{\exp(\theta' g(y))}{k(\theta)}$
where:
  $g(y)$ = vector of network statistics
  $\theta$ = vector of model parameters
  $k(\theta)$ = numerator summed over all possible networks on the node set
Exponential family model
  Well understood statistical properties: Besag (1974), Frank (1986)
  Very general and flexible

32 Exponential Random Graph Model (ERGM)
Probability of observing a graph (set of relationships) y on a fixed set of nodes:
$P(Y = y) = \frac{\exp(\theta' g(y))}{k(\theta)}$
If you're not familiar with this kind of compact vector notation, the numerator is just:
$\exp(\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p)$
Kind of like a linear model, but a bit different (watch out for this later).
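For a tiny node set the ERGM probability can be computed exactly, which makes the roles of the numerator and the normalizing constant concrete. The sketch below assumes 4 nodes and a two-term model (edges and triangles); the names `stats`, `all_graphs` and `ergm_probabilities` are hypothetical, and real software never enumerates the graph space like this.

```python
from itertools import combinations
from math import exp

N = 4
DYADS = list(combinations(range(N), 2))
TRIPLES = list(combinations(range(N), 3))


def stats(ties):
    """g(y): (number of edges, number of triangles)."""
    triangles = sum(
        1 for a, b, c in TRIPLES
        if (a, b) in ties and (a, c) in ties and (b, c) in ties
    )
    return (len(ties), triangles)


def all_graphs():
    """Yield every undirected graph on N nodes as a frozenset of tied dyads."""
    for mask in range(2 ** len(DYADS)):
        yield frozenset(d for k, d in enumerate(DYADS) if (mask >> k) & 1)


def ergm_probabilities(theta):
    """P(Y = y) for every graph y: exp(theta . g(y)) divided by k(theta)."""
    weights = {
        y: exp(sum(t * g for t, g in zip(theta, stats(y))))
        for y in all_graphs()
    }
    k = sum(weights.values())  # normalizing constant: numerator summed over all graphs
    return {y: w / k for y, w in weights.items()}


probs = ergm_probabilities((-1.0, 0.5))
print(len(probs), sum(probs.values()))
```

Two graphs with the same statistics always receive the same probability, since the graph enters the formula only through g(y).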

33 The conditional odds of a tie
The probability of the graph
$P(Y = y) = \frac{\exp(\theta' g(y))}{k(\theta)}$
can be re-expressed as the conditional log-odds of a specific tie:
$\operatorname{logit} P(Y_{ij} = 1 \mid \text{rest of the graph}) = \log \frac{P(Y_{ij} = 1 \mid \text{rest of the graph})}{P(Y_{ij} = 0 \mid \text{rest of the graph})} = \theta' \delta(g(y))$
where $\delta(g(y))$ represents the change in $g(y)$ when $Y_{ij}$ is toggled between 0 and 1.
This is an auto-logistic regression ("auto" because of the possible dependence).
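The step from the joint probability to the conditional log-odds is short. As a sketch, write $y^{+}_{ij}$ and $y^{-}_{ij}$ (notation introduced here, not on the slide) for the graph with the tie between i and j set to 1 and to 0, holding the rest of the graph fixed:

```latex
\frac{P(Y_{ij} = 1 \mid \text{rest of the graph})}
     {P(Y_{ij} = 0 \mid \text{rest of the graph})}
  = \frac{\exp\!\big(\theta' g(y^{+}_{ij})\big) / k(\theta)}
         {\exp\!\big(\theta' g(y^{-}_{ij})\big) / k(\theta)}
  = \exp\!\Big(\theta' \big[\, g(y^{+}_{ij}) - g(y^{-}_{ij}) \,\big]\Big)
```

The normalizing constant cancels, and taking logs leaves $\theta'$ times the change in the statistics. This is why the conditional form is easy to compute even though $k(\theta)$ itself is not.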

34 ERGM specification: $\theta' g(y)$
The $g(y)$ terms in the model are summary network statistics: counts of network configurations, for example:
  1. Edges: $\sum y_{ij}$
  2. Within-group ties: $\sum y_{ij} I(i \in C, j \in C)$
  3. 2-stars: $\sum y_{ij} y_{ik}$
  4. 3-cycles: $\sum y_{ij} y_{ik} y_{jk}$
A key distinction in the types of terms:
  Dyad independent (1 and 2 are examples)
  Dyad dependent (3 and 4 are examples)

35 ERGM specification: $\theta' g(y)$
Model specification involves:
  1. Choosing the set of network statistics $g(y)$
     From minimal (# of edges) to saturated (one term for every dyad in the network)
     statnetweb allows you to choose from the list of terms and retrieve documentation for each one
  2. Choosing homogeneity constraints on the parameters, for example, with edges:
     all homogeneous
     group specific (e.g., sex or age specific)
     dyad specific

36 To StatnetWeb
Let's explore the Florentine marriage network

37 Flomarriage: Bernoulli Model
Load the flomarriage network: a network of marriage ties between families in Renaissance Florence.
On the Fit Model page, look up the documentation on the edges term.

38 Flomarriage: Bernoulli Model
Step 1: Add edges to the ergm formula
Step 2: Fit the model
What does this model imply?
  Homogeneous edge probability
  Every tie is equally likely
  Not a very interesting model

39 Interpreting the coefficients
The log-odds of any tie existing is: $\theta_{edges} \times$ (change in # ties) $= \theta_{edges}$
Corresponding probability: $\exp(\theta_{edges}) / (1 + \exp(\theta_{edges}))$
You can confirm that this is the density of the network.
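The link between the edges coefficient and the density can be checked by hand. The sketch below assumes the Florentine marriage network has 16 families and 20 marriage ties; those counts are not stated on the slide, so treat them as an assumption to verify against the loaded data.

```python
from math import comb, exp, log

n_nodes, n_ties = 16, 20            # assumed counts for flomarriage (not from the slide)
n_dyads = comb(n_nodes, 2)          # 120 possible ties
density = n_ties / n_dyads

# For an edges-only model the MLE is the log-odds of the density.
theta_edges = log(density / (1 - density))

# Converting the log-odds back to a probability recovers the density.
prob = exp(theta_edges) / (1 + exp(theta_edges))

print(round(theta_edges, 4), round(prob, 4), round(density, 4))
```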

40 Flomarriage: Triad Formation
The triangle term is a measure of clustering.
Read the documentation for the triangle term.
Fit the model edges + triangle
  Hint: you can just add the triangle term if edges is already in your formula
  Then click Fit Model
Triangle is a dyad dependent term, so the estimation algorithm changes to MCMC (more on this later).

41 Flomarriage: Triad Formation
Note: the triangle term is not significant.
Now how to interpret the coefficients? Conditional log-odds of two actors having a tie:
  (-1.68 × change in the # of ties) + (0.16 × change in # of triangles)
  The change in the # of ties is always 1; how many triangles can one tie change?
For a tie that will create:
  Zero triangles: -1.68
  One triangle: -1.68 + (0.16 × 1) = -1.52
  Two triangles: -1.68 + (0.16 × 2) = -1.36
Still unlikely, but a bit less so.
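The arithmetic on this slide is a dot product between the coefficients and the change statistics. A minimal sketch, using the slide's coefficients (edges -1.68, triangle 0.16) and hypothetical function names:

```python
from math import exp

THETA_EDGES, THETA_TRIANGLE = -1.68, 0.16   # coefficients quoted on the slide


def conditional_log_odds(delta_triangles):
    """Log-odds of a tie that would close delta_triangles new triangles."""
    delta_edges = 1  # adding one tie always changes the edge count by 1
    return THETA_EDGES * delta_edges + THETA_TRIANGLE * delta_triangles


def to_probability(log_odds):
    """Inverse logit."""
    return 1 / (1 + exp(-log_odds))


for k in (0, 1, 2):
    lo = conditional_log_odds(k)
    print(k, round(lo, 2), round(to_probability(lo), 3))
```

Each additional triangle raises the conditional probability of the tie, but with these coefficients it stays well below one half.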

42 Flomarriage: Nodal covariates
[Figure: flomarriage with nodes sized by wealth]
What do you notice?
We can test whether edge probabilities are a function of wealth.
This is a quantitative nodal attribute, so we use the ergm term nodecov.

43 Flomarriage: Nodal covariates
Reset the ergm formula and fit the following model: edges + nodecov("wealth")
There is a significant positive wealth effect on the odds of a tie.
What does the positive coefficient mean?
  Not that there is homophily by wealth
  Just that wealthy nodes have more ties
Note that the wealth effect operates on both nodes in a dyad.

44 Flomarriage: Nodal covariates
The conditional log-odds of a tie between two actors is:
  (-2.59 × change in # ties) + (0.01 × wealth of node 1) + (0.01 × wealth of node 2)
For a tie between two nodes with minimum wealth (3): -2.59 + 0.01 × (3 + 3) = -2.53
For a tie between two nodes with maximum wealth (146): -2.59 + 0.01 × (146 + 146) = 0.33
For a tie between nodes with maximum and minimum wealth: -2.59 + 0.01 × (146 + 3) = -1.10
Note: To specify homophily on wealth, you would use the ergm term absdiff.
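The three worked values can be reproduced with a one-line function. This sketch assumes a wealth coefficient of 0.01, which is the value implied by the three results quoted on the slide together with the edges coefficient of -2.59; the function name is hypothetical.

```python
THETA_EDGES = -2.59    # edges coefficient from the slide
THETA_WEALTH = 0.01    # assumed: the value implied by the slide's worked results


def tie_log_odds(wealth_i, wealth_j):
    """Conditional log-odds of a tie; nodecov adds the wealth of both endpoints."""
    return THETA_EDGES + THETA_WEALTH * (wealth_i + wealth_j)


for pair in [(3, 3), (146, 146), (146, 3)]:
    print(pair, round(tie_log_odds(*pair), 2))
```

Because the covariate enters as a sum over the two endpoints, the term raises the odds for wealthy nodes whoever the partner is. It says nothing about whether wealthy families prefer each other.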

45 Estimation (in one slide)
There is no closed form or analytic solution for the estimated coefficients (as there is in OLS: $\beta = (X'X)^{-1} X'Y$).
Instead, we rely on a defining property of Maximum Likelihood Estimates (MLEs) for exponential family models. At the MLE of the coefficients:
  expected values of the statistics under the model = the observed statistics
And we find these MLEs using an iterative search algorithm, a Markov Chain Monte Carlo (MCMC) algorithm:
  Start with some initial θ values, simulate a sample of networks from those values
  Compare the means of the simulated statistics to the observed values
  Update the values of θ based on the deviations
  Repeat until |expected - observed| < epsilon

46 Estimation (ok, I needed 2 slides)
What does it mean to "simulate networks from those values"?
  Pick a dyad at random
  Toss a coin to set the tie status
    The probability of the tie is determined by the model
    And the details of the MCMC sampling algorithm (Gibbs, Metropolis, Metropolis-Hastings)
  Repeat (many many many times)
This produces a Markov chain of networks.
  Sample from this chain, every 1000th element (say)
  Calculate the mean of the model statistics from this sample
  And compare this mean to the observed network statistics
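The simulate, compare, update loop can be sketched for the simplest possible case, an edges-only model, where the answer is known in advance (the log-odds of the density). This is a toy illustration under stated assumptions, not the algorithm used by the ergm package: the update rule, the gain, the thinning interval and all names are choices made here for clarity.

```python
import random
from math import exp, log


def expit(x):
    return 1 / (1 + exp(-x))


def fit_edges_only(n_dyads, observed_edges, n_iter=40, n_samples=100,
                   thin=120, burn_in=600, gain=4.0, seed=1):
    """Moment-matching search for theta in an edges-only model."""
    rng = random.Random(seed)
    theta = 0.0
    state = [0] * n_dyads            # current network: one 0/1 entry per dyad
    edges = 0
    for _ in range(n_iter):
        p_tie = expit(theta)         # conditional tie probability (change stat = 1)
        sampled = []
        for step in range(burn_in + n_samples * thin):
            k = rng.randrange(n_dyads)                  # pick a dyad at random
            new = 1 if rng.random() < p_tie else 0      # toss the coin
            edges += new - state[k]
            state[k] = new
            # After burn-in, keep every thin-th network from the chain.
            if step >= burn_in and (step - burn_in) % thin == thin - 1:
                sampled.append(edges)
        mean_edges = sum(sampled) / len(sampled)
        # Move theta in the direction that shrinks (observed - expected).
        theta += gain * (observed_edges - mean_edges) / n_dyads
    return theta


theta_hat = fit_edges_only(n_dyads=120, observed_edges=20)
print(round(theta_hat, 2), round(log(20 / 100), 2))
```

For dyad dependent terms the coin's probability would depend on the current state of the network through the change statistics, which is where the real difficulty (and the possibility of failure) comes from.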

47 Computationally intensive estimation
Has been key to statistical estimation of complex (i.e., realistic and interesting) models for dependent data, and to the emergence of the field of "data science".
In most cases, it works really well, and there is lots of mathematical theory proving it has good convergence properties.
But it can run into trouble, especially if the model you're trying to fit is not a good one for the observed network.

48 Model Degeneracy
Models with dyad dependent terms can behave differently than we expect.
They look simple, almost like logistic regression.
But they represent effects that cascade through a network via a chain of dependence (this is the "watch out" from earlier).
Homogeneous triangle and k-star terms turn out to be some of the worst offenders.

49 Model Degeneracy
Technical definition: when a model places almost all probability on a small number of uninteresting graphs.
Most common "uninteresting" graphs:
  Complete (all links exist)
  Empty
Model degeneracy = misspecification: the model you specified would almost never produce the network you observed.
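Degeneracy can be seen exactly on a node set small enough to enumerate. The sketch below assumes 6 nodes and an edges + triangle model with hand-picked coefficients (not estimates from any data); it computes the probability the model places on the complete graph. The function name is hypothetical.

```python
from itertools import combinations
from math import exp


def complete_graph_probability(n, theta_edges, theta_triangle):
    """Exact P(complete graph) under an edges + triangle ERGM on n nodes."""
    dyads = list(combinations(range(n), 2))
    index = {d: k for k, d in enumerate(dyads)}
    # Represent each possible triangle as a bitmask over its three dyads.
    tri_masks = [
        (1 << index[(a, b)]) | (1 << index[(a, c)]) | (1 << index[(b, c)])
        for a, b, c in combinations(range(n), 3)
    ]
    full = 2 ** len(dyads) - 1       # bitmask of the complete graph
    total = 0.0
    weight_complete = 0.0
    for mask in range(full + 1):
        edges = bin(mask).count("1")
        triangles = sum(1 for t in tri_masks if (mask & t) == t)
        w = exp(theta_edges * edges + theta_triangle * triangles)
        total += w
        if mask == full:
            weight_complete = w
    return weight_complete / total


print(complete_graph_probability(6, -1.0, 0.0))   # no triangle effect
print(complete_graph_probability(6, -1.0, 2.0))   # strong triangle effect
```

With the triangle coefficient at 0 the complete graph is vanishingly rare. With a coefficient of 2 it takes nearly all of the probability, even though the edges coefficient is negative. Each added tie closes more triangles, which makes the next tie more likely: the cascade described on the previous slide.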

50 Model Degeneracy
Switch to the faux.mesa.high network.
Fit a model with: edges + triangle
What happens? Trying to fit this model, the algorithm heads off into networks that are much more dense than the observed network.
What does this mean? That this model would not have produced this network, for any combination of parameter estimates for the two terms; i.e., this is a model misspecification problem.

51 Degeneracy Plot (for the 2-star model)
[Figure: degeneracy plot from Mark Handcock's 2003 tech report]
Only the white area has networks with some interesting variation.
The dark areas are complete graphs, or empty graphs (+/- 1 or 2 edges).
This model does not produce many useful networks.

52 Solution: better network statistics
Old statistic: t(x) = # of triangles in the graph
  Here, every additional 3-cycle has the same impact
New statistic: set declining marginal returns for each additional 3-cycle involving the same edge
  The specific function we place on this shared partner distribution involves a geometric weighting
  Hence the name: geometrically weighted edge-wise shared partners, a.k.a. GWESP
  The parameter that specifies the rate of decline in marginal returns is α
  The smaller the α, the more rapid the decline

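53 Solution: better network statistics
$\text{gwesp} = e^{\alpha} \sum_{i=1}^{n-2} \left[ 1 - \left(1 - e^{-\alpha}\right)^{i} \right] sp_i$
$sp_i$ = # of edges with i shared partners
[Figure: example configuration] This configuration contains:
  1 edge with 3 shared partners
  6 edges with 1 shared partner
α     GWESP(α)
0     $e^{0}\left\{[1-(1-e^{0})^{1}] \cdot 6 + [1-(1-e^{0})^{3}] \cdot 1\right\} = 7$ (the # of edges with 1+ shared partners)
0.5   $e^{0.5}\left\{[1-(1-e^{-0.5})^{1}] \cdot 6 + [1-(1-e^{-0.5})^{3}] \cdot 1\right\} = 7.55$
1     $e^{1}\left\{[1-(1-e^{-1})^{1}] \cdot 6 + [1-(1-e^{-1})^{3}] \cdot 1\right\} = 8.03$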

54 Solution: better network statistics
$\text{gwesp} = e^{\alpha} \sum_{i=1}^{n-2} \left[ 1 - \left(1 - e^{-\alpha}\right)^{i} \right] sp_i$
$sp_i$ = # of edges with i shared partners
Two limiting cases:
  As α becomes large, gwesp approaches the count of edges in each triangle (i.e. # of triangles × 3)
  At α = 0, gwesp is the count of edges in at least one triangle (because only an edge's first triangle counts)

55 To StatnetWeb
Adding a gwesp term to the faux.mesa.high model
And conducting model assessments

56 Fitting and diagnosing a model
Convergence is the first assessment:
  Dyad independent models will always converge
  Dyad dependent models may or may not
Next step depends on the model:
  Dyad independent: go directly to the goodness of fit assessment (GOF plots)
  Dyad dependent: convergence assessment (MCMC diagnostics) first, then goodness of fit assessment (GOF plots)

57 What are MCMC Diagnostics?
MCMC diagnostics tell us if the estimation algorithm is mixing well.
These are taken from the penultimate MCMC chain, which is stored in the ergm output object.
These look pretty good:
  The traceplots on the left show random walks around the target value (a bit of correlation in the sampled networks, but not enough to cause concern)
  The distribution of sampled statistics on the right is centered on the target values

58 Goodness of Fit (GOF)
Traditional GOF stats can be used: AIC and BIC are included in the model summary.
We also take another approach:
  We are interested in how well we fit aggregate properties of the network structure that we did not include as model terms
  This helps to identify what the model gets wrong
We use 3 higher order statistics:
  Degree distribution (non-parametric)
  Shared partner distribution (local clustering)
  Geodesic distance distribution (global clustering)

59 [Flow diagram]
DATA + MODEL -> ESTIMATED COEFFICIENTS -> SIMULATED DATA (draws from the prob. dist.)
HIGHER ORDER GRAPH STATISTICS OF DATA vs. HIGHER ORDER GRAPH STATISTICS OF SIMULATED DATA -> GOODNESS OF FIT OF MODEL TO DATA
We'll show how to do this next.

60 Take a break?

61 So let's run and compare several models
These will allow us to examine the evidence for homophily vs. transitivity.
We'll assess the convergence of the different models, as well as the goodness of fit,
and the implications for the generative process of high school friendship patterns in this network.

62 Fit the Bernoulli model to faux.mesa.high
Estimate, and run the default set of GOF terms for this model: faux.mesa.high ~ edges
Is this a dyad independent or dyad dependent model?
Dyad independent models are not fit with MCMC, so we don't need to check MCMC diagnostics.
We can move directly to GOF.

63 Save the model
This will keep the results so we can compare them later.

64 Run the GOF for this model
Go to the Goodness of Fit tab.
Run the default GOF.
This will take a moment because GOF is simulating 100 networks from the model, and calculating the default summary statistics for each one.

65 Goodness of fit measure 1: degree distribution
Data: black line shows the observed data from faux.mesa.high
Model: Bernoulli (i.e. edges only); boxplots show 100 simulations from the Bernoulli model

66 Goodness of fit measure 2: ESP distribution (local clustering)
Data: faux.mesa.high
Model: Bernoulli (i.e. edges only)
[Figure annotation: this edge has an ESP value of 3]

67 Goodness of fit measure 3: geodesic distribution (global clustering)
Data: faux.mesa.high
Model: Bernoulli (i.e. edges only)
[Figure annotation, nodes A, B, C: A/B have geodesic 2; A/C have geodesic]

68 Goodness of fit measures assembled
faux.mesa.high ~ edges
[Figure panels: degree; edgewise shared partner; geodesic]
Summary: not a good fit to any of the aggregate structural properties observed.

69 Fit a model with gwesp
Estimate, save and assess this model: faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)
Save this model too.
This is a dyad dependent model:
  It converges (unlike the triangle model)
  It is fit with MCMC
So we should check the MCMC diagnostics.

70 Run the MCMC diagnostics for this model
Go to the MCMC diagnostics tab.
Select Model 2.
Looks pretty good.

71 Run the Goodness of Fit
faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)
[Figure panels: degree; edge-wise shared partners; minimum geodesic distance]
Much better, though the ESP distribution fit isn't great.

72 And, a quick eyeball test
[Figure panels: observed network; network simulated from model*]
* We'll get to simulation in just a bit
The global structure looks kinda similar now, but something is not right.
So, back to our original question: how much of the clustering is due to homophily, and how much to transitivity?

73 Test this by comparing four models
Model: network statistics g(y)
  Edges: # of edges
  Edges + GWESP (transitivity): # of edges; weighted shared partners
  Edges + Attributes (homophily): # of edges; # of edges for each race, sex, grade; # of edges that are within-race, within-grade, within-sex
  Edges + Attributes + GWESP (both): # of edges; # of edges for each race, sex, grade; # of edges that are within-race, within-grade, within-sex; weighted shared partners

74 Fitting and saving models
statnetweb allows you to save up to 5 models; we'll fit 4 (you can cut and paste from here to statnetweb):
1. edges
   Fit model, save model, reset formula
2. edges + gwesp(0.25, fixed = T)
   Fit model, save model, reset formula
   (You've already fit and saved models 1 and 2)
3. edges + nodefactor("grade") + nodefactor("race") + nodefactor("sex") + nodematch("grade", diff = T) + nodematch("race", diff = F) + nodematch("sex", diff = F)
   Fit model, save model, reset formula
4. edges + nodefactor("grade") + nodefactor("race") + nodefactor("sex") + nodematch("grade", diff = TRUE) + nodematch("race", diff = FALSE) + nodematch("sex", diff = FALSE) + gwesp(0.25, fixed = TRUE)
   Fit model, save model

75 Model Comparison
Note how the gwesp estimate changes from model 2 to model 4: it is about 25% smaller.
That's the impact of controlling for attribute effects, including homophily.
Homophily estimates also change once you control for transitivity.

76 GOF comparison for all 4 models
This will take some time to run.
1. Edges AIC:
2. Edges + GWESP AIC:
3. Edges + Attributes AIC:
4. Edges + Attributes + GWESP AIC: 1648

77 Summary
Both transitivity and homophily play a role in clustering these friendships.
Homophily:
  Accounts for the distribution of path lengths (geodesics)
Transitivity (triadic closure):
  Accounts for the large number of isolates
  Captures the local clustering (ESP) reasonably well
~25% of the transitivity effect is a by-product of homophily:
  The gwesp coefficient drops by ~25% when homophily is added to the model
The GOF suggests the ESP distribution is still not well fit.
  You could tinker some more, if this were a real research question
  But we'll move on

78 Simulating networks from the model
A fitted model describes a probability distribution across all networks of this size:
  The model assigns a probability to every possible network
  The model terms and the estimated coefficients make some networks more likely than others
You can simulate networks from this distribution:
  Using the same MCMC algorithm that was used for estimation
  And the simulated networks will be centered on the network statistics in the original observed network
This, of course, is why these models are really useful for network epidemiology.

79 Simulations
Choose one of the models that you have saved and run 100 simulations with the default control settings.
  Choose the model on the Simulations page next to "ergm formula"
Do you see autocorrelation in the simulation statistics?
Increase the MCMC interval to 10,000 and re-run the simulations to see how this changes the autocorrelation.

80 Some common statistics in ergms
[Figure: undirected network of 10 nodes, including nodal attribute "color", with values: 1=black, 2=red, 3=green]
Term | Formula | Unit | Value(s)
~edges | # of edges | edges | 8
~nodefactor("color") | Sum of degrees for nodes of each color | nodes/edges* | [8,] 6, 2
~nodefactor("color", base=2) | Sum of degrees for nodes of each color | nodes/edges* | 8, [6,] 2
~nodematch("color") | # of edges between nodes of same color | edges | 6
~nodematch("color", diff = TRUE) | # of edges between nodes of same color, for each color | edges | 3, 2, 1

81 Some common statistics in ergms
[Figure: undirected network of 10 nodes, including nodal attribute "color", with values: 1=black, 2=red, 3=green]
Term | Formula | Unit | Value(s)
~nodemix("color", base=1) | # of edges between nodes of each color combo | edges | [3,] 2, 2, 0, 0, 1
~degree(0) | # of nodes of degree 0 | nodes | 2
~degree(2:5) | # of nodes of degrees 2, 3, 4, 5 each | nodes | 1, 2, 1, 0
~concurrent | # of nodes of at least degree 2 | nodes | 4

82 Some common statistics in ergms
[Figure: undirected network of 10 nodes, including nodal attribute "color", with values: 1=black, 2=red, 3=green]
Term | Formula | Unit | Value(s)
~triangle | # of triangles (beware!) | triangles | 2
~gwesp(0) | # of edges in at least one triangle | edges | 5
~gwesp(∞) | # of edges in triangles total (= 3 * # triangles) | triangles | 6

83 Network Data
Where we will see the other benefit of an MLE-based statistical modeling approach

84 Network data: Three main types
Network census
  Data on every node and every link
  Infeasible in practice
Adaptively sampled networks
  Link tracing designs (e.g., snowball or RDS)
  Challenging, and requires strong assumptions for limited purposes
Egocentrically sampled networks
  Enroll population sample ("egos")
  Ask them the usual questions about themselves
  Ask them non-identifying information about their partners ("alters"):
    Timing (start and end of partnership)
    Alter characteristics (sex, age, race, etc.)
    Relational characteristics (type, cohabitation, etc.)
    Pair-specific behaviors (act frequency, condom use, etc.)
  Optional: ask about alter-alter ties
  Optional: ask about perceptions of alters' alters more generally
  Feasible, statistically supported and general

85 Egocentric data
Egocentrically sampled data allow us to observe:
  Degree
    Mean degree, which sets density
    Degree distributions
  Nodal attributes
    Heterogeneity in degree by nodal attributes
    Mixing by nodal attributes
  Triads
    Only if the alter-alter matrix data are collected
  Timing
    Start/end, duration of active and completed partnerships
Much of the global structure of a network is set by these local properties.
And we can use the observed data to estimate the ERGM coefficients.
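As a sketch of how egocentric data set density: the mean degree reported by the sampled egos, scaled to a target population size, implies a target number of edges. The function name, the toy degree reports and the population size below are hypothetical, and the sketch ignores survey weights.

```python
def target_edges(ego_degrees, population_size):
    """Expected number of edges implied by the egos' reported degrees.

    Each edge is reported from both ends in a full population,
    so the total of degrees is divided by 2.
    """
    mean_degree = sum(ego_degrees) / len(ego_degrees)
    return mean_degree * population_size / 2


# Toy sample: four egos reporting 0, 1, 1 and 2 partners.
print(target_edges([0, 1, 1, 2], population_size=100))  # 50.0
```

The same logic extends to other terms: for example, the share of reported ties that fall within an attribute group gives a target for a within-group tie count.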

86 Egocentric estimation for ERGMs
Why does this work?
MLEs for exponential families
  ERGMs are based in exponential family theory
  One of the properties of MLEs for exponential families is that E(sufficient stats under the model) = observed sufficient stats
  Any graph with the same observed sufficient stats has the same probability under the model
  So we don't need to observe the specific complete network
  We just iterate our way (using MCMC) to finding the coefficients that satisfy E(sufficient stats under the model) = observed sufficient stats
Statistical inference for sampled data
  The sufficient stats are like any other sample statistic (e.g., a sample mean)
  There is a sampling distribution for these statistics
  Which allows the standard errors to be estimated

87 Egocentric data in ERGMs
These can be handled in the software quite easily.
Recall that with faux.mesa.high above, we fit the ergm by providing:
  A model formula
  A network containing nodes with their attributes, and the relations among those nodes
But alternatively, one can pass:
  A model formula
  A network containing nodes with their attributes
  The sufficient statistics for the terms in the model formula for that set of nodes; these are known as the "target stats"

88 Egocentric data in ERGMs
Option 1: net ~ edges + triangle
Option 2: net ~ edges + triangle, target.stats = c(40, 7)

89 We'll be using this extensively this week
EpiModel is designed to work with both:
  Complete network data (census)
  Egocentric data with target stat specifications
So you'll get lots of practice during the labs.
And we will be reviewing published examples:
  Based on egocentric data
  That address key issues in HIV prevention and care

90 Egocentric estimation for ERGMs
There is also a specific package for estimating ERGMs from egocentrically sampled data: ergm.ego
  Automates calculation of the target stats
  Handles survey weighting
  Provides other utilities for egocentric EDA
Available on CRAN
  But is currently being refactored with a new API
  And is not yet integrated with EpiModel
In the (near) future, this will be an option for EpiModel.

91 Egocentric data for temporal ERGMs
The same principles apply to estimating temporal ERGMs (TERGMs) for dynamic networks:
  Specify the process of link formation and dissolution
  This requires collecting data on the duration of ties
  You'll learn more about this in the next session (on STERGMs)
And it is the foundation for dynamic, stochastic network-based epidemic simulations.
This is what makes the EpiModel framework so powerful:
  Simple data collection requirements
  Robust statistical methodology for estimation and inference
  Simulations rooted in empirical network data

92 Summary
Network structure influences transmission dynamics.
Statistical models for networks (ERGMs) provide a way to estimate and evaluate hypotheses about the generative processes that lead to the structures we observe.
And the fully specified models can also be used to simulate networks.
  The expected values of the model statistics from the simulated networks will match the statistics in the observed network
Of course, the networks we want to simulate need to be dynamic (and that's where we'll go after lunch).

93 Selected References. Journal of Statistical Software (v24), 2008: eight papers on ERGMs and statnet. Goodreau, S., et al. (2009). "Birds of a Feather, or Friend of a Friend? Using Statistical Network Analysis to Investigate Adolescent Social Networks." Demography 46(1). Krivitsky, P. N., and Morris, M. (2017). "Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US." Annals of Applied Statistics 11(1).


More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

Discrete Mathematics and Probability Theory Summer 2016 Dinh, Psomas, and Ye HW 2

Discrete Mathematics and Probability Theory Summer 2016 Dinh, Psomas, and Ye HW 2 CS 70 Discrete Mathematics and Probability Theory Summer 2016 Dinh, Psomas, and Ye HW 2 Due Tuesday July 5 at 1:59PM 1. (8 points: 3/5) Hit or miss For each of the claims and proofs below, state whether

More information