STATISTICAL METHODS FOR NETWORK ANALYSIS
1 NME Workshop 1 DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSIS Martina Morris, Ph.D. Steven M. Goodreau, Ph.D. Samuel M. Jenness, Ph.D. Supported by the US National Institutes of Health
2 Today we will cover three classes of statistical models for networks: simple null models (morning), generative models for static networks (morning), and generative models for dynamic networks (afternoon). We end with an example of how these models can be used to answer questions about epidemic dynamics and interventions.
3 Outline for the morning session. Basic null hypothesis statistical tests: does your network differ from a simple random graph? Simple null models to test against: CUG and BRG. ERGMs: generative models to test for multiple structural properties simultaneously in static networks. Topics: components of an ERG model; interpretation of coefficients; estimation algorithms (and when they fail); model diagnostics (estimation and goodness of fit); simulation from fitted models; network data requirements. This is what we'll use ERGMs for in epidemic modeling.
4 Note: we'll cover a lot of ground here, and some of the material and vocabulary may be unfamiliar. Don't worry if you don't understand everything; focus on getting the big picture, not the details. EpiModel puts a lot of this behind the curtain, so for the most part you don't have to deal with it. The details do matter when you have a problematic model. And don't be afraid to ask questions.
5 Getting started. Recall the two ways to access statnetweb. In R: library(statnetweb); run_sw(). Open statnetweb and load the faux.mesa.high network.
6 Statistical Testing: Basics. How do you know if your network is significantly different from a simple random graph?
7 Description vs. inference in statistics. So far we have been using descriptive statistics to explore our network data: density, degree and geodesic distributions, mixing matrices, and component size distributions. Next, we might want to compare these statistics to what we would expect by chance. What do we mean by chance? Is there a natural null hypothesis test in this context?
8 Recap: does the structure of our social network differ from a simple random graph? [Figure: the faux.mesa.high network beside a simple random graph with the same tie probability.] What are some structural differences you can see?
9 Consider triangles. Suppose kids have a tendency to become friends with their friends' friends, and this is the only generative process occurring. Presumably you would then observe more triangles in the graph than expected by chance. How would you test this for a specific network?
10 A basic statistical test for triangles. Begin by counting the number of triangles in your network; call this T, your test statistic. Then determine the probability of observing T or more triangles in this network, and see if it is less than 5%. But how do you determine that probability? For that you need a null hypothesis of some sort.
11 What is the natural null hypothesis? It turns out there's more than one. But they all get used the same way when constructing a statistical test: create a sampling distribution consistent with the null, and compare your observed value to that distribution.
12 Null probability distribution (1). Unconditional: for a network of this size (size = # of nodes), enumerate all possible networks on that fixed number of nodes, count the number of triangles in each network, and construct the frequency distribution of these counts. Where does the number of triangles in your network lie in this distribution? Top 5%? Bottom 5%? Near the middle?
13 Null probability distribution (1). For example, take a network with 3 nodes. How many dyads are there? $\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = 3$. How many different networks on these dyads? Every dyad has 2 possible values and there are 3 dyads, so the number of possible networks is $2^3 = 8$. What is the distribution of triangle counts? 7 networks have 0 triangles and 1 network has 1 triangle. [Figure: triangle distribution for 3-node networks, with bars at 0 and 1 triangles.] So if your network has 1 triangle, what do you think?
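The 3-node enumeration can be checked by brute force. The following is a minimal Python sketch, not part of the workshop materials; the function names are illustrative. It lists every undirected graph on n nodes and tallies the triangle counts.

```python
from collections import Counter
from itertools import combinations, product

def count_triangles(n, edges):
    """Count triangles; edges is a set of (i, j) tuples with i < j."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )

def triangle_distribution(n):
    """Enumerate every undirected graph on n nodes and tally triangle counts."""
    dyads = list(combinations(range(n), 2))
    tally = Counter()
    # Each dyad is either absent (0) or present (1).
    for states in product([0, 1], repeat=len(dyads)):
        edges = {d for d, s in zip(dyads, states) if s}
        tally[count_triangles(n, edges)] += 1
    return tally

dist3 = triangle_distribution(3)
print(dict(dist3))  # 7 graphs with 0 triangles, 1 graph with 1 triangle
```

Full enumeration like this is only practical for a handful of nodes, which is the problem the next slide raises.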
14 Null probability distribution (1). One problem with the unconditional null distribution: enumerating all possible networks for a fixed number of nodes is not so easy in practice. For 4 nodes: # of dyads is 4*3/2 = 6, and # of possible networks = $2^6 = 64$. For 10 nodes: # of dyads is 10*9/2 = 45, and # of possible networks = $2^{45}$, about 35 trillion. For 20 nodes: # of dyads is 20*19/2 = 190, and # of possible networks = $2^{190}$, about $1.6 \times 10^{57}$. We can solve this problem by sampling from the space of networks.
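The growth in the number of possible networks is plain arithmetic. A short Python sketch (illustrative helper names, not from the slides) reproduces the counts:

```python
from math import comb

def n_dyads(n):
    """Number of unordered node pairs in an undirected network on n nodes."""
    return comb(n, 2)

def n_networks(n):
    """Number of distinct undirected networks on n labeled nodes."""
    return 2 ** n_dyads(n)

for n in (3, 4, 10, 20):
    print(n, n_dyads(n), f"{n_networks(n):.3g}")
```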
15 Null probability distribution (1). A more important question for the unconditional null distribution: do you really care about comparing your network to networks with zero ties, or with all possible ties? Or does it make more sense to compare your network to other networks with the same number of ties? Controlling for density, does your network have more or fewer triangles than expected?
16 Null probability distribution (2). Condition on the density, i.e., the number of nodes and links. This is the Conditional Uniform Graph (CUG) test: enumerate all possible networks for a fixed number of nodes and links, count the number of triangles in each network, construct the frequency distribution of the counts, and compare the value in your network. This also reduces the sample space, but it's still a lot of graphs: $\binom{\binom{n}{2}}{e} = \frac{\binom{n}{2}!}{e!\,\left(\binom{n}{2} - e\right)!}$ for $n$ nodes and $e$ links, so we will still need to sample from this space in practice.
17 The CUG is implemented as a permutation test. Since full enumeration is typically not possible, we sample the enumeration space by permutation: randomly choose a tied dyad and a dyad without a tie; swap the tie and the non-tie (this preserves the exact density of the network); count the number of triangles in the new network; repeat until you have the desired sample size. Permutation tests are often used in statistics when the distribution of a sample statistic is not known.
18 Null probability distribution (3). Condition on the probability of a tie. This is the Bernoulli Random Graph (BRG) test. It is similar to the CUG, but treats density as a random variable. It is implemented via Markov chain Monte Carlo (MCMC): randomly choose a dyad; flip a coin with probability(tie) = density of the network (this will not preserve the exact density for each network, but will preserve it on average); repeat many times, then count the number of triangles in the final network; repeat until you have a sample of the desired size.
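The two null models can be sketched in a few lines of Python. This is a simplified illustration, not the statnetweb implementation. Instead of the swap-based permutation and the MCMC coin flips described on the slides, it draws each null graph directly: a uniformly random set of exactly e dyads for the CUG, and an independent coin flip per dyad for the BRG. Both target the same null distributions. The observed network below is hypothetical.

```python
import random
from itertools import combinations

def count_triangles(n, edges):
    """Count triangles; edges is a set of (i, j) tuples with i < j."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )

def null_triangle_counts(n, n_edges, model, n_sims=1000, seed=0):
    """Simulate triangle counts (and edge counts) under a CUG or BRG null."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n), 2))
    density = n_edges / len(dyads)
    counts, sizes = [], []
    for _ in range(n_sims):
        if model == "cug":
            # Exactly n_edges ties, placed uniformly at random.
            edges = set(rng.sample(dyads, n_edges))
        elif model == "brg":
            # Each dyad is tied independently with probability = density.
            edges = {d for d in dyads if rng.random() < density}
        else:
            raise ValueError("model must be 'cug' or 'brg'")
        counts.append(count_triangles(n, edges))
        sizes.append(len(edges))
    return counts, sizes

def upper_p_value(observed, simulated):
    """Share of simulated values at least as large as the observed one."""
    return sum(t >= observed for t in simulated) / len(simulated)

# Hypothetical observed network: 8 nodes, 8 ties, two triangles.
observed = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (6, 7)}
t_obs = count_triangles(8, observed)
cug_counts, cug_sizes = null_triangle_counts(8, len(observed), "cug")
brg_counts, brg_sizes = null_triangle_counts(8, len(observed), "brg")
print("observed triangles:", t_obs)
print("CUG p-value:", upper_p_value(t_obs, cug_counts))
print("BRG p-value:", upper_p_value(t_obs, brg_counts))
```

The CUG draws all have exactly 8 ties, while the BRG draws vary around 8; that is the only difference between the two nulls.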
19 Null models in statnetweb. Select a summary measure for the observed data and compare it to the distribution simulated from a null model. In statnetweb we can plot null distribution overlays on degree and geodesic distributions, and plot the CUG and BRG distributions for selected network summary measures.
20 In statnetweb: degree distribution. Compare the degree distribution in faux.mesa.high to what we would expect by chance. Go to Network Descriptives > Degree Distribution and select the CUG and BRG null models. This overlays the mean and 95% confidence intervals from 100 simulations. What do you see now?
21 Test for the number of isolates. Compare the number of isolates in faux.mesa.high to what we would expect by chance. Go to Network Descriptives > More > Conditional uniform graph tests.
22 CUG test for triangles. Are there more triangles in the observed network? Choose the triangle term from the dropdown menu and run 100 simulations to see how our network compares to the two null models, CUG and BRG.
23 Indeed.
24 Yes, the observed triangle count is high. But why? A simple null hypothesis test doesn't provide any insight about that.
25 Limitations of simple null hypotheses. If we are only interested in whether the triangle counts are different than expected given the density of the graph, we can use these simple null hypothesis tests. But if we want to understand the underlying generative process, quantify the impact of each process on our network, and control for other network features, we need a more general statistical modeling framework.
26 Statistical Testing: Beyond Basics. Can you control for more than just density? What if you want to test more than one network feature? And you want a model grounded in generative social theory? That's when you turn to ERGMs.
27 Motivation. Why are there so many more triangles? What do you see when color-coding the nodes by their attributes? [Figure: the faux.mesa.high network beside a simple random graph with the same tie probability.]
28 Friend of a friend, or birds of a feather? (At least) two theories about the process that generates triangles: 1. Homophily: people tend to choose friends who are like them in terms of grade, race, etc. ("birds of a feather"); triad closure is a by-product. 2. Transitivity: people who have friends in common tend to become friends ("friend of a friend"); triad closure is the key process. So, for three actors in the same grade, a cycle-closing tie may form due to transitivity, but it may instead be due to homophily.
29 Transitivity and homophily are partially confounded, but not completely. Any tie may be classified by whether it is triangle forming and whether it is within grade:
Within grade | Triangle forming: Yes | Triangle forming: No
Yes | Both | Homophily
No | Transitivity | Neither
The cells represent how the processes jointly influence that tie, so the distribution of ties in this table is informative. This suggests we should be able to disentangle the two processes statistically.
30 ERGMs: basic idea. We want to model the probability of a tie as a function of nodal attributes (that influence degree and mixing) and the propensity for certain configurations (like triangles). The dyads may be dependent: nodal attribute effects do not induce dyad dependence, but triad closure does. So we model the joint distribution directly.
31 Exponential Random Graph Model (ERGM). Probability of observing a graph (set of relationships) y on a fixed set of nodes: $P(Y = y) = \frac{\exp(\theta^\top g(y))}{k(\theta)}$, where $g(y)$ = vector of network statistics, $\theta$ = vector of model parameters, and $k(\theta)$ = the numerator summed over all possible networks on the node set of y. This is an exponential family model with well understood statistical properties (Besag 1974; Frank 1986), and it is very general and flexible.
32 Exponential Random Graph Model (ERGM). Probability of observing a graph (set of relationships) y on a fixed set of nodes: $P(Y = y) = \frac{\exp(\theta^\top g(y))}{k(\theta)}$. If you're not familiar with this kind of compact vector notation, the numerator is just $\exp(\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p)$. Kind of like a linear model, but a bit different (watch out for this later).
33 The conditional odds of a tie. The probability of the graph, $P(Y = y) = \frac{\exp(\theta^\top g(y))}{k(\theta)}$, can be re-expressed as the conditional log-odds of a specific tie: $\operatorname{logit} P(Y_{ij} = 1 \mid \text{rest of the graph}) = \log \frac{P(Y_{ij} = 1 \mid \text{rest of the graph})}{P(Y_{ij} = 0 \mid \text{rest of the graph})} = \theta^\top \delta(g(y))$, where $\delta(g(y))$ represents the change in $g(y)$ when $Y_{ij}$ is toggled between 0 and 1. This is an autologistic regression ("auto" because of the possible dependence).
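The step from the joint probability to the conditional log-odds is short enough to write out. In the sketch below, $y^{+}_{ij}$ and $y^{-}_{ij}$ are notation introduced here (not on the slide) for the graph with the tie between i and j set to 1 and to 0, with every other dyad held fixed.

```latex
\begin{align*}
\frac{P(Y_{ij} = 1 \mid \text{rest of the graph})}
     {P(Y_{ij} = 0 \mid \text{rest of the graph})}
  &= \frac{P(Y = y^{+}_{ij})}{P(Y = y^{-}_{ij})}
   = \frac{\exp\!\big(\theta^\top g(y^{+}_{ij})\big) / k(\theta)}
          {\exp\!\big(\theta^\top g(y^{-}_{ij})\big) / k(\theta)} \\
  &= \exp\!\Big(\theta^\top \big[\, g(y^{+}_{ij}) - g(y^{-}_{ij}) \,\big]\Big)
\end{align*}
```

The normalizing constant $k(\theta)$ cancels, and taking logs leaves $\theta^\top$ times the change in the statistics, $g(y^{+}_{ij}) - g(y^{-}_{ij})$. This is why the coefficients can be read on the log-odds scale even though $k(\theta)$ itself is intractable.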
34 ERGM specification: $\theta^\top g(y)$. The $g(y)$ terms in the model are summary network statistics, i.e., counts of network configurations. For example: 1. Edges: $\sum y_{ij}$. 2. Within-group ties: $\sum y_{ij}\, I(i \in C, j \in C)$. 3. 2-stars: $\sum y_{ij}\, y_{ik}$. 4. 3-cycles: $\sum y_{ij}\, y_{ik}\, y_{jk}$. A key distinction in the types of terms: dyad independent (1 and 2 are examples) vs. dyad dependent (3 and 4 are examples).
35 ERGM specification: $\theta^\top g(y)$. Model specification involves: 1. Choosing the set of network statistics $g(y)$, from minimal (# of edges) to saturated (one term for every dyad in the network); statnetweb allows you to choose from the list of terms and retrieve documentation for each one. 2. Choosing homogeneity constraints on the parameters; for example, with edges: all homogeneous, group specific (e.g., sex or age specific), or dyad specific.
36 To statnetweb: let's explore the Florentine marriage network.
37 Flomarriage: Bernoulli model. Load the flomarriage network, a network of marriage ties between families in Renaissance Florence. On the Fit Model page, look up the documentation on the edges term.
38 Flomarriage: Bernoulli model. Step 1: add edges to the ergm formula. Step 2: fit the model. What does this model imply? Homogeneous edge probability: every tie is equally likely. Not a very interesting model.
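39 Interpreting the coefficients. The log-odds of any tie existing is $\theta_{edges} \times$ (change in # of ties) $= \theta_{edges}$, because adding a tie always changes the edge count by 1. The corresponding probability is $\exp(\theta_{edges}) / (1 + \exp(\theta_{edges}))$. You can confirm that this is the density of the network.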
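The link between the edges coefficient and the density can be checked by hand. The Python sketch below assumes the standard flomarriage data, with 16 families and 20 marriage ties; those counts are not stated on the slide and are an assumption here. For an edges-only model the MLE is the log-odds of the density.

```python
from math import comb, exp, log

n_nodes = 16   # assumption: families in the flomarriage network
n_ties = 20    # assumption: marriage ties in the flomarriage network

n_dyads = comb(n_nodes, 2)
density = n_ties / n_dyads

# For an edges-only (Bernoulli) model, the MLE is the logit of the density.
theta_edges = log(density / (1 - density))

# Transform the coefficient back to a tie probability.
prob = exp(theta_edges) / (1 + exp(theta_edges))

print("dyads:", n_dyads)
print("density:", round(density, 4))
print("edges coefficient:", round(theta_edges, 4))
print("implied tie probability:", round(prob, 4))
```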
40 Flomarriage: triad formation. The triangle term is a measure of clustering. Read the documentation for the triangle term, then fit the model edges + triangle. Hint: you can just add the triangle term if edges is already in your formula, then click Fit Model. Triangle is a dyad dependent term, so the estimation algorithm changes to MCMC (more on this later).
41 Flomarriage: triad formation. Note: the triangle term is not significant. How do we interpret the coefficients now? The conditional log-odds of two actors having a tie is (-1.68 × change in the # of ties) + (0.16 × change in the # of triangles). The change in ties is always 1; how many triangles can one tie change? For a tie that will create zero triangles: -1.68. One triangle: -1.68 + (0.16 × 1) = -1.52. Two triangles: -1.68 + (0.16 × 2) = -1.36. Still unlikely, but a bit less so.
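The same arithmetic in Python, with the log-odds also converted to probabilities. The coefficients are the rounded values from the slide; the function names are illustrative.

```python
from math import exp

THETA_EDGES = -1.68     # edges coefficient, as reported on the slide
THETA_TRIANGLE = 0.16   # triangle coefficient, as reported on the slide

def conditional_log_odds(new_triangles):
    """Log-odds of a tie that would close `new_triangles` triangles."""
    return THETA_EDGES * 1 + THETA_TRIANGLE * new_triangles

def to_probability(log_odds):
    """Inverse logit."""
    return exp(log_odds) / (1 + exp(log_odds))

log_odds = [conditional_log_odds(k) for k in (0, 1, 2)]
for k, lo in zip((0, 1, 2), log_odds):
    print(k, round(lo, 2), round(to_probability(lo), 3))
```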
42 Flomarriage: nodal covariates. [Figure: flomarriage with nodes sized by wealth.] What do you notice? We can test whether edge probabilities are a function of wealth. This is a quantitative nodal attribute, so we use the ergm term nodecov.
43 Flomarriage: nodal covariates. Reset the ergm formula and fit a model with edges and a nodecov term for wealth. There is a significant positive wealth effect on the odds of a tie. What does the positive coefficient mean? Not that there is homophily by wealth, just that wealthy nodes have more ties. Note that the wealth effect operates on both nodes in a dyad.
44 Flomarriage: nodal covariates. The conditional log-odds of a tie between two actors is (-2.59 × change in # of ties) + (0.01 × wealth of node 1) + (0.01 × wealth of node 2). For a tie between two nodes with minimum wealth (3): -2.59 + 0.01 × (3 + 3) = -2.53. For a tie between two nodes with maximum wealth (146): -2.59 + 0.01 × (146 + 146) = 0.33. For a tie between nodes with maximum and minimum wealth: -2.59 + 0.01 × (146 + 3) = -1.1. Note: to specify homophily on wealth, you would use the ergm term absdiff.
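A quick check of the three wealth scenarios in Python. The edges coefficient (-2.59) is from the slide; the wealth coefficient of 0.01 is the rounded value implied by the slide's three results and should be treated as an assumption.

```python
THETA_EDGES = -2.59   # edges coefficient from the slide
THETA_WEALTH = 0.01   # assumption: rounded nodecov coefficient implied by the slide

def tie_log_odds(wealth_i, wealth_j):
    """Conditional log-odds of a tie; nodecov adds the wealth of both nodes."""
    return THETA_EDGES + THETA_WEALTH * (wealth_i + wealth_j)

both_min = tie_log_odds(3, 3)
both_max = tie_log_odds(146, 146)
mixed = tie_log_odds(146, 3)
print(round(both_min, 2), round(both_max, 2), round(mixed, 2))
```

Because the term sums the two wealth values, it raises the odds of any tie involving a wealthy family; it says nothing about whether wealthy families prefer each other.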
45 Estimation (in one slide). There is no closed form or analytic solution for the estimated coefficients (as there is in OLS: $\hat{\beta} = (X^\top X)^{-1} X^\top Y$). Instead, we rely on a defining property of maximum likelihood estimates (MLEs) for exponential family models. At the MLE of the coefficients, the expected values of the statistics under the model equal the observed statistics. We find these MLEs using an iterative search algorithm, a Markov chain Monte Carlo (MCMC) algorithm: start with some initial θ values and simulate a sample of networks from those values; compare the means of the simulated statistics to the observed values; update the values of θ based on the deviations; repeat until |expected - observed| < ε.
46 Estimation (OK, I needed 2 slides). What does it mean to simulate networks from those values? Pick a dyad at random and toss a coin to set the tie status; the probability of the tie is determined by the model and by the details of the MCMC sampling algorithm (Gibbs, Metropolis, Metropolis-Hastings). Repeat (many, many, many times). This produces a Markov chain of networks. Sample from this chain, say every 1000th element, calculate the mean of the model statistics from this sample, and compare this mean to the observed network statistics.
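The simulation step can be sketched as a plain Metropolis sampler in Python. This is a toy illustration under simplifying assumptions, not the sampler used by ergm: the model has only an edges term and a triangle term, proposals toggle one uniformly chosen dyad, and the chain starts from the empty graph. The function name and default arguments are hypothetical.

```python
import math
import random

def simulate_ergm(n, theta_edges, theta_tri, n_steps=60000,
                  interval=50, burnin=5000, seed=1):
    """Metropolis sampler for an edges + triangle ERGM on n nodes.

    Returns a list of (edge count, triangle count) pairs sampled
    every `interval` steps after `burnin`.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    n_edges, n_tri = 0, 0
    samples = []
    for step in range(1, n_steps + 1):
        i, j = rng.sample(range(n), 2)     # pick a dyad at random
        shared = len(adj[i] & adj[j])      # triangles this tie closes
        if j in adj[i]:
            d_edges, d_tri = -1, -shared   # proposal removes the tie
        else:
            d_edges, d_tri = 1, shared     # proposal adds the tie
        log_ratio = theta_edges * d_edges + theta_tri * d_tri
        # Accept with probability min(1, exp(theta . change statistics)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            if d_edges == 1:
                adj[i].add(j)
                adj[j].add(i)
            else:
                adj[i].discard(j)
                adj[j].discard(i)
            n_edges += d_edges
            n_tri += d_tri
        if step > burnin and step % interval == 0:
            samples.append((n_edges, n_tri))
    return samples

# With no triangle effect, the model is Bernoulli with tie probability 0.25.
samples = simulate_ergm(10, math.log(0.25 / 0.75), 0.0)
mean_edges = sum(e for e, _ in samples) / len(samples)
print("mean simulated edges:", round(mean_edges, 2), "of 45 dyads")
```

In estimation, a mean like this would be compared with the observed statistic and θ nudged accordingly. With the triangle coefficient set to zero, the simulated mean should sit near 0.25 × 45 = 11.25 edges.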
47 Computationally intensive estimation has been key to statistical estimation of complex (i.e., realistic and interesting) models for dependent data, and to the emergence of the field of data science. In most cases it works really well, and there is lots of mathematical theory proving it has good convergence properties. But it can run into trouble, especially if the model you're trying to fit is not a good one for the observed network.
48 Model degeneracy. Models with dyad dependent terms can behave differently than we expect. They look simple, almost like logistic regression, but they represent effects that cascade through a network via a chain of dependence (this is the "watch out" from earlier). Homogeneous triangle and k-star terms turn out to be some of the worst offenders.
49 Model degeneracy. Technical definition: a model is degenerate when it places almost all probability on a small number of uninteresting graphs. The most common uninteresting graphs are the complete graph (all links exist) and the empty graph. Model degeneracy = misspecification: the model you specified would almost never produce the network you observed.
50 Model degeneracy. Switch to the faux.mesa.high network and fit a model with edges + triangle. What happens? In trying to fit this model, the algorithm heads off into networks that are much more dense than the observed network. What does this mean? That this model would not have produced this network for any combination of parameter estimates for the two terms; i.e., this is a model misspecification problem.
51 Degeneracy plot (for the 2-star model). Only the white area has networks with some interesting variation. The dark areas are complete graphs or empty graphs (+/- 1 or 2 edges). From Mark Handcock's 2003 tech report. This model does not produce many useful networks.
52 Solution: better network statistics. Old statistic: t(x) = # of triangles in the graph; here, every additional 3-cycle has the same impact. New statistic: set declining marginal returns for each additional 3-cycle involving the same edge. The specific function we place on this shared partner distribution involves a geometric weighting, hence the name geometrically weighted edgewise shared partners, a.k.a. GWESP. The parameter that specifies the rate of decline in marginal returns is α: the smaller the α, the more rapid the decline.
53 Solution: better network statistics. $\mathrm{gwesp} = e^{\alpha} \sum_{i=1}^{n-2} \left[ 1 - \left( 1 - e^{-\alpha} \right)^{i} \right] sp_i$, where $sp_i$ = # of edges with $i$ shared partners. [Figure: a configuration containing 1 edge with 3 shared partners and 6 edges with 1 shared partner.] For this configuration:
α | GWESP(α)
0 | 7 (the # of edges with 1+ shared partners)
0.5 | 7.55
1 | 8.03
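The GWESP values for this configuration can be reproduced directly from the formula. A minimal Python sketch follows; the function name and the dictionary representation of the shared partner distribution are illustrative choices, not the ergm package's interface.

```python
import math

def gwesp(alpha, sp_counts):
    """Geometrically weighted edgewise shared partner statistic.

    sp_counts maps i (number of shared partners) to the number of
    edges that have exactly i shared partners.
    """
    decay = 1 - math.exp(-alpha)
    return math.exp(alpha) * sum(
        (1 - decay ** i) * count for i, count in sp_counts.items()
    )

# Configuration from the slide: 1 edge with 3 shared partners,
# 6 edges with 1 shared partner.
sp = {1: 6, 3: 1}
for alpha in (0, 0.5, 1, 20):
    print(alpha, round(gwesp(alpha, sp), 2))
```

At α = 0 every edge with at least one shared partner counts once. For large α the statistic approaches the sum of i × sp_i, which is 9 for this configuration (3 triangles × 3 edges).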
54 Solution: better network statistics. $\mathrm{gwesp} = e^{\alpha} \sum_{i=1}^{n-2} \left[ 1 - \left( 1 - e^{-\alpha} \right)^{i} \right] sp_i$, where $sp_i$ = # of edges with $i$ shared partners. As α → ∞, gwesp approaches the count of edges in each triangle (i.e., # of triangles × 3). At α = 0, gwesp is the count of edges in at least one triangle (because only an edge's first triangle counts).
55 To statnetweb: adding a gwesp term to the faux.mesa.high model and conducting model assessments.
56 Fitting and diagnosing a model. Convergence is the first assessment: dyad independent models will always converge; dyad dependent models may or may not. The next step depends on the model. Dyad dependent: convergence assessment with MCMC diagnostics, then goodness of fit assessment with GOF plots. Dyad independent: go directly to goodness of fit assessment with GOF plots.
57 What are MCMC diagnostics? MCMC diagnostics tell us if the estimation algorithm is mixing well. They are taken from the penultimate MCMC chain, which is stored in the ergm output object. [Figure: MCMC diagnostic plots.] These look pretty good: the traceplots on the left show random walks around the target value (a bit of correlation in the sampled networks, but not enough to cause concern), and the distribution of sampled statistics on the right is centered on the target values.
58 Goodness of fit (GOF). Traditional GOF statistics can be used: AIC and BIC are included in the model summary. We also take another approach: we are interested in how well we fit aggregate properties of the network structure that we did not include as model terms. This helps to identify what the model gets wrong. We use 3 higher order statistics: degree distribution; shared partner distribution (non-parametric) (local clustering); geodesic distance distribution (global clustering).
59 [Diagram: the data and the model yield estimated coefficients; the fitted model yields simulated data (draws from the probability distribution); higher order graph statistics of the data are compared with higher order graph statistics of the simulated data to assess the goodness of fit of the model to the data.] We'll show how to do this next.
60 Take a break?
61 So let's run and compare several models. These will allow us to examine the evidence for homophily vs. transitivity. We'll assess the convergence of the different models, as well as the goodness of fit, and the implications for the generative process of high school friendship patterns in this network.
62 Fit the Bernoulli model to faux.mesa.high. Estimate this model and run the default set of GOF terms for it: faux.mesa.high ~ edges. Is this a dyad independent or dyad dependent model? Dyad independent models are not fit with MCMC, so we don't need to check MCMC diagnostics. We can move directly to GOF.
63 Save the model. This will keep the results so we can compare them later.
64 Run the GOF for this model. Go to the Goodness of Fit tab and run the default GOF. This will take a moment because GOF is simulating 100 networks from the model and calculating the default summary statistics for each one.
65 Goodness of fit measure 1: degree distribution. Model: Bernoulli (i.e., edges only). Data: the black line shows the observed data from faux.mesa.high; the boxplots show 100 simulations from the Bernoulli model.
66 Goodness of fit measure 2: ESP distribution (local clustering). Model: Bernoulli (i.e., edges only). [Figure: an example edge with an ESP value of 3.]
67 Goodness of fit measure 3: geodesic distribution (global clustering). Model: Bernoulli (i.e., edges only). [Figure: example nodes A, B, C; A/B have geodesic 2; A/C have geodesic (value missing).]
68 Goodness of fit measures assembled for faux.mesa.high ~ edges: degree, edgewise shared partner, geodesic. Summary: not a good fit to any of the aggregate structural properties observed.
69 Fit a model with gwesp. Estimate, save, and assess this model: faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE). Save this model too. This is a dyad dependent model. It converges (unlike the triangle model). It is fit with MCMC, so we should check the MCMC diagnostics.
70 Run the MCMC diagnostics for this model. Go to the MCMC diagnostics tab and select Model 2. Looks pretty good.
71 Run the goodness of fit for faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE). Much better, though the ESP distribution fit isn't great. [Figure panels: degree, edge-wise shared partners, minimum geodesic distance.]
72 And a quick eyeball test. [Figure: the observed network beside a network simulated from the model; we'll get to simulation in just a bit.] The global structure looks kind of similar now, but something is not right. So, back to our original question: how much of the clustering is due to homophily, and how much to transitivity?
73 Test this by comparing four models:
Model | Network statistics g(y)
Edges | # of edges
Edges + GWESP (transitivity) | # of edges; weighted shared partners
Edges + Attributes (homophily) | # of edges; # of edges for each race, sex, grade; # of edges that are within-race, within-grade, within-sex
Edges + Attributes + GWESP (both) | # of edges; # of edges for each race, sex, grade; # of edges that are within-race, within-grade, within-sex; weighted shared partners
74 Fitting and saving models. statnetweb allows you to save up to 5 models; we'll fit 4 (you can cut and paste from here to statnetweb). You've already fit and saved models 1 and 2.
1. edges
Fit model, save model, reset formula.
2. edges + gwesp(0.25, fixed = TRUE)
Fit model, save model, reset formula.
3. edges + nodefactor("grade") + nodefactor("race") + nodefactor("sex") + nodematch("grade", diff = TRUE) + nodematch("race", diff = FALSE) + nodematch("sex", diff = FALSE)
Fit model, save model, reset formula.
4. edges + nodefactor("grade") + nodefactor("race") + nodefactor("sex") + nodematch("grade", diff = TRUE) + nodematch("race", diff = FALSE) + nodematch("sex", diff = FALSE) + gwesp(0.25, fixed = TRUE)
Fit model, save model.
75 Model comparison. Note how the gwesp estimate changes from model 2 to model 4: it is about 25% smaller. That's the impact of controlling for attribute effects, including homophily. Homophily estimates also change once you control for transitivity.
76 GOF comparison for all 4 models (this will take some time to run). 1. Edges, AIC: (value missing). 2. Edges + GWESP, AIC: (value missing). 3. Edges + Attributes, AIC: (value missing). 4. Edges + Attributes + GWESP, AIC: 1648.
77 Summary. Both transitivity and homophily play a role in clustering these friendships. Homophily accounts for the distribution of path lengths (geodesics). Transitivity (triadic closure) accounts for the large number of isolates and captures the local clustering (ESP) reasonably well. About 25% of the transitivity effect is a by-product of homophily: the gwesp coefficient drops by ~25% when homophily is added to the model. The GOF suggests the ESP distribution is still not well fit. You could tinker some more if this were a real research question, but we'll move on.
78 Simulating networks from the model. A fitted model describes a probability distribution across all networks of this size: the model assigns a probability to every possible network, and the model terms and estimated coefficients make some networks more likely than others. You can simulate networks from this distribution using the same MCMC algorithm that was used for estimation, and the simulated networks will be centered on the network statistics in the original observed network. This, of course, is why these models are really useful for network epidemiology.
79 Simulations. Choose one of the models that you have saved and run 100 simulations with the default control settings. Choose the model on the Simulations page, next to the ergm formula. Do you see autocorrelation in the simulation statistics? Increase the MCMC interval to 10,000 and re-run the simulations to see how this changes the autocorrelation.
80 Some common statistics in ergms. Example: an undirected network of 10 nodes, including the nodal attribute color, with values 1 = black, 2 = red, 3 = green.
Term | Formula | Unit | Value(s)
~edges | # of edges | edges | 8
~nodefactor("color") | Sum of degrees for nodes of each color | nodes/edges* | [8,] 6, 2
~nodefactor("color", base=2) | Sum of degrees for nodes of each color | nodes/edges* | 8, [6,] 2
~nodematch("color") | # of edges between nodes of same color | edges | 6
~nodematch("color", diff = TRUE) | # of edges between nodes of same color, for each color | edges | 3, 2, 1
81 Some common statistics in ergms (continued, same example network).
Term | Formula | Unit | Value(s)
~nodemix("color", base=1) | # of edges between nodes of each color combo | edges | [3,] 2, 2, 0, 0, 1
~degree(0) | # of nodes of degree 0 | nodes | 2
~degree(2:5) | # of nodes of degrees 2, 3, 4, 5 each | nodes | 1, 2, 1, 0
~concurrent | # of nodes of at least degree 2 | nodes | 4
82 Some common statistics in ergms (continued, same example network).
Term | Formula | Unit | Value(s)
~triangle | # of triangles (beware!) | triangles | 2
~gwesp(0) | # of edges in at least one triangle | edges | 5
~gwesp(∞) | # of edges in triangles total (= 3 * # triangles) | triangles | 6
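Several of these statistics are simple counts, which a few lines of Python make explicit. The 7-node network below is hypothetical (the 10-node example from the slides is not recoverable from the transcript), so the values differ from the tables above. The variable names mirror the ergm terms but this is not the ergm implementation.

```python
from collections import Counter

# Hypothetical network: node -> color, plus an undirected edge list.
color = {0: "black", 1: "black", 2: "red", 3: "red",
         4: "green", 5: "green", 6: "green"}
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]

degree = Counter({node: 0 for node in color})
for i, j in edges:
    degree[i] += 1
    degree[j] += 1

# nodefactor: sum of degrees for nodes of each color.
nodefactor = Counter()
for node, d in degree.items():
    nodefactor[color[node]] += d

# nodematch: edges whose endpoints share a color (overall and per color).
nodematch = sum(color[i] == color[j] for i, j in edges)
nodematch_diff = Counter(color[i] for i, j in edges if color[i] == color[j])

# degree(0) and concurrent: node counts by degree.
degree0 = sum(1 for node in color if degree[node] == 0)
concurrent = sum(1 for node in color if degree[node] >= 2)

print("edges:", len(edges))
print("nodefactor:", dict(nodefactor))
print("nodematch:", nodematch, dict(nodematch_diff))
print("degree(0):", degree0, "concurrent:", concurrent)
```

Note that the nodefactor values always sum to twice the edge count, since every edge contributes to the degree of two nodes.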
83 Network Data: where we will see the other benefit of an MLE-based statistical modeling approach.
84 Network data: three main types. Network census: data on every node and every link. Infeasible in practice. Adaptively sampled networks: link tracing designs (e.g., snowball or RDS). Challenging, and requires strong assumptions for limited purposes. Egocentrically sampled networks: enroll a population sample ("egos"); ask them the usual questions about themselves; ask them for non-identifying information about their partners ("alters"), including timing (start and end of partnership), alter characteristics (sex, age, race, etc.), relational characteristics (type, cohabitation, etc.), and pair-specific behaviors (act frequency, condom use, etc.); optionally, ask about alter-alter ties; optionally, ask about perceptions of alters' alters more generally. Feasible, statistically supported, and general.
85 Egocentric data. Egocentrically sampled data allow us to observe: degree (mean degree, which sets density; degree distributions); nodal attributes (heterogeneity in degree by nodal attributes; mixing by nodal attributes); triads (only if the alter-alter matrix data are collected); timing (start/end, duration of active and completed partnerships). Much of the global structure of a network is set by these local properties, and we can use the observed data to estimate the ERGM coefficients.
86 Egocentric estimation for ERGMs: why does this work? MLEs for exponential families: ERGMs are based in exponential family theory. One of the properties of MLEs for exponential families is that E(sufficient stats under the model) = observed sufficient stats. Any graph with the same observed sufficient stats has the same probability under the model, so we don't need to observe the specific complete network. We just iterate our way (using MCMC) to the coefficients that satisfy E(sufficient stats under the model) = observed sufficient stats. Statistical inference for sampled data: the sufficient stats are like any other sample statistic (e.g., a sample mean). There is a sampling distribution for these statistics, which allows the standard errors to be estimated.
87 Egocentric data in ERGMs. These can be handled in the software quite easily. Recall that with faux.mesa.high above, we fit the ergm by providing: a model formula, and a network containing nodes with their attributes and the relations among those nodes. Alternatively, one can pass: a model formula; a network containing nodes with their attributes; and the sufficient statistics for the terms in the model formula for that set of nodes, known as the target stats.
88 Egocentric data in ERGMs. Option 1: net ~ edges + triangle. Option 2: net ~ edges + triangle, with target.stats = c(40, 7).
89 We'll be using this extensively this week. EpiModel is designed to work with both complete network data (census) and egocentric data with target stat specifications. So you'll get lots of practice during the labs. And we will be reviewing published examples, based on egocentric data, that address key issues in HIV prevention and care.
90 Egocentric estimation for ERGMs. There is also a specific package for estimating ERGMs from egocentrically sampled data: ergm.ego. It automates calculation of the target stats, handles survey weighting, and provides other utilities for egocentric EDA. It is available on CRAN, but is currently being refactored with a new API and is not yet integrated with EpiModel. In the (near) future, this will be an option for EpiModel.
91 Egocentric data for temporal ERGMs. The same principles apply to estimating temporal ERGMs (TERGMs) for dynamic networks. These specify the process of link formation and dissolution, which requires collecting data on the duration of ties. You'll learn more about this in the next session (on STERGMs). It is the foundation for dynamic, stochastic network-based epidemic simulations. This is what makes the EpiModel framework so powerful: simple data collection requirements; robust statistical methodology for estimation and inference; simulations rooted in empirical network data.
92 Summary. Network structure influences transmission dynamics. Statistical models for networks (ERGMs) provide a way to estimate and evaluate hypotheses about the generative processes that lead to the structures we observe. The fully specified models can also be used to simulate networks: the expected values of the model statistics from the simulated networks will match the statistics in the observed network. Of course, the networks we want to simulate need to be dynamic (and that's where we'll go after lunch).
93 Selected References Journal of Statistical Software (v24), 2008: eight papers on ERGMs and statnet. Goodreau, S., et al. (2009). "Birds of a Feather, or Friend of a Friend? Using Statistical Network Analysis to Investigate Adolescent Social Networks." Demography 46(1). Krivitsky, P. N., and Morris, M. (2017). "Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US." Annals of Applied Statistics 11(1).