Infectious Disease Models. Angie Dobbs. A Thesis Presented to The University of Guelph in Partial Fulfilment of Requirements.


Issues of Computational Efficiency and Model Approximation for Individual-Level Infectious Disease Models

by Angie Dobbs

A Thesis Presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Mathematics and Statistics

Guelph, Ontario, Canada
© Angie Dobbs, December, 2011

ABSTRACT

Issues of Computational Efficiency and Model Approximation for Individual-Level Infectious Disease Models

Angie Dobbs, University of Guelph, 2011. Advisor: Professor R. Deardon

Individual-level models (ILMs) can account for the spatio-temporal nature of disease data and so capture the dynamics of infectious diseases. Parameter estimation is usually done via Markov chain Monte Carlo (MCMC) methods, but correlation between the model parameters negatively affects MCMC mixing. Introducing a normalization constant to alleviate this correlation results in MCMC convergence over fewer iterations. This occurs, however, at the expense of greater computation time per MCMC iteration. It is important that these models be fitted as efficiently as possible. An upper-truncated distance kernel is introduced to speed up the computation of the likelihood, but at the cost of some loss of information and thus goodness-of-fit. Both the normalization constant and the upper-truncated distance kernel are evaluated as components of various ILMs via a simulation study. The normalization constant proves not to be worthwhile: the negative effect of increased computation time is not outweighed by the positive effect of reduced correlation. The upper-truncated distance kernel behaves as expected, reducing computation time but worsening model fit as the truncation distance decreases.

Table of Contents

1 Introduction

2 Issues of Computational Efficiency and Model Approximation for Spatial Individual-Level Infectious Disease Models
   2.1 Abstract
   2.2 Introduction
   2.3 Individual-Level Models
   2.4 Study 1
       2.4.1 Simulation Study
       2.4.2 Results
   2.5 Study 2
       2.5.1 Methods
       2.5.2 Simulation Study
       2.5.3 Results
   2.6 Discussion and Future Work

List of Tables

2.1 Average correlations between parameters (standard errors in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively
2.2 Average effective sample sizes for α and β (standard errors in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively
2.3 Average computation time in seconds and the run-time ratio for α and β averaged over the results from the ten epidemics of Set 1 and Set 2, respectively (standard errors are in parentheses)
2.4 Computation time in seconds (standard errors are in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively. *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here

2.5 Run-time ratios for α and β averaged over the results from the ten epidemics of Set 1 and Set 2, respectively (standard errors are shown in parentheses). *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here
2.6 Average DIC and AIC values averaged over the results from the ten epidemics of Set 1 and Set 2, respectively. *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here

List of Figures

2.1 Posterior predictive plots for model P_it^(ø) (top left) and P_it^(N,δ,ε) with δ = 6 (top right), δ = 4 (bottom left) and δ = 2 (bottom right)

Chapter 1

Introduction

The risks associated with infectious disease spread go beyond human health. Although a major incentive behind the study of modelling infectious diseases is in understanding those which cause human mortality, threats to both vegetative and livestock populations are also of great interest. Diseases which affect plants and animals, such as the tomato spotted wilt virus and foot-and-mouth disease, limit the population's food resources and can cost millions to eradicate. Potentially foodborne infectious diseases, for example, amount to estimated costs of $88.8 million a year in New Zealand (Scott et al., 2000). Advancements in antibiotics, vaccinations and other treatments of disease have been successful in increasing human lifespan. However, a multitude of diseases still manage to persist and evolve, and the importance of understanding the dynamics of such diseases is becoming increasingly apparent. Greater and more accessible computational power, in addition to this increase in interest in public health policy, are factors which contribute to a growing tide of in-depth research on models of disease transmission (Cook et al., 2007). Deardon et al. (2010) provide an example of statistical models which can be used to

model infectious disease through space and time at an individual level. These individual-level models (ILMs) aim to predict the spread of an infection based on an individual's geographic location. Spatio-temporal models such as these are generally applied in a Bayesian framework where parameter estimation is done using Markov chain Monte Carlo (MCMC) methodology. This framework is desirable due to its ability to handle complex models and to infer unobserved or missing data. Chis-Ster and Ferguson (2007), Jewell et al. (2009) and Deardon et al. (2010), for example, demonstrate this ability by applying MCMC-based methodology to the 2001 UK foot-and-mouth epidemic. The parameters of the spatio-temporal ILMs of Deardon et al. (2010) have a naturally occurring, underlying correlation which negatively affects the mixing of the MCMC. A normalization constant can be introduced to these models as a means of reducing this correlation, thus resulting in MCMC convergence in fewer iterations. However, computing this normalization constant is generally a computationally intensive task and, thus, the computation time required per MCMC iteration tends to increase. The primary ILM used in Deardon et al. (2010) utilizes a geometric distance kernel involving the Euclidean distance between individuals in the population. A truncated distance kernel can be used instead, imposing a maximum distance over which disease can transmit between two individuals. With careful programming, such a truncated distance kernel can be used to reduce computation time. This is desirable since calculation for ILMs can be highly computationally burdensome, especially the likelihood for large populations. The model fit worsens, however, as the upper-truncation distance decreases, and so caution must be taken in selecting an appropriate truncation point.
This thesis is designed to evaluate and compare the inclusion of the normalization constant and the upper-truncated distance kernel by means of two simulation studies.

Chapter 2 presents the details of these studies. It is written as a manuscript to be submitted to the journal Spatial and Spatio-temporal Epidemiology.

Chapter 2

Issues of Computational Efficiency and Model Approximation for Spatial Individual-Level Infectious Disease Models

2.1 Abstract

Individual-level models (ILMs) can account for the spatio-temporal nature of disease data and so capture the dynamics of infectious diseases. Parameter estimation is usually done via Markov chain Monte Carlo (MCMC) methods, but correlation between the model parameters negatively affects MCMC mixing. Introducing a normalization constant to alleviate this correlation results in MCMC convergence over fewer iterations. This occurs, however, at the expense of greater computation time per MCMC iteration.

It is important that these models be fitted as efficiently as possible. An upper-truncated distance kernel is introduced to speed up the computation of the likelihood, but at the cost of some loss of information and thus goodness-of-fit. Both the normalization constant and the upper-truncated distance kernel are evaluated as components of various ILMs via a simulation study. The normalization constant proves not to be worthwhile: the negative effect of increased computation time is not outweighed by the positive effect of reduced correlation. The upper-truncated distance kernel behaves as expected, reducing computation time but worsening model fit as the truncation distance decreases.

2.2 Introduction

Modelling the spread of infectious diseases is an ever-growing field as more and more diseases arise that pose risks to public health and economic growth. Real-time epidemic modelling is also seen as increasingly useful. Understanding how a disease is spreading as an epidemic unfolds can be important for implementing appropriate measures to control or eradicate the disease as quickly as possible. Outbreaks of H1N1 and SARS are just two which have triggered an increase in research on disease transmission models (O'Neill, 2010). However, these models are important for understanding the dynamics not only of human diseases, but also of those affecting animals and plants. For example, the 2001 foot-and-mouth epidemic in the UK, in which 34 million animals were slaughtered, has been analyzed by many researchers, including Keeling et al. (2001), Diggle (2006), Chis-Ster and Ferguson (2007), and Deardon et al. (2010). Similar models can also be used to model invasive species; for example, Cook et al. (2007) implement a fully spatio-temporal model for the spread of

Heracleum mantegazzianum (Giant Hogweed) in Britain. Spatio-temporal models often provide us with a means of estimating the course a disease will take. However, in order to understand how the disease is spreading with as high a degree of certainty as possible, these models must be sufficiently accurate. The accuracy of these models can be assessed in a variety of ways, including posterior predictive plots (Gardner et al., 2011; Gelman et al., 2000). Model comparison can also be carried out in a number of ways, including using Akaike's Information Criterion (Akaike, 1974), the Deviance Information Criterion (Spiegelhalter et al., 2001), or reversible-jump MCMC (Green, 1995) to find posterior model probabilities and/or Bayes factors. The individual-level models (ILMs) of Deardon et al. (2010) allow for the modelling of spatio-temporal disease dynamics. They are discrete-time models that can utilize information on an individual's geographic location to model the spread of an infectious disease. These models are generally applied in a Bayesian framework, and the parameter estimates of the models can be obtained via Markov chain Monte Carlo (MCMC) methods. Among the benefits of using a Bayesian MCMC framework are its ability to handle complex models and its ability to infer unobserved or missing data. It can be difficult to obtain complete data on the infectious status of individuals within a population due to misclassification of disease or under-reporting, as is often the case with common, seasonal diseases such as influenza. Another common issue with disease data is the inaccurate recording of infection times. Often the time of onset of symptoms is recorded as the time of infection; however, some diseases may show symptoms well after an individual has actually contracted the disease (Cauchemez and Ferguson, 2011). The primary ILM used in Deardon et al. (2010) utilizes a geometric distance kernel

involving the Euclidean distance between all individuals in a population. Due to a naturally occurring, underlying correlation between the parameters of such ILMs, MCMC mixing can be negatively affected. A normalization constant can be introduced to such models to reduce this correlation, resulting in MCMC convergence in fewer iterations. The calculation of the normalization constant, however, is generally computationally intensive and results in an increase in required computation time per MCMC iteration. A second computational issue is that the likelihood itself can take a very long time to calculate, especially for large populations. One way of helping to deal with this problem is to truncate the distance kernel; that is, to impose a maximum distance over which disease can transmit between two individuals under the model. The lower this maximum distance is set, the greater the opportunity for time saving. However, the lower this maximum distance is set, the more likely it is that long-distance infections that occurred in reality will be excluded as a possibility from the fitted model, thus compromising model fit. The purpose of this paper is to evaluate and compare the effects of the normalization constant and of an upper-truncated distance kernel. The effect of a sparks term, a parameter which describes sources of infection unexplained by the model, is also considered. Two studies are conducted for these evaluations: Study 1, described in Section 2.4, examines the effect of including the normalization constant and the trade-off between improved mixing and increased computation time; and Study 2, described in Section 2.5, evaluates the use of the upper-truncated distance kernel and the trade-off between improved computation time and reduced goodness-of-fit. A simulation study is carried out in which all models are fit to the simulated data via MCMC methods.
Computation time is recorded to determine efficiency, and goodness-of-fit is determined by study-specific methods.

2.3 Individual-Level Models

Eight different models are used in this paper, all of which are examples of the individual-level model (ILM) of Deardon et al. (2010). The first four are emphasized in Study 1, and the remaining four are included for discussion in Study 2. Here we use an SIR framework, which assumes discrete time and that each individual is in one of three states at any given time t. If individual i does not currently have the disease but is able to contract it, they are susceptible, i ∈ S; once they have contracted the disease they are infectious, i ∈ I; and once the individual is no longer infectious they are considered to be removed, i ∈ R. Removal could represent recovery accompanied by acquired immunity, isolation, or death. Two transitions are possible and must occur in the specific order S → I and I → R. S(t), I(t) and R(t) are the sets of susceptible, infectious and removed individuals of the population at time t, respectively. Deardon et al. (2010) give the probability of a susceptible individual i being infected at discrete time t in the following general ILM:

P_it = 1 − exp{ −[ η(i) Σ_{j ∈ I(t)} τ(j) κ(i, j) + ε(i, t) ] },   (2.1)

where η(i) is a function which describes the potential risk factors related to the risk of susceptible individual i contracting the disease; τ(j) is a function which describes the potential risk factors related to the risk of infectious individual j transmitting the disease to others; κ(i, j) is the infection kernel; and ε(i, t) is a sparks function which represents the chance of being randomly infected by sources not explained by η(i), τ(j) or κ(i, j). This probability determines the time at which the S → I transition occurs. In this paper the models considered assume that the main drivers of the epidemic are predominantly spatio-temporal, with η(i)τ(j) = α for all i, j. α is known as the infectivity

constant. The parameter α can be thought of as representing the average strength of the epidemic. An increase in α would result in an increase in the probability of infection, P_it, which would result in a larger population of infectious individuals and thus a stronger epidemic. Here, ε(i, t) is set to take one of two values, 0 or ε, where ε is itself referred to as the sparks term. Sparks terms describe random behaviour. Here, a model which includes a distance kernel but no sparks term allows the risk of disease transmission to be based solely on the proximity between a susceptible individual and individuals in the infectious population. The sparks term can therefore be thought of as introducing random noise to the system or, if used in conjunction with a truncated kernel, as allowing for infections over longer distances than those allowed by the kernel (see Section 2.5). The infection kernel, κ(i, j), is a function describing the risk of transmission specifically from individual j to individual i. Here, the infection kernel takes one of two forms, as described in Sections 2.4 and 2.5.1, or variants thereof. Both forms are primarily geometric distance kernels based on the Euclidean distance between two individuals i and j. However, one form allows for disease spread to occur across the entire population, and the other, termed the upper-truncated distance kernel, only allows infection between those individuals within a specified distance of one another. For simplicity, the second transition, I → R, is assumed to occur after a constant time lag, γ_I, known as the infectious period. Of course, it would be easy, and fairly standard, to extend the model in order to estimate the infectious period.
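As a concrete sketch of (2.1) under the simplifications used in this paper (η(i)τ(j) = α for all i, j, the geometric kernel κ₁(d_ij) = d_ij^(−β) introduced in Section 2.4, and a constant sparks term ε), the per-individual infection probability might be computed as follows. The function and argument names are illustrative, not taken from the thesis code.

```python
import numpy as np

def infection_prob(i, infectious, coords, alpha, beta, eps=0.0):
    """P_it = 1 - exp(-[alpha * sum_{j in I(t)} d_ij^(-beta) + eps]).

    coords: (m, 2) array of individual locations; infectious: indices in I(t);
    eps is the sparks term (0 for the sparks-free models).
    """
    if len(infectious) == 0:
        return 1.0 - np.exp(-eps)  # with no infectives, only the sparks term acts
    d = np.linalg.norm(coords[infectious] - coords[i], axis=1)
    return 1.0 - np.exp(-(alpha * np.sum(d ** -beta) + eps))
```

Note that with ε = 0 and no infectious individuals in range the probability is exactly zero, which is the source of the zero-likelihood issue discussed for the truncated kernels in Section 2.5.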

Likelihood

The likelihood for the general form of the ILM in (2.1) is given as follows:

π(D | θ) = Π_{t=1}^{t_max} [ Π_{i ∈ S(t+1)} (1 − P_it) ] [ Π_{i ∈ I(t+1)\I(t)} P_it ],   (2.2)

where D is the observed epidemic data, θ is the vector of parameters to be estimated, S(t+1) is the set of susceptible individuals at time t + 1, and I(t+1)\I(t) is the set of newly infected individuals at time t + 1.

2.4 Study 1

The purpose of this study is to examine the effect of including, or not including, a normalization constant for the spatial infection kernel in a simple two-parameter spatial ILM. The expected effect of including such a normalization constant is to increase computation time but reduce the correlation between the two parameters, α and β, and thus improve MCMC mixing. The aim here is to determine whether the negative effect of increased computation time is outweighed by the positive effect of improved mixing.

Models for Study 1

Each model in this paper is derived from the general ILM given in the previous section with η(i)τ(j) = α for all i, j. We allow m to represent the total number of individuals in the population. Additionally, we set κ(i, j) = κ₁(d_ij) = d_ij^(−β), a geometric distance kernel which characterizes the risk of infection over distance, where d_ij is the Euclidean distance between individuals i and j and β is the geometric rate of decay. It is assumed here

that ε(i, t) = 0. This gives our primary model, P_it^(ø):

P_it^(ø) = 1 − exp{ −α Σ_{j ∈ I(t)} κ₁(d_ij) },   α, β > 0.

The second model has κ(i, j) = κ₁(d_ij) N^(1), where N^(1) is a normalization constant. Specifically,

P_it^(N) = 1 − exp{ −α Σ_{j ∈ I(t)} κ₁(d_ij) N^(1) },   α, β > 0,

where

N^(1) = [ (1/m) Σ_{i=1}^{m} Σ_{j ≠ i} κ₁(d_ij) ]^(−1)

and ε(i, t) = 0. A third model, P_it^(ε), has κ(i, j) = κ₁(d_ij) and ε(i, t) = ε. A fourth model, P_it^(N,ε), has κ(i, j) = κ₁(d_ij) N^(1) and ε(i, t) = ε. These give

P_it^(ε) = 1 − exp{ −[ α Σ_{j ∈ I(t)} κ₁(d_ij) + ε ] },   α, β, ε > 0,

and

P_it^(N,ε) = 1 − exp{ −[ α Σ_{j ∈ I(t)} κ₁(d_ij) N^(1) + ε ] },   α, β, ε > 0,

respectively. The likelihood for each of the four models is obtained by substituting P_it^(ø), P_it^(N), P_it^(ε) or P_it^(N,ε) for P_it in (2.2).
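Reading N^(1) as the inverse of the population-averaged total kernel mass (an assumed interpretation of the formula above), the constant might be computed as in the sketch below; recomputing it for every candidate β is the source of the extra per-iteration cost discussed in the Results.

```python
import numpy as np

def norm_constant(coords, beta):
    """N^(1) = [ (1/m) * sum_i sum_{j != i} d_ij^(-beta) ]^(-1).

    Inverts the average, over the m individuals, of each individual's total
    kernel mass, so that the rescaled kernel kappa_1 * N^(1) averages to 1.
    """
    m = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # drop j == i terms (inf ** -beta is 0.0)
    return m / np.sum(d ** -beta)
```

For a population of two individuals one distance unit apart, both double sums are 1, so N^(1) = 1 for any β; the constant only rescales the kernel once the distance structure is non-trivial.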

2.4.1 Simulation Study

The purpose of this simulation study is to evaluate and compare each model when fitted to data generated from P_it^(ø) (the true model).

Observed Data

Two sets of ten epidemics are generated from the primary model, P_it^(ø). The first ten epidemics form Set 1, which spread through a population of 225 individuals who are spaced equidistantly on a grid for each epidemic. That is, the 225 individuals collectively have coordinates (x, y) for every combination of x, y = 1,...,15. We consider the entire population to be in the set of susceptible individuals S(t) at time t = 0, and one individual becomes infectious at time t = 1. The individual at the approximately central location (7, 8) on the grid is chosen to be infected first in each of the ten epidemics. Each epidemic is run for ten time steps, t = 1,...,10. The model parameters are set to α = 1.0 and β = 3.0, and the infectious period γ_I is set to 1 time unit. The second set of ten epidemics, Set 2, is also generated from P_it^(ø). They differ from Set 1 in that four individuals are infected at time t = 1 instead of one. The population is increased to a grid of 400 individuals with coordinates (x, y) for every combination of x, y = 1,...,20. We again consider the entire population to be in S(t) at time t = 0, and the four aforementioned individuals become infectious at time t = 1. The four individuals which begin the epidemics are chosen arbitrarily and are situated near the top two corners, the centre, and the bottom right corner of the grid, with coordinates (3,17), (18,19), (10,10) and (19,2), respectively.
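The data-generating step can be sketched as follows, simulating one epidemic from the sparks-free model P_it^(ø) under the Set 1 settings (15 × 15 grid, α = 1.0, β = 3.0, γ_I = 1). This is an illustrative implementation with hypothetical function and argument names, not the thesis code.

```python
import numpy as np

def simulate_epidemic(alpha=1.0, beta=3.0, n=15, first=(7, 8), t_max=10, seed=1):
    """Simulate one discrete-time SIR epidemic from P_it^(o) on an n x n unit
    grid with a one-time-unit infectious period. Returns the epidemic curve
    (number of new infections at each time step)."""
    rng = np.random.default_rng(seed)
    coords = np.array([(x, y) for x in range(1, n + 1)
                       for y in range(1, n + 1)], dtype=float)
    status = np.zeros(len(coords), dtype=int)  # 0 = S, 1 = I, 2 = R
    status[np.flatnonzero((coords == first).all(axis=1))[0]] = 1
    curve = []
    for t in range(1, t_max + 1):
        infectious = np.flatnonzero(status == 1)
        newly = []
        for i in np.flatnonzero(status == 0):
            d = np.linalg.norm(coords[infectious] - coords[i], axis=1)
            p_it = 1.0 - np.exp(-alpha * np.sum(d ** -beta))
            if rng.random() < p_it:
                newly.append(i)
        status[infectious] = 2  # I -> R after gamma_I = 1 time unit
        status[newly] = 1
        curve.append(len(newly))
    return curve
```

The Set 2 epidemics would follow the same sketch with n = 20 and four initial infectives rather than one.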

Fitting the Models

Each of the four models is fit to the twenty observed epidemics using a random-walk Metropolis-Hastings Markov chain Monte Carlo (MH-MCMC) algorithm to obtain samples from the posterior distribution. The posterior distribution is given (up to proportionality) by multiplying the likelihood by the marginal prior densities for each parameter. Here, vague half-normal priors with mode 0 and variance 10^6 are used for each parameter. The MH-MCMC algorithm is as follows:

1. Set i = 0 and choose initial values α^(0) and β^(0).
2. Generate random values Z_α and Z_β from uniform proposal densities U[−a, a] and U[−b, b], respectively, where a, b ∈ ℝ_{>0}.
3. Assign candidate values based on the current state of both parameters as follows: α* = α^(i) + Z_α and β* = β^(i) + Z_β.
4. Calculate the acceptance probability, ψ, as:

   ψ = min{ 1, π(D | α*, β*) / π(D | α^(i), β^(i)) }.

5. Sample u from a Uniform(0,1) distribution and set α^(i+1) = α* and β^(i+1) = β* if u < ψ; otherwise set α^(i+1) = α^(i) and β^(i+1) = β^(i).
6. Set i = i + 1 and then repeat steps 2 to 6 for the required number of iterations.

For all models, a U[−0.1, 0.1] proposal is used for updating β. For updating α, all models which do not incorporate a normalization constant use a proposal of U[−0.1, 0.1]. For the models which do, a proposal of U[−100, 100] is used for the epidemics of Set 1 and U[−200, 200] is used for Set 2. For models which contain a sparks term, a U[−0.01, 0.01] proposal is used for updating ε. These proposals were arrived at through tuning. A total of 20,000 MCMC iterations are run for each model, with a burn-in period of 1,000 iterations. Note that parameters are updated together in one block. MCMC convergence is checked visually.

Model Evaluation

The effective sample size (ESS) is obtained from each converged MCMC chain using the coda package in R. The ESS is an estimate of the number of independent samples to which our dependent MCMC sample is equivalent, and a large ESS is desirable. It is used to determine whether the normalization constant is effective in improving the mixing of the MCMC chains. The correlations between all parameters are also computed, to determine whether the normalization constant reduces the correlation between α and β. Computation time is recorded to determine the effect of the normalization constant on efficiency. Using the ESS and computation time, the run-time ratio (RTR) is calculated for both parameters. The RTR is a measure of the average time taken to produce one pseudo-independent sample (McKinley et al., 2009); it is derived by dividing the computation time by the ESS for each parameter.

2.4.2 Results

Table 2.1 shows the estimated posterior correlation of α and β averaged over the 10 replicate epidemics for the four models for both sets of epidemics. As expected, the normalization constant reduces this correlation. Comparing P_it^(ø) to P_it^(N), as well as

P_it^(ε) to P_it^(N,ε), we see the correlation approximately halved for both Sets 1 and 2. Table 2.1 also shows that the correlation between β and ε is similar under P_it^(ε) and P_it^(N,ε), but the correlation between α and ε is reduced. This result is observed in all 20 epidemics separately, as well as on average (results not shown).

Set 1
Model         Corr(α,β)   Corr(α,ε)   Corr(ε,β)
P_it^(ø)      (0.0075)    NA          NA
P_it^(N)      (0.0096)    NA          NA
P_it^(ε)      (0.0065)    ( )         (0.0204)
P_it^(N,ε)    (0.0164)    (0.0073)    (0.0233)

Set 2
Model         Corr(α,β)   Corr(α,ε)   Corr(ε,β)
P_it^(ø)      (0.0062)    NA          NA
P_it^(N)      (0.0093)    NA          NA
P_it^(ε)      (0.0068)    (0.0129)    (0.0217)
P_it^(N,ε)    (0.0182)    (0.0091)    (0.0151)

Table 2.1: Average correlations between parameters (standard errors in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively.

Table 2.2 displays the average ESS for α and β for both sets of epidemics. Including a normalization constant appears to improve mixing for both parameters. Comparing P_it^(ø) to P_it^(N), the ESS of both α and β is seen to increase, showing that mixing is improved under P_it^(N). Similarly, the ESS of both parameters increases from P_it^(ε) to P_it^(N,ε), showing that mixing is improved in P_it^(N,ε). The sparks term has a negative effect on mixing. This is demonstrated by the decrease in ESS for both α and β from P_it^(ø) to P_it^(ε), as well as from P_it^(N) to P_it^(N,ε). Table 2.3 summarizes the computation times as well as the run-time ratio (RTR) for both α and β. Calculating the normalization constant requires a summation of a geometric transformation of the Euclidean distance between each individual and every other individual in the population. As expected, this calculation causes a great increase

in computation time, as the normalization constant must be re-calculated at each MCMC iteration for each new candidate β value. For Set 1, the ratio of computation time for P_it^(ø) to P_it^(N) is 1:2.31, and for Set 2 it is 1:2.50. Similarly, the ratio of computation time for P_it^(ε) to P_it^(N,ε) is 1:2.27 for Set 1 and 1:2.48 for Set 2. This implies that the negative impact the normalization constant has on computation time worsens even further with population size. The RTR indicates that for both parameters the reduction in correlation is not worth the additional computation time required to calculate the normalization constant.

              Set 1                 Set 2
Model         ESS (α)    ESS (β)    ESS (α)    ESS (β)
P_it^(ø)      (12.174)   (4.723)    (13.057)   (10.583)
P_it^(N)      (35.978)   (7.981)    (35.804)   (20.581)
P_it^(ε)      (9.767)    (9.492)    (21.671)   (14.603)
P_it^(N,ε)    (25.506)   (12.068)   (49.135)   (15.748)

Table 2.2: Average effective sample sizes for α and β (standard errors in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively.

Set 1
Model         Time (s)   RTR (α)    RTR (β)
P_it^(ø)      (1.15)     (0.010)    (0.004)
P_it^(N)      (0.93)     (0.031)    (0.011)
P_it^(ε)      (1.45)     (0.022)    (0.018)
P_it^(N,ε)    (4.14)     (0.038)    (0.034)

Set 2
Model         Time (s)   RTR (α)    RTR (β)
P_it^(ø)      (2.61)     (0.010)    (0.010)
P_it^(N)      (2.75)     (0.023)    (0.036)
P_it^(ε)      (3.88)     (0.033)    (0.040)
P_it^(N,ε)    (10.94)    (0.063)    (0.044)

Table 2.3: Average computation time in seconds and the run-time ratio for α and β averaged over the results from the ten epidemics of Set 1 and Set 2, respectively (standard errors are in parentheses).

The

ratios of RTR for P_it^(ø) to P_it^(N) for Set 1 are 1:1.380 for α and 1:1.787 for β. Also from Set 1, the ratios of RTR for P_it^(ε) to P_it^(N,ε) are 1:1.406 and 1:2.032 for α and β, respectively. This shows that it takes more time to obtain one pseudo-independent sample for both parameters in models which include a normalization constant. The same is shown for Set 2, where the ratios of RTR for P_it^(ø) to P_it^(N) are 1:1.319 for α and 1:1.939 for β, and the ratios of RTR for P_it^(ε) to P_it^(N,ε) for α and β are 1:1.422 and 1:1.917, respectively. Again, there appears to be an increasing negative effect with population size; however, this is only seen for β when comparing P_it^(ø) to P_it^(N), and for α when comparing P_it^(ε) to P_it^(N,ε). The computation times in Table 2.3 indicate that including a sparks term increases efficiency, which is contradictory to what might be expected, since the inclusion of a third parameter, ε, might be expected to make MCMC mixing worse. However, further analysis shows that this is an artifact of the way the prior is coded and the mixing properties of the MCMC chain. To save time, if a proposed value of any parameter, θ^(·), is less than 0 at any MCMC iteration, then the proposed value is rejected before the Metropolis-Hastings acceptance probability is calculated. It turns out that the tuned proposals used for models containing a sparks term result in a relatively high rejection rate due to proposed values of ε < 0. Thus, although the inclusion of the ε parameter leads to worse mixing, since the likelihood is not calculated for ε < 0 there is a reduction in computation time. This would not stand if the marginal posterior of ε had most of its mass further from zero, of course. Analyzing the RTR values for both α and β in Table 2.3 shows that the effect of including a sparks term is an improvement for larger population sizes.
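The ESS and RTR computations underlying these comparisons can be sketched as below. The ESS function is a simplified stand-in for coda's effectiveSize (the truncation rule applied to the autocorrelation sum is an assumption; coda uses a spectral-density estimate), and the function names are illustrative.

```python
import numpy as np

def ess(chain):
    """Effective sample size: n / (1 + 2 * sum of leading positive
    autocorrelations). A simplified version of coda::effectiveSize."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # rho_k for lags k = 0, ..., n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tail = 0.0
    for rho in acf[1:]:
        if rho < 0:  # stop at the first negative autocorrelation
            break
        tail += rho
    return n / (1.0 + 2.0 * tail)

def run_time_ratio(seconds, chain):
    """RTR of McKinley et al. (2009): average seconds per pseudo-independent
    sample, i.e. computation time divided by the ESS."""
    return seconds / ess(chain)
```

A well-mixing chain has ESS close to its length and so a small RTR; a strongly autocorrelated chain has a much smaller ESS, inflating the RTR even when the per-iteration cost is identical.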
The ratios of RTR for P_it^(ø) to P_it^(ε) for α and β are 1:1.323 and 1:1.320, respectively, for Set 1, and 1:1.208 and 1:1.317 for Set 2. Similarly, the ratios of RTR for P_it^(N) to P_it^(N,ε) for α and

β are 1:1.351 and 1:1.501, respectively, for Set 1, and 1:1.302 and 1:1.302 for Set 2. Therefore, the decrease in computation time reduces the relative negative effect that the sparks term has on mixing for large populations.

2.5 Study 2

The purpose of this study is to examine the effect of using an upper-truncated distance kernel, that is, a distance kernel for which κ(d_ij) = 0 for all d_ij > δ, where δ is some given constant. It is possible to utilize such a kernel to speed up the computation of the likelihood, with this reduction in computation time becoming greater for decreasing δ. Of course, although the computation time is reduced as δ decreases, it is likely that model fit will also deteriorate if the true underlying kernel is not truncated. The purpose of this study is therefore to examine this trade-off between reduced computation time and reduced goodness-of-fit.

2.5.1 Methods

Here we introduce the remaining four models, which are considered in addition to those described in Section 2.4 to form the full set of models for Study 2.

Models for Study 2

Setting κ(i, j) = κ₂(d_ij) yields the model:

P_it^(δ) = 1 − exp{ −α Σ_{j ∈ I(t)} κ₂(d_ij) },   α, β > 0,

where

κ₂(d_ij) = d_ij^(−β) if 0 < d_ij ≤ δ, and κ₂(d_ij) = 0 otherwise.

Here, ε(i, t) is again assumed to be 0, and δ is the upper-truncation distance. We use the notation P_it^(δ_k) to indicate that the distance δ = k in that particular instance of P_it^(δ). Note that P_it^(δ) → P_it^(ø) as δ → ∞. P_it^(N,δ) also contains κ₂(d_ij), and also incorporates its appropriate normalization constant, N^(2), to give:

P_it^(N,δ) = 1 − exp{ −α Σ_{j ∈ I(t)} κ₂(d_ij) N^(2) },   α, β > 0,

where

N^(2) = [ (1/m) Σ_{i=1}^{m} Σ_{j ≠ i} κ₂(d_ij) ]^(−1)

and, once again, ε(i, t) = 0. P_it^(δ,ε) also contains the kernel κ(i, j) = κ₂(d_ij), as well as the sparks term ε(i, t) = ε. The final model, P_it^(N,δ,ε), has κ(i, j) = κ₂(d_ij) N^(2) and uses the non-zero ε(i, t). These give:

P_it^(δ,ε) = 1 − exp{ −[ α Σ_{j ∈ I(t)} κ₂(d_ij) + ε ] },   α, β, ε > 0,

and

P_it^(N,δ,ε) = 1 − exp{ −[ α Σ_{j ∈ I(t)} κ₂(d_ij) N^(2) + ε ] },   α, β, ε > 0.

For models which utilize an upper-truncated distance kernel but do not contain a sparks term, it is potentially the case that the likelihood will equal zero, and thus the model cannot be fitted to the observed data. This is because an infected individual's probability of being infected under the fitted model might be zero (P_it = 0) if there are

no infected individuals within a distance δ. As δ decreases, this possibility becomes an increasingly likely eventuality. Similarly, there is a high chance of avoiding this issue in practice for large δ. The inclusion of the sparks term, ε, allows transmission of the disease to be possible beyond the upper-truncation distance, which makes the probability of infection for each individual non-zero, thus resulting in a non-zero likelihood. A sparks term is therefore often a practically necessary component of the model (depending on the specific data set being analyzed), as well as an intuitively and theoretically desirable one. Here, the likelihood is computed in the same way as described in Section 2.3, but P_it is now replaced in (2.2) by whichever of the eight models P_it^(ø),...,P_it^(N,δ,ε) is being fit to the observed data.

The Upper-Truncated Distance Kernel

The upper-truncated distance kernel is introduced primarily to reduce computation time. It is able to do so as a result of storing information on the distances between individuals, so as to reduce the amount of computation required when computing the likelihood. This is done in the code before the MCMC analysis commences. This works as follows. First, a series of m vectors, A_i, of length n_i, i = 1,...,m, where n_i < m (and m is the total number of individuals in the population), are created. The Euclidean distance between an individual i and another individual j, d_{i,j}, is calculated. If d_{i,j} < δ, then j is recorded as an element of A_i, where A_i is the vector corresponding to individual i. This indicates that individual j is within the specified distance δ of individual i, and that transmission of disease from individual j to i is possible. The distance between i and

the next individual, j + 1, is then computed. If d_{i,j+1} < δ, the value j + 1 is appended to A_i; if not, the next individual, j + 2, is considered. This is repeated until every individual within δ of individual i is identified in A_i = [j : d_ij < δ], and is then done for each i = 1, ..., m. These A_i can be used in the calculation of P_it. As δ decreases, so does n_i, the length of the A_i vectors. This results in a smaller set of infected individuals, I(t), at each time t being considered as potential transmitters of the disease to individual i, allowing for a quicker calculation of P_it and therefore of the likelihood. Note that if a naive approach is taken and the search for all j : d_ij < δ for each i is done within the likelihood calculation itself, the computational advantage is, of course, lost.

Simulation Study

The purpose of this simulation study is again to evaluate and compare each of the models, here those of Study 2.

Fitting the models

Each model is fit to the same twenty epidemics of Study 1, from Set 1 and Set 2. The models which contain κ₂(d_ij) are fit three times, once for each of δ = 2, 4 and 6. All models of Study 2 are fit in the same way as described for Study 1, with the exception that 125,000 MCMC iterations are required for model P_it^(N,δ_2,ε). The small δ value makes it difficult to find proposal variances that enable the chain to reach convergence in fewer iterations for this model.

Additionally, the issue concerning the use of an upper-truncated distance kernel without a sparks term, as described in Section 2.5.1, poses a problem when fitting two of the models to the simulated data. Models P_it^(δ) and P_it^(N,δ) cannot be fitted to any of the simulated epidemics for δ < 6. For δ = 6, they can be fitted to only 3 of the 10 epidemics from Set 1, and to only 1 epidemic from Set 2. As a result, models P_it^(δ) and P_it^(N,δ) with δ = 2 and 4 are excluded from further analysis.

Model Evaluation

Two methods of evaluation are implemented for Study 2 in addition to the computation time and run-time ratio (RTR) for α and β used in Study 1. First, after fitting the models to the observed data, the posterior predictive distribution (Gelman et al., 2000) of the epidemic curves is simulated to assess goodness-of-fit. Observations from this distribution are obtained by generating an epidemic with a randomly sampled θ from the MCMC-estimated posterior and recording the number of newly infected individuals at each time point. Here, θ = (α, β) or θ = (α, β, ε), depending on the model being fitted. This is repeated 100 times with newly sampled θ, to give an estimate of the posterior predictive distribution of the epidemic curves.

Second, the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2001) and Akaike Information Criterion (AIC) (Akaike, 1974) are both computed as a means of comparing model fit. The AIC is calculated using the number of model parameters (k) and the maximized likelihood function value for each model, denoted below as L:

AIC = 2k − 2 ln(L).

In order to calculate the DIC, the deviance, D(θ), its posterior expectation, D̄, and the effective number of parameters, p_D, must be defined:

D(θ) = −2 log(π(D | θ)) + C,
D̄ = E[D(θ)]  and  p_D = D̄ − D(θ̄),

where π(D | θ) is the likelihood function for given θ, as before; D is the observed epidemic data and θ is defined above in Section 2.5.2; C is a constant which cancels out in the calculations done to compare each model; and θ̄ denotes the posterior expectation of θ. From this information we can calculate the DIC:

DIC = p_D + D̄.

Favoured models will be those which have a relatively low computation time, precise posterior predictive epidemic curve densities in line with the data, and low DIC and AIC values.

Results

Table 2.4 summarizes the computation times for fitting the various models to the epidemics of both Set 1 and Set 2. The upper-truncated distance kernel does appear to have a positive effect on computation time, as expected. Fitting P_it^(δ_6,ε) to the epidemics of Set 1 takes, on average, half the time required to fit P_it^(ε): the ratio of computation times for P_it^(δ_6,ε) to P_it^(ε) is 1:2.00. Fitting P_it^(N,δ_6,ε) to these epidemics likewise reduces the average time relative to P_it^(N,ε); the computation time ratio between P_it^(N,δ_6,ε) and P_it^(N,ε) is 1:1.17. For Set 2 these ratios increase: the computation time ratio for P_it^(δ_6,ε) to P_it^(ε) becomes 1:3.12, and that for P_it^(N,δ_6,ε) to P_it^(N,ε) becomes 1:1.27. This shows not only that the upper-truncated distance kernel improves computation time, but also that it has a stronger effect with larger population sizes. The asterisks in this, and other, tables indicate those models that could be fitted to only a subset of the epidemics, as described in Section 2.5.2. In the few instances where these models could be fitted to the data, the upper-truncated distance kernel did reduce the computation time on average.

With model P_it^(δ,ε), computation time decreases with increasing δ. This is not the case, however, for P_it^(N,δ,ε): the computation time varies very little between models P_it^(N,δ_4,ε) and P_it^(N,δ_6,ε) across all epidemics for both sets. Six of the 10 epidemics from each of Set 1 and Set 2 follow the trend seen with P_it^(δ,ε), with computation time decreasing as δ increases; the other 4 epidemics from each set show the reverse. These changes in computation time, regardless of direction, are small. Model P_it^(N,δ_2,ε) has a much higher computation time because 125,000 MCMC iterations are run instead of the 20,000 used for the other models.

Model              Set 1 time (s)     Set 2 time (s)
P_it^(δ_6)         (0.71)*            (0)*
P_it^(ø)           (1.15)             (2.61)
P_it^(N,δ_6)       (0.70)*            (0)*
P_it^(N)           (0.93)             (2.75)
P_it^(δ_2,ε)       8.55 (0.13)        (0.13)
P_it^(δ_4,ε)       (0.32)             (0.30)
P_it^(δ_6,ε)       (1.24)             (1.92)
P_it^(ε)           (1.45)             (3.88)
P_it^(N,δ_2,ε)     (0.77)             (0.86)
P_it^(N,δ_4,ε)     (2.19)             (0.80)
P_it^(N,δ_6,ε)     (5.32)             (13.68)
P_it^(N,ε)         (4.14)             (10.94)

Table 2.4: Computation time in seconds (standard errors are in parentheses) averaged over the results from the ten epidemics of Set 1 and Set 2, respectively. *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here.
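As a concrete illustration of the precomputation described in Section 2.5.1, the following sketch builds the A_i vectors once, before MCMC, and uses them to evaluate P_it under the truncated kernel κ₂. This is a minimal Python illustration; the function and variable names are ours, not those of the thesis code, and the sparks term defaults to zero.

```python
import numpy as np

def build_neighbour_lists(coords, delta):
    """For each individual i, precompute A_i: the indices j with
    Euclidean distance d_ij < delta.  Done once, before the MCMC run."""
    m = len(coords)
    A = []
    for i in range(m):
        d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
        A.append(np.array([j for j in range(m) if j != i and d[j] < delta],
                          dtype=int))
    return A

def infection_prob(i, infectious, coords, A, alpha, beta, eps=0.0):
    """P_it under the upper-truncated kernel: only infectious individuals
    within delta of i (i.e. those recorded in A_i) contribute."""
    js = [j for j in A[i] if j in infectious]
    exposure = alpha * sum(np.linalg.norm(coords[i] - coords[j]) ** (-beta)
                           for j in js)
    return 1.0 - np.exp(-(exposure + eps))
```

Because each A_i is fixed for a given δ, the per-iteration likelihood cost scales with the neighbourhood sizes n_i rather than with the full population size m.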

Table 2.5 summarizes the run-time ratio (RTR) for parameters α and β for each model, averaged over the ten epidemics of each of Set 1 and Set 2. Including the normalization constant worsens the RTR for both parameters, a result consistent with Study 1. The upper-truncated distance kernel, however, appears to improve RTR(α) and RTR(β), as can be seen by, for example, comparing P_it^(N,ε) to P_it^(N,δ_6,ε). With the exception of P_it^(δ,ε), the RTR worsens with decreasing δ. This might be expected as a result of the poorer model fit, and thus the increased posterior variance, obtained with smaller δ values. Overall, model P_it^(δ,ε) performs best in terms of RTR(α) and RTR(β), with values lower than the standard model for all values of δ. Further, this model improves with decreasing δ, which might not be expected since the ESS of both parameters decreases with decreasing δ. Further analysis shows that the reduction in computation time overpowers this reduction in ESS, resulting in improved RTR(α) and RTR(β).

The inclusion of the upper-truncated distance kernel also appears to improve the RTR for the models which can be fit only to a subset of the epidemics. However, since those results are averaged over a non-random sample of epidemics, they cannot be compared fairly with the other models. It is also, of course, undesirable to be fitting a model which can only be fit to a subset of possible epidemic realizations.

Table 2.6 contains the average DIC and AIC values for each model for both sets of epidemics; low values of the AIC and DIC indicate good fit. The values marked by asterisks again indicate those models which could be fitted to only a subset of the simulated epidemics, and although models P_it^(δ_6) and P_it^(N,δ_6) have very small AIC and DIC values, they cannot be compared fairly with the other models for the reasons described above.
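The AIC and DIC defined in the Model Evaluation section can be computed directly from MCMC output. The following is a minimal sketch, assuming only a user-supplied log-likelihood function log π(D | θ); all names are illustrative and not taken from the thesis code, and the maximized log-likelihood in the AIC is approximated here by the best posterior draw.

```python
import numpy as np

def dic_aic(loglik, draws, k):
    """Compute DIC and AIC from posterior draws, following
    D(theta) = -2 log pi(D|theta)  (the constant C cancels in comparisons),
    p_D = Dbar - D(theta_bar),  DIC = p_D + Dbar,  AIC = 2k - 2 ln(L).

    loglik : function mapping a parameter vector to log pi(D|theta)
    draws  : (n, k) array of MCMC draws of theta
    k      : number of model parameters
    """
    D = np.array([-2.0 * loglik(th) for th in draws])  # deviance at each draw
    Dbar = D.mean()                                    # posterior mean deviance
    D_at_mean = -2.0 * loglik(draws.mean(axis=0))      # deviance at posterior mean
    p_D = Dbar - D_at_mean                             # effective no. of parameters
    dic = p_D + Dbar
    # maximized log-likelihood approximated by the best draw seen in the chain
    aic = 2 * k - 2 * max(loglik(th) for th in draws)
    return dic, aic
```

Because C cancels, only differences in DIC (and AIC) between models are meaningful, which is how the criteria are used in Table 2.6.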
Of the remaining models, P_it^(ø) is ranked first in terms of both AIC and DIC, ahead of the upper-truncated distance kernel models. This is to be expected, since it is the true model from which the epidemics are generated. The DIC and AIC increase with decreasing δ. This is also expected, since the fitted model diverges from the true model as the truncation distance becomes smaller.

Model              Set 1                        Set 2
                   RTR(α)      RTR(β)           RTR(α)      RTR(β)
P_it^(δ_6)         (0.017)*    (0.005)*         (0)*        (0)*
P_it^(ø)           (0.010)     (0.004)          (0.010)     (0.010)
P_it^(N,δ_6)       (0.161)*    (0.055)*         (0)*        (0)*
P_it^(N)           (0.031)     (0.011)          (0.023)     (0.036)
P_it^(δ_2,ε)       (0.004)     (0.010)          (0.004)     (0.004)
P_it^(δ_4,ε)       (0.009)     (0.008)          (0.009)     (0.009)
P_it^(δ_6,ε)       (0.016)     (0.015)          (0.007)     (0.008)
P_it^(ε)           (0.022)     (0.018)          (0.033)     (0.040)
P_it^(N,δ_2,ε)     (8.766)     (1.919)          (44.610)    (12.296)
P_it^(N,δ_4,ε)     (0.035)     (0.043)          (0.112)     (0.051)
P_it^(N,δ_6,ε)     (0.034)     (0.038)          (0.029)     (0.027)
P_it^(N,ε)         (0.038)     (0.034)          (0.063)     (0.044)

Table 2.5: Run-time ratios for α and β averaged over the results from the ten epidemics of Set 1 and Set 2, respectively (standard errors are shown in parentheses). *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here.

Figure 2.1 shows four posterior predictive densities using an arbitrarily selected (and typical) epidemic from Set 1, for P_it^(ø) and for P_it^(N,δ,ε) with δ = 6, 4 and 2. The posterior predictive distribution moves further from the true data as δ decreases, reflecting the deviation from the true kernel as δ decreases. This result is consistent across all epidemics from both Set 1 and Set 2 (results not shown).

Both the AIC and DIC rank models P_it^(ø), P_it^(N) and P_it^(N,ε) as the top three best-fitting models; the RTR also suggests that they are reasonable models. In fourth place, the DIC results put P_it^(N,δ_6,ε). The AIC results put P_it^(ε) in fourth place in Set 1, but P_it^(N,ε) in Set 2; however, the former model has a much lower RTR for both parameters than the latter, suggesting it is to be preferred. The RTR(α) is smaller for P_it^(N,δ_6,ε) than for P_it^(ε), but only slightly. The RTR(β) for P_it^(N,δ_6,ε), however, is nearly twice that of P_it^(ε). This may suggest that P_it^(ε) is to be favoured over P_it^(N,δ_6,ε).

Model              Set 1                          Set 2
                   Average DIC    Average AIC     Average DIC    Average AIC
P_it^(δ_6)         *              *               *              *
P_it^(ø)
P_it^(N,δ_6)       *              *               *              *
P_it^(N)
P_it^(δ_2,ε)
P_it^(δ_4,ε)
P_it^(δ_6,ε)
P_it^(ε)
P_it^(N,δ_2,ε)
P_it^(N,δ_4,ε)
P_it^(N,δ_6,ε)
P_it^(N,ε)

Table 2.6: Average DIC and AIC values averaged over the results from the ten epidemics of Set 1 and Set 2, respectively. *These results are for only 4 out of 20 epidemics that could be fitted under these models. MCMC was run for 125,000, rather than 20,000, iterations here.

Models P_it^(δ,ε) and P_it^(N,δ,ε) have nearly equivalent AIC values for each upper-truncation distance δ. However, comparing the RTR(α) and RTR(β) results for the two models at each value of δ, P_it^(δ,ε) consistently outperforms P_it^(N,δ,ε). The DIC nevertheless ranks P_it^(δ,ε) as the poorest-fitting model (specifically for δ = 2), despite the fact that it has the most desirable RTR values for both α and β.
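The posterior predictive epidemic curves discussed above are obtained by repeatedly drawing θ from the MCMC output and simulating an epidemic forward from the model. The following is a simplified sketch, assuming SI dynamics in which individuals remain infectious once infected and the epidemic is seeded by the first individual; these are illustrative assumptions, not the thesis setup (which uses fixed, known infectious periods), and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_epidemic(coords, theta, T, delta=np.inf):
    """Simulate one epidemic under the (possibly truncated) ILM and
    return the epidemic curve: the number of new infections per time unit."""
    alpha, beta = theta[0], theta[1]
    eps = theta[2] if len(theta) > 2 else 0.0          # sparks term, if present
    m = len(coords)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    infected = np.zeros(m, dtype=bool)
    infected[0] = True                                 # seed: first individual
    curve = []
    for t in range(T):
        new = []
        for i in np.where(~infected)[0]:
            js = np.where(infected & (d[i] > 0) & (d[i] <= delta))[0]
            exposure = alpha * np.sum(d[i, js] ** (-beta))
            if rng.random() < 1.0 - np.exp(-(exposure + eps)):
                new.append(i)
        infected[np.asarray(new, dtype=int)] = True
        curve.append(len(new))
    return curve

def posterior_predictive_curves(coords, draws, T, delta=np.inf, n_rep=100):
    """Sample theta from the MCMC draws n_rep times; one curve per sample."""
    idx = rng.integers(0, len(draws), size=n_rep)
    return [simulate_epidemic(coords, draws[i], T, delta) for i in idx]
```

Overlaying the resulting curves against the observed curve gives plots of the kind shown in Figure 2.1.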

Figure 2.1: Posterior predictive plots (number of new cases against time, fitted model versus true model) for model P_it^(ø) (top left) and P_it^(N,δ,ε) with δ = 6 (top right), δ = 4 (bottom left) and δ = 2 (bottom right).

2.6 Discussion and Future Work

The goal of this paper was to evaluate the inclusion of a normalization constant, and/or the use of an upper-truncated distance kernel, in a spatial individual-level infectious disease model. Each model was evaluated in terms of MCMC efficiency and goodness-of-fit, via posterior predictive plots, run-time ratios, AIC and DIC, when fitted to simulated data.

The inclusion of the normalization constant, although resulting in improved mixing and reduced correlation between parameters as expected, proves not to be worthwhile, as these improvements do not outweigh the increase in computation time. Furthermore, this negative effect worsens with increasing population size, suggesting that the inclusion of the normalization constant in models for large populations is misguided.

Here, block updates that assume independence between parameters are used in a Metropolis-Hastings MCMC algorithm. Adjusting these block updates to account for correlation between parameters may be a better way to improve MCMC efficiency. Although such an approach would forgo the need to include a normalization constant, there would be a downside: the need to run at least two MCMC chains sequentially in order to estimate the correlation structure before using the adjusted block updates.

To assist in reducing computation time, the upper-truncated distance kernel was considered. We show that the use of the upper-truncated distance kernel, κ₂(d_ij), leads to a model which behaves as expected, with a decrease in required computation time and in the run-time ratios for both parameters. However, this comes at the cost of increasingly poor model fit as δ is reduced.

Overall, the results suggest that it is not computationally worthwhile to include the normalization constant, but that truncation of the distance kernel is a good method for reducing computation time. However, care must be taken in selecting sensible truncation distances. Additionally, a sparks term is required for models which incorporate the upper-truncated distance kernel, in order that the models can be fitted to all data that the true model can generate.

There is certainly room for more work in this area. All models were fit to simulated data for which the infectious and latent periods were known. It would of course be more realistic to estimate the latent and/or infectious period, to account for such uncertainty. Another approach would be to implement data-augmented MCMC to incorporate uncertainty about the infection and infectious times of each individual, as is done by, for example, Jewell et al. (2009).
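The two-stage strategy described above — a pilot chain run with an independent proposal to estimate the posterior correlation structure, followed by a block update whose proposal follows that structure — might be sketched as follows. This is an illustrative sketch with names of our own choosing, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

def mh_block(logpost, theta0, n_iter, prop_cov):
    """Random-walk Metropolis with a multivariate normal block proposal.
    Passing a covariance estimated from a pilot chain lets the proposal
    follow the posterior correlation between the parameters."""
    L = np.linalg.cholesky(prop_cov)
    theta = np.asarray(theta0, dtype=float)
    lp = logpost(theta)
    chain = np.empty((n_iter, theta.size))
    for s in range(n_iter):
        cand = theta + L @ rng.standard_normal(theta.size)
        lp_cand = logpost(cand)
        if np.log(rng.random()) < lp_cand - lp:   # Metropolis accept/reject
            theta, lp = cand, lp_cand
        chain[s] = theta
    return chain

def two_stage_mh(logpost, theta0, n_pilot, n_iter, scale=0.1):
    """Stage 1: pilot run with an independent (diagonal) proposal.
    Stage 2: re-run using the empirical covariance of the pilot draws."""
    pilot = mh_block(logpost, theta0, n_pilot, scale * np.eye(len(theta0)))
    cov = np.cov(pilot.T) + 1e-8 * np.eye(len(theta0))  # jitter for stability
    return mh_block(logpost, pilot[-1], n_iter, cov)
```

The pilot stage is the cost mentioned above: the correlation structure must be estimated from one chain before the adjusted block updates can be used in the next.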

For both simulation studies, the population was set as a square grid on which individuals remained stationary, which may be realistic for certain plant populations or some controlled animal experiments. However, considering different population structures, for example clusters of individuals where the cluster size could help indicate sensible values of δ, would be well reasoned.

Methods which partition the data and likelihood through linearization of the kernel, as in Deardon et al. (2010), have also proven useful in decreasing computation time. However, the nature of these methods means they cannot be used in a data-augmented setting, whereas the upper-truncated distance kernel can.

Approximate Bayesian methods, for example estimating the likelihood at each MCMC iteration via simulation, are increasingly being considered in infectious disease modelling (McKinley et al., 2009; Toni et al., 2009). Such methods can lead to a drastic reduction in computation time if data can be simulated from the model much more quickly than the likelihood can be calculated. As far as the authors are aware, these methods have not yet been used for individual-based models, making them a potential area for future work. Of course, the downside to such an approach is that the final posterior results are with respect to an approximated posterior rather than the true posterior. Further, the degree of accuracy of this approximation is often difficult, if not impossible, to ascertain in practice.

Finally, it would be interesting to apply the models to real data, comparing results and conclusions derived under a full model alongside its upper-truncated distance kernel varieties.


More information

Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation

Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation Thomas Mejer Hansen, Klaus Mosegaard, and Knud Skou Cordua 1 1 Center for Energy Resources

More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation

Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation Fast Automated Estimation of Variance in Discrete Quantitative Stochastic Simulation November 2010 Nelson Shaw njd50@uclive.ac.nz Department of Computer Science and Software Engineering University of Canterbury,

More information

Chapter 2 Basic Structure of High-Dimensional Spaces

Chapter 2 Basic Structure of High-Dimensional Spaces Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,

More information

Global modelling of air pollution using multiple data sources

Global modelling of air pollution using multiple data sources Global modelling of air pollution using multiple data sources Matthew Thomas M.L.Thomas@bath.ac.uk Supervised by Dr. Gavin Shaddick In collaboration with IHME and WHO June 14, 2016 1/ 1 MOTIVATION Air

More information

Approximating Square Roots

Approximating Square Roots Math 560 Fall 04 Approximating Square Roots Dallas Foster University of Utah u0809485 December 04 The history of approximating square roots is one of the oldest examples of numerical approximations found.

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Bayesian Modelling with JAGS and R

Bayesian Modelling with JAGS and R Bayesian Modelling with JAGS and R Martyn Plummer International Agency for Research on Cancer Rencontres R, 3 July 2012 CRAN Task View Bayesian Inference The CRAN Task View Bayesian Inference is maintained

More information

Modified Metropolis-Hastings algorithm with delayed rejection

Modified Metropolis-Hastings algorithm with delayed rejection Modified Metropolis-Hastings algorithm with delayed reection K.M. Zuev & L.S. Katafygiotis Department of Civil Engineering, Hong Kong University of Science and Technology, Hong Kong, China ABSTRACT: The

More information

A recursive point process model for infectious diseases.

A recursive point process model for infectious diseases. A recursive point process model for infectious diseases. Frederic Paik Schoenberg, UCLA Statistics Collaborators: Marc Hoffmann, Ryan Harrigan. Also thanks to: Project Tycho, CDC, and WHO datasets. 1 1.

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Discovery of the Source of Contaminant Release

Discovery of the Source of Contaminant Release Discovery of the Source of Contaminant Release Devina Sanjaya 1 Henry Qin Introduction Computer ability to model contaminant release events and predict the source of release in real time is crucial in

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

Short-Cut MCMC: An Alternative to Adaptation

Short-Cut MCMC: An Alternative to Adaptation Short-Cut MCMC: An Alternative to Adaptation Radford M. Neal Dept. of Statistics and Dept. of Computer Science University of Toronto http://www.cs.utoronto.ca/ radford/ Third Workshop on Monte Carlo Methods,

More information

A Bayesian approach to artificial neural network model selection

A Bayesian approach to artificial neural network model selection A Bayesian approach to artificial neural network model selection Kingston, G. B., H. R. Maier and M. F. Lambert Centre for Applied Modelling in Water Engineering, School of Civil and Environmental Engineering,

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller

Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller Parameter and State inference using the approximate structured coalescent 1 Background Phylogeographic methods can help reveal the movement

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

1. Estimation equations for strip transect sampling, using notation consistent with that used to

1. Estimation equations for strip transect sampling, using notation consistent with that used to Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques Web Appendix A. Introduction In this on-line appendix,

More information

You ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation

You ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation Unit 5 SIMULATION THEORY Lesson 39 Learning objective: To learn random number generation. Methods of simulation. Monte Carlo method of simulation You ve already read basics of simulation now I will be

More information

This chapter explains two techniques which are frequently used throughout

This chapter explains two techniques which are frequently used throughout Chapter 2 Basic Techniques This chapter explains two techniques which are frequently used throughout this thesis. First, we will introduce the concept of particle filters. A particle filter is a recursive

More information

Global modelling of air pollution using multiple data sources

Global modelling of air pollution using multiple data sources Global modelling of air pollution using multiple data sources Matthew Thomas SAMBa, University of Bath Email: M.L.Thomas@bath.ac.uk November 11, 015 1/ 3 OUTLINE Motivation Data Sources Existing Approaches

More information

Robotics. Lecture 5: Monte Carlo Localisation. See course website for up to date information.

Robotics. Lecture 5: Monte Carlo Localisation. See course website  for up to date information. Robotics Lecture 5: Monte Carlo Localisation See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review:

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Analysis of Incomplete Multivariate Data

Analysis of Incomplete Multivariate Data Analysis of Incomplete Multivariate Data J. L. Schafer Department of Statistics The Pennsylvania State University USA CHAPMAN & HALL/CRC A CR.C Press Company Boca Raton London New York Washington, D.C.

More information

Modeling Plant Succession with Markov Matrices

Modeling Plant Succession with Markov Matrices Modeling Plant Succession with Markov Matrices 1 Modeling Plant Succession with Markov Matrices Concluding Paper Undergraduate Biology and Math Training Program New Jersey Institute of Technology Catherine

More information

Dealing with Categorical Data Types in a Designed Experiment

Dealing with Categorical Data Types in a Designed Experiment Dealing with Categorical Data Types in a Designed Experiment Part II: Sizing a Designed Experiment When Using a Binary Response Best Practice Authored by: Francisco Ortiz, PhD STAT T&E COE The goal of

More information

Hierarchical Modelling for Large Spatial Datasets

Hierarchical Modelling for Large Spatial Datasets Hierarchical Modelling for Large Spatial Datasets Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics,

More information

Bootstrapping Methods

Bootstrapping Methods Bootstrapping Methods example of a Monte Carlo method these are one Monte Carlo statistical method some Bayesian statistical methods are Monte Carlo we can also simulate models using Monte Carlo methods

More information

Calibration and emulation of TIE-GCM

Calibration and emulation of TIE-GCM Calibration and emulation of TIE-GCM Serge Guillas School of Mathematics Georgia Institute of Technology Jonathan Rougier University of Bristol Big Thanks to Crystal Linkletter (SFU-SAMSI summer school)

More information

Variability in Annual Temperature Profiles

Variability in Annual Temperature Profiles Variability in Annual Temperature Profiles A Multivariate Spatial Analysis of Regional Climate Model Output Tamara Greasby, Stephan Sain Institute for Mathematics Applied to Geosciences, National Center

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here.

From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. From Bayesian Analysis of Item Response Theory Models Using SAS. Full book available for purchase here. Contents About this Book...ix About the Authors... xiii Acknowledgments... xv Chapter 1: Item Response

More information

ST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn

ST440/540: Applied Bayesian Analysis. (5) Multi-parameter models - Initial values and convergence diagn (5) Multi-parameter models - Initial values and convergence diagnostics Tuning the MCMC algoritm MCMC is beautiful because it can handle virtually any statistical model and it is usually pretty easy to

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400 Statistics (STAT) 1 Statistics (STAT) STAT 1200: Introductory Statistical Reasoning Statistical concepts for critically evaluation quantitative information. Descriptive statistics, probability, estimation,

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization

Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization 10 th World Congress on Structural and Multidisciplinary Optimization May 19-24, 2013, Orlando, Florida, USA Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization Sirisha Rangavajhala

More information

Modelling and Quantitative Methods in Fisheries

Modelling and Quantitative Methods in Fisheries SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of

More information

Detecting Piecewise Linear Networks Using Reversible Jump Markov Chain Monte Carlo

Detecting Piecewise Linear Networks Using Reversible Jump Markov Chain Monte Carlo Clemson University TigerPrints All Theses Theses 8-2010 Detecting Piecewise Linear Networks Using Reversible Jump Markov Chain Monte Carlo Akshay Apte Clemson University, apte.aa@gmail.com Follow this

More information

INLA: an introduction

INLA: an introduction INLA: an introduction Håvard Rue 1 Norwegian University of Science and Technology Trondheim, Norway May 2009 1 Joint work with S.Martino (Trondheim) and N.Chopin (Paris) Latent Gaussian models Background

More information

Statistical Physics of Community Detection

Statistical Physics of Community Detection Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined

More information

Massive Data Analysis

Massive Data Analysis Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that

More information

Statistical Analysis of the 3WAY Block Cipher

Statistical Analysis of the 3WAY Block Cipher Statistical Analysis of the 3WAY Block Cipher By Himanshu Kale Project Report Submitted In Partial Fulfilment of the Requirements for the Degree of Master of Science In Computer Science Supervised by Professor

More information

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general

More information