Approximate Bayesian Computation using Auxiliary Models
1 Approximate Bayesian Computation using Auxiliary Models

Tony Pettitt (co-authors: Chris Drovandi, Malcolm Faddy)
Queensland University of Technology, Brisbane
MCQMC, February 2012
2 Outline

1. Motivating Problem: Application to modelling Macroparasite Immunity
2. Approximate Bayesian Computation: Introduction to ABC; Three ABC Algorithms; Sequential Monte Carlo ABC
3. Macroparasite Population Evolution: Summary Statistics; Auxiliary models
4. Results: Posterior results; ABC fits to Data
5. Conclusions: Macroparasite model; ABC
4 Motivating Problem: Macroparasite Immunity

Estimate the parameters of a Markov process model explaining macroparasite population development with host immunity.
- 212 hosts (cats), i = 1, ..., 212. Each cat is injected with l_i juvenile Brugia pahangi larvae (approximately 100 or 200).
- At time t_i the host is sacrificed and the number of mature worms is recorded.
- The host is assumed to develop an immunity.
- Three discrete variables: M(t) matures, L(t) juveniles, I(t) immunity. Only L(0) and M at the sacrifice time are observed for each host.
5 Motivating Problem: Macroparasite Immunity

[Figure: proportion of mature worms vs. sacrifice time.]
6 Trivariate Markov Process of Riley et al (2003)

Transitions and rates:
- Maturation (juvenile becomes mature): γL(t)
- Natural death of mature parasites M(t): µ_M M(t)
- Natural death of juvenile parasites L(t): µ_L L(t)
- Death of juveniles due to immunity: βI(t)L(t)
- Gain of immunity I(t): νL(t)
- Loss of immunity: µ_I I(t)
7 The Model and Intractable Likelihood

- L, M, I are discrete counts; I is a hypothesised (unobserved) variable.
- Deterministic form of the model:
  dL/dt = −µ_L L − βIL − γL,  dM/dt = γL − µ_M M,  dI/dt = νL − µ_I I
- µ_M and γ are fixed; ν, µ_L, µ_I and β require estimation.
- The likelihood based on the Markov process is intractable.
- Simulation of the process (L, M, I) uses Gillespie's algorithm (Gillespie, 1977).
- A common mathematical model: epidemics, chemical kinetics, systems biology, ...
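The simulation step can be sketched with Gillespie's algorithm, using the six transition rates from the diagram above. This is a minimal illustrative sketch, not the authors' code; the function name and the parameter values in the example call are hypothetical.

```python
import random

def gillespie_macroparasite(l0, t_end, nu, mu_L, mu_I, beta,
                            gamma=0.04, mu_M=0.0, rng=None):
    """Simulate (L, M, I) up to time t_end with Gillespie's algorithm.

    Events and rates (from the transition diagram):
      maturation      L -> L-1, M -> M+1   rate gamma * L
      juvenile death  L -> L-1             rate mu_L * L
      immune kill     L -> L-1             rate beta * I * L
      mature death    M -> M-1             rate mu_M * M
      immunity gain   I -> I+1             rate nu * L
      immunity loss   I -> I-1             rate mu_I * I
    """
    rng = rng or random.Random()
    L, M, I = l0, 0, 0
    t = 0.0
    while True:
        rates = [gamma * L, mu_L * L, beta * I * L, mu_M * M, nu * L, mu_I * I]
        total = sum(rates)
        if total == 0:          # absorbing state: nothing can happen
            break
        t += rng.expovariate(total)   # exponential waiting time
        if t > t_end:
            break
        # pick the event proportionally to its rate
        u = rng.random() * total
        k, acc = 0, rates[0]
        while u > acc and k < len(rates) - 1:
            k += 1
            acc += rates[k]
        if k == 0:
            L -= 1; M += 1
        elif k in (1, 2):
            L -= 1
        elif k == 3:
            M -= 1
        elif k == 4:
            I += 1
        else:
            I -= 1
    return L, M, I
```

In the macroparasite application only L(0) and M at the sacrifice time are kept, matching the observation scheme on the previous slide.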
9 ABC and the Approximate Posterior I

- Bayesian statistics involves inference based on the posterior distribution p(θ|y) ∝ p(y|θ)p(θ).
- What if the likelihood cannot be evaluated, but it is easy to simulate data x from it?
- Applications: genetics, biology, finance, ...
- ABC involves a joint posterior distribution for θ and the simulated data x,
  p(θ, x|y) ∝ g(y|x)p(x|θ)p(θ),
  where g(y|x) measures the closeness of the observed data y to the simulated data x (Reeves and Pettitt, 2005).
10 ABC and the Approximate Posterior II

- Compare the simulated values x and the observed data y through summary statistics S(·) = (S_1(·), ..., S_p(·)).
- One choice: set ρ(y, x) = ||S(y) − S(x)|| and g(y|x) = 1(ρ(y, x) ≤ ε).
- The choice of ε trades off accuracy against computational time.
11 Rejection Sampling ABC (RS-ABC)

Rejection Sampling ABC (Beaumont et al, 2002):
1. Sample θ* ~ p(θ).
2. Simulate x ~ p(·|θ*).
3. Accept θ* if ρ(y, x) ≤ ε.
Repeat the above until we have N values θ_1, ..., θ_N.

Advantages: simplicity, independent values, parallelizable.
Disadvantage: inefficient.
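The three RS-ABC steps above can be sketched in a few lines. The demo at the bottom is a deliberately simple toy (inferring a normal mean from its sample mean), not the macroparasite model; all names here are illustrative.

```python
import random
import statistics

def rejection_abc(obs_stat, simulate, summarize, prior_sample, eps, n_accept):
    """RS-ABC: keep prior draws whose simulated summary statistic
    falls within eps of the observed one (scalar summary for simplicity)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()                 # step 1: draw from the prior
        x = simulate(theta)                    # step 2: simulate data
        if abs(summarize(x) - obs_stat) <= eps:  # step 3: accept if close
            accepted.append(theta)
    return accepted

# Toy demo: data from N(2, 1), summary = sample mean, uniform prior on the mean.
rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(100)]
post = rejection_abc(
    obs_stat=statistics.fmean(y),
    simulate=lambda th: [rng.gauss(th, 1.0) for _ in range(100)],
    summarize=statistics.fmean,
    prior_sample=lambda: rng.uniform(-5.0, 5.0),
    eps=0.1,
    n_accept=200,
)
```

The inefficiency noted on the slide is visible here: most prior draws are rejected, and the cost grows rapidly as eps shrinks.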
12 MCMC ABC

Marjoram et al, 2003.
Advantages: theoretical understanding, use in SMC.
Disadvantages: dependent samples, Markov chain convergence, the sampler can get stuck, inefficient, needs tuning.
13 Sequential Monte Carlo for ABC

Chopin (2002), Del Moral et al (2006), Sisson et al (2007, 2009), Beaumont et al (2009).
- Approximate the posterior by a weighted sample {θ^i, W^i}_{i=1}^N of N particles.
- Define a sequence of joint targets
  p_t(θ, x|y) ∝ p(x|θ)p(θ)1(ρ(x, y) ≤ ε_t), for t = 1, ..., T,
  with a sequence of decreasing tolerances ε_1 ≥ ε_2 ≥ ... ≥ ε_T.
- At t = 1, draw particles from the prior and set ε_1 = ...
14 A Fully Adaptive SMC ABC Algorithm I

Drovandi and Pettitt (2011), Del Moral et al (2011).
- Reweight the particles; each weight is either zero or proportional to 1:
  W_t^i ∝ 1(ρ(x_t^i, y) ≤ ε_t) / 1(ρ(x_t^i, y) ≤ ε_{t−1})
- Choose ε_t so that M particles have zero weight and N − M have non-zero weight.
- Replenish the population by resampling M particles from the N − M alive particles.
- Diversify the particles with an MCMC kernel q_t(·|·) that is stationary for the current target; q_t is determined adaptively from the alive set.
- Apply the MCMC kernel R_t times so that the overall acceptance probability is close to 1; learn R_t adaptively.
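The first two bullets (adaptive tolerance choice and indicator reweighting) can be sketched as follows. This is a sketch of those two steps only, under the assumption that the tolerance is set as a quantile of the current particle distances; the function names are hypothetical.

```python
def next_tolerance(distances, drop_frac=0.5):
    """Choose eps_t so that a fraction drop_frac of the particles get
    zero weight: sort the distances and take the largest kept one."""
    srt = sorted(distances)
    n_keep = len(srt) - int(drop_frac * len(srt))
    return srt[n_keep - 1]

def reweight(distances, eps_t):
    """Indicator weights: a particle is alive (1) if its distance is
    within the new tolerance, otherwise dead (0)."""
    return [1 if d <= eps_t else 0 for d in distances]
```

The dead particles are then replaced by resampling from the alive set and diversified with the adaptive MCMC kernel, as in the remaining bullets.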
16 Macroparasite Population Evolution

[Figure: proportion of mature worms vs. sacrifice time.]
17 Developing Summary Statistics

Nunes and Balding (2009):
- For all sets and subsets of summary statistics, carry out rejection ABC for a fixed acceptance rate.
- Compare the ABC approximations in terms of the concentration of the posterior distribution, using a non-parametric measure of entropy.
Fearnhead and Prangle (2012): suitable for iid cases.
18 Developing Summary Statistics using Indirect Inference

We want summary statistics that efficiently summarize the data, which have different numbers of initial juveniles l_i and different sacrifice times.
An approach based on indirect inference (Gouriéroux, Monfort and Renault, 1993):
- Propose an auxiliary model p_a(y|θ_a) whose parameter θ_a is easily estimated (e.g. an easy MLE).
- The auxiliary model should be flexible enough to provide a good description of the data.
- Simulate x_θ from the intractable target likelihood p(·|θ) and find θ̂_a(x_θ).
- Estimate θ as the value whose θ̂_a(x_θ) is closest to θ̂_a(y).
As an alternative use within ABC: the estimates of the auxiliary model's parameters fitted to the data become the summary statistics.
Here, models based on the Beta-Binomial are used to capture the variability.
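The key idea, auxiliary-model MLEs as summary statistics, can be sketched with a deliberately simple auxiliary model whose MLEs are closed-form. A plain normal stands in here for the Beta-Binomial models used in the talk; this is an illustrative assumption, not the authors' choice.

```python
import statistics

def auxiliary_summary(data):
    """Indirect-inference summary statistics: the (easy) MLEs of a
    simple auxiliary model fitted to the data. For a normal auxiliary
    model the MLEs are the sample mean and the (biased, 1/n) variance."""
    n = len(data)
    mean = statistics.fmean(data)
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var
```

In ABC, `auxiliary_summary` would be applied to both the observed data and each simulated data set, and the resulting estimate vectors compared, for example with the Mahalanobis distance described later.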
19 ABC Summary Statistics from Auxiliary Models

How to choose between different auxiliary models?
- Either use data-analytic tools on the original data set, e.g. AIC,
- or use the Nunes and Balding approach, based on ABC approximations for each model / summary-statistic choice.
The former is far less computer intensive but does not consider the ABC approximation.
Here we compare and contrast different Beta-Binomial models:
- The models are fitted by MLE and AIC is used to rank them.
- The models range over about 33 units of AIC.
- How are the different fits (AIC) reflected in the ABC posteriors?
20 Auxiliary Beta-Binomial Model

- The data show too much variation for a Binomial model.
- A Beta-Binomial model has an extra parameter to capture dispersion:
  p(m_i|α_i, β_i) = C(l_i, m_i) B(m_i + α_i, l_i − m_i + β_i) / B(α_i, β_i)
- Use the reparameterisation p_i = α_i/(α_i + β_i) and θ_i = 1/(α_i + β_i).
- Relate the proportion p_i to time t_i, and the overdispersion parameter θ_i to the initial larvae l_i.
- Comparing various models: the best (AIC = 1897) against others with AIC = 1911 and AIC = 1930.
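A minimal implementation of the Beta-Binomial pmf above, evaluating the Beta functions on the log scale for numerical stability; the function names are illustrative.

```python
from math import lgamma, exp

def log_beta(a, b):
    """log B(a, b) via log-gamma: lgamma(a) + lgamma(b) - lgamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(m, l, alpha, beta):
    """Beta-Binomial pmf as on the slide:
    C(l, m) * B(m + alpha, l - m + beta) / B(alpha, beta)."""
    log_choose = lgamma(l + 1) - lgamma(m + 1) - lgamma(l - m + 1)
    return exp(log_choose + log_beta(m + alpha, l - m + beta) - log_beta(alpha, beta))
```

A quick sanity check: with alpha = beta = 1 the Beta-Binomial reduces to a uniform distribution over 0, ..., l, so each outcome has probability 1/(l + 1).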
22 Summary Statistics

For each auxiliary model, compare the auxiliary estimates for the simulated data x, θ̂_a^x, and for the observations y, θ̂_a^y, with the Mahalanobis distance
  ρ(y, x) = ρ(θ̂_a^y, θ̂_a^x) = (θ̂_a^y − θ̂_a^x)^T S^{−1} (θ̂_a^y − θ̂_a^x),
where S is the covariance matrix of the MLE.
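The distance above can be computed directly, assuming NumPy is available; the function name is illustrative, and `S` is the MLE covariance matrix as on the slide.

```python
import numpy as np

def mahalanobis(theta_y, theta_x, S):
    """Quadratic-form Mahalanobis distance (d^T S^{-1} d), matching the
    slide's formula; solve() avoids forming S^{-1} explicitly."""
    d = np.asarray(theta_y, dtype=float) - np.asarray(theta_x, dtype=float)
    return float(d @ np.linalg.solve(S, d))
```

With S the identity this reduces to the squared Euclidean distance between the two estimate vectors.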
24 SMC Algorithm Settings and Posterior Results

Algorithm settings:
- N = 1000 particles; discard half at each iteration; finish when the MCMC kernel acceptance rate falls to 3%.
- Repeat the MCMC kernel enough times to get about 99% overall acceptance at each iteration.
- Fixed values: γ = 0.04, µ_M = ...
- Prior choices: ν ~ U(0,1), µ_L ~ U(0,1), µ_I ~ U(0,2), β ~ U(0,2).
27 Posterior Density for ν

[Figure: ABC posterior densities of ν (×10⁻³) under the three auxiliary models, AIC 1930, 1911 and 1897.]
28 Posterior Density for µ_L

[Figure: ABC posterior densities of µ_L under the three auxiliary models, AIC 1930, 1911 and 1897.]
29 Posterior Density Differences

- The ABC posterior is more concentrated for the worse (bigger) AIC models.
- The posteriors have very different modes.
- The Nunes-Balding criterion for summary statistics would therefore prefer the bigger-AIC models.
- The real test: the predictions from the different ABC-fitted models.
30 ABC Fit to Data

[Figure: 95% prediction intervals for mature count vs. autopsy time from the stochastic model using ABC modal estimates; the best-AIC auxiliary Beta-Binomial model on the left, the worst-AIC auxiliary Binomial mixture model on the right.]
The ABC model fit based on the best-AIC auxiliary model captures the variability of the data.
32 Conclusions: Macroparasite Stochastic Model

- µ_L and ν are more associated with the variance of the process; there are ABC differences between auxiliary models over the range of AIC.
- The most concentrated ABC posteriors are not the best for model predictions.
- The stochastic model using the ABC fit with the best-AIC auxiliary model for summary statistics gives a good fit to the data.
33 Conclusions: ABC and Summary Statistics

- Indirect inference through an easy-to-estimate (MLE + AIC) auxiliary model can be used to obtain ABC summary statistics.
- Careful data analysis is involved in getting a good auxiliary model for the observed data: a statistician's strength!
- The Nunes and Balding approach can be flawed where the ABC posteriors have different modes.
- The Fearnhead and Prangle approach seems limited to iid cases; otherwise ad hoc choices of summaries are made.
- Presented an adaptive, self-tuning ABC SMC algorithm.
37 Acknowledgements

Australian Research Council, Chris Drovandi, Malcolm Faddy
38 Foresight

"It will be maintained that the end of an era has now been reached, as regards both statistical methods and computational techniques, and an outline of the way in which biometric techniques in genetical demography may be expected to develop will be given. Particular emphasis will be placed on the need to formulate sound methods of estimation by simulation on complex models."
Edwards AWF, 1967, Biometrics, 23, 176
Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg Phil Gregory Physics and Astronomy Univ. of British Columbia Introduction Martin Weinberg reported
More informationLinear Modeling with Bayesian Statistics
Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the
More informationBootstrapping Methods
Bootstrapping Methods example of a Monte Carlo method these are one Monte Carlo statistical method some Bayesian statistical methods are Monte Carlo we can also simulate models using Monte Carlo methods
More informationProject Paper Introduction
Project Paper Introduction Tracey Marsh Group Health Research Institute University of Washington, Department of Biostatistics Paper Estimating and Projecting Trends in HIV/AIDS Generalized Epidemics Using
More informationEstimation of Item Response Models
Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:
More informationNon-parametric Approximate Bayesian Computation for Expensive Simulators
Non-parametric Approximate Bayesian Computation for Expensive Simulators Steven Laan 6036031 Master s thesis 42 EC Master s programme Artificial Intelligence University of Amsterdam Supervisor Ted Meeds
More informationGAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.
GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential
More informationEstimation of Bilateral Connections in a Network: Copula vs. Maximum Entropy
Estimation of Bilateral Connections in a Network: Copula vs. Maximum Entropy Pallavi Baral and Jose Pedro Fique Department of Economics Indiana University at Bloomington 1st Annual CIRANO Workshop on Networks
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2015 11. Non-Parameteric Techniques
More informationHeterotachy models in BayesPhylogenies
Heterotachy models in is a general software package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) methods. The program allows a range of models of gene sequence evolution,
More informationMaximum Likelihood Network Topology Identification from Edge-based Unicast Measurements. About the Authors. Motivation and Main Point.
Maximum Likelihood Network Topology Identification from Edge-based Unicast Measurements Mark Coates McGill University Presented by Chunling Hu Rui Castro, Robert Nowak Rice University About the Authors
More informationMSA101/MVE Lecture 5
MSA101/MVE187 2017 Lecture 5 Petter Mostad Chalmers University September 12, 2017 1 / 15 Importance sampling MC integration computes h(x)f (x) dx where f (x) is a probability density function, by simulating
More informationBayesian model selection and diagnostics
Bayesian model selection and diagnostics A typical Bayesian analysis compares a handful of models. Example 1: Consider the spline model for the motorcycle data, how many basis functions? Example 2: Consider
More informationOptimal designs for comparing curves
Optimal designs for comparing curves Holger Dette, Ruhr-Universität Bochum Maria Konstantinou, Ruhr-Universität Bochum Kirsten Schorning, Ruhr-Universität Bochum FP7 HEALTH 2013-602552 Outline 1 Motivation
More informationModelling and Quantitative Methods in Fisheries
SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of
More informationMCMC for doubly-intractable distributions
MCMC for doubly-intractable distributions Iain Murray Gatsby Computational Neuroscience Unit University College London i.murray@gatsby.ucl.ac.uk Zoubin Ghahramani Department of Engineering University of
More informationFitting Social Network Models Using the Varying Truncation S. Truncation Stochastic Approximation MCMC Algorithm
Fitting Social Network Models Using the Varying Truncation Stochastic Approximation MCMC Algorithm May. 17, 2012 1 This talk is based on a joint work with Dr. Ick Hoon Jin Abstract The exponential random
More informationMissing Data and Imputation
Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex
More informationGT "Calcul Ensembliste"
GT "Calcul Ensembliste" Beyond the bounded error framework for non linear state estimation Fahed Abdallah Université de Technologie de Compiègne 9 Décembre 2010 Fahed Abdallah GT "Calcul Ensembliste" 9
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationInference and Representation
Inference and Representation Rachel Hodos New York University Lecture 5, October 6, 2015 Rachel Hodos Lecture 5: Inference and Representation Today: Learning with hidden variables Outline: Unsupervised
More informationApproximate Bayesian Computation methods and their applications for hierarchical statistical models. University College London, 2015
Approximate Bayesian Computation methods and their applications for hierarchical statistical models University College London, 2015 Contents 1. Introduction 2. ABC methods 3. Hierarchical models 4. Application
More informationA Short History of Markov Chain Monte Carlo
A Short History of Markov Chain Monte Carlo Christian Robert and George Casella 2010 Introduction Lack of computing machinery, or background on Markov chains, or hesitation to trust in the practicality
More informationHierarchical Bayesian Modeling with Ensemble MCMC. Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 12, 2014
Hierarchical Bayesian Modeling with Ensemble MCMC Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 12, 2014 Simple Markov Chain Monte Carlo Initialise chain with θ 0 (initial
More informationStochastic Simulation: Algorithms and Analysis
Soren Asmussen Peter W. Glynn Stochastic Simulation: Algorithms and Analysis et Springer Contents Preface Notation v xii I What This Book Is About 1 1 An Illustrative Example: The Single-Server Queue 1
More informationMesh segmentation. Florent Lafarge Inria Sophia Antipolis - Mediterranee
Mesh segmentation Florent Lafarge Inria Sophia Antipolis - Mediterranee Outline What is mesh segmentation? M = {V,E,F} is a mesh S is either V, E or F (usually F) A Segmentation is a set of sub-meshes
More informationBayesian Annealed Sequential Importance Sampling (BASIS): an unbiased version of Transitional Markov Chain Monte Carlo
Bayesian Annealed Sequential Importance Sampling (BASIS): an unbiased version of Transitional Markov Chain Monte Carlo Stephen Wu Postdoctoral CSELab ETH-Zurich CH -892 Switzerland Email: stewu@ism.ac.jp
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationMonte Carlo Optimization Introduction
Monte Carlo Methods with R: Monte Carlo Optimization [80] Monte Carlo Optimization Introduction Optimization problems can mostly be seen as one of two kinds: Find the extrema of a function h(θ) over a
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationSensor Tasking and Control
Sensor Tasking and Control Outline Task-Driven Sensing Roles of Sensor Nodes and Utilities Information-Based Sensor Tasking Joint Routing and Information Aggregation Summary Introduction To efficiently
More informationBayesian Segmentation and Motion Estimation in Video Sequences using a Markov-Potts Model
Bayesian Segmentation and Motion Estimation in Video Sequences using a Markov-Potts Model Patrice BRAULT and Ali MOHAMMAD-DJAFARI LSS, Laboratoire des Signaux et Systemes, CNRS UMR 8506, Supelec, Plateau
More information2. A Bernoulli distribution has the following likelihood function for a data set D: N 1 N 1 + N 0
Machine Learning Fall 2015 Homework 1 Homework must be submitted electronically following the instructions on the course homepage. Make sure to explain you reasoning or show your derivations. Except for
More informationA One-Pass Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets
A One-Pass Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets Suhrid Balakrishnan and David Madigan Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA Department
More informationSamuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR
Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report
More informationGlobal modelling of air pollution using multiple data sources
Global modelling of air pollution using multiple data sources Matthew Thomas SAMBa, University of Bath Email: M.L.Thomas@bath.ac.uk November 11, 015 1/ 3 OUTLINE Motivation Data Sources Existing Approaches
More informationRobotics. Lecture 5: Monte Carlo Localisation. See course website for up to date information.
Robotics Lecture 5: Monte Carlo Localisation See course website http://www.doc.ic.ac.uk/~ajd/robotics/ for up to date information. Andrew Davison Department of Computing Imperial College London Review:
More informationCHAPTER 1 INTRODUCTION
Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,
More informationL10. PARTICLE FILTERING CONTINUED. NA568 Mobile Robotics: Methods & Algorithms
L10. PARTICLE FILTERING CONTINUED NA568 Mobile Robotics: Methods & Algorithms Gaussian Filters The Kalman filter and its variants can only model (unimodal) Gaussian distributions Courtesy: K. Arras Motivation
More informationTime Series Analysis by State Space Methods
Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY
More informationA Bayesian approach to artificial neural network model selection
A Bayesian approach to artificial neural network model selection Kingston, G. B., H. R. Maier and M. F. Lambert Centre for Applied Modelling in Water Engineering, School of Civil and Environmental Engineering,
More informationModeling of Complex Social. MATH 800 Fall 2011
Modeling of Complex Social Systems MATH 800 Fall 2011 Complex SocialSystems A systemis a set of elements and relationships A complex system is a system whose behavior cannot be easily or intuitively predicted
More informationx n+1 = x n f(x n) f (x n ), (1)
1 Optimization The field of optimization is large and vastly important, with a deep history in computer science (among other places). Generally, an optimization problem is defined by having a score function
More informationWarped Mixture Models
Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More informationChapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea
Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.
More information