Statistics & Bayesian Inference
Lecture 3
Joe Zuntz

Overview
Overview & Motivation
Monte Carlo Methods
Direct sampling
Monte Carlo Markov Chains
Metropolis-Hastings
Importance sampling
Gibbs sampling
Emcee sampling
Nested sampling

Situation At End Of Last Week
We can evaluate a likelihood + prior as a function of the model parameters
Can't evaluate it everywhere in parameter space: number of grid evaluations ~ (grid points)^(number of parameters)
Would like some parameter constraints nonetheless, not just the maximum likelihood

Monte Carlo Methods
A collection of methods involving repeated sampling to get numerical results
e.g. chance of dealing two pairs in five cards?
Multi-dimensional integrals, graphics, fluid dynamics, genetics & evolution, AI, finance
Sampling Motivation
Simulate systems
Sample = generate values with a given distribution
Easy: the distribution is written down analytically
Harder: can only evaluate the distribution for a given parameter choice
Short-cut to expectations if you hate maths
Approximate & characterize distributions
Estimate a distribution's mean, std, etc.
Make constraint plots
Compare with other data

Direct Sampling
Simple analytic distribution: can generate samples pseudo-randomly
Actually incredibly complicated under the hood:
import numpy as np
np.random.poisson(lam=3.4, size=1000)
Be aware of random seeds
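A minimal sketch of direct sampling with an explicit random seed, so the run is reproducible (the distribution parameters here are just illustrative):

import numpy as np

# Seeded generator: the same seed always reproduces the same "random" samples.
rng = np.random.default_rng(42)

# Draw 1000 Poisson samples directly from the analytic distribution.
samples = rng.poisson(lam=3.4, size=1000)

print(samples.mean())  # should come out close to 3.4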
Monte Carlo Markov Chains
Often we cannot directly sample from a distribution
e.g. an approximate solution to the Lecture 1 homework problem
But we can evaluate P(x) for any given x
Markov chain = memory-less sequence: x_n = f(x_{n-1})
Often easier than doing sums/integrals
MCMC: a random generation rule f(x) which yields x_n with stationary distribution P(x)
i.e. the chain histogram tends to P

MCMC
Jump around parameter space
Number of jumps near x ~ P(x)
Given the current sample x_n in parameter space
Generate a new proposed sample p_n ~ Q(p_n | x_n)
For the simplest algorithm the proposal is symmetric: Q(p | x) = Q(x | p)
Standard algorithms: x = a single parameter set
Advanced algorithms: x = multiple parameter sets
Accept or reject the proposal:
x_{n+1} = p_n with probability min(P(p_n)/P(x_n), 1), otherwise x_{n+1} = x_n
The proposal Q(p_n | x_n) is centred on the current point x_n
If P(p_n) > P(x_n): definitely accept
If P(p_n) < P(x_n):
Let α = P(p_n)/P(x_n)
Generate u ~ U(0, 1)
Accept if α > u
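A minimal sketch of this accept/reject rule for a single parameter, assuming an illustrative Gaussian target log_post and proposal width sigma (both are placeholders, not part of the lecture problem):

import numpy as np

def log_post(x):
    # Illustrative target: a standard Gaussian (replace with your own log P).
    return -0.5 * x**2

rng = np.random.default_rng(1)
sigma = 1.0          # proposal width - tune for acceptance ~ 0.2-0.3
x = 0.0              # starting point
chain = []

for i in range(10000):
    p = x + sigma * rng.normal()               # symmetric Gaussian proposal Q
    alpha = np.exp(log_post(p) - log_post(x))  # ratio P(p)/P(x)
    if alpha > rng.uniform():                  # accept with probability min(alpha, 1)
        x = p                                  # accepted: move to the proposal
    chain.append(x)                            # rejected: repeat the current point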
Proposal
(Diagram: proposals far out in the tails are unlikely to be accepted; proposals near the peak may be accepted)
MCMC will always converge in the limit of infinitely many samples
The proposal function Q determines the efficiency of the MCMC
The ideal proposal has Q ~ P
A (multivariate) Gaussian centred on x_n is usually good
Tune σ (+ covariance) to match P

Convergence
Have I taken enough samples?
Many convergence tests are available
The easiest is visual
Ideal acceptance fraction ~ 0.2-0.3
Convergence
Bad mixing: structure and correlations are visible in the trace of parameter value against chain position
Good mixing: the trace looks like noise
Must be true for all parameters in the space
Run several chains and compare results (see the sketch below):
(Variance of the chain means) / (Mean of the chain variances) less than e.g. 3%

Burn-in
The chain will only explore the peak after finding it
Cut off the start of the chain
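A sketch of the multi-chain comparison above, assuming chains is a list of 1D arrays holding the samples of one parameter from independent runs (the function name and tolerance are illustrative):

import numpy as np

def converged(chains, tol=0.03):
    # (Variance of the per-chain means) / (mean of the per-chain variances)
    means = np.array([c.mean() for c in chains])
    variances = np.array([c.var() for c in chains])
    ratio = means.var() / variances.mean()
    return ratio < tol   # e.g. below 3%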
Analysis
Probability is proportional to chain multiplicity
Histogram the chain, in 1D or 2D
Can also thin, e.g. remove every other sample
Can also get expectations of derived parameters:
E[f(x)] = ∫ P(x) f(x) dx ≈ (1/N) Σ_i f(x_i)

Importance Sampling
Re-sampling from re-weighted existing samples
Changed prior / likelihood
New data
E[f(x)] = ∫ P_1(x) f(x) dx ≈ (1/N) Σ_i f(x_i)   [x_i from chain 1, drawn from P_1]
        = ∫ [P_1(x)/P_2(x)] f(x) P_2(x) dx ≈ (1/N) Σ_i f(x_i) P_1(x_i)/P_2(x_i)   [x_i from chain 2, drawn from P_2]
i.e. take a chain you sampled from some distribution P_2
Give each sample a weight P_1(x)/P_2(x) for the new distribution P_1
Make your histograms, estimates, etc. using these weights
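A minimal sketch of that re-weighting step, assuming chain holds samples drawn from P_2 and that log_p1 and log_p2 are illustrative log-density functions you supply:

import numpy as np

def reweight(chain, log_p1, log_p2):
    # Weight each sample drawn from P2 by P1(x)/P2(x).
    log_w = np.array([log_p1(x) - log_p2(x) for x in chain])
    w = np.exp(log_w - log_w.max())   # subtract the max for numerical stability
    return w / w.sum()                # normalised weights

# E[f(x)] under P1, using the chain sampled from P2:
#   np.sum(weights * f(chain))   (or pass weights=... to np.histogram)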
Importance Sampling
Works better the more similar P_2 is to P_1
Won't work if P_2 is small where P_1 isn't
So better suited to adding extra data than to analysing different data

Gibbs Sampling
Applicable when you have more than one parameter: a, b, c, ..., z
And can directly sample from the conditional likelihoods: P(a | b, c, d, ...), P(b | a, c, d, ...), P(c | a, b, d, ...), ..., P(z | a, b, c, ..., y)
Can be very efficient when possible
Very simple algorithm - just update each parameter in turn
2D version with parameters (a, b), as sketched below:
for i = 1, 2, ...
    a_{i+1} ~ P(a | b_i)
    b_{i+1} ~ P(b | a_{i+1})
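A minimal sketch of that two-parameter loop, assuming an illustrative unit bivariate Gaussian target with correlation rho, for which both conditionals are known analytically:

import numpy as np

rho = 0.8                      # illustrative correlation between a and b
rng = np.random.default_rng(2)
a, b = 0.0, 0.0
samples = []

for i in range(5000):
    # Conditional P(a | b) for a unit bivariate Gaussian: N(rho*b, 1 - rho^2)
    a = rng.normal(rho * b, np.sqrt(1 - rho**2))
    # Conditional P(b | a), using the freshly updated a
    b = rng.normal(rho * a, np.sqrt(1 - rho**2))
    samples.append((a, b))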
Gibbs Sampling
Multi-parameter case - not as bad as it looks:
for i = 1, 2, ...
    for k = 1, ..., n_param
        x_k^{i+1} ~ P(x_k | x_1^{i+1}, x_2^{i+1}, ..., x_{k-1}^{i+1}, x_{k+1}^{i}, x_{k+2}^{i}, ..., x_{n_param}^{i})
Can also block groups of parameters together and update them as vectors

Emcee Sampling
Goodman & Weare algorithm
Group of live walkers in parameter space
Parallel update rule along connecting lines - affine invariant
Popular in astronomy thanks to its nice and friendly Python package
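A minimal usage sketch of the emcee package; the log-posterior, dimensions and walker numbers here are illustrative placeholders:

import numpy as np
import emcee

def log_prob(theta):
    # Illustrative target: independent unit Gaussians in each parameter.
    return -0.5 * np.sum(theta**2)

ndim, nwalkers = 2, 32
p0 = np.random.randn(nwalkers, ndim)                # initial walker positions
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)
chain = sampler.get_chain(discard=500, flat=True)   # drop burn-in, flatten walkers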
Model Selection
Given two models, how can we compare them?
Simplest approach = compare maximum likelihoods
Does not include uncertainty or Occam's Razor
Recall that all our probabilities have been conditional on the model, as in Bayes:
P(p | d, M) = P(d | p, M) P(p | M) / P(d | M)

Model Selection: Bayesian Evidence
Can use Bayes' Theorem again, at the model level:
P(M | d) = P(d | M) P(M) / P(d)
The likelihood of the parameters within a model is P(d | p, M); the evidence of a model is P(d | M)
Only really meaningful when comparing models:
P(M_1 | d) / P(M_2 | d) = [P(d | M_1) / P(d | M_2)] × [P(M_1) / P(M_2)]
Here P(d | M_1) and P(d | M_2) are the Bayesian evidence values, and P(M_1)/P(M_2) are the model priors
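As a small illustration of that ratio: the posterior odds are just the evidence ratio (Bayes factor) times the prior odds. The log-evidence values below are made-up numbers, not results from any real model:

import numpy as np

log_Z1, log_Z2 = -152.3, -154.1    # illustrative log-evidences ln P(d|M1), ln P(d|M2)
prior_odds = 1.0                   # P(M1)/P(M2), e.g. equal prior belief
bayes_factor = np.exp(log_Z1 - log_Z2)
posterior_odds = bayes_factor * prior_odds   # P(M1|d)/P(M2|d)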
Model Selection: Bayesian Evidence
The evidence is the bit we ignored before when doing parameter estimation
Given by an integral over the prior space (a naive Monte Carlo estimate is sketched below):
P(d | M) = ∫ P(d | p, M) P(p | M) dp
Hard to evaluate - the posterior is usually small compared to the prior

Model Selection: Evidence Approximations
Nice evidence approximations exist for some cases:
Savage-Dickey density ratio (for when one model is a subset of another)
Akaike information criterion (AIC)
Bayesian information criterion (BIC)
These work in various circumstances
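To make the evidence integral above concrete, a naive Monte Carlo estimate simply averages the likelihood over draws from the prior. This is a sketch with illustrative function names, and it also shows why the integral is hard: most prior draws land where the likelihood is tiny.

import numpy as np

def log_evidence(log_like, prior_sample, n=100000):
    # Z = integral of P(d|p,M) P(p|M) dp  ~  mean of the likelihood over prior draws
    ps = prior_sample(n)                          # draws from the prior P(p|M)
    log_L = np.array([log_like(p) for p in ps])
    # log-mean-exp for numerical stability
    return np.log(np.mean(np.exp(log_L - log_L.max()))) + log_L.max()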
Model Selection: Nested Sampling
Also uses an ensemble of live points
Computes parameter constraints as well as the evidence
Each iteration, replace the lowest-likelihood point with one higher up
By cleverly sampling within the envelope of the active points
The Multinest software is extremely clever: C, F90, Python bindings

Some Conclusions
For quick problems use Metropolis-Hastings
Code it up yourself to understand it
Maybe try pymc
If that's a pain, try emcee
In parallel if the likelihood evaluation is slow
For model selection or complex spaces use nested sampling with Multinest

Problem
With a prior s ~ N(0.5, 0.1), write a Metropolis-Hastings sampler to draw from the problem we did last week.
Plot 1D and 2D constraints on the parameters.