Statistics & Bayesian Inference, Lecture 3: Monte Carlo Methods


Statistics & Bayesian Inference, Lecture 3. Joe Zuntz

Overview
Overview & motivation; Monte Carlo methods; direct sampling; importance sampling; Gibbs sampling; Monte-Carlo Markov Chains; Metropolis-Hastings; emcee sampling; nested sampling.

Situation At End Of Last Week
We can evaluate a likelihood + prior as a function of the model parameters, but we can't evaluate it everywhere in parameter space: the number of grid evaluations scales as (grid points)^(number of parameters). We would like some parameter constraints nonetheless, not just the maximum likelihood.

Monte Carlo Methods
A collection of methods involving repeated random sampling to get numerical results, e.g. what is the chance of dealing two pairs in five cards? Used for multi-dimensional integrals, graphics, fluid dynamics, genetics & evolution, AI, finance.
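As a concrete illustration of the Monte Carlo idea, here is a minimal sketch (my addition, not from the lecture) that estimates the chance of being dealt exactly two pairs in five cards by dealing many random hands:

import numpy as np
from collections import Counter

rng = np.random.default_rng(42)
deck = np.arange(52)            # card index; rank = index % 13
n_trials = 200_000
hits = 0

for _ in range(n_trials):
    hand = rng.choice(deck, size=5, replace=False)
    rank_counts = sorted(Counter(hand % 13).values())
    if rank_counts == [1, 2, 2]:   # exactly two pairs (not a full house or four of a kind)
        hits += 1

print("Estimated P(two pairs) =", hits / n_trials)

With enough trials this converges to the exact answer (about 0.0475) without ever writing down the combinatorics.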

Sampling Overview & Motivation
Sampling = generating values with a given distribution. Easy case: the distribution is written down analytically. Harder case: we can only evaluate the distribution for a given parameter choice. Why sample? To simulate systems; as a short-cut to expectations if you hate maths; to approximate & characterize distributions; to estimate a distribution's mean, std, etc.; to make constraint plots; to compare with other data.

Direct Sampling
For a simple analytic distribution we can generate samples pseudo-randomly (actually an incredibly complicated business under the hood):

import numpy as np
np.random.poisson(lam=3.4, size=1000)

Be aware of random seeds.
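To make the point about random seeds concrete, here is a small sketch (my addition, using numpy's Generator API) showing that fixing the seed makes the draws reproducible:

import numpy as np

# Two generators with the same seed produce identical Poisson draws
rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)
draws_a = rng_a.poisson(lam=3.4, size=5)
draws_b = rng_b.poisson(lam=3.4, size=5)
assert np.array_equal(draws_a, draws_b)   # reproducible

# A generator seeded differently (or not at all) will generally give different draws
rng_c = np.random.default_rng()
print(draws_a, rng_c.poisson(lam=3.4, size=5))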

Direct Sampling vs Monte Carlo Markov Chains
Often we cannot directly sample from a distribution (e.g. an approximate solution to the Lecture 1 homework problem), but we can evaluate P(x) for any given x. A Markov chain is a memory-less sequence, x_n = f(x_{n-1}). MCMC is a random generation rule for f(x) which yields x_n with stationary distribution P(x), i.e. the chain histogram tends to P. This is often easier than doing the sums/integrals directly.

MCMC
Jump around parameter space so that the number of samples near x is proportional to P(x). Given the current sample x_n in parameter space, generate a new proposed sample p_n ~ Q(p_n | x_n). For the simplest algorithm the proposal is symmetric, Q(p | x) = Q(x | p). Standard algorithms: x = a single parameter set; advanced algorithms: x = multiple parameter sets. Then set

x_{n+1} = p_n with probability min(P(p_n)/P(x_n), 1), and x_{n+1} = x_n otherwise.

Acceptance rule: propose p_n ~ Q(p_n | x_n) from the current point x_n.
If P(p_n) > P(x_n): definitely accept.
If P(p_n) < P(x_n): let α = P(p_n)/P(x_n), generate u ~ U(0, 1), and accept if α > u.
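Putting the proposal and acceptance rule together, here is a minimal Metropolis sketch (my own illustration, not the lecture's code): it samples a 1D distribution given its log-density log_p, using a symmetric Gaussian proposal of width sigma.

import numpy as np

def metropolis(log_p, x0, sigma=0.5, n_samples=10_000, seed=0):
    """Symmetric-proposal Metropolis sampler for a 1D target density."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        p = x + sigma * rng.normal()            # Gaussian proposal centred on x
        log_ratio = log_p(p) - log_p(x)         # log of P(p_n)/P(x_n)
        if np.log(rng.uniform()) < log_ratio:   # accept with probability min(ratio, 1)
            x = p
            n_accept += 1
        chain[i] = x                            # rejected steps repeat the current sample
    return chain, n_accept / n_samples

# Example: sample a unit Gaussian and check the acceptance fraction
chain, acceptance = metropolis(lambda x: -0.5 * x**2, x0=0.0)
print("acceptance fraction:", acceptance)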

Proposal
The MCMC will always converge in the limit of many samples, but the proposal function Q determines its efficiency. The ideal proposal has Q ~ P; a (multivariate) Gaussian centred on x_n is usually good. Tune σ (and the covariance) to match P.

Convergence
Have I taken enough samples? Many convergence tests are available; the easiest is visual. The ideal acceptance fraction is roughly 0.2-0.3.

Convergence: Mixing
Bad mixing: structure and correlations are visible in the trace of x against chain position. Good mixing: the trace looks like noise. This must be true for all parameters in the space.

Convergence: Multiple Chains
Run several chains and compare their results: require (variance of the means) / (mean of the variances) to be less than e.g. 3%.

Burn In
The chain will explore the peak only after finding it, so cut off the start of the chain.
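A rough sketch (my addition) of the multiple-chain check above, computing the ratio of the variance of the chain means to the mean of the chain variances for a single parameter:

import numpy as np

def between_within_ratio(chains):
    """chains: array of shape (n_chains, n_samples) for one parameter.
    Returns (variance of per-chain means) / (mean of per-chain variances)."""
    chains = np.asarray(chains)
    means = chains.mean(axis=1)
    variances = chains.var(axis=1, ddof=1)
    return means.var(ddof=1) / variances.mean()

# Example: three well-mixed chains drawn from the same Gaussian give a tiny ratio
rng = np.random.default_rng(1)
fake_chains = rng.normal(size=(3, 5000))
print(between_within_ratio(fake_chains))   # far below the 3% threshold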

Analysis
Probability is proportional to chain multiplicity, so histogram the chain in 1D or 2D. You can also get expectations of derived parameters:

E[f(x)] = ∫ P(x) f(x) dx ≈ (1/N) Σ_i f(x_i)

You can also thin the chain, e.g. remove every other sample.

Importance Sampling
Re-sampling from re-weighted existing samples, e.g. for a changed prior or likelihood, or for new data:

E[f(x)] = ∫ P_1(x) f(x) dx ≈ (1/N) Σ_{chain 1} f(x_i)
        = ∫ [P_1(x)/P_2(x)] f(x) P_2(x) dx ≈ (1/N) Σ_{chain 2} f(x_i) P_1(x_i)/P_2(x_i)

i.e. take a chain you sampled from some distribution P_2, give each sample a weight P_1(x_i)/P_2(x_i) for the new distribution P_1, and make your histograms, estimates, etc. using these weights.
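A small sketch (my own, with made-up Gaussian densities) of the re-weighting step: samples drawn under P_2 are re-used to estimate an expectation under a new distribution P_1.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# "Chain 2": samples drawn from P2 = N(0, 1)
samples = rng.normal(loc=0.0, scale=1.0, size=50_000)

# New target P1 = N(0.3, 1); weight each sample by P1(x)/P2(x)
weights = norm.pdf(samples, loc=0.3, scale=1.0) / norm.pdf(samples, loc=0.0, scale=1.0)

# Weighted (self-normalised) estimate of E[x] under P1 - should be close to 0.3
print(np.sum(weights * samples) / np.sum(weights))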

Importance Sampling
This works better the more similar P_2 is to P_1, and won't work if P_2 is small where P_1 isn't. So it is better suited to adding extra data than to analysing different data.

Gibbs Sampling
Applicable when we have more than one parameter, a, b, c, ..., z, and can directly sample from the conditional distributions P(a | b, c, d, ...), P(b | a, c, d, ...), P(c | a, b, d, ...), ..., P(z | a, b, c, ..., y). Can be very efficient when possible.

The algorithm is very simple: just sample each parameter in turn. 2D version with parameters (a, b):

for i = 1, 2, ...:
    a_{i+1} ~ P(a | b_i)
    b_{i+1} ~ P(b | a_{i+1})
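As an illustration (my addition), a two-parameter Gibbs sampler for a correlated bivariate Gaussian, where both conditionals P(a | b) and P(b | a) are 1D Gaussians we can sample directly:

import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                        # correlation between a and b (both marginally N(0, 1))
n_samples = 20_000
a, b = 0.0, 0.0
chain = np.empty((n_samples, 2))

for i in range(n_samples):
    # Conditional P(a | b) = N(rho*b, 1 - rho^2), and symmetrically for b
    a = rng.normal(rho * b, np.sqrt(1 - rho**2))
    b = rng.normal(rho * a, np.sqrt(1 - rho**2))
    chain[i] = a, b

print("sample correlation:", np.corrcoef(chain.T)[0, 1])   # should be close to rho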

Gibbs Sampling
The multi-parameter case is not as bad as it looks:

for i = 1, 2, ...:
    for k = 1, ..., n_param:
        x_k^{i+1} ~ P(x_k | x_1^{i+1}, x_2^{i+1}, ..., x_{k-1}^{i+1}, x_{k+1}^i, x_{k+2}^i, ..., x_{n_param}^i)

You can also block groups of parameters together and update them as vectors.

Emcee Sampling
The Goodman & Weare algorithm: a group of live walkers in parameter space with a parallel update rule along the lines connecting them, which makes it affine invariant. Popular in astronomy for its nice and friendly Python package.
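A minimal usage sketch (my addition, assuming version 3 of the emcee package and a toy Gaussian log-posterior):

import numpy as np
import emcee

def log_prob(theta):
    # Toy log-posterior: independent unit Gaussians in each parameter
    return -0.5 * np.sum(theta**2)

ndim, nwalkers = 2, 32
p0 = np.random.default_rng(0).normal(size=(nwalkers, ndim))   # initial walker positions

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 2000)

# Flatten the walkers into one chain, discarding burn-in
samples = sampler.get_chain(discard=500, flat=True)
print(samples.shape, samples.mean(axis=0))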

Model Selection
Given two models, how can we compare them? The simplest approach is to compare maximum likelihoods, but that does not include uncertainty or Occam's Razor. Recall that all our probabilities have been conditional on the model, as in Bayes' theorem for the parameters:

P(p | d, M) = P(d | p, M) P(p | M) / P(d | M)

Model Selection: Bayesian Evidence
We can use Bayes' theorem again, at the model level:

P(M | d) = P(d | M) P(M) / P(d)

This is only really meaningful when comparing models:

P(M_1 | d) / P(M_2 | d) = [P(d | M_1) / P(d | M_2)] × [P(M_1) / P(M_2)]

Here P(d | p, M) is the likelihood of the parameters within a model, P(d | M) is the evidence of the model, the P(M_i) are the model priors, and P(d | M_1)/P(d | M_2) is the ratio of Bayesian evidence values.

Model Selection: Bayesian Evidence
The evidence is the bit we ignored before when doing parameter estimation. It is given by an integral over the prior space:

P(d | M) = ∫ P(d | p, M) P(p | M) dp

It is hard to evaluate, because the posterior is usually small compared to the prior.

Model Selection: Evidence Approximations
Nice evidence approximations exist for some cases: the Savage-Dickey density ratio (for when one model is a subset of another), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). These work in various circumstances.
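To make the evidence integral concrete, here is a toy sketch (my own, with a made-up one-parameter model) that estimates P(d | M) by simple Monte Carlo over the prior, P(d | M) ≈ (1/N) Σ P(d | p_i, M) with the p_i drawn from the prior. It also shows why this is hard: when the likelihood is narrow compared to the prior, most prior draws contribute almost nothing.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model: prior p ~ N(0, 5), likelihood P(d | p) = N(d; p, 0.1), observed d = 1.0
prior_draws = rng.normal(loc=0.0, scale=5.0, size=200_000)
likelihoods = norm.pdf(1.0, loc=prior_draws, scale=0.1)

evidence = likelihoods.mean()   # Monte Carlo estimate of the prior-averaged likelihood
print(evidence)                 # analytically N(1.0; 0, sqrt(5**2 + 0.1**2)) ≈ 0.078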

Model Selection: Nested Sampling
Nested sampling also uses an ensemble of live points, and computes parameter constraints as well as the evidence. At each iteration it replaces the lowest-likelihood point with one higher up, by cleverly sampling inside the envelope of the active points. The Multinest software is extremely clever, with C, F90, and Python bindings.

Some Conclusions
For quick problems use Metropolis-Hastings; code it up yourself to understand it, or maybe try pymc. If that's a pain, try emcee, run in parallel if the likelihood evaluation is slow. For model selection or complex spaces, use nested sampling with Multinest.

Problem
With a prior s ~ N(0.5, 0.1), write a Metropolis-Hastings sampler to draw from the problem we did last week. Plot 1D and 2D constraints on the parameters.