BART STAT8810, Fall 2017
1 BART STAT8810, Fall 2017 M.T. Pratola November 1, 2017
2 Today BART: Bayesian Additive Regression Trees
3 BART: Bayesian Additive Regression Trees

The additive model generalizes the single-tree regression model:

$$Y(x) = f(x) + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$

where

$$f(x) = \sum_{j=1}^{m} g(x; T_j, M_j).$$

We viewed each tree as representing a map $g(x; T_j, M_j): \mathbb{R}^p \to \mathbb{R}$. We can get a richer class of models by considering the sum of many such maps. We will see that each individual function $g(x; T_j, M_j)$ is constrained to be a simplistic function that explains only a small portion of the response variability: the so-called sum-of-weak-learners assumption.
4 BART: Bayesian Additive Regression Trees

(Figure 1: BART uses a sum of many simple trees.)
5 BART: Bayesian Additive Regression Trees

There is, of course, nothing new about a GAM-like formulation. However, one advantage of the tree-based approach is the tree's natural ability to capture interactions, possibly of a high-dimensional form, or to not capture such behavior when it is not present. That is, in the tree-based approach we are learning the form of the predictor functions themselves rather than assuming a fixed class of bases with a particular form.
6 BART Model

The data is modeled as

$$Y(x) \mid \{(T_j, M_j)\}_{j=1}^{m}, \sigma^2 \sim N\left(\sum_{j=1}^{m} g(x; T_j, M_j),\; \sigma^2\right)$$

giving our likelihood,

$$L(\{(T_j, M_j)\}_{j=1}^{m}, \sigma^2 \mid Y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y(x_i) - \sum_{j=1}^{m} g(x_i; T_j, M_j)\right)^2\right)$$

where $Y = (y(x_1), \ldots, y(x_n))$.
7 BART Model

The default number of trees is $m = 200$, which seems to work well in many problems. Increasing $m$ allows one to have a model with greater fidelity to more complex responses.

The interpretation is as follows. We can view $g(x; T_j, M_j)$ as a function that assigns a terminal-node scalar $\mu_{jb}$ to a given input $x$. And so the expected response $E[Y(x) \mid \{(T_j, M_j)\}_{j=1}^{m}]$ is simply the sum of the $\mu_{jb}$'s assigned to $x$ by the trees $(T_j, M_j)$.
8 BART Priors

Similar to our single-tree model, the BART prior factors as

$$\pi(\{(T_j, M_j)\}_{j=1}^{m}, \sigma^2) = \pi(\sigma^2) \prod_{j=1}^{m} \pi(M_j \mid T_j)\,\pi(T_j)$$

where for each $j$, $\pi(M_j \mid T_j) = \prod_{b=1}^{B_j} \pi(\mu_{jb})$, and $B_j = |M_j|$ is the number of terminal nodes in tree $T_j$.
9 BART Priors: $\pi(T_j)$

The interior of the tree $T_j$ is made up of split rules $\{(v_{ji}, c_{ji})\}$ with discrete uniform priors, as in our single-tree model:

$$\pi(v_j) = \prod_i \pi(v_{ji}) \quad \text{and} \quad \pi(c_j) = \prod_i \pi(c_{ji} \mid v_{ji}, T_j \setminus v_{ji}).$$

And the tree is regularized via the depth-penalizing prior from before as well,

$$\pi(\eta_{ji} \text{ splits}) = \alpha\,(1 + d(\eta_{ji}, \eta_{j1}))^{-\beta}$$

where $d(\eta_{ji}, \eta_{j1})$ is the depth from node $\eta_{ji}$ to the root node $\eta_{j1}$ in tree $T_j$. The default prior is the same as in our single-tree model: $\alpha = 0.95$, $\beta = 2$.
10 BART Priors: $\pi(\mu_{ji} \mid T_j)$

The prior on the scalar terminal-node parameters is the conjugate Normal, $\mu_{ji} \sim N(\mu_\mu, \sigma_\mu^2)$. Note that this implies, a priori, that the induced prior on $E[Y(x) \mid \{(T_j, M_j)\}_{j=1}^{m}]$ is $N(m\mu_\mu, m\sigma_\mu^2)$.

In practice, the data is usually centered to have mean zero and scaled so that $(y_{\min}, y_{\max}) = (-0.5, 0.5)$, and the prior used is $\mu_{ji} \sim N(0, \sigma_\mu^2)$.
11 BART Priors: $\pi(\mu_{ji} \mid T_j)$

The induced prior on $E[Y(x) \mid \{(T_j, M_j)\}_{j=1}^{m}]$ is then $N(0, m\sigma_\mu^2)$. The strategy to calibrate this prior is the same as in the single-tree model: choose a value $k$ such that

$$k\sqrt{m}\,\sigma_\mu = 0.5,$$

which implies that the prior is

$$\mu_{ji} \sim N\left(0, \left(\frac{0.5}{k\sqrt{m}}\right)^2\right).$$

As in the single-tree model, the recommended default value is $k = 2$.
12 BART Priors: $\pi(\mu_{ji} \mid T_j)$

(Figure: prior density of $\mu_{ji}$ with $m = 200$, $k = 2$.)
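A minimal sketch (not from the slides) of reproducing this figure in R, using the calibration $\sigma_\mu = 0.5/(k\sqrt{m})$ from the previous slide:

# Sketch: plot the calibrated terminal-node prior N(0, (0.5/(k*sqrt(m)))^2)
m <- 200; k <- 2
sigma.mu <- 0.5 / (k * sqrt(m))
curve(dnorm(x, mean = 0, sd = sigma.mu), from = -0.1, to = 0.1,
      xlab = expression(mu[ji]), ylab = "Density",
      main = "Prior with m=200, k=2")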
13 BART Priors: $\pi(\mu_{ji} \mid T_j)$

This means we induce further shrinkage of the $\mu_{ji}$'s towards zero by increasing $k$ or by increasing $m$. However, for $m$ fixed, increasing $k$ implies more of the response variability ends up in $\sigma^2$, while for $k$ fixed, increasing $m$ implies more of the response variability ends up in $f(x)$. Besides the default choice of $k = 2$, one might try tuning this prior hyperparameter using cross-validation.
14 BART Priors: $\pi(\sigma^2)$

The variance prior is again the scaled inverse chi-squared, $\sigma^2 \sim \chi^{-2}(\nu, \tau^2)$, and it is calibrated similarly to the single-tree model. The degrees of freedom $\nu$ are selected to get an appropriate shape; typical values are between 3 and 10, with $\nu = 3$ being the default.
15 BART Priors: $\pi(\sigma^2)$

The scale parameter $\tau^2$ is selected in the following way. Provide an initial estimate of the standard deviation of your data, $\hat\sigma$ (typically the sample standard deviation). Provide an upper quantile $q$, with $q = 0.90$ being the default. Then $\tau^2$ is selected so that, a priori, $P(\sigma < \hat\sigma) = q$.

The idea is that our data is unlikely to be all noise, so a conservative approach is to set up the prior so that it is very unlikely to estimate the variance to be greater than the sample variance of our data. The smaller $\nu$, the more concentrated on small $\sigma$ the prior becomes.
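To make the calibration concrete, note (a sketch of our own, not from the slides) that if $\sigma^2 \sim \chi^{-2}(\nu, \tau^2)$ then $\nu\tau^2/\sigma^2 \sim \chi^2_\nu$, so the constraint $P(\sigma < \hat\sigma) = q$ can be solved for $\tau^2$ with a single chi-squared quantile:

# Sketch: calibrate tau^2 so that P(sigma < sigma.hat) = q a priori.
# Uses nu*tau^2/sigma^2 ~ chi^2_nu when sigma^2 ~ scaled-inv-chi^2(nu, tau^2):
# P(sigma^2 < sigma.hat^2) = P(chi^2_nu > nu*tau^2/sigma.hat^2) = q
calibrate.tau2 <- function(sigma.hat, nu = 3, q = 0.90) {
  sigma.hat^2 * qchisq(1 - q, df = nu) / nu
}
tau2 <- calibrate.tau2(sigma.hat = 1)
# Monte Carlo check: sigma^2 = nu*tau^2 / chi^2_nu draws
sig2 <- 3 * tau2 / rchisq(1e5, df = 3)
mean(sqrt(sig2) < 1)  # approximately 0.90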
16 Sampling BART's Posterior

The posterior is

$$\pi(\{(T_j, M_j)\}_{j=1}^{m}, \sigma^2 \mid Y) \propto L(\{(T_j, M_j)\}_{j=1}^{m}, \sigma^2 \mid Y)\,\pi(\sigma^2)\prod_{j=1}^{m}\pi(M_j \mid T_j)\,\pi(T_j).$$
17 Sampling BART's Posterior

First, note the following:

$$L(\{(T_j, M_j)\}_{j=1}^{m}, \sigma^2 \mid Y) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y(x_i) - \sum_{j=1}^{m} g(x_i; T_j, M_j)\right)^2\right)$$
$$= \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(r_j(x_i) - g(x_i; T_j, M_j)\right)^2\right)$$

where $r_j(x_i) = y(x_i) - \sum_{k \neq j} g(x_i; T_k, M_k)$.
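In code, if the current fits of all $m$ trees are kept in an $n \times m$ matrix, these partial residuals are a one-liner. A sketch with hypothetical toy values (G, y, and j are not from the slides):

# Sketch: partial residuals r_j, where G[i,j] holds the current fit
# g(x_i; T_j, M_j) of tree j at observation i
n <- 5; m <- 3
G <- matrix(rnorm(n * m), n, m)  # hypothetical current tree fits
y <- rnorm(n)                    # hypothetical observations
j <- 2
rj <- y - rowSums(G[, -j, drop = FALSE])  # r_j(x_i), i = 1,...,n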
18 Sampling BART's Posterior

Our MCMC algorithm will perform the following steps:

1. For $j = 1, \ldots, m$:
   1.1 Draw $T_j \mid \sigma^2, R_j$ where $R_j = (r_j(x_1), \ldots, r_j(x_n))$ (Metropolis-Hastings step via a proposal distribution).
   1.2 Draw $M_j \mid T_j, \sigma^2, R_j$ (Gibbs step using the conjugate prior).
2. Draw $\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y$ (Gibbs step using the conjugate prior).

So once we have the $R_j$'s, the algorithm proceeds similarly to the single-tree algorithm.
19 Draw $\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y$

We have

$$\pi(\sigma^2 \mid \nu, \tau^2) = \frac{(\nu\tau^2/2)^{\nu/2}}{\Gamma(\nu/2)}\,\frac{1}{\sigma^{\nu+2}}\exp\left(-\frac{\nu\tau^2}{2\sigma^2}\right) \propto \frac{1}{\sigma^{\nu+2}}\exp\left(-\frac{\nu\tau^2}{2\sigma^2}\right).$$

So,

$$\pi(\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y) \propto \frac{1}{\sigma^n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y(x_i) - f(x_i))^2\right)\cdot\frac{1}{\sigma^{\nu+2}}\exp\left(-\frac{\nu\tau^2}{2\sigma^2}\right)$$
$$= \frac{1}{\sigma^{(\nu+n)+2}}\exp\left(-\frac{(\nu+n)}{2\sigma^2}\left(\frac{\nu\tau^2 + ns^2}{\nu+n}\right)\right)$$

where $s^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2$ and $f(x_i) = \sum_{j=1}^{m} g(x_i; T_j, M_j)$.
20 Draw $\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y$

And we recognize $\frac{1}{\sigma^{(\nu+n)+2}}\exp\left(-\frac{(\nu+n)}{2\sigma^2}\left(\frac{\nu\tau^2 + ns^2}{\nu+n}\right)\right)$ as the kernel of a scaled inverse chi-squared distribution, so

$$\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y \sim \chi^{-2}\left(\nu + n,\; \frac{\nu\tau^2 + ns^2}{\nu + n}\right).$$

So we know how to perform the Gibbs step for $\sigma^2$.
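A sketch of this Gibbs step in R (ours, not from the slides), using the standard trick that a scaled-inverse-chi-squared draw is $\nu'\tau'^2$ divided by a $\chi^2_{\nu'}$ draw; fhat denotes a hypothetical vector of the current fitted values $f(x_i)$:

# Sketch: Gibbs draw of sigma^2 given the current sum-of-trees fit fhat
draw.sigma2 <- function(y, fhat, nu, tau2) {
  n <- length(y)
  s2 <- mean((y - fhat)^2)
  (nu * tau2 + n * s2) / rchisq(1, df = nu + n)  # scaled-inv-chi^2 draw
}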
21 Draw $M_j \mid T_j, \sigma^2, R_j$

Suppose there are $B$ terminal nodes in tree $T_j$, $\eta^b_{j1}, \ldots, \eta^b_{jB}$. Using the same factorization as in the single-tree case:

$$L(\sigma^2, T_j, M_j \mid R_j) \propto \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(r_j(x_i) - g(x_i; T_j, M_j))^2\right)$$
$$= \exp\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{B}\sum_{i: r_j(x_i) \in \eta^b_k}(r_j(x_i) - \mu_{jk})^2\right) = \prod_{k=1}^{B}\exp\left(-\frac{1}{2\sigma^2}\sum_{i: r_j(x_i) \in \eta^b_k}(r_j(x_i) - \mu_{jk})^2\right)$$

where $n_k$ is the number of observations mapping to terminal node $\eta^b_{jk}$ and $\sum_k n_k = n$.
22 Draw $M_j \mid T_j, \sigma^2, R_j$

In other words, conditional on $T_j$ and $R_j$, the scalar terminal-node parameters are independent. So we can simply write down the full conditional for each $\mu_{jk}$ and draw them sequentially using Gibbs steps.
23 Draw $\mu_{jk} \mid T_j, \sigma^2, R_j$

Assuming mean-centered observations, our prior is $\pi(\mu_{jk} \mid T_j) = N(0, \sigma_\mu^2)$. Based on our Normal-Normal conjugacy results, the full conditional is

$$\mu_{jk} \mid \sigma^2, T_j, R_j \sim N\left(\left(\frac{n_k}{\sigma^2} + \frac{1}{\sigma_\mu^2}\right)^{-1}\frac{n_k \bar r_{jk}}{\sigma^2},\; \left(\frac{n_k}{\sigma^2} + \frac{1}{\sigma_\mu^2}\right)^{-1}\right)$$

where $\bar r_{jk} = \frac{1}{n_k}\sum_{i: r_j(x_i) \in \eta^b_k} r_j(x_i)$.
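A sketch of this Gibbs step (ours, not from the slides); rj.node is a hypothetical vector holding the partial residuals that map to terminal node $\eta^b_k$:

# Sketch: Gibbs draw of a terminal-node mean mu_jk under the N(0, sigma.mu2) prior
draw.mu <- function(rj.node, sigma2, sigma.mu2) {
  nk <- length(rj.node)
  prec <- nk / sigma2 + 1 / sigma.mu2  # full-conditional precision
  rnorm(1, mean = (nk * mean(rj.node) / sigma2) / prec, sd = sqrt(1 / prec))
}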
24 Draw $T_j \mid \sigma^2, R_j$

(Figure 2: Tree moves.)
25 Marginal Likelihood

We will again need to marginalize our likelihood over the $\mu$ parameters. Marginalizing the portion of the likelihood associated with terminal node $\eta^b_k$, we have

$$L(\eta^b_{jk} \mid \sigma^2, R_j) = \int_{\mu_{jk}} L(\eta^b_{jk} \mid \mu_{jk}, \sigma^2, R_j)\,\pi(\mu_{jk})\,d\mu_{jk}.$$
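The slide stops at the integral; for completeness, carrying out this standard Normal-Normal integral (our own working, consistent with the single-tree case) gives, up to factors not depending on the tree,

$$L(\eta^b_{jk} \mid \sigma^2, R_j) \propto \left(\frac{\sigma^2}{\sigma^2 + n_k\sigma_\mu^2}\right)^{1/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i: r_j(x_i)\in\eta^b_k} r_j(x_i)^2\right)\exp\left(\frac{\sigma_\mu^2\, n_k^2\, \bar r_{jk}^2}{2\sigma^2\,(\sigma^2 + n_k\sigma_\mu^2)}\right),$$

which is the quantity the birth/death Metropolis-Hastings ratio below compares across trees.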
26 Birth Proposal

1. Randomly select a terminal node $k \in \{1, \ldots, B_j\}$ with probability $\frac{1}{B_j}$, where $B_j = |M_j|$.
2. Introduce a new rule $v_{jk} \sim \pi_v(v_{jk})$ and cutpoint $c_{jk} \sim \pi_c(c_{jk})$, where $\pi_v, \pi_c$ are typically discrete Uniform on the available variables and cutpoints.
3. Calculate
$$\alpha = \min\left\{1,\; \frac{\pi(T_j^* \mid \sigma^2, R_j)\,q(T_j \mid T_j^*)}{\pi(T_j \mid \sigma^2, R_j)\,q(T_j^* \mid T_j)}\right\}$$
4. Generate $u \sim \text{Uniform}(0, 1)$. If $u < \alpha$ then accept $T_j^*$, otherwise reject.

As mentioned for the single-tree model, death proposals work similarly. A generic accept step is sketched below.
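A minimal sketch of the accept/reject step on the log scale (ours, not from the slides; the four arguments are hypothetical quantities the caller computes from the marginal likelihood, tree prior, and proposal densities):

# Sketch: generic Metropolis-Hastings accept step for a tree proposal T*.
# logpost.new/logpost.old: log pi(T* | sigma^2, R_j), log pi(T | sigma^2, R_j);
# logq.fwd = log q(T* | T); logq.rev = log q(T | T*).
mh.accept <- function(logpost.new, logpost.old, logq.fwd, logq.rev) {
  log.alpha <- min(0, (logpost.new + logq.rev) - (logpost.old + logq.fwd))
  log(runif(1)) < log.alpha
}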
27 MCMC Algorithm

Let's recap our sampling algorithm.

1. For $j = 1, \ldots, m$:
   1.1 Draw $T_j \mid \sigma^2, R_j$: with probability $\pi_b$ do a birth proposal, otherwise a death proposal. More complex moves are possible, such as changing the variables/cutpoints of an existing tree.
   1.2 Draw $M_j \mid T_j, \sigma^2, R_j$: for $k = 1, \ldots, B_j$, perform our Gibbs steps by drawing
$$\mu_{jk} \mid \sigma^2, T_j, R_j \sim N\left(\left(\frac{n_k}{\sigma^2} + \frac{1}{\sigma_\mu^2}\right)^{-1}\frac{n_k \bar r_{jk}}{\sigma^2},\; \left(\frac{n_k}{\sigma^2} + \frac{1}{\sigma_\mu^2}\right)^{-1}\right)$$
2. Draw $\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y$: perform our Gibbs step by drawing
$$\sigma^2 \mid \{(T_j, M_j)\}_{j=1}^{m}, Y \sim \chi^{-2}\left(\nu + n,\; \frac{\nu\tau^2 + ns^2}{\nu + n}\right)$$

A toy end-to-end sketch of this backfitting structure follows.
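To make the backfitting bookkeeping concrete, here is a deliberately degenerate but runnable sketch (ours, not the BART implementation): every "tree" is held fixed at a root-only stump, so step 1.1 disappears and each sweep reduces to the two Gibbs draws above. It reuses the calibrate.tau2, draw.mu, and draw.sigma2 sketches from the earlier slides.

# Degenerate backfitting sketch: m root-only stumps, so g(x; T_j, M_j) = mu_j
set.seed(1)
n <- 50; m <- 5
y <- rnorm(n, mean = 0, sd = 0.5)        # toy mean-centered data
sigma.mu2 <- (0.5 / (2 * sqrt(m)))^2     # k = 2 calibration
nu <- 3; tau2 <- calibrate.tau2(sd(y), nu, 0.90)
mu <- rep(0, m); sigma2 <- var(y)
for (iter in 1:2000) {
  for (j in 1:m) {
    rj <- y - (sum(mu) - mu[j])          # partial residuals r_j(x_i)
    mu[j] <- draw.mu(rj, sigma2, sigma.mu2)  # step 1.2 (one node per stump)
  }
  sigma2 <- draw.sigma2(y, rep(sum(mu), n), nu, tau2)  # step 2
}
c(fit = sum(mu), sigma = sqrt(sigma2))   # final posterior draw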
28 Example

source("dace.sim.r")
# Generate response:
set.seed(88)
n=5; k=1; rhotrue=0.2; lambdatrue=1
design=as.matrix(runif(n))
l1=list(m1=outer(design[,1],design[,1],"-"))
l.dez=list(l1=l1)
R=rhogeodacecormat(l.dez,c(rhotrue))$R
L=t(chol(R))
u=rnorm(nrow(R))
z=L%*%u
# Our observed data:
y=as.vector(z)
29 Example

library(BayesTree)
preds=matrix(seq(0,1,length=100),ncol=1)
# Variance prior
shat=sd(y)
nu=3
q=0.90
# Mean prior
k=2
# Tree prior
m=1
alpha=0.95
beta=2
nc=100
# MCMC settings
N=1000
burn=1000
30 Example

fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 1
## number of training observations: 5
## (remaining console output, echoing the prior settings, elided)
31 Example

plot(design,y,pch=20,col="red",cex=2,xlim=c(0,1),
  ylim=c(-2.3,3.7),xlab="x",
  main="Predicted mean response +/- 2s.d.")
for(i in 1:nrow(fit$yhat.test))
  lines(preds,fit$yhat.test[i,],col="grey",lwd=0.25)
mean=apply(fit$yhat.test,2,mean)
sd=apply(fit$yhat.test,2,sd)
lines(preds,mean-1.96*sd,lwd=0.75,col="black")
lines(preds,mean+1.96*sd,lwd=0.75,col="black")
lines(preds,mean,lwd=2,col="blue")
points(design,y,pch=20,col="red")
32 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for the m = 1 fit.)
33 Example

# Try m=10 trees
m=10
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 10
## (remaining console output elided)
34 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for the m = 10 fit.)
35 Example

# Try m=20 trees
m=20
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 20
## (remaining console output elided)
36 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for the m = 20 fit.)
37 Example

# Try m=100 trees
m=100
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 100
## (remaining console output elided)
38 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for the m = 100 fit.)
39 Example

# Try m=200 trees, the recommended default
m=200
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 200
## (remaining console output elided)
40 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for the m = 200 fit.)
41 Example

# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 200
## (remaining console output elided)
42 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for m = 200, k = 1.)
43 Example

# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
# And nu=3, q=.99
nu=3
q=0.99
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 200
## (remaining console output elided)
44 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for m = 200, k = 1, nu = 3, q = 0.99.)
45 Example

# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
# And nu=2, q=.99
nu=2
q=0.99
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 200
## (remaining console output elided)
46 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for m = 200, k = 1, nu = 2, q = 0.99.)
47 Example

# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
# And nu=1, q=.99
nu=1
q=0.99
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## number of trees: 200
## (remaining console output elided)
48 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for m = 200, k = 1, nu = 1, q = 0.99.)
49 Example

# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
# And nu=1, q=.99
nu=1
q=0.99
# And numcuts=1000
nc=1000
fit=bart(design,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)

## Running BART with numeric y
## (remaining console output elided)
50 Example

(Figure: predicted mean response +/- 2 s.d. versus x, for m = 200, k = 1, nu = 1, q = 0.99, numcut = 1000.)
51 Example

library(rgl)
load("co2plume.dat")
plot3d(co2plume)
rgl.snapshot("co2a.png")
52 Example

(Figure: 3D scatterplot of the co2plume data, saved as co2a.png.)
53 Example

y=co2plume$co2
x=co2plume[,1:2]
preds=as.data.frame(expand.grid(seq(0,1,length=20),
  seq(0,1,length=20)))
colnames(preds)=colnames(x)
shat=sd(y)
# Try m=200 trees, the recommended default
m=200
# And k=1
k=1
# And nu=1, q=.99
nu=1
q=0.99
# And numcuts=1000
nc=1000
fit=bart(x,y,preds,sigest=shat,sigdf=nu,sigquant=q,
  k=k,power=beta,base=alpha,ntree=m,numcut=nc,
  ndpost=N,nskip=burn)
54 Example

plot(fit$sigma,type='l',xlab="iteration",
  ylab=expression(sigma))
55 Example

(Figure: trace plot of σ versus iteration.)
56 Example

ym=fit$yhat.test.mean
ysd=apply(fit$yhat.test,2,sd)
persp3d(x=seq(0,1,length=20),y=seq(0,1,length=20),z=matrix(ym,20,20),
  col="grey",xlab="stack_inerts",ylab="time",zlab="co2")
plot3d(co2plume,add=TRUE)
rgl.snapshot("co2b.png")
57 Example

(Figure: posterior mean surface with the co2plume data overlaid, saved as co2b.png.)
58 Example

persp3d(x=seq(0,1,length=20),y=seq(0,1,length=20),z=matrix(ym,20,20),
  col="grey",xlab="stack_inerts",ylab="time",zlab="co2")
persp3d(x=seq(0,1,length=20),y=seq(0,1,length=20),
  z=matrix(ym+2*ysd,20,20),col="green",add=TRUE)
persp3d(x=seq(0,1,length=20),y=seq(0,1,length=20),
  z=matrix(ym-2*ysd,20,20),col="green",add=TRUE)
plot3d(co2plume,add=TRUE)
rgl.snapshot("co2c.png")
59 Example

(Figure: posterior mean surface with +/- 2 s.d. surfaces and the co2plume data overlaid, saved as co2c.png.)
More information