
Bayesian Data Analysis: Hierarchical models
Helle Sørensen, May 15, 2009

Outline

Today: More about the rat tumor data: model, derivation of posteriors, the actual computations in R. The eight schools example: a hierarchical normal model, derivation of posteriors. The actual computations will be an exercise next week.

Future: Chapters 10-11: MCMC simulation. Tuesday May 26: Irene Mantzouni. Be there!

Erratum: exercise GCSR 3.11

Remember the problems in exercise 3.11 (bioassay data)? Posteriors for α:
Normal prior: 0.96 (-0.71, 2.82)
Uniform prior: 1.32 (-0.57, 3.80)
As we would expect from the figure: there is more information in the normal prior. The results are not very robust (small dataset)!

Rat tumor data (GCSR, Sections 5.1 and 5.3)

We are interested in θ, the probability of tumor in a population of female lab rats of type F344 that receive a zero dose of a drug.

Data: 71 experiments with female type F344 rats; in the j-th study, n_j rats of which y_j developed a tumor.

NB: In Section 5.1 the data consist of experiment 71 only; the rest are used as prior information. I find that section confusing and non-illuminating.

Rat tumor data: the hierarchical model

Three levels:
The data: conditionally on θ_1, ..., θ_J the data y_1, ..., y_J are independent with y_j ∼ p(y_j | θ_j).
The random effects: conditionally on φ the θ_j's are iid with θ_j ∼ p(θ_j | φ). Recall: the θ_j's are not observed.
The (hyper)prior distribution p(φ).

Here: φ = (α, β) and

y_j | θ_j ∼ Bin(n_j, θ_j);  θ_j ∼ Beta(α, β);  p(α, β) ∝ (α + β)^{-5/2}.

Note: for now, the conditional distribution of y_j given θ_j depends on θ_j only (no covariates).

Rat tumor data: strategy

The strategy will be the following.
0. Set up the model: p(y | θ) and p(θ | α, β). Note: the hyperprior p(α, β) is not yet specified; it enters simply as p(α, β).
1. Joint posterior of (α, β, θ)
2. Posterior of θ for fixed α and β
3. Posterior of (α, β)
Finally comes the actual prior and the actual computations:
4. Specify a hyperprior p(α, β)
5. Contours of the posterior p(α, β | y)
6. Posterior simulation of (α, β)
7. Posterior simulation of θ

1.-3. Model

Recall the three levels:
Given θ the y_j's are independent and y_j | θ_j ∼ Bin(n_j, θ_j).
Given (α, β) the θ_j's are independent and θ_j | α, β ∼ Beta(α, β).
Hyperprior: (α, β) ∼ p(α, β).

1.-3. Posteriors

1. Joint posterior of (α, β, θ):

p(α, β, θ | y) ∝ p(α, β) B(α, β)^{-n} ∏_{j=1}^{n} θ_j^{α + y_j - 1} (1 - θ_j)^{β + n_j - y_j - 1}

2. Posterior distribution of θ for fixed values of (α, β): the θ_j's are independent and

θ_j | α, β, y ∼ Beta(α + y_j, β + n_j - y_j)

3. Posterior of (α, β):

p(α, β | y) = ∫ p(α, β, θ | y) dθ ∝ p(α, β) B(α, β)^{-n} ∏_{j=1}^{n} B(α + y_j, β + n_j - y_j)

where B(·, ·) denotes the beta function and n = 71.
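As a concrete illustration, the unnormalized log posterior from step 3 translates almost line by line into R. A minimal sketch, assuming y and n are vectors of length 71 holding the tumor counts and group sizes from GCSR Table 5.1, and plugging in the hyperprior p(α, β) ∝ (α + β)^{-5/2} that will be chosen in step 4:

log.post.ab <- function(alpha, beta, y, n) {
  ## log p(alpha, beta | y) up to an additive constant:
  ## log p(alpha, beta) + sum_j [ lbeta(alpha + y_j, beta + n_j - y_j) - lbeta(alpha, beta) ]
  -2.5 * log(alpha + beta) +
    sum(lbeta(alpha + y, beta + n - y) - lbeta(alpha, beta))
}

Working on the log scale and with lbeta (the log beta function) avoids numerical overflow in the products of beta functions.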

4. Hyperprior distribution

Recall: θ_j | α, β ∼ Beta(α, β), so E(θ_j | α, β) = α/(α + β). What is a reasonable prior distribution of (α, β)?

Three different useful parameterizations:
(α, β)
(ξ_1, ξ_2) = (α/(α + β), (α + β)^{-1/2})
(η_1, η_2) = (log(α/β), log(α + β))

First try: flat in (η_1, η_2), i.e. p(η_1, η_2) ∝ 1. Problem: this yields an improper posterior.

Second try: flat in (ξ_1, ξ_2), i.e. p(ξ_1, ξ_2) ∝ 1. What does this correspond to in the other parameterizations? By the transformation theorem (multiply with the correct Jacobian) we find that p(ξ_1, ξ_2) ∝ 1 corresponds to

p(α, β) ∝ (α + β)^{-5/2};  p(η_1, η_2) ∝ αβ (α + β)^{-5/2}

5. Contours of the posterior for (α, β)

We want a contour plot of the posterior p(η_1, η_2 | y). Strategy:
Recall what p(α, β | y) is.
Multiply with αβ due to the η-parameterization: we need the density as a function of η rather than of (α, β).
Compute log p(η_1, η_2 | y) and then p(η_1, η_2 | y) on an η-grid.

6. Posterior simulation of (α, β)

Since we now have the (unnormalized) posterior density p(η_1, η_2 | y) we can apply the grid simulation strategy:
Simulate η_1 from its marginal posterior p(η_1 | y) on grid values.
Simulate η_2 from its conditional posterior p(η_2 | y, η_1) given η_1, on grid values.
Add random jitter to get a continuous distribution.
Transform the simulated values of (η_1, η_2) to (α, β).
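A sketch of steps 5-7 in R, reusing log.post.ab from the sketch above. The grid limits (cf. GCSR Figure 5.3) and the number of draws S are illustrative choices, and step 7, spelled out on the next slide, uses the Beta posteriors from step 2:

eta1 <- seq(-2.3, -1.3, length = 100)     # grid for log(alpha/beta)
eta2 <- seq(1, 5, length = 100)           # grid for log(alpha + beta)

## 5. Log posterior on the eta-grid: add log(alpha*beta) for the Jacobian
lp <- matrix(NA, 100, 100)
for (i in 1:100) for (k in 1:100) {
  a <- exp(eta1[i] + eta2[k]) / (1 + exp(eta1[i]))
  b <- exp(eta2[k]) / (1 + exp(eta1[i]))
  lp[i, k] <- log(a) + log(b) + log.post.ab(a, b, y, n)
}
post <- exp(lp - max(lp))                 # unnormalized posterior density
contour(eta1, eta2, post)

## 6. Grid simulation: eta1 from its marginal, eta2 given eta1, plus jitter
S <- 1000
i1 <- sample(1:100, S, replace = TRUE, prob = rowSums(post))
i2 <- sapply(i1, function(i) sample(1:100, 1, prob = post[i, ]))
e1 <- eta1[i1] + runif(S, -0.005, 0.005)  # jitter: about half a grid step
e2 <- eta2[i2] + runif(S, -0.02, 0.02)
alpha <- exp(e1 + e2) / (1 + exp(e1))     # transform back to (alpha, beta)
beta  <- exp(e2) / (1 + exp(e1))

## 7. For each (alpha, beta)-draw, draw all 71 theta's from their Betas
theta <- sapply(1:71, function(j) rbeta(S, alpha + y[j], beta + n[j] - y[j]))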

7. Posterior simulation of θ

Recall: conditional on (α, β, y) the θ_j's are independent with θ_j | α, β, y ∼ Beta(α + y_j, β + n_j - y_j). Hence, for each (α, β)-draw, draw θ_1, ..., θ_71 according to the relevant Beta distribution.

A plot of the posterior means E(θ_j | y) or medians against y_j/n_j (page 131) shows that extreme values of y_j/n_j are shrunk towards the pooled mean: a compromise between the complete pooling and the separate analyses.

The eight schools (GCSR Section 5.5)

Model and analysis method described in Section 5.4.

8 schools. Treatment is preparation for the SAT-V test. Two groups of students at each school: treated and controls. For each student: the result of the test.

Data: estimated treatment effect from each school (y_1, ..., y_8) and corresponding SEs (σ_1, ..., σ_8). Think of y_j as the average over treated students minus the average over control students (although it is estimated from a regression model). The σ_j are considered known (although estimated from student data).

We are interested in the treatment effects (θ).

The eight schools: complete pooling and separate analyses

Complete pooling: the same θ for all schools, y_j | θ ∼ N(θ, σ_j^2).
The MLE would be a weighted average of y_1, ..., y_8.
A Bayesian analysis would combine this average and a prior for θ.

Completely separate analyses: different θ's in all schools, y_j | θ_j ∼ N(θ_j, σ_j^2).
Data from each school are analysed separately.
The MLEs would be the observations themselves, θ̂_j = y_j.
A Bayesian analysis would combine the data y_j with a prior for θ_j.

The eight schools: hierarchical model

In the hierarchical model the schools have different treatment effects, but they are drawn from the same distribution. Three levels:
Conditional on θ: y_1, ..., y_8 are independent and y_j | θ ∼ N(θ_j, σ_j^2). In particular, the distribution of y_j depends on θ only via θ_j.
Conditional on the hyperparameters (µ, τ): θ_1, ..., θ_8 are independent and θ_j ∼ N(µ, τ^2).
Hyperprior: (µ, τ) ∼ p(µ, τ) = p(τ) p(µ | τ). We will assume that p(µ | τ) ∝ 1 and later also that p(τ) ∝ 1.

A drawing of the model structure would be similar to that on page 119.

NB: A likelihood analysis of the mixed model without the hyperprior level would yield a negative estimate of τ^2. Unpleasant...
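To make the three-level structure concrete, here is a purely illustrative simulation of fake data from the model; the hyperparameter values µ = 8 and τ = 5 are made up, while the SEs are the rounded values from GCSR Table 5.2:

sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)  # known standard errors
mu <- 8; tau <- 5                          # arbitrary hyperparameter values
theta <- rnorm(8, mu, tau)                 # level 2: theta_j ~ N(mu, tau^2)
y.sim <- rnorm(8, theta, sigma)            # level 1: y_j ~ N(theta_j, sigma_j^2)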

The eight schools: analysis

0. Model specification: that's what we just did.

1. Posterior of random effects and hyperparameters:

p(θ, µ, τ | y) ∝ p(µ, τ) ∏_{j=1}^{8} N(θ_j | µ, τ^2) ∏_{j=1}^{8} N(y_j | θ_j, σ_j^2)

2. Posterior of random effects given hyperparameters. As a function of θ for fixed (µ, τ):

p(θ | µ, τ, y) ∝ ∏_{j=1}^{8} exp(-(θ_j - µ)^2 / (2τ^2)) exp(-(y_j - θ_j)^2 / (2σ_j^2))

Look at each term separately and recognize the simple model with normal data and normal prior. Hence p(θ_j | µ, τ, y) = N(θ̂_j, V_j) where

θ̂_j = (y_j/σ_j^2 + µ/τ^2) / (1/σ_j^2 + 1/τ^2);  1/V_j = 1/σ_j^2 + 1/τ^2

3. Posterior of hyperparameters: p(µ, τ | y) ∝ p(y | µ, τ) p(µ, τ). In this particular case we can find p(y | µ, τ) explicitly: conditional on (µ, τ) the y_j's are independent and

y_j | µ, τ ∼ N(µ, σ_j^2 + τ^2)

(Why is that so?) Hence

p(µ, τ | y) ∝ p(µ, τ) ∏_{j=1}^{8} N(y_j | µ, σ_j^2 + τ^2)

3.a Posterior of µ given τ, p(µ | τ, y): Assume that p(µ | τ) ∝ 1, so that p(µ, τ) = p(τ). As a function of µ for fixed τ:

p(µ | τ, y) ∝ ∏_{j=1}^{8} exp(-(y_j - µ)^2 / (2(σ_j^2 + τ^2)))

Recognize for each term the posterior for the normal model with known variance σ_j^2 + τ^2 and uniform prior. Hence p(µ | τ, y) = N(µ̂, V_µ) where

µ̂ = ∑_j y_j/(σ_j^2 + τ^2) / ∑_j 1/(σ_j^2 + τ^2);  1/V_µ = ∑_j 1/(σ_j^2 + τ^2)

3.b Posterior of τ: p(µ, τ | y) = p(µ | τ, y) p(τ | y), so

p(τ | y) = p(µ, τ | y) / p(µ | τ, y) = p(µ̂, τ | y) / p(µ̂ | τ, y) ∝ p(τ) V_µ^{1/2} ∏_{j=1}^{8} N(y_j | µ̂, σ_j^2 + τ^2)

Note that µ̂ and V_µ are functions of τ.

4. Hyperprior distribution. We have already specified p(µ | τ) ∝ 1. Moreover, let p(τ) ∝ 1. NB: p(log τ) ∝ 1 leads to an improper posterior; try it out yourself! There is a bit of confusion in the book regarding the standard deviation (τ) and the variance (τ^2).
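The formulas in 3.a-3.b translate directly into a function for the (unnormalized) log posterior of τ. A minimal sketch, assuming the rounded data from GCSR Table 5.2 and the flat hyperpriors from step 4:

y <- c(28, 8, -3, 7, -1, 1, 18, 12)        # estimated treatment effects
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)  # known standard errors

log.ptau <- function(tau) {
  w <- 1 / (sigma^2 + tau^2)
  mu.hat <- sum(w * y) / sum(w)            # mu-hat(tau)
  V.mu <- 1 / sum(w)                       # V_mu(tau)
  ## log p(tau | y) with p(tau) propto 1, up to an additive constant
  0.5 * log(V.mu) + sum(dnorm(y, mu.hat, sqrt(sigma^2 + tau^2), log = TRUE))
}

Note how µ̂ and V_µ are recomputed inside the function: they are functions of τ, exactly as emphasized above.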

Computations and conclusions

5. Contours of p(µ, τ | y).

6. Posterior simulation of (µ, τ). Two steps:
Simulate τ from p(τ | y) by a grid approximation.
Simulate µ from p(µ | τ, y) for the simulated τ-values.

7. Posterior simulation of θ: simulate each θ_j separately from p(θ_j | µ, τ, y) for the simulated values of (µ, τ). See the R sketch below.

You will do the actual computations in an exercise next week, so just a few comments at this point.

Figure 5.6 in the book: which analysis does τ = 0 correspond to? Which analysis does a large τ correspond to?

Tables 5.2 and 5.3: compare y_j to the posterior means of θ_j. What has happened?

What would you do to see if there is a treatment effect?
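Since the computations are next week's exercise, only a rough sketch of steps 6-7, reusing log.ptau, y and sigma from above; the grid upper limit of 30 and the 1000 draws are illustrative choices:

## 6. Simulate tau on a grid, then mu given tau
tau.grid <- seq(0.01, 30, length = 500)
lp <- sapply(tau.grid, log.ptau)
tau <- sample(tau.grid, 1000, replace = TRUE, prob = exp(lp - max(lp)))
w <- 1 / outer(tau^2, sigma^2, "+")           # 1000 x 8 weights 1/(sigma_j^2 + tau^2)
mu.hat <- rowSums(sweep(w, 2, y, "*")) / rowSums(w)
mu <- rnorm(1000, mu.hat, sqrt(1 / rowSums(w)))

## 7. Simulate each theta_j from N(theta.hat_j, V_j) per (mu, tau)-draw
V <- 1 / outer(1 / tau^2, 1 / sigma^2, "+")   # 1000 x 8 matrix of V_j
theta.hat <- V * outer(mu / tau^2, y / sigma^2, "+")
theta <- theta.hat + sqrt(V) * matrix(rnorm(8000), 1000, 8)
colMeans(theta)                               # posterior means; compare with y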
