8.3 simulating from the fitted model Chris Parrish July 3, 2016

Contents
  speed of light (Simon Newcomb, 1882): simulate data, fit the model, and check the coverage of the conf intervals; model; fit; post; create the replicated data; create fake data; figure
  roaches: data; model; fit; post; figure; model; fit; post; switch to glm; figure

simulating from the fitted model

reference:
- ARM chapter 8, github

    library(arm)       # for sim
    library(rstan)
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())
    library(ggplot2)
    library(reshape2)  # for melt

speed of light (Simon Newcomb, 1882)

algorithm for model checking: compare the distribution of the data y to the distributions of simulated ỹ.

1. data: y
2. use the data to fit the model: y with parameters β and σ; find β and σ such that y = Xβ + ε
3. use the model to generate n hypothetical values of ỹ: ỹ = Xβ + ε

4. replicate step 3 n.sims times; the result is a matrix with dims n.sims, n
5. compare the distribution of the real data y with the distributions of the simulated ỹ

check the model: if the distributions of the simulated ỹ do not correspond to the distribution of the original data y, then the model is suspect.

simulate data, fit the model, and check the coverage of the conf intervals

    source("lightspeed.data.r", echo = TRUE)
    ##
    ## > "y" <- c(... [TRUNCATED]
    ##
    ## > "N" <- 66
    str(y)
    ##  num [1:66] ...

model

lightspeed.stan

    data {
      int<lower=0> N;
      vector[N] y;
    }
    parameters {
      vector[1] beta;
      real<lower=0> sigma;
    }
    model {
      y ~ normal(beta[1], sigma);
    }

fit

    ## Model fit (lightspeed.stan)
    ## lm (y ~ 1)
    datalist. <- c("N", "y")
    lightspeed.sf <- stan(file='lightspeed.stan', data=datalist., iter=, chains=)
    plot(lightspeed.sf)
    ## ci_level: 0.8 (80% intervals)
    ## outer_level: 0.95 (95% intervals)

[Figure: plot(lightspeed.sf), posterior intervals for beta[1] and sigma]

    pairs(lightspeed.sf)

[Figure: pairs plot of beta[1], sigma, and lp__]

    print(lightspeed.sf)
    ## Inference for Stan model: lightspeed.
    ## ... chains, each with iter=...; warmup=...; thin=...;
    ## post-warmup draws per chain=..., total post-warmup draws=....
    ##
    ##          mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
    ## beta[1]  ...
    ## sigma    ...
    ## lp__     ...
    ##          Rhat
    ## beta[1]  ...
    ## sigma    ...
    ## lp__     ...
    ##
    ## Samples were drawn using NUTS(diag_e) at Fri Jul 8 ... 2016.
    ## For each parameter, n_eff is a crude measure of effective sample size,
    ## and Rhat is the potential scale reduction factor on split chains (at
    ## convergence, Rhat=1).
    ## The estimated Bayesian Fraction of Missing Information is a measure of
    ## the efficiency of the sampler with values close to 1 being ideal.
    ## For each chain, these estimates are
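Steps 1 to 4 of the model-checking algorithm described earlier can be sketched in base R, so that the idea can be tried without Stan. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not Newcomb's measurements), β and σ are drawn from the standard noninformative-prior posterior of a normal model rather than extracted from `lightspeed.sf`, and the seed, the synthetic mean and standard deviation, and `n.sims <- 1000` are arbitrary choices.

```r
set.seed(42)

# Step 1: data. Synthetic stand-in values, not the speed-of-light data.
n <- 66
y <- rnorm(n, mean = 25, sd = 10)

# Step 2: fit y ~ normal(beta, sigma). Under the noninformative prior
# p(beta, sigma^2) proportional to 1/sigma^2 the posterior can be sampled directly:
#   sigma^2 | y      ~ (n - 1) * s^2 / chisq(n - 1)
#   beta | sigma, y  ~ normal(mean(y), sigma / sqrt(n))
n.sims <- 1000  # illustrative number of posterior draws
sigma <- sd(y) * sqrt((n - 1) / rchisq(n.sims, df = n - 1))
beta <- rnorm(n.sims, mean = mean(y), sd = sigma / sqrt(n))

# Steps 3 and 4: one replicated data set of size n per posterior draw,
# stored as a matrix with dims n.sims, n.
y.rep <- array(NA, c(n.sims, n))
for (s in 1:n.sims) {
  y.rep[s, ] <- rnorm(n, beta[s], sigma[s])
}

dim(y.rep)
```

Each row of `y.rep` plays the role of one simulated ỹ; step 5 then compares these rows with `y`, either graphically or through a test statistic.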

    ## ...

post

    post <- extract(lightspeed.sf)
    str(post)
    ## List of 3
    ##  $ beta : num [1:..., 1] ...
    ##   ..- attr(*, "dimnames")=List of 2
    ##   .. ..$ iterations: NULL
    ##   .. ..$           : NULL
    ##  $ sigma: num [1:...(1d)] ...
    ##   ..- attr(*, "dimnames")=List of 1
    ##   .. ..$ iterations: NULL
    ##  $ lp__ : num [1:...(1d)] ...
    ##   ..- attr(*, "dimnames")=List of 1
    ##   .. ..$ iterations: NULL

create the replicated data

    ## Create the replicated data
    n.sims <-

create fake data

    ## Create fake data
    n <- 15
    y.rep <- array(NA, c(n.sims, n))
    for (s in 1:n.sims) {
      y.rep[s, ] <- rnorm(n, post$beta[s], post$sigma[s])
    }
    str(y.rep)
    ##  num [1:..., 1:15] ...

    ## Histogram of replicated data (Figure 8.)
    y.new <- melt(y.rep)
    y.new$Var2 <- factor(y.new$Var2,
                         levels = as.character(1:15),
                         labels = paste("Replication #", 1:15, sep = ""))
    str(y.new)
    ## 'data.frame': ... obs. of 3 variables:
    ##  $ Var1 : int ...
    ##  $ Var2 : Factor w/ 15 levels "Replication #1",..: ...
    ##  $ value: num ...
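For readers without reshape2, the long format that `melt` produces from a matrix can be approximated in base R. The sketch below uses a small made-up matrix; the column names `Var1`, `Var2`, and `value` mirror the `str(y.new)` output above, and the 4 by 3 size is purely illustrative. The row ordering (first index varying fastest) is what `melt` is expected to give, but that is an assumption rather than something shown in the transcript.

```r
set.seed(1)

# Small illustrative matrix: 4 simulations (rows) by 3 observations (columns)
y.rep <- matrix(rnorm(12), nrow = 4, ncol = 3)

# Long format: one row per matrix cell, in column-major order
y.long <- data.frame(
  Var1 = as.vector(row(y.rep)),   # row index (simulation number)
  Var2 = as.vector(col(y.rep)),   # column index
  value = as.vector(y.rep)
)

# Turn the column index into a labelled factor, as the text does for facetting
y.long$Var2 <- factor(y.long$Var2,
                      levels = 1:ncol(y.rep),
                      labels = paste("Replication #", 1:ncol(y.rep), sep = ""))

str(y.long)
```

With the data in this shape, a single `facet_wrap(~ Var2)` call draws one panel per level of `Var2`.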

    p <- ggplot(y.new, aes(value)) +
      geom_histogram(colour = "seashell", fill = "wheat", binwidth = 5) +
      theme_gray() +
      facet_wrap(~ Var2, ncol = 5) +
      theme(axis.title.y = element_blank(), axis.title.x = element_blank())
    print(p)

[Figure: histograms of the replicated data in a 3 x 5 grid, panels Replication #1 to Replication #15]

    ## Write a function to make histograms with specified bin widths and ranges
    Hist.preset <- function (a, width, xtitle, ytitle, maintitle) {
      # dev.new()
      a.hi <- max(a, na.rm = TRUE)
      a.lo <- min(a, na.rm = TRUE)
      if (is.null(width)) width <- min(sqrt(a.hi - a.lo), 1e-5)
      bin.hi <- width * ceiling(a.hi / width)
      bin.lo <- width * floor(a.lo / width)
      frame <- data.frame(x = a)
      p <- ggplot(frame, aes(x = x)) +
        geom_histogram(colour = "seashell", fill = "wheat", binwidth = width) +
        theme_gray() +
        scale_x_continuous(xtitle) +
        scale_y_continuous(ytitle) +
        labs(title = maintitle)
      print(p)
    }

    ## Run the function
    for (s in 1:20) {
      Hist.preset(y.rep[s, ], width = 5, "", "", paste("Replication #", s, sep = ""))
    }

[Figures: one histogram per replicated data set, Replication #1 to Replication #20]
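`Hist.preset` computes `bin.lo` and `bin.hi`, but the `ggplot` call shown passes only `binwidth`. The following base R sketch, on made-up values, shows one way those two numbers could define breaks that fall on multiples of the width. It is an illustration of the bin-edge arithmetic, not the author's plotting code; the sample `a` and the seed are assumptions.

```r
set.seed(7)
a <- rnorm(200, mean = 26, sd = 11)  # illustrative values
width <- 5

# Same bin-edge arithmetic as in Hist.preset
a.hi <- max(a, na.rm = TRUE)
a.lo <- min(a, na.rm = TRUE)
bin.hi <- width * ceiling(a.hi / width)
bin.lo <- width * floor(a.lo / width)

# Breaks on multiples of width that cover the whole range of a
breaks <- seq(bin.lo, bin.hi, by = width)

# Count observations per bin without drawing anything
h <- hist(a, breaks = breaks, plot = FALSE)
data.frame(left = head(h$breaks, -1), count = h$counts)
```

Using the same breaks for every replication keeps the histograms directly comparable from panel to panel.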

    ## Numerical test
    Test <- function (y) {
      min(y)
    }
    test.rep <- rep(NA, n.sims)
    for (s in 1:n.sims) {
      test.rep[s] <- Test(y.rep[s, ])
    }
    str(test.rep)
    ##  num [1:...] ...

figure 8.5

    ## Histogram Figure 8.5
    # dev.new()
    frame1 <- data.frame(x = test.rep)   # T(y.rep) for every replication
    frame2 <- data.frame(x = Test(y))    # observed T(y)
    p <- ggplot(frame1, aes(x = x)) +
      geom_histogram(colour = "seashell", fill = "wheat") +
      geom_segment(aes(x = x, y =, xend = x, yend =, color = "saddlebrown"), data = frame2) +
      theme_gray() +
      theme(legend.position = "none") +
      labs(title = "Observed T(y) and distribution of T(y.rep)")
    print(p)
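The numerical test above can be summarised as a single tail proportion: the share of replications whose minimum is at least as extreme as the observed one. A minimal base R sketch follows, assuming synthetic stand-in data with two low outliers added for illustration and direct posterior draws under a noninformative prior instead of the Stan fit; `p.value`, the seed, and `n.sims <- 1000` are hypothetical choices, not values from the text.

```r
set.seed(123)

# Synthetic stand-in data: mostly normal values plus two low outliers
y <- c(rnorm(64, mean = 27, sd = 5), -44, -2)
n <- length(y)

# Posterior draws for the normal model (noninformative prior)
n.sims <- 1000
sigma <- sd(y) * sqrt((n - 1) / rchisq(n.sims, df = n - 1))
beta <- rnorm(n.sims, mean = mean(y), sd = sigma / sqrt(n))

# Test statistic: smallest observation
Test <- function(y) min(y)

# T(y.rep) for each replicated data set
test.rep <- rep(NA, n.sims)
for (s in 1:n.sims) {
  test.rep[s] <- Test(rnorm(n, beta[s], sigma[s]))
}

# Share of replications whose minimum is at or below the observed minimum
p.value <- mean(test.rep <= Test(y))
p.value
```

A proportion near 0 (or near 1) says the normal model rarely reproduces a minimum like the observed one, which is the numerical counterpart of the histogram comparison.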

[Figure: "Observed T(y) and distribution of T(y.rep)" for the speed-of-light data; axes x and count]

roaches

data

    ##############################################################################
    ## Read the cleaned data
    # All data are at
    # if bad initial values, this model fails
    # NOTE: can't find same exact data set as ARM book uses..
    roachdata <- read.csv("roachdata.csv")
    str(roachdata)
    ## 'data.frame': ... obs. of 6 variables:
    ##  $ X        : int ...
    ##  $ y        : int ...
    ##  $ roach    : num ...
    ##  $ treatment: int ...
    ##  $ senior   : int ...
    ##  $ exposure : num ...
    attach(roachdata)
    ## The following object is masked _by_ .GlobalEnv:
    ##
    ##     y

model

roaches.stan

    data {
      int<lower=0> N;
      vector[N] exposure;
      vector[N] roach;
      vector[N] senior;
      vector[N] treatment;
      int y[N];
    }
    transformed data {
      vector[N] log_expo;
      log_expo = log(exposure);
    }
    parameters {
      vector[4] beta;
    }
    model {
      y ~ poisson_log(log_expo + beta[1] + beta[2] * roach + beta[3] * treatment + beta[4] * senior);
    }

fit

    datalist. <- list(N=length(roachdata$y), y=roachdata$y, roach=roachdata$roach,
                      treatment=roachdata$treatment, exposure=roachdata$exposure,
                      senior=roachdata$senior)
    roaches.sf <- stan(file='roaches.stan', data=datalist., iter=5, chains=)
    print(roaches.sf)
    ## Inference for Stan model: roaches.
    ## ... chains, each with iter=...; warmup=...; thin=...;
    ## post-warmup draws per chain=..., total post-warmup draws=....
    ##
    ##          mean se_mean sd 2.5% 25% 50% 75%
    ## beta[1]  ...
    ## beta[2]  ...
    ## beta[3]  ...
    ## beta[4]  ...
    ## lp__     ...
    ##          97.5% n_eff Rhat
    ## beta[1]  ...
    ## beta[2]  ...
    ## beta[3]  ...
    ## beta[4]  ...
    ## lp__     ...
    ##
    ## Samples were drawn using NUTS(diag_e) at Fri Jul 8 ... 2016.
    ## For each parameter, n_eff is a crude measure of effective sample size,
    ## and Rhat is the potential scale reduction factor on split chains (at
    ## convergence, Rhat=1).
    ## The estimated Bayesian Fraction of Missing Information is a measure of
    ## the efficiency of the sampler with values close to 1 being ideal.
    ## For each chain, these estimates are
    ## ...

post

    post <- extract(roaches.sf)

    ## Comparing the data to a replicated dataset
    n <- length(roachdata$y)
    X <- cbind(rep(1, n), roach, treatment, senior)
    y.hat <- exposure * exp(X %*% colMeans(post$beta))
    y.rep <- rpois(n, y.hat)
    print(mean(roachdata$y == 0))
    ## [1] ...
    print(mean(y.rep == 0))
    ## [1] ...

    ## Comparing the data to replicated datasets
    n.sims <-
    y.rep <- array(NA, c(n.sims, n))
    for (s in 1:n.sims) {
      y.hat <- exposure * exp(X %*% post$beta[s, ])
      y.rep[s, ] <- rpois(n, y.hat)
    }

    # test statistic: proportion of zeros
    Test <- function (y) {
      mean(y == 0)
    }
    test.rep <- rep(NA, n.sims)
    for (s in 1:n.sims) {
      test.rep[s] <- Test(y.rep[s, ])
    }

    # p-value
    print(mean(test.rep > Test(roachdata$y)))
    ## [1] 0

figure

    ## Histogram Figure
    # dev.new()
    frame = data.frame(x = test.rep)
    frame5 = data.frame(x = Test(roachdata$y))

    p <- ggplot(frame, aes(x = x)) +
      geom_histogram(colour = "seashell", fill = "wheat") +
      geom_segment(aes(x = x, y =, xend = x, yend = 5, color = "saddlebrown"), data = frame5) +
      theme_gray() +
      theme(legend.position = "none") +
      labs(title = "Observed T(y) and distribution of T(y.rep)")
    print(p)
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: "Observed T(y) and distribution of T(y.rep)" for the Poisson model; axes x and count]

T(y) =.6, but all the values of test.rep are much smaller.

    summary(test.rep)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    ...

model

roaches_overdispersion.stan

    data {
      int<lower=0> N;
      vector[N] exposure;
      vector[N] roach;
      vector[N] senior;
      vector[N] treatment;
      int y[N];
    }
    transformed data {
      vector[N] log_expo;
      log_expo = log(exposure);
    }
    parameters {
      vector[4] beta;
      vector[N] lambda;
      real<lower=0> tau;
    }
    transformed parameters {
      real<lower=0> sigma;
      sigma = 1.0 / sqrt(tau);
    }
    model {
      tau ~ gamma(.,.);
      for (i in 1:N) {
        lambda[i] ~ normal(0, sigma);
        y[i] ~ poisson_log(lambda[i] + log_expo[i] + beta[1] + beta[2]*roach[i]
                           + beta[3]*senior[i] + beta[4]*treatment[i]);
      }
    }

fit

    ## Checking the overdispersed model
    # NOTE: can't find same exact data set as ARM book uses..
    roaches_overdispersion.sf <- stan(file='roaches_overdispersion.stan',
                                      data=datalist., iter=, chains=)
    # print(roaches_overdispersion.sf)

post

    post <- extract(roaches_overdispersion.sf)

switch to glm

    glm. <- glm(y ~ roach + treatment + senior, data = roachdata,
                family = quasipoisson, offset = log(exposure))
    sim. <- sim(glm., n.sims)

    # replicated datasets
    y.rep <- array(NA, c(n.sims, n))
    overdisp <- summary(glm.)$dispersion
    for (s in 1:n.sims) {
      y.hat <- exposure * exp(X %*% sim.@coef[s, ])
      a <- y.hat / (overdisp - 1)          # using R's parametrization for the
      y.rep[s, ] <- rnegbin(n, y.hat, a)   # negative binomial distribution
    }
    test.rep <- rep(NA, n.sims)
    for (s in 1:n.sims) {
      test.rep[s] <- Test(y.rep[s, ])
    }

compare each value of test.rep with the number Test(roachdata$y)

    # p-value
    summary(test.rep)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##    ...
    print(mean(test.rep > Test(roachdata$y)))
    ## [1] .68
    Test(roachdata$y)
    ## [1] ...

figure

    ## Histogram Figure
    # dev.new()
    frame = data.frame(x = test.rep)
    frame5 = data.frame(x = Test(roachdata$y))
    p5 <- ggplot(frame, aes(x = x)) +
      geom_histogram(colour = "seashell", fill = "wheat") +
      geom_segment(aes(x = x, y =, xend = x, yend =, color = "saddlebrown"), data = frame5) +
      theme_gray() +
      theme(legend.position = "none") +
      labs(title = "Observed T(y) and distribution of T(y.rep)")
    print(p5)
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: "Observed T(y) and distribution of T(y.rep)" for the overdispersed model; axes x and count]
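The contrast between the two roach checks can be reproduced on synthetic counts. The sketch below is a simplification under stated assumptions: the data are simulated (not the roach data), replications use the plug-in fitted means rather than coefficient draws from `sim`, and base R's `rnbinom` stands in for `rnegbin`. Since `rnbinom(mu, size)` has variance mu + mu^2/size, setting size = mu/(overdisp - 1) gives variance overdisp * mu, which is the same matching used for `a` in the text. The seed, sample size, and true coefficients are arbitrary.

```r
set.seed(2016)

# Synthetic overdispersed counts (illustrative, not the roach data)
n <- 300
x <- rnorm(n)
mu.true <- exp(1 + 0.5 * x)
y <- rnbinom(n, mu = mu.true, size = 0.5)

# Test statistic: proportion of zeros
Test <- function(y) mean(y == 0)

# Quasi-Poisson fit: Poisson point estimates plus an overdispersion estimate
fit <- glm(y ~ x, family = quasipoisson)
y.hat <- fitted(fit)
overdisp <- summary(fit)$dispersion

n.sims <- 500
test.pois <- rep(NA, n.sims)
test.nb <- rep(NA, n.sims)
for (s in 1:n.sims) {
  # Poisson replication: variance equals the mean
  test.pois[s] <- Test(rpois(n, y.hat))
  # Negative binomial replication: variance overdisp * mean
  test.nb[s] <- Test(rnbinom(n, mu = y.hat, size = y.hat / (overdisp - 1)))
}

c(observed = Test(y),
  poisson = mean(test.pois),
  negbin = mean(test.nb))
```

On data like these the Poisson replications contain far fewer zeros than the observed counts, while the negative binomial replications come much closer, which mirrors the conclusion drawn for the roach data.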

8.1 fake data simulation Chris Parrish July 3, 2016

8.1 fake data simulation Chris Parrish July 3, 2016 8.1 fake data simulation Chris Parrish July 3, 2016 Contents fake-data simulation 1 simulate data, fit the model, and check the coverage of the conf intervals............... 1 model....................................................

More information

GLM Poisson Chris Parrish August 18, 2016

GLM Poisson Chris Parrish August 18, 2016 GLM Poisson Chris Parrish August 18, 2016 Contents 3. Introduction to the generalized linear model (GLM) 1 3.3. Poisson GLM in R and WinBUGS for modeling time series of counts 1 3.3.1. Generation and analysis

More information

Bayesian Workflow. How to structure the process of your analysis to maximise [sic] the odds that you build useful models.

Bayesian Workflow. How to structure the process of your analysis to maximise [sic] the odds that you build useful models. Bayesian Workflow How to structure the process of your analysis to maximise [sic] the odds that you build useful models. -Jim Savage Sean Talts Core Stan Developer Bayesian Workflow Scope out your problem

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Exercises R For Simulations Columbia University EPIC 2015 (no answers)

Exercises R For Simulations Columbia University EPIC 2015 (no answers) Exercises R For Simulations Columbia University EPIC 2015 (no answers) C DiMaggio June 10, 2015 Contents 1 Sampling and Simulations 2 2 Drawing Statistical Inferences on a Continuous Variable 2 2.1 Simulations

More information

Predictive Checking. Readings GH Chapter 6-8. February 8, 2017

Predictive Checking. Readings GH Chapter 6-8. February 8, 2017 Predictive Checking Readings GH Chapter 6-8 February 8, 2017 Model Choice and Model Checking 2 Questions: 1. Is my Model good enough? (no alternative models in mind) 2. Which Model is best? (comparison

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

(Not That) Advanced Hierarchical Models

(Not That) Advanced Hierarchical Models (Not That) Advanced Hierarchical Models Ben Goodrich StanCon: January 10, 2018 Ben Goodrich Advanced Hierarchical Models StanCon 1 / 13 Obligatory Disclosure Ben is an employee of Columbia University,

More information

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages...

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... BIOSTATS 640 Spring 2018 Introduction to R and R-Studio Data Description Page 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... 2. Load R Data.. a. Load R data frames...

More information

Old Faithful Chris Parrish

Old Faithful Chris Parrish Old Faithful Chris Parrish 17-4-27 Contents Old Faithful eruptions 1 data.................................................. 1 duration................................................ 1 waiting time..............................................

More information

Informative Priors for Regularization in Bayesian Predictive Modeling

Informative Priors for Regularization in Bayesian Predictive Modeling Informative Priors for Regularization in Bayesian Predictive Modeling Kyle M. Lang Institute for Measurement, Methodology, Analysis & Policy Texas Tech University Lubbock, TX November 23, 2016 Outline

More information

Markov Chain Monte Carlo (part 1)

Markov Chain Monte Carlo (part 1) Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for

More information

Simulation and resampling analysis in R

Simulation and resampling analysis in R Simulation and resampling analysis in R Author: Nicholas G Reich, Jeff Goldsmith, Andrea S Foulkes, Gregory Matthews This material is part of the statsteachr project Made available under the Creative Commons

More information

brms: An R Package for Bayesian Multilevel Models using Stan

brms: An R Package for Bayesian Multilevel Models using Stan brms: An R Package for Bayesian Multilevel Models using Stan Paul Bürkner Institut für Psychologie Westfälische Wilhelms-Universität Münster 26.02.2016 Agenda Agenda 1 Short introduction to Stan 2 The

More information

Introduction to Python 2

Introduction to Python 2 Introduction to Python 2 Chang Y. Chung Office of Population Research 01/14/2014 Algorithms + Data Structures = Programs Niklaus Wirth (1976)[3] 1 / 36 Algorithms + Data Structures = Programs Niklaus Wirth

More information

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources R Outline Description/History Objects/Language Description Commonly Used Basic Functions Basic Stats and distributions I/O Plotting Programming More Specific Functionality Further Resources www.r-project.org

More information

Package citools. October 20, 2018

Package citools. October 20, 2018 Type Package Package citools October 20, 2018 Title Confidence or Prediction Intervals, Quantiles, and Probabilities for Statistical Models Version 0.5.0 Maintainer John Haman Functions

More information

A brief introduction to econometrics in Stan James Savage

A brief introduction to econometrics in Stan James Savage A brief introduction to econometrics in Stan James Savage 2017-04-30 2 Contents About 5 The structure................................................ 6 1 Modern Statistical Workflow 7 1.1 Modern Statistical

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation CSSS 510: Lab 2 Introduction to Maximum Likelihood Estimation 2018-10-12 0. Agenda 1. Housekeeping: simcf, tile 2. Questions about Homework 1 or lecture 3. Simulating heteroskedastic normal data 4. Fitting

More information

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang Rstudio GGPLOT2 Preparations There are several different systems for creating data visualizations in R. We will introduce ggplot2, which is based on Leland Wilkinson s Grammar of Graphics. The learning

More information

Package ggmcmc. August 29, 2016

Package ggmcmc. August 29, 2016 Package ggmcmc August 29, 2016 Title Tools for Analyzing MCMC Simulations from Bayesian Inference Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for

More information

=!"#$%!! 2! 2 (1 +!! )! (1 +!! )!! 2!! 2!!

=!#$%!! 2! 2 (1 +!! )! (1 +!! )!! 2!! 2!! MCEM is not a good choice when you are able to comput a close form of pdf/cdf. What's worse? If one only knows the kernel of pdf, things getts very very boring. For example, assmume!!!"#!!,!!!! =!"#$%!!

More information

Introduction to R and R-Studio Toy Program #2 Excel to R & Basic Descriptives

Introduction to R and R-Studio Toy Program #2 Excel to R & Basic Descriptives Introduction to R and R-Studio 2018-19 Toy Program #2 Basic Descriptives Summary The goal of this toy program is to give you a boiler for working with your own excel data. So, I m hoping you ll try!. In

More information

CITS4009 Introduction to Data Science

CITS4009 Introduction to Data Science School of Computer Science and Software Engineering CITS4009 Introduction to Data Science SEMESTER 2, 2017: CHAPTER 4 MANAGING DATA 1 Chapter Objectives Fixing data quality problems Organizing your data

More information

PSS718 - Data Mining

PSS718 - Data Mining Lecture 5 - Hacettepe University October 23, 2016 Data Issues Improving the performance of a model To improve the performance of a model, we mostly improve the data Source additional data Clean up the

More information

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting

More information

Canopy Light: Synthesizing multiple data sources

Canopy Light: Synthesizing multiple data sources Canopy Light: Synthesizing multiple data sources Tree growth depends upon light (previous example, lab 7) Hard to measure how much light an ADULT tree receives Multiple sources of proxy data Exposed Canopy

More information

How to Use (R)Stan to Estimate Models in External R Packages. Ben Goodrich of Columbia University

How to Use (R)Stan to Estimate Models in External R Packages. Ben Goodrich of Columbia University How to Use (R)Stan to Estimate Models in External R Packages Ben Goodrich of Columbia University (benjamin.goodrich@columbia.edu) July 6, 2017 Obligatory Disclosure Ben is an employee of Columbia University,

More information

Reading and wri+ng data

Reading and wri+ng data An introduc+on to Reading and wri+ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 4 Course outline Review Data types and structures Reading data How should data look

More information

BAYESIAN OUTPUT ANALYSIS PROGRAM (BOA) VERSION 1.0 USER S MANUAL

BAYESIAN OUTPUT ANALYSIS PROGRAM (BOA) VERSION 1.0 USER S MANUAL BAYESIAN OUTPUT ANALYSIS PROGRAM (BOA) VERSION 1.0 USER S MANUAL Brian J. Smith January 8, 2003 Contents 1 Getting Started 4 1.1 Hardware/Software Requirements.................... 4 1.2 Obtaining BOA..............................

More information

CS&s/STAT 566 Class Lab 3 January 22, 2016

CS&s/STAT 566 Class Lab 3 January 22, 2016 CS&s/STAT 566 Class Lab 3 January 22, 2016 (1) Fisher s randomization test; continuous response rm(list=ls()) #data trt

More information

The Bolstad Package. July 9, 2007

The Bolstad Package. July 9, 2007 The Bolstad Package July 9, 2007 Version 0.2-12 Date 2007-09-07 Title Bolstad functions Author James Curran Maintainer James M. Curran A set of

More information

NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR

NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR NONPARAMETRIC REGRESSION SPLINES FOR GENERALIZED LINEAR MODELS IN THE PRESENCE OF MEASUREMENT ERROR J. D. Maca July 1, 1997 Abstract The purpose of this manual is to demonstrate the usage of software for

More information

Bayesian data analysis using R

Bayesian data analysis using R Bayesian data analysis using R BAYESIAN DATA ANALYSIS USING R Jouni Kerman, Samantha Cook, and Andrew Gelman Introduction Bayesian data analysis includes but is not limited to Bayesian inference (Gelman

More information

Text Mining with R: Building a Text Classifier

Text Mining with R: Building a Text Classifier Martin Schweinberger July 28, 2016 This post 1 will exemplify how to create a text classifier with R, i.e. it will implement a machine-learning algorithm, which classifies texts as being either a speech

More information

The rv Package. R topics documented: November 17, Title Simulation-based random variable object class in R. Version

The rv Package. R topics documented: November 17, Title Simulation-based random variable object class in R. Version The rv Package November 17, 2005 Title Simulation-based random variable object class in R Version 0.911 Date 2005/11/17 Author Jouni Kerman Maintainer Jouni Kerman

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Exercise 2.23 Villanova MAT 8406 September 7, 2015

Exercise 2.23 Villanova MAT 8406 September 7, 2015 Exercise 2.23 Villanova MAT 8406 September 7, 2015 Step 1: Understand the Question Consider the simple linear regression model y = 50 + 10x + ε where ε is NID(0, 16). Suppose that n = 20 pairs of observations

More information

Package nullabor. February 20, 2015

Package nullabor. February 20, 2015 Version 0.3.1 Package nullabor February 20, 2015 Tools for visual inference. Generate null data sets and null plots using permutation and simulation. Calculate distance metrics for a lineup, and examine

More information

Notes for week 3. Ben Bolker September 26, Linear models: review

Notes for week 3. Ben Bolker September 26, Linear models: review Notes for week 3 Ben Bolker September 26, 2013 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share & remix noncommercially,

More information

Using the package glmbfp: a binary regression example.

Using the package glmbfp: a binary regression example. Using the package glmbfp: a binary regression example. Daniel Sabanés Bové 3rd August 2017 This short vignette shall introduce into the usage of the package glmbfp. For more information on the methodology,

More information

Ben Baumer Instructor

Ben Baumer Instructor MULTIPLE AND LOGISTIC REGRESSION What is logistic regression? Ben Baumer Instructor A categorical response variable ggplot(data = hearttr, aes(x = age, y = survived)) + geom_jitter(width = 0, height =

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation Hoff Chapter 7, GH Chapter 25 April 21, 2017 Bednets and Malaria Y:presence or absence of parasites in a blood smear AGE: age of child BEDNET: bed net use (exposure) GREEN:greenness

More information

options(width = 65) suppressmessages(library(mi)) data(nlsyv, package = "mi")

options(width = 65) suppressmessages(library(mi)) data(nlsyv, package = mi) An Example of mi Usage Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima, Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman 06/16/2014

More information

Standard Errors in OLS Luke Sonnet

Standard Errors in OLS Luke Sonnet Standard Errors in OLS Luke Sonnet Contents Variance-Covariance of ˆβ 1 Standard Estimation (Spherical Errors) 2 Robust Estimation (Heteroskedasticity Constistent Errors) 4 Cluster Robust Estimation 7

More information

Linear Modeling with Bayesian Statistics

Linear Modeling with Bayesian Statistics Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the

More information

Bayesian model selection and diagnostics

Bayesian model selection and diagnostics Bayesian model selection and diagnostics A typical Bayesian analysis compares a handful of models. Example 1: Consider the spline model for the motorcycle data, how many basis functions? Example 2: Consider

More information

Chapter 5: Joint Probability Distributions and Random

Chapter 5: Joint Probability Distributions and Random Chapter 5: Joint Probability Distributions and Random Samples Curtis Miller 2018-06-13 Introduction We may naturally inquire about collections of random variables that are related to each other in some

More information

STAT 203 SOFTWARE TUTORIAL

STAT 203 SOFTWARE TUTORIAL STAT 203 SOFTWARE TUTORIAL PYTHON IN BAYESIAN ANALYSIS YING LIU 1 Some facts about Python An open source programming language Have many IDE to choose from (for R? Rstudio!) A powerful language; it can

More information

1 Methods for Posterior Simulation

1 Methods for Posterior Simulation 1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing

More information

Bayesian Modelling with JAGS and R

Bayesian Modelling with JAGS and R Bayesian Modelling with JAGS and R Martyn Plummer International Agency for Research on Cancer Rencontres R, 3 July 2012 CRAN Task View Bayesian Inference The CRAN Task View Bayesian Inference is maintained

More information

EBSeq: An R package for differential expression analysis using RNA-seq data

EBSeq: An R package for differential expression analysis using RNA-seq data EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John A. Dawson, Christina Kendziorski October 9, 2012 Contents 1 Introduction 2 2 The Model 3 2.1 Two conditions............................

More information

Pair-Wise Multiple Comparisons (Simulation)

Pair-Wise Multiple Comparisons (Simulation) Chapter 580 Pair-Wise Multiple Comparisons (Simulation) Introduction This procedure uses simulation analyze the power and significance level of three pair-wise multiple-comparison procedures: Tukey-Kramer,

More information

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA Session 26 TS, Predictive Analytics: Moving Out of Square One Moderator: Jean-Marc Fix, FSA, MAAA Presenters: Jean-Marc Fix, FSA, MAAA Jeffery Robert Huddleston, ASA, CERA, MAAA Predictive Modeling: Getting

More information

Generalized Additive Models
