9.1 Random coefficients models: constructed data; consumer preference mapping of carrots

/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE

Module 9: R

/Mixed Linear Models Last modified August 23, 2011

9.1 Random coefficients models
Constructed data
Consumer preference mapping of carrots

Random coefficients models

Random coefficients models are analysed using the function lme (from the nlme package).

Constructed data

The simple linear regression analyses of the two responses y1 and y2 in the data set randcoef are obtained using lm

> model1y1 <- lm(y1 ~ x, data = randcoef)
> model1y2 <- lm(y2 ~ x, data = randcoef)

The parameter estimates with corresponding standard errors in the two models are

> summary(model1y1)

Call:
lm(formula = y1 ~ x, data = randcoef)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
x                                          e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(model1y2)

Call:
lm(formula = y2 ~ x, data = randcoef)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              e-12 ***
x                                        e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The raw scatter plots for the data with superimposed regression lines are obtained using the plot and abline functions

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  abline(model1y1)
  plot(x, y2)
  abline(model1y2)
})
par(mfrow = c(1, 1))

The individual patterns in the data can be seen from the next plot

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) {lines(x[subject == i], y1[subject == i], lty = i)}
  plot(x, y2)
  for (i in 1:10) {lines(x[subject == i], y2[subject == i], lty = i)}
})
par(mfrow = c(1, 1))

The function lines connects points with line segments. Notice how the repetitive plotting is handled with a for loop: for each i between 1 and 10 the relevant subset of the data is plotted with a line type that changes as the subject changes. Alternatively we could have written 10 separate calls to lines for each response.

The fixed effects analysis is

> model2y1 <- lm(y1 ~ x + subject + x * subject, data = randcoef)
> model2y2 <- lm(y2 ~ x + subject + x * subject, data = randcoef)

The two resulting ANOVA tables are

[Figure 9.1: two panels, y plotted against x]

> anova(model2y1)
Analysis of Variance Table

Response: y1
          Df Sum Sq Mean Sq F value Pr(>F)
x                                    < 2.2e-16 ***
subject                              < 2.2e-16 ***
x:subject                            < 2.2e-16 ***
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(model2y2)
Analysis of Variance Table

Response: y2
          Df Sum Sq Mean Sq F value Pr(>F)

x                                    e-12 ***
subject                              **
x:subject
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[Figure 9.2: two panels, y plotted against x]

Compare with the results p. 4 in Module 9.

A plot of the data with individual regression lines based on model2y1 and model2y2 is again produced using a for loop. First we fit the two models in a different parameterisation, without the common intercept and the common slope, so that the estimates come in the convenient form of one intercept and one slope per subject

> model3y1 <- lm(y1 ~ subject - 1 + x * subject - x, data = randcoef)
> model3y2 <- lm(y2 ~ subject - 1 + x * subject - x, data = randcoef)

The plots are produced using

[Figure 9.3: two panels, y plotted against x]

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) {abline(coef(model3y1)[c(i, i + 10)], lty = i)}
  plot(x, y2)
  for (i in 1:10) {abline(coef(model3y2)[c(i, i + 10)], lty = i)}
})
par(mfrow = c(1, 1))

Explanation: Remember that coef extracts the parameter estimates. The first 10 estimates are the intercept estimates and the next 10 are the slope estimates. Thus the component pairs (1, 11), (2, 12), ..., (10, 20) belong to the subjects 1, 2, ..., 10, respectively. This is exploited in the for loop in the part [c(i, i + 10)], which produces these pairs as i runs from 1 to 10.

The equal slopes model for the second data set is

> model4y2 <- lm(y2 ~ subject + x, data = randcoef)

with parameter estimates

> summary(model4y2)

Call:
lm(formula = y2 ~ subject + x, data = randcoef)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              ***
subject2                                 *
subject3
subject4
subject5
subject6
subject7
subject8
subject9                                 **
subject10
x                                        e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The summary of the two step analysis (the computations on p. 5 in Module 9) can be obtained in R by applying the functions mean and sd (computing the empirical mean and standard deviation of a vector, respectively) to the vector of intercept estimates and to the vector of slope estimates from the different slopes models. The factor 2.26 is approximately the 97.5% quantile of the t distribution with 9 degrees of freedom. Here it is done for data set 1; data set 2 is handled similarly.

ainty1 <- mean(coef(model3y1)[1:10])
sdinty1 <- sd(coef(model3y1)[1:10]) / sqrt(10)
uinty1 <- ainty1 + 2.26 * sdinty1
linty1 <- ainty1 - 2.26 * sdinty1
asloy1 <- mean(coef(model3y1)[11:20])
sdsloy1 <- sd(coef(model3y1)[11:20]) / sqrt(10)
usloy1 <- asloy1 + 2.26 * sdsloy1
lsloy1 <- asloy1 - 2.26 * sdsloy1

[1]
[1]
[1]
[1]
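The two step analysis can be tried out without access to randcoef. The following is a minimal sketch on simulated data; the data frame sim, the true intercepts and slopes, and the helper two_step are all made up for illustration. It uses the per-subject parameterisation (no common intercept, no common slope) and replaces the hard-coded 2.26 by qt(0.975, 9).

```r
# Two step analysis on SIMULATED data (an assumption; not the randcoef data).
set.seed(1)
n_subj <- 10                                  # 10 subjects, as in the text
sim <- expand.grid(x = 1:10, subject = factor(1:n_subj))
a <- rnorm(n_subj, mean = 10, sd = 2)         # hypothetical true intercepts
b <- rnorm(n_subj, mean = 1, sd = 0.3)        # hypothetical true slopes
id <- as.integer(sim$subject)
sim$y <- a[id] + b[id] * sim$x + rnorm(nrow(sim), sd = 0.5)

# Step 1: one intercept and one slope per subject.
fit <- lm(y ~ subject - 1 + x * subject - x, data = sim)
ints <- coef(fit)[1:n_subj]                   # intercept estimates
slos <- coef(fit)[(n_subj + 1):(2 * n_subj)]  # slope estimates

# Step 2: treat the estimates as a sample of size n_subj.
tq <- qt(0.975, df = n_subj - 1)              # t quantile with 9 df
two_step <- function(est) {                   # hypothetical helper
  m <- mean(est)
  se <- sd(est) / sqrt(length(est))
  c(lower = m - tq * se, est = m, upper = m + tq * se)
}
ci_int <- two_step(ints)
ci_slo <- two_step(slos)
print(rbind(intercept = ci_int, slope = ci_slo))
```

Computing the quantile with qt rather than typing 2.26 keeps the interval correct if the number of subjects changes.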

The correlations between intercepts and slopes in the two data sets are computed using cor

> cor(coef(model3y1)[1:10], coef(model3y1)[11:20])
[1]
> cor(coef(model3y2)[1:10], coef(model3y2)[11:20])
[1]

The random coefficients analysis is done with lme. The different slopes random coefficient model is

model5y1 <- lme(y1 ~ x, random = ~1 + x | subject, data = randcoef)
model5y2 <- lme(y2 ~ x, random = ~1 + x | subject, data = randcoef,
                control = lmeControl(opt = "optim"))

(Note that to make the second model fit, the default optimizer used by lme was changed to optim.)

After random = the part ~1 + x specifies the terms to which the random factor after | is assigned. One way to think about it is that 1 is multiplied by subject and x is multiplied by subject, yielding the terms 1 * subject + x * subject, which correspond to the random part in formula (9.2) p. 2 in Module 9.

The (fixed effects) parameter estimates are

> intervals(model5y1)[[1]]
            lower est. upper
(Intercept)
x
attr(,"label")
[1] "Fixed effects:"

> intervals(model5y2)[[1]]
            lower est. upper
(Intercept)
x
attr(,"label")
[1] "Fixed effects:"
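As an illustration of the random = ~1 + x | subject syntax, here is a minimal sketch that fits a random coefficients model to simulated data. The data frame sim and all parameter values are assumptions chosen for the example, and the optim optimizer is selected only to mirror the call above.

```r
library(nlme)

# SIMULATED data (an assumption; not the randcoef data).
set.seed(2)
n_subj <- 10
sim <- expand.grid(x = 1:10, subject = factor(1:n_subj))
a <- rnorm(n_subj, mean = 10, sd = 2)     # hypothetical random intercepts
b <- rnorm(n_subj, mean = 1, sd = 0.5)    # hypothetical random slopes
id <- as.integer(sim$subject)
sim$y <- a[id] + b[id] * sim$x + rnorm(nrow(sim), sd = 0.5)

# Random intercept and random slope per subject, arbitrarily correlated.
fit <- lme(y ~ x, random = ~ 1 + x | subject, data = sim,
           control = lmeControl(opt = "optim"))

print(fixef(fit))                         # fixed intercept and slope
print(VarCorr(fit))                       # variances, std. deviations, correlation
print(intervals(fit, which = "fixed"))    # confidence intervals, fixed effects only
print(head(ranef(fit)))                   # predicted subject-level deviations
```

Restricting intervals to which = "fixed" avoids the intervals for the variance parameters, which can fail when the estimated covariance matrix is close to singular.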

Due to the difference in degrees of freedom used in R and in SAS the confidence intervals are not exactly identical. The variance parameter estimates, including the correlation between intercept and slope, are obtained using VarCorr

> VarCorr(model5y1)
subject = pdLogChol(1 + x)
            Variance StdDev Corr
(Intercept)                 (Intr)
x
Residual

> VarCorr(model5y2)
subject = pdLogChol(1 + x)
            Variance StdDev Corr
(Intercept)                 (Intr)
x
Residual

The equal slopes models within the random coefficient framework are specified as

> model6y1 <- lme(y1 ~ x, random = ~1 | subject, data = randcoef)
> model6y2 <- lme(y2 ~ x, random = ~1 | subject, data = randcoef)

Likelihood ratio tests for reduction from different slopes to equal slopes can be obtained using anova with two lme objects as arguments (the first argument is the less general model, the second argument the more general model).

> anova(model6y1, model5y1)
         Model df AIC BIC logLik   Test L.Ratio p-value
model6y1     1
model5y1     2                   1 vs 2          <.0001

> anova(model6y2, model5y2)
         Model df AIC BIC logLik   Test L.Ratio p-value
model6y2     1
model5y2     2                   1 vs 2

Notice that one of the values of the test statistics differs somewhat from the value obtained in SAS, but the conclusions are the same.

The (fixed effects) parameter estimates for data set 2 are

> intervals(model6y2)[[1]]
            lower est. upper
(Intercept)
x
attr(,"label")
[1] "Fixed effects:"

Consumer preference mapping of carrots

Recall that the most general model ((9.8) to (9.11) in Module 9) states that for each level of Consumer the random intercept and the random slopes of sens1 and sens2 are correlated in an arbitrary way (the specification in (9.11)). This model does not work in SAS PROC MIXED, but it works fine in R. It can be specified as follows

carrots <- read.table("<mypersonalpath>carrots.txt", header = TRUE, sep = ",")
carrots$const <- rep(1, length(carrots$preference))
carrots$Homesize <- factor(carrots$Homesize)
carrots$Consumer <- factor(carrots$Consumer)
carrots$product <- factor(carrots$product)
lmeControl(maxIter = 100, tolerance = 0.0001)
model1 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1 + sens1 + sens2)),
              data = carrots, na.action = na.omit,
              control = lmeControl(opt = "optim"))

(An optimizer other than the default was used here through the "optim" option.)

The random part deserves some explanation. First, notice that const corresponds to the factor O. Second, notice that the terms pdIdent and pdLogChol denote two main structures for variance matrices: pdIdent is a matrix with the same variance component in every diagonal element and 0s outside the diagonal, and pdLogChol is a general variance matrix with variance components in the diagonal and covariances outside the diagonal. Therefore the structure (9.11) amounts to the term Consumer = pdLogChol(~1 + sens1 + sens2): for each level of Consumer we have 3 random effects, one intercept and two slopes, and they are arbitrarily correlated. In addition there is the random effect product, and const = pdIdent(~product - 1) means that for the single level of const a variance matrix with as many diagonal elements as there are levels in the factor product is constructed.
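The difference between the two variance structures can be seen by building small matrices directly with the nlme constructors. This is only an illustrative sketch: the numerical values and the names p1, p2, p3 are made up and have nothing to do with the carrots data.

```r
library(nlme)

# pdIdent: one common variance on the diagonal, zeros elsewhere.
# The value 4 and the names p1, p2, p3 are hypothetical.
pd_ident <- pdIdent(4 * diag(3), nam = c("p1", "p2", "p3"))
m_ident <- pdMatrix(pd_ident)

# pdLogChol: a general positive-definite matrix with separate variances
# on the diagonal and covariances off the diagonal (values are hypothetical).
v <- matrix(c(2.0, 0.5, 0.3,
              0.5, 1.0, 0.2,
              0.3, 0.2, 1.5), nrow = 3, byrow = TRUE)
pd_gen <- pdLogChol(v, nam = c("(Intercept)", "sens1", "sens2"))
m_gen <- pdMatrix(pd_gen)

print(m_ident)
print(m_gen)

# Number of free parameters needed to describe each structure.
n_par_ident <- length(coef(pd_ident))
n_par_gen <- length(coef(pd_gen))
print(c(pdIdent = n_par_ident, pdLogChol = n_par_gen))
```

The parameter counts show why the general structure is more demanding of the data: the 3 x 3 pdLogChol matrix needs a parameter for every variance and covariance, while pdIdent needs a single one.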

The model without correlation between intercept and slopes (Model 1 in Module 9) is

model2 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdDiag(~1 + sens1 + sens2)),
              data = carrots, na.action = na.omit)

The estimated variance components for the intercept and the two slopes are all three almost 0, which means that there are too many variance parameters (given the information available in the data) for this model to work. The model without the random slope on sens1 is

model3 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdDiag(~1 + sens2)),
              data = carrots, na.action = na.omit)

For the model without a random slope on sens1 but with correlated intercept and slope for sens2 (Model 0 in Module 9) the parameters can be estimated by

model4 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1 + sens2)),
              data = carrots, na.action = na.omit)

and the test for reduction from model1 to model4 is not significant. The model that also drops the random slope on sens2 (Model 2A in Module 9) is

model5 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1)),
              data = carrots, na.action = na.omit)

Another sub-model of model4 is the model without the random factor product (Model 2B in Module 9)

model6 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = ~1 + sens2 | Consumer,
              data = carrots, na.action = na.omit)

Reduction from model4 to either model5 or model6 is not possible

> anova(model6, model4)

       Model df AIC BIC logLik   Test L.Ratio p-value
model6     1
model4     2                   1 vs 2          <.0001

> anova(model6, model5)
       Model df AIC BIC logLik   Test L.Ratio p-value
model6     1
model5     2                   1 vs 2          e-04

The final model (when using R) with regard to the covariance structure is model4.

After having reduced the covariance structure in the model, we turn attention to the mean structure, i.e. the fixed effects. Using anova on model4 gives

> anova(model4)
               numDF denDF F-value p-value
(Intercept)                         <.0001
Homesize
sens1
sens2                               <.0001
Homesize:sens1
Homesize:sens2

The slope of sens1 does not depend significantly on the level of the factor Homesize, and therefore the interaction Homesize:sens1 is omitted from the model, resulting in the reduced model

model7 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1 + sens2)),
              data = carrots, na.action = na.omit)

From the ANOVA table

> anova(model7)
               numDF denDF F-value p-value
(Intercept)                         <.0001
Homesize
sens1
sens2                               <.0001
Homesize:sens2

it follows that the slope of sens2 is also independent of Homesize. The new reduced model is

model8 <- lme(preference ~ Homesize + sens1 + sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1 + sens2)),
              data = carrots, na.action = na.omit)

Again, looking at the ANOVA table it follows that sens1 is not significant. The final model (after having looked at another anova output) is

model9 <- lme(preference ~ Homesize - 1 + sens2,
              random = list(const = pdIdent(~product - 1),
                            Consumer = pdLogChol(~1 + sens2)),
              data = carrots, na.action = na.omit)

The estimated variance components are obtained using VarCorr

> unique(VarCorr(model9)[, 1])
[1] "pdIdent(product - 1)" " " "pdLogChol(1 + sens2)"
[4] " " " " " "

(only the distinct values are obtained with the function unique). The confidence interval for the estimated slope on sens2 is

> intervals(model9)[[1]][3, ]
lower est. upper

(only the third row is retrieved). Using the function estimable (from the gmodels package), LSMEANS values and the estimated difference between the two levels of Homesize can be computed. The relevant contrast matrix is

sens2mean <- mean(carrots$sens2)
conmat <- matrix(0, 3, 3)
conmat[1, ] <- c(1, 0, sens2mean)
conmat[2, ] <- c(0, 1, sens2mean)
conmat[3, ] <- c(1, -1, 0)
rownames(conmat) <- c("1", "2", "1-2")

Now the function estimable gives the estimates

> estmat <- estimable(model9, conmat, conf.int = 0.95)
> estmat[, c(1, 6, 7)]
Estimate Lower CI Upper CI
NaN NaN NaN NaN

Notice the Not a Number (NaN) values in the output. This is apparently because the average of sens2 is extremely close to 0 (try typing sens2mean), which causes a problem in the function estimable. The problem is solved by setting sens2mean <- 0 and re-running the conmat and estmat statements.
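What the contrast matrix computes can also be written out by hand: each row of the matrix is multiplied onto the vector of fixed effects estimates, and the standard errors follow from the estimated covariance matrix of the fixed effects. The sketch below does this on simulated data with a simple random intercept model; the data frame d, the variable names cons, home and s, and all parameter values are assumptions, and the intervals use the normal quantile 1.96 as a rough approximation rather than the degrees of freedom used by estimable.

```r
library(nlme)

# SIMULATED stand-in for the carrots data (all names and values hypothetical).
set.seed(3)
n_cons <- 30                                  # number of consumers
n_rep <- 6                                    # observations per consumer
d <- data.frame(cons = factor(rep(1:n_cons, each = n_rep)))
d$home <- factor(ifelse(as.integer(d$cons) <= n_cons / 2, "1", "2"))
d$s <- rnorm(nrow(d))                         # stand-in for sens2
u <- rnorm(n_cons, sd = 0.5)                  # random consumer intercepts
d$y <- ifelse(d$home == "1", 4.9, 4.7) + 0.07 * d$s +
  u[as.integer(d$cons)] + rnorm(nrow(d))

# Same fixed-effects layout as model9: one level mean per group plus a slope.
fit <- lme(y ~ home - 1 + s, random = ~ 1 | cons, data = d)

s_mean <- mean(d$s)
K <- rbind("1"   = c(1,  0, s_mean),          # level mean for home = 1
           "2"   = c(0,  1, s_mean),          # level mean for home = 2
           "1-2" = c(1, -1, 0))               # difference between the levels

est <- drop(K %*% fixef(fit))                 # K times the estimates
se <- sqrt(diag(K %*% vcov(fit) %*% t(K)))    # standard errors of K times beta
res <- cbind(Estimate = est,
             Lower = est - 1.96 * se,         # approximate 95% limits
             Upper = est + 1.96 * se)
print(res)
```

Because the third row is the difference of the first two, its estimate equals the difference of the two level means regardless of the value of s_mean.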


More information

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value. AP Statistics - Problem Drill 05: Measures of Variation No. 1 of 10 1. The range is calculated as. (A) The minimum data value minus the maximum data value. (B) The maximum data value minus the minimum

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008 MIT OpenCourseWare http://ocw.mit.edu.83j / 6.78J / ESD.63J Control of Manufacturing Processes (SMA 633) Spring 8 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

A (very) brief introduction to R

A (very) brief introduction to R A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce

More information

enote 3 1 enote 3 Case study

enote 3 1 enote 3 Case study enote 3 1 enote 3 Case study enote 3 INDHOLD 2 Indhold 3 Case study 1 3.1 Introduction.................................... 3 3.2 Initial explorative analysis............................ 5 3.3 Test of overall

More information

Multivariate Analysis Multivariate Calibration part 2

Multivariate Analysis Multivariate Calibration part 2 Multivariate Analysis Multivariate Calibration part 2 Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Linear Latent Variables An essential concept in multivariate data

More information

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Non-Linear Regression Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Today s Lecture Objectives 1 Understanding the need for non-parametric regressions 2 Familiarizing with two common

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

Additional Issues: Random effects diagnostics, multiple comparisons

Additional Issues: Random effects diagnostics, multiple comparisons : Random diagnostics, multiple Austin F. Frank, T. Florian April 30, 2009 The dative dataset Original analysis in Bresnan et al (2007) Data obtained from languager (Baayen 2008) Data describing the realization

More information

Data Analysis and Hypothesis Testing Using the Python ecosystem

Data Analysis and Hypothesis Testing Using the Python ecosystem ARISTOTLE UNIVERSITY OF THESSALONIKI Data Analysis and Hypothesis Testing Using the Python ecosystem t-test & ANOVAs Stavros Demetriadis Assc. Prof., School of Informatics, Aristotle University of Thessaloniki

More information

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening Variables Entered/Removed b Variables Entered GPA in other high school, test, Math test, GPA, High school math GPA a Variables Removed

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

E-Campus Inferential Statistics - Part 2

E-Campus Inferential Statistics - Part 2 E-Campus Inferential Statistics - Part 2 Group Members: James Jones Question 4-Isthere a significant difference in the mean prices of the stores? New Textbook Prices New Price Descriptives 95% Confidence

More information

Centering and Interactions: The Training Data

Centering and Interactions: The Training Data Centering and Interactions: The Training Data A random sample of 150 technical support workers were first given a test of their technical skill and knowledge, and then randomly assigned to one of three

More information

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables: Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum

More information

Gelman-Hill Chapter 3

Gelman-Hill Chapter 3 Gelman-Hill Chapter 3 Linear Regression Basics In linear regression with a single independent variable, as we have seen, the fundamental equation is where ŷ bx 1 b0 b b b y 1 yx, 0 y 1 x x Bivariate Normal

More information

Robust Linear Regression (Passing- Bablok Median-Slope)

Robust Linear Regression (Passing- Bablok Median-Slope) Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their

More information

Section 4.1: Time Series I. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 4.1: Time Series I. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 4.1: Time Series I Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Time Series Data and Dependence Time-series data are simply a collection of observations gathered

More information

Discussion Notes 3 Stepwise Regression and Model Selection

Discussion Notes 3 Stepwise Regression and Model Selection Discussion Notes 3 Stepwise Regression and Model Selection Stepwise Regression There are many different commands for doing stepwise regression. Here we introduce the command step. There are many arguments

More information

Some issues with R It is command-driven, and learning to use it to its full extent takes some time and effort. The documentation is comprehensive,

Some issues with R It is command-driven, and learning to use it to its full extent takes some time and effort. The documentation is comprehensive, R To R is Human R is a computing environment specially made for doing statistics/econometrics. It is becoming the standard for advanced dealing with empirical data, also in finance. Good parts It is freely

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

AA BB CC DD EE. Introduction to Graphics in R

AA BB CC DD EE. Introduction to Graphics in R Introduction to Graphics in R Cori Mar 7/10/18 ### Reading in the data dat

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Factorial ANOVA with SAS

Factorial ANOVA with SAS Factorial ANOVA with SAS /* potato305.sas */ options linesize=79 noovp formdlim='_' ; title 'Rotten potatoes'; title2 ''; proc format; value tfmt 1 = 'Cool' 2 = 'Warm'; data spud; infile 'potato2.data'

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

Chapter 6: Linear Model Selection and Regularization

Chapter 6: Linear Model Selection and Regularization Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the

More information

Example 5.25: (page 228) Screenshots from JMP. These examples assume post-hoc analysis using a Protected LSD or Protected Welch strategy.

Example 5.25: (page 228) Screenshots from JMP. These examples assume post-hoc analysis using a Protected LSD or Protected Welch strategy. JMP Output from Chapter 5 Factorial Analysis through JMP Example 5.25: (page 228) Screenshots from JMP. These examples assume post-hoc analysis using a Protected LSD or Protected Welch strategy. Fitting

More information

Package simr. April 30, 2018

Package simr. April 30, 2018 Type Package Package simr April 30, 2018 Title Power Analysis for Generalised Linear Mixed Models by Simulation Calculate power for generalised linear mixed models, using simulation. Designed to work with

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider

More information

Exercise: Graphing and Least Squares Fitting in Quattro Pro

Exercise: Graphing and Least Squares Fitting in Quattro Pro Chapter 5 Exercise: Graphing and Least Squares Fitting in Quattro Pro 5.1 Purpose The purpose of this experiment is to become familiar with using Quattro Pro to produce graphs and analyze graphical data.

More information

R Graphics. SCS Short Course March 14, 2008

R Graphics. SCS Short Course March 14, 2008 R Graphics SCS Short Course March 14, 2008 Archeology Archeological expedition Basic graphics easy and flexible Lattice (trellis) graphics powerful but less flexible Rgl nice 3d but challenging Tons of

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

STATISTICS FOR PSYCHOLOGISTS

STATISTICS FOR PSYCHOLOGISTS STATISTICS FOR PSYCHOLOGISTS SECTION: JAMOVI CHAPTER: USING THE SOFTWARE Section Abstract: This section provides step-by-step instructions on how to obtain basic statistical output using JAMOVI, both visually

More information

Laboratory for Two-Way ANOVA: Interactions

Laboratory for Two-Way ANOVA: Interactions Laboratory for Two-Way ANOVA: Interactions For the last lab, we focused on the basics of the Two-Way ANOVA. That is, you learned how to compute a Brown-Forsythe analysis for a Two-Way ANOVA, as well as

More information

Section 2.1: Intro to Simple Linear Regression & Least Squares

Section 2.1: Intro to Simple Linear Regression & Least Squares Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:

More information

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,

More information