9.1 Random coefficients models: constructed data; consumer preference mapping of carrots

Alexis Tate
5 years ago

Mixed Linear Models, prepared by the statistics groups at IMM, DTU and KU-LIFE
Module 9: R
Last modified August 23, 2011

9.1 Random coefficients models
Constructed data
Consumer preference mapping of carrots

Random coefficients models

Analysis of random coefficients models is performed using the function lme from the nlme package (loaded with library(nlme)).

Constructed data

The simple linear regression analyses of the two responses y1 and y2 in the data set randcoef are obtained using lm:

> model1y1 <- lm(y1 ~ x, data = randcoef)
> model1y2 <- lm(y2 ~ x, data = randcoef)

The parameter estimates with corresponding standard errors in the two models are

> summary(model1y1)
Call:
lm(formula = y1 ~ x, data = randcoef)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  < 2e-16 ***
x                ...        ...     ...  ...e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(model1y2)
Call:
lm(formula = y2 ~ x, data = randcoef)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-12 ***
x                ...        ...     ...  ...e-11 ***

The raw scatter plots of the data with superimposed regression lines are obtained using the plot and abline functions:

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  abline(model1y1)
  plot(x, y2)
  abline(model1y2)
})
par(mfrow = c(1, 1))

The individual patterns in the data can be seen from the next plot:

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) lines(x[subject == i], y1[subject == i], lty = i)
  plot(x, y2)
  for (i in 1:10) lines(x[subject == i], y2[subject == i], lty = i)
})
par(mfrow = c(1, 1))

The function lines connects points with line segments. Notice how the repetitive plotting is handled with a for loop: for each i from 1 to 10, the subset of the data belonging to subject i is plotted with a line type that changes with the subject. Without the loop we would have needed 10 separate calls to lines for each response.

The fixed effects analysis is

> model2y1 <- lm(y1 ~ x + subject + x * subject, data = randcoef)
> model2y2 <- lm(y2 ~ x + subject + x * subject, data = randcoef)

The two resulting ANOVA tables are

Figure 9.1: y1 (left) and y2 (right) plotted against x.

> anova(model2y1)
Analysis of Variance Table
Response: y1
          Df Sum Sq Mean Sq F value    Pr(>F)
x         ...    ...     ...     ... < 2.2e-16 ***
subject   ...    ...     ...     ... < 2.2e-16 ***
x:subject ...    ...     ...     ... < 2.2e-16 ***
Residuals ...    ...     ...
> anova(model2y2)
Analysis of Variance Table
Response: y2
          Df Sum Sq Mean Sq F value    Pr(>F)
x         ...    ...     ...     ...  ...e-12 ***
subject   ...    ...     ...     ...       ... **
x:subject ...    ...     ...     ...       ...
Residuals ...    ...     ...

Figure 9.2: y1 (left) and y2 (right) plotted against x.

Compare with the results on p. 4 in Module 9. A plot of the data with individual regression lines based on model2y1 and model2y2 is again produced using a for loop. First we fit the two models in a different parameterisation, to obtain the estimates in the convenient form of one intercept and one slope per subject (no common intercept and no common slope):

> model3y1 <- lm(y1 ~ subject - 1 + x * subject - x, data = randcoef)
> model3y2 <- lm(y2 ~ subject - 1 + x * subject - x, data = randcoef)

The plots are produced using

Figure 9.3: y1 (left) and y2 (right) plotted against x.

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) abline(coef(model3y1)[c(i, i + 10)], lty = i)
  plot(x, y2)
  for (i in 1:10) abline(coef(model3y2)[c(i, i + 10)], lty = i)
})
par(mfrow = c(1, 1))

Explanation: Remember that coef extracts the parameter estimates. In this parameterisation the first 10 estimates are the intercepts and the next 10 are the slopes, so the component pairs (1, 11), (2, 12), ..., (10, 20) belong to subjects 1, 2, ..., 10, respectively. The for loop exploits this through the index [c(i, i + 10)], which produces these pairs as i runs from 1 to 10.

The equal slopes model for the second data set is

> model4y2 <- lm(y2 ~ subject + x, data = randcoef)

with parameter estimates

> summary(model4y2)
Call:
lm(formula = y2 ~ subject + x, data = randcoef)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ... ***
subject2         ...        ...     ...      ... *
subject3         ...        ...     ...      ...
subject4         ...        ...     ...      ...
subject5         ...        ...     ...      ...
subject6         ...        ...     ...      ...
subject7         ...        ...     ...      ...
subject8         ...        ...     ...      ...
subject9         ...        ...     ...      ... **
subject10        ...        ...     ...      ...
x                ...        ...     ...  ...e-13 ***

The summary of the two-step analysis, i.e. the computations on p. 5 in Module 9, is obtained by applying the functions mean and sd (which compute the empirical mean and standard deviation of a vector, respectively) to the vector of intercept estimates and to the vector of slope estimates from the different slopes models. Here it is shown for data set 1; data set 2 is handled in the same way. The factor 2.26 is the 0.975 quantile of the t distribution with 10 - 1 = 9 degrees of freedom (qt(0.975, 9) in R).

ainty1 <- mean(coef(model3y1)[1:10])
sdinty1 <- sd(coef(model3y1)[1:10]) / sqrt(10)
uinty1 <- ainty1 + 2.26 * sdinty1
linty1 <- ainty1 - 2.26 * sdinty1
asloy1 <- mean(coef(model3y1)[11:20])
sdsloy1 <- sd(coef(model3y1)[11:20]) / sqrt(10)
usloy1 <- asloy1 + 2.26 * sdsloy1
lsloy1 <- asloy1 - 2.26 * sdsloy1
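The two-step computations can also be tried out without the randcoef file. The sketch below simulates a comparable data set; the names sim, id, fit, separate and two_step are hypothetical, and the simulated values are not the Module 9 data. It checks that in the subject-specific parameterisation the first k coefficients are the intercepts and the last k the slopes (by comparing with k separate regressions), and it replaces the hard-coded 2.26 by qt(0.975, k - 1).

```r
# Simulated stand-in for randcoef (hypothetical data): k subjects, 6 x-values each.
set.seed(123)
k <- 10                                   # number of subjects
id <- rep(1:k, each = 6)                  # subject index for each observation
sim <- data.frame(subject = factor(id), x = rep(1:6, times = k))
a <- rnorm(k, mean = 5, sd = 1)           # subject-specific intercepts
b <- rnorm(k, mean = 2, sd = 0.3)         # subject-specific slopes
sim$y <- a[id] + b[id] * sim$x + rnorm(nrow(sim), sd = 0.5)

# Step 1: one intercept and one slope per subject (no common intercept or slope),
# so coef() returns the k intercepts first and then the k slopes.
fit <- lm(y ~ subject - 1 + x * subject - x, data = sim)
est <- coef(fit)
intercepts <- est[1:k]
slopes <- est[(k + 1):(2 * k)]

# The same estimates from k separate regressions, as a check on the ordering.
separate <- t(sapply(split(sim, sim$subject),
                     function(d) coef(lm(y ~ x, data = d))))

# Step 2: mean, standard error and t-based 95% interval of a vector of estimates.
two_step <- function(v) {
  m <- mean(v)
  se <- sd(v) / sqrt(length(v))
  q <- qt(0.975, df = length(v) - 1)      # about 2.262 when length(v) == 10
  c(estimate = m, se = se, lower = m - q * se, upper = m + q * se)
}

rbind(intercept = two_step(intercepts), slope = two_step(slopes))
```

Computing the quantile with qt rather than typing 2.26 keeps the interval correct if the number of subjects changes.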

The correlation between the intercept estimates and the slope estimates in each of the two data sets is computed using cor:

> cor(coef(model3y1)[1:10], coef(model3y1)[11:20])
[1] ...
> cor(coef(model3y2)[1:10], coef(model3y2)[11:20])
[1] ...

The random coefficients analysis is done with lme. The different slopes random coefficients models are

> model5y1 <- lme(y1 ~ x, random = ~ 1 + x | subject, data = randcoef)
> model5y2 <- lme(y2 ~ x, random = ~ 1 + x | subject, data = randcoef,
+                 control = lmeControl(opt = "optim"))

(Note that to make the second model fit, the optimizer used by lme was changed from the default to optim.) In the random argument, the part 1 + x before | specifies the terms to which the random effects of the grouping factor after | are assigned. One way to think about it is that 1 is multiplied by subject and x is multiplied by subject, yielding the terms 1*subject + x*subject, which correspond to the random part in formula (9.2) on p. 2 in Module 9. The (fixed effects) parameter estimates are

> intervals(model5y1)[[1]]
            lower est. upper
(Intercept)   ...  ...   ...
x             ...  ...   ...
attr(,"label")
[1] "Fixed effects:"
> intervals(model5y2)[[1]]
            lower est. upper
(Intercept)   ...  ...   ...
x             ...  ...   ...
attr(,"label")
[1] "Fixed effects:"

Due to the difference in the degrees of freedom used in R and in SAS, the confidence intervals are not exactly identical. The estimates of the variance parameters, including the correlation between intercept and slope, are obtained using VarCorr:

> VarCorr(model5y1)
subject = pdLogChol(1 + x)
            Variance StdDev Corr
(Intercept)      ...    ... (Intr)
x                ...    ... ...
Residual         ...    ...
> VarCorr(model5y2)
subject = pdLogChol(1 + x)
            Variance StdDev Corr
(Intercept)      ...    ... (Intr)
x                ...    ... ...
Residual         ...    ...

The equal slopes models within the random coefficients framework are specified as

> model6y1 <- lme(y1 ~ x, random = ~ 1 | subject, data = randcoef)
> model6y2 <- lme(y2 ~ x, random = ~ 1 | subject, data = randcoef)

Likelihood ratio tests for the reduction from different slopes to equal slopes are obtained using anova with two lme objects as arguments, where the first argument is the less general model:

> anova(model6y1, model5y1)
         Model df AIC BIC logLik   Test L.Ratio p-value
model6y1     1 ...  ...  ...    ...
model5y1     2 ...  ...  ...    ... 1 vs 2     ...  <.0001
> anova(model6y2, model5y2)
         Model df AIC BIC logLik   Test L.Ratio p-value
model6y2     1 ...  ...  ...    ...
model5y2     2 ...  ...  ...    ... 1 vs 2     ...     ...

Notice that one of the values of the test statistic differs somewhat from the value obtained in SAS, but the conclusions are the same. The (fixed effects) parameter estimates for data set 2 are

> intervals(model6y2)[[1]]
            lower est. upper
(Intercept)   ...  ...   ...
x             ...  ...   ...
attr(,"label")
[1] "Fixed effects:"

Consumer preference mapping of carrots

Recall that the most general model ((9.8) to (9.11) in Module 9) states that, for each level of Consumer, the random intercept and the random slopes of sens1 and sens2 are correlated in an arbitrary way (the specification in (9.11)). This model does not work in SAS PROC MIXED, but it works fine in R. It can be specified as follows:

carrots <- read.table("<mypersonalpath>carrots.txt", header = TRUE, sep = ",")
carrots$const <- rep(1, length(carrots$preference))
carrots$Homesize <- factor(carrots$Homesize)
carrots$Consumer <- factor(carrots$Consumer)
carrots$product <- factor(carrots$product)
# Note: lmeControl() called on its own has no effect; its settings are only used
# when passed to lme through the control argument.
lmeControl(maxIter = 100, tolerance = 0.0001)
model1 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1 + sens1 + sens2)),
              data = carrots, na.action = na.omit,
              control = lmeControl(opt = "optim"))

(An optimizer other than the default was used here through the "optim" option.) The random part deserves some explanation. First, notice that const corresponds to the factor O in Module 9. Second, notice that pdIdent and pdLogChol denote two main structures for variance matrices: pdIdent is a matrix with one common variance component in every diagonal position and 0s off the diagonal, and pdLogChol is a general variance matrix with variance components on the diagonal and covariances off the diagonal. Therefore the structure (9.11) amounts to the term Consumer = pdLogChol(~ 1 + sens1 + sens2): for each level of Consumer there are 3 random effects, one intercept and two slopes, and they are arbitrarily correlated. In addition there is the random effect of product, and const = pdIdent(~ product - 1) means that, for the single level of const, a variance matrix with as many diagonal elements as there are levels of the factor product is constructed, so the product effects are independent with a common variance.
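The variance structures in the random part can be written out as plain matrices. The sketch below uses hypothetical helpers (pd_ident, pd_diag, pd_logchol) and made-up parameter values; they only illustrate the matrices that pdIdent, pdDiag and pdLogChol represent and are not nlme's internal code. pd_logchol follows the log-Cholesky idea behind the name pdLogChol: any vector of real numbers maps to a valid (positive definite) covariance matrix. The last lines show why const = pdIdent(~ product - 1) gives product effects that are crossed with Consumer: all observations belong to the single const group, so two observations of the same product share a product effect whatever the consumer.

```r
# pdIdent-like: one common variance on the diagonal, zero covariances.
pd_ident <- function(sigma2, q) sigma2 * diag(q)

# pdDiag-like: separate variances on the diagonal, zero covariances.
pd_diag <- function(variances) diag(variances, nrow = length(variances))

# pdLogChol-like: q + q(q-1)/2 unconstrained numbers -> lower-triangular L with an
# exponentiated (hence positive) diagonal -> L %*% t(L), always positive definite.
pd_logchol <- function(theta, q) {
  stopifnot(length(theta) == q * (q + 1) / 2)
  L <- matrix(0, q, q)
  diag(L) <- exp(theta[1:q])
  L[lower.tri(L)] <- theta[-(1:q)]
  L %*% t(L)
}

# Three correlated Consumer effects (intercept, sens1 slope, sens2 slope),
# with hypothetical parameter values:
G_consumer <- pd_logchol(c(0, 0, 0, 0.5, -0.3, 0.2), q = 3)

# The const trick on a toy layout (hypothetical): 2 consumers x 3 products.
obs <- data.frame(product = factor(c(1, 2, 3, 1, 2, 3)))
Z_product <- model.matrix(~ product - 1, data = obs)   # one indicator column per product
sigma2_product <- 0.7
# Covariance contributed by the product effects: Z G Z'.
V_product <- Z_product %*% pd_ident(sigma2_product, ncol(Z_product)) %*% t(Z_product)
```

In the toy layout V_product[1, 4] equals sigma2_product because observations 1 and 4 concern the same product, while V_product[1, 2] is 0.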

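The reductions of the covariance structure in this section, like the slope reductions for the constructed data, are likelihood ratio tests computed by anova from two nested lme fits. The following sketch on simulated data (sim, fit_slopes and fit_int are hypothetical names, and the data are not from Module 9) reproduces the L.Ratio and p-value columns by hand. Both models are fitted by REML with the same fixed effects, so their restricted log-likelihoods are comparable.

```r
library(nlme)

# Simulated stand-in for randcoef (hypothetical): 10 subjects, 8 x-values each,
# with subject-specific intercepts and slopes.
set.seed(42)
k <- 10; m <- 8
id <- rep(1:k, each = m)
sim <- data.frame(subject = factor(id), x = rep(1:m, times = k))
sim$y <- (5 + rnorm(k, sd = 1))[id] + (2 + rnorm(k, sd = 0.5))[id] * sim$x +
  rnorm(nrow(sim), sd = 0.5)

# Different slopes (random intercept and slope) versus equal slopes (random intercept).
fit_slopes <- lme(y ~ x, random = ~ 1 + x | subject, data = sim)
fit_int    <- lme(y ~ x, random = ~ 1 | subject, data = sim)
cmp <- anova(fit_int, fit_slopes)

# The same test by hand: twice the difference in REML log-likelihoods, referred to a
# chi-square distribution with df = number of extra covariance parameters.
ll_big <- logLik(fit_slopes)
ll_small <- logLik(fit_int)
lr <- 2 * (as.numeric(ll_big) - as.numeric(ll_small))
df_diff <- attr(ll_big, "df") - attr(ll_small, "df")   # 2: slope variance and covariance
p <- pchisq(lr, df = df_diff, lower.tail = FALSE)
```

The degrees of freedom are taken from the df attribute of logLik rather than hard-coded, so the same lines should apply unchanged when the number of dropped covariance parameters differs between comparisons.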
The model without correlation between the intercept and the slopes (Model 1 in Module 9) is

model2 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdDiag(~ 1 + sens1 + sens2)),
              data = carrots, na.action = na.omit)

The estimated variance components for the intercept and the two slopes are all almost 0, which indicates that this model has too many variance parameters for the information available in the data. The model without the random slope on sens1 is

model3 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdDiag(~ 1 + sens2)),
              data = carrots, na.action = na.omit)

For the model without the random slope on sens1 but with correlated intercept and slope for sens2 (Model 0 in Module 9), the parameters can be estimated by

model4 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1 + sens2)),
              data = carrots, na.action = na.omit)

and the test for reduction from model1 to model4 is insignificant. The model without the random slope on sens2 (Model 2A in Module 9) is

model5 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1)),
              data = carrots, na.action = na.omit)

Another sub-model of model4 is the model without the random factor product (Model 2B in Module 9):

model6 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2,
              random = ~ 1 + sens2 | Consumer,
              data = carrots, na.action = na.omit)

Reduction from model4 to either model5 or model6 is not possible:

> anova(model6, model4)

       Model df AIC BIC logLik   Test L.Ratio p-value
model6     1 ...  ...  ...    ...
model4     2 ...  ...  ...    ... 1 vs 2     ...  <.0001
> anova(model5, model4)
       Model df AIC BIC logLik   Test L.Ratio p-value
model5     1 ...  ...  ...    ...
model4     2 ...  ...  ...    ... 1 vs 2     ...  ...e-04

The final model (when using R) with regard to the covariance structure is model4. Having reduced the covariance structure, we turn our attention to the mean structure, i.e. the fixed effects. Using anova on model4 gives

> anova(model4)
               numDF denDF F-value p-value
(Intercept)      ...   ...     ...  <.0001
Homesize         ...   ...     ...     ...
sens1            ...   ...     ...     ...
sens2            ...   ...     ...  <.0001
Homesize:sens1   ...   ...     ...     ...
Homesize:sens2   ...   ...     ...     ...

The slope of sens1 does not depend significantly on the level of the factor Homesize, so the Homesize:sens1 interaction is omitted from the model, resulting in the reduced model

model7 <- lme(preference ~ Homesize + sens1 + sens2 + Homesize * sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1 + sens2)),
              data = carrots, na.action = na.omit)

From the ANOVA table

> anova(model7)
               numDF denDF F-value p-value
(Intercept)      ...   ...     ...  <.0001
Homesize         ...   ...     ...     ...
sens1            ...   ...     ...     ...
sens2            ...   ...     ...  <.0001
Homesize:sens2   ...   ...     ...     ...

it follows that the slope of sens2 does not depend on Homesize either. The new reduced model is

model8 <- lme(preference ~ Homesize + sens1 + sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1 + sens2)),
              data = carrots, na.action = na.omit)

Again, the ANOVA table shows that sens1 is insignificant. The final model (after inspecting one more anova output) is

model9 <- lme(preference ~ Homesize - 1 + sens2,
              random = list(const = pdIdent(~ product - 1),
                            Consumer = pdLogChol(~ 1 + sens2)),
              data = carrots, na.action = na.omit)

The estimated variance components are obtained using VarCorr:

> unique(VarCorr(model9)[, 1])
[1] "pdIdent(product - 1)" "..."                  "pdLogChol(1 + sens2)"
[4] "..."                  "..."                  "..."

(Only the distinct values are returned by the function unique.) The confidence interval for the slope on sens2 is

> intervals(model9)[[1]][3, ]
lower  est. upper
  ...   ...   ...

(only the third row is retrieved). Using the function estimable from the gmodels package (loaded with library(gmodels)), the LSMEANS values and the estimated difference between the two levels of Homesize can be computed. The relevant contrast matrix is

sens2mean <- mean(carrots$sens2)
conmat <- matrix(0, 3, 3)
conmat[1, ] <- c(1, 0, sens2mean)
conmat[2, ] <- c(0, 1, sens2mean)
conmat[3, ] <- c(1, -1, 0)
rownames(conmat) <- c("1", "2", "1-2")

Now the function estimable gives the estimates

> estmat <- estimable(model9, conmat, conf.int = 0.95)
> estmat[, c(1, 6, 7)]
    Estimate Lower CI Upper CI
1        ...      NaN      NaN
2        ...      NaN      NaN
1-2      ...      ...      ...

Notice the Not a Number (NaN) values in the output. This is apparently because the average of sens2 is extremely close to 0 (try typing sens2mean), which causes a problem in the function estimable. The problem is solved by setting sens2mean <- 0 and re-running the conmat and estmat statements.
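The quantities reported by estimable are linear combinations L beta of the fixed-effect estimates, with standard errors sqrt(diag(L V L')), where V is the covariance matrix of the estimates. A hand-written cross-check of that computation is sketched below. The function contrast_ci and the values in beta and V are hypothetical (for model9 one could plug in fixef(model9) and vcov(model9)), and the degrees of freedom for the t quantile are an assumption here. The contrast matrix is the conmat from above with sens2mean set to 0.

```r
# Hypothetical estimates for model9-like coefficients, in the order
# (Homesize level 1, Homesize level 2, sens2 slope); the numbers are made up.
beta <- c(Homesize1 = 2.0, Homesize2 = 1.0, sens2 = 0.5)
V <- diag(c(0.04, 0.09, 0.01))   # hypothetical covariance matrix of beta

# Estimates, standard errors and t-based confidence limits for the rows of L.
contrast_ci <- function(L, beta, V, df, level = 0.95) {
  est <- drop(L %*% beta)
  se <- sqrt(diag(L %*% V %*% t(L)))
  q <- qt(1 - (1 - level) / 2, df = df)
  cbind(Estimate = est, SE = se, Lower = est - q * se, Upper = est + q * se)
}

sens2mean <- 0   # as in the text, the average of sens2 is set to 0
conmat <- rbind("1"   = c(1, 0, sens2mean),
                "2"   = c(0, 1, sens2mean),
                "1-2" = c(1, -1, 0))
res <- contrast_ci(conmat, beta, V, df = 100)   # df = 100 is an assumption
res
```

Rows "1" and "2" are the LSMEANS of the two Homesize levels at the average of sens2, and row "1-2" is their difference, which does not involve sens2 at all.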
