The linear mixed model: modeling hierarchical and longitudinal data

Size: px

Start display at page:

Download "The linear mixed model: modeling hierarchical and longitudinal data"

Bertram Burns
6 years ago
Views:

1 The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44

2 Contents 1 Modeling Hierarchical Data Overview Example: the High School and Beyond data Further issues Modeling Longitudinal Data Example: Reaction times in a sleep deprivation study AED The linear mixed model: modeling hierarchical and longitudinal data 2 of 44

3 1 Modeling Hierarchical Data 1.1 Overview especially in education (think students in schools ), applications of mixed models to hierarchical data have become very popular when researchers in the social sciences talk about multilevel analysis, they typically mean the application of mixed models to hierarchical data typical for this type of data is the sharp distinction between levels : the student level, and the school level, where one level (student) is nested in the other (school) data often contains both student characteristics and school charateristics; in a multilevel analysis, both types of variables can be included in one analysis simultaneously AED The linear mixed model: modeling hierarchical and longitudinal data 3 of 44

4 1.2 Example: the High School and Beyond data the following example is borrowed from Raudenbusch and Bryk (2001, chapter 4). the data are from the 1982 High School and Beyond survey, and pertain to 7185 U.S. high-school students from 160 schools (70 catholic, 90 public) these are the variables in the dataset: school an ordered factor designating the school that the student attends. minrty a factor with 2 levels (not used) sx a factor with levels Male and Female (not used) ses a numeric vector of socio-economic scores mach a numeric vector of Mathematics achievement scores meanses a numeric vector of mean ses for the school sector a factor with levels Public and Catholic cses a numeric vector of centered ses values where the centering is with respect to the meanses for the school AED The linear mixed model: modeling hierarchical and longitudinal data 4 of 44

5 the aim of the analysis is to determine how students math achievement scores are related to their family socioeconomic status but this relationship may very well vary among schools if there is indeed variation among schools, can we find any school characteristics that explain this variation? the two school characteristics that we will use are: sector: public school or Catholic school meanses: the average SES of students in the school AED The linear mixed model: modeling hierarchical and longitudinal data 5 of 44

6 Exploring the data > library(mlmrev) > summary(hsb82) school minrty sx ses mach 2305 : 67 No :5211 Male :3390 Min. : Min. : : 66 Yes:1974 Female:3795 1st Qu.: st Qu.: : 65 Median : Median : : 64 Mean : Mean : : 64 3rd Qu.: rd Qu.: : 64 Max. : Max. : (Other):6795 meanses sector cses Min. : Public :3642 Min. :-3.651e+00 1st Qu.: Catholic:3543 1st Qu.:-4.479e-01 Median : Median : 1.600e-02 Mean : Mean :-9.144e-19 3rd Qu.: rd Qu.: 4.694e-01 Max. : Max. : 2.856e+00 > head(hsb82) school minrty sx ses mach meanses sector cses No Female Public No Female Public AED The linear mixed model: modeling hierarchical and longitudinal data 6 of 44

7 No Male Public No Male Public No Male Public No Male Public > tail(hsb82) school minrty sx ses mach meanses sector cses Yes Female Catholic No Female Catholic No Female Catholic No Female Catholic No Female Catholic No Female Catholic AED The linear mixed model: modeling hierarchical and longitudinal data 7 of 44

8 Exploring the data: math achievement versus ses for several schools > set.seed(1234) > library(lattice) > cat <- sample(unique(hsb82$school[hsb82$sector == "Catholic"]), + 18) > Cat.18 <- Hsb82[is.element(Hsb82$school, cat), ] > plot1 <- xyplot(mach ses school, data = Cat.18, main = "Catholic", + xlab = "student SES", ylab = "Math Achievement", layout = c(6, + 3), panel = function(x, y) { + panel.xyplot(x, y) + panel.lmline(x, y) + }) > pub <- sample(unique(hsb82$school[hsb82$sector == "Public"]), + 18) > Pub.18 <- Hsb82[is.element(Hsb82$school, pub), ] > plot2 <- xyplot(mach ses school, data = Pub.18, main = "Public", + xlab = "student SES", ylab = "Math Achievement", layout = c(6, + 3), panel = function(x, y) { + panel.xyplot(x, y) + panel.lmline(x, y) + }) AED The linear mixed model: modeling hierarchical and longitudinal data 8 of 44

9 Catholic student SES Math Achievement AED The linear mixed model: modeling hierarchical and longitudinal data 9 of 44

10 Public student SES Math Achievement AED The linear mixed model: modeling hierarchical and longitudinal data 10 of 44

11 Fitting a simple regression model for each school separately > cat.list <- lmlist(mach ses school, subset = sector == "Catholic", + data = Hsb82) > head(coef(cat.list)) (Intercept) ses > pub.list <- lmlist(mach ses school, subset = sector == "Public", + data = Hsb82) > head(coef(pub.list)) (Intercept) ses AED The linear mixed model: modeling hierarchical and longitudinal data 11 of 44

12 Catholic (Intercept) ses AED The linear mixed model: modeling hierarchical and longitudinal data 12 of 44

13 Public (Intercept) ses AED The linear mixed model: modeling hierarchical and longitudinal data 13 of 44

14 Intercepts Slopes Catholic Public Catholic Public AED The linear mixed model: modeling hierarchical and longitudinal data 14 of 44

15 Relationship intercepts and slopes and mean school SES Mean SES School Intercept Mean SES School Slope AED The linear mixed model: modeling hierarchical and longitudinal data 15 of 44

16 Model 1: a random-effects one-way ANOVA this is often called the empty model, since it containts no predictors, but simply reflects the nested structure no level-1 predictors, no level-2 predictors in the multilevel notation we specify a model for each level model for the first (student) level: y ij = α 0i + ɛ ij model for the second (school) level: α 0i = γ 00 + u 0i the combined model and the Laird-Ware form: y ij = γ 00 + u 0i + ɛ ij = β 0 + b 0i + ɛ ij AED The linear mixed model: modeling hierarchical and longitudinal data 16 of 44

17 this is an example of a random-effects one-way ANOVA model with one fixed effect (the intercept, β 0 ) representing the general population mean of math achievement, and two random effects: b 0i representing the deviation of math achievement in school i from the general mean ɛ ij representing the deviation of individual j s math achievement in school i from the school mean there are two variance components for this model: Var(b 0i ) = d 2 : the variance among school means Var(ɛ ij ) = σ 2 : the variance among individuals in the same school since b 0i and ɛ ij are assumed to be independent, the variation in math scores among individuals can be decomposed into these two variance components: Var(y ij ) = d 2 + σ 2 AED The linear mixed model: modeling hierarchical and longitudinal data 17 of 44

18 R code > fit.model1 <- lmer(mach 1 + (1 school), data = Hsb82) > summary(fit.model1) Linear mixed model fit by REML Formula: mach 1 + (1 school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. school (Intercept) Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) AED The linear mixed model: modeling hierarchical and longitudinal data 18 of 44

19 Intra-class correlation the intra-class correlation coefficient is the proportion of variation in individuals scores due to differences among schools: d 2 Var(y ij ) = d2 d 2 + σ 2 = ρ ρ may also be interpreted as the correlation between the math scores of two individuals from the same school: Cor(y ij, y ij ) = > s <- summary(fit.model1) > d2 <- as.numeric(s@remat[, 3])[1] > s2 <- as.numeric(s@remat[, 3])[2] > rho <- d2/(d2 + s2) > rho [1] Cov(y ij, y ij ) Var(yij ) Var(y ij ) = d2 d 2 + σ 2 = ρ AED The linear mixed model: modeling hierarchical and longitudinal data 19 of 44

20 about 18 percent of the variation in students match-achievement scores is attributable to differences among schools this is often called the cluster effect AED The linear mixed model: modeling hierarchical and longitudinal data 20 of 44

21 Model 2: a random-effects one-way ANCOVA 1 level-1 predictor (SES, centered within school), no level-2 predictors random intercept, no random slopes model for the first (student) level: y ij = α 0i + α 1i cses ij + ɛ ij model for the second (school) level: α 0i = γ 00 + u 0i α 1i = γ 10 (the random intercept) (the constant slope) the combined model and the Laird-Ware form: y ij = (γ 00 + u 0i ) + γ 10 cses ij + ɛ ij = γ 00 + γ 10 cses ij + u 0i + ɛ ij = β 0 + β 1 x 1ij + b 0i + ɛ ij AED The linear mixed model: modeling hierarchical and longitudinal data 21 of 44

22 the fixed-effect coefficients β 0 and β 1 represent the average within-schools population intercept and slope respectively note: because SES is centered within schools, the intercept β 0 represents the average level of math achievement in the population the model has two variance-covariance components: Var(b 0i ) = d 2 : the variance among school intercepts Var(ɛ ij ) = σ 2 : the error variance around the within-school regressions AED The linear mixed model: modeling hierarchical and longitudinal data 22 of 44

23 R code > fit.model2 <- lmer(mach 1 + cses + (1 school), data = Hsb82) > summary(fit.model2) Linear mixed model fit by REML Formula: mach 1 + cses + (1 school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. school (Intercept) Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) cses Correlation of Fixed Effects: (Intr) cses AED The linear mixed model: modeling hierarchical and longitudinal data 23 of 44

24 Model 3: a random-coefficients regression model 1 level-1 predictor (SES, centered within school), no level-2 predictors random intercept and random slopes model for the first (student) level: y ij = α 0i + α 1i cses ij + ɛ ij model for the second (school) level: α 0i = γ 00 + u 0i α 1i = γ 10 + u 1i (the random intercept) (the random slope) the combined model and the Laird-Ware form: y ij = (γ 00 + u 0i ) + (γ 10 + u 1i ) cses ij + ɛ ij = γ 00 + γ 10 cses ij + u 0i + u 1i cses ij + ɛ ij = β 0 + β 1 x 1ij + b 0i + b 0i z 1ij + ɛ ij AED The linear mixed model: modeling hierarchical and longitudinal data 24 of 44

25 the fixed-effect coefficients β 0 and β 1 again represent the average withinschools population intercept and slope respectively the model has four variance-covariance components: Var(b 0i ) = d 2 0: the variance among school intercepts Var(b 1i ) = d 2 1: the variance among school slopes Cov(b 0i, b 1i ) = d 01 : the covariance between within-school intercepts and slopes Var(ɛ ij ) = σ 2 : the error variance around the within-school regressions AED The linear mixed model: modeling hierarchical and longitudinal data 25 of 44

26 R code > fit.model3 <- lmer(mach 1 + cses + (1 + cses school), data = Hsb82) > summary(fit.model3) Linear mixed model fit by REML Formula: mach 1 + cses + (1 + cses school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. Corr school (Intercept) cses Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) cses Correlation of Fixed Effects: (Intr) cses AED The linear mixed model: modeling hierarchical and longitudinal data 26 of 44

27 Is the random slope necessary? R code > anova(fit.model2, fit.model3) Data: Hsb82 Models: fit.model2: mach 1 + cses + (1 school) fit.model3: mach 1 + cses + (1 + cses school) Df AIC BIC loglik Chisq Chi Df Pr(>Chisq) fit.model fit.model ** --- Signif. codes: 0 *** ** 0.01 * note: you can only compare two models using likelihood-ratio tests if the two models have the same fixed effects! AED The linear mixed model: modeling hierarchical and longitudinal data 27 of 44

28 Model 4: random intercept + level-2 predictor we drop the level-1 predictor (cses), but add a level-2 predictor (meanses) random intercept, no random slopes model for the first (student) level: model for the second (school) level: y ij = α 0i + ɛ ij α 0i = γ 00 + γ 01 meanses i + u 0i the combined model and the Laird-Ware form: y ij = γ 00 + γ 01 meanses i + u 0i + ɛ ij = β 0 + β 1 x 1ij + b 0i + ɛ ij note that in the Laird-Ware notation, we use a double index for the fixedeffects x ij, even if the variabele (meanses) does not change over students AED The linear mixed model: modeling hierarchical and longitudinal data 28 of 44

29 this model has two fixed effects (β 0 and β 1 ) and two random effects (b 0i and ɛ ij ) the interpretation of b 0i has changed: whereas in the previous model it had been the deviation of school i s mean from the grand mean, it now represents the residual (α 0i γ 00 γ 10 meanses j ); correspondingly, the variance component d 2 is now a conditional variance (conditional on the school mean SES) in Raudenbush and Bryk, this is called a Regression with means-as-outcomes model, because the school s mean (α 0i ) is predicted by the means SES of the school note in the following output that the residual variance between schools (2.64) is substantially smaller than the original (8.61) AED The linear mixed model: modeling hierarchical and longitudinal data 29 of 44

30 R code > fit <- lmer(mach 1 + meanses + (1 school), data = Hsb82) > summary(fit) Linear mixed model fit by REML Formula: mach 1 + meanses + (1 school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. school (Intercept) Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) meanses Correlation of Fixed Effects: (Intr) meanses AED The linear mixed model: modeling hierarchical and longitudinal data 30 of 44

31 Model 5: random intercept + level-1 predictor + level-2 predictor one level-1 predictor (cses), and a level-2 predictor (meanses) random intercept, nonrandomly varying slopes model for the first (student) level: y ij = α 0i + α 1i cses ij + ɛ ij model for the second (school) level: α 0i = γ 00 + γ 01 meanses i + u 0i α 1i = γ 10 + γ 11 meanses i the combined model and the Laird-Ware form: y ij = (γ 00 + γ 01 meanses i + u 0i ) + (γ 10 + γ 11 meanses i ) cses ij + ɛ ij = γ 00 + γ 01 meanses i + γ 10 cses ij + γ 11 meanses i cses ij + u 0i + ɛ ij = β 0 + β 1 x 1ij + β 2 x 2ij + β 3 (x 1ij x 2ij ) + b 0i + ɛ ij AED The linear mixed model: modeling hierarchical and longitudinal data 31 of 44

32 this model illustrates that in a mixed model, we combine predictors from both levels to construct an interaction term no random slopes are used in this model; the idea is that once we control for the level-2 predictor (meanses), little or no variance in the slopes remains to be explained; the slopes vary across schools, but as a strict function of meanses this results in a random-intercept only model AED The linear mixed model: modeling hierarchical and longitudinal data 32 of 44

33 R code > fit <- lmer(mach 1 + meanses * cses + (1 school), data = Hsb82) > summary(fit) Linear mixed model fit by REML Formula: mach 1 + meanses * cses + (1 school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. school (Intercept) Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) meanses cses meanses:cses Correlation of Fixed Effects: (Intr) meanss cses meanses cses meanses:css AED The linear mixed model: modeling hierarchical and longitudinal data 33 of 44

34 Model 6: intercepts-and-slopes-as-outcomes model we expand the model by including two level-2 predictors: meanses and sector; the slopes are again allowed to vary randomly model for the first (student) level: y ij = α 0i + α 1i cses ij + ɛ ij model for the second (school) level: α 0i = γ 00 + γ 01 meanses i + γ 02 sector i + u 0i α 1i = γ 10 + γ 11 meanses i + γ 12 sector i + u 1i AED The linear mixed model: modeling hierarchical and longitudinal data 34 of 44

35 the combined model and the Laird-Ware form: y ij = (γ 00 + γ 01 meanses i + γ 02 sector i + u 0i )+ (γ 10 + γ 11 meanses i + γ 12 sector i + u 1i ) cses ij + ɛ ij = γ 00 + γ 01 meanses i + γ 02 sector i + γ 10 cses ij + γ 11 meanses i cses ij + γ 12 sector i cses ij + u 0i + u 1i cses ij + ɛ ij = β 0 + β 1 x 1ij + β 2 x 2ij + β 3 x 3ij + β 4 (x 1ij x 3ij ) + β 5 (x 2ij x 3ij )+ b 0i + b 1i z 1ij + ɛ ij AED The linear mixed model: modeling hierarchical and longitudinal data 35 of 44

36 R code > fit.model6 <- lmer(mach 1 + meanses * cses + sector * cses + + (1 + cses school), data = Hsb82) > summary(fit.model6) Linear mixed model fit by REML Formula: mach 1 + meanses * cses + sector * cses + (1 + cses school) Data: Hsb82 AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. Corr school (Intercept) cses Residual Number of obs: 7185, groups: school, 160 Fixed effects: Estimate Std. Error t value (Intercept) meanses cses sectorcatholic meanses:cses cses:sectorcatholic Correlation of Fixed Effects: AED The linear mixed model: modeling hierarchical and longitudinal data 36 of 44

37 (Intr) meanss cses sctrct mnss:c meanses cses sectorcthlc meanses:css css:sctrcth Do we need the random slopes? > fit.model6bis <- lmer(mach 1 + meanses * cses + sector * cses + + (1 school), data = Hsb82) > anova(fit.model6bis, fit.model6) Data: Hsb82 Models: fit.model6bis: mach 1 + meanses * cses + sector * cses + (1 school) fit.model6: mach 1 + meanses * cses + sector * cses + (1 + cses school) Df AIC BIC loglik Chisq Chi Df Pr(>Chisq) fit.model6bis fit.model no: apparently, the level-2 predictors do a sufficiently good job of accounting for differences in slopes that the variance component for slopes is no longer needed AED The linear mixed model: modeling hierarchical and longitudinal data 37 of 44

38 1.3 Further issues modern software (for example lmer in R) can also handle partially nested data structures (for example, some students belong to several schools) you should always center level-1 covariates the standard hierarchical model can be extended to include heterogeneous level-1 variance terms in this case: each individual/cluster has a different variance for example: math achievement is more variable among boys than girls heterogeneity of the level-1 variance may be modelled as a function of predictors at both levels in practice, the log of the variance is modelled (to ensure positivity) the presence of heterogeneity of variance at level-1 is often due to misspecification of the level-1 model AED The linear mixed model: modeling hierarchical and longitudinal data 38 of 44

39 2 Modeling Longitudinal Data longitudinal data is very similar to hierarchical data: you typically have two levels: the timepoints (level-1) and the individuals/units (level-2) the focus of the analysis is often to find an appropriate polynomial (linear, quadratic, cubic,... ) that describes the evolution or growth curve of the data over time the components of that polynomial (intercept, slope, quadratic component,... ) may be constant (fixed) or varying (random) across individuals/units a complication in longitudinal data is that it may not be longer reasonable to assume that the level-1 errors ɛ ij are independent, since observations taken close in time on the same individual/units may well be more similar than observations farther apart in time; this is called auto-correlation in the world of linear mixed models, there is a delicate trade-off between adding more random effects to the model (implying more complicated patterns in the covariance matrix) or using a more complicated form for Σ i AED The linear mixed model: modeling hierarchical and longitudinal data 39 of 44

40 2.1 Example: Reaction times in a sleep deprivation study Dependent variable: the average reaction time per day (Reaction) for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. Days Number of days of sleep deprivation Subject Subject number on which the observation was made dataset is available in the lme4 package AED The linear mixed model: modeling hierarchical and longitudinal data 40 of 44

41 Exploring the data: the spaghetti plot 450 Average reaction time (ms) Days of sleep deprivation AED The linear mixed model: modeling hierarchical and longitudinal data 41 of 44

42 Exploring the data: linear regression for each subject Average reaction time (ms) Days of sleep deprivation AED The linear mixed model: modeling hierarchical and longitudinal data 42 of 44

43 A simple linear growth model > fit.linear <- lmer(reaction Days + (Days Subject), data = sleepstudy) > summary(fit.linear) Linear mixed model fit by REML Formula: Reaction Days + (Days Subject) Data: sleepstudy AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. Corr Subject (Intercept) Days Residual Number of obs: 180, groups: Subject, 18 Fixed effects: Estimate Std. Error t value (Intercept) Days Correlation of Fixed Effects: (Intr) Days > fit.quad <- lmer(reaction Days + I(Daysˆ2) + (Days Subject), AED The linear mixed model: modeling hierarchical and longitudinal data 43 of 44

44 + data = sleepstudy) > summary(fit.quad) Linear mixed model fit by REML Formula: Reaction Days + I(Daysˆ2) + (Days Subject) Data: sleepstudy AIC BIC loglik deviance REMLdev Random effects: Groups Name Variance Std.Dev. Corr Subject (Intercept) Days Residual Number of obs: 180, groups: Subject, 18 Fixed effects: Estimate Std. Error t value (Intercept) Days I(Daysˆ2) Correlation of Fixed Effects: (Intr) Days Days I(Daysˆ2) AED The linear mixed model: modeling hierarchical and longitudinal data 44 of 44

Performing Cluster Bootstrapped Regressions in R

Performing Cluster Bootstrapped Regressions in R Francis L. Huang / October 6, 2016 Supplementary material for: Using Cluster Bootstrapping to Analyze Nested Data with a Few Clusters in Educational and