Random coefficients models


enote 9

Contents

9 Random coefficients models
  9.1 Introduction
  9.2 Example: Constructed data
      9.2.1 Simple regression analysis
      9.2.2 Fixed effects analysis
      9.2.3 Two step analysis
      9.2.4 Random coefficient analysis
  9.3 Example: Consumer preference mapping of carrots
  9.4 Random coefficient models in perspective
  9.5 R-TUTORIAL: Constructed data
  9.6 R-TUTORIAL: Consumer preference mapping of carrots
  9.7 Exercises

9.1 Introduction

Random coefficient models emerge as natural mixed model extensions of simple linear regression models in a hierarchical (nested) data setup. In the standard situation, we are interested in the relationship between x and y. Assume we have observations

(x_1, y_1), ..., (x_n, y_n) for a subject. Then we would fit the linear regression model, given by

    y_j = α + β x_j + ε_j

Assume next that such regression data are available on a number of subjects. Then a model that expresses different regression lines for each subject is expressed by:

    y_ij = α_i + β_i x_ij + ε_ij

or, using the more general notation:

    y_i = α(subject_i) + β(subject_i) x_i + ε_i      (9-1)

This model has the same structure as the different slopes ANCOVA model of the previous enote, only now the regression relationships are in focus.

Assume finally that the interest lies in the average relationship across subjects. A commonly used ad hoc approach is to employ a two-step procedure:

1. Carry out a regression analysis for each subject.
2. Do subsequent calculations on the parameter estimates from these regression analyses to obtain the average slope (and intercept) and their standard errors.

Since the latter treats the subjects as a random sample, it is natural to incorporate this in the model by assuming the subject effects (intercepts and slopes) to be random:

    y_i = a(subject_i) + b(subject_i) x_i + ε_i

where

    a(k) ~ N(α, σ_a²),   b(k) ~ N(β, σ_b²),   ε_i ~ N(0, σ²)

and where k = 1, ..., K, with K being the number of subjects. The parameters α and β are the unknown population values for the intercept and slope. This is a mixed model, although a few additional considerations are required to identify the typical mixed model expression. The expected value is

    E(y_i) = α + β x_i

and the variance is

    Var(y_i) = σ_a² + σ_b² x_i² + σ²

So an equivalent way of writing the model is the following, where the fixed and the random parts are split:

    y_i = α + β x_i + a(subject_i) + b(subject_i) x_i + ε_i      (9-2)

where

    a(k) ~ N(0, σ_a²),   b(k) ~ N(0, σ_b²),   ε_i ~ N(0, σ²)      (9-3)

Now the linear mixed model structure is apparent. Although we do not always explicitly state this, there is the additional assumption that the random effects a(k), b(k) and ε_i are mutually independent. For randomly varying lines (a(k), b(k)) in the same x-domain this may be an unreasonable assumption, since the slope and intercept values may very well be related to each other. It is possible to extend the model to allow for such a correlation/covariance between the intercept and slope by assuming a bivariate normal distribution for each set of line parameters:

    (a(k), b(k)) ~ N(0, Σ),   Σ = [ σ_a²   σ_ab
                                    σ_ab   σ_b² ],   ε_i ~ N(0, σ²)      (9-4)

The model given by (9-2) and (9-4) is the standard random coefficient mixed model.

9.2 Example: Constructed data

To illustrate the basic principles we start with two constructed data sets of 100 observations of y for 10 different x-values, see Figure 9.1. They show that a raw scatter plot of a data set can hide quite different structures if the data are in fact hierarchical (repeated observations on each individual rather than exactly one observation per individual).

9.2.1 Simple regression analysis

Had the data NOT been hierarchical, but instead observations on 100 subjects, a simple regression analysis, corresponding to the model

    y_i = α + β x_i + ε_i      (9-5)

where ε_i ~ N(0, σ²), i = 1, ..., 100, would be a reasonable approach. For comparison we state the results of such an analysis for the two data sets. The parameter estimates are:

                   Data 1                     Data 2
    Parameter   Estimate  SE  P-value      Estimate  SE  P-value
    σ              …       …                  …       …
    α              …       …      …           …       …      …
    β              …       …   <0.0001        …       …   <0.0001
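Hierarchical data of this kind are easy to construct by simulation, which also gives a direct check of the variance formula Var(y_i) = σ_a² + σ_b² x_i² + σ² from the introduction. A minimal sketch in Python; all parameter values are made up for illustration:

```python
import random
import statistics

# Simulate the random coefficient model (9-2)/(9-3) at a single x-value and
# compare the empirical variance of y with the theoretical formula.
# All parameter values below are made up for illustration.
random.seed(2)
alpha, beta = 10.0, 1.0                  # population intercept and slope
sigma_a, sigma_b, sigma = 4.0, 0.5, 0.3  # variance components (as std. devs.)
x = 5.0                                  # evaluate the variance at this x

ys = []
for _ in range(100_000):
    a = random.gauss(0.0, sigma_a)       # subject-specific intercept deviation
    b = random.gauss(0.0, sigma_b)       # subject-specific slope deviation
    eps = random.gauss(0.0, sigma)       # residual error
    ys.append(alpha + beta * x + a + b * x + eps)

theory = sigma_a**2 + sigma_b**2 * x**2 + sigma**2   # 16 + 6.25 + 0.09 = 22.34
empirical = statistics.pvariance(ys)
```

With 100,000 draws the empirical variance lands close to the theoretical value 22.34, illustrating how the slope variance contribution grows with x².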

Figure 9.1: Constructed data. Top: data set 1, bottom: data set 2. Left: raw scatter plot with simple regression line; middle: individual patterns; right: individual lines.

See Figure 9.1 (left) for the estimated lines.

9.2.2 Fixed effects analysis

If we had special interest in the 10 subjects, a fixed effects analysis corresponding to model (9-1) could be carried out. The F-tests and P-values from the Type I (successive) ANOVA tables become:

                     Data set 1         Data set 2
    Source      DF    F    P-value       F    P-value
    x            …    …    <.0001        …    <.0001
    subject      …    …    <.0001        …      …
    x*subject    …    …    <.0001        …      …

For data set 1 the slopes are clearly different, whereas for data set 2 the slopes can be assumed equal but the intercepts (subjects) are different. Although it is usually recommended to rerun the analysis without an insignificant interaction effect, the Type I table shows that the result of this will clearly be that the subject (intercept) effect is significant for data set 2, cf. the discussion of Type I/Type III tables in enote 3.

So for data set 1, the (fixed effect) story is told by providing the 10 intercept and slope estimates, and/or possibly as described for the different slopes ANCOVA model in the previous enote. For data set 2, an equal slopes ANCOVA model can be used to summarize the results. The common slope and error variance estimates are:

    β̂ = …,   SE(β̂) = …,   σ̂² = …

The confidence band for the common slope, using the 89 error degrees of freedom, becomes

    β̂ ± t_0.975(89) SE(β̂)

which, since t_0.975(89) = 1.987, gives [0.9279, …]. The subjects could be described and compared as for the common slopes ANCOVA model of the previous enote.

9.2.3 Two step analysis

If the interest is NOT in the individual subjects but rather in the average line, then a natural ad hoc approach is simply to start by calculating the individual intercepts and slopes, and then subsequently treat those as simple random samples and calculate average, variance and standard error to obtain confidence limits for the population average values. So, for example, for the slopes we have β̂_1, ..., β̂_10 and calculate the average

    β̄ = (1/10) Σ_{i=1}^{10} β̂_i,

the variance

    s²_β̂ = (1/9) Σ_{i=1}^{10} (β̂_i − β̄)²

and the standard error

    SE(β̄) = s_β̂ / √10

to obtain the 95% confidence interval (using that t_0.975(9) = 2.26):

    β̄ ± 2.26 SE(β̄)

The variances for data set 1 are:

    s²_α̂ = …,   s²_β̂ = …

and for data set 2:

    s²_α̂ = …,   s²_β̂ = …

The results for the intercepts and slopes for the two data sets are given in the following table:

               Data set 1       Data set 2
                α      β         α      β
    Average     …      …         …      …
    SE          …      …         …      …
    Lower       …      …         …      …
    Upper       …      …         …      …

Note that for data set 2, the standard error for the slope is almost identical to the standard error from the fixed effect equal slopes model above. However, due to the smaller number of degrees of freedom, 9 instead of 89, the confidence band is somewhat wider here. This reflects the difference in interpretation: in the fixed effects analysis, β estimates the common slope for these specific 10 subjects. Here the estimate is of the population average slope (the population from which these 10 subjects were sampled). This distinction does not alter the estimate itself, but does change the statistical inference that is made.

Note, by the way, that for estimating the individual lines it makes no difference whether an overall different slopes model is used or 10 individual ("small") regression models fitted separately. Although not used further, the observed correlation between the intercepts and slopes in each case can be found:

    corr_1 = 0.382,   corr_2 = 0.655
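A small calculation shows why the two-step variances tend to overshoot the underlying variance components. Each per-subject OLS slope is the true subject slope plus estimation noise with variance σ²/S_xx, so the empirical variance of the estimated slopes targets σ_b² + σ²/S_xx rather than σ_b² itself. A Python sketch; the x-values 1, ..., 10 and the variance components are assumptions chosen only for illustration:

```python
# Each per-subject OLS slope estimate is b_hat_k = beta + b_k + noise, with
# Var(noise) = sigma^2 / Sxx. The empirical variance across subjects of the
# b_hat_k values therefore estimates sigma_b^2 + sigma^2 / Sxx.
xs = list(range(1, 11))                     # ten x-values per subject (assumed)
xbar = sum(xs) / len(xs)
Sxx = sum((x - xbar) ** 2 for x in xs)      # sum of squares of x, here 82.5

sigma_b2 = 0.25                             # true slope variance (made up)
sigma2 = 4.0                                # residual variance (made up)

# What the two-step empirical slope variance is actually estimating:
expected_twostep_var = sigma_b2 + sigma2 / Sxx
```

The overshoot term σ²/S_xx is small when the residual variance is small, which matches the behaviour seen for the two constructed data sets in the next subsection.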

9.2.4 Random coefficient analysis

The results of fitting the random coefficient model given by (9-2) and (9-4) to each data set are given in the following table:

               Data set 1       Data set 2
                α      β         α      β
    Estimate    …      …         …      …
    SE          …      …         …      …
    Lower       …      …         …      …
    Upper       …      …         …      …

Note that this table is an exact copy of the result table for the two-step analysis above! The parameters of the variance part of the mixed model for data set 1 are estimated at (read off from the R output):

    σ̂_a = 4.031,   σ̂_b = 0.496,   ρ̂_ab = 0.38,   σ̂ = …

which corresponds to the following variances:

    σ̂_a² = 16.25,   σ̂_b² = 0.246,   σ̂² = …

and for data set 2:

    σ̂_a = 1.086,   σ̂_b = 0.147,   ρ̂_ab = 1.00,   σ̂ = …

which corresponds to the following variances:

    σ̂_a² = 1.18,   σ̂_b² = 0.022,   σ̂² = …

Compare with the variances calculated in the two-step procedure: for data set 1, the random coefficient model estimates are slightly smaller, whereas for data set 2, they are considerably smaller. This makes good sense, as the variances in the two-step procedure also include some additional variation due to the residual error variance (just like the mean squares in a standard hierarchical model). For data set 1, this residual error variance is estimated at a very small value (0.0732), whereas for data set 2 it is considerably larger. This illustrates how the random coefficient model provides the proper story about what is going on, and directly distinguishes between the two quite different situations exemplified here.

Note also that for data set 1, the correlation estimate ρ̂_ab = 0.38 is close to the observed correlation calculated in the two-step procedure. However, for data set 2 the estimated

correlation becomes ρ̂_ab = 1! This obviously makes no sense. We encounter a situation similar to the negative variance problem discussed previously: the correlation may become meaningless when some of the variances are estimated to be very small, which is the case for the slope variance here. To put it differently, for data set 2 the model we have specified includes components (in the variance) that are not actually present in the data. We already knew this, since the equal slopes model was a reasonable description of these data. In the random coefficient framework the equal slopes model is expressed by

    y_i = α + β x_i + a(subject_i) + ε_i      (9-6)

where

    a(k) ~ N(0, σ_a²),   ε_i ~ N(0, σ²)      (9-7)

The adequacy of this model can be tested by a residual likelihood ratio test, cf. enote 5. For data set 2 we obtain

    G = −2 l_REML,1 − (−2 l_REML,2) = 0.65

which is non-significant using a χ² distribution with 2 degrees of freedom. For data set 1 the similar test statistic becomes

    G = −2 l_REML,1 − (−2 l_REML,2) = …

which is extremely significant.

For data set 2 the conclusions should be based on the equal slopes model given by (9-6) and (9-7), and we obtain the following:

                Data set 2
                α      β
    Estimate    …      …
    SE          …      …
    Lower       …      …
    Upper       …      …

We see a minor change in the confidence bands: believing in equal slopes increases the (estimated) precision (smaller confidence interval) for this slope, whereas the precision of the average intercept decreases.
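The REML likelihood ratio tests above are easy to evaluate by hand, because for a χ² distribution with 2 degrees of freedom the survival function has the closed form P(χ²₂ > g) = exp(−g/2). A quick check in Python:

```python
import math

# For 2 degrees of freedom the chi-square distribution is an exponential
# distribution with mean 2, so P(chi2_2 > g) = exp(-g/2).
def chi2_df2_pvalue(g):
    """P-value for a likelihood ratio statistic g on 2 degrees of freedom."""
    return math.exp(-g / 2.0)

# Test statistic for data set 2 from the text: clearly non-significant.
p_data2 = chi2_df2_pvalue(0.65)   # about 0.72
```

The value of about 0.72 confirms that the reduction to the equal slopes model is acceptable for data set 2.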

9.3 Example: Consumer preference mapping of carrots

In a consumer study, 103 consumers scored their preference for 12 Danish carrot types on a scale from 1 to 7. The carrots were harvested in autumn 1996 and tested in March. A number of background variables were recorded for each consumer; see the data description in enote 13 for details. Data are available in the carrots.txt file.

The aim of a so-called external preference mapping is to find the sensory drivers of consumer preference behaviour and to investigate whether these differ between segments of the population. To do this, in addition to the consumer survey, the carrot products are evaluated by a trained panel of tasters, the sensory panel, with respect to a number of sensory (taste, odour and texture) properties. Since usually a high number of (correlated) properties (variables) are used, in this case 14, it is common procedure to use a few, often 2, combined variables that contain as much of the information in the sensory variables as possible. This is achieved by extracting the first two principal components in a principal components analysis (PCA) on the product-by-property panel average data matrix. PCA is a commonly used multivariate technique to explore and/or decompose high dimensional data. We call these two variables sens1 and sens2, and they are given by

    sens1_i = Σ_{j=1}^{14} a_j v_j^i   and   sens2_i = Σ_{j=1}^{14} b_j v_j^i

where v_1^i, ..., v_14^i are the 14 average sensory scores for carrot product i, and the coefficients a_j and b_j defining the two combined sensory variables are as depicted in Figure 9.2. So sens1 is a variable that (primarily) measures bitterness vs. nutty taste, whereas sens2 measures sweetness (and related properties).
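The combined sensory variables are nothing more than weighted sums of the 14 panel-average scores. A small Python sketch with made-up loadings and scores for a single product; the real a_j and b_j are the PCA loadings shown in Figure 9.2:

```python
# sens1_i = sum_j a_j * v_j^i and sens2_i = sum_j b_j * v_j^i for one product.
# The loadings and scores below are invented purely for illustration.
loadings_a = [0.4, -0.3, 0.2, 0.1, -0.2, 0.3, 0.1, -0.1, 0.2, -0.4, 0.3, 0.2, -0.1, 0.1]
loadings_b = [0.1, 0.4, -0.2, 0.3, 0.2, -0.1, 0.4, 0.1, -0.3, 0.2, 0.1, -0.2, 0.3, 0.1]
scores = [5.1, 3.2, 4.0, 2.8, 3.9, 4.4, 3.1, 2.5, 3.6, 4.8, 2.9, 3.3, 4.1, 3.0]

# The combined variables are plain dot products of loadings and scores:
sens1 = sum(a * v for a, v in zip(loadings_a, scores))
sens2 = sum(b * v for b, v in zip(loadings_b, scores))
```

Repeating this for each of the 12 products gives the two covariates used in the preference models below.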
The actual preference mapping is carried out by first fitting regression models for the preference as a function of the sensory variables for each individual consumer, using the 12 observations across the carrot products. Next, the individual regression coefficients are investigated, often in an explorative manner where a scatter plot is used to look for a possible segmentation of the consumers based on these regression coefficients. Instead of looking for segmentation ("cluster analysis"), we investigate whether we see any differences with respect to the background variables in the data, e.g. the gender or homesize (number of persons in the household).

Let y_i be the ith preference score. The natural model for this is a model that expresses randomly varying individual relations to the sensory variables, but with average (expected) values that may depend on the homesize. Let us consider the factor structure of the setting. The basic setting is a randomized block experiment with 12 treatments (carrot products), the factor prod, and 103 blocks

(consumers), the factor cons. Homesize (size) is a factor that partitions the consumers into two groups: those with a homesize of 1 or 2, and those with a larger homesize. So the factor cons is nested within size, or equivalently, size is coarser than cons. This basic structure is depicted in Figure 9.3.

Figure 9.2: Loadings plot for PCA of sensory variables: scatter plot of coefficients b_j versus a_j. The attributes shown are sweet_ta, fruit_ta, nut_ta, carrot_af, juicy, colour, bitter_ta, car_od, bitter_af, earty_od, crisp, earthy_ta, transp and hard.

The linear effect of the sensory variables is a part of the prod effect, since these covariates are on product level. So they are both coarser than the product effect. The sensory

variables in the model will therefore explain some of the product differences. Including prod in the model as well will enable us to test whether the sensory variables can explain all the product differences. As we do not expect this to be the case, we adopt the point of view that the 12 carrot products are a random sample from the population of carrot products in Denmark; that is, the product effect is considered a random effect. In other words, we consider the deviations of the product variation from what can be explained by the regression on the sensory variables as random variation. Finally, the interactions between homesize and the sensory variables should enter the model as fixed effects, allowing for different average slopes for the two homesizes, leading to the model given by

    y_i = α(size_i) + β_1(size_i) sens1_i + β_2(size_i) sens2_i + a(cons_i)
          + b_1(cons_i) sens1_i + b_2(cons_i) sens2_i + d(prod_i) + ε_i      (9-8)

where

    a(k) ~ N(0, σ_a²),   b_1(k) ~ N(0, σ_b1²),   b_2(k) ~ N(0, σ_b2²),   k = 1, ..., 103      (9-9)

and

    d(prod_i) ~ N(0, σ_P²),   ε_i ~ N(0, σ²)      (9-10)

Figure 9.3: The factor structure diagram for the carrots data (the factors [I], [cons], [prod] and size with their levels).

To finish the specification of a general random coefficient model, we need the assumption of possible correlations between the random coefficients:

    (a(k), b_1(k), b_2(k)) ~ N(0, Σ),   Σ = [ σ_a²    σ_ab1   σ_ab2
                                              σ_ab1   σ_b1²   σ_b1b2
                                              σ_ab2   σ_b1b2  σ_b2² ]      (9-11)

Before studying the fixed effects, the variance part of the model is investigated further. We give details in the R-TUTORIAL section on how we end up simplifying this 8-parameter variance model down to a 5-parameter variance model, where the σ_b1² parameter and the two related correlations can be tested non-significant. The model therefore reduces to:

    y_i = α(size_i) + β_1(size_i) sens1_i + β_2(size_i) sens2_i + a(cons_i)
          + b_2(cons_i) sens2_i + d(prod_i) + ε_i      (9-12)

where

    (a(k), b_2(k)) ~ N(0, [ σ_a²    σ_ab2
                            σ_ab2   σ_b2² ]),   k = 1, ..., 103      (9-13)

and

    d(prod_i) ~ N(0, σ_P²),   ε_i ~ N(0, σ²)      (9-14)

With this variance structure, we investigate the fixed effects. Successively removing insignificant terms, we find that the following final model is an appropriate description of the data:

    y_i = α(size_i) + β_2 sens2_i + a(cons_i) + b_2(cons_i) sens2_i + d(prod_i) + ε_i      (9-15)

where

    (a(k), b_2(k)) ~ N(0, [ σ_a²    σ_ab2
                            σ_ab2   σ_b2² ]),   k = 1, ..., 103      (9-16)

and

    d(prod_i) ~ N(0, σ_P²),   ε_i ~ N(0, σ²)      (9-17)

Estimates of the variance parameters are given in the following table:

    σ̂_a    σ̂_b2    ρ̂_ab    σ̂_P    σ̂
     …       …       …       …      …

The conclusions regarding the relation between the preference and the sensory variables are that no significant relation was found to sens1, but indeed one to sens2. The relation does not depend on the homesize and is estimated (with 95% confidence interval) at:

    β̂_2 = 0.071,   [0.033, 0.107]

So two products with a difference of 10 in the 2nd sensory dimension (this is the span in the data set) are expected to differ in average preference by between 0.33 and 1.07. Sweet products are preferred to non-sweet products, cf. Figure 9.2 above.

The expected values for the two homesizes (for an average product) and their difference are estimated at:

    α̂(1) + β̂_2 sens2 = 4.91,   [4.73, 5.09]
    α̂(2) + β̂_2 sens2 = 4.66,   [4.47, 4.85]
    α̂(1) − α̂(2) = 0.25,   [0.04, 0.46]

So homes with more persons tend to have a slightly lower preference in general for such carrot products.

9.4 Random coefficient models in perspective

Although the factor structure diagrams, with all the features of finding expected mean squares and degrees of freedom, are only strictly valid for balanced designs and models with no quantitative covariates, they may still be useful as a more informal structure visualization tool for these non-standard situations.

The setting with hierarchical regression data is really an example of what could also be characterized as repeated measures data. A common situation is that repeated measurements on a subject (animal, plant, sample) are taken over time; such data are also known as longitudinal data. So, apart from appearing as natural extensions of fixed regression models, the random coefficient models are one option for analyzing repeated measures data. The simple models can be extended to polynomial models to cope with non-linear structures in the data. Also, additional residual correlation structures can be incorporated.
In enotes 11 and 12 a thorough treatment of repeated measures data is given, with a number of different methods, simple as well as more complex.

9.5 R-TUTORIAL: Constructed data

The constructed data are available in the file randcoef.txt.

The simple linear regression analyses of the two responses y1 and y2 in the data set randcoef are obtained using lm:

    randcoef <- read.table("randcoef.txt", sep=",", header=TRUE)
    randcoef$subject <- factor(randcoef$subject)
    model1y1 <- lm(y1 ~ x, data = randcoef)
    model1y2 <- lm(y2 ~ x, data = randcoef)

The parameter estimates with corresponding standard errors in the two models are:

    coef(summary(model1y1))

                Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    …          …          …      …e-22
    x              …          …          …      …e-09

    coef(summary(model1y2))

                Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    …          …          …      …e-12
    x              …          …          …      …e-11

The raw scatter plots for the data with superimposed regression lines are obtained using the plot and abline functions:

    par(mfrow=c(1, 2))
    with(randcoef, {
      plot(x, y1, las=1)
      abline(model1y1)
      plot(x, y2, las=1)
      abline(model1y2)
    })

    par(mfrow=c(1, 1))

The individual patterns in the data can be seen from the next plot:

    par(mfrow=c(1, 2))
    with(randcoef, {
      plot(x, y1, las=1)
      for (i in 1:10) lines(x[subject==i], y1[subject==i], lty=i)
      plot(x, y2, las=1)
      for (i in 1:10) lines(x[subject==i], y2[subject==i], lty=i)
    })

    par(mfrow=c(1, 1))

The function lines connects points with line segments. Notice how the repetitive plotting is solved using a for loop: for each i between 1 and 10 the relevant subset of the data is plotted with a line type that changes as the subject changes. Alternatively, we could have written 10 separate calls to lines for each response.

The fixed effects analysis with the two resulting (type III) ANOVA tables is:

    model2y1 <- lm(y1 ~ x*subject, data = randcoef)
    model2y2 <- lm(y2 ~ x*subject, data = randcoef)
    library(car)  # for Anova
    Anova(model2y1, type=3)

    Anova Table (Type III tests)

    Response: y1

                 Sum Sq  Df  F value    Pr(>F)
    (Intercept)    …      …     …      < 2.2e-16 ***
    x              …      …     …      < 2.2e-16 ***
    subject        …      …     …      < 2.2e-16 ***
    x:subject      …      …     …      < 2.2e-16 ***
    Residuals      …      …
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Anova(model2y2, type=3)

    Anova Table (Type III tests)

    Response: y2
                 Sum Sq  Df  F value  Pr(>F)
    (Intercept)    …      …     …       …   *
    x              …      …     …       …   *
    subject        …      …     …       …
    x:subject      …      …     …       …
    Residuals      …      …
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A plot of the data with individual regression lines based on model2y1 and model2y2 is again produced using a for loop. First we fit the two models in a different parameterisation (to obtain the estimates in a convenient form of one intercept and one slope per subject):

    model3y1 <- lm(y1 ~ subject - 1 + x:subject, data = randcoef)
    model3y2 <- lm(y2 ~ subject - 1 + x:subject, data = randcoef)

The plots are produced using:

    par(mfrow=c(1, 2))
    with(randcoef, {
      plot(x, y1, las=1)
      for (i in 1:10) abline(coef(model3y1)[c(i, i+10)], lty=i)
      plot(x, y2, las=1)
      for (i in 1:10) abline(coef(model3y2)[c(i, i+10)], lty=i)
    })

    par(mfrow=c(1, 1))

Explanation: remember that coef extracts the parameter estimates. The first 10 estimates will be the intercept estimates and the next 10 will be the slope estimates. Thus the component pairs (1, 11), (2, 12), ..., (10, 20) will belong to the subjects 1, 2, ..., 10, respectively. This is exploited in the for loop in the part [c(i, i+10)], which produces these pairs as i runs from 1 to 10.

The equal slopes model for the second data set with parameter estimates is:

    model4y2 <- lm(y2 ~ subject + x, data = randcoef)
    coef(summary(model4y2))

                Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)    …          …          …      …e-04
    subject2       …          …          …      …e-02
    subject3       …          …          …      …e-01
    subject4       …          …          …      …e-01
    subject5       …          …          …      …e-02
    subject6       …          …          …      …e-01
    subject7       …          …          …      …e-01
    subject8       …          …          …      …e-01
    subject9       …          …          …      …e-03
    subject10      …          …          …      …e-01
    x              …          …          …      …e-13

The summary of the two step analysis can be obtained by applying the functions mean and sd (computing the empirical mean and standard deviation of a vector, respectively) to the vector of intercept estimates and to the vector of slope estimates (from the different slopes models). Here it is shown for data set 1; data set 2 is handled similarly:

    ainty1 <- mean(coef(model3y1)[1:10])
    sdinty1 <- sd(coef(model3y1)[1:10])/sqrt(10)
    uinty1 <- ainty1 + 2.26 * sdinty1
    linty1 <- ainty1 - 2.26 * sdinty1
    asloy1 <- mean(coef(model3y1)[11:20])
    sdsloy1 <- sd(coef(model3y1)[11:20])/sqrt(10)
    usloy1 <- asloy1 + 2.26 * sdsloy1
    lsloy1 <- asloy1 - 2.26 * sdsloy1

The correlations between the intercepts and the slopes in the two data sets are computed using cor:

    cor(coef(model3y1)[1:10], coef(model3y1)[11:20])

    [1] …

    cor(coef(model3y2)[1:10], coef(model3y2)[11:20])

    [1] …

The random coefficients analysis is done with lmer. The different slopes random coefficient model is:

    library(lmerTest)
    model5y1 <- lmer(y1 ~ x + (1 + x | subject), data = randcoef)
    model5y2 <- lmer(y2 ~ x + (1 + x | subject), data = randcoef)

The random part of the model specification, (1 + x | subject), specifies that the regression model 1 + x, i.e. an intercept and a slope for x, should be allowed for each subject. This corresponds to the random part in formula (9-2). The (fixed effects) parameter estimates and their standard errors are obtained from the model summary:

    coef(summary(model5y1))

                Estimate  Std. Error  df  t value  Pr(>|t|)
    (Intercept)    …          …       …      …      …e-05
    x              …          …       …      …      …e-04

    coef(summary(model5y2))

                Estimate  Std. Error  df  t value  Pr(>|t|)
    (Intercept)    …          …       …      …      …e-08
    x              …          …       …      …      …e-09

The variance parameter estimates, including the correlation between intercept and slope, are obtained using:

    VarCorr(model5y1)

    Groups   Name        Std.Dev.  Corr

    subject  (Intercept)    …
             x              …        …
    Residual                …

    VarCorr(model5y2)

    Groups   Name        Std.Dev.  Corr
    subject  (Intercept)    …
             x              …        …
    Residual                …

The equal slopes models within the random coefficient framework are specified as:

    model6y1 <- lmer(y1 ~ x + (1 | subject), data = randcoef)
    model6y2 <- lmer(y2 ~ x + (1 | subject), data = randcoef)

Likelihood ratio tests for the reduction from different slopes to equal slopes can be obtained using anova with two lmer objects as arguments (the first argument (model) is less general than the second argument (model)):

    anova(model6y1, model5y1, refit=FALSE)

    Data: randcoef
    Models:
    object: y1 ~ x + (1 | subject)
    ..1: y1 ~ x + (1 + x | subject)
           Df  AIC  BIC  logLik  deviance  Chisq  Chi Df  Pr(>Chisq)
    object  …    …    …     …        …
    ..1     …    …    …     …        …       …       2    < 2.2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(model6y2, model5y2, refit=FALSE)

    Data: randcoef
    Models:

    object: y2 ~ x + (1 | subject)
    ..1: y2 ~ x + (1 + x | subject)
           Df  AIC  BIC  logLik  deviance  Chisq  Chi Df  Pr(>Chisq)
    object  …    …    …     …        …
    ..1     …    …    …     …        …      0.65     2        …

Confidence intervals for the final model for y2 are:

    pr <- profile(model6y2, which=1:2, signames=FALSE)
    confint(pr)

                             2.5 %  97.5 %
    sd_(Intercept)|subject     …      …
    sigma                      …      …

For y1, the profile function for the current version of the lme4 package (version …) generates numerous warning messages and ends up not converging, so we do not show any results:

    pr <- profile(model6y1, which=1:2, signames=FALSE)
    confint(pr)

An alternative is to compute simulation based bootstrap confidence intervals:

    ci <- confint(model5y1, parm=1:4, method="boot", nsim=1000, oldnames=FALSE)

    Computing bootstrap confidence intervals ...

    ci

                                 2.5 %  97.5 %
    sd_(Intercept)|subject         …      …
    cor_x.(Intercept)|subject      …      …
    sd_x|subject                   …      …
    sigma                          …      …

Here we use nsim=1000 simulations or bootstrap samples, but in practice we should use 10 or 100 times as many for a reasonable accuracy. Note also that the confidence limits will vary from run to run, as different random numbers are simulated.

The (fixed effects) parameter estimates for the final model for data set 2 are:

    coef(summary(model6y2))

                Estimate  Std. Error  df  t value  Pr(>|t|)
    (Intercept)    …          …       …      …      …e-08
    x              …          …       …      …        …

9.6 R-TUTORIAL: Consumer preference mapping of carrots

Data are available in the file carrots.txt:

    carrots <- read.table("carrots.txt", header = TRUE, sep = ",")
    carrots <- within(carrots, {
      Homesize <- factor(Homesize)
      Consumer <- factor(Consumer)
      product  <- factor(product)
    })

Recall that the most general model, (9-8) to (9-11), states that for each level of Consumer the random intercept and the random slopes of sens1 and sens2 are correlated in an arbitrary way (the specification in (9-11)). This model can be specified as follows:

    model1 <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
                     Homesize * sens2 + (1 | product) +
                     (1 + sens1 + sens2 | Consumer), data=carrots)
    print(summary(model1), corr=FALSE)

    Linear mixed model fit by REML
    t-tests use Satterthwaite approximations to degrees of freedom [lmerMod]
    Formula: Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
        Homesize * sens2 + (1 | product) + (1 + sens1 + sens2 | Consumer)

    Data: carrots

    REML criterion at convergence: …

    Scaled residuals:
        Min      1Q  Median      3Q     Max
         …       …      …        …       …

    Random effects:
     Groups   Name        Variance  Std.Dev.  Corr
     Consumer (Intercept)    …         …
              sens1          …         …       …
              sens2          …         …       …    …
     product  (Intercept)    …         …
     Residual                …         …
    Number of obs: 1233, groups: Consumer, 103; product, 12

    Fixed effects:
                    Estimate  Std. Error  df  t value  Pr(>|t|)
    (Intercept)        …          …       …      …     < 2e-16 ***
    Homesize3          …          …       …      …        …    *
    sens1              …          …       …      …        …
    sens2              …          …       …      …        …    **
    Homesize3:sens1    …          …       …      …        …
    Homesize3:sens2    …          …       …      …        …
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The random part deserves some explanation. The structure (9-11) amounts to the term (1 + sens1 + sens2 | Consumer), where for each consumer we fit an intercept and two slopes, one for each of sens1 and sens2. Further, these three terms are allowed to be arbitrarily correlated. In addition there is the random effect for product.

There are two relevant sub-models to consider in order to assess or simplify the random-effects structure of the model. The first sub-model would reduce the general random-effects structure for Consumer from (1 + sens1 + sens2 | Consumer) to (1 + sens1 | Consumer); the other sub-model would reduce it to (1 + sens2 | Consumer). We can assess each of these with likelihood ratio tests, as exemplified here for

the first sub-model:

    model2 <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
                     Homesize * sens2 + (1 | product) +
                     (1 + sens1 | Consumer), data=carrots)
    anova(model1, model2, refit=FALSE)

    Data: carrots
    Models:
    ..1: Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize *
    ..1:     sens2 + (1 | product) + (1 + sens1 | Consumer)
    object: Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize *
    object:     sens2 + (1 | product) + (1 + sens1 + sens2 | Consumer)
           Df  AIC  BIC  logLik  deviance  Chisq  Chi Df  Pr(>Chisq)
    ..1     …    …    …     …        …
    object  …    …    …     …        …       …       3        …   *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This test is significant, so leaving out sens2 is not warranted. Note that this test is on 3 degrees of freedom: the variance for sens2 and the two covariances with the intercept and sens1.

The rand function in the lmerTest package automates the likelihood ratio tests of random-effects terms and provides an ANOVA-like summary table:

    rand(model1)

    Analysis of Random effects Table:
                    Chi.sq  Chi.DF  p.value
    product            …       …     …e-05 ***
    sens1:Consumer     …       …       …
    sens2:Consumer     …       …       …    *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that the sens1 random effect for Consumer is not significant, and we can simplify the model.

We can now fit the reduced model and check that the random-effects structure cannot be simplified any further:

    model3 <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
                     Homesize * sens2 + (1 | product) +
                     (1 + sens2 | Consumer), data=carrots)
    rand(model3)

    Analysis of Random effects Table:
                    Chi.sq  Chi.DF  p.value
    product            …       …     …e-05 ***
    sens2:Consumer     …       …       …    *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that it is possible to fit a model where we enforce independence of the random intercepts and slopes for Consumer, i.e. we fix the correlation between these terms to zero, as the following code illustrates. We do not show the results of this model, and we warn against fitting and interpreting such models; the reason is given in the following remark.

    lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
           Homesize * sens2 + (1 | product) + (1 | Consumer) +
           (-1 + sens2 | Consumer), data=carrots)
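A short variance calculation shows what goes wrong when the intercept–slope correlation is fixed at zero. Shifting the origin of the covariate, x → x − c, rewrites a + b·x as (a + b·c) + b·(x − c), so the implied random intercept a′ = a + b·c has Var(a′) = σ_a² + c²σ_b² + 2cσ_ab and Cov(a′, b) = σ_ab + cσ_b². A model without the covariance parameter cannot absorb the shift. A numeric sketch in Python; all values are made up:

```python
# Effect of shifting the covariate origin, x -> x - c: since
# a + b*x = (a + b*c) + b*(x - c), the implied random intercept is
# a' = a + b*c with
#   Var(a')    = sigma_a^2 + c^2 * sigma_b^2 + 2*c*sigma_ab
#   Cov(a', b) = sigma_ab + c * sigma_b^2
# All parameter values below are invented for illustration.
sigma_a2, sigma_b2, sigma_ab = 1.0, 0.04, 0.1
c = 273.15   # e.g. re-expressing a temperature covariate on a shifted scale

var_shifted = sigma_a2 + c**2 * sigma_b2 + 2 * c * sigma_ab
cov_shifted = sigma_ab + c * sigma_b2

# Even if the covariance is zero in one parameterisation, it is nonzero after
# the shift, so a model with the covariance fixed at zero is not invariant to
# the choice of origin:
cov_after_shift_if_zero_before = 0.0 + c * sigma_b2
```

The covariance term reappears under any nonzero shift, which is exactly the invariance argument made in the remark below.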

Remark 9.1 (Random coefficient correlations)

Correlations between random intercepts and slopes should always be retained in the model. The reason is that the model is only invariant to a shift in origin of the covariate if the correlation between the random intercept and slope is included in the model and estimated from the data. For example, if the covariate is temperature and we omit the correlation, then we would obtain different models depending on whether temperature was measured in, say, Kelvin, Celsius or Fahrenheit. Since we want our models to be invariant to arbitrary aspects such as the unit of measurement, we need to include correlations between random intercepts and slopes in our random coefficient models. It also means that a likelihood ratio test of the correlation parameter is usually not meaningful, since the size of the test statistic, and consequently also the size of the p-value, depends on shifts in the origin of the covariate. In conclusion, the correlation parameters are necessary for the models to make sense, and we should not attempt to fix them at zero or test their significance.

Having reduced the covariance structure in the model, we turn attention to the mean structure, i.e. the fixed effects. After successively removing insignificant terms, we find that the following model is an appropriate description of the data:

    model4 <- lmer(Preference ~ Homesize + sens2 + (1 | product) +
                     (1 + sens2 | Consumer), data=carrots)
    anova(model4)

    Analysis of Variance Table of type III with Satterthwaite approximation
    for degrees of freedom
             Sum Sq  Mean Sq  NumDF  DenDF  F.value  Pr(>F)
    Homesize    …       …        …      …      …       …   *
    sens2       …       …        …      …      …       …   **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimates of the variance parameters are obtained with:

VarCorr(model4)

This lists, for each grouping factor, the estimated standard deviations of the random effects (the random intercept and the sens2 slope for Consumer, the random intercept for product, and the residual) together with the estimated correlation between the Consumer intercept and slope. Their confidence intervals are obtained with:

confint(model4, parm=1:5, oldnames=FALSE)

which computes profile likelihood confidence intervals for the five (co)variance parameters: sd_(Intercept)|Consumer, cor_sens2.(Intercept)|Consumer, sd_sens2|Consumer, sd_(Intercept)|product and sigma. LS-means and the difference of these for Homesize are obtained with:

(lms_size <- lsmeans::lsmeans(model4, "Homesize"))
confint(pairs(lms_size))

The first call prints an LS-mean with standard error, degrees of freedom and confidence limits for each level of Homesize; the second a confidence interval for their difference. Both use the Satterthwaite degrees-of-freedom method and a 95% confidence level.
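VarCorr() parameterizes each random-effect block by standard deviations and correlations. For simulation or reporting it can be handy to convert these into a covariance matrix; a small base-R sketch, with made-up values standing in for the estimates (sd_int, sd_slope and rho are illustrative, not output from model4):

```r
# Convert the (sd, sd, correlation) parameterization reported by VarCorr()
# into the 2x2 covariance matrix G of (intercept, slope) -- hypothetical values.
sd_int   <- 1.2   # assumed standard deviation of random intercepts
sd_slope <- 0.3   # assumed standard deviation of random sens2 slopes
rho      <- -0.4  # assumed intercept-slope correlation

G <- matrix(c(sd_int^2,                rho * sd_int * sd_slope,
              rho * sd_int * sd_slope, sd_slope^2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("(Intercept)", "sens2"),
                            c("(Intercept)", "sens2")))
G
```

Any correlation with |rho| <= 1 yields a positive semi-definite G, which is the constraint lmer enforces during estimation.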

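The confidence limits in the LS-means output are of the familiar estimate plus/minus t-quantile times standard error form, with Satterthwaite degrees of freedom. A base-R sketch of the computation with hypothetical inputs (est, se and df are made up, not taken from model4):

```r
# Wald-type 95% confidence interval (hypothetical inputs).
est <- 0.12   # assumed estimate (e.g. a slope or an LS-mean difference)
se  <- 0.04   # assumed standard error
df  <- 57.3   # assumed Satterthwaite degrees of freedom

tq <- qt(0.975, df = df)  # two-sided 95% t-quantile
ci <- c(lower = est - tq * se,
        upper = est + tq * se)
round(ci, 3)
```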
A confidence interval for the slope of sens2 can be extracted with:

lstrends(model4, specs="1", var="sens2")

This prints the overall sens2 trend (the estimated slope) with its standard error, Satterthwaite degrees of freedom and 95% confidence limits, averaged over the levels of Homesize.

Exercises


More information

Stat 4510/7510 Homework 4

Stat 4510/7510 Homework 4 Stat 45/75 1/7. Stat 45/75 Homework 4 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Machine Learning: An Applied Econometric Approach Online Appendix

Machine Learning: An Applied Econometric Approach Online Appendix Machine Learning: An Applied Econometric Approach Online Appendix Sendhil Mullainathan mullain@fas.harvard.edu Jann Spiess jspiess@fas.harvard.edu April 2017 A How We Predict In this section, we detail

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Factorial ANOVA. Skipping... Page 1 of 18

Factorial ANOVA. Skipping... Page 1 of 18 Factorial ANOVA The potato data: Batches of potatoes randomly assigned to to be stored at either cool or warm temperature, infected with one of three bacterial types. Then wait a set period. The dependent

More information