Random coefficients models



Contents

9 Random coefficients models
    9.1 Introduction
    9.2 Example: Constructed data
        9.2.1 Simple regression analysis
        9.2.2 Fixed effects analysis
        9.2.3 Two-step analysis
        9.2.4 Random coefficient analysis
    9.3 Example: Consumer preference mapping of carrots
    9.4 Random coefficient models in perspective
    9.5 R-TUTORIAL: Constructed data
    9.6 R-TUTORIAL: Consumer preference mapping of carrots
    9.7 Exercises

9.1 Introduction

Random coefficient models emerge as natural mixed model extensions of simple linear regression models in a hierarchical (nested) data setup. In the standard situation we are interested in the relationship between x and y.

Assume we have observations $(x_1, y_1), \ldots, (x_n, y_n)$ for a subject. Then we would fit the linear regression model given by

$$y_j = \alpha + \beta x_j + \epsilon_j$$

Assume next that such regression data are available on a number of subjects. Then a model that expresses a different regression line for each subject is:

$$y_{ij} = \alpha_i + \beta_i x_{ij} + \epsilon_{ij}$$

or, using the more general notation:

$$y_i = \alpha(\mathrm{subject}_i) + \beta(\mathrm{subject}_i)\, x_i + \epsilon_i \qquad (9-1)$$

This model has the same structure as the different slopes ANCOVA model of the previous module, only now the regression relationships are in focus.

Assume finally that the interest lies in the average relationship across subjects. A commonly used ad hoc approach is to employ a two-step procedure:

1. Carry out a regression analysis for each subject.

2. Do subsequent calculations on the parameter estimates from these regression analyses to obtain the average slope (and intercept) and their standard errors.

Since the latter treats the subjects as a random sample, it would be natural to incorporate this in the model by assuming the subject effects (intercepts and slopes) to be random:

$$y_i = a(\mathrm{subject}_i) + b(\mathrm{subject}_i)\, x_i + \epsilon_i$$

where

$$a(k) \sim N(\alpha, \sigma_a^2), \quad b(k) \sim N(\beta, \sigma_b^2), \quad \epsilon_i \sim N(0, \sigma^2)$$

and where $k = 1, \ldots, K$, with $K$ being the number of subjects. The parameters $\alpha$ and $\beta$ are the unknown population values for the intercept and slope. This is a mixed model, although a few additional considerations are required to identify the typical mixed model expression. The expected value is

$$E\, y_i = \alpha + \beta x_i$$

and the variance is

$$\mathrm{Var}\, y_i = \sigma_a^2 + \sigma_b^2 x_i^2 + \sigma^2$$

So an equivalent way of writing the model is the following, where the fixed and the random parts are split:

$$y_i = \alpha + \beta x_i + a(\mathrm{subject}_i) + b(\mathrm{subject}_i)\, x_i + \epsilon_i \qquad (9-2)$$

where

$$a(k) \sim N(0, \sigma_a^2), \quad b(k) \sim N(0, \sigma_b^2), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-3)$$

Now the linear mixed model structure is apparent. Although we do not always explicitly state this, there is the additional assumption that the random effects $a(k)$, $b(k)$ and $\epsilon_i$ are mutually independent. For randomly varying lines $(a(k), b(k))$ in the same x-domain this may be an unreasonable assumption, since the slope and intercept values may very well be related to each other. It is possible to extend the model to allow for such a correlation/covariance between the intercept and slope by assuming a bivariate normal distribution for each set of line parameters:

$$(a(k), b(k)) \sim N\left(0, \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix}\right), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-4)$$

The model given by (9-2) and (9-4) is the standard random coefficient mixed model.

9.2 Example: Constructed data

To illustrate the basic principles we start with two constructed data sets of 100 observations of y for 10 different x-values, see figure 9.1. They show that a raw scatter plot can hide quite different structures if the data are in fact hierarchical (repeated observations on each individual rather than exactly one observation per individual).

9.2.1 Simple regression analysis

Had the data NOT been hierarchical, but instead single observations on 100 different subjects, a simple regression analysis corresponding to the model

$$y_i = \alpha + \beta x_i + \epsilon_i \qquad (9-5)$$

where $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, 100$, would be a reasonable approach. For comparison we state the results of such an analysis for the two data sets. The parameter estimates are:

            Data 1                        Data 2
Parameter   Estimate   SE   P-value      Estimate   SE   P-value
σ
α
β                           <0.0001                      <0.0001

See figure 9.1 (left) for the estimated lines.

9.2.2 Fixed effects analysis

If we had special interest in these 10 subjects, a fixed effects analysis corresponding to model (9-1) could be carried out. The F-tests and P-values from the Type I (successive) ANOVA tables become:

                 Data set 1           Data set 2
Source      DF   F       P-value      F       P-value
x                        <.0001               <.0001
subject                  <.0001
x*subject                <.0001

[Figure 9.1: Constructed data. Top: data set 1; bottom: data set 2. Left: raw scatter plot with simple regression line; middle: individual patterns; right: individual lines.]

For data set 1 the slopes are clearly different, whereas for data set 2 the slopes can be assumed equal but the intercepts (subjects) are different. Although it is usually recommended to rerun the analysis without an insignificant interaction effect, the Type I table shows that the result of this will clearly be that the subject (intercept) effect is significant for data set 2, cf. the discussion of Type I/Type III tables in Module 3.

So for data set 1, the (fixed effect) story is told by providing the 10 intercept and slope estimates, possibly summarized as described for the different slopes ANCOVA model in the previous module. For data set 2, an equal slopes ANCOVA model can be used to summarize the results, with common slope estimate $\hat\beta$, standard error $SE_{\hat\beta}$ and error variance estimate $\hat\sigma^2$. The confidence band for the common slope, using the 89 error degrees of freedom, becomes

$$\hat\beta \pm t_{0.975}(89)\, SE_{\hat\beta}$$

which, since $t_{0.975}(89) = 1.987$, gives $[0.9279, \;\; ]$. The subjects could be described and compared as for the common slopes ANCOVA model of the previous module.

9.2.3 Two-step analysis

If the interest is NOT in the individual subjects but rather in the average line, a natural ad hoc approach is simply to start by calculating the individual intercepts and slopes, and then subsequently treat those as simple random samples: calculate the average, the variance and the standard error to obtain confidence limits for the population average values. So, for example, for the slopes we have $\hat\beta_1, \ldots, \hat\beta_{10}$ and calculate the average

$$\hat\beta = \frac{1}{10} \sum_{i=1}^{10} \hat\beta_i,$$

the variance

$$s^2_{\hat\beta} = \frac{1}{9} \sum_{i=1}^{10} (\hat\beta_i - \hat\beta)^2$$

and the standard error

$$SE_{\hat\beta} = \frac{s_{\hat\beta}}{\sqrt{10}}.$$

These give the 95% confidence interval (using that $t_{0.975}(9) = 2.26$):

$$\hat\beta \pm 2.26\, SE_{\hat\beta}$$

The results for the intercepts and slopes for the two data sets, with the variances $s^2_{\hat\alpha}$ and $s^2_{\hat\beta}$ computed for each data set, are given in the following table:

          Data set 1        Data set 2
          α       β         α       β
Average
SE
Lower
Upper

Note that for data set 2, the standard error for the slope is almost identical to the standard error from the fixed effects equal slopes model above. However, due to the smaller number of degrees of freedom, 9 instead of 89, the confidence band is somewhat wider here. This reflects the difference in interpretation: in the fixed effects analysis, $\hat\beta$ estimates the common slope for these specific 10 subjects; here it estimates the population average slope (for the population from which the 10 subjects were sampled). This distinction does not alter the estimate itself, but it does change the statistical inference that is made.

Note, by the way, that for estimating the individual lines it makes no difference whether an overall different slopes model is used or 10 individual ("small") regression models are fitted separately.

Although not used further, the observed correlation between the intercepts and the slopes in each case can be found:

$$\mathrm{corr}_1 = 0.382, \quad \mathrm{corr}_2 = 0.655$$
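As a minimal R sketch of this two-step computation for the slopes (assuming the data frame randcoef from the tutorial section below, with columns y1, x and a factor subject):

# Two-step sketch: one regression per subject, then a summary of the 10 slopes
slopes <- sapply(levels(randcoef$subject), function(k)
  coef(lm(y1 ~ x, data = randcoef, subset = subject == k))["x"])
est <- mean(slopes)                       # average slope
se  <- sd(slopes) / sqrt(length(slopes))  # standard error of the average
est + c(-1, 1) * qt(0.975, df = length(slopes) - 1) * se  # 95% confidence interval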

9.2.4 Random coefficient analysis

The results of fitting the random coefficient model given by (9-2) and (9-4) to each data set are given in the following table:

          Data set 1        Data set 2
          α       β         α       β
Estimate
SE
Lower
Upper

Note that this table is an exact copy of the result table for the two-step analysis above! The parameters of the variance part of the mixed model for data set 1 are estimated at (read off from the R output):

$$\hat\sigma_a = 4.031, \quad \hat\sigma_b = 0.496, \quad \hat\rho_{ab} = 0.38, \quad \hat\sigma = 0.271$$

which corresponds to the following variances:

$$\hat\sigma_a^2 = 16.25, \quad \hat\sigma_b^2 = 0.246, \quad \hat\sigma^2 = 0.0732$$

and for data set 2:

$$\hat\sigma_a = 1.086, \quad \hat\sigma_b = 0.145, \quad \hat\rho_{ab} = 1.00$$

which corresponds to the following variances:

$$\hat\sigma_a^2 = 1.18, \quad \hat\sigma_b^2 = 0.021$$

Compare with the variances calculated in the two-step procedure: for data set 1 the random coefficient model estimates are slightly smaller, whereas for data set 2 they are considerably smaller. This makes good sense, as the variances in the two-step procedure will also include some additional variation due to the residual error variance (just like the mean squares in a standard hierarchical model). For data set 1 this residual error variance is estimated at a very small value (0.0732), whereas for data set 2 it is much larger. This illustrates how the random coefficient model provides the proper story about what is going on, and directly distinguishes between the two quite different situations exemplified here.

Note also that for data set 1, the correlation estimate $\hat\rho_{ab} = 0.38$ is close to the observed correlation calculated in the two-step procedure.

However, for data set 2 the estimated correlation becomes $\hat\rho_{ab} = 1$! This obviously makes no sense. We encounter a situation similar to the negative variance problem discussed previously: the correlation may become meaningless when some of the variances are estimated to be very small, which is the case for the slope variance here. To put it differently, for data set 2 the model we have specified includes components (in the variance) that are not actually present in the data. We already knew this, since the equal slopes model was a reasonable description of these data. In the random coefficient framework the equal slopes model is expressed by

$$y_i = \alpha + \beta x_i + a(\mathrm{subject}_i) + \epsilon_i \qquad (9-6)$$

where

$$a(k) \sim N(0, \sigma_a^2), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-7)$$

The adequacy of this model can be tested by a residual (REML) likelihood ratio test, cf. Module 5. For data set 2 we obtain

$$G = -2\ell_{REML,1} - (-2\ell_{REML,2}) = 0.65$$

which is non-significant using a $\chi^2$ distribution with 2 degrees of freedom. For data set 1 the similar test statistic becomes much larger and is extremely significant.

For data set 2 the conclusions should therefore be based on the equal slopes model given by (9-6) and (9-7), and we obtain the following:

          Data set 2
          α       β
Estimate
SE
Lower
Upper

We see a minor change in the confidence bands: believing in equal slopes increases the (estimated) precision (a narrower confidence interval) for the slope, whereas the precision of the average intercept decreases.
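In R, this REML likelihood ratio test can be sketched directly from the two fitted models (model5y2, the different slopes model, and model6y2, the equal slopes model, are fitted in the tutorial section below):

# G = difference in -2 log REML likelihoods, compared to a chi-square with 2 df
G <- as.numeric(-2 * logLik(model6y2, REML = TRUE) -
                (-2 * logLik(model5y2, REML = TRUE)))
pchisq(G, df = 2, lower.tail = FALSE)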

9.3 Example: Consumer preference mapping of carrots

In a consumer study, 103 consumers scored their preference for 12 Danish carrot types on a scale from 1 to 7. The carrots were harvested in autumn 1996 and tested in March 1997. A number of background variables were recorded for each consumer; see the data description in Module 13 for details. The data file can be downloaded as carrots.txt and is also described in enote13.

The aim of a so-called external preference mapping is to find the sensory drivers of the consumers' preference behaviour and to investigate whether these differ between segments of the population. To do this, in addition to the consumer survey, the carrot products are evaluated by a trained panel of tasters, the sensory panel, with respect to a number of sensory (taste, odour and texture) properties. Since usually a high number of (correlated) properties (variables) are used, in this case 14, it is common procedure to use a few, often 2, combined variables that contain as much of the information in the sensory variables as possible. This is achieved by extracting the first two principal components in a principal components analysis (PCA) on the product-by-property panel average data matrix. PCA is a commonly used multivariate technique to explore and/or decompose high dimensional data. We call these two variables sens1 and sens2, and they are given by

$$\mathrm{sens1}_i = \sum_{j=1}^{14} a_j v_j^i \quad \text{and} \quad \mathrm{sens2}_i = \sum_{j=1}^{14} b_j v_j^i$$

where $v_1^i, \ldots, v_{14}^i$ are the 14 average sensory scores for carrot product $i$, and the coefficients $a_j$ and $b_j$ defining the two combined sensory variables are as depicted in figure 9.2. So sens1 is a variable that (primarily) measures bitterness versus nutty taste, whereas sens2 measures sweetness (and related properties). A sketch of this extraction in R is given below, after figure 9.3.

The actual preference mapping is carried out by first fitting regression models for the preference as a function of the sensory variables for each individual consumer, using the 12 observations across the carrot products. Next, the individual regression coefficients are investigated, often in an explorative manner where a scatter plot is used to look for a possible segmentation of the consumers based on these regression coefficients. Instead of looking for segmentation ("cluster analysis"), we investigate whether we see any differences with respect to the background variables in the data, e.g. the gender or homesize (number of persons in the household).

Let $y_i$ be the $i$th preference score. The natural model for this is a model that expresses randomly varying individual relations to the sensory variables, but with average (expected) values that may depend on the homesize. Let us consider the factor structure of the setting.

[Figure 9.2: Loadings plot for the PCA of the sensory variables: scatter plot of the coefficients b_j versus a_j. Attributes shown: sweet_ta, fruit_ta, nut_ta, carrot_af, juicy, colour, bitter_ta, car_od, bitter_af, earthy_od, crisp, earthy_ta, transp, hard.]

The basic setting is a randomized block experiment with 12 treatments (carrot products), the factor prod, and 103 blocks (consumers), the factor cons. Homesize (size) is a factor that partitions the consumers into two groups: those with a homesize of 1 or 2, and those with a larger homesize. So the factor cons is nested within size, or equivalently, size is coarser than cons. This basic structure is depicted in figure 9.3. (Note that the corresponding diagram in the video/audio based presentation of this module has a couple of errors compared to the correct one given here.)
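As an aside, the PCA construction of sens1 and sens2 described above could be sketched in R as follows; here sensory_means is a hypothetical 12 x 14 product-by-attribute matrix of panel averages, not an object defined in this note:

# PCA sketch: scores of the first two components play the role of sens1 and sens2
pca  <- prcomp(sensory_means, center = TRUE, scale. = FALSE)
sens <- pca$x[, 1:2]       # one (sens1, sens2) pair per carrot product
plot(pca$rotation[, 1:2])  # loadings plot corresponding to figure 9.2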

[Figure 9.3: The factor structure diagram for the carrots data, relating [prod], [I], [cons] and size.]

The linear effect of the sensory variables is a part of the prod effect, since these covariates are on product level. So they are both coarser than the product effect. The sensory variables in the model will therefore explain some of the product differences. Including prod in the model as well will enable us to test whether the sensory variables can explain all the product differences. As we do not expect this to be the case, we adopt the point of view that the 12 carrot products are a random sample from the population of carrot products in Denmark; that is, the product effect is considered a random effect. In other words, we consider the deviations of the product variation from what can be explained by the regression on the sensory variables as random variation. Finally, the interactions between homesize and the sensory variables should enter the model as fixed effects, allowing for different average slopes for the two homesizes, leading to the model given by

$$y_i = \alpha(\mathrm{size}_i) + \beta_1(\mathrm{size}_i)\,\mathrm{sens1}_i + \beta_2(\mathrm{size}_i)\,\mathrm{sens2}_i + a(\mathrm{cons}_i) + b_1(\mathrm{cons}_i)\,\mathrm{sens1}_i + b_2(\mathrm{cons}_i)\,\mathrm{sens2}_i + d(\mathrm{prod}_i) + \epsilon_i \qquad (9-8)$$

where

$$a(k) \sim N(0, \sigma_a^2), \quad b_1(k) \sim N(0, \sigma_{b_1}^2), \quad b_2(k) \sim N(0, \sigma_{b_2}^2), \quad k = 1, \ldots, 103 \qquad (9-9)$$

and

$$d(\mathrm{prod}_i) \sim N(0, \sigma_P^2), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-10)$$
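As a sketch, the model (9-8)-(9-10) with independent random coefficients could be specified in lmer as follows (variable names as in the tutorial section below; the version with correlated coefficients is completed in the next paragraph):

# Sketch of (9-8)-(9-10): independent random intercept and slopes per consumer
library(lmerTest)
m_indep <- lmer(Preference ~ Homesize * (sens1 + sens2) +
                (1 | Consumer) + (0 + sens1 | Consumer) +
                (0 + sens2 | Consumer) + (1 | product), data = carrots)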

To finish the specification of a general random coefficient model, we need the assumption of the possibility of correlations between the random coefficients:

$$(a(k), b_1(k), b_2(k)) \sim N\left(0, \begin{pmatrix} \sigma_a^2 & \sigma_{ab_1} & \sigma_{ab_2} \\ \sigma_{ab_1} & \sigma_{b_1}^2 & \sigma_{b_1 b_2} \\ \sigma_{ab_2} & \sigma_{b_1 b_2} & \sigma_{b_2}^2 \end{pmatrix}\right) \qquad (9-11)$$

Before studying the fixed effects, the variance part of the model is investigated further. We give details in the R-tutorial section on how we end up simplifying this 8-parameter variance model down to the following 4-parameter variance model: first the $\sigma_{b_1}^2$ parameter and the two related correlations can be tested non-significant, and after that the correlation between the $b_2$ effect and the intercept (which can make sense here, as the sens2 values are mean centred; see the discussion in the tutorial section):

$$y_i = \alpha(\mathrm{size}_i) + \beta_1(\mathrm{size}_i)\,\mathrm{sens1}_i + \beta_2(\mathrm{size}_i)\,\mathrm{sens2}_i + a(\mathrm{cons}_i) + b_2(\mathrm{cons}_i)\,\mathrm{sens2}_i + d(\mathrm{prod}_i) + \epsilon_i \qquad (9-12)$$

where

$$a(k) \sim N(0, \sigma_a^2), \quad b_2(k) \sim N(0, \sigma_{b_2}^2), \quad k = 1, \ldots, 103 \qquad (9-13)$$

and

$$d(\mathrm{prod}_i) \sim N(0, \sigma_P^2), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-14)$$

and where there are no more correlations in the model. The three remaining variance parameters (not counting the residual variance) are now all significant. With this variance structure, we investigate the fixed effects, here showing the results of the automated step function of lmerTest:

                Sum Sq Mean Sq NumDF DenDF F.value elim.num Pr(>F)
Homesize:sens1                                     1
sens1                                              2
Homesize:sens2                                     3
Homesize                                           kept     0.02
sens2                                              kept     0.00

The final model for these data is therefore given by:

$$y_i = \alpha(\mathrm{size}_i) + \beta_2\,\mathrm{sens2}_i + a(\mathrm{cons}_i) + b_2(\mathrm{cons}_i)\,\mathrm{sens2}_i + d(\mathrm{prod}_i) + \epsilon_i \qquad (9-15)$$

where

$$a(k) \sim N(0, \sigma_a^2), \quad b_2(k) \sim N(0, \sigma_{b_2}^2), \quad k = 1, \ldots, 103 \qquad (9-16)$$

and

$$d(\mathrm{prod}_i) \sim N(0, \sigma_P^2), \quad \epsilon_i \sim N(0, \sigma^2) \qquad (9-17)$$

The estimates of the variance parameters and the fixed effects are:

$$\hat\sigma_{b_2}, \quad \hat\sigma_a, \quad \hat\sigma_P, \quad \hat\sigma$$

$$\hat\alpha(\mathrm{Homesize1}) = 4.91, \quad \hat\alpha(\mathrm{Homesize3}) = 4.67, \quad \hat\beta_2 = 0.071$$

With confidence intervals as they come from the confint function:

                    2.5 % 97.5 %
.sig01
.sig02
.sig03
.sigma
Homesize1
Homesize3-Homesize1
sens2

The conclusions regarding the relation between the preference and the sensory variables are that no significant relation was found to sens1, but indeed so for sens2. The relation does not depend on the homesize and is estimated at (with 95% confidence interval):

$$\hat\beta_2 = 0.071, \quad [0.04, 0.10]$$

So two products with a difference of 10 in the second sensory dimension (this is the span in the data set) are expected to differ in average preference by between 0.4 and 1.0. Sweet products are preferred to non-sweet products, cf. figure 9.2 above. The expected values for the two homesizes (for an average product) and their difference are estimated at:

$$\hat\alpha(1) + \hat\beta_2\,\overline{\mathrm{sens2}} = 4.91, \quad [4.73, 5.09]$$
$$\hat\alpha(3) + \hat\beta_2\,\overline{\mathrm{sens2}} = 4.67, \quad [4.47, 4.85]$$
$$\hat\alpha(1) - \hat\alpha(3) = 0.25, \quad [0.04, 0.46]$$

So homes with more persons tend to have a slightly lower preference in general for such carrot products.
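These homesize means and their difference can be extracted directly from the fitted model; a sketch using ls_means and difflsmeans as implemented in recent versions of lmerTest (finalmodel is defined in the tutorial section below):

# Sketch: estimated population means per homesize and their difference
library(lmerTest)
ls_means(finalmodel, which = "Homesize")     # means for Homesize 1 and 3
difflsmeans(finalmodel, which = "Homesize")  # their difference with 95% CI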

9.4 Random coefficient models in perspective

Although the factor structure diagrams, with all the features of finding expected mean squares and degrees of freedom, are only strictly valid for balanced designs and models with no quantitative covariates, they may still be useful as a more informal structure visualization tool for these non-standard situations.

The setting with hierarchical regression data is really an example of what could also be characterized as repeated measures data. A common situation is that repeated measurements on a subject (animal, plant, sample) are taken over time; this is also known as longitudinal data. So, apart from appearing as natural extensions of fixed regression models, random coefficient models are one option for analyzing repeated measures data. The simple models can be extended to polynomial models to cope with non-linear structures in the data. Additional residual correlation structures can also be incorporated. In Modules 11 and 12 a thorough treatment of repeated measures data is given, with a number of different methods, simple as well as more complex.

9.5 R-TUTORIAL: Constructed data

The data file can be downloaded as randcoef.txt and is also described in enote13. The simple linear regression analyses of the two responses y1 and y2 in the data set randcoef are obtained using lm:

randcoef <- read.table("randcoef.txt", sep = ",", header = TRUE)
randcoef$subject <- factor(randcoef$subject)
model1y1 <- lm(y1 ~ x, data = randcoef)
model1y2 <- lm(y2 ~ x, data = randcoef)

The parameter estimates with corresponding standard errors in the two models are:

summary(model1y1)

Call:
lm(formula = y1 ~ x, data = randcoef)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
x                                       3.41e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4 on 98 degrees of freedom
Multiple R-squared: 0.301,	Adjusted R-squared:
F-statistic: 42.2 on 1 and 98 DF,  p-value: 3.41e-09

summary(model1y2)

Call:
lm(formula = y2 ~ x, data = randcoef)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                 e-12 ***
x                                       1.08e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.53 on 98 degrees of freedom
Multiple R-squared: 0.377,	Adjusted R-squared:
F-statistic: 59.4 on 1 and 98 DF,  p-value: 1.08e-11

The raw scatter plots for the data with superimposed regression lines are obtained using the plot and abline functions:

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  abline(model1y1)
  plot(x, y2)
  abline(model1y2)
})

[Figure: raw scatter plots of y1 and y2 versus x with the fitted regression lines.]

par(mfrow = c(1, 1))

The individual patterns in the data can be seen from the next plot:

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) {lines(x[subject == i], y1[subject == i], lty = i)}
  plot(x, y2)
  for (i in 1:10) {lines(x[subject == i], y2[subject == i], lty = i)}
})

[Figure: individual patterns in data sets 1 and 2.]

par(mfrow = c(1, 1))

The function lines connects points with line segments. Notice how the repetitive plotting is handled with a for loop: for each i between 1 and 10, the relevant subset of the data is plotted with a line type that changes as the subject changes. Alternatively, we could have used 10 separate lines calls for each response.

The fixed effects analysis with the two resulting ANOVA tables is:

model2y1 <- lm(y1 ~ x + subject + x * subject, data = randcoef)
model2y2 <- lm(y2 ~ x + subject + x * subject, data = randcoef)
library(xtable)
print(xtable(anova(model2y1)))
print(xtable(anova(model2y2)))

             Df Sum Sq Mean Sq F value Pr(>F)
x
subject
x:subject
Residuals

             Df Sum Sq Mean Sq F value Pr(>F)
x
subject
x:subject
Residuals

A plot of the data with individual regression lines based on model2y1 and model2y2 is again produced using a for loop. First we fit the two models in a different parameterisation (to obtain the estimates in a convenient form of one intercept and one slope per subject):

model3y1 <- lm(y1 ~ subject + x * subject - x - 1, data = randcoef)
model3y2 <- lm(y2 ~ subject + x * subject - x - 1, data = randcoef)

The plots are produced using:

par(mfrow = c(1, 2))
with(randcoef, {
  plot(x, y1)
  for (i in 1:10) {abline(coef(model3y1)[c(i, i + 10)], lty = i)}
  plot(x, y2)
  for (i in 1:10) {abline(coef(model3y2)[c(i, i + 10)], lty = i)}
})

[Figure: data with individual regression lines for data sets 1 and 2.]

par(mfrow = c(1, 1))

Explanation: Remember that coef extracts the parameter estimates. Here the first 10 estimates will be the intercept estimates and the next 10 will be the slope estimates. Thus the component pairs (1, 11), (2, 12), ..., (10, 20) belong to subjects 1, 2, ..., 10, respectively. This is exploited in the for loop in the part [c(i, i+10)], which produces these pairs as i runs from 1 to 10.

The equal slopes model for the second data set with parameter estimates is:

model4y2 <- lm(y2 ~ subject + x, data = randcoef)
summary(model4y2)

Call:
lm(formula = y2 ~ subject + x, data = randcoef)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      ***
subject2                                         *
subject3
subject4
subject5
subject6
subject7
subject8
subject9                                         **
subject10
x                                           e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.15 on 89 degrees of freedom
Multiple R-squared: 0.524,	Adjusted R-squared:
F-statistic: 9.81 on 10 and 89 DF,  p-value: 7.23e-11

The summary of the two-step analysis can be obtained by applying the functions mean and sd (computing the empirical mean and standard deviation of a vector, respectively) to the vector of intercept estimates and to the vector of slope estimates (from the different slopes models), performing the computations described earlier in this module. Here it is shown for data set 1; data set 2 is handled similarly:

ainty1  <- mean(coef(model3y1)[1:10])
sdinty1 <- sd(coef(model3y1)[1:10]) / sqrt(10)
uinty1  <- ainty1 + 2.26 * sdinty1
linty1  <- ainty1 - 2.26 * sdinty1
asloy1  <- mean(coef(model3y1)[11:20])
sdsloy1 <- sd(coef(model3y1)[11:20]) / sqrt(10)
usloy1  <- asloy1 + 2.26 * sdsloy1
lsloy1  <- asloy1 - 2.26 * sdsloy1
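The same two-step summary can be sketched more compactly with lmList from lme4 (not used elsewhere in this note):

# Sketch: per-subject fits in one call; columns of coef(fits) are intercept, slope
library(lme4)
fits <- lmList(y1 ~ x | subject, data = randcoef)
colMeans(coef(fits))                 # average intercept and slope
apply(coef(fits), 2, sd) / sqrt(10)  # their standard errors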

The correlation between the intercepts and the slopes in each data set is computed using cor:

cor(coef(model3y1)[1:10], coef(model3y1)[11:20])

[1] 0.382

cor(coef(model3y2)[1:10], coef(model3y2)[11:20])

[1] 0.655

The random coefficients analysis is done with lmer. The different slopes random coefficient model is:

library(lmerTest)
model5y1 <- lmer(y1 ~ x + (1 + x | subject), data = randcoef)
model5y2 <- lmer(y2 ~ x + (1 + x | subject), data = randcoef)

In the random part (1 + x | subject), the terms before the | are assigned random effects for each level of the factor after the |. One way to think about this is that 1 is multiplied by subject and x is multiplied by subject, yielding the terms 1*subject + x*subject, which corresponds to the random part in formula (9-2). The (fixed effects) parameter estimates and their standard errors are obtained from the model summary:

summodel5y1 <- summary(model5y1)
summodel5y2 <- summary(model5y2)
print(xtable(summodel5y1$coefficients))

            Estimate Std. Error df t value Pr(>|t|)
(Intercept)
x

print(xtable(summodel5y2$coefficients))

            Estimate Std. Error df t value Pr(>|t|)
(Intercept)
x

The variance parameter estimates, including the correlation between intercept and slope, are obtained using:

summodel5y1$varcor

 Groups   Name        Std.Dev. Corr
 subject  (Intercept)
          x
 Residual

summodel5y2$varcor

 Groups   Name        Std.Dev. Corr
 subject  (Intercept)
          x
 Residual

The equal slopes models within the random coefficient framework are specified as:

model6y1 <- lmer(y1 ~ x + (1 | subject), data = randcoef)
model6y2 <- lmer(y2 ~ x + (1 | subject), data = randcoef)

Likelihood ratio tests for the reduction from different slopes to equal slopes can be obtained using anova with two lmer result objects as arguments (the first model is less general than the second):

print(xtable(anova(model6y1, model5y1, refit = FALSE)))

       Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
object
..1

print(xtable(anova(model6y2, model5y2, refit = FALSE)))

       Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
object
..1

Confidence intervals for the relevant final models may be obtained by:

print(xtable(confint(model6y2)))

            2.5 % 97.5 %
.sig01
.sigma
(Intercept)
x

print(xtable(confint(model5y1)))

            2.5 % 97.5 %
.sig01
.sig02
.sig03            Inf
.sigma       0.00 Inf
(Intercept)
x

In the latter case, three of the variance parameters cannot be profiled (only for the subject main effect variance component is a finite confidence interval found). This is not necessarily a problem, as the likelihood, and the CIs and tests for the fixed effects, still make good sense. The (fixed effects) parameter estimates for the final model for data set 2 are:

print(xtable(summary(model6y2)$coefficients))

            Estimate Std. Error df t value Pr(>|t|)
(Intercept)
x

9.6 R-TUTORIAL: Consumer preference mapping of carrots

The data file can be downloaded as carrots.txt and is also described in enote13.

Recall that the most general model ((9-8) to (9-11) above) states that, for each level of Consumer, the random intercept and the random slopes of sens1 and sens2 are correlated in an arbitrary way (the specification in (9-11)). It can be specified as follows:

carrots <- read.table("carrots.txt", header = TRUE, sep = ",")
carrots$Homesize <- factor(carrots$Homesize)
carrots$Consumer <- factor(carrots$Consumer)
carrots$product <- factor(carrots$product)
model1 <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
               Homesize * sens2 + (1 | product) + (1 + sens1 + sens2 | Consumer),
               data = carrots)
summary(model1)

Linear mixed model fit by REML [merModLmerTest]
Formula: Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 + Homesize * sens2 +
    (1 | product) + (1 + sens1 + sens2 | Consumer)
   Data: carrots

REML criterion at convergence: 3748

Scaled residuals:
   Min     1Q Median     3Q    Max

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Consumer (Intercept)
          sens1
          sens2
 product  (Intercept)
 Residual
Number of obs: 1233, groups: Consumer, 103; product, 12

Fixed effects:
                Estimate Std. Error df t value Pr(>|t|)
(Intercept)                                      <2e-16 ***
Homesize3                                               *
sens1
sens2                                                   **
Homesize3:sens1
Homesize3:sens2
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Homsz3 sens1  sens2  Hms3:1
Homesize3
sens1
sens2
Homsz3:sns1
Homsz3:sns2

The random part deserves some explanation. The structure (9-11) amounts to the term (1 + sens1 + sens2 | Consumer): for each level of Consumer we have 3 random effects, one intercept and two slopes, and they are arbitrarily correlated. In addition there is the random effect of product.

Let us check what the step function of lmerTest can tell us about the random effects of this model:

mystep <- step(model1)
print(xtable(mystep$rand.table))

               Chi.sq Chi.DF elim.num p.value
sens1:Consumer                1
product                       kept     0.00
sens2:Consumer                kept     0.02

We note that the sens1:Consumer effect is tested with 3 degrees of freedom (and is non-significant, hence eliminated).

This is because eliminating this term from the model means that the variance AND the correlations between this coefficient and the sens2 coefficients and the intercepts are all assumed to be zero. This is the elimination principle implemented here.

Remark 9.1 Random coefficient correlations

Generally it is recommended to include these correlations in the models (this is also what R does for us when the model is specified as shown above). The reason is that correlations between the x's will induce such correlations between the coefficients by construction, and hence it would be wrong not to allow for them in the model. The basic example is a non-centred x in a regression, which will lead to a relation between the slope and the intercept. However, IF the x is centred (and hence the x has correlation zero with the "constant"), this relation disappears. And generally, if the x's are independent (orthogonal), then models with independent coefficients could make sense and could be a reasonable approach to stabilize the random effect part of the model. In this case sens1 and sens2 are in fact both mean centred and independent by construction (scores from a principal component analysis), but let us check:

mean(carrots$sens1)

[1] 6.667e-11

mean(carrots$sens2)

[1] -7.5e-11

cor(carrots$sens1, carrots$sens2)

[1] -1.93e-11
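The effect described in the remark can be illustrated with a small simulation (a sketch with made-up data, not part of the examples in this note):

# Simulation sketch: per-subject OLS (intercept, slope) estimates are typically
# clearly correlated when x is not centred, and close to uncorrelated when it is
set.seed(1)
x <- rep(1:10, times = 10)
subj <- rep(1:10, each = 10)
y <- 2 + 0.5 * x + rep(rnorm(10), each = 10) + rnorm(100)
co  <- t(sapply(1:10, function(k) coef(lm(y ~ x, subset = subj == k))))
coc <- t(sapply(1:10, function(k) coef(lm(y ~ I(x - mean(x)), subset = subj == k))))
cor(co[, 1], co[, 2])    # non-centred x: markedly negative
cor(coc[, 1], coc[, 2])  # centred x: close to zero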

The models with (A) and without (B) correlation between the intercepts and the sens2 slopes (the variance structure of model (9-12) above) are specified as follows (note the difference in R syntax for the random effects):

model2a <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
                Homesize * sens2 + (1 | product) + (1 + sens2 | Consumer),
                data = carrots)
model2b <- lmer(Preference ~ Homesize + sens1 + sens2 + Homesize * sens1 +
                Homesize * sens2 + (1 | product) + (1 | Consumer) +
                (0 + sens2 | Consumer), data = carrots)
print(xtable(anova(model2a, model2b, refit = FALSE)))

       Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
object
..1

So we do not need the correlation in the model. We could without any problems use the results of the step call above, or we could redo the elimination by applying the step function to the model2b fit:

mystep2 <- step(model2b)

Warning: Model failed to converge with max|grad| =  (tol = 0.002)
Warning: Model failed to converge with max|grad| =  (tol = 0.002)

print(xtable(mystep2$rand.table))

               Chi.sq Chi.DF elim.num p.value
product                       kept     0.00
Consumer                      kept     0.00
sens2:Consumer                kept     0.01

Now we also see a test for the random main (intercept) effect of Consumer, which was not part of the above.

The warnings do not worry us too much here: in one or two of the models the convergence just barely failed on one of the convergence criteria, and clearly it was pretty close. There are ways to work with various optimizer options, including extending the number of iterations etc., but we will not pursue these here. Instead we check that the final model converges:

finalmodel <- lmer(Preference ~ Homesize + sens2 + (1 | product) +
                   (1 | Consumer) + (0 + sens2 | Consumer), data = carrots)

Having reduced the covariance structure of the model, we turn attention to the mean structure, i.e. the fixed effects:

print(xtable(mystep2$anova.table))

                Sum Sq Mean Sq NumDF DenDF F.value elim.num Pr(>F)
Homesize:sens1                                     1
sens1                                              2
Homesize:sens2                                     3
Homesize                                           kept     0.02
sens2                                              kept     0.00

And various model parameter summaries and post hoc comparisons:

VarCorr(mystep2$model)

 Groups     Name        Std.Dev.
 Consumer   sens2
 Consumer.1 (Intercept)
 product    (Intercept)
 Residual

print(xtable(confint(mystep2$model)))

                    2.5 % 97.5 %
.sig01
.sig02
.sig03
.sigma
Homesize1
Homesize3-Homesize1
sens2

print(xtable(mystep$lsmeans))

           Estimate Standard Error DF t-value Lower CI Upper CI p-value
Homesize 1
Homesize 3

print(xtable(mystep$diffs.lsmeans))

             Estimate Standard Error DF t-value Lower CI Upper CI p-value
Homesize 1-3

9.7 Exercises

Exercise 1: Carrots data

Consider the carrots data of this module. The data file can be downloaded as carrots.txt and is also described in enote13. Carry out a similar analysis using (at least) one of the other three response variables (Sweetness, Bitter or Crisp) instead of the preference. Try to include (at least) one other background variable than the homesize, e.g. gender.
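A possible starting point (a sketch only; the column names Sweetness and Gender are assumed from the data description in enote13):

# Sketch for Exercise 1: another response and background variable, same structure
library(lmerTest)
ex1 <- lmer(Sweetness ~ Gender * (sens1 + sens2) + (1 | product) +
            (1 + sens1 + sens2 | Consumer), data = carrots)
step(ex1)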


1. Estimation equations for strip transect sampling, using notation consistent with that used to Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques Web Appendix A. Introduction In this on-line appendix,

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data

More information

The NESTED Procedure (Chapter)

The NESTED Procedure (Chapter) SAS/STAT 9.3 User s Guide The NESTED Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 9.3 User s Guide. The correct bibliographic citation for the complete manual

More information

Additional Issues: Random effects diagnostics, multiple comparisons

Additional Issues: Random effects diagnostics, multiple comparisons : Random diagnostics, multiple Austin F. Frank, T. Florian April 30, 2009 The dative dataset Original analysis in Bresnan et al (2007) Data obtained from languager (Baayen 2008) Data describing the realization

More information

Quantitative - One Population

Quantitative - One Population Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)

More information

Poisson Regression and Model Checking

Poisson Regression and Model Checking Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)

More information

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation

CSSS 510: Lab 2. Introduction to Maximum Likelihood Estimation CSSS 510: Lab 2 Introduction to Maximum Likelihood Estimation 2018-10-12 0. Agenda 1. Housekeeping: simcf, tile 2. Questions about Homework 1 or lecture 3. Simulating heteroskedastic normal data 4. Fitting

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

Workshop 8: Model selection

Workshop 8: Model selection Workshop 8: Model selection Selecting among candidate models requires a criterion for evaluating and comparing models, and a strategy for searching the possibilities. In this workshop we will explore some

More information

Generalized Additive Models

Generalized Additive Models :p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but

More information

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in

More information

Stat 4510/7510 Homework 4

Stat 4510/7510 Homework 4 Stat 45/75 1/7. Stat 45/75 Homework 4 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader

More information

An Experiment in Visual Clustering Using Star Glyph Displays

An Experiment in Visual Clustering Using Star Glyph Displays An Experiment in Visual Clustering Using Star Glyph Displays by Hanna Kazhamiaka A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master

More information

The Truth behind PGA Tour Player Scores

The Truth behind PGA Tour Player Scores The Truth behind PGA Tour Player Scores Sukhyun Sean Park, Dong Kyun Kim, Ilsung Lee May 7, 2016 Abstract The main aim of this project is to analyze the variation in a dataset that is obtained from the

More information

mcssubset: Efficient Computation of Best Subset Linear Regressions in R

mcssubset: Efficient Computation of Best Subset Linear Regressions in R mcssubset: Efficient Computation of Best Subset Linear Regressions in R Marc Hofmann Université de Neuchâtel Cristian Gatu Université de Neuchâtel Erricos J. Kontoghiorghes Birbeck College Achim Zeileis

More information

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or

This is called a linear basis expansion, and h m is the mth basis function For example if X is one-dimensional: f (X) = β 0 + β 1 X + β 2 X 2, or STA 450/4000 S: February 2 2005 Flexible modelling using basis expansions (Chapter 5) Linear regression: y = Xβ + ɛ, ɛ (0, σ 2 ) Smooth regression: y = f (X) + ɛ: f (X) = E(Y X) to be specified Flexible

More information

Linear Modeling with Bayesian Statistics

Linear Modeling with Bayesian Statistics Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the

More information

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 4, 217 PDF file location: http://www.murraylax.org/rtutorials/regression_intro.pdf HTML file location:

More information

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90% ------------------ log: \Term 2\Lecture_2s\regression1a.log log type: text opened on: 22 Feb 2008, 03:29:09. cmdlog using " \Term 2\Lecture_2s\regression1a.do" (cmdlog \Term 2\Lecture_2s\regression1a.do

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling

More information

Unit 5 Logistic Regression Practice Problems

Unit 5 Logistic Regression Practice Problems Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises

More information

The lmekin function. Terry Therneau Mayo Clinic. May 11, 2018

The lmekin function. Terry Therneau Mayo Clinic. May 11, 2018 The lmekin function Terry Therneau Mayo Clinic May 11, 2018 1 Background The original kinship library had an implementation of linear mixed effects models using the matrix code found in coxme. Since the

More information

5.5 Regression Estimation

5.5 Regression Estimation 5.5 Regression Estimation Assume a SRS of n pairs (x, y ),..., (x n, y n ) is selected from a population of N pairs of (x, y) data. The goal of regression estimation is to take advantage of a linear relationship

More information