eNote 3: Case study

Calvin Barton
5 years ago

Contents

3.1 Introduction
3.2 Initial explorative analysis
3.3 Test of overall effects/model reduction
3.4 Post hoc analysis and summarizing the results
    Estimates of the variance parameters
    Estimates of the fixed parameters
    Comparisons of the fixed parameters
3.5 R-TUTORIAL: Creating report ready tables and figures
    Plot devices
    Plotting with colours
    Report ready tables with xtable
3.6 R-TUTORIAL: Initial explorative analysis
3.7 Test of overall effects/model reduction
3.8 R-TUTORIAL: Post hoc analysis and summarizing the results
3.9 Exercises

3.1 Introduction

This module consists of the first part of a complete analysis of the beech wood data presented as an example in Module 2. The aim is to show that the principles of data analysis and result summary for fixed ANOVA and/or regression models also apply to mixed models. Some readers may also find it helpful to have these principles reviewed.

For completeness we repeat here the description and the initial factor structure considerations. To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then the humidity percentage was measured at 5 depths and 3 widths for each plank:

depth 1: close to the top
depth 3: between 1 and 5
depth 5: in the center
depth 7: between 5 and 9
depth 9: close to the bottom

width 1: close to the side
width 2: between 1 and 3
width 3: in the center

So there are 3 × 5 = 15 measurements for each plank and 300 observations altogether. The data can be found in planks.txt and is reproduced in the following table.

[Table: humidity percentage for each of the 20 planks (rows) by width (1, 2, 3) and depth (columns); the data values are not preserved in this copy, see planks.txt]

In this experiment we have 3 factors apart from the trivial factors I and 0. Let us use the factor names plank, width and depth. The factor plank has 20 levels, width has 3 levels and depth has 5 levels. For the ith measurement of humidity, plank_i denotes the plank on which this measurement was performed. Correspondingly, width_i and depth_i denote the width and depth, respectively, of the ith measurement. It is natural to include the interaction between width and depth, corresponding to the product factor width × depth, which in this case has 15 levels.

A natural model would include plank as a block factor, while depth and width enter together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the ith measurement, the model with fixed block effect can be written as:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i, \qquad (3\text{-}1)$$

where $i = 1, \ldots, 300$ and where the $\varepsilon_i$ are independent and normally distributed random variables. Or equivalently:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the kth measurement within the (i, j)th combination of the two factors, $i = 1, \ldots, 3$, $j = 1, \ldots, 5$ and $k = 1, \ldots, 20$.

As pointed out in Module 1, the block (plank) effect should be considered a random effect, leading to the mixed model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \qquad (3\text{-}2)$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model corresponds to the factor structure diagram given in figure 3.1.

[Figure 3.1: The factor structure diagram]

3.2 Initial explorative analysis

Having realized the complete structure of the data, it is time to do initial plotting/explorative analysis. Throughout this module, figures and results are presented without showing R code or raw R output. This can be seen as a standard for reports in the course! Typically, numerous figures that do not enter the final project report should be studied, since this phase is explorative; the final figures presenting the key results are chosen after the statistical analysis is completed.

[Figure 3.2: Four average humidity profiles (mean of humidity plotted against width and against depth)]

Plotting various average profiles is usually a helpful tool for data with several factors. In figure 3.2 four of these are presented. In the top left diagram the width humidity patterns for each plank are depicted by plotting the average humidity (the average of the five depths for each width and plank) against the widths. It is immediately clear that there is extensive plank-to-plank variation in the level of humidity. The message about the width effect is less clear. The top right diagram shows the similar plot for the depth effect. Here the message is much clearer: the humidity is high in the center (depth=5) and low at the top (depth=1) and at the bottom (depth=9).

As pointed out, this is the effect seen when the three widths are averaged. It could be that the depth effect is different for widths close to the side of the plank (width=1) than for widths in the center (width=3). In other words, there could be a depth*width interaction effect that we would not find in the plots above. Instead, similar plots are given in the bottom diagrams of figure 3.2 for the widths and depths by averaging over the planks (that is, plotting the 15 average values). The depth structure already seen is recognized. Also, there is a clear shift in humidity level from width to width, and the depth humidity pattern seems to be roughly the same for the three widths. However, there are some deviations from parallel patterns, and the uncertainties in these deviations are not visible. A similar increasing-decreasing width pattern, which was not clearly visible in the top diagram, is now seen. This pattern seems to be roughly the same for all depths (with the same precautions as before), and the low humidity levels for the top and bottom depths are clearly seen. Note again that the two bottom plots contain the same information: had there been clearly non-parallel patterns in one figure (an interaction effect), this would also appear in the other figure.

The next step is to start the actual statistical analysis of the data.

3.3 Test of overall effects/model reduction

A statistical analysis of this kind is commonly carried out in several steps, starting with the basic model found from the factor structure considerations. This model usually contains every possible effect there may be in the data. However, it is of interest to simplify things into easily interpretable results, if possible! So the idea is to remove non-significant complex terms from the model before summarizing the results. Carrying out the mixed model analysis corresponding to the model given by (3-2) gives the following ANOVA table of fixed effects:

[Table: columns Source of variation, Numerator degrees of freedom, Denominator degrees of freedom, F-statistic, P-value; rows depth, width, depth*width; the numeric entries are not preserved in this copy]

We see that the depth*width interaction effect is non-significant. Hence, we remove the interaction term and base the analysis on the model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \qquad (3\text{-}3)$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model is illustrated by the factor structure diagram in figure 3.3. Note how the 8 degrees of freedom from the interaction effect have now been added to the error degrees of freedom. The table of fixed effects then becomes:

[Table: columns Source of variation, Numerator degrees of freedom, Denominator degrees of freedom, F-statistic, P-value; rows depth, width; the numeric entries are not preserved in this copy]

[Figure 3.3: The factor structure diagram]

Note that the removal of the non-significant interaction effect only has minor effects on the conclusions regarding the depth and width effects: they are both extremely significant, confirming what we explored above. Since there are no more non-significant fixed effects, the model given by (3-3) is the final model to use for summarizing the results.

3.4 Post hoc analysis and summarizing the results

Estimates of the variance parameters

The final model is given by (3-3), since the main effects of width as well as depth are clearly significant. Estimates are obtained for the two variance parameters, $\hat\sigma^2_{\text{Plank}}$ and $\hat\sigma^2$ [values not preserved in this copy]. Uncertainties of these estimates are given by:

[Table: columns 2.5 %, 97.5 %; rows .sig01, .sigma; the numeric entries are not preserved in this copy]

The remaining part of this subsection on post hoc analysis and presentation of results illustrates how the information in a factor can be summarized whenever the factor does not interact with any other factor.

Estimates of the fixed parameters

Estimates of the expected values (LSMEANS) for each level of depth, together with their uncertainties and 95% confidence intervals, are:

[Table: columns Estimate, SE, Lower, Upper; one row for each of the five depth levels; the numeric entries are not preserved in this copy]

and correspondingly for each level of width:

[Table: columns Estimate, SE, Lower, Upper; one row for each of the three width levels; the numeric entries are not preserved in this copy]

Comparisons of the fixed parameters

A commonly used post hoc analysis is to compare either specific pairs of depths (resp. widths) or all combinations within each factor. For the former, a standard t-test can be used, e.g.

$$t = \frac{\hat\beta(1) - \hat\beta(2)}{SE\left(\hat\beta(1) - \hat\beta(2)\right)}$$

using the error degrees of freedom (274). Or equivalently expressed by a 95% confidence interval:

$$\hat\beta(1) - \hat\beta(2) \pm t_{0.975,274}\, SE\left(\hat\beta(1) - \hat\beta(2)\right)$$

In this case, the estimates of the fixed effects are raw averages of the data based on the same number of observations for each level (60 for each depth), so the standard error of the difference between two depth levels is given by

$$SE\left(\hat\beta(1) - \hat\beta(2)\right) = \sqrt{2\hat\sigma^2/60}$$

This means that two depth levels are claimed significantly different if they differ by more than

$$t_{0.975,274}\sqrt{2\hat\sigma^2/60}$$

from each other. This is also called the 95% Least Significant Difference (LSD) value.

It would be tempting to do such tests for all combinations of levels within each factor. This is generally NOT an acceptable approach, since the probability of significance-by-chance becomes too large when many tests are performed simultaneously. This is called the multiplicity problem. With five depth levels there are 5 · 4/2 = 10 possible depth pairs to compare. Comparing two specific levels (decided before seeing the data) is not the same as comparing the smallest among five with the largest among five. In a case with no effects one would always expect the latter two to be more different by chance than the former.

There are numerous solutions to properly handle this problem, if all comparisons are indeed made. All of them amount to requiring differences to be larger than required by the usual t-test in order to be claimed significant. One general idea, which can be used whenever numerous tests are performed simultaneously, is the Bonferroni correction: if k tests are performed simultaneously, then use level α/k in each test rather than α. For instance, if all depth levels are compared (k = 10), standard pair-wise t-test output can be used, but employing level 0.5% in each test rather than 5%, that is, only claiming those differences significant for which the usual P-value is less than 0.005. This method is known to be somewhat conservative, meaning that it may be too critical; in other words, it may miss some actual differences.

Another solution is to use a distribution other than the t-distribution when comparisons are made. With the so-called Tukey-Kramer method, two depth levels would be claimed significantly different if they differ by more than

$$\nu_{0.95,J,274}\sqrt{\hat\sigma^2/60}$$

from each other, where J is the number of groups to be compared and $\nu_{0.95,J,274}$ is the 95% quantile of the so-called studentized range distribution with J groups and 274 degrees of freedom. This distribution takes into account that the two levels compared in a single test come from J groups altogether. Just like the t-distribution, it is tabulated or available in the computer. Note that if J = 2, the studentized range distribution corresponds to the t-distribution:

$$\nu_{0.95,2,274} = \sqrt{2}\, t_{0.975,274}$$

The Tukey-adjusted results are:

[Table: columns Depth difference, Parameter, Estimate, SE, Lower, Upper, P-value; one row for each of the 10 pairwise differences β(1) − β(2), β(1) − β(3), ..., β(4) − β(5); the numeric entries are not preserved in this copy]

Note that since the P-values are corrected, that is, based on the more proper studentized range distribution, they can be used directly without any additional Bonferroni correction. Similarly for the width effect:

[Table: columns Width difference, Parameter, Estimate, SE, Lower, Upper, P-value; one row for each of the 3 pairwise differences α(1) − α(2), α(1) − α(3), α(2) − α(3); the numeric entries are not preserved in this copy]

Frequently, the key information of the two tables for each effect is summarized in a single table in which the lsmeans are ordered by size (the estimates themselves are not preserved in this copy, only the letters):

                Depth 9   Depth 1   Depth 7   Depth 3   Depth 5
    Estimate    a         a         b         bc        c

The letter subscripts express the 5% significance results of the 10 pair-wise comparisons:

Two depths sharing a subscript are NOT significantly different
Two depths NOT sharing a subscript are significantly different

So the pattern already observed in figure 3.2 can now be statistically confirmed: there is a clearly lower humidity close to the top and the bottom (and no difference between top and bottom). Also, there is an indication that the center position has significantly higher humidity than the in-between positions (between which no difference is seen).

For the width effect, the summary table becomes particularly simple, since all three differences are significant:

                Width 3   Width 1   Width 2
    Estimate    a         b         c

For these data, a figure of the raw data, like one of the bottom plots of figure 3.2, together with a statement of the lack of significant width*depth interaction and the two summary tables would probably suffice for most purposes. In later modules we will see how additional plots of the model expectations/details can provide informative figures for interpretation.

Other types of post hoc analysis (than the multiple comparison approach) may be employed, especially when quantitative information about the factor levels is available. In this case we know exactly the positions that correspond to the different widths and depths, and this could be used in the analysis. For instance, it could be investigated whether a quadratic function of the depths could be used to describe the humidity pattern. Apart from the nice direct functional interpretation of the dependence of humidity on depth, it could possibly provide more powerful tests for interaction effects. In fact this would still be a linear model, and could be handled by lmer. We will return to such analyses in a later module. Non-linear models (using e.g. exponentials) could also be an option in some cases, but then the model would no longer be a linear model, and additional theory and packages would be needed.

The summary approach above was based on the assumption of no interaction between width and depth, that is, the conclusions regarding widths hold for all the depths, and vice versa. Had there been a significant interaction, we would have to present, say, the depth effects for each of the three widths (and/or vice versa), since the significance tells us that these three conclusions will NOT be the same. In practice, we proceed as above, BUT for the combined width*depth factor with 15 levels rather than for each of them separately. We will see examples of this later.

One important step is missing in the analysis given: an investigation of the validity of the model assumptions! We return to this issue in Module 6, where we finish the analysis of this data set on the humidity of beech wood planks.

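
As a numerical companion to the comparison of depth levels in section 3.4, the following sketch computes how large a difference must be, measured in units of the standard error of a difference, to be claimed significant at the 5% level under the unadjusted t-test (LSD), the Bonferroni correction and the Tukey-Kramer method. It uses only base R and the design quantities stated in the text (5 depth levels, 274 error degrees of freedom). The variable names are illustrative only, and the Tukey multiplier is written here as the 95% quantile of the studentized range (as returned by qtukey) divided by sqrt(2), which puts it on the same scale as the t quantile.

```r
# Design quantities taken from the text
J        <- 5      # number of depth levels
df_error <- 274    # error degrees of freedom in model (3-3)
alpha    <- 0.05   # overall significance level

k <- choose(J, 2)  # number of pairwise comparisons among J levels

# Critical values in units of SE(difference) = sqrt(2 * sigma2_hat / 60)
crit_lsd        <- qt(1 - alpha / 2, df_error)          # unadjusted t-test
crit_bonferroni <- qt(1 - (alpha / k) / 2, df_error)    # each test at level alpha / k
crit_tukey      <- qtukey(1 - alpha, nmeans = J, df = df_error) / sqrt(2)

round(c(LSD = crit_lsd, Tukey = crit_tukey, Bonferroni = crit_bonferroni), 3)

# With only two groups the Tukey multiplier reduces to the ordinary t quantile
crit_tukey_2 <- qtukey(1 - alpha, nmeans = 2, df = df_error) / sqrt(2)
```

Multiplying any of these values by the standard error of a difference gives the corresponding smallest significant difference on the humidity scale. For this design the Tukey value lies between the LSD value and the Bonferroni value, in line with the remark that the Bonferroni correction is somewhat conservative.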
3.5 R-TUTORIAL: Creating report ready tables and figures

Since reports without raw R code or raw R output are requested in this course, as well as generally, it is useful to be able to apply some of the tools given in R to create nice tables (and figures) for LaTeX and/or Word-based report writing.

Plot devices

First of all, there are different device functions for saving plots in various formats. For example, to save a plot as a pdf, write:

    pdf("myplanksinteractionplot.pdf")
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = 2:4))
    dev.off()

Or as a png (you choose the extension of the output file yourself, but it is highly recommended to choose the extension that matches the format):

    png("myplanksinteractionplot.png")
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = 2:4))
    dev.off()

Similarly, there are bmp and jpeg device functions. Plots can also be exported directly from the Plots window in RStudio.

Plotting with colours

Colours can be specified in several different ways, and the various plot functions may have various options for colouring different aspects of the plot. The simplest way to specify a colour is with a character string giving the colour name (e.g., "red"). A list of the possible colours can be obtained with the function colors (or colours); write:

    colors(distinct = FALSE)

to see all the possible choices. Have a look at this website to see what all these colours look like, or go to the Quick-R website.

Even more easily, you can use integers as colour codes. By default R uses a palette of 8 colours:

    palette()
    [1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow"
    [8] "gray"

which can then be referred to by the numbers 1-8. The numbers then cycle modulo 8, meaning that using 9 gives black again.
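
The cycling of integer colour codes described above can be sketched in a few lines of base R. The palette is written out explicitly as in the listing above, because the default returned by palette() depends on the R version (from R 4.0.0 the default colours are different, although the first one is still black). The helper colour_for is hypothetical and only illustrates the modulo rule; it is not part of R.

```r
# The 8-colour palette from the listing above, written out explicitly
pal <- c("black", "red", "green3", "blue", "cyan", "magenta", "yellow", "gray")

# Hypothetical helper: the palette entry that integer colour code i resolves to
colour_for <- function(i, pal) {
  pal[(i - 1) %% length(pal) + 1]
}

# Codes 1-8 map to the palette in order; 9 and 10 wrap around to the start
colour_for(1:10, pal)
```

This is also why col = 2:11 in a plot with 20 planks cannot give 20 distinct colours with an 8-colour palette; a palette function returning the required number of colours avoids the repetition.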

There are a number of pre-defined palettes that can be used when a larger (and better) collection of colours is needed, e.g. the functions heat.colors, rainbow and hsv. For details, write:

    ?heat.colors

These could then be used e.g. as:

    par(mfrow = c(2, 2))
    with(planks, interaction.plot(width, plank, humidity, legend = FALSE, col = heat.colors(20)))
    with(planks, interaction.plot(depth, plank, humidity, legend = FALSE, col = terrain.colors(20)))
    with(planks, interaction.plot(width, depth, humidity, legend = FALSE, col = topo.colors(5)))
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = cm.colors(3)))
    par(mfrow = c(1, 1))

Or:

    # Rainbow colours
    # The argument gives the number of colours wanted,
    # e.g. rainbow(10) gives 10 different colours and rainbow(5) gives 5 colours
    par(mfrow = c(2, 2))
    with(planks, interaction.plot(width, plank, humidity, legend = FALSE, col = rainbow(20)))
    with(planks, interaction.plot(depth, plank, humidity, legend = FALSE, col = rainbow(20)))
    with(planks, interaction.plot(width, depth, humidity, legend = FALSE, col = rainbow(5)))
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = rainbow(3)))
    par(mfrow = c(1, 1))

Or:

    par(mfrow = c(2, 2))
    with(planks, interaction.plot(width, plank, humidity, legend = FALSE, col = hsv(1:20/20)))
    with(planks, interaction.plot(depth, plank, humidity, legend = FALSE, col = hsv(1:20/20)))
    with(planks, interaction.plot(width, depth, humidity, legend = FALSE, col = hsv(1:5/5)))
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = hsv(1:3/3)))
    par(mfrow = c(1, 1))

Report ready tables with xtable

Nice tables can be produced by the xtable function of the xtable package. An example:

    require(xtable)
    means <- as.matrix(with(planks, tapply(humidity, width, mean)))
    xtable(means)

    % latex table generated in R by xtable package
    % Fri Sep 18 13:46:
    \begin{table}[ht]
    \centering
    \begin{tabular}{rr}
      \hline
     & x \\
      \hline
    1 & 5.51 \\
      2 & 5.79 \\
      3 & 5.10 \\
       \hline
    \end{tabular}
    \end{table}

When this tex-code is included in your tex-file, it will appear in the report as:

         x
    1 5.51
    2 5.79
    3 5.10

Note that the input to xtable was a matrix here. The function is prepared to recognize a number of different R objects, see e.g.:

    methods(xtable)
     [1] xtable.anova*           xtable.aov*
     [3] xtable.aovlist*         xtable.coxph*
     [5] xtable.data.frame*      xtable.glm*
     [7] xtable.lm*              xtable.matrix*
     [9] xtable.prcomp*          xtable.summary.aov*
    [11] xtable.summary.aovlist* xtable.summary.glm*
    [13] xtable.summary.lm*      xtable.summary.prcomp*
    [15] xtable.table*           xtable.ts*
    [17] xtable.zoo*
    see ?methods for accessing help and source code

For instance, ANOVA tables will be recognized. A LaTeX user can then copy these tex-lines into the report.tex document. Alternatively, to integrate the R code into the tex-code, use the knitr package to create the pure tex-file from a .Rnw file, which is a kind of tex-file with all the R code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated in the output. This can be used for raw code/results as well as for tables and figures.

A Word user may also use xtable through the html print option:

    print(xtable(means), type = "html")

    <!-- Fri Sep 18 13:46: -->
    <table border=1>
    <tr> <th> </th> <th> x </th> </tr>
    <tr> <td align="right"> 1 </td> <td align="right"> 5.51 </td> </tr>
    <tr> <td align="right"> 2 </td> <td align="right"> 5.79 </td> </tr>
    <tr> <td align="right"> 3 </td> <td align="right"> 5.10 </td> </tr>
    </table>

And then print the table directly into a file:

    print(xtable(means), type = "html", file = "myhtmltable.html")

Open the file in a browser and copy-paste to Word.

3.6 R-TUTORIAL: Initial explorative analysis

The data set planks is imported as described in R Module 1. Assume that the data set is called planks in R. The plots in figure 3.2 are produced using the function interaction.plot, which requires three arguments: first the factor that is to be on the x-axis, then the factor that separates the data into distinct graphs, and finally the response variable. An optional parameter legend, which takes either FALSE (F) or TRUE (T), specifies whether or not a legend should be added (relating the graphs to the factor levels).

    par(mfrow = c(2, 2))
    planks <- read.table("planks.txt", header = TRUE, sep = ",")
    with(planks, interaction.plot(width, plank, humidity, legend = FALSE, col = 2:11))
    with(planks, interaction.plot(depth, plank, humidity, legend = FALSE, col = 2:11))
    with(planks, interaction.plot(width, depth, humidity, legend = FALSE, col = 2:11))
    with(planks, interaction.plot(depth, width, humidity, legend = FALSE, col = 2:11))

Notice that the with(...) function around the interaction.plot statements results in evaluation of the statements within a frame where the data set planks is available. This approach avoids having to attach data sets.

To obtain all four plots in a two-by-two setup exactly as in figure 3.2, the statement par(mfrow=c(2,2)) should be issued prior to the above with statements. As already mentioned in R Module 1, the function par is used to set a variety of graphical parameters (try typing ?par for details). The parameter mfrow is a vector of length two, where the first component is the number of rows on the graphical device and the second component is the number of columns. To return to the default, use par(mfrow=c(1,1)).

3.7 Test of overall effects/model reduction

In the previous section we did not need to define factors (Module 2) to use interaction.plot, but now we do. Configure the three variables depth, plank and width as factors:

    planks$plank <- factor(planks$plank)
    planks$depth <- factor(planks$depth)
    planks$width <- factor(planks$width)

Analysis of models including random effects can be done using the lmer function in the package lme4. The general model, with a fixed-effects structure consisting of the interaction between two factors and with random effects assigned to the plank, is specified as follows:

    require(lmerTest)  # also loads lme4; load it before fitting so that the F-tests below are available
    model1 <- lmer(humidity ~ depth*width + (1 | plank), data = planks)

Notice that the fixed-effects structure can be specified either as depth+width+depth:width or, shorter, as depth*width as used here; they give the same result. The relevant tests of the fixed-effects structure are obtained by applying anova(model1), after making sure the lmerTest package is available:

    anova(model1)

    Analysis of Variance Table of type III with Satterthwaite
    approximation for degrees of freedom

[Output: table with columns Sum Sq, Mean Sq, NumDF, DenDF, F.value, Pr(>F) and rows depth, width, depth:width; most numeric entries are not preserved in this copy. The P-value for depth is reported as < 2.2e-16 (***) and the P-value for width is of the order e-12 (***).]

Or using xtable:

    xtable(anova(model1))

[Output: the same table, with columns Sum Sq, Mean Sq, NumDF, DenDF, F.value, Pr(>F) and rows depth, width, depth:width]

or using Anova from the car package:

    require(car)
    xtable(Anova(model1, test.statistic = "F", type = 3))

[Output: table with columns F, Df, Df.res, Pr(>F) and rows (Intercept), depth, width, depth:width; the numeric entries are not preserved in this copy]

The interaction is not significant, and a reduced model can be formulated:

    model2 <- lmer(humidity ~ depth + width + (1 | plank), data = planks)
    xtable(anova(model2))

[Output: table with columns Sum Sq, Mean Sq, NumDF, DenDF, F.value, Pr(>F) and rows depth, width; the numeric entries are not preserved in this copy]

Both factors are highly significant and no further reduction is possible.

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

The estimated standard deviations and the so-called likelihood profile based confidence intervals for the two variance parameters are found as:

    summary(model2)$varcor

[Output: columns Groups, Name, Std.Dev.; rows plank (Intercept) and Residual; the numeric entries are not preserved in this copy]

    m2prof <- profile(model2, which = 1:2)
    xtable(confint(m2prof))

[Output: columns 2.5 %, 97.5 %; rows .sig01, .sigma; the numeric entries are not preserved in this copy]

As in R Module 1 we can use lsmeans to compute the estimated mean levels and their differences:

    require(lsmeans)
    lsmeans::lsmeans(model2, pairwise ~ depth)

[Output: the $lsmeans table with columns depth, lsmean, SE, df, lower.CL, upper.CL, followed by the $contrasts table with columns contrast, estimate, SE, df, t.ratio, p.value for the 10 pairwise depth contrasts; the numeric entries are not preserved in this copy. Results are averaged over the levels of width. Confidence level used: 0.95. P value adjustment: tukey method for comparing a family of 5 estimates.]

    lsmeans::lsmeans(model2, pairwise ~ width)

[Output: the $lsmeans table with columns width, lsmean, SE, df, lower.CL, upper.CL, followed by the $contrasts table with columns contrast, estimate, SE, df, t.ratio, p.value for the 3 pairwise width contrasts; the numeric entries are not preserved in this copy. Results are averaged over the levels of depth. Confidence level used: 0.95. P value adjustment: tukey method for comparing a family of 3 estimates.]

or used together with the xtable function:

    print(xtable(summary(lsmeans::lsmeans(model2, pairwise ~ depth)$lsmeans)))

[Output: table with columns depth, lsmean, SE, df, lower.CL, upper.CL]

    print(xtable(summary(lsmeans::lsmeans(model2, pairwise ~ width)$lsmeans)))

[Output: table with columns width, lsmean, SE, df, lower.CL, upper.CL]

The multcomp package also includes the so-called compact letter displays:

    require(multcomp)
    tuk2 <- glht(model2, linfct = mcp(depth = "Tukey"))
    tuk.cld2 <- cld(tuk2)
    tuk.cld2

       1    3    5    7    9
     "a" "bc"  "c"  "b"  "a"
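
To make the reading of such a letter display concrete, the following base R sketch decodes the letters printed above for the depth factor into the list of pairs that are, and are not, significantly different. The assignment of the letters to depths 1, 3, 5, 7 and 9 follows the summary table in section 3.4; the helper shares_letter and the other names are hypothetical and only serve the illustration.

```r
# Letters as printed above, assigned to depths 1, 3, 5, 7, 9
cld_letters <- c("1" = "a", "3" = "bc", "5" = "c", "7" = "b", "9" = "a")

# Hypothetical helper: TRUE if two letter strings have at least one letter in common
shares_letter <- function(x, y) {
  length(intersect(strsplit(x, "")[[1]], strsplit(y, "")[[1]])) > 0
}

# All 10 pairs of depth levels, one pair per row
depth_pairs <- t(combn(names(cld_letters), 2))

# Two levels are significantly different exactly when they share no letter
different <- !mapply(shares_letter,
                     cld_letters[depth_pairs[, 1]],
                     cld_letters[depth_pairs[, 2]])

data.frame(depth_a = depth_pairs[, 1],
           depth_b = depth_pairs[, 2],
           significantly_different = unname(different))
```

With these letters, 7 of the 10 pairs come out as significantly different; the three pairs that do not are depths 1 and 9, depths 3 and 5, and depths 3 and 7.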

    ### use sufficiently large upper margin
    old.par <- par(mai = c(1, 1, 1.25, 1), no.readonly = TRUE)
    plot(tuk.cld2, col = 2:6)
    par(old.par)

[Figure: box plots of the linear predictor against depth, with the letters a, bc, c, b, a printed above the five depth levels]

    tuk2 <- glht(model2, linfct = mcp(width = "Tukey"))
    tuk.cld2 <- cld(tuk2)
    tuk.cld2

       1    2    3
     "b"  "c"  "a"

    ### use sufficiently large upper margin
    old.par <- par(mai = c(1, 1, 1.25, 1), no.readonly = TRUE)
    plot(tuk.cld2, col = 2:6)
    par(old.par)

[Figure: box plots of the linear predictor against width, with the letters b, c, a printed above the three width levels]

The lmerTest package also offers some post hoc analysis of differences of lsmeans (based on Satterthwaite's DF method) together with some plotting. The element names used below (rand.table, anova.table, lsmeans.table, diffs.lsmeans.table) are those of the lmerTest version used for this note; later versions of the package may organise the output of step differently.

    summodel2 <- step(model2, reduce.fixed = FALSE, reduce.random = FALSE)
    ## Tests for random effects
    xtable(summodel2$rand.table)

[Output: table with columns Chi.sq, Chi.DF, p.value and one row, plank; the numeric entries are not preserved in this copy]

    ## Tests for fixed effects
    xtable(summodel2$anova.table)

[Output: table with columns Sum Sq, Mean Sq, NumDF, DenDF, F.value, Pr(>F) and rows depth, width]

    ## LSMEANS table
    names(summodel2$lsmeans.table)[4] <- "SE"
    names(summodel2$lsmeans.table)[7] <- "LowCI"
    names(summodel2$lsmeans.table)[8] <- "UppCI"
    xtable(summodel2$lsmeans.table)

[Output: table with columns depth, width, Estimate, SE, DF, t-value, LowCI, UppCI, p-value; one row for each of the five depth levels and each of the three width levels]

    ## DIFF LSMEANS table
    xtable(summodel2$diffs.lsmeans.table)

[Output: table with columns Estimate, Standard Error, DF, t-value, Lower CI, Upper CI, p-value; one row for each of the 10 depth differences and each of the 3 width differences]

    ## Plots of all LSMEANS and DIFFLSMEANS:
    plot(summodel2)

[Figures: plots of the LSMEANS and the differences of LSMEANS of humidity for the levels of depth and width, with bars coloured by significance level (NS and three p-value thresholds)]

Using the generic plotting of LSMEANS and DIFFLSMEANS from the lmerTest package like this currently has the (unfortunate) feature that it ignores any mfrow setting for multiple plots per page, and simply lists the plots on a number of pages with one plot per page.

3.9 Exercises

Exercise 1 Colour of spinach

Spinach heated to 90 or 100 degrees Celsius was vacuum packed and stored for 0, 1 or 2 weeks before the packs were opened and chill stored in normal atmosphere for 0, 1 or 2 days. Then the colour was measured on a Hunter Lab. Two of the colour coordinates, a and b (measuring respectively something like red and yellow colour), were recorded and are given in the data set below. The variable batch is a blocking variable referring to two batches of spinach. The data is available here and listed below:

[Table: columns Batch, temp, weeks, days, a, b; 18 rows for batch A followed by 18 rows for batch B; the numeric entries are not preserved in this copy]

a) Write down all the factors relevant for the analysis, their levels and their mutual structure. Are they crossed or nested, for example? Make the factor structure diagram.

b) Analyse the effect of the different factors on the two colour measurements and summarize the significant effects (lsmeans etc.).

Exercise 2 Sensory evaluation of spinach

In the spinach experiment from exercise 1, sensory evaluations were performed in addition to the colour measurements. The treatments were still the same, so the factors were heating temperature, original storage (weeks), storage after opening (days), and batch. The products from each treatment combination from each batch were assessed by (some of) 7 assessors, who gave a score (between 0 and 15) for each of 6 different sensory properties (see the list further below). There was one session for each combination of batch and weeks, and at each session the assessors evaluated the same 6 products (6 combinations of days and temperature). Note that not all assessors were present at all sessions.

The results, with one line per evaluation, are given in the order: weeks of storage, days after opening, batch, temperature, session number, assessor number, and the six sensory properties hay flavour 1, hay flavour 2, hay taste, spinach flavour 1, spinach flavour 2, spinach taste. The data is available here and listed partly below:

[Table: 252 lines in total, starting with weeks 0, days 0, batch A and ending with weeks 2, days 2, batch B; the remaining entries are not preserved in this copy]

a) Write down the factors relevant for the analysis, their levels and their mutual structure. [You should include a production factor corresponding to the combinations of temperature, weeks, days, and batch.]

b) Specify which effects you want to include in the model. Pay particular attention to which interactions you want in the model. [Include at least some of the interactions between assessor and treatment factors.] Which effects are random and which are fixed?

c) Perform the analysis for one of the sensory properties and draw conclusions.
