eNote 3: Case study

Calvin Grant
5 years ago

Contents

3 Case study
3.1 Introduction
3.2 Initial explorative analysis
3.3 Test of overall effects/model reduction
3.4 Post hoc analysis and summarizing the results
    3.4.1 Estimates of the variance parameters
    3.4.2 Estimates of the fixed parameters
    3.4.3 Comparisons of the fixed parameters
3.5 R-TUTORIAL: Creating report ready tables and figures
    3.5.1 Plot devices
    3.5.2 Plotting with colours
    3.5.3 Report ready tables with xtable
3.6 R-TUTORIAL: Initial explorative analysis
3.7 Test of overall effects/model reduction
3.8 R-TUTORIAL: Post hoc analysis and summarizing the results
3.9 Exercises

3.1 Introduction

This module consists of the first part of a complete analysis of the beech wood data presented as an example in Module 2. The aim is to show that the principles for data analysis and result summary for fixed ANOVA and/or regression models also apply to mixed models. Some readers may also find it helpful to have these principles reviewed. For completeness we repeat here the description and the initial factor structure considerations.

To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then the humidity percentage was measured at 5 depths and 3 widths for each plank:

depth 1: close to the top
depth 5: in the center
depth 9: close to the bottom
depth 3: between 1 and 5
depth 7: between 5 and 9
width 1: close to the side
width 3: in the center
width 2: between 1 and 3

So there are 3 × 5 = 15 measurements for each plank and altogether 300 observations. The data can be found in planks.txt and is reproduced in the following table.
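The counts stated above can be checked with a short base R sketch. It rebuilds only the factor combinations of the experiment, not the humidity values; the object name `design` is an illustrative choice and does not come from the course material.

```r
# Rebuild the layout of the experiment: 20 planks x 3 widths x 5 depths.
# Only the factor combinations are generated; the humidity values
# themselves come from planks.txt and are not simulated here.
design <- expand.grid(
  depth = c(1, 3, 5, 7, 9),  # depth codes used in the description
  width = 1:3,
  plank = 1:20
)

nrow(design)                        # total number of observations
table(design$plank)[1:3]            # measurements per plank (first three planks)
with(design, table(width, depth))   # how often each width/depth cell occurs
```

Each width/depth cell occurs once per plank, which is why the width × depth product factor has 15 levels with 20 observations each.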

[Table: humidity percentages for each of the planks (rows), by depth within Width 1, Width 2 and Width 3 (columns)]

In this experiment we have 3 factors apart from the trivial factors I and 0. Let us use the factor names plank, width and depth. The factor plank has 20 levels, width has 3 levels and depth has 5 levels. For the ith measurement of humidity, $\text{plank}_i$ denotes the plank on which this measurement was performed. Correspondingly, $\text{width}_i$ and $\text{depth}_i$ denote the width and depth, respectively, of this ith measurement. It would be natural to include the interaction between width and depth, corresponding to the product factor width × depth. The product factor has in this case 15 levels. A natural model would include plank as a block factor, while depth and width enter together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the ith measurement, the model with fixed block effect can be written as:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i, \tag{3-1}$$

where i = 1, ..., 300 and where the $\varepsilon_i$ are independent and normally distributed random variables. Or similarly:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the kth measurement within the (i, j)th combination of the two factors, i = 1, ..., 3, j = 1, ..., 5 and k = 1, ..., 20.

As pointed out in Module 1, the block (plank) effect should be considered a random effect, leading to the mixed model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \tag{3-2}$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model corresponds to the factor structure diagram given in Figure 3.1.

[Figure 3.1: The factor structure diagram]

3.2 Initial explorative analysis

Having realized the complete structure of the data, it is time to do initial plotting/explorative analysis. Throughout this module, figures and results are presented without showing R code or raw R output. This can be seen as a standard for reports in the course! Typically, many more figures should be studied than those entering a final project report, since this phase is explorative, and the final figures presenting the key results are chosen after the statistical analyses are completed.

The plotting of various average profiles is usually a helpful tool for data with several factors. In Figure 3.2 four of these are presented. In the top left diagram the width humidity pattern for each plank is depicted by plotting the average humidity (taking the average of the five depths for each width and plank) against the widths. It is immediately clear that there is extensive plank-to-plank variation in the level of humidity. The message about the width effect is less clear. In the top right the similar
plot for the depth effect is seen. Here the message is much clearer: the humidity is high in the center (depth=5) and low at the top (depth=1) and at the bottom (depth=9). As pointed out, this is the effect seen when the three widths are averaged. It could be that the depth effect is different for widths close to the side of the plank (width=1) than for widths in the center (width=3). In other words, there could be a width*depth interaction effect that we would not find in the plots above.

[Figure 3.2: Four average humidity profiles (mean of humidity plotted against width and against depth)]

Instead, similar plots are given in the bottom diagrams of Figure 3.2 for the widths and depths, averaging over the planks (that is, plotting the 15 average values). The depth structure already seen is recognized. Also, it is seen that there is a clear shift in humidity level from width to width and that the depth humidity pattern seems to be roughly the same for the three widths. However, there are some deviations from parallel patterns, and the uncertainties in these deviations are not visible. A similar increasing-decreasing width pattern, which was not clearly visible in the top diagram, is now seen. This pattern seems to be roughly the same for all depths (with the
same precautions as before) and the low humidity levels for the top and bottom depths are clearly seen. Note again that the two bottom plots contain the same information: had there been clearly non-parallel patterns in one figure (an interaction effect), this would also appear in the other figure. The next step is to start the actual statistical analysis of the data.

3.3 Test of overall effects/model reduction

A statistical analysis of this kind is commonly carried out in several steps, starting with the basic model found from the factor structure considerations. This model usually contains every possible effect there may be in the data. However, it is of interest to simplify things into easily interpretable results, if possible. So the idea is to remove non-significant complex terms from the model before summarizing the results. Carrying out the mixed model analysis corresponding to the model given by (3-2) gives the following ANOVA table of fixed effects:

    Source of     Numerator degrees   Denominator degrees   F-statistic   P-value
    variation     of freedom          of freedom
    depth         4                   266                                 <
    width         2                   266                                 <
    depth*width   8                   266

We see that the depth*width interaction effect is non-significant. Hence, we remove the interaction term and do the analysis based on the model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \tag{3-3}$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model is illustrated by the factor structure diagram in Figure 3.3. Note how the 8 degrees of freedom from the interaction effect have now been added to the error degrees of freedom. The table of fixed effects then becomes:

    Source of     Numerator degrees   Denominator degrees   F-statistic   P-value
    variation     of freedom          of freedom
    depth         4                   274                                 <
    width         2                   274                                 <0.0001
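The move of the 8 interaction degrees of freedom into the error term can be illustrated with a small base R sketch. It uses simulated data with the same balanced layout (not the real planks data) and `aov` with an `Error(plank)` stratum as a stand-in for the mixed model, which is a reasonable assumption only because the design is balanced. All object names are illustrative.

```r
# Simulated data with the same layout as the experiment (NOT the real data):
# 20 planks x 3 widths x 5 depths = 300 observations.
set.seed(42)
sim <- expand.grid(depth = factor(c(1, 3, 5, 7, 9)),
                   width = factor(1:3),
                   plank = factor(1:20))
plank_eff <- rnorm(20, sd = 0.5)                      # random plank levels
sim$humidity <- 5 + plank_eff[as.integer(sim$plank)] +
  rnorm(nrow(sim), sd = 0.3)                          # residual noise

# Balanced design: aov with an Error stratum for plank
full    <- aov(humidity ~ depth * width + Error(plank), data = sim)
reduced <- aov(humidity ~ depth + width + Error(plank), data = sim)

# Within-plank error df: 300 - 20 - (4 + 2 + 8) = 266 for the full model,
# and 266 + 8 = 274 once the interaction is dropped.
c(full = full$Within$df.residual, reduced = reduced$Within$df.residual)
```

The simulated responses carry no real depth or width effects, so only the degrees of freedom, not the F-values, are meaningful here.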

[Figure 3.3: The factor structure diagram]

Note that the removal of the non-significant interaction effect has only minor effects on the conclusions regarding the depth and width effects: they are both extremely significant, confirming what we explored above. Since there are no more non-significant fixed effects, the model given by (3-3) is the final model to use for summarizing the results.

3.4 Post hoc analysis and summarizing the results

3.4.1 Estimates of the variance parameters

The final model is given by (3-3), since the main effects of both width and depth are clearly significant. Estimates of the two variance parameters are:

$\hat{\sigma}^2_{\text{Planks}} =$ , $\hat{\sigma}^2 =$

Uncertainties of these estimates on the standard deviation scale, given as 95% profile likelihood confidence limits, are:

                2.5 %   97.5 %
    Planks
    Residual

The remaining part of this subsection on post hoc analysis and presentation of results illustrates how the information in a factor can be summarized whenever the factor does not interact with any other factor.

3.4.2 Estimates of the fixed parameters

Estimates of the expected values (LSMEANS) for each level of depth, together with their uncertainties and 95% confidence intervals, are:

               Estimate   SE   Lower   Upper
    Depth 1
    Depth 3
    Depth 5
    Depth 7
    Depth 9

and correspondingly for each level of width:

               Estimate   SE   Lower   Upper
    Width 1
    Width 2
    Width 3

3.4.3 Comparisons of the fixed parameters

A commonly used post hoc analysis is to compare either specific pairs of depths (resp. widths) or all combinations within each factor. For the former, a standard t-test can be used, e.g.

$$t = \frac{\hat{\beta}(1) - \hat{\beta}(2)}{SE\left(\hat{\beta}(1) - \hat{\beta}(2)\right)}$$

using the error degrees of freedom (274). Or, equivalently, expressed by a 95% confidence interval:

$$\hat{\beta}(1) - \hat{\beta}(2) \pm t_{0.975,274}\, SE\left(\hat{\beta}(1) - \hat{\beta}(2)\right)$$

In this case, the estimates of the fixed effects are raw averages of the data based on the same number of observations for each level (20 planks × 3 widths = 60 observations per depth level), so the standard error of the difference between two depth levels is given by

$$SE\left(\hat{\beta}(1) - \hat{\beta}(2)\right) = \sqrt{2\hat{\sigma}^2/60}$$

This means that two depth levels are claimed significantly different if they differ by more than

$$t_{0.975,274}\sqrt{2\hat{\sigma}^2/60}$$

from each other. This is also called the 95% Least Significant Difference (LSD) value.

It would be tempting to do such tests for all combinations of levels within each factor. This is generally NOT an acceptable approach, since the probability of significance-by-chance becomes too large when many tests are performed simultaneously. This is called the multiplicity problem. With five depth levels there are 5 · 4/2 = 10 possible depth pairs to compare. Comparing two specific levels (decided before seeing the data) is not the same as comparing the smallest among five with the largest among five. In a case with no effects one would always expect the latter two to be more different by chance than the former.

There are numerous solutions to properly handle this problem, if all comparisons are indeed made. All of them amount to requiring differences to be larger than required by the usual t-test in order to be claimed significant. One general idea, which can be used whenever numerous tests are performed simultaneously, is the Bonferroni correction: if k tests are performed simultaneously, then use level α/k in each test rather than α. For instance, if all depth levels are compared, standard pairwise t-test output can be used, but employing level 0.5% in each test rather than 5%, that is, only claiming those differences significant for which the usual P-value is less than 0.005. This method is known to be somewhat conservative, meaning that it may be too critical, or in other words: it may miss some actual differences.
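The LSD value and the Bonferroni correction described above can be sketched in base R as follows. The function name `lsd` and the value used for the residual variance are hypothetical placeholders; the real estimate comes from the fitted model (3-3).

```r
# Number of pairwise comparisons among J = 5 depth levels
J <- 5
k <- choose(J, 2)            # 5 * 4 / 2 = 10

# Bonferroni: test each pair at level alpha / k instead of alpha
alpha <- 0.05
alpha_bonf <- alpha / k      # 0.005, i.e. 0.5%

# Least significant difference between two means, each based on n observations.
# sigma2 is a placeholder for the residual variance estimate from the model.
lsd <- function(sigma2, n = 60, df = 274, level = 0.05) {
  qt(1 - level / 2, df) * sqrt(2 * sigma2 / n)
}

sigma2_example <- 1          # hypothetical value, for illustration only
lsd(sigma2_example)                       # unadjusted 95% LSD
lsd(sigma2_example, level = alpha_bonf)   # Bonferroni-adjusted threshold
```

The Bonferroni-adjusted threshold is always the larger of the two, which is exactly the sense in which the corrected procedure demands bigger differences before claiming significance.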
Another solution is to use a distribution other than the t-distribution when comparisons are made. With the so-called Tukey-Kramer method, two depth levels would be claimed significantly different if they differ by more than

$$\nu_{0.975,J,274}\sqrt{\hat{\sigma}^2/60}$$

from each other, where J is the number of groups to be compared and $\nu_{0.975,J,274}$ is the 97.5%-quantile of the so-called studentized range distribution with J groups. This distribution takes into account that the two levels compared in a single test come from J groups altogether. Just like the t-distribution, it is tabulated or available in the computer. Note that if J = 2, then the studentized range
distribution corresponds to the t-distribution:

$$\nu_{0.975,2,274} = \sqrt{2}\, t_{0.975,274}$$

The Tukey-adjusted results are:

    Depth        Parameter      Estimate   SE   Lower   Upper   P-value
    difference
    1-3          β(1) − β(2)                                     <
    1-5          β(1) − β(3)                                     <
    1-7          β(1) − β(4)                                     <
    1-9          β(1) − β(5)
    3-5          β(2) − β(3)
    3-7          β(2) − β(4)
    3-9          β(2) − β(5)                                     <
    5-7          β(3) − β(4)
    5-9          β(3) − β(5)                                     <
    7-9          β(4) − β(5)                                     <

Note that since the P-values are corrected, that is, based on the more proper studentized range distribution, they can be used directly without any additional Bonferroni correction. Similarly for the width effect:

    Width        Parameter      Estimate   SE   Lower   Upper   P-value
    difference
    1-2          α(1) − α(2)
    1-3          α(1) − α(3)                                     <
    2-3          α(2) − α(3)                                     <

Frequently, the key information of the two tables for each effect is summarized in a single table in which the LS-means are ordered by size:

                Depth 9   Depth 1   Depth 7   Depth 3   Depth 5
    Estimate       a         a         b         bc        c

The letter subscripts express the 5% significance results of the 10 pairwise comparisons:

Two depths sharing a subscript are NOT significantly different
Two depths NOT sharing a subscript are significantly different

So the pattern already observed in Figure 3.2 can now be statistically confirmed: there is a clearly lower humidity close to the top and the bottom (and no difference between top and bottom). Also, there is an indication that the center position has significantly higher humidity than the in-between positions (between which no difference is seen). For the width effect, the summary table becomes particularly simple, since all three differences are significant:

                Width 3   Width 1   Width 2
    Estimate       a         b         c

For these data, a figure of the raw data, like one of the bottom plots of Figure 3.2, together with a statement of the lack of significant width*depth interaction and the two summary tables, would probably suffice for most purposes. In later modules we will see how additional plots of the model expectations/details provide informative figures for interpretation.

Other types of post hoc analysis (than the multiple comparison approach) may be employed, especially when quantitative information about the factor levels is available. In this case we know exactly the positions that correspond to the different widths and depths, and this could be used in the analysis. For instance, it could be investigated whether a quadratic function of the depths could be used to describe the humidity pattern. Apart from the nice direct functional interpretation of the dependence of humidity on depth, it could possibly provide more powerful tests for interaction effects. In fact, this would still be a linear model, and could be handled by lmer from the lme4 package. We will return to such analyses in a later module. Non-linear models (using e.g. exponentials) could also be an option in some cases, but then the model will no longer be a linear model, and additional theory and packages would be needed.

The summary approach above was based on the assumption of no interaction between width and depth, that is, the conclusions regarding widths hold for all the depths, and vice versa. Had there been a significant interaction, we would have to present, say, the
depth effects for each of the three widths (and/or vice versa), since the significance tells us that these three conclusions will NOT be the same. In practice, we proceed as above, BUT for the combined width*depth factor with 15 levels rather than for each of them separately. We will see examples of this later.

One important step in the analysis given is missing: an investigation of the validity of the model assumptions! We return to this issue in Module 6, where we finish the analysis of this data set on the humidity of beech wood planks.

3.5 R-TUTORIAL: Creating report ready tables and figures

Since reports without raw R code or raw R output are requested both in this course and generally, it is useful to be able to apply some of the tools available in R to create nice tables (and figures) for LaTeX and/or Microsoft Word-based report writing.

3.5.1 Plot devices

First of all, there are different device functions for saving plots in various formats. For example, to save a plot as a pdf, write:

    pdf("myplanksinteractionplot.pdf")
    with(planks, interaction.plot(depth, width, humidity, col=2:4))
    dev.off()

Note that dev.off() lets R know that no further graphics commands will follow. It turns off the graphics device and saves the figure to the designated file. Or as a png (you choose the extension of the output file yourself, but it is clearly recommended to choose an extension that corresponds to the device function, here pdf or png):

    png("myplanksinteractionplot.png")
    with(planks, interaction.plot(depth, width, humidity, col=2:4))
    dev.off()

Similarly, there are bmp, jpeg and other device functions. Plots can also be exported directly from the Plots window in RStudio.
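When a plot is produced inside a function or a longer script, it can be convenient to make sure that the device is always closed again, also if the plotting code fails. The following is a minimal sketch of that idea. The helper name `save_plot_pdf` and the simulated data frame `toy` are hypothetical and only stand in for the planks data.

```r
# Hypothetical helper: open a pdf device, run the plotting code,
# and always close the device again (even if plotting fails).
save_plot_pdf <- function(file, plot_fun, width = 7, height = 5) {
  pdf(file, width = width, height = height)  # size in inches
  on.exit(dev.off())                         # close device when the function exits
  plot_fun()
  invisible(file)
}

# Simulated stand-in for the planks data (NOT the real measurements)
set.seed(1)
toy <- expand.grid(depth = c(1, 3, 5, 7, 9), width = 1:3, plank = 1:20)
toy$humidity <- rnorm(nrow(toy), mean = 5.5, sd = 1)

out <- save_plot_pdf(
  file.path(tempdir(), "toy_interactionplot.pdf"),
  function() with(toy, interaction.plot(depth, width, humidity, col = 2:4))
)
file.exists(out)
```

The `width` and `height` arguments of `pdf` are given in inches, which makes it easy to match the figure size to the text width of the report.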

3.5.2 Plotting with colours

Colours can be specified in several different ways, and different plot functions may have different options for colouring different aspects of the plot. The simplest way to specify a colour is with a character string giving the colour name (e.g., "red"). A list of the possible colours can be obtained with the function colors; write:

    colors(distinct = FALSE)

to see all the possible choices. Have a look at this website to see what all these colours look like, or go to the QuickR website. Even more easily, you can use integers as colour codes. By default R uses a palette of 8 colours:

    palette()
    [1] "black"   "red"     "green3"  "blue"    "cyan"    "magenta" "yellow"
    [8] "gray"

which can then be referred to by the numbers 1-8. The numbers cycle modulo 8, meaning that using 9 gives "black" again. There are a number of pre-defined palettes that can be used when a larger (and better) collection of colours is needed, e.g. via the functions rainbow and hsv. For an overview, write:

    ?heat.colors

These could then be used e.g. as follows (plots not shown):

    par(mfrow=c(2,2))
    with(planks, {
      interaction.plot(width, plank, humidity, legend=FALSE, col=heat.colors(20))
      interaction.plot(depth, plank, humidity, legend=FALSE, col=terrain.colors(20))
      interaction.plot(width, depth, humidity, col=topo.colors(5))
      interaction.plot(depth, width, humidity, col=cm.colors(3))
    })
    par(mfrow=c(1,1))

Or:

    # The argument gives the number of colours wanted:
    # rainbow(10) gives 10 different colours, rainbow(5) gives 5 colours
    with(planks, interaction.plot(width, depth, humidity, col=rainbow(5)))

Or:

    with(planks, interaction.plot(width, depth, humidity, col=hsv(1:5/5)))

3.5.3 Report ready tables with xtable

Nice tables can be produced by the xtable function from the xtable package. An example:

    library(xtable)
    means <- as.matrix(with(planks, tapply(humidity, width, mean)))
    xtable(means)

    % latex table generated in R by xtable package
    % Wed Sep 27 11:51:
    \begin{table}[ht]
    \centering
    \begin{tabular}{rr}
      \hline
     & x \\
      \hline
    1 & 5.51 \\
    2 & 5.79 \\
    3 & 5.10 \\
      \hline
    \end{tabular}
    \end{table}

When this tex-code is included in your tex-file it will appear in the report as in the following table.

           x
    1   5.51
    2   5.79
    3   5.10

Note how the input to xtable was a matrix here. The function is prepared to recognize a number of different R objects, see e.g.:

    methods(xtable)
     [1] xtable.anova*            xtable.aov*
     [3] xtable.aovlist*          xtable.coxph*
     [5] xtable.data.frame*       xtable.glm*
     [7] xtable.gmsar*            xtable.lagImpact*
     [9] xtable.lm*               xtable.matrix*
    [11] xtable.prcomp*           xtable.sarlm*
    [13] xtable.sarlm.pred*       xtable.spautolm*
    [15] xtable.sphet*            xtable.splm*
    [17] xtable.stsls*            xtable.summary.aov*
    [19] xtable.summary.aovlist*  xtable.summary.glm*
    [21] xtable.summary.gmsar*    xtable.summary.lm*
    [23] xtable.summary.prcomp*   xtable.summary.sarlm*
    [25] xtable.summary.spautolm* xtable.summary.sphet*
    [27] xtable.summary.splm*     xtable.summary.stsls*
    [29] xtable.table*            xtable.ts*
    [31] xtable.zoo*
    see '?methods' for accessing help and source code

For instance, ANOVA tables will be recognized. So a LaTeX user can then copy these tex-lines into the report .tex document. Or, to integrate the R code into the LaTeX code, use the knitr R package to create the pure tex-file from an .Rnw file, which is a kind of LaTeX file with all the R code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated etc. in the output. This can be used for raw code, results, tables and figures alike.

A Microsoft Word user may also use xtable through the html print option:

    print(xtable(means), type = "html")

    <!-- Wed Sep 27 11:51: >

    <table border=1>
    <tr> <th> </th> <th> x </th> </tr>
    <tr> <td align="right"> 1 </td> <td align="right"> 5.51 </td> </tr>
    <tr> <td align="right"> 2 </td> <td align="right"> 5.79 </td> </tr>
    <tr> <td align="right"> 3 </td> <td align="right"> 5.10 </td> </tr>
    </table>

And then print the table directly into a file:

    print(xtable(means), type = "html", file = "myhtmltable.html")

Open the file in a browser and copy-paste to Word.

3.6 R-TUTORIAL: Initial explorative analysis

The data set planks is imported as described in eNote 1. Assume that the data set is called planks in R. The plots in Figure 3.2 are produced using the function interaction.plot(), which requires three arguments: first the factor that is to be on the x-axis, then the factor that separates the data into distinct graphs, and finally the response variable. An optional parameter legend, which takes either FALSE or TRUE, specifies whether or not a legend should be added (relating the graphs to the factor levels). The code that produced this figure was:

    par(mar = c(3.5, 3.5, 1, 1),  # smaller margin on top and right
        mgp = c(2.4, 0.7, 0),     # position of axis labels, tick labels and axis
        las = 1)
    planks <- read.table("planks.txt", header = TRUE, sep = ",")
    ylim <- c(3, 9)
    par(mfrow=c(2,2))
    with(planks, {
      interaction.plot(width, plank, humidity, ylim=ylim, legend=FALSE,
                       bty="n", col=2:11, xtick = TRUE)
      interaction.plot(depth, plank, humidity, ylim=ylim, legend=FALSE,
                       bty="n", col=2:11, xtick = TRUE)
      interaction.plot(width, depth, humidity, ylim=ylim,
                       bty="n", col=2:11, xtick = TRUE)
      interaction.plot(depth, width, humidity, ylim=ylim,
                       bty="n", col=2:11, xtick = TRUE)
    })
    par(mfrow=c(1,1))

Notice that the with(...) function around the interaction.plot statements results in evaluation of the statements within an environment where the variables of the data set planks are available. This approach avoids having to attach data sets. The function par is used to set a variety of graphical parameters (try typing ?par for details). The parameter mfrow is a vector of length two, where the first component is the number of rows on the graphical device and the second component is the number of columns on the graphical device. To return to the default use par(mfrow=c(1, 1)).

3.7 Test of overall effects/model reduction

In the previous section we did not need to define the variables as factors in R to use interaction.plot, but in the following we do. Configure the three variables depth, plank and width as factors:

    planks$plank <- factor(planks$plank)
    planks$depth <- factor(planks$depth)
    planks$width <- factor(planks$width)

Analysis of models including random effects can be done using the lmer function from the R package lme4. The general model, with a fixed-effects structure consisting of the interaction between two factors and a random effect assigned to plank, is specified as follows:

    library(lme4)
    model1 <- lmer(humidity ~ depth*width + (1 | plank), data = planks)

Notice that the fixed-effects structure can be specified either as depth + width + depth:width or, more briefly as used here, as depth*width; they give the same result. The relevant tests of the fixed-effects structure are obtained by applying anova(model1) after making sure the lmerTest package is available:

    library(lmerTest)
    anova(model1)

    Analysis of Variance Table of type III with Satterthwaite
    approximation for degrees of freedom
                Sum Sq Mean Sq NumDF DenDF F.value    Pr(>F)
    depth                                          < 2.2e-16 ***
    width                                               e-12 ***
    depth:width
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

lmerTest automatically loads lme4, so we could have just run library(lmerTest) from the beginning instead. Using Anova from the car package we obtain:

    library(car)
    Anova(model1, test.statistic = "F", type = 3)

    Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)

    Response: humidity
                F Df Df.res  Pr(>F)
    (Intercept)             < 2e-16 ***
    depth                   < 2e-16 ***
    width                           *
    depth:width
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The interaction is not significant and a reduced model can be formulated:

    model2 <- lmer(humidity ~ depth + width + (1 | plank), data = planks)
    anova(model2)

    Analysis of Variance Table of type III with Satterthwaite
    approximation for degrees of freedom
          Sum Sq Mean Sq NumDF DenDF F.value    Pr(>F)
    depth                                    < 2.2e-16 ***
    width                                         e-12 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Both factors are highly significant and no further reduction is possible.

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

Estimates of the variance parameters are found with:

    VarCorr(model2)

     Groups   Name        Std.Dev.
     plank    (Intercept)
     Residual

Note that the estimates are given on the standard deviation scale, not the variance scale. The so-called profile likelihood based confidence intervals for the two variance parameters are found with:

    m2prof <- profile(model2, which=1:2, signames=FALSE)
    confint(m2prof)

                         2.5 % 97.5 %
    sd_(Intercept)|plank
    sigma

The profile function by default profiles the likelihood for all model parameters, but since profiling is time-consuming and we are only interested in the profile likelihood confidence intervals for the two variance parameters, we set the which=1:2 option. As in eNote 1 we can use lsmeans to compute the estimated mean levels and their differences:

    library(lsmeans)
    lsmeans::lsmeans(model2, ~ depth)

     depth lsmean SE df lower.CL upper.CL
     1
     3
     5
     7
     9

    Results are averaged over the levels of: width
    Degrees-of-freedom method: satterthwaite
    Confidence level used: 0.95

    lsmeans::lsmeans(model2, pairwise ~ width)

    $lsmeans
     width lsmean SE df lower.CL upper.CL
     1
     2
     3

    Results are averaged over the levels of: depth
    Degrees-of-freedom method: satterthwaite
    Confidence level used: 0.95

    $contrasts
     contrast estimate SE df t.ratio p.value
     1 - 2
     1 - 3                            <
     2 - 3                            <.0001

    Results are averaged over the levels of: depth
    P value adjustment: tukey method for comparing a family of 3 estimates

Observe that writing pairwise ~ generates all pairwise differences of the expected (LS) means.
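The Tukey adjustment reported for the contrasts is based on the studentized range distribution, which base R provides through `ptukey` and `qtukey`. The sketch below shows how a t-ratio for one pairwise difference can be turned into a Tukey-adjusted P-value, under the assumption of a balanced design with a common standard error. The function name `tukey_p` and the t-ratio are illustrative and not taken from the output above. Note that `qtukey` is indexed by the family-wise confidence level, so the quantity written $\nu_{0.975,J,274}$ in the section on comparisons of the fixed parameters corresponds to `qtukey(0.95, J, 274)`.

```r
# Tukey-adjusted P-value for one pairwise comparison in a family of J means.
# The studentized range statistic is sqrt(2) times the absolute t-ratio.
tukey_p <- function(t_ratio, J, df) {
  ptukey(sqrt(2) * abs(t_ratio), nmeans = J, df = df, lower.tail = FALSE)
}

t_example <- 2.5                                   # hypothetical t-ratio
p_unadj  <- 2 * pt(abs(t_example), df = 274, lower.tail = FALSE)
p_tukey3 <- tukey_p(t_example, J = 3, df = 274)    # family of three widths
p_tukey5 <- tukey_p(t_example, J = 5, df = 274)    # family of five depths
c(unadjusted = p_unadj, tukey_J3 = p_tukey3, tukey_J5 = p_tukey5)

# With only J = 2 groups the adjustment reduces to the ordinary t-test
tukey_p(t_example, J = 2, df = 274)
qtukey(0.95, nmeans = 2, df = 274) / sqrt(2)       # compare with the next line
qt(0.975, df = 274)
```

The adjusted P-value grows with the size of the family, which is why the same t-ratio can be significant among three widths but not among five depths.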

The multcomp package also includes the so-called compact letter displays:

    library(multcomp)
    tuk2 <- glht(model2, linfct = mcp(depth = "Tukey"))
    tuk.cld2 <- cld(tuk2)
    tuk.cld2  # Display the CLD

       1    3    5    7    9
     "a" "bc"  "c"  "b"  "a"

    # Plot the compact letter display:
    old.par <- par(no.readonly=TRUE)  # Save current graphics parameters
    par(mai=c(1,1,1.25,1))            # Use sufficiently large upper margin
    plot(tuk.cld2, col=2:6)

[Figure: plot of the compact letter display, linear predictor against depth, with the letters shown above each depth level]

    par(old.par)  # reset graphics parameters

The lmerTest package has a rand function which produces an ANOVA-like table of χ²-tests of the random effects in a mixed model:

    rand(model2)

    Analysis of Random effects Table:
          Chi.sq Chi.DF p.value
    plank                <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

3.9 Exercises

Exercise 1 Colour of spinach

Spinach heated to 90 or 100 degrees Celsius was vacuum packed and stored for 0, 1 or 2 weeks before the packs were opened and chill stored in normal atmosphere for 0, 1 or 2 days. Then the colour was measured on a Hunter Lab. Two of the colour coordinates, a and b (measuring something like red and yellow colour, respectively), were recorded and are given in the data set below. The variable batch is a blocking variable referring to two batches of spinach. The data is available in the file spinage.txt and listed here:

[Table: columns Batch, temp, weeks, days, a, b; 18 rows for batch A followed by 18 rows for batch B]

a) Write down all the factors relevant for the analysis, and their levels and mutual structure. Are they crossed or nested, for example? Make the factor structure diagram.

b) Analyse the effect of the different factors on the two colour measurements and summarize the significant effects (lsmeans etc.).

Exercise 2 Sensory evaluation of spinach

In the spinach experiment from Exercise 1, sensory evaluations were performed in addition to the colour measurements. The treatments were still the same, so the factors were heating temperature, original storage (weeks), storage after opening (days), and batch. The products from each treatment combination from each batch were assessed by (some of) 7 assessors, who gave a score (between 0 and 15) for each of 6 different sensory properties (see the list further below). There was one session for each combination of batch and weeks, and at each session the assessors evaluated the same 6 products (6 combinations of days and temperature). Note that not all assessors were present at all sessions. The results, with one line per evaluation, are given in the order: weeks of storage, days after opening, batch, temperature, session number, assessor number, and the six sensory properties hay flavour 1, hay flavour 2, hay taste, spinach flavour 1, spinach flavour 2, spinach taste. The data is available in the file spinagesens.txt and listed partly below:

[Data listing: 252 lines in total, running from weeks 0, days 0, batch A to weeks 2, days 2, batch B]

a) Write down the factors relevant for the analysis, and their levels and mutual structure. [You should include a production factor corresponding to the combinations of temperature, weeks, days, and batch.]

b) Specify which effects you want to include in the model. Pay particular attention to which interactions you want in the model. [Include at least some of the interactions between assessor and treatment factors.] Which effects are random and which are fixed?

c) Perform the analysis for one of the sensory properties and draw conclusions.
