margarine

Name:

2017-04-24

Contents

margarine
  data analysis
  ANOVA F test for equality of means
  multiple comparisons

margarine

references:

- Peck, 1/e, 17.20
- ANOVA table, Peck, chapter 17, table 17.2, p.14
- saturated fat, Wikipedia
- myristic acid, a saturated fatty acid, Wikipedia
- monounsaturated fat, Wikipedia
- polyunsaturated fat, Wikipedia

data analysis

Import the data. Each value is the percentage of physiologically active polyunsaturated fatty acids (PAPUFA) measured in one sample of margarine.

# read one file per brand; R's logical constant is FALSE (upper case)
BlueBonnet <- read.table("bluebonnet.txt", header=FALSE, sep=" ")
Chiffon <- read.table("chiffon.txt", header=FALSE, sep=" ")
Fleischmanns <- read.table("fleischmanns.txt", header=FALSE, sep=" ")
Imperial <- read.table("imperial.txt", header=FALSE, sep=" ")
Mazola <- read.table("mazola.txt", header=FALSE, sep=" ")
Parkay <- read.table("parkay.txt", header=FALSE, sep=" ")

# stack all measurements into one vector
val <- unname(unlist(c(BlueBonnet, Chiffon, Fleischmanns, Imperial, Mazola, Parkay)))

# length() of a data frame counts its columns, so this relies on each file
# holding its values on a single space-separated row
label <- c(rep("BlueBonnet", length(BlueBonnet)),
           rep("Chiffon", length(Chiffon)),
           rep("Fleischmanns", length(Fleischmanns)),
           rep("Imperial", length(Imperial)),
           rep("Mazola", length(Mazola)),
           rep("Parkay", length(Parkay)))

data <- data.frame(label, val)
head(data)

##        label  val
## 1 BlueBonnet 13.5
## 2 BlueBonnet 13.4
## 3 BlueBonnet 14.1
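## 4 BlueBonnet 14.3
## 5    Chiffon 13.2
## 6    Chiffon 12.7

The data frame has two columns and 26 rows.

Equal standard deviations?

margarine.sd <- aggregate(val ~ label, data=data, sd)
barplot(margarine.sd$val, col=terrain.colors(6), names.arg=margarine.sd$label,
        las=1, cex.names=0.8, ylab="Standard deviation of PAPUFA")

[Figure: bar plot of the standard deviation of PAPUFA for each brand (BlueBonnet, Chiffon, Fleischmanns, Imperial, Mazola, Parkay); vertical axis from 0.0 to 0.6.]

# rows are in alphabetical order: row 3 is Fleischmanns, row 4 is Imperial
largest.ratio <- margarine.sd$val[3] / margarine.sd$val[4]
largest.ratio

## [1] 1.820931

The largest sample standard deviation is about 1.82 times the smallest. A common rule of thumb treats the equal-standard-deviation assumption as reasonable when this ratio is at most 2.

Boxplots.

# widen the left margin so the brand names fit, then draw the boxplots
par(mar=c(4, 7, 2, 2))
boxplot(val ~ label, data=data, horizontal=TRUE, las=1,
        col=terrain.colors(6), xlab="PAPUFA (percent)")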
[Figure: horizontal boxplots of PAPUFA (percent) by brand, listed from Parkay (top) to BlueBonnet (bottom); horizontal axis from 13 to 18.]

ANOVA F test for equality of means

H0: all the means are the same
Ha: not all the means are the same

Construct a linear model and call anova on that model.

margarine.lm <- lm(val ~ label, data=data)
options(show.signif.stars = FALSE)
anova(margarine.lm)

## Analysis of Variance Table
##
## Response: val
##           Df Sum Sq Mean Sq F value    Pr(>F)
## label      5 108.19  21.637  79.264 1.737e-12
## Residuals 20   5.46   0.273

There are g = 6 groups and n = 26 values, so the test statistic is F = 79.264 with g - 1 = 5 and n - g = 20 degrees of freedom.

Confirm that the p-value of this statistic is as reported in the anova display.

1 - pf(79.264, df1=5, df2=20)

## [1] 1.736833e-12

Illustration

Here is an illustration relating these statistics.
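The helper draw.f used in the code that follows is not defined anywhere in this handout. The block below is only a minimal sketch of what such a helper might look like, assuming it draws the F density, marks the observed statistic, and prints the annotations. The argument names and the plotting choices are guesses, not the author's implementation.

```r
# Hypothetical stand-in for the handout's draw.f helper (an assumption,
# not the original function): plot an F density and mark the statistic.
draw.f <- function(x.max, y.max, f.val, df1, df2, p.value, title) {
  x <- seq(0, x.max, length.out = 1000)
  # density of the F distribution with df1 and df2 degrees of freedom
  plot(x, df(x, df1, df2), type = "l", ylim = c(0, y.max),
       xlab = "x", ylab = "Density", main = title)
  # dashed vertical line at the observed test statistic
  abline(v = f.val, lty = 2)
  legend("topright", bty = "n",
         legend = c(sprintf("F(df1 = %g, df2 = %g)", df1, df2),
                    sprintf("p-value = %.3e", p.value),
                    sprintf("F = %.3f", f.val)))
  invisible(NULL)
}
```

With a definition along these lines in scope, the call shown next runs as written.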
x.max <- 100
y.max <- 0.8
f.val <- 79.264
g <- 6
n <- nrow(data)
f.df1 <- g - 1
f.df2 <- n - g
f.p.value <- 1.737e-12
title <- "F Test"
draw.f(x.max, y.max, f.val, f.df1, f.df2, f.p.value, title)

[Figure: "F Test". Density curve of F(df1 = 5, df2 = 20) for x from 0 to 100 (vertical axis from 0.0 to 0.8), annotated with F = 79.264 and p-value = 1.737e-12.]

Conclusion. State the formal conclusion of the hypothesis test and explain how you reached that conclusion.

p.value <- f.p.value
alpha <- 0.05
# reject H0 when the p-value is at most alpha
reject.h0 <- p.value <= alpha
reject.h0

## [1] TRUE

State the conclusion in context.

multiple comparisons

R's TukeyHSD procedure (= Tukey Honest Significant Differences) implements the Tukey-Kramer Multiple Comparison Procedure discussed in our text.
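As a rough check on what the procedure computes, the half-width of one Tukey-Kramer interval can be sketched by hand: the studentized range critical value divided by sqrt(2), times the standard error of the difference in means. The mean square error and degrees of freedom come from the ANOVA table. The two group sizes of 4 are an assumption for illustration, and the variable names are not from the handout.

```r
# Values read from the ANOVA table (mean square error as displayed, rounded)
mse <- 0.273
df.error <- 20
g <- 6            # number of groups

# Assumed sizes of the two groups being compared (illustrative only)
n.i <- 4
n.j <- 4

# Studentized range critical value for a 95% family-wise confidence level
q.crit <- qtukey(0.95, nmeans = g, df = df.error)

# Tukey-Kramer half-width: (q / sqrt(2)) * sqrt(MSE * (1/n.i + 1/n.j))
half.width <- q.crit / sqrt(2) * sqrt(mse * (1 / n.i + 1 / n.j))
half.width
```

If those assumptions hold, the result should be close to half the distance between lwr and upr for a comparison of two four-observation brands. Small differences are expected because the mean square error is entered here in rounded form.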
TukeyHSD(aov(margarine.lm))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = margarine.lm)
##
## $label
##                           diff        lwr        upr     p adj
## Chiffon-BlueBonnet      -0.725 -1.8862516  0.4362516 0.3963073
## Fleischmanns-BlueBonnet  4.275  3.1137484  5.4362516 0.0000000
## Imperial-BlueBonnet      0.275 -0.8862516  1.4362516 0.9736311
## Mazola-BlueBonnet        3.315  2.2133400  4.4166600 0.0000001
## Parkay-BlueBonnet       -1.025 -2.1266600  0.0766600 0.0775326
## Fleischmanns-Chiffon     5.000  3.8387484  6.1612516 0.0000000
## Imperial-Chiffon         1.000 -0.1612516  2.1612516 0.1176619
## Mazola-Chiffon           4.040  2.9383400  5.1416600 0.0000000
## Parkay-Chiffon          -0.300 -1.4016600  0.8016600 0.9526463
## Imperial-Fleischmanns   -4.000 -5.1612516 -2.8387484 0.0000000
## Mazola-Fleischmanns     -0.960 -2.0616600  0.1416600 0.1107597
## Parkay-Fleischmanns     -5.300 -6.4016600 -4.1983400 0.0000000
## Mazola-Imperial          3.040  1.9383400  4.1416600 0.0000004
## Parkay-Imperial         -1.300 -2.4016600 -0.1983400 0.0150616
## Parkay-Mazola           -4.340 -5.3786550 -3.3013450 0.0000000

# widen the left margin for the pair labels; par.orig keeps the old settings
par.orig <- par(mar=c(2, 12, 0.5, 0.5), las = 1, mgp = c(2.9, 0.7, 0))
# function names are case-sensitive: TukeyHSD, not tukeyhsd
plot(TukeyHSD(aov(margarine.lm)), las=1, col="forestgreen")
[Figure: Tukey confidence intervals for the 15 pairwise differences in mean PAPUFA, listed from Chiffon-BlueBonnet (top) to Parkay-Mazola (bottom); horizontal axis from -6 to 6.]

Conclusion. Interpret these results.

Underscoring pattern.

means <- aggregate(data[, 2], list(data$label), mean)  # calculate the means
names(means) <- c("margarine", "x.bar")
means[order(means$x.bar), ]                            # order by mean

##      margarine  x.bar
## 6       Parkay 12.800
## 2      Chiffon 13.100
## 1   BlueBonnet 13.825
## 4     Imperial 14.100
## 5       Mazola 17.140
## 3 Fleischmanns 18.100

Groups. Groups consist of means that have not yet been shown to be distinct. Each brand is abbreviated by its initial.

[P-C-B] [C-B-I] [B-I] [M-F]
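The groups can also be read off the Tukey output mechanically. The sketch below walks the brands in order of increasing mean and extends each run for as long as every pair inside it has an adjusted p-value above 0.05. The p-values are copied from the TukeyHSD display. The names ord, p.adj, and pair.p are illustrative and not part of the handout's code. Only maximal runs are reported, so a run contained in a longer one (B-I inside C-B-I) is not listed separately.

```r
# Brands in order of increasing mean, taken from the ordered means table
ord <- c("Parkay", "Chiffon", "BlueBonnet", "Imperial", "Mazola", "Fleischmanns")

# Adjusted p-values copied from the TukeyHSD output
p.adj <- c(
  "Chiffon-BlueBonnet" = 0.3963073, "Fleischmanns-BlueBonnet" = 0.0000000,
  "Imperial-BlueBonnet" = 0.9736311, "Mazola-BlueBonnet" = 0.0000001,
  "Parkay-BlueBonnet" = 0.0775326, "Fleischmanns-Chiffon" = 0.0000000,
  "Imperial-Chiffon" = 0.1176619, "Mazola-Chiffon" = 0.0000000,
  "Parkay-Chiffon" = 0.9526463, "Imperial-Fleischmanns" = 0.0000000,
  "Mazola-Fleischmanns" = 0.1107597, "Parkay-Fleischmanns" = 0.0000000,
  "Mazola-Imperial" = 0.0000004, "Parkay-Imperial" = 0.0150616,
  "Parkay-Mazola" = 0.0000000
)

# Hypothetical helper: look up a pair's p-value whichever way it is named
pair.p <- function(a, b) {
  keys <- c(paste(a, b, sep = "-"), paste(b, a, sep = "-"))
  unname(p.adj[keys[keys %in% names(p.adj)]])
}

alpha <- 0.05
k <- length(ord)
groups <- character(0)
last.end <- 0
for (i in seq_len(k)) {
  j <- i
  # extend the run while the next brand is not significantly different
  # from every brand already in the run
  while (j < k && all(sapply(ord[i:j], pair.p, b = ord[j + 1]) > alpha)) {
    j <- j + 1
  }
  # keep runs of two or more brands that reach past the previous kept run
  if (j > i && j > last.end) {
    groups <- c(groups, paste(ord[i:j], collapse = "-"))
    last.end <- j
  }
}
groups
```

Hard-coding the p-values keeps the sketch independent of the data files. With the fitted model available, the same vector could be taken from the p adj column of the TukeyHSD result instead.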