
STAT:5201 Applied Statistics II
(Factorial with 3 factors as a $2^3$ design)
Three-way ANOVA (factorial with three factors) with replication

Factor A: angle (low=0/high=1)
Factor B: geometry (shape A=0/shape B=1)
Factor C: speed (low=0/high=1)
Response: life of the machine tool, in hours.

An engineer is interested in the effects of cutting angle (A), tool geometry (B), and cutting speed (C) on the life (in hours) of a machine tool. Three runs are done for each combination of factor levels, and all runs are done in random order. This is a completely randomized design (CRD).

{ D.C. Montgomery (2005). Design and analysis of experiments. John Wiley & Sons: USA. }

SAS data statements and data:

    /*Factor A: angle  Factor B: geometry  Factor C: speed*/
    data tool;
      do angle = 0,1;
        do geometry = 0,1;
          do speed = 0,1;
            do replicate = 1 to 3;
              input life @@;
              output;
            end;
          end;
        end;
      end;
    datalines;
    22 31 25 32 43 29 35 34 50 55 47 46
    44 45 38 40 37 36 60 50 54 39 41 47
    ;

    proc print data=tool;
    run;

    Obs   angle   geometry   speed   replicate   life
      1     0        0         0         1        22
      2     0        0         0         2        31
      3     0        0         0         3        25
      4     0        0         1         1        32
      5     0        0         1         2        43
      6     0        0         1         3        29
      7     0        1         0         1        35
      8     0        1         0         2        34
      9     0        1         0         3        50
     10     0        1         1         1        55
     11     0        1         1         2        47
     12     0        1         1         3        46
     13     1        0         0         1        44
     14     1        0         0         2        45
     15     1        0         0         3        38
     16     1        0         1         1        40
     17     1        0         1         2        37
     18     1        0         1         3        36
     19     1        1         0         1        60
     20     1        1         0         2        50
     21     1        1         0         3        54
     22     1        1         1         1        39
     23     1        1         1         2        41
     24     1        1         1         3        47

    proc glm data=tool plot=diagnostics;
      class angle geometry speed replicate;
      model life=angle|geometry|speed;  /* Full model fits a separate mean for each cell */
    run;

Partial output:

    Dependent Variable: life

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               7    1612.666667     230.380952      7.64   0.0004
    Error              16     482.666667      30.166667
    Corrected Total    23    2095.333333

    Dependent Variable: life

    Source                   DF    Type III SS    Mean Square   F Value   Pr > F
    angle                     1    280.1666667    280.1666667      9.29   0.0077
    geometry                  1    770.6666667    770.6666667     25.55   0.0001
    angle*geometry            1     48.1666667     48.1666667      1.60   0.2245
    speed                     1      0.6666667      0.6666667      0.02   0.8837
    angle*speed               1    468.1666667    468.1666667     15.52   0.0012
    geometry*speed            1     16.6666667     16.6666667      0.55   0.4681
    angle*geometry*speed      1     28.1666667     28.1666667      0.93   0.3483  <---

The diagnostic plots look OK, and the 3-way interaction is not significant here, so that term could be removed from the model (which places it in the error term).
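As a minimal sketch of what removing the 3-way term could look like, the bar notation in the MODEL statement can be capped at 2-way interactions with @2. This run is not part of the original output. Because the design is balanced, the sums of squares for the remaining terms should be unchanged, and the error term should absorb the 3-way sum of squares (482.667 + 28.167 = 510.833 on 17 d.f.).

```sas
/* Sketch: main effects and all 2-way interactions only.            */
/* The angle*geometry*speed term is dropped, so it pools into error. */
proc glm data=tool;
  class angle geometry speed;
  model life = angle|geometry|speed@2;
run;
quit;
```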

According to the Type III ANOVA table, the 2-way interaction between angle (A) and speed (C) is significant, and the other 2-way interactions (AB and BC) are not significant. We will look at the marginal 2-way interaction plot for each pair of factors AB, AC, and BC (these plots average over the replicates in a cell and over the levels of the unplotted factor).

    Source                   DF    Type III SS    Mean Square   F Value   Pr > F
    angle*geometry            1     48.1666667     48.1666667      1.60   0.2245
    angle*speed               1    468.1666667    468.1666667     15.52   0.0012
    geometry*speed            1     16.6666667     16.6666667      0.55   0.4681
    angle*geometry*speed      1     28.1666667     28.1666667      0.93   0.3483

    /* Look at the marginal 2-way interaction plots.*/
    symbol1 interpol=std1mj value=star line=1 color=black;
    symbol2 interpol=std1mj value=diamond line=2 color=blue;

    proc gplot data=tool;
      title "AB interaction (averaged across third factor)";
      plot life*angle=geometry/haxis=-.5 to 1.5;
    run;

    proc gplot data=tool;
      title "AC interaction (averaged across third factor)";
      plot life*angle=speed/haxis=-.5 to 1.5;
    run;

    proc gplot data=tool;
      title "BC interaction (averaged across third factor)";
      plot life*speed=geometry/haxis=-.5 to 1.5;
    run;
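The points joined in the AC plot are the four angle-by-speed means, each averaged over geometry and replicates. A short sketch for printing those means directly, assuming the `tool` data set created earlier (the WAYS statement restricts the output to the 2-way combinations):

```sas
/* Sketch: marginal angle-by-speed means behind the AC interaction plot. */
proc means data=tool mean maxdec=2;
  class angle speed;
  ways 2;          /* only the angle*speed combinations */
  var life;
run;
```

From the data listing, these means work out to 32.83 (angle 0, speed 0), 42.00 (angle 0, speed 1), 48.50 (angle 1, speed 0), and 40.00 (angle 1, speed 1).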

The type of interaction in the AC plot is a concern for making global statements about the main effects of angle (A) and speed (C): the effect of speed changes sign with angle, and this interaction is statistically significant. When angle is low (far left side), speed has a positive effect on life, and when angle is high (far right side), speed has a negative effect on life.

The minimal model should include A, B, C, and AC. Following the hierarchy principle, C stays in the model despite its nonsignificant main effect because it is part of the AC interaction.
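A minimal sketch of fitting that reduced model, assuming the `tool` data set from above. This run is not shown in the original output. The three dropped interaction terms move into the error term, which should then have 19 d.f.

```sas
/* Sketch: minimal hierarchical model A, B, C, AC. */
proc glm data=tool;
  class angle geometry speed;
  model life = angle geometry speed angle*speed;
  lsmeans angle*speed;   /* the four means that describe the AC interaction */
run;
quit;
```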

Suppose the 3-way interaction were significant. How to proceed?... Subset the data?

One could proceed by fitting a separate two-factor factorial model, with speed and geometry, at each level of angle.

    /*Fit 2-factor model for low A.*/
    data lowa;
      set tool;
      if angle=0;
    run;

    proc glm data=lowa plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry;
      lsmeans geometry speed;
    run;

    /* The plot generated by the following is the same as that provided by PROC GLM*/
    proc gplot data=lowa;
      title "Low angle: 2-way plot for BC";
      plot life*speed=geometry;
    run;

    Class Level Information
    Class        Levels    Values
    speed             2    0 1
    geometry          2    0 1
    replicate         3    1 2 3

    Dependent Variable: life

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3     854.916667     284.972222      6.33   0.0166
    Error               8     360.000000      45.000000
    Corrected Total    11    1214.916667

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    speed               1    252.0833333    252.0833333      5.60   0.0455
    geometry            1    602.0833333    602.0833333     13.38   0.0064
    speed*geometry      1      0.7500000      0.7500000      0.02   0.9005

When angle is set to the low level, there is no significant interaction between geometry (B) and speed (C) (see the interaction plot provided by PROC GLM). There is a significant positive speed effect and a significant positive geometry effect.

[Plot: life versus speed by geometry at low angle. Provided from PROC GLM.]

    Least Squares Means

    geometry    life LSMEAN
    0            30.3333333
    1            44.5000000

    speed       life LSMEAN
    0            32.8333333
    1            42.0000000

If you'd like to get the estimates for the parameters in the model that you fitted, you can request them with the solution option in the model statement. But I think, in this case, the means are probably easier to explain to a client.

    proc glm data=lowa plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry/solution;
      lsmeans geometry speed;
      lsmeans geometry*speed;
    run;

                                                 Standard
    Parameter                   Estimate            Error    t Value    Pr > |t|
    Intercept                49.33333333 B     3.87298335      12.74      <.0001
    speed 0                  -9.66666667 B     5.47722558      -1.76      0.1156
    speed 1                   0.00000000 B      .                .         .
    geometry 0              -14.66666667 B     5.47722558      -2.68      0.0280
    geometry 1                0.00000000 B      .                .         .
    speed*geometry 0 0        1.00000000 B     7.74596669       0.13      0.9005
    speed*geometry 0 1        0.00000000 B      .                .         .
    speed*geometry 1 0        0.00000000 B      .                .         .
    speed*geometry 1 1        0.00000000 B      .                .         .

    NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter B are not uniquely estimable.

There are 4 cells in this 2-way ANOVA. Because SAS sets the effect for the final level of each factor to zero, the baseline group (i.e. the cell mean represented by the intercept) is B=1 and C=1. The output shows this to be 49.333333, and that's the same as the LSMEANS output for that cell in the model that includes the interaction (shown below).

    Least Squares Means

    speed    geometry    life LSMEAN
    0        0            26.0000000
    0        1            39.6666667
    1        0            34.6666667
    1        1            49.3333333
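To make the link between the parameter estimates and the cell means concrete, here is a small sketch that rebuilds all four cell means from the estimates printed in the solution table. The variable names are made up for this illustration; the numbers are copied from the table, so the results should agree with the LSMEANS values up to rounding.

```sas
/* Sketch: rebuild the low-angle cell means from the /solution estimates. */
data _null_;
  b0   =  49.33333333;   /* intercept = cell mean for speed=1, geometry=1 */
  s0   =  -9.66666667;   /* speed 0 effect                                */
  g0   = -14.66666667;   /* geometry 0 effect                             */
  sg00 =   1.00000000;   /* speed 0 * geometry 0 interaction              */

  mean_s1_g1 = b0;
  mean_s0_g1 = b0 + s0;
  mean_s1_g0 = b0 + g0;
  mean_s0_g0 = b0 + s0 + g0 + sg00;

  /* write the four rebuilt cell means to the log */
  put mean_s0_g0= 8.4 mean_s0_g1= 8.4 mean_s1_g0= 8.4 mean_s1_g1= 8.4;
run;
```

Only the speed 0, geometry 0 cell uses the interaction estimate, because every other interaction parameter is set to zero.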

    /*Fit 2-factor model to high A.*/
    data higha;
      set tool;
      if angle=1;
    run;

    proc glm data=higha plot=diagnostics;
      class speed geometry replicate;
      model life=speed|geometry;
      lsmeans geometry speed;
    run;

    /* The plot generated by the following is the same as that provided by PROC GLM*/
    proc gplot data=higha;
      title "High angle: 2-way plot for BC";
      plot life*speed=geometry;
    run;

    Class Level Information
    Class        Levels    Values
    speed             2    0 1
    geometry          2    0 1
    replicate         3    1 2 3

    Dependent Variable: life

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               3    477.5833333    159.1944444     10.38   0.0039
    Error               8    122.6666667     15.3333333
    Corrected Total    11    600.2500000

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    speed               1    216.7500000    216.7500000     14.14   0.0055
    geometry            1    216.7500000    216.7500000     14.14   0.0055
    speed*geometry      1     44.0833333     44.0833333      2.88   0.1284

When angle is set to the high level, there is no significant interaction between geometry (B) and speed (C). There is a significant negative speed effect and a significant positive geometry effect (see the interaction plot provided by PROC GLM).

    Least Squares Means

    geometry    life LSMEAN
    0            40.0000000
    1            48.5000000

    speed       life LSMEAN
    0            48.5000000
    1            40.0000000
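The two subset analyses can also be run in a single step with BY-group processing instead of creating `lowa` and `higha` separately. This is only a sketch of an alternative, not what was run for the output above; `tool_sorted` is a hypothetical data set name. Each BY group gets its own error term (8 d.f.), so the results should match the two subset fits.

```sas
/* Sketch: one 2-factor fit per angle level using BY-group processing. */
proc sort data=tool out=tool_sorted;
  by angle;
run;

proc glm data=tool_sorted;
  by angle;                       /* separate analysis for angle=0 and angle=1 */
  class speed geometry;
  model life = speed|geometry;
  lsmeans geometry speed;
run;
quit;
```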

Suppose the 3-way interaction were significant. How to proceed?... Slice the data?

One could get a very similar analysis (with more degrees of freedom for error) by fitting the full model and then slicing by angle (A).

    proc glm data=tool plot=diagnostics;
      class angle speed geometry replicate;
      model life=angle|speed|geometry;
      lsmeans angle*geometry*speed/slice=angle;  /* slice the full model by angle level*/
    run;

    Class Level Information
    Class        Levels    Values
    angle             2    0 1
    speed             2    0 1
    geometry          2    0 1
    replicate         3    1 2 3

    Number of Observations Used    24

    Least Squares Means

    angle    speed    geometry    life LSMEAN
    0        0        0            26.0000000
    0        0        1            39.6666667
    0        1        0            34.6666667
    0        1        1            49.3333333
    1        0        0            42.3333333
    1        0        1            54.6666667
    1        1        0            37.6666667
    1        1        1            42.3333333

    angle*speed*geometry Effect Sliced by angle for life

                          Sum of
    angle    DF          Squares    Mean Square   F Value   Pr > F
    0         3       854.916667     284.972222      9.45   0.0008
    1         3       477.583333     159.194444      5.28   0.0101

If you compare the Mean Squares in the slice output above, they match the Model Mean Squares from the two subsetted analyses, but the F-statistics are different. Why?

The full model (using all the data and all possible terms) provides $\hat{\sigma}^2 = 30.17$ with 16 d.f. for the error (output below):

    Dependent Variable: life

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               7    1612.666667     230.380952      7.64   0.0004
    Error              16     482.666667      30.166667
    Corrected Total    23    2095.333333

When we subsetted the data to the low angle, we found $\hat{\sigma}^2 = 45.00$ with 8 d.f. for the error.
When we subsetted the data to the high angle, we found $\hat{\sigma}^2 = 15.33$ with 8 d.f. for the error.

Because we have assumed that $\sigma^2$ is the same in every cell, the full-model estimate of $\sigma^2$ is a pooled estimate from the two subsetted data sets: $(8 \times 45.00 + 8 \times 15.33)/16 = 30.17$. All three are estimating the same constant variance $\sigma^2$, but we gain d.f. for the error when we use the pooled estimate. This is why the F-statistics differ: the slice tests divide the same mean squares by 30.17 rather than by 45.00 or 15.33.

Test for a difference in the four means, with angle held constant at either low or high, using $\alpha = 0.05$:

    $H_0: \mu_{a11} = \mu_{a12} = \mu_{a21} = \mu_{a22}$  vs.  $H_1$: not $H_0$

Using the slice option (i.e. using all the data), the threshold for significance is $F_{0.05,3,16} = 3.24$.
Using the subsetted data, the threshold for significance is $F_{0.05,3,8} = 4.07$.
The threshold for significance is lower when we have more degrees of freedom for error.
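The pooling argument and the two thresholds can be checked numerically. The sketch below uses the SAS functions FINV (F quantile) and PROBF (F cumulative probability); the variable names are made up for this illustration, and the mean squares are copied from the output above.

```sas
/* Sketch: pooled error variance, slice F-statistics, and F thresholds. */
data _null_;
  mse_low  = 45.0000000;                        /* low-angle subset, 8 d.f.  */
  mse_high = 15.3333333;                        /* high-angle subset, 8 d.f. */
  mse_pooled = (8*mse_low + 8*mse_high) / 16;   /* full-model MSE, 16 d.f.   */

  f_low  = 284.972222 / mse_pooled;             /* slice F for angle = 0 */
  f_high = 159.194444 / mse_pooled;             /* slice F for angle = 1 */
  p_low  = 1 - probf(f_low,  3, 16);
  p_high = 1 - probf(f_high, 3, 16);

  crit_full   = finv(0.95, 3, 16);              /* threshold with pooled error */
  crit_subset = finv(0.95, 3, 8);               /* threshold within one subset */

  put mse_pooled= 8.3 f_low= 8.2 f_high= 8.2;
  put p_low= 8.4 p_high= 8.4;
  put crit_full= 8.3 crit_subset= 8.3;
run;
```

The values written to the log should reproduce the slice table (F of about 9.45 and 5.28) and the two thresholds quoted above.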