SAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/


STAT:5201 Applied Statistics II (Factorial with 3 factors as a 2^3 design)

Three-way ANOVA (factorial with three factors) with replication

Factor A: angle (low=0/high=1)
Factor B: geometry (shape A=0/shape B=1)
Factor C: speed (low=0/high=1)
Response: life of the machine tool, in hours.

An engineer is interested in the effects of cutting angle (A), tool geometry (B), and cutting speed (C) on the life (in hours) of a machine tool. Three runs are done for each combination of factor levels, and all runs are done in random order. This is a completely randomized design (CRD). {D.C. Montgomery (2005). Design and Analysis of Experiments. John Wiley & Sons: USA.}

SAS data statements and data:

/*Factor A: angle Factor B: geometry Factor C: speed*/
data tool;
  do angle = 0,1;
    do geometry = 0,1;
      do speed = 0,1;
        do replicate = 1 to 3;
          input life @@;
          output;
        end;
      end;
    end;
  end;
datalines;
22 31 25 32 43 29 35 34 50 55 47 46
44 45 38 40 37 36 60 50 54 39 41 47
;
run;

proc print data=tool;
run;

Obs   angle   geometry   speed   replicate   life
  1     0        0         0         1        22
  2     0        0         0         2        31
  3     0        0         0         3        25
  4     0        0         1         1        32
  5     0        0         1         2        43
  6     0        0         1         3        29
  7     0        1         0         1        35
  8     0        1         0         2        34
  9     0        1         0         3        50
 10     0        1         1         1        55
 11     0        1         1         2        47
 12     0        1         1         3        46
 13     1        0         0         1        44
 14     1        0         0         2        45
 15     1        0         0         3        38
 16     1        0         1         1        40
 17     1        0         1         2        37
 18     1        0         1         3        36
 19     1        1         0         1        60
 20     1        1         0         2        50
 21     1        1         0         3        54
 22     1        1         1         1        39
 23     1        1         1         2        41
 24     1        1         1         3        47

proc glm data=tool plot=diagnostics;
  class angle geometry speed replicate;
  model life=angle|geometry|speed; /* The full model fits a separate mean for each cell */
run;

Partial output:

Dependent Variable: life

                                  Sum of
Source              DF           Squares     Mean Square    F Value    Pr > F
Model                7       1612.666667      230.380952       7.64    0.0004
Error               16        482.666667       30.166667
Corrected Total     23       2095.333333

Source                   DF     Type III SS     Mean Square    F Value    Pr > F
angle                     1     280.1666667     280.1666667       9.29    0.0077
geometry                  1     770.6666667     770.6666667      25.55    0.0001
angle*geometry            1      48.1666667      48.1666667       1.60    0.2245
speed                     1       0.6666667       0.6666667       0.02    0.8837
angle*speed               1     468.1666667     468.1666667      15.52    0.0012
geometry*speed            1      16.6666667      16.6666667       0.55    0.4681
angle*geometry*speed      1      28.1666667      28.1666667       0.93    0.3483   <---

The diagnostic plots look OK, and the 3-way interaction is not significant here, so that term could be removed from the model (which places it in the error term).
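Removing the 3-way interaction can be sketched as follows. This is only an illustration, not part of the original handout: it assumes the data set tool created above and uses the bar notation with @2 to keep the main effects and all 2-way interactions.

```sas
/* Sketch: refit without the 3-way interaction.                      */
/* angle|geometry|speed@2 expands to the three main effects plus the */
/* three 2-way interactions; the A*B*C term is left out.             */
proc glm data=tool;
  class angle geometry speed;
  model life = angle|geometry|speed@2;
run;
quit;
```

Because the design is balanced, the sum of squares of the dropped term should simply move into the error line: 482.67 + 28.17 = 510.83 on 16 + 1 = 17 d.f.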

According to the Type III ANOVA table, the 2-way interaction between angle (A) and speed (C) is significant, and the other 2-way interactions (AB and BC) are not significant. We will look at the marginal 2-way interaction plot for each combination of factors AB, AC, and BC (these plots average over replicates in a cell and over the levels of the unplotted factor).

Source                   DF     Type III SS     Mean Square    F Value    Pr > F
angle*geometry            1      48.1666667      48.1666667       1.60    0.2245
angle*speed               1     468.1666667     468.1666667      15.52    0.0012
geometry*speed            1      16.6666667      16.6666667       0.55    0.4681
angle*geometry*speed      1      28.1666667      28.1666667       0.93    0.3483

/* Look at the marginal 2-way interaction plots.*/
symbol1 interpol=std1mj value=star line=1 color=black;
symbol2 interpol=std1mj value=diamond line=2 color=blue;

proc gplot data=tool;
  plot life*angle=geometry/haxis=-.5 to 1.5;
  title "AB interaction (averaged across third factor)";
run;

proc gplot data=tool;
  plot life*angle=speed/haxis=-.5 to 1.5;
  title "AC interaction (averaged across third factor)";
run;

proc gplot data=tool;
  plot life*speed=geometry/haxis=-.5 to 1.5;
  title "BC interaction (averaged across third factor)";
run;
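The numbers behind the AC plot can also be tabulated directly. The following is a minimal sketch (not in the original handout) that assumes the data set tool from above; it averages life over geometry and replicates within each angle-by-speed combination.

```sas
/* Sketch: marginal means for the angle-by-speed combinations,      */
/* averaged over geometry and over the three replicates per cell.   */
proc means data=tool n mean std;
  class angle speed;
  var life;
run;
```

Each of the four means is based on 6 observations. For a balanced design like this one, they should agree with the speed LSMEANS reported in the low-angle and high-angle subset analyses.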

The AC interaction is statistically significant, and the type of interaction seen in the AC plot makes it risky to make global statements about the main effects of angle (A) and speed (C). When angle is low (far left side), speed has a positive effect on life, and when angle is high (far right side), speed has a negative effect on life. The minimal model should include A, B, C, and AC (following the hierarchy principle).
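A minimal sketch of fitting that reduced model is shown here. It is an illustration only (the handout does not show this fit) and assumes the data set tool from above.

```sas
/* Sketch: minimal hierarchical model with A, B, C and the AC interaction. */
proc glm data=tool;
  class angle geometry speed;
  model life = angle geometry speed angle*speed;
  lsmeans angle*speed;   /* cell means for the significant AC interaction */
run;
quit;
```

Since the design is balanced, the three dropped terms (AB, BC, ABC) should be absorbed into the error term, giving an error sum of squares of 482.67 + 48.17 + 16.67 + 28.17 = 575.67 on 19 d.f.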

Suppose the 3-way interaction was significant. How to proceed?... Subset data?

One could proceed by fitting a separate two-factor factorial model, with speed and geometry, for each level of angle.

/*Fit 2-factor model for low A.*/
data lowa;
  set tool;
  if angle=0;
run;

proc glm data=lowa plot=diagnostics;
  class speed geometry replicate;
  model life=speed|geometry;
  lsmeans geometry speed;
run;

/* The plot generated by the following is the same as that provided by PROC GLM*/
proc gplot data=lowa;
  plot life*speed=geometry;
  title "Low angle: 2-way plot for BC";
run;

Class Level Information

Class        Levels    Values
speed             2    0 1
geometry          2    0 1
replicate         3    1 2 3

Dependent Variable: life

                                  Sum of
Source              DF           Squares     Mean Square    F Value    Pr > F
Model                3        854.916667      284.972222       6.33    0.0166
Error                8        360.000000       45.000000
Corrected Total     11       1214.916667

Source              DF     Type III SS     Mean Square    F Value    Pr > F
speed                1     252.0833333     252.0833333       5.60    0.0455
geometry             1     602.0833333     602.0833333      13.38    0.0064
speed*geometry       1       0.7500000       0.7500000       0.02    0.9005

When angle is set to the low level, there is no significant interaction between geometry (B) and speed (C) (see the interaction plot from PROC GLM). There is a significant positive speed effect and a significant positive geometry effect.

[Interaction plot for BC at low angle, as provided by PROC GLM.]

Least Squares Means

geometry    life LSMEAN
0            30.3333333
1            44.5000000

speed       life LSMEAN
0            32.8333333
1            42.0000000

If you'd like to get the estimates for the parameters in the model that you fitted, you can request them with the solution option in the model statement. But I think, in this case, the means are probably easier to explain to a client.

proc glm data=lowa plot=diagnostics;
  class speed geometry replicate;
  model life=speed|geometry/solution;
  lsmeans geometry speed;
  lsmeans geometry*speed;
run;

                                             Standard
Parameter                  Estimate             Error    t Value    Pr > |t|
Intercept               49.33333333 B      3.87298335      12.74      <.0001
speed 0                 -9.66666667 B      5.47722558      -1.76      0.1156
speed 1                  0.00000000 B       .                .         .
geometry 0             -14.66666667 B      5.47722558      -2.68      0.0280
geometry 1               0.00000000 B       .                .         .
speed*geometry 0 0       1.00000000 B      7.74596669       0.13      0.9005
speed*geometry 0 1       0.00000000 B       .                .         .
speed*geometry 1 0       0.00000000 B       .                .         .
speed*geometry 1 1       0.00000000 B       .                .         .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter B are not uniquely estimable.

There are 4 cells in this 2-way ANOVA. Because SAS sets the effects for the final level of each factor to zero, the baseline group (i.e. the cell mean represented by the intercept) is B=1 and C=1. The output shows this to be 49.333333, and that's the same as the LSmeans output for that cell mean in the model that includes interaction (shown below).

Least Squares Means

speed    geometry    life LSMEAN
0        0            26.0000000
0        1            39.6666667
1        0            34.6666667
1        1            49.3333333
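The link between the parameter estimates and the four cell means can be checked by hand. The sketch below is an illustration, not part of the original handout: it hard-codes the four LSMEANS reported above, and the variable names (m00, speed0, and so on) are hypothetical.

```sas
/* Sketch: rebuild the SOLUTION estimates from the four cell means.   */
/* mSG = cell mean at speed level S and geometry level G (low angle). */
data _null_;
  m00 = 26.0000000;
  m01 = 39.6666667;
  m10 = 34.6666667;
  m11 = 49.3333333;

  intercept = m11;                    /* baseline cell: speed=1, geometry=1 */
  speed0    = m01 - m11;              /* speed 0 vs 1 at geometry=1         */
  geometry0 = m10 - m11;              /* geometry 0 vs 1 at speed=1         */
  sg00      = m00 - m01 - m10 + m11;  /* interaction contrast               */

  put intercept= 12.4 speed0= 12.4 geometry0= 12.4 sg00= 12.4;
run;
```

The values written to the log should match the Estimate column: 49.3333, -9.6667, -14.6667, and 1.0000. This shows that, under the last-level-zero constraint, the "main effect" estimates are simple effects at the baseline level of the other factor, which is part of why the means are easier to explain.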

/*Fit 2-factor model to high A.*/
data higha;
  set tool;
  if angle=1;
run;

proc glm data=higha plot=diagnostics;
  class speed geometry replicate;
  model life=speed|geometry;
  lsmeans geometry speed;
run;

/* The plot generated by the following is the same as that provided by PROC GLM*/
proc gplot data=higha;
  plot life*speed=geometry;
  title "High angle: 2-way plot for BC";
run;

Class Level Information

Class        Levels    Values
speed             2    0 1
geometry          2    0 1
replicate         3    1 2 3

Dependent Variable: life

                                  Sum of
Source              DF           Squares     Mean Square    F Value    Pr > F
Model                3       477.5833333     159.1944444      10.38    0.0039
Error                8       122.6666667      15.3333333
Corrected Total     11       600.2500000

Source              DF     Type III SS     Mean Square    F Value    Pr > F
speed                1     216.7500000     216.7500000      14.14    0.0055
geometry             1     216.7500000     216.7500000      14.14    0.0055
speed*geometry       1      44.0833333      44.0833333       2.88    0.1284

When angle is set to the high level, there is no significant interaction between geometry (B) and speed (C). There is a significant negative speed effect and a significant positive geometry effect (see the interaction plot from PROC GLM).

Least Squares Means

geometry    life LSMEAN
0            40.0000000
1            48.5000000

speed       life LSMEAN
0            48.5000000
1            40.0000000

Suppose the 3-way interaction was significant. How to proceed?... Slice the data?

One could get a very similar analysis (with more degrees of freedom for error) by fitting the full model and then slicing by angle (A).

proc glm data=tool plot=diagnostics;
  class angle speed geometry replicate;
  model life=angle|speed|geometry;
  lsmeans angle*geometry*speed/slice=angle; /* slice the full model by angle level*/
run;

Class Level Information

Class        Levels    Values
angle             2    0 1
speed             2    0 1
geometry          2    0 1
replicate         3    1 2 3

Number of Observations Used    24

Least Squares Means

angle    speed    geometry    life LSMEAN
0        0        0            26.0000000
0        0        1            39.6666667
0        1        0            34.6666667
0        1        1            49.3333333
1        0        0            42.3333333
1        0        1            54.6666667
1        1        0            37.6666667
1        1        1            42.3333333

angle*speed*geometry Effect Sliced by angle for life

                       Sum of
angle    DF           Squares     Mean Square    F Value    Pr > F
0         3        854.916667      284.972222       9.45    0.0008
1         3        477.583333      159.194444       5.28    0.0101

If you compare the Mean Squares in the above slice output, they match the Model Mean Squares from the two subsetted analyses, but the F-statistics are different. Why?

The full model (using all the data and all possible terms) provides $\hat{\sigma}^2 = 30.17$ with 16 d.f. for the error (output below):

Dependent Variable: life

                                  Sum of
Source              DF           Squares     Mean Square    F Value    Pr > F
Model                7       1612.666667      230.380952       7.64    0.0004
Error               16        482.666667       30.166667
Corrected Total     23       2095.333333

When we subsetted the data to the low angle, we found $\hat{\sigma}^2 = 45.00$ with 8 d.f. for the error.
When we subsetted the data to the high angle, we found $\hat{\sigma}^2 = 15.33$ with 8 d.f. for the error.

Because we have assumed that $\sigma^2$ is the same in every cell, the full-model estimate of $\sigma^2$ is a pooled estimate of the estimates from the two subsetted data sets. They are all estimating the same constant variance $\sigma^2$, but we gain d.f. for the error when we use the pooled estimate. The slice F-statistics divide the same Mean Squares by this pooled estimate rather than by the subset-specific estimates, which is why they differ.

Test for a difference in the four means where angle is held constant at level $a$ (either low or high), with $\alpha = 0.05$:

$H_0: \mu_{a11} = \mu_{a12} = \mu_{a21} = \mu_{a22}$ vs. $H_1$: not $H_0$

Using the slice option (i.e. using all the data), the threshold for significance is $F(0.05, 3, 16) = 3.24$.
Using the subsetted data, the threshold for significance is $F(0.05, 3, 8) = 4.07$.

The threshold for significance is lower when we have more degrees of freedom for error.
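The pooling argument and the two sets of F-statistics can be reproduced with a few lines of arithmetic. This is a sketch only (not in the original handout): the numbers are copied from the output above, and the variable names are hypothetical. FINV is the SAS quantile function for the F distribution.

```sas
/* Sketch: pooled error variance, slice vs. subset F-statistics, critical values. */
data _null_;
  mse_low  = 45.000000;  df_low  = 8;   /* error MS, low-angle subset  */
  mse_high = 15.333333;  df_high = 8;   /* error MS, high-angle subset */
  ms_low   = 284.972222;                /* 3-d.f. mean square, angle=0 */
  ms_high  = 159.194444;                /* 3-d.f. mean square, angle=1 */

  /* d.f.-weighted average of the two subset variances */
  mse_pooled = (df_low*mse_low + df_high*mse_high) / (df_low + df_high);

  f_slice_low  = ms_low  / mse_pooled;  /* slice F, angle=0  */
  f_slice_high = ms_high / mse_pooled;  /* slice F, angle=1  */
  f_sub_low    = ms_low  / mse_low;     /* subset F, angle=0 */
  f_sub_high   = ms_high / mse_high;    /* subset F, angle=1 */

  fcrit_16 = finv(0.95, 3, 16);         /* threshold with 16 error d.f. */
  fcrit_8  = finv(0.95, 3, 8);          /* threshold with 8 error d.f.  */

  put mse_pooled= 8.2 f_slice_low= 8.2 f_slice_high= 8.2;
  put f_sub_low= 8.2 f_sub_high= 8.2 fcrit_16= 8.2 fcrit_8= 8.2;
run;
```

The pooled value should come out at 30.17, the slice F-statistics at 9.45 and 5.28, and the subset F-statistics at 6.33 and 10.38, matching the output shown earlier.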