To review the calculations used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you can throw the ball various distances. Several factors on the statapult could be adjusted that might affect the distance the ball is thrown. For our example, we will look at the following factors and factor levels.
A Taguchi L8 design was run. The response we wished to measure was distance. We ran three replicates of each design matrix setup. The Data Entry sheet from DOE Wisdom software is shown here. Now let's talk about the calculations for the Analysis of Variance, or ANOVA. The worksheet for our statapult design is shown here.
If we enter our data into a DOE software package and then ask it to model the data, the result is the multiple regression output shown here. From this ANOVA, we can build a model of our experiment. There is obviously a lot of information in our ANOVA. Not only can we build a model, we can also determine how well the prediction equation models our response (distance) in the range of interest. This goodness of fit will be evaluated as a whole and in parts.
To understand the greater part of a regression table, it is important for us to group the data in certain ways. We will use two of these groupings to estimate population variances. Population denotes all possible responses at the experimental levels. The data we collect in experiments is just a sample of that population. If we repeated the same experiment, we would probably collect a different sample from that same population. We examine variance because there are statistical tools at our disposal that lend themselves directly to comparisons of variances in samples from the same population. This will lead to judgments of significance. For simplicity, let's suppose we have only one factor at 2 levels and eight runs that graphically look like this. Our prediction equation ŷ provides a best fit for the data.
Overlaying the mean of the data points, ȳ, we can begin to look at the variance in the data and, using that variance, estimate the variance of all the possible population responses in this factor range. In this figure, we estimate population variance from the variance of the entire data group about ȳ.
In this figure, we estimate population variance by pooling the variances of each of the subgroups about its own mean. (This figure shows the variance within subgroups.) In this figure, we estimate population variance by finding the variance of the mean of each subgroup of data about the grand mean ȳ. (This figure shows the variance between subgroups.) We will use the within-subgroups estimate and the between-subgroups estimate to judge the significance of the model
and the significance of individual factors (for a two-level design). Using the empirical percentages for a normal distribution, we expect 68.26% of any further
statapult firings to fall inside of ŷ ± √MSE (that is, within one standard deviation). In our statapult example, we have three factors (A, B, and C) and one response (distance, y). This is a four-dimensional space that will be difficult to visualize. The three data points collected in a run represent a data subgroup. We have eight data subgroups. If this were a three-dimensional space, we would picture these data subgroups floating above and below the appropriate settings for factors A and B. The quantity √MSE would represent a distance above and below the surface within which we would expect 68.26% of all future values to fall. We refer to this surface as the Response Surface.
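The within-subgroups and between-subgroups estimates described above can be sketched numerically. The following Python snippet (not part of the DOE Wisdom software, and using made-up distances) works the simple one-factor, two-level, eight-shot picture:

```python
# Hypothetical one-factor example: two levels, four shots each.
from statistics import mean, variance

low = [100, 103, 98, 101]     # distances at the low factor level
high = [120, 118, 122, 119]   # distances at the high factor level
grand_mean = mean(low + high)

# "Within" estimate: pool each subgroup's variance about its own mean.
within = (variance(low) + variance(high)) / 2

# "Between" estimate: variance of the subgroup means about the grand
# mean, scaled by the subgroup size.
between = len(low) * variance([mean(low), mean(high)])

# A between estimate much larger than the within estimate suggests the
# factor really shifts the response.
print(within, between)
```

With these made-up numbers, the between estimate dwarfs the within estimate, which is exactly the pattern that will later be judged with an F test.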
To obtain this "best" estimate of the population variance (MSE), we said that we would somehow pool the variances of the subgroups. From a mechanical standpoint, we said we would compute a sum of squares and divide by degrees of freedom. For our example, the computations are shown here. Although it seems hidden in the mechanics, we have actually pooled the variances from each run. MSE is an overall measure of variation. Each subgroup is the set of data collected at a run setting. Since the predicted value for each data subgroup is the mean (for a two-level design), let's examine the SSE for one run. If we divide this by the number of data points for that run minus one, you will see the familiar form of the variance for that run.
Proceeding in this fashion, we can obtain the SSE for a two-level design simply by finding the variance of the responses for each run, multiplying each by (n_RUNi − 1), and summing them. The calculations for our example are shown here. Notice that the Sum of Squares Error, or SSE, is also known as the Sum of Squares Residual on our ANOVA.
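As a quick numerical sketch of that recipe (using made-up replicate data, since the actual worksheet values appear only in the figure), the SSE can be computed either way in Python and the two forms agree:

```python
# SSE for a two-level design: pool each run's variance, weighted by
# its degrees of freedom. Distances below are illustrative only.
from statistics import variance

runs = [
    [98, 102, 100],   # run 1: three replicate shots
    [121, 119, 123],  # run 2
    [87, 90, 88],     # run 3
    [140, 138, 141],  # run 4
    [110, 108, 111],  # run 5
    [95, 97, 96],     # run 6
    [130, 133, 131],  # run 7
    [105, 104, 106],  # run 8
]

# SSE = sum over runs of (n_run - 1) * variance of that run.
sse = sum((len(r) - 1) * variance(r) for r in runs)

# Equivalently: sum of squared deviations of each shot from its run mean.
sse_direct = sum((x - sum(r) / len(r)) ** 2 for r in runs for x in r)

print(sse, sse_direct)   # the two computations match
```

The equivalence holds because the sample variance is itself a sum of squared deviations divided by (n − 1), so multiplying by (n_RUNi − 1) simply undoes that division.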
Since the degrees of freedom have entered our computations, let's take a minute to discuss them. Suppose you know that the total of five numbers is 25. How many free choices do you have in selecting the numbers that will make this happen? We propose that the first four numbers are up to you and that the fifth is predetermined by the sum. Therefore, you have four degrees of freedom. In our statapult example, we had three data points in each run to compute a variance for that run. This gives us two degrees of freedom for each of our runs and a total of 16 degrees of freedom (2 df × 8 runs) for the Sum of Squares Error. Therefore, the Mean Square Error (or MSE) for a two-level design is shown here. This number appears on our ANOVA.
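The five-numbers-totaling-25 illustration, and the error degrees of freedom that follow from it, can be written out directly (the four "free" numbers below are an arbitrary choice):

```python
# Degrees of freedom: if five numbers must total 25, only four are free.
total = 25
free_choices = [7, 3, 10, 1]              # any four numbers we like
forced_fifth = total - sum(free_choices)  # the fifth has no freedom left
assert sum(free_choices + [forced_fifth]) == total

# For the statapult design: 3 shots per run -> 2 df per run,
# and 8 runs -> 16 df for the Sum of Squares Error.
df_error = (3 - 1) * 8
print(forced_fifth, df_error)   # 4 16
```

Dividing the SSE by this 16 yields the MSE that appears on the ANOVA.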
The remainder of our analysis is a comparison of the population variance estimate between subgroups (Mean Square Between, or MSB, considering all factors, then each separately) with the Mean Square Error. Take another look at the variance between subgroups shown here. It seems reasonable to state that the only time the between estimate will be close to the error estimate is when there is very little shift in the subgroups about ȳ. If there were very little shift in response between levels of all factors, then the prediction equation would not predict much better than the mean. For each factor's influence considered separately, this would indicate that the factor has little influence on the response. So, in order for the model to be a good predictor, or for a factor to influence the response, the variation between subgroups must be somewhat bigger than the MSE. (Note: "Somewhat bigger" will be defined by the t, z, or F test as a certain measure of confidence that the model has detected a significant shift in the mean, or that a factor should be included in the model.)
The computations of the MSB are as straightforward as those of the MSE. When we look at this between estimate for the overall model, we will refer to it as the Mean Square Regression (MSR). For individual factors, we will use MSB. First, let's attack the MSR. For our example, we will compute a sum of squares and then adjust it with the degrees of freedom we can attribute to the regression model. Once again, the computations have partially hidden what we are doing. For each run, you have probably noticed that the predicted value was reported three times, once for each data point taken during that run. Therefore, the sum of squares is nothing more than the sum, over runs, of the number of data points collected during each run times the squared difference between the predicted value for that run and the overall mean.
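That sum of squares can be sketched in Python with made-up replicate data (the real worksheet values are in the figure; here each run's predicted value is taken as the run mean, as the text does for a two-level design):

```python
# Regression sum of squares: weight each run's squared gap from the
# grand mean by the number of shots in that run. Distances are
# illustrative only.
from statistics import mean

runs = [
    [98, 102, 100], [121, 119, 123], [87, 90, 88], [140, 138, 141],
    [110, 108, 111], [95, 97, 96], [130, 133, 131], [105, 104, 106],
]
grand_mean = mean(x for r in runs for x in r)

ss_regression = sum(len(r) * (mean(r) - grand_mean) ** 2 for r in runs)

# Eight run means leave 7 degrees of freedom for the regression model.
msr = ss_regression / (len(runs) - 1)
print(ss_regression, msr)
```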
To use this sum of squares to estimate a population variance, we still need to adjust it with the degrees of freedom. Since our data points for the MSR are the predicted values for each run setting, we have 8 data points. The degrees of freedom will be 7 for the regression model. Thus, the equation for MSR is shown here. Now that we have all these estimates for variance, what do we do with them? To answer this question, we need to review three types of distributions: the Z distribution (used when the number of
samples is ≥ 30), the t distribution (used when the number of samples is < 30), and the F distribution (based upon samples taken from normal distributions). In our seminar, we introduce a little exercise that helps our students visualize what is going on. We have forty-six small wooden balls in a sack. The balls are labeled with the numbers 0 through 11. They are distributed as shown in this figure. If we overlay the bell-shaped function, it becomes clear that our distribution is approximately normal. The Z and t distributions fall into this category. The Z distribution is shown here. The t distribution is a little flatter, depending on the number of samples in the distribution. Most commonly, these distributions are standardized with a mean of zero, a standard deviation of one, and such that the area under the curve is one. That tells us that the horizontal axis will be the number of standard deviation units we are on either side of zero, and that the vertical axis will be a measure of frequency. Usually we will be interested in the probability of being so many standard deviation units above or below zero. This is computed as the area under those curves. For example, if we wish to know the probability of being no more than 1.6 standard deviation units to the right of zero on the Z distribution, we compute the area under the curve from −∞ to 1.6. The good news is that these values are provided by computers or tables.
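For instance, that area from −∞ to 1.6 can be computed with nothing more than the standard library's error function, so no table is needed (a sketch, not the lesson's software):

```python
# Cumulative area under the standard normal (Z) curve via math.erf.
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Probability of falling at or below 1.6 standard deviation units.
print(round(normal_cdf(1.6), 4))   # 0.9452
```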
We now have each student take five balls from the sack, record the numbers, and put them back in the sack. We ask them to compute the variance of their five numbers and divide it by the variance computed by two or three other students. Afterward, each student gives us their ratios. A typical count of the ratios is shown here. Empirically, our students have built an F distribution, the graph becoming visible when we overlay a curve to the right of the stars.
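The classroom exercise is easy to mimic in software. The sketch below draws handfuls of five from an approximately normal sack (the ball counts per label are an illustrative guess, not the seminar's actual sack) and tallies the variance ratios:

```python
# Simulate the variance-ratio exercise that builds an F distribution.
import random
from statistics import variance

random.seed(1)

# 46 balls labeled 0-11, with counts chosen to look roughly bell-shaped.
counts = [1, 2, 3, 4, 6, 7, 7, 6, 4, 3, 2, 1]
sack = [label for label, n in enumerate(counts) for _ in range(n)]

def handful_variance():
    v = 0.0
    while v == 0.0:                       # redraw if all five balls match
        v = variance(random.choices(sack, k=5))
    return v

ratios = [handful_variance() / handful_variance() for _ in range(10_000)]
near_one = sum(r < 2.0 for r in ratios) / len(ratios)
beyond_six = sum(r > 6.0 for r in ratios) / len(ratios)
print(near_one, beyond_six)   # most ratios cluster near 1; few exceed 6
```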
Orienting the picture more conventionally, the horizontal axis represents the F ratio and the vertical axis a measure of frequency. Although the F distribution will change with the degrees of freedom of the numerator and denominator of the F ratio, this provides a basis for understanding what is going on. What have we learned? If we compare the variances of samples taken from an approximately normal distribution, we see that most of the ratio values cluster around one and very few are
greater than six. The chance of finding a specific F value is found in a manner similar to the chance of being any number of standard deviation units from zero in the Z distribution. The overall F ratio for our model is shown here: F = MSR/MSE = 107.852. This agrees with our ANOVA. We saw in our demonstration that F > 6.0 seemed rare. In fact, we use F = 6.0 as our rule of thumb for a cut-off. If F > 6.0, we will say that there is a significant shift in the response across different run settings. That means we don't believe the change in the response at different settings happened by chance; we believe it happened because there is a difference in responses between factor levels. In order to get a handle on this chance, let's look again at a picture of an F distribution. Based on our previous discussion and demonstration, we believe that 107.852 is a large F value located far out on the tail of the distribution. Since we do not believe there is a high probability that this would happen by chance, we are saying we think it occurred because the factors shifted the response. Put another way, we do not believe the two estimates of variance came from the same population. Instead, we think the response actually behaves differently (has different averages) for different factor levels.
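To get a feel for how extreme F = 107.852 is with 7 and 16 degrees of freedom, we can simulate F ratios from normal samples using only the standard library (a Monte Carlo sketch, not the method the software uses):

```python
# Estimate F-distribution tail areas by building each F draw as a
# ratio of two scaled chi-square variables.
import random

random.seed(2)

def chi_square(df):
    """Sum of df squared standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(df1, df2):
    """One draw from an F(df1, df2) distribution."""
    return (chi_square(df1) / df1) / (chi_square(df2) / df2)

draws = [f_draw(7, 16) for _ in range(50_000)]
beyond_rule = sum(f > 6.0 for f in draws) / len(draws)
beyond_model = sum(f > 107.852 for f in draws) / len(draws)
print(beyond_rule, beyond_model)   # exceeding 6 is rare; 107.852 rarer still
```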
Since there is a little bit of the tail to the right of 107.852, there appears to be some chance we might be wrong in assuming the shift in response is due to a change in factor levels. In other words, there is a small probability that F = 107.852 could have happened by chance. This chance, or risk, is known as a type I error or α error. A good example that explains an α error is the decision reached by a jury regarding a defendant. This table summarizes the possibilities:
If the jury appropriately finds an innocent defendant innocent or a guilty defendant guilty, we have no disagreement. In our judicial system we guard against finding an innocent person guilty. We consider that the worst possible error. This type of error is called a type I error or α error. The other situation is a type II or β error. For our present analysis, we will concentrate only on the α error. We will choose α = 0.05 (telling us we will be 95% confident that our decision is correct). This selection is arbitrary. Relating this to our problem, let's suppose we wish to be 95% confident that our model detects a linear shift in response due to changes in factor levels. Then we will want the area under the F curve to the left of our F = 107.852 to be greater than or equal to 0.95.
From our rule of thumb, this factor appears significant. The P value supports that conclusion. This concludes the math section on Sum of Squares, Mean Square and F ratio. Exit out of this section to return to the main lesson.