Unit 1: The One-Factor ANOVA as a Generalization of the Two-Sample t Test

Minitab Notes for STAT 6305: Analysis of Variance Models
Department of Statistics and Biostatistics, CSU East Bay

1.1. Data and Worksheet Preparation

Consider two randomly chosen samples of bottles of a particular drug. Bottles in Group 1 are chosen from current production; those in Group 2 have been stored under regulated conditions for one year. There are 10 bottles in each group. The potency of each bottle is assayed and recorded. The issue is whether the potency of the population of year-old bottles is the same as that of the population of bottles currently being made. The potency data are as shown below:

    Group 1: 10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
    Group 2: 9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9

These data are from Table 6.1 (page 294) of Ott and Longnecker: An Introduction to Statistical Methods and Data Analysis, 6th ed., Duxbury.

One way to put these data into a Minitab worksheet is to copy and paste them from this unit. Be sure Minitab commands are "enabled" before you start. The goal is to make the Session window look as shown below by following the bulleted instructions.

- "Enable commands" in the Minitab Session window using the EDITOR menu. (First, activate the Session window by clicking anywhere within it; you cannot modify the Session window when a Worksheet is active. Second, be sure to use the EDITOR menu, not EDIT.)
- Type the first two lines below (the ones with the name and set commands). The DATA> prompt should appear automatically at the beginning of the third line.
- In the third line, do the following instead of typing the data: in your browser, highlight the data for Group 1 and copy these 10 observations using CTRL-C. In the Minitab Session window, make sure the cursor follows the DATA> prompt and paste the data with CTRL-V. Then press ENTER. (It's OK if the spacing is a little different from what you see below, but make sure that you captured all 10 observations.)
- Similarly, copy and paste the data for Group 2 into the fourth line.
- Finally, type end on the fifth line to signal that data entry for c1 is complete.

    MTB > name c1 'Potency'
    MTB > set c1
    DATA> 10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
    DATA> 9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9
    DATA> end

Now display the data in c1 using either the menu path or the command shown below:

    DATA > Display Data
    MTB > print c1

This produces a (horizontal) printout of the 20 observations in c1. Also look in the worksheet to see the data there.

Next we need a column of "subscripts" in c2 to show which observations come from which group. Name c2 'Group' either with a command or by typing the name directly into the worksheet. Then enter the subscripts using either the menus or the set command.

    Type Group atop column 2 in the Worksheet
    CALC > Patterned Data, Simple, values from 1 to 2, each individual value repeated 10 times

    MTB > name c2 'Group'
    MTB > set c2
    DATA> (1:2)10
    DATA> end

This way of organizing data, with all observations in a single column and groups designated in a separate column of subscripts, is called "stacked" format in Minitab (and in some other software).

For such a small dataset you could just type the 20 'Potency' determinations and the 20 'Group' numbers directly into the worksheet. However, when using documents in DOC, PDF, or HTML format, you may find it convenient to learn (i) to copy and paste data into a worksheet and (ii) to use the "patterned data" features of the set command. It is best to learn these two things with the current, relatively simple data.

Once you have entered the data into a worksheet, you should always proofread your work before continuing. You can do this either by printing the data to the Session window (using the print command) or by looking directly at the Worksheet. Proofreading should become an automatic part of your data entry. Beyond the first few units these notes will not always remind you to proofread, but do so anyway.

Problems

Here is an alternate way to prepare the worksheet. Follow through the steps, copying and pasting data where appropriate. What menu choices would produce the same results? [Look at the DATA menu.] Explain what each command does. Compare c13 and c14 with c1 and c2.

    MTB > name c11 'Fresh' c12 'Stored'
    MTB > set c11
    DATA> 10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6
    DATA> end
    MTB > set c12
    DATA> 9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9
    DATA> end
    MTB > stack c11 c12 c13;
    SUBC> subs c14.

In working the previous problem you put the data for each group into a separate column (c11 and c12). Data in separate columns are said to be in "unstacked" format. Look at the DATA menu and figure out how the stacked data in c1 can be put into unstacked format using the subscripts in c2. (Use the column names c21 'New' and c22 'Old' for this.) What command/subcommand combination could you use to unstack the data without the help of the menus? (Minitab is a command-based package. The menus are sometimes a convenient way to generate the commands, which then appear in the Session window when the command language is enabled.)
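For readers who also work in R, the stacked and unstacked layouts described above can be sketched as follows. This is only an illustration, not part of the Minitab exercise: the object names stacked and unstacked are hypothetical, and the column names simply mirror those used in this unit.

```r
# Potency data from Section 1.1
fresh  <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)

# "Stacked" format: one column of observations, one column of subscripts.
# rep(1:2, each = 10) repeats each value 10 times, like (1:2)10 in Minitab.
stacked <- data.frame(
  Potency = c(fresh, stored),
  Group   = factor(rep(1:2, each = 10))
)

# "Unstacked" format: split the observations by the subscript column.
unstacked <- split(stacked$Potency, stacked$Group)
names(unstacked) <- c("New", "Old")   # names suggested in the problem above

str(stacked)
str(unstacked)
```

Splitting by the subscript column recovers the two original samples, which is the R analogue of the unstack operation the problem asks about.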

1.2. Descriptive Methods

Whenever possible, data analysis should begin with descriptive methods, both numerical and graphical. Here, it seems clear from the dotplot below that there is a tendency for the potency of stored bottles to be less than the potency of the fresh ones: Group 1 (mean above 10.25) has generally higher values than Group 2 (mean below 10.00).

    GRAPH > Dotplot > With Groups   (Makes 'professional' graphic display, different from the one shown below.)

    MTB > gstd                      (Puts Minitab into 'standard' graphics mode; use gpro to return to 'professional' graphics.)
    MTB > dotp c1;
    SUBC> by c2.

    [Character dotplots of Potency for Group 1 and Group 2 on a common Potency axis; plot not preserved.]

Now we compute numerical descriptive statistics, broken out by the subscript variable in c2 into two groups.

    STAT > Basic > Descriptive statistics, 'by variable' option

    MTB > describe c1;
    SUBC> by c2.

    [Output table for Potency by Group with columns N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum; numeric entries not preserved.]

Problems

Minitab makes graphical displays in one of two formats:

Standard (or Character) graphics. These are composed of text symbols and appear in the Session window. They have relatively low resolution, but they are easy to paste into reports using a word processor. They also help to keep file sizes small. (Be sure to use a monospace font such as Courier and to proofread to make sure the graph looks the same after pasting as it did before copying from Minitab.) We often show standard graphics in these notes. To activate standard graphics, use the command gstd and then issue the command for the kind of graph desired. Standard graphics are not available from menus.

Professional (or Pixel) graphics. These are true graphic images using Windows technology. They appear in separate boxes on your screen, not in the Session window. These images can be saved in a variety of graphics formats. They can be included as graphic images on the web and can be imported into word processing and desktop publishing documents. They greatly increase the file size of documents that incorporate them. Minitab starts in professional graphics mode. To re-activate professional graphics after using character graphics, use the command gpro.

Illustrate both types of graphics by making boxplots as follows:

    MTB > gstd
    MTB > boxp c1;
    SUBC> by c2.
    MTB > gpro
    MTB > boxp c1 * c2
    MTB > dotp c1 * c2      (Also accessible via menus.)

Comment on the results as follows:

(a) Do the boxplots show the differences between the two groups as clearly as do the dotplots? More clearly? Defend your answer.
(b) Look at one of the dotplots above. Can you see exactly how many data points are represented? Now look at one of the boxplots above. Can you see how many data points are represented?
(c) Minitab's boxplots sometimes indicate the presence of outliers. Are outliers indicated for either of our groups?
(d) What descriptive statistics are used in making boxplots?
(e) Describe the differences between standard-graphics and professional-graphics boxplots. [The two styles of boxplots use slightly different rules for computing quartiles. Particularly with small sample sizes, these differences may be noticeable.]
(f) We have given several commands above. What menu choices can be used to produce professional-style boxplots?

In R one prepares vectors for potencies of Stored and Fresh samples, finds descriptive statistics for each group, combines the data into a single vector of potencies with a corresponding categorical vector of sample types, and makes stripcharts and boxplots of the data as shown below. Execute the code and show the results, and compare with the corresponding results obtained in Minitab. (Note: The function as.factor designates typ as a categorical rather than a numerical variable. In this unit, the distinction in variable types is not always important because typ takes only two values. For some procedures, this distinction becomes crucial if an intended categorical variable takes more than two values.)

    fresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
    stored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
    potency = c(fresh, stored); n1 = length(fresh); n2 = length(stored)
    typ = as.factor(c(rep(1, times=n1), rep(2, times=n2)))
    summary(fresh); sd(fresh)
    summary(stored); sd(stored)
    par(mfrow=c(1,2))     # puts two graphs on one page
    stripchart(potency ~ typ, method="stack", vertical=TRUE)
    boxplot(potency ~ typ)
    par(mfrow=c(1,1))     # return to default one graph per page
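The bracketed remark in part (e), that different quartile rules can give noticeably different answers for small samples, can be illustrated in R. This is a sketch only: R's documentation describes quantile type 6 as the rule used by Minitab and type 7 as R's default, but which rule each Minitab boxplot style actually uses is not stated in these notes, so treat the labels below as assumptions.

```r
fresh <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)

# Quartiles under two interpolation rules
q_type7 <- quantile(fresh, c(0.25, 0.75), type = 7)  # R's default rule
q_type6 <- quantile(fresh, c(0.25, 0.75), type = 6)  # rule R attributes to Minitab

# Tukey's hinges, which R's boxplot() uses for the ends of the box
hinges <- fivenum(fresh)[c(2, 4)]

rbind(q_type7, q_type6, hinges)
```

With only 10 observations the two rules disagree in the second decimal place, which is enough to move the edges of a box visibly.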

1.3. Comparing a t Test with a One-Factor ANOVA

The descriptive methods in Section 1.2 strongly suggest that fresh samples of the drug tend to be more potent than stored ones. Now we look at several different ways to confirm this impression with formal statistical tests. That is, we test H0: the 2 groups have equal potency against Ha: the 2 groups have different potencies.

The first of these is the two-tailed, pooled two-sample t test. The command for a two-sample t test on stacked data is twot.

Minitab defaults for two-sample t tests:

- The two-tailed (or two-sided) alternative is the default; one-sided alternatives require the subcommand alternative followed by either 1 (right-sided alternative) or -1 (left-sided).
- The separate variances ("t-prime") test is the default. Pooling requires the subcommand pool. Computer simulation results have established that the separate variances test is often preferable for two-sample tests. Here we use the pooled test because it generalizes more readily to the ANOVA methods of these notes.

Note on stacked vs. unstacked data: The command twosample would be used if the potency measurements for the two groups had been entered into two separate columns, one for Fresh and one for Stored. Such "unstacked" data are seldom used for computer analysis outside of elementary statistics classes. Minitab is one of the few serious computer packages that makes direct use of unstacked data and, even then, only for a few elementary procedures.

    STAT > Basic > 2-sample t, one column, assume equal variances

    MTB > twot c1 c2;
    SUBC> pool.

    Two-sample T for Potency

    Group   N   Mean   StDev   SE Mean
    [numeric rows not preserved]

    Difference = mu (1) - mu (2)
    Estimate for difference:  ...
    95% CI for difference:  (..., ...)
    T-Test of difference = 0 (vs not =): T-Value = 4.24  P-Value = ...  DF = 18
    Both use Pooled StDev = ...

We see (from the very small P-value) that the difference between the two groups is very highly significant. This is what we guessed would be the case from looking at the dotplots above. Either the Fresh samples were originally manufactured to have a higher potency or the potency of the Stored samples deteriorated with a year of storage. (Or perhaps a combination of these two mechanisms.)

The one-factor or one-way ANOVA design (also sometimes called the "completely randomized design") is a generalization of the two-sided, pooled two-sample t test that can handle more than two groups. Thus, when it is applied to only two groups, its result should agree with that of the t test.
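The pooled t statistic reported above can also be reproduced directly from its definition. The following R sketch assumes only the potency data of Section 1.1; the variable names (sp2, se, t_stat, and so on) are hypothetical and chosen for this illustration.

```r
fresh  <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
n1 <- length(fresh); n2 <- length(stored)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(fresh) + (n2 - 1) * var(stored)) / (n1 + n2 - 2)

# Standard error of the difference in means, and the t statistic
se     <- sqrt(sp2 * (1 / n1 + 1 / n2))
d_hat  <- mean(fresh) - mean(stored)
t_stat <- d_hat / se
df     <- n1 + n2 - 2

# Two-sided P-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- d_hat + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, pooled_sd = sqrt(sp2)), 3)
p_value
round(ci, 2)
```

The t value and degrees of freedom should match the Minitab output (T-Value = 4.24, DF = 18), and the interval should match the t-procedure interval quoted later in Section 1.6.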

    STAT > ANOVA > Oneway

    MTB > oneway c1 c2      (Alternatively: MTB > onew 'Potency' 'Group')

    One-way ANOVA: Potency versus Group

    Source   DF   SS   MS   F   P
    [rows for Group, Error, and Total; numeric entries not preserved]

    S = ...   R-Sq = 49.93%   R-Sq(adj) = 47.15%

    [Character plot of individual 95% CIs for each group mean, based on the pooled StDev, with columns Level, N, Mean, StDev; not preserved.]

    Pooled StDev = ...

The P-value is the same for the t test and the one-way ANOVA. Depending on the release of Minitab, it may be printed as 0.000 (meaning less than 0.0005) or rounded to four places.

The square of a t-distributed random variable with 18 df is an F-distributed random variable with 1 df in the numerator and 18 df in the denominator. In fact, the squares of the upper .025 critical values for t(ν) are the upper .05 critical values for F(1, ν), as you can verify by looking at tables. [Upon squaring, the negative (left) and positive (right) tails of t both go into the right tail of F: .025 + .025 = .05.] Also, the square of the t-statistic obtained in our t test above is the F-statistic in our ANOVA: t^2 = F, up to rounding of the printed values.

Note: In Minitab, the oneway procedure is the simplest of several ways to perform a one-way ANOVA on stacked data. This command requires column identifiers such as c1 and c2, or 'Potency' and 'Group' (column names inside single quotes). It does only one-way ANOVAs, and provides separate confidence intervals for each level (Fresh or Stored) of the single factor (Group).

Problems:

For a two-sample design with n = 10 observations in each group and a fixed significance level α = .05, find the critical values for the two-sided pooled t test and the F test discussed above. Use Minitab's invcdf command:

    MTB > invcdf 0.975;
    SUBC> t 18.
    MTB > invcdf 0.95;
    SUBC> F 1 18.

Compare your results with tables in your text. Verify that the square of the critical value for t is the critical value for F. In this problem, why do you need to use 0.975 for the t distribution and 0.95 for the F distribution? (For each distribution, draw a sketch and shade in the area corresponding to probability 0.05.)
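The relationship between the t and F critical values, and between the two test statistics, can also be checked numerically in R. This is a sketch under the assumption that R's qt and qf play the role of Minitab's invcdf for these two distributions; the variable names are hypothetical.

```r
# Critical values for alpha = .05 with 18 error degrees of freedom
t_crit <- qt(0.975, df = 18)            # two-sided t test: .025 in each tail
f_crit <- qf(0.95, df1 = 1, df2 = 18)   # F test: .05 in the right tail
c(t_crit = t_crit, t_crit_squared = t_crit^2, f_crit = f_crit)

# The same relationship holds for the statistics computed from the data
fresh   <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored  <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
potency <- c(fresh, stored)
typ     <- as.factor(rep(1:2, each = 10))

t_stat <- unname(t.test(potency ~ typ, var.equal = TRUE)$statistic)
f_stat <- anova(lm(potency ~ typ))[["F value"]][1]
c(t_stat = t_stat, t_stat_squared = t_stat^2, f_stat = f_stat)
```

Both pairs agree to within floating-point error, which is the numerical counterpart of the algebraic identity discussed above.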

[Recall that the cumulative distribution function (cdf) F(x) of a random variable X is P(X ≤ x). Thus, the inverse cdf function for a particular value y gives the value c such that P(X ≤ c) = y. The inverse cdf function is sometimes called the quantile function.]

Consider a balanced two-sample design in which each group has n observations. Let the group totals be T1 and T2, and denote the grand total of all observations as T1 + T2 = G. Express the formulas for both the pooled t-statistic and the F-statistic in terms of this notation. Then use simple algebra to verify that the F-statistic is the square of the t-statistic.

Starting with the same four lines of R code as in Section 1.2 (the first four lines below), one can perform the pooled two-sample t test and the one-way ANOVA as follows. Show the results and compare with the corresponding Minitab results. (The first four lines need not be repeated in a continuous R session.)

    fresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
    stored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
    potency = c(fresh, stored); n1 = length(fresh); n2 = length(stored)
    typ = as.factor(c(rep(1, times=n1), rep(2, times=n2)))
    t.test(potency ~ typ, var.equal=TRUE)
    anova(lm(potency ~ typ))

1.4. More-General Procedures

Minitab's general anova procedure will handle a great variety of ANOVA models, many of which we shall study in these notes.

- With commands: designate the response variable (Potency here), followed by an equal sign, followed by the design or independent variables containing subscripts (here only one, 'Group'). Use of single quotes (apostrophes) around variable names is optional (unless the first character of the name is a number or a symbol).
- With Windows menus: you must select the response variable in one dialog box and the subscript variables that specify the model in another. (For now, ignore the box for "random" factors.)

For designs more complicated than the completely randomized design, ANOVA will handle only balanced situations, i.e., only designs where each treatment (or treatment combination) has the same number of replications. Because it is programmed to handle such a wide variety of ANOVA designs, the general ANOVA procedure does not provide confidence intervals.

    STAT > ANOVA > Balanced, select 'Potency' as Response, 'Group' as Model

    MTB > anova Potency = Group

    Factor   Type    Levels   Values
    Group    fixed   2        1, 2

    Analysis of Variance for Potency

    Source   DF   SS   MS   F   P
    [rows for Group, Error, and Total; numeric entries not preserved]

    S = ...   R-Sq = 49.93%   R-Sq(adj) = 47.15%

Finally, the GLM procedure (GLM stands for "general linear model") has the same syntax as ANOVA. It requires more intensive computation and more computer memory (perhaps noticeable with large datasets and complex designs), can handle unbalanced cases, uses a regression approach, and automatically warns us about "unusual" observations. For more complex designs the two procedures have somewhat different options and capabilities.

    STAT > ANOVA > General linear model

    MTB > glm Potency = Group

    Factor   Levels   Values
    Group    [entries not preserved]

    Analysis of Variance for Potency

    Source   DF   Seq SS   Adj SS   Adj MS   F   P
    [rows for Group, Error, and Total; numeric entries not preserved]

    S = ...   R-Sq = 49.93%   R-Sq(adj) = 47.15%

    Unusual Observations for Potency

    Obs.   Potency   Fit   Stdev.Fit   Residual   St.Resid
    [one row, flagged R; numeric entries not preserved]

    R denotes an obs. with a large st. resid.

Technical note: Because Group and Error correspond to orthogonal subspaces of the 20-dimensional vector space of observations, the Sequential and Adjusted Sums of Squares are identical for our data.

Problems:

The GLM procedure indicates that observation #5 is unusual. Minitab's criterion for calling an observation unusual is based on Studentized residuals of absolute value greater than 2. So this observation, with its value of 2.11, is borderline. (We will not go into the computations involved in finding Studentized residuals. Very roughly, the idea is that this observation is relatively far from the mean of the rest of the observations in its group.)

In this ANOVA, the (ordinary) residual of an observation is its difference from its group mean. Using menus, in the one-way ANOVA procedure select the option to store residuals. Verify the values of the residuals for observations #1, #5, and #11 of the stacked data by hand. Make a boxplot of the residuals. Does it indicate any outliers?
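As a cross-check on the residual calculations just described, the residuals can be obtained in R from a linear model fit. This is a sketch, not Minitab's own computation: it assumes that the "St.Resid" column corresponds to what R calls standardized residuals (rstandard), an assumption supported here only by the fact that the value for observation #5 agrees in magnitude with the 2.11 quoted above.

```r
fresh   <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored  <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
potency <- c(fresh, stored)
typ     <- as.factor(rep(1:2, each = 10))

fit <- lm(potency ~ typ)

# Ordinary residuals: each observation minus its own group mean
res <- residuals(fit)
round(res[c(1, 5, 11)], 2)

# Standardized residuals: residual / (s * sqrt(1 - leverage))
st_res <- rstandard(fit)
round(st_res[5], 2)

# Observations whose standardized residual exceeds 2 in absolute value
flagged <- unname(which(abs(st_res) > 2))
flagged

boxplot(res, horizontal = TRUE, main = "Residuals")
```

Only observation #5 crosses the threshold of 2, and only barely, which is consistent with calling it borderline.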

Use the menu path STAT > Basic statistics > Normality test to test the null hypothesis that the residuals fit a normal distribution (against the alternative that they are not normal). In the resulting normal probability plot, normal residuals should nearly fit a straight line. Do ours? What is the P-value of the Anderson-Darling test of normality?

Test the hypothesis that the two groups come from populations with equal variances against the two-sided alternative. Use the cdf command to find the P-value of this test. Alternatively, look at the menu path STAT > Basic statistics > 2 variances for this test. (This test is known to have poor power; that is, it often fails to reject the null hypothesis even when population variances differ.)

1.5. Traditional Nonparametric Alternatives

Here we mention several nonparametric tests. You should read the descriptions of them in your text. In Windows, all menu paths for Minitab's implementations of these tests begin with STAT > Nonparametric.

The nonparametric alternative to the two-sample t test is the Mann-Whitney-Wilcoxon test (command mann). It works only for unstacked data.

Both of the nonparametric alternatives to the general one-way ANOVA are programmed to be used with stacked data: the Mood test (Minitab command mood) and the Kruskal-Wallis test (Minitab command kruskal). The Kruskal-Wallis test is a generalization of the Mann-Whitney-Wilcoxon test in the same sense that the one-way ANOVA is a generalization of a pooled two-sample t test.

Unlike the t test and ANOVA, none of these nonparametric tests assume normal data. They all test null hypotheses about equal population medians (rather than means). Like their normal-theory counterparts, these nonparametric tests assume that:

- The data are random samples from their respective populations.
- The data for different levels (e.g., Fresh and Stored groups) are independent of one another.
- The population dispersions are equal. For the normal tests, the specific form of the "equal dispersion" assumption is that variances are equal. For the nonparametric tests, it is that all population distributions are of the same shape, differing (if at all) only by a translation that shifts the entire distribution along with the value of the median.
- The populations are continuous to the extent necessary to avoid "ties" (repeated values). Normal theory tests usually work quite well unless rounding (or some other process) has produced severe granularity (many clumps of repeated values). Nonparametric tests require approximate "correction" procedures to adjust for any ties that may be present due to rounding.

There is no evidence that our present data are other than normally distributed. For example, the dotplots and boxplots show no marked skewness or probable outliers. Even so, you should experiment with the nonparametric procedures kruskal and mood to see how they work. Here, they yield the same conclusion as the normal theory tests: the potency of the stored bottles is less than that of the fresh ones.
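For comparison with Minitab's kruskal command, the Kruskal-Wallis test is available in base R as kruskal.test. The sketch below assumes the stacked data of Section 1.1. Note that base R's mood.test is a two-sample test of scale, not the Mood median test referred to above, so it is deliberately not used here.

```r
fresh   <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored  <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
potency <- c(fresh, stored)
typ     <- as.factor(rep(1:2, each = 10))

# Sample medians by group (the nonparametric tests concern medians)
group_medians <- tapply(potency, typ, median)
group_medians

# Kruskal-Wallis rank sum test on stacked data (includes a correction for ties)
kw <- kruskal.test(potency ~ typ)
kw
```

With two groups the test has 1 degree of freedom, and its small P-value leads to the same conclusion as the normal-theory tests.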

Problems:

Theoretically, for continuous data, there should be no ties at all. In reality, we are always dealing with rounded data, so ties may be present. (For example, two truly distinct values that both round to 10.2 would be recorded here as "tied" at 10.2; if they are close enough, they would be recorded as tied even with two-decimal accuracy.) Looking at the 20 observations in our dataset, do you find any ties? If so, how many observations are involved in ties?

The W-statistic reported in the output of Minitab's implementation mann of the Mann-Whitney-Wilcoxon test is computed as follows: consider all of the data in both groups as a whole, find the ranks of these observations, and find the sum of the ranks of the observations in Group 1. A small value of W indicates that Group 1 comes from a population with a smaller median than Group 2; a large value indicates that the population median for Group 1 may be larger.

(a) Under the null hypothesis that the two populations are the same, the expected value of W can be shown to be µ_W = n1(n1 + n2 + 1)/2. What is this value for our data?
(b) Assume that c5, c6, and c7 are empty columns, that the stacked data are in c1, and that the subscripts are in c2. Then the following Minitab commands can be used to illustrate how W is computed:

    MTB > rank c1 c5
    MTB > unstack c5 c6 c7;
    SUBC> subs c2.
    MTB > sum c6

Go through these steps carefully, looking at the worksheet after each step and making sure you understand what each step does. Then unstack the data and use the mann command to perform the Mann-Whitney-Wilcoxon test. Compare the value of W with your computations above. Carefully compare the interpretation of this nonparametric test with the interpretation of the t test and the ANOVA above. Justify your answer.

In the Stored group, change the observed potency 9.5 to 2.0. (Maybe a stored sample gets damp and loses nearly all its potency. As a result, the group means become more different than for the real data.) What change does this make in the results of the pooled 2-sample t test? What change does this make in the results of the Wilcoxon test?

In R the Mann-Whitney-Wilcoxon test is performed, on the original data, as shown below (notice the two variations, one with what Minitab would call stacked data and one with unstacked data). Compare the results with Minitab output.

    fresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
    stored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
    potency = c(fresh, stored); n1 = length(fresh); n2 = length(stored)
    typ = as.factor(c(rep(1, times=n1), rep(2, times=n2)))
    wilcox.test(potency ~ typ)
    wilcox.test(fresh, stored)

1.6. Additional Nonparametric Procedures

One kind of nonparametric test is done by using a rank transform. Each observation in c1 (Potency) of the worksheet is replaced, in c5, by its rank (RankPote). Then a standard t test or ANOVA is done on the ranked data.
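The rank computations used in Section 1.5 (the rank sum W) and in the rank transform just described can be sketched in R. Two assumptions are made here: that tied observations receive average ranks, as R's rank function assigns by default, and that the statistic R's wilcox.test labels W is the Group 1 rank sum minus n1(n1 + 1)/2, so it is not the same number as Minitab's W. The variable names are hypothetical.

```r
fresh   <- c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
stored  <- c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
potency <- c(fresh, stored); n1 <- length(fresh); n2 <- length(stored)
typ     <- as.factor(rep(1:2, each = 10))

# Ranks of all 20 observations taken together (ties get average ranks)
r <- rank(potency)

# Rank sum for Group 1 and its expected value under the null hypothesis
W_ranksum <- sum(r[typ == 1])
mu_W      <- n1 * (n1 + n2 + 1) / 2
c(W = W_ranksum, expected = mu_W)

# R reports the Mann-Whitney form of the statistic instead
W_r <- unname(wilcox.test(fresh, stored, exact = FALSE)$statistic)
c(W_r = W_r, W_minus_offset = W_ranksum - n1 * (n1 + 1) / 2)

# Rank transform: pooled t test applied to the ranks
t_rank <- unname(t.test(r ~ typ, var.equal = TRUE)$statistic)
round(t_rank, 2)
```

The rank sum for Group 1 lies well above its null expectation, and the t statistic on the ranks is close to the one obtained from the original data.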

    MTB > rank c1 c5
    MTB > name c5 'RankPote'
    MTB > twot c5 c2;
    SUBC> pool.

    Two-Sample T-Test and CI: RankPote, Group

    Two-sample T for RankPote

    Group   N   Mean   StDev   SE Mean
    [numeric rows not preserved]

    Difference = mu (1) - mu (2)
    Estimate for difference:  ...
    95% CI for difference:  (..., ...)
    T-Test of difference = 0 (vs not =): T-Value = 4.33  P-Value = ...  DF = 18
    Both use Pooled StDev = ...

There is no reason to believe the residuals from rank-transformed data are normal. But if the original data are far from normal (say there are a couple of far outliers among the residuals, one low and one high), then the transformed data may be more nearly normal than the original data. At best, the t statistic is only roughly normal, so the P-value is only approximate. But in this case, the P-value is not distinguishable from the P-value of the t test on the original data. (The CI from this procedure is difficult to interpret and is best ignored.)

With a rank transformation, as with any transformation of the data, care has to be taken in interpreting estimates and confidence intervals. These are on the rank scale, not on the original potency scale. So rank-transformed data are more convenient for doing a test than for doing estimation.

Yet another nonparametric procedure is the permutation test. Under the null hypothesis that the two groups are the same, any permutation of the Potency data in c1 is as likely as any other. If we could compute the t statistic corresponding to each of the 20! permutations of the data, computing as if the first 10 were Fresh and the last 10 were Stored, then we would get an empirical distribution of the t statistic. In practice, there are too many possible permutations to carry out this computational task, but we can get a pretty good idea of this empirical distribution by looking at a large number of randomly chosen permutations. That is what the program below in R does.

Again here, the conclusion is that the P-value (area outside the vertical blue lines in the plot below) is very small, here about P = 0.0008. Also, we see that the empirical permutation distribution of the 't values' generated is pretty close to a Student's t distribution with df = 18.

    fresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
    stored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
    potency = c(fresh, stored); n1 = length(fresh); n2 = length(stored)
    typ = as.factor(c(rep(1, times=n1), rep(2, times=n2)))
    m = 10000; d = numeric(m)
    for (i in 1:m) {
      xp = sample(potency, n1 + n2)   # permutation of n1 + n2 potency values
      d[i] = mean(xp[typ==1]) - mean(xp[typ==2])
    }

    d.data = mean(fresh) - mean(stored); d.data
    mean(abs(d) >= abs(d.data))
    hist(d, col="lightgrey")
    abline(v = d.data, col="blue", lwd=2)
    abline(v = -d.data, col="blue", lwd=2, lty="dashed")

    > d.data = mean(fresh) - mean(stored); d.data
    [1] 0.54
    > mean(abs(d) >= abs(d.data))
    [1] 8e-04

In order to get a confidence interval without making distributional assumptions, one can do a bootstrap procedure. It is based on the idea that all the information we have about the populations corresponding to Fresh and Stored samples is contained in the samples themselves. By repeatedly sampling (with replacement) from the samples, viewing them as pseudo-populations, one can get a good idea of the variability of the difference in sample means.

The resulting 95% nonparametric bootstrap CI is (0.31, 0.78), which is a little shorter than the CI (0.27, 0.81) from the t procedure, and noticeably shorter than the CI from the rank-based Mann-Whitney-Wilcoxon procedure. Because of the relatively small sample sizes of the groups, the accuracy of the bootstrap CI is in some doubt. (Often bootstrap CIs based on small samples are too short.) This example is shown just to introduce the idea of the bootstrap. Because there is no evidence of nonnormality, the t procedure is preferred.

    fresh = c(10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6)
    stored = c(9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9)
    potency = c(fresh, stored); n1 = length(fresh); n2 = length(stored)
    typ = c(rep(1, times=n1), rep(2, times=n2))
    m = 10000; d = numeric(m)
    for (i in 1:m) {
      b.fresh = sample(fresh, n1, replace=TRUE)     # resample Fresh with replacement
      b.stored = sample(stored, n2, replace=TRUE)   # resample Stored with replacement
      d[i] = mean(b.fresh) - mean(b.stored)
    }

    qnt = quantile(d, c(.975,.025))
    boot.ci = 2*(mean(fresh)-mean(stored)) - qnt
    boot.ci
    hist(d, col="lightgrey")
    abline(v = boot.ci, col="blue", lwd=2)

    > boot.ci
    97.5%  2.5%
      ...   ...

Notice that the traditional nonparametric procedures and the test on rank-transformed data lose information by considering ranks, whereas the permutation test uses precisely the observed data.

Problems:

Use the altered data of the earlier problem in Section 1.5 (Stored potency 9.5 changed to 2.0).

(a) What change does altering the data make in the results of the pooled 2-sample t test of the rank-transformed data?
(b) What change does this make in the results of the permutation test?
(c) What change in the bootstrap CI? (Hint: in making the histogram for the bootstrap distribution, use the parameter breaks=100 to force a more detailed look at the results.)
(d) How do you account for the unusual appearance of the permutation and bootstrap distributions?

Minitab Notes for Statistics 6305: ANOVA Models by Bruce E. Trumbo, Department of Statistics, CSU East Bay, East Bay CA. Copyright 1991, 2011 by Bruce E. Trumbo. All rights reserved. Partial support for the 1991 version from NSF grant USE. The current version with Minitab professional graphics and examples using R is a draft. For comments, errata, selected answers, related materials, and permission to use beyond CSU East Bay, please contact bruce.trumbo@csueastbay.edu or eric.suess@csueastbay.edu.


More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13. Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data

More information

2010 by Minitab, Inc. All rights reserved. Release Minitab, the Minitab logo, Quality Companion by Minitab and Quality Trainer by Minitab are

2010 by Minitab, Inc. All rights reserved. Release Minitab, the Minitab logo, Quality Companion by Minitab and Quality Trainer by Minitab are 2010 by Minitab, Inc. All rights reserved. Release 16.1.0 Minitab, the Minitab logo, Quality Companion by Minitab and Quality Trainer by Minitab are registered trademarks of Minitab, Inc. in the United

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information

Minitab Notes for Activity 1

Minitab Notes for Activity 1 Minitab Notes for Activity 1 Creating the Worksheet 1. Label the columns as team, heat, and time. 2. Have Minitab automatically enter the team data for you. a. Choose Calc / Make Patterned Data / Simple

More information

Getting Started with Minitab 17

Getting Started with Minitab 17 2014, 2016 by Minitab Inc. All rights reserved. Minitab, Quality. Analysis. Results. and the Minitab logo are all registered trademarks of Minitab, Inc., in the United States and other countries. See minitab.com/legal/trademarks

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Getting Started with Minitab 18

Getting Started with Minitab 18 2017 by Minitab Inc. All rights reserved. Minitab, Quality. Analysis. Results. and the Minitab logo are registered trademarks of Minitab, Inc., in the United States and other countries. Additional trademarks

More information

Quantitative - One Population

Quantitative - One Population Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)

More information

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

MINITAB 17 BASICS REFERENCE GUIDE

MINITAB 17 BASICS REFERENCE GUIDE MINITAB 17 BASICS REFERENCE GUIDE Dr. Nancy Pfenning September 2013 After starting MINITAB, you'll see a Session window above and a worksheet below. The Session window displays non-graphical output such

More information

Statistics 528: Minitab Handout 1

Statistics 528: Minitab Handout 1 Statistics 528: Minitab Handout 1 Throughout the STAT 528-530 sequence, you will be asked to perform numerous statistical calculations with the aid of the Minitab software package. This handout will get

More information

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:

Recall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation: Topic 11. Unbalanced Designs [ST&D section 9.6, page 219; chapter 18] 11.1 Definition of missing data Accidents often result in loss of data. Crops are destroyed in some plots, plants and animals die,

More information

Introductory Applied Statistics: A Variable Approach TI Manual

Introductory Applied Statistics: A Variable Approach TI Manual Introductory Applied Statistics: A Variable Approach TI Manual John Gabrosek and Paul Stephenson Department of Statistics Grand Valley State University Allendale, MI USA Version 1.1 August 2014 2 Copyright

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

Chapter 6. THE NORMAL DISTRIBUTION

Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

15 Wyner Statistics Fall 2013

15 Wyner Statistics Fall 2013 15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Nonparametrics on Minitab Version 1

Nonparametrics on Minitab Version 1 θ Nonparametrics on Minitab 11.11 Version 1? ρπθ σχωµ σχωµ µρπ ρ πθ σ χω Stat 313 2000 By Julian Visch & Irene Hudson Department of Mathematics and Statistics University of Canterbury Nonparametrics on

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES STP 6 ELEMENTARY STATISTICS NOTES PART - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

Macros and ODS. SAS Programming November 6, / 89

Macros and ODS. SAS Programming November 6, / 89 Macros and ODS The first part of these slides overlaps with last week a fair bit, but it doesn t hurt to review as this code might be a little harder to follow. SAS Programming November 6, 2014 1 / 89

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

for statistical analyses

for statistical analyses Using for statistical analyses Robert Bauer Warnemünde, 05/16/2012 Day 6 - Agenda: non-parametric alternatives to t-test and ANOVA (incl. post hoc tests) Wilcoxon Rank Sum/Mann-Whitney U-Test Kruskal-Wallis

More information

In Minitab interface has two windows named Session window and Worksheet window.

In Minitab interface has two windows named Session window and Worksheet window. Minitab Minitab is a statistics package. It was developed at the Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner in 1972. Minitab began as a light

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

Fly wing length data Sokal and Rohlf Box 10.1 Ch13.xls. on chalk board

Fly wing length data Sokal and Rohlf Box 10.1 Ch13.xls. on chalk board Model Based Statistics in Biology. Part IV. The General Linear Model. Multiple Explanatory Variables. Chapter 13.6 Nested Factors (Hierarchical ANOVA ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6,

More information

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann Forrest W. Young & Carla M. Bann THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA CB 3270 DAVIE HALL, CHAPEL HILL N.C., USA 27599-3270 VISUAL STATISTICS PROJECT WWW.VISUALSTATS.ORG

More information

Chapter 3: Data Description Calculate Mean, Median, Mode, Range, Variation, Standard Deviation, Quartiles, standard scores; construct Boxplots.

Chapter 3: Data Description Calculate Mean, Median, Mode, Range, Variation, Standard Deviation, Quartiles, standard scores; construct Boxplots. MINITAB Guide PREFACE Preface This guide is used as part of the Elementary Statistics class (Course Number 227) offered at Los Angeles Mission College. It is structured to follow the contents of the textbook

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

2) familiarize you with a variety of comparative statistics biologists use to evaluate results of experiments;

2) familiarize you with a variety of comparative statistics biologists use to evaluate results of experiments; A. Goals of Exercise Biology 164 Laboratory Using Comparative Statistics in Biology "Statistics" is a mathematical tool for analyzing and making generalizations about a population from a number of individual

More information

SPSS. (Statistical Packages for the Social Sciences)

SPSS. (Statistical Packages for the Social Sciences) Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

Learn What s New. Statistical Software

Learn What s New. Statistical Software Statistical Software Learn What s New Upgrade now to access new and improved statistical features and other enhancements that make it even easier to analyze your data. The Assistant Data Customization

More information

For our example, we will look at the following factors and factor levels.

For our example, we will look at the following factors and factor levels. In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball

More information

Getting Started With R

Getting Started With R Installation. Getting Started With R The R software package can be obtained free from www.r-project.org. To install R on a Windows machine go to this web address; in the left margin under Download, select

More information

Chapter 1 Histograms, Scatterplots, and Graphs of Functions

Chapter 1 Histograms, Scatterplots, and Graphs of Functions Chapter 1 Histograms, Scatterplots, and Graphs of Functions 1.1 Using Lists for Data Entry To enter data into the calculator you use the statistics menu. You can store data into lists labeled L1 through

More information

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N Math 3: CSM Tutorial: Probability, Statistics, and Navels Fall 2 In this worksheet, we look at navel ratios, means, standard deviations, relative frequency density histograms, and probability density functions.

More information

Sta$s$cs & Experimental Design with R. Barbara Kitchenham Keele University

Sta$s$cs & Experimental Design with R. Barbara Kitchenham Keele University Sta$s$cs & Experimental Design with R Barbara Kitchenham Keele University 1 Comparing two or more groups Part 5 2 Aim To cover standard approaches for independent and dependent groups For two groups Student

More information

Index. Bar charts, 106 bartlett.test function, 159 Bottles dataset, 69 Box plots, 113

Index. Bar charts, 106 bartlett.test function, 159 Bottles dataset, 69 Box plots, 113 Index A Add-on packages information page, 186 187 Linux users, 191 Mac users, 189 mirror sites, 185 Windows users, 187 aggregate function, 62 Analysis of variance (ANOVA), 152 anova function, 152 as.data.frame

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

Laboratory #11. Bootstrap Estimates

Laboratory #11. Bootstrap Estimates Name Laboratory #11. Bootstrap Estimates Randomization methods so far have been used to compute p-values for hypothesis testing. Randomization methods can also be used to place confidence limits around

More information

Week 6, Week 7 and Week 8 Analyses of Variance

Week 6, Week 7 and Week 8 Analyses of Variance Week 6, Week 7 and Week 8 Analyses of Variance Robyn Crook - 2008 In the next few weeks we will look at analyses of variance. This is an information-heavy handout so take your time reading it, and don

More information

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test).

One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). One way ANOVA when the data are not normally distributed (The Kruskal-Wallis test). Suppose you have a one way design, and want to do an ANOVA, but discover that your data are seriously not normal? Just

More information

Subset Selection in Multiple Regression

Subset Selection in Multiple Regression Chapter 307 Subset Selection in Multiple Regression Introduction Multiple regression analysis is documented in Chapter 305 Multiple Regression, so that information will not be repeated here. Refer to that

More information

Practical 2: Using Minitab (not assessed, for practice only!)

Practical 2: Using Minitab (not assessed, for practice only!) Practical 2: Using Minitab (not assessed, for practice only!) Instructions 1. Read through the instructions below for Accessing Minitab. 2. Work through all of the exercises on this handout. If you need

More information

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users BIOSTATS 640 Spring 2018 Review of Introductory Biostatistics STATA solutions Page 1 of 13 Key Comments begin with an * Commands are in bold black I edited the output so that it appears here in blue Unit

More information

One Factor Experiments

One Factor Experiments One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal

More information

Chapter 3: Describing, Exploring & Comparing Data

Chapter 3: Describing, Exploring & Comparing Data Chapter 3: Describing, Exploring & Comparing Data Section Title Notes Pages 1 Overview 1 2 Measures of Center 2 5 3 Measures of Variation 6 12 4 Measures of Relative Standing & Boxplots 13 16 3.1 Overview

More information

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler JMP in a nutshell 1 HR, 17 Apr 2018 The software JMP Pro 14 is installed on the Macs of the Phonetics Institute. Private versions can be bought from

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

Statistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings

Statistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings Statistical Good Practice Guidelines SSC home Using Excel for Statistics - Tips and Warnings On-line version 2 - March 2001 This is one in a series of guides for research and support staff involved in

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

Sections 4.3 and 4.4

Sections 4.3 and 4.4 Sections 4.3 and 4.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 32 4.3 Areas under normal densities Every

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Week 4: Simple Linear Regression III

Week 4: Simple Linear Regression III Week 4: Simple Linear Regression III Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Goodness of

More information

MINITAB Release Comparison Chart Release 14, Release 13, and Student Versions

MINITAB Release Comparison Chart Release 14, Release 13, and Student Versions Technical Support Free technical support Worksheet Size All registered users, including students Registered instructors Number of worksheets Limited only by system resources 5 5 Number of cells per worksheet

More information

Distributions of random variables

Distributions of random variables Chapter 3 Distributions of random variables 31 Normal distribution Among all the distributions we see in practice, one is overwhelmingly the most common The symmetric, unimodal, bell curve is ubiquitous

More information

Minitab on the Math OWL Computers (Windows NT)

Minitab on the Math OWL Computers (Windows NT) STAT 100, Spring 2001 Minitab on the Math OWL Computers (Windows NT) (This is an incomplete revision by Mike Boyle of the Spring 1999 Brief Introduction of Benjamin Kedem) Department of Mathematics, UMCP

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

Analysis of variance - ANOVA

Analysis of variance - ANOVA Analysis of variance - ANOVA Based on a book by Julian J. Faraway University of Iceland (UI) Estimation 1 / 50 Anova In ANOVAs all predictors are categorical/qualitative. The original thinking was to try

More information

Intro To Excel Spreadsheet for use in Introductory Sciences

Intro To Excel Spreadsheet for use in Introductory Sciences INTRO TO EXCEL SPREADSHEET (World Population) Objectives: Become familiar with the Excel spreadsheet environment. (Parts 1-5) Learn to create and save a worksheet. (Part 1) Perform simple calculations,

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

GETTING STARTED WITH MINITAB INTRODUCTION TO MINITAB STATISTICAL SOFTWARE

GETTING STARTED WITH MINITAB INTRODUCTION TO MINITAB STATISTICAL SOFTWARE Six Sigma Quality Concepts & Cases Volume I STATISTICAL TOOLS IN SIX SIGMA DMAIC PROCESS WITH MINITAB APPLICATIONS CHAPTER 2 GETTING STARTED WITH MINITAB INTRODUCTION TO MINITAB STATISTICAL SOFTWARE Amar

More information

Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11}

Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11} Entering Data: Basic Commands Consider the data set: {15, 22, 32, 31, 52, 41, 11} Data is stored in Lists on the calculator. Locate and press the STAT button on the calculator. Choose EDIT. The calculator

More information

The Power and Sample Size Application

The Power and Sample Size Application Chapter 72 The Power and Sample Size Application Contents Overview: PSS Application.................................. 6148 SAS Power and Sample Size............................... 6148 Getting Started:

More information

Lecture Notes 3: Data summarization

Lecture Notes 3: Data summarization Lecture Notes 3: Data summarization Highlights: Average Median Quartiles 5-number summary (and relation to boxplots) Outliers Range & IQR Variance and standard deviation Determining shape using mean &

More information

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit.

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit. Continuous Improvement Toolkit Normal Distribution The Continuous Improvement Map Managing Risk FMEA Understanding Performance** Check Sheets Data Collection PDPC RAID Log* Risk Analysis* Benchmarking***

More information

Nonparametric Testing

Nonparametric Testing Nonparametric Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Data Analysis Guidelines

Data Analysis Guidelines Data Analysis Guidelines DESCRIPTIVE STATISTICS Standard Deviation Standard deviation is a calculated value that describes the variation (or spread) of values in a data set. It is calculated using a formula

More information

4. Descriptive Statistics: Measures of Variability and Central Tendency

4. Descriptive Statistics: Measures of Variability and Central Tendency 4. Descriptive Statistics: Measures of Variability and Central Tendency Objectives Calculate descriptive for continuous and categorical data Edit output tables Although measures of central tendency and

More information