An Introduction to Minitab
Statistics 529

1 Introduction

MINITAB is a computing package for performing simple statistical analyses. The current version on the PC is 15. MINITAB is no longer made for the Macintosh. MINITAB is also available for Unix and Linux systems (but not on our systems). These instructions should work with version 10 and above (there are some changes to the menus for displaying graphs in versions 14 and 15).

To start MINITAB on a PC, click on the Start button, then select All Programs > Minitab Solutions > Minitab 15 Statistical Software.

2 Getting help

Select the menu Help > Help to bring up a help window. Help is available on a variety of topics: handling files, editing and manipulating data, graphing, using session and macro commands, and, most importantly, doing statistics with MINITAB. The Methods and Formulas subsection under References is especially helpful in explaining various statistical and computing terms.

3 Using worksheets

Data are stored in a worksheet. Think of it as a spreadsheet.
- The variables are columns. Variables can be given a name or are referred to by C1, C2, ... Remember to always name your variables.
- The individuals are stored in the numbered rows. A case is an entire row of the variables.

A new worksheet called Worksheet 1 is available whenever you start the program. If you need to create a new worksheet, use the menu option File > New. A dialog box will appear; select Minitab Worksheet and click OK. The new worksheet will appear in the MINITAB program window.
Figure 1: The MINITAB worksheet

3.1 A motivating example

We consider data derived from a dataset presented in Mackowiak, P. A., Wasserman, S. S., and Levine, M. M. (1992), A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich, Journal of the American Medical Association, 268, 1578-1580.

The data are a random sample of 130 individuals. For each individual the body temperature (in degrees Fahrenheit) and heart rate (in beats per minute, bpm) were measured. The gender of each subject was also recorded.

If you wanted to enter the data you would:
- Select a column for the first variable, e.g., C1.
- Type the name of the variable below the column number, e.g., Body temperature (Fahrenheit). The column width of the worksheet increases to fit longer names.
- Enter the body temperatures in rows 1 to 130.

Instead, we will load the data from a file.

3.2 Loading and saving data

Use the File > Open Worksheet option to load saved data into MINITAB. Datasets for this class are available from

http://www.stat.osu.edu/~hans/courses/529/data.html

Just click on the file, and it should open automatically in MINITAB. You can import data from other sources (e.g., an ASCII text file) by changing the Files of Type option in the Open Worksheet dialog.

To save the current worksheet use the menu option File > Save Current Worksheet. The first time you do this you need to enter a filename for the worksheet. It helps to store your data and work in a new folder for each homework or project you are working on. To save the current worksheet under a different filename use the option File > Save Current Worksheet As.
4 Graphical summaries of data

4.1 Histograms

To produce a histogram of Body temperature:
- Select Graph > Histogram.
- Choose the Simple version of the histogram (you can try more complicated histograms later).
- A dialog box now appears. This is where you set up your plot.
- In Graph variables under X, select the variable(s) for which you want to produce a histogram. Either type the variable number (e.g., C1) into the box, or in the right hand panel click on C1 Body temperature (Fahrenheit) and choose Select.
- Click on Scale to change the appearance of the scales. In particular, under Y-Scale Type you can select between Frequency, Percent, and Density.
- Press OK to produce the plot. A new graph window will appear with the resulting graph.

You can only change the X-scale after you have created the plot. Double click on the X-scale to change the scale. (Right clicking the X-scale of the histogram and selecting Edit X-Scale will also work.) Changing the binning can be especially useful. You can edit any other feature of a graph in a similar way, by double clicking on it.

To include your graph in a word processed document:
- Select the graph you are interested in.
- Press the right mouse button; a menu appears.
- Select Copy graph.
- Paste the plot into your favorite word processor using Edit > Paste.

4.2 Boxplots

Use Graph > Boxplot to create boxplots. This is similar to the histogram command. Choose the One Y, Simple plot for boxplots of one variable.

To produce a side-by-side boxplot of the body temperature by gender use the One Y, With Groups command. Enter C1 (Body temperature) for the Graph variables and C3 (Gender) for the Categorical variables for grouping. Then press OK.

4.3 Normal quantile-quantile (Q-Q) plots

To produce a normal quantile-quantile (Q-Q) plot (also called a normal quantile plot) of Body temperature:
- Select the menu command Graph > Probability Plot.
- Select the Simple graph type.
- In the dialog box, select C1 for Graph variables.
- Click Distribution: under the Data Display tab, untick Show confidence interval and click OK.
- Click Scale:
  - Under the Axes and Ticks tab, select Transpose Y and X.
  - Under the Y-Scale Type tab, select Score, and click OK.
- Click OK again to produce the figure.
- Rename the labels on the figure by double clicking on the text you want to change. Enter a new label in the Text box and click OK.

4.4 Bar and pie charts

Now we consider a different dataset. Suppose we have the following summary data stored in a Minitab worksheet:

    C1-T     C2
    gender   count
    female   4
    male     3

To make a bar chart of these data, use the Graph > Bar Chart command. Choose Values from a table for the Bars represent option and click OK. Enter count for the Graph variables and gender for the Categorical variable and click OK.

If your data are instead stored as one row per individual,

    C1-T
    gender
    male
    male
    male

choose Counts of unique values for the Bars represent option, click OK, and then enter gender in the Categorical variables box.

If you want to change the Y-scale, double click on the Y-axis and, after selecting Position of ticks, enter the Y label values.

If you do not want a gap between the X-values (i.e., you want a histogram, not a bar chart), double click on the X-axis, de-select Gap between clusters, and enter a value of 0.

Pie charts are not a useful summary in practice. Use the Graph > Pie Chart menu command to create them, if you must.

4.5 Time series plots

We will use the Lake Huron levels dataset from the website. The dataset contains annual measurements of the level, in feet, of Lake Huron from 1875 to 1972. (Source: Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York.)

- Select Graph > Time Series Plot and then the Simple plot.
- Choose level for the series.
- Select Time/Scale, choose Stamp, and enter year for the Stamp Columns.
- Click OK to close the Time/Scale dialog box, and then OK again to produce the graph.
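Returning to the two worksheet layouts in Section 4.4: the Counts of unique values option effectively tallies a raw column into the summary-table layout before drawing the bars. The tally itself can be sketched in a few lines of Python. This is only an illustration of the idea, not what MINITAB runs internally; the names raw_gender and summary_rows and the data values are hypothetical.

```python
from collections import Counter

# Hypothetical raw column: one entry per individual (the raw layout in Section 4.4).
raw_gender = ["male", "female", "male", "female", "female", "male", "female"]

# Tally how many times each unique value occurs.
counts = Counter(raw_gender)

# One (category, count) row per unique value, i.e. the summary-table layout.
summary_rows = sorted(counts.items())
for category, count in summary_rows:
    print(category, count)
```

Either layout leads to the same bar heights; the only difference is whether you or the software does the counting.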
5 Numerical summaries

To summarize our body temperatures dataset:
- Select Stat > Basic Statistics > Display Descriptive Statistics.
- A dialog box now appears.
- In Variables, select the variable you want to summarize. Either type the variable number (e.g., C1) into the box, or in the right hand panel click on C1 Body temperature (Fahrenheit) and choose Select.
- You can customize the statistics calculated (e.g., add the IQR) by clicking on the Statistics button.
- To produce the summaries click OK in the dialog.

You can also produce plots of the data by clicking on Graphs. Do not use this option! These graphs contain extra elements that are hard to interpret. Use the commands you learned earlier.

The summaries are presented in the Session window:

    Descriptive Statistics: Body temperature (Fahrenheit)

    Variable                  N    N*  Mean    SE Mean  StDev  Minimum  Q1
    Body temperature (Fahren  130  0   98.249  0.0643   0.733  96.300   97.800

    Variable                  Median  Q3      Maximum  IQR
    Body temperature (Fahren  98.300  98.700  100.800  0.900

Here are the headings (you should already know these quantities):
- N: number of observations
- N*: number of missing values
- Mean: sample mean
- StDev: sample standard deviation
- SE Mean: standard error of the mean (StDev divided by the square root of N)
- Minimum: sample minimum
- Q1: first sample quartile
- Median: sample median
- Q3: third sample quartile
- Maximum: sample maximum
- IQR: interquartile range (Q3 minus Q1)
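The headings above can be reproduced outside MINITAB as a check on your understanding. The following is a minimal Python sketch, not MINITAB's own code. The function name describe and the sample values are hypothetical, and the quartile rule used (Python's "exclusive" method, which places Q1 at position (N+1)/4 in the sorted data) is an assumption about how MINITAB computes quartiles. The final lines check SE Mean and IQR against the figures in the Session window output above.

```python
import math
import statistics


def describe(values):
    """Summarize one column; None marks a missing value (counted in N*)."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    stdev = statistics.stdev(observed)  # sample standard deviation (divides by N - 1)
    # Assumption: the "exclusive" method mirrors MINITAB's quartile rule.
    q1, median, q3 = statistics.quantiles(observed, n=4, method="exclusive")
    return {
        "N": n,
        "N*": len(values) - n,
        "Mean": statistics.mean(observed),
        "SE Mean": stdev / math.sqrt(n),
        "StDev": stdev,
        "Minimum": min(observed),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(observed),
        "IQR": q3 - q1,
    }


# Hypothetical sample, not the 130 observations used in this handout.
sample = [97.8, 98.3, None, 98.7, 96.3, 100.8, 98.2, 98.6]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")

# Check the handout's output: SE Mean = StDev / sqrt(N) and IQR = Q3 - Q1.
se_mean_check = round(0.733 / math.sqrt(130), 4)
iqr_check = round(98.700 - 97.800, 3)
print(se_mean_check, iqr_check)
```

Because StDev is itself rounded in the output, the SE Mean check agrees only to the number of decimal places that MINITAB prints.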
To produce a summary of the body temperatures by gender, enter C3 Gender in the By variables box of the Display Descriptive Statistics dialog:

    Descriptive Statistics: Body temperature (Fahrenheit)

    Variable                  Gender  N   N*  Mean    SE Mean  StDev  Minimum
    Body temperature (Fahren  female  65  0   98.394  0.0922   0.743  96.400
                              male    65  0   98.105  0.0867   0.699  96.300

    Variable                  Gender  Q1      Median  Q3      Maximum  IQR
    Body temperature (Fahren  female  98.000  98.400  98.800  100.800  0.800
                              male    97.600  98.100  98.600  99.500   1.000
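The By variables option splits the rows by the grouping column and then summarizes each group separately. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions rather than MINITAB's implementation: the function name describe_by, the group labels, and the six data values are hypothetical. The last lines check the SE Mean column of the by-gender output above (StDev divided by the square root of N, with N = 65 in each group).

```python
import math
import statistics
from collections import defaultdict


def describe_by(values, groups):
    """Return N, Mean, StDev and SE Mean for each level of the grouping variable."""
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)

    result = {}
    for group, members in sorted(grouped.items()):
        n = len(members)
        stdev = statistics.stdev(members)  # sample standard deviation
        result[group] = {
            "N": n,
            "Mean": statistics.mean(members),
            "StDev": stdev,
            "SE Mean": stdev / math.sqrt(n),
        }
    return result


# Hypothetical worksheet columns (not the handout's data).
temps = [98.4, 98.0, 98.8, 97.6, 98.1, 98.6]
gender = ["female", "female", "female", "male", "male", "male"]

by_gender = describe_by(temps, gender)
for group, stats in by_gender.items():
    print(group, stats)

# Check the SE Mean column of the handout's output for the two groups.
se_checks = [round(sd / math.sqrt(65), 4) for sd in (0.743, 0.699)]
print(se_checks)
```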