TOPIC: Descriptive Statistics

In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise 63 on page 46 of the textbook.

Exercise 63: A sample of 77 individuals working at a particular office was selected and the noise level (dBA) experienced by each individual was determined, yielding the following data:

55.3 55.3 55.3 55.9 55.9 55.9 55.9 56.1 56.1 56.1 56.1
56.1 56.1 56.8 56.8 57.0 57.0 57.0 57.8 57.8 57.8 57.9
57.9 57.9 58.8 58.8 58.8 59.8 59.8 59.8 62.2 62.2 63.8
63.8 63.8 63.9 63.9 63.9 64.7 64.7 64.7 65.1 65.1 65.1
65.3 65.3 65.3 65.3 67.4 67.4 67.4 67.4 68.7 68.7 68.7
68.7 69.0 70.4 70.4 71.2 71.2 71.2 73.0 73.0 73.1 73.1
74.6 74.6 74.6 74.6 79.3 79.3 79.3 79.3 83.0 83.0 83.0

The publisher has already provided this dataset, and all other textbook datasets, in MINITAB format. Make sure you download the data from either the Canvas page or the course webpage (http://www.auburn.edu/~carpedm/courses/stat3610).

Opening an Existing MINITAB Worksheet: To open an existing MINITAB dataset, first start a MINITAB session, then go to File, then Open Worksheet. At this point you must navigate to either an existing MINITAB worksheet (*.mtw) or project (*.mtp). The textbook data are saved as *.mtw files, so navigate to the folder where you downloaded the textbook data and scroll down until you see Ex01-63.mtw. When you click on this dataset, it should open in the worksheet/spreadsheet screen in MINITAB. Confirm that the data are open inside MINITAB before proceeding.

In the pages that follow, we show how to produce the following for the example data above:

1. Stem-and-Leaf Plot
2. Numerical Statistics
3. Boxplots
4. Histograms
5. Normal Probability Plot
1. Producing a Stem-and-Leaf Plot in MINITAB: Go to Graph, then Stem-and-Leaf. When the dialogue box appears, you should see a list of the variables in the worksheet, which in this case is the single variable C1: Noise (dba). Click your mouse in the Graph Variables box, then move the mouse over to the variable list and double-click on the variable you want to build a stem-and-leaf plot for. The variable name should then appear in the Graph Variables box. Note: for this example we use the default settings for the plot, so just click OK after you have completed the previous steps. The following stem-and-leaf plot should appear in the session window (note: it can be copied and pasted directly into any document).

Stem-and-Leaf Display: noise (dba)

Stem-and-leaf of noise (dba)  N = 77
Leaf Unit = 1.0

    7   5  5555555
   24   5  66666666777777777
   30   5  888999
   30   6
   38   6  22333333
  (10)  6  4445555555
   29   6  7777
   25   6  88889
   20   7  00111
   15   7  3333
   11   7  4444
    7   7
    7   7  9999
    3   8
    3   8  333

The default plot uses the tens place for the stems and the ones place for the leaves; the decimal place is dropped (truncated, not rounded, so 55.9 appears as stem 5, leaf 5). The first column gives the cumulative frequency of each row, counted from the smallest value down to the middle of the display and from the largest value up to the middle. The value in parentheses is the number of observations in the row that contains the median. Reporting the data like this makes it easy to examine the shape and location of the sample distribution, to locate quantiles such as the median and the first and third quartiles, and to find any gaps in the data.

Note: since the sample size, n = 77, is odd, the median has rank (n + 1)/2 = 78/2 = 39. So the median is located in the 39th position in the ordered dataset. MINITAB orders the leaves, so we know that the median is the first value
on the center stem, since the 38 smallest values all lie on the earlier stems (63.9 or less). Therefore, the median is 64.7, so $\tilde{x} = 64.7$.

2. Computing Numerical Descriptive Statistics: To find the numerical statistics using MINITAB, go to Stat, Basic Statistics, and Display Descriptive Statistics. As before, click your mouse in the Variables box first, then go over to the list of variables and double-click on the variable of interest. The variable name (and/or label) should appear in the Variables box. To choose the statistics you want computed, click on the Statistics button and check the corresponding boxes. For this example, I chose Mean, Standard Deviation, Variance, First Quartile, Median and Third Quartile. After you have checked the appropriate boxes, click OK, and then click OK again in the original dialogue box. The statistics you selected should appear in your session window. I have copied and pasted them below.

Descriptive Statistics: noise (dba)

Variable      N   N*  Mean    StDev  Variance  Q1      Median  Q3
noise (dba)   77  0   64.887  7.803  60.882    57.800  64.700  70.400

Note that the output indicates that n = 77. The value N* = 0 in MINITAB means the number of missing values is zero. The sample mean is $\bar{x} = 64.89$, the sample standard deviation is $s = 7.80$, the sample variance is $s^2 = 60.88$, the first quartile is $Q_1 = 57.8$, the median is $\tilde{x} = 64.7$ and the third quartile is $Q_3 = 70.4$.
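The stem-and-leaf display from Section 1, including its cumulative-count column, can be cross-checked outside MINITAB. The following is a minimal Python sketch, not MINITAB's own algorithm: the function names are hypothetical, and the choice of two leaf digits per row is hard-coded to match this particular display (MINITAB chooses the row increment itself).

```python
# Noise levels (dBA) from Exercise 63, stored as (value, count) pairs.
COUNTS = [
    (55.3, 3), (55.9, 4), (56.1, 6), (56.8, 2), (57.0, 3), (57.8, 3),
    (57.9, 3), (58.8, 3), (59.8, 3), (62.2, 2), (63.8, 3), (63.9, 3),
    (64.7, 3), (65.1, 3), (65.3, 4), (67.4, 4), (68.7, 4), (69.0, 1),
    (70.4, 2), (71.2, 3), (73.0, 2), (73.1, 2), (74.6, 4), (79.3, 4),
    (83.0, 3),
]
noise = sorted(v for v, k in COUNTS for _ in range(k))


def stem_and_leaf(values):
    """Return (stem, leaves) rows with two leaf digits per row (assumed)."""
    wholes = sorted(int(v) for v in values)  # drop decimals: 55.9 -> 55
    rows = []
    for row in range(wholes[0] // 2, wholes[-1] // 2 + 1):
        leaves = "".join(str(w % 10) for w in wholes if w // 2 == row)
        rows.append((row // 5, leaves))  # five rows share each tens digit
    return rows


def depth_column(rows):
    """Cumulative counts from each end; (count) marks the median's row."""
    counts = [len(leaves) for _, leaves in rows]
    n = sum(counts)
    middle = (n + 1) / 2  # rank of the median
    depths, below = [], 0
    for c in counts:
        if below < middle <= below + c:
            depths.append(f"({c})")       # row containing the median
        elif below + c < middle:
            depths.append(str(below + c))  # counted from the smallest value
        else:
            depths.append(str(n - below))  # counted from the largest value
        below += c
    return depths


rows = stem_and_leaf(noise)
for depth, (stem, leaves) in zip(depth_column(rows), rows):
    print(f"{depth:>5} {stem:>2}  {leaves}")
```

The parenthesized entry lands on the row holding rank (n + 1)/2 = 39, which is how the median is read off the display.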
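The numerical summaries in Section 2 can likewise be checked with Python's standard `statistics` module. This is a sketch under one stated assumption: that MINITAB places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 of the ordered data with linear interpolation, which is what the `"exclusive"` method of `statistics.quantiles` does.

```python
import statistics

# Noise levels (dBA) from Exercise 63, stored as (value, count) pairs.
COUNTS = [
    (55.3, 3), (55.9, 4), (56.1, 6), (56.8, 2), (57.0, 3), (57.8, 3),
    (57.9, 3), (58.8, 3), (59.8, 3), (62.2, 2), (63.8, 3), (63.9, 3),
    (64.7, 3), (65.1, 3), (65.3, 4), (67.4, 4), (68.7, 4), (69.0, 1),
    (70.4, 2), (71.2, 3), (73.0, 2), (73.1, 2), (74.6, 4), (79.3, 4),
    (83.0, 3),
]
noise = sorted(v for v, k in COUNTS for _ in range(k))

size = len(noise)
mean = statistics.mean(noise)
stdev = statistics.stdev(noise)        # sample standard deviation (divisor n - 1)
variance = statistics.variance(noise)  # sample variance (divisor n - 1)

# Quartiles by the (n + 1)p position rule (assumed to match MINITAB).
q1, median, q3 = statistics.quantiles(noise, n=4, method="exclusive")

print(f"N = {size}")
print(f"Mean = {mean:.3f}  StDev = {stdev:.3f}  Variance = {variance:.3f}")
print(f"Q1 = {q1:.3f}  Median = {median:.3f}  Q3 = {q3:.3f}")
```

With n = 77 the quartile positions are 19.5 and 58.5, so each quartile is the average of two adjacent ordered values; here both neighbours are equal, so no interpolation is visible in the result.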
3. Boxplot, a graphical representation of the sample distribution: To get a boxplot from MINITAB, go to Graph, Boxplot, choose Simple for the type of boxplot, and then click OK. This brings you to the variable selection and options dialogue box, as before. Click your mouse in the Graph Variables box, then move the mouse over to the variable list and double-click on the variable you wish to graph. To use the default settings, click OK and the following boxplot should appear.

Recall that the boxplot graphically represents the sample distribution by plotting the first quartile (bottom of the box), the median (the line between the bottom and top of the box) and the third quartile (top of the box), along with the whiskers. The whiskers extend to the most extreme observations that lie within the limits $Q_1 - 1.5(Q_3 - Q_1)$ and $Q_3 + 1.5(Q_3 - Q_1)$; observations beyond these limits are plotted individually as outliers.
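The quantities a boxplot is drawn from can be computed directly. The sketch below is an illustration in Python rather than MINITAB output; it assumes the quartiles follow the (n + 1)p position rule (the `"exclusive"` method of `statistics.quantiles`) and that whiskers stop at the most extreme observations inside the 1.5(Q3 - Q1) limits. All variable names are hypothetical.

```python
import statistics

# Noise levels (dBA) from Exercise 63, stored as (value, count) pairs.
COUNTS = [
    (55.3, 3), (55.9, 4), (56.1, 6), (56.8, 2), (57.0, 3), (57.8, 3),
    (57.9, 3), (58.8, 3), (59.8, 3), (62.2, 2), (63.8, 3), (63.9, 3),
    (64.7, 3), (65.1, 3), (65.3, 4), (67.4, 4), (68.7, 4), (69.0, 1),
    (70.4, 2), (71.2, 3), (73.0, 2), (73.1, 2), (74.6, 4), (79.3, 4),
    (83.0, 3),
]
noise = sorted(v for v, k in COUNTS for _ in range(k))

q1, median, q3 = statistics.quantiles(noise, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range, the height of the box

# Limits beyond which an observation is flagged as an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits.
inside = [v for v in noise if lower_limit <= v <= upper_limit]
lower_whisker, upper_whisker = min(inside), max(inside)
outliers = [v for v in noise if v < lower_limit or v > upper_limit]

print(f"Box: Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")
print(f"Limits: {lower_limit:.1f} to {upper_limit:.1f}")
print(f"Whiskers: {lower_whisker:.1f} to {upper_whisker:.1f}")
print(f"Outliers: {outliers}")
```

For these data every observation falls inside the limits, so the whiskers simply reach the sample minimum and maximum and no points are flagged.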
4. Histogram, another way to examine the sample distribution: To produce a histogram of the data, go to Graph, then Histogram, and choose Simple or With Fit. Simple gives you just a histogram; With Fit gives you the same histogram with a Normal curve overlaid (or whatever distribution you choose in the options). I chose With Fit. As before, click your mouse in the Graph Variables box, then move your mouse to the variable list and double-click the appropriate variable. For this example, we use the default histogram.

Notice that the histogram reveals some gaps in the data (as noted earlier), and that the distribution looks bimodal and skewed right. The box in the upper right-hand corner reports the sample size, the sample mean and the sample standard deviation (all statistics we computed previously). The bell-shaped curve is the pdf of a Normal random variable with mean and standard deviation equal to the sample mean and standard deviation (which is why it is called a "fit"). Notice that the histogram does not fit neatly beneath the curve, which indicates that the data might not have come from a Normal population.
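One way to put numbers on how poorly the histogram sits under the fitted curve is to compare observed bin counts with the counts a Normal distribution with the sample mean and standard deviation would predict. The Python sketch below does this with the standard library; the bin edges (width 5, from 55 to 85) are a hypothetical choice for illustration and are not necessarily the bins MINITAB draws by default.

```python
from statistics import NormalDist, mean, stdev

# Noise levels (dBA) from Exercise 63, stored as (value, count) pairs.
COUNTS = [
    (55.3, 3), (55.9, 4), (56.1, 6), (56.8, 2), (57.0, 3), (57.8, 3),
    (57.9, 3), (58.8, 3), (59.8, 3), (62.2, 2), (63.8, 3), (63.9, 3),
    (64.7, 3), (65.1, 3), (65.3, 4), (67.4, 4), (68.7, 4), (69.0, 1),
    (70.4, 2), (71.2, 3), (73.0, 2), (73.1, 2), (74.6, 4), (79.3, 4),
    (83.0, 3),
]
noise = sorted(v for v, k in COUNTS for _ in range(k))

# The "fit": a Normal distribution with the sample mean and sample sd.
fit = NormalDist(mu=mean(noise), sigma=stdev(noise))

EDGES = [55, 60, 65, 70, 75, 80, 85]  # hypothetical bin edges
size = len(noise)

observed_counts = []
expected_counts = []
for lo, hi in zip(EDGES, EDGES[1:]):
    # Count observations in [lo, hi) and the Normal-model prediction.
    observed_counts.append(sum(lo <= v < hi for v in noise))
    expected_counts.append(size * (fit.cdf(hi) - fit.cdf(lo)))

for (lo, hi), obs, exp in zip(zip(EDGES, EDGES[1:]), observed_counts, expected_counts):
    print(f"[{lo}, {hi}): observed {obs:2d}, expected under fit {exp:5.1f}")
```

Large gaps between the observed and expected columns, such as the pile-up in the lowest bin, are the numerical counterpart of bars that poke above or fall short of the overlaid curve.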
5. Normal Probability Plot for assessing normality: Go to Graph, Probability Plot, then choose the single plot option. As before, make sure the appropriate variable name appears in the Graph Variables box. Make sure that the distribution the sample quantiles are plotted against is the Normal distribution (it is the default) by clicking on Distribution and choosing Normal in the distribution box (note: there are several distributions to choose from). Also, to remove the default confidence intervals, click on the Data Display tab and uncheck the box that says show confidence intervals. Click OK and OK. These actions should produce the following normal probability plot.

Note: the normal probability plot is a scatter plot of the quantiles from a normally distributed population versus the sample quantiles. If the sample came from a normal population, then the points should tend to fall on a straight line. This plot indicates that the data tend to deviate from a normal population because the lower tail is much heavier (notice how the lower quantiles systematically fall below the line).
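The coordinates behind a normal probability plot can be sketched in a few lines of Python. This is an illustration, not a reproduction of MINITAB's plot: it assumes median-rank plotting positions (i - 0.3)/(n + 0.4), and it summarizes straightness with a plain Pearson correlation computed by a hypothetical helper, `correlation`.

```python
from statistics import NormalDist, mean

# Noise levels (dBA) from Exercise 63, stored as (value, count) pairs.
COUNTS = [
    (55.3, 3), (55.9, 4), (56.1, 6), (56.8, 2), (57.0, 3), (57.8, 3),
    (57.9, 3), (58.8, 3), (59.8, 3), (62.2, 2), (63.8, 3), (63.9, 3),
    (64.7, 3), (65.1, 3), (65.3, 4), (67.4, 4), (68.7, 4), (69.0, 1),
    (70.4, 2), (71.2, 3), (73.0, 2), (73.1, 2), (74.6, 4), (79.3, 4),
    (83.0, 3),
]
noise = sorted(v for v, k in COUNTS for _ in range(k))
size = len(noise)

# Plotting positions (assumed median-rank formula) and the matching
# standard Normal quantiles for each ordered observation.
positions = [(i - 0.3) / (size + 0.4) for i in range(1, size + 1)]
normal_scores = [NormalDist().inv_cdf(p) for p in positions]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Each (normal score, observation) pair is one point on the plot.
points = list(zip(normal_scores, noise))
r = correlation(normal_scores, noise)
print(f"{len(points)} points, correlation with a straight line: {r:.4f}")
```

A correlation close to 1 means the points hug a straight line; the further it falls below 1, the stronger the visual departure from normality tends to be. Plotting `points` with any charting tool shows where along the line the departure occurs.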