Averages and Variation


Chapter 3: Averages and Variation
Copyright Cengage Learning. All rights reserved.

Section 3.1 Measures of Central Tendency: Mode, Median, and Mean

Focus Points
Compute mean, median, and mode from raw data.
Interpret what mean, median, and mode tell you.
Explain how mean, median, and mode can be affected by extreme data values.
Compute a weighted average.

Arithmetic Mean
The arithmetic mean (mean) is the measure of center obtained by adding the values and dividing the total by the number of values. It is what most people call an average. Copyright 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.

Notation
$\Sigma$ denotes the sum of a set of values.
$x$ is the variable usually used to represent the individual data values.
$n$ represents the number of data values in a sample.
$N$ represents the number of data values in a population.

$\bar{x}$ is pronounced "x-bar" and denotes the mean of a set of sample values: $\bar{x} = \frac{\sum x}{n}$
$\mu$ is pronounced "mu" and denotes the mean of all values in a population: $\mu = \frac{\sum x}{N}$
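The two formulas differ only in notation ($n$ for a sample, $N$ for a population). The arithmetic is the same. Here is a minimal Python sketch. `mean` is our own hypothetical helper, not something from the text, and the data are the chocolate-chip counts used later in this section.

```python
def mean(values):
    """Sum the values and divide by how many there are.

    Gives x-bar when `values` is a sample (divide by n) and
    mu when `values` is an entire population (divide by N).
    """
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

chips = [22, 22, 26, 24]   # sample of chocolate-chip counts
print(mean(chips))         # 23.5
```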

Mean
Advantages: It is relatively reliable, because means of samples drawn from the same population don't vary as much as other measures of center. It also takes every data value into account.
Disadvantage: It is sensitive to every data value. One extreme value can affect it dramatically, so it is not a resistant measure of center.

Trimmed Mean (optional)
A measure of center that is more resistant than the mean but still sensitive to specific data values is the trimmed mean. A trimmed mean is the mean of the data values left after trimming a specified percentage of the smallest and largest data values from the data set.

Usually a 5% trimmed mean is used. This means we trim the lowest 5% of the data as well as the highest 5% of the data. A similar procedure is used for a 10% trimmed mean.
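Below is a minimal Python sketch of a trimmed mean. The counting rule is our own assumption: we take the stated percentage of $n$, round it to the nearest whole number (halves round up), and remove that many values from each end. Other texts may round differently. `trimmed_mean` and the data are hypothetical.

```python
import math

def trimmed_mean(values, percent):
    """Mean of the values left after trimming `percent`% from EACH end.

    Assumption: the number trimmed from each end is percent/100 * n,
    rounded to the nearest whole number (halves round up).
    """
    data = sorted(values)
    n = len(data)
    k = math.floor(percent / 100 * n + 0.5)   # count removed from each end
    kept = data[k:n - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
print(sum(data) / len(data))    # 14.5  ordinary mean, pulled up by 100
print(trimmed_mean(data, 10))   # 5.5   10% trimmed mean
```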

Median
The median is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. It is often denoted by $\tilde{x}$ (pronounced "x-tilde"). It is not affected by an extreme value, so it is a resistant measure of center.

Finding the Median
First sort the values (arrange them in order). Then follow one of these rules:
1. If the number of data values is odd, the median is the number located in the exact middle of the list.
2. If the number of data values is even, the median is the mean of the two middle numbers.

Example (even number of values): 5.40 1.10 0.42 0.73 0.48 1.10
In order: 0.42 0.48 0.73 1.10 1.10 5.40. No single value sits in the exact middle; the middle is shared by two numbers, so
median $= \frac{0.73 + 1.10}{2} = 0.915$
Example (odd number of values): 5.40 1.10 0.42 0.73 0.48 1.10 0.66
In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40. The exact middle value gives median $= 0.73$.
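The two rules can be applied to both example lists with a short Python sketch. `median` is our own helper; Python's standard `statistics.median` follows the same odd/even rule. The last line prints the mean of the six-value list to show how far the single extreme value 5.40 pulls the mean away from the median.

```python
def median(values):
    """Return the median of a list of numbers."""
    data = sorted(values)          # arrange the values in order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd n: the exact middle value
    return (data[mid - 1] + data[mid]) / 2    # even n: mean of the two middle values

even = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd = even + [0.66]

print(round(median(even), 3))           # 0.915
print(median(odd))                      # 0.73
print(round(sum(even) / len(even), 3))  # 1.538  (the mean, pulled up by 5.40)
```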

Mode
The mode is the value that occurs with the greatest frequency. A data set can have one mode, more than one mode, or no mode.
Bimodal: two data values occur with the same greatest frequency.
Multimodal: more than two data values occur with the same greatest frequency.
No mode: no data value is repeated.
The mode is the only measure of central tendency that can be used with nominal data.

Mode: Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10: the mode is 1.10.
b. 27 27 27 55 55 55 88 88 99: bimodal, 27 and 55.
c. 1 2 3 6 7 8 9 10: no mode.
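The following Python sketch reports every mode, or none, following the definitions above. `modes` is a hypothetical helper. Python's `statistics.multimode` returns every value when nothing repeats, so this sketch special-cases that situation to match the "no mode" convention used here.

```python
from collections import Counter

def modes(values):
    """All values tied for the greatest frequency, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []          # nothing repeats: "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))   # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # [27, 55]
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # []
print(modes(["red", "blue", "red"]))                 # ['red']  (nominal data)
```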

Definition
The midrange is the value midway between the maximum and minimum values in the original data set:
$\text{Midrange} = \frac{\text{maximum value} + \text{minimum value}}{2}$
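The midrange formula takes one line of Python. This sketch applies it to the data from the median example. It shows how the single extreme value 5.40 drags the midrange (2.91) far above the median (0.915). `midrange` is our own helper name.

```python
def midrange(values):
    """(maximum value + minimum value) / 2"""
    return (max(values) + min(values)) / 2

data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
print(round(midrange(data), 2))   # 2.91
```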

Midrange
The midrange is sensitive to extremes because it uses only the maximum and minimum values, so it is rarely used.
Redeeming features: (1) it is very easy to compute; (2) it reinforces that there are several ways to define the center; (3) it avoids confusion with the median.

Round-off Rule for Measures of Center
Carry one more decimal place than is present in the original set of values.

Critical Thinking
Think about whether the results are reasonable. Think about the method used to collect the sample data.

Weighted Average

Sometimes we wish to average numbers, but we want to assign more importance, or weight, to some of the numbers. For instance, suppose your professor tells you that your grade will be based on a midterm and a final exam, each of which is based on 100 possible points. However, the final exam will be worth 60% of the grade and the midterm only 40%. How could you determine an average score that would reflect these different weights?

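The average you need is the weighted average:
$\text{Weighted average} = \frac{\sum xw}{\sum w}$
Here $x$ is a data value and $w$ is the weight assigned to it. $\sum xw$ is the sum of all the products $xw$, and $\sum w$ is the sum of all the weights.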

Example: Weighted Average
Suppose your midterm test score is 83 and your final exam score is 95. Using weights of 40% for the midterm and 60% for the final exam, compute the weighted average of your scores. If the minimum average for an A is 90, will you earn an A?
Solution: By the formula, we multiply each score by its weight and add the results together.

Then we divide by the sum of all the weights. Converting the percentages to decimal notation, we get
$\text{Weighted average} = \frac{\sum xw}{\sum w} = \frac{83(0.40) + 95(0.60)}{0.40 + 0.60} = \frac{33.2 + 57}{1} = 90.2$
Your average is high enough to earn an A.
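The calculation above can be expressed as a minimal Python sketch. `weighted_average` is our own hypothetical helper. It divides by the sum of the weights, so the weights do not have to add up to 1.

```python
def weighted_average(values, weights):
    """Sum of x*w divided by the sum of the weights w."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

scores = [83, 95]        # midterm, final exam
weights = [0.40, 0.60]   # 40% and 60%
avg = weighted_average(scores, weights)
print(round(avg, 1), avg >= 90)   # 90.2 True
```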

Example 2: Weighted Mean
In her first semester of college, a student of the author took five courses. Her final grades along with the number of credits for each course were A (3 credits), A (4 credits), B (3 credits), C (3 credits), and F (1 credit). The grading system assigns quality points to letter grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0. Compute her grade point average.
Solution: Use the numbers of credits as the weights: w = 3, 4, 3, 3, 1. Replace the letter grades of A, A, B, C, and F with the corresponding quality points: x = 4, 4, 3, 2, 0.

Solution (cont'd):
$\bar{x} = \frac{\sum xw}{\sum w} = \frac{3(4) + 4(4) + 3(3) + 3(2) + 1(0)}{3 + 4 + 3 + 3 + 1} = \frac{43}{14} \approx 3.07$
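The GPA computation can be sketched in Python, using the quality-point table from the example as a dictionary. `gpa` and `QUALITY_POINTS` are our own illustrative names.

```python
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def gpa(grades, credits):
    """Weighted mean of quality points, using credits as the weights."""
    points = [QUALITY_POINTS[g] for g in grades]
    return sum(x * w for x, w in zip(points, credits)) / sum(credits)

grades = ["A", "A", "B", "C", "F"]
credits = [3, 4, 3, 3, 1]
print(round(gpa(grades, credits), 2))   # 3.07  (= 43/14)
```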

Mean from a Frequency Distribution
Assume that all sample values in each class are equal to the class midpoint.

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$
Use the class midpoint of each class for the variable $x$ and the class frequency for $f$.

Example: Estimate the mean from the IQ scores in Chapter 2.
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f} = \frac{7201.0}{78} \approx 92.3$
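Here is a minimal Python sketch of the class-midpoint estimate. The Chapter 2 IQ frequency table is not reproduced here, so the three-class table below is hypothetical. `mean_from_frequency_table` is our own helper name.

```python
def mean_from_frequency_table(classes):
    """Estimate x-bar = sum(f * x) / sum(f), where x is each class midpoint.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # stands in for every value in the class
        total_fx += f * midpoint
        total_f += f
    return total_fx / total_f

# Hypothetical frequency table (not the Chapter 2 IQ data)
table = [(0, 9, 2), (10, 19, 3), (20, 29, 5)]
print(mean_from_frequency_table(table))   # 17.5

# The final division in the IQ example, using the totals given above
print(round(7201.0 / 78, 1))              # 92.3
```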

Best Measure of Center

Skewed and Symmetric
Symmetric: a distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
Skewed: a distribution of data is skewed if it is not symmetric and extends more to one side than the other.

Skewed Left or Right
Skewed to the left (also called negatively skewed): the distribution has a longer left tail, and the mean and median are to the left of the mode.
Skewed to the right (also called positively skewed): the distribution has a longer right tail, and the mean and median are to the right of the mode.

Shape of the Distribution
The mean and median cannot always be used to identify the shape of the distribution.

Skewness

Section 3.2 Measures of Variation

Focus Points
Find the range, variance, and standard deviation.
Compute the coefficient of variation from raw data.
Why is the coefficient of variation important?

Definition
The range of a set of data values is the difference between the maximum data value and the minimum data value:
$\text{Range} = (\text{maximum value}) - (\text{minimum value})$
Example: the range of {1, 3, 14} is 14 - 1 = 13.
The range is very sensitive to extreme values. For that reason it is not as useful as other measures of variation.

Round-Off Rule for Measures of Variation
When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data. Round only the final answer, not values in the middle of a calculation.

Definition
The standard deviation of a set of sample values, denoted by $s$, is a measure of variation of values about the mean.

Sample Standard Deviation Formula
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Sample Standard Deviation (Shortcut Formula)
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$

Example
Use either formula to find the standard deviation of this sample of chocolate-chip counts: 22, 22, 26, 24.

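Example (cont'd)
$\bar{x} = \frac{\sum x}{n} = \frac{22 + 22 + 26 + 24}{4} = 23.5$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(22 - 23.5)^2 + (22 - 23.5)^2 + (26 - 23.5)^2 + (24 - 23.5)^2}{4 - 1}} = \sqrt{\frac{11}{3}} \approx 1.9149$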
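The sketch below writes both standard deviation formulas in Python and checks them on the chocolate-chip sample. The function names are our own. In exact arithmetic the two forms agree. With floating-point numbers the shortcut form can lose precision when the values are large and close together, which is one reason software generally prefers the definition form.

```python
import math

def sample_sd_definition(values):
    """s = sqrt( sum((x - x_bar)^2) / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sample_sd_shortcut(values):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

chips = [22, 22, 26, 24]
print(round(sample_sd_definition(chips), 4))   # 1.9149
print(round(sample_sd_shortcut(chips), 4))     # 1.9149
```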

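Another Example: Publix checkout waiting times in minutes
Dataset: {1, 4, 10}. Find the sample mean and sample standard deviation.
Sample mean ($n = 3$): $\bar{x} = \frac{\sum x}{n} = \frac{15}{3} = 5.0$ min
Definition formula: the deviations $x - \bar{x}$ are $1 - 5 = -4$, $4 - 5 = -1$, and $10 - 5 = 5$, so $\sum (x - \bar{x})^2 = 16 + 1 + 25 = 42$ and
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{42}{3 - 1}} = \sqrt{21} \approx 4.6$ min
Shortcut formula: $\sum x = 15$ and $\sum x^2 = 1 + 16 + 100 = 117$, so
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}} = \sqrt{\frac{3(117) - 15^2}{3(3 - 1)}} = \sqrt{\frac{351 - 225}{6}} = \sqrt{\frac{126}{6}} = \sqrt{21} \approx 4.6$ min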

Standard Deviation: Important Properties
The standard deviation is a measure of variation of all values from the mean.
The value of the standard deviation $s$ is usually positive. It is zero only when all of the data values are the same, and it is never negative.
The value of the standard deviation $s$ can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
The units of the standard deviation $s$ are the same as the units of the original data values.

Comparing Variation in Different Samples
It's good practice to compare two sample standard deviations only when the sample means are approximately the same. When comparing variation in samples with very different means, it is better to use the coefficient of variation, which is defined later in this section.

Population Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
This formula is similar to the previous formula, but it uses the population mean $\mu$ and the population size $N$ instead.
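Python's standard `statistics` module provides both versions: `stdev` divides by $n - 1$, and `pstdev` divides by $N$. In the sketch below, the Publix waiting times are treated as a whole population only to contrast the two results. In the example they are a sample.

```python
import statistics

# Publix waiting times (minutes). Treating these three values as a whole
# population is only for comparison; in the example they are a sample.
waits = [1, 4, 10]

s = statistics.stdev(waits)       # sample standard deviation: divides by n - 1
sigma = statistics.pstdev(waits)  # population standard deviation: divides by N

print(round(s, 1))      # 4.6   sqrt(42 / 2) = sqrt(21)
print(round(sigma, 2))  # 3.74  sqrt(42 / 3) = sqrt(14)
```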

Variance
The variance of a set of values is a measure of variation equal to the square of the standard deviation.
Sample variance: $s^2$, the square of the sample standard deviation $s$.
Population variance: $\sigma^2$, the square of the population standard deviation $\sigma$.

Unbiased Estimator
The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$. This means that values of $s^2$ tend to target the value of $\sigma^2$ instead of systematically overestimating or underestimating $\sigma^2$.
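Unbiasedness can be checked exactly on a tiny example. The sketch below treats {1, 4, 10} as a hypothetical population, lists every ordered sample of size 3 drawn with replacement, and averages the sample variances using exact fractions. Dividing by $n - 1$ gives exactly $\sigma^2$ on average. Dividing by $n$ underestimates it by the factor $(n-1)/n$. The choice of population and sample size is ours, for illustration only.

```python
from fractions import Fraction
from itertools import product

population = [1, 4, 10]   # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def average_variance(n, divisor):
    """Average of sum((x - x_bar)^2) / divisor(n) over every ordered
    sample of size n drawn WITH replacement from the population."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        x_bar = Fraction(sum(sample), n)
        ss = sum((x - x_bar) ** 2 for x in sample)
        total += ss / divisor(n)
    return total / len(samples)

n = 3
unbiased = average_variance(n, lambda k: k - 1)   # s^2: divide by n - 1
biased = average_variance(n, lambda k: k)         # divide by n instead

print(sigma2, unbiased, biased)   # 14 14 28/3
```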

Variance: Notation
$s$ = sample standard deviation
$s^2$ = sample variance
$\sigma$ = population standard deviation
$\sigma^2$ = population variance

Properties of the Standard Deviation
Measures the variation among data values.
Values close together have a small standard deviation, but values with much more variation have a larger standard deviation.
Has the same units of measurement as the original data.

For many data sets, a value is unusual if it differs from the mean by more than two standard deviations.
Compare standard deviations of two different data sets only if they use the same scale and units and they have means that are approximately the same.
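Here is a minimal Python sketch of the "more than two standard deviations from the mean" rule of thumb. The data and the helper `unusual_values` are hypothetical. An outlier also inflates $s$ itself, so this rule is a rough screen rather than a formal test.

```python
import statistics

def unusual_values(values, k=2):
    """Values farther than k sample standard deviations from the sample mean."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs(x - x_bar) > k * s]

data = [10, 11, 12, 11, 10, 12, 11, 30]   # hypothetical data
print(unusual_values(data))                # [30]
```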

Coefficient of Variation
The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean.
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$

Example: How can we compare the variability in heights and weights of men?
Sample: 40 males were randomly selected. The summary statistics are given below.

            Sample mean    Sample standard deviation
  Height    68.34 in       3.02 in
  Weight    172.55 lb      26.33 lb

Solution: Use the CV to compare the variability.
Heights: $CV = \frac{s}{\bar{x}} \cdot 100\% = \frac{3.02}{68.34} \cdot 100\% \approx 4.42\%$
Weights: $CV = \frac{s}{\bar{x}} \cdot 100\% = \frac{26.33}{172.55} \cdot 100\% \approx 15.26\%$
Conclusion: Heights (CV = 4.42%) have considerably less variation than weights (CV = 15.26%).
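The two CV calculations can be reproduced in Python from the summary statistics above. `coefficient_of_variation` is our own helper name. Because the CV divides out the units, inches and pounds can be compared directly.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100%, for nonnegative data."""
    return sd / mean * 100

cv_height = coefficient_of_variation(3.02, 68.34)
cv_weight = coefficient_of_variation(26.33, 172.55)
print(f"{cv_height:.2f}% {cv_weight:.2f}%")   # 4.42% 15.26%
```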

Section 3.3 Percentiles and Box-and-Whisker Plots

Focus Points
Interpret the meaning of percentile scores.
Compute the median, quartiles, and five-number summary from raw data.
Make a box-and-whisker plot. Interpret the results.
Describe how a box-and-whisker plot indicates spread of data about the median.

Percentiles and Box-and-Whisker Plots
We've seen measures of central tendency and spread for a set of data. The arithmetic mean $\bar{x}$ and the standard deviation $s$ will be very useful in later work. However, because they each use every data value, they can be heavily influenced by one or two extreme data values. When our data distributions are heavily skewed or even bimodal, we often get a better summary of the distribution by using the relative position of data rather than exact values.

We know that the median is an average computed by using the relative position of the data. If we are told that 81 is the median score on a biology test, we know that after the data have been ordered, 50% of the data fall at or below the median value of 81. The median is an example of a percentile; in fact, it is the 50th percentile. The general definition of the Pth percentile follows.

In Figure 3-3, we see the 60th percentile marked on a histogram. Here 60% of the data lie below the mark and 40% lie above it.
(Figure 3-3: A Histogram with the 60th Percentile Shown)

There are 99 percentiles, and in an ideal situation, the 99 percentiles divide the data set into 100 equal parts (see Figure 3-4). However, if the number of data elements is not exactly divisible by 100, the percentiles will not divide the data into equal parts.
(Figure 3-4: Percentiles)

There are several widely used conventions for finding percentiles. They lead to slightly different values in different situations, but these values are close together. For all conventions, the data are first ranked or ordered from smallest to largest. A natural way to find the Pth percentile is then to find a value such that P% of the data fall at or below it. This will not always be possible, so we take the nearest value satisfying the criterion. It is at this point that the conventions differ in how they determine the exact value of the percentile.
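Below is a Python sketch of one such convention, usually called the nearest-rank method. It returns the smallest data value that has at least P% of the data at or below it. It is only one of several conventions. Others interpolate between data values and can give slightly different answers. The helper name and the data are hypothetical.

```python
def percentile_nearest_rank(values, p):
    """Smallest data value with at least p% of the data at or below it
    (the nearest-rank convention; other conventions interpolate)."""
    if not 1 <= p <= 99:
        raise ValueError("p should be a whole number from 1 to 99")
    data = sorted(values)
    if not data:
        raise ValueError("no data")
    n = len(data)
    rank = -(-p * n // 100)   # ceil(p * n / 100) using integer arithmetic
    return data[rank - 1]     # ranks are 1-based

scores = list(range(1, 21))   # hypothetical data: 1, 2, ..., 20
print(percentile_nearest_rank(scores, 60))   # 12  (12 of 20 values, 60%, are <= 12)
```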

We will not be very concerned about exact procedures for evaluating percentiles in general. However, quartiles are special percentiles used so frequently that we want to adopt a specific procedure for their computation. Quartiles are those percentiles that divide the data into fourths.

The first quartile $Q_1$ is the 25th percentile, the second quartile $Q_2$ is the median, and the third quartile $Q_3$ is the 75th percentile (see Figure 3-5).
(Figure 3-5: Quartiles)
Again, several conventions are used for computing quartiles, but the following convention uses the median and is widely adopted.

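Procedure: How to Compute Quartiles
1. Order the data from smallest to largest.
2. Find the median. This is the second quartile, $Q_2$.
3. The first quartile $Q_1$ is the median of the lower half of the data, that is, of the data values falling below the $Q_2$ position.
4. The third quartile $Q_3$ is the median of the upper half of the data, that is, of the data values falling above the $Q_2$ position.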

In short, all we do to find the quartiles is find three medians. The median, or second quartile, is a popular measure of center that uses relative position. A useful measure of data spread that uses relative position is the interquartile range (IQR). It is simply the difference between the third and first quartiles:
$\text{Interquartile range} = Q_3 - Q_1$
The interquartile range tells us the spread of the middle half of the data. Now let's look at an example to see how to compute all of these quantities.
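The "three medians" method can be written as a short Python sketch. When $n$ is odd, the middle value is left out of both halves. This matches the ice cream example that follows, where 13 values lie below the median position and 13 lie above it. The helper names and data are hypothetical. Library functions such as `statistics.quantiles` use their own conventions and may not match this method exactly.

```python
def median(sorted_data):
    """Median of data that is already sorted."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 found as three medians."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2 :]    # values above the median position
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9])   # hypothetical data
print(q1, q2, q3, "IQR =", q3 - q1)                   # 2.5 5 7.5 IQR = 5.0
```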

Example: Quartiles
In a hurry? On the run? Hungry as well? How about an ice cream bar as a snack? Ice cream bars are popular among all age groups. Consumer Reports did a study of ice cream bars. Twenty-seven bars with taste ratings of at least fair were listed, and cost per bar was included in the report. Just how much does an ice cream bar cost? The data, expressed in dollars, appear in Table 3-4.
(Table 3-4: Cost of Ice Cream Bars, in dollars)

As you can see, the cost varies quite a bit, partly because the bars are not of uniform size.
(a) Find the quartiles.
Solution: We first order the data from smallest to largest. Table 3-5 shows the data in order.
(Table 3-5: Ordered Cost of Ice Cream Bars, in dollars)

Next, we find the median. Since there are 27 data values, an odd number, the median is simply the center, or 14th, value. This value is shown boxed in Table 3-5.
Median = $Q_2$ = 0.50
There are 13 values below the median position, and $Q_1$ is the median of these values.

It is the middle, or seventh, value and is shaded in Table 3-5.
First quartile = $Q_1$ = 0.33
There are also 13 values above the median position. The median of these is the seventh value from the right end. This value is also shaded in Table 3-5.
Third quartile = $Q_3$ = 1.00

(b) Find the interquartile range.
Solution: $\text{IQR} = Q_3 - Q_1 = 1.00 - 0.33 = 0.67$
This means that the middle half of the data has a cost spread of 67 cents ($0.67).

Box-and-Whisker Plots

The quartiles, together with the low and high data values, give us a very useful five-number summary of the data and their spread. We will use these five numbers to create a graphic sketch of the data called a box-and-whisker plot. Box-and-whisker plots provide another useful technique from exploratory data analysis (EDA) for describing data.

(Figure 3-6: Box-and-Whisker Plot)
The next example demonstrates the process of making a box-and-whisker plot.

Example: Box-and-Whisker Plot
Make a box-and-whisker plot showing the calories in vanilla-flavored ice cream bars. Use the plot to make observations about the distribution of calories.
(a) We ordered the data (see Table 3-7) and found the values of the median, $Q_1$, and $Q_3$.
(Table 3-7: Ordered Data)

From this previous work we have the following five-number summary:
low value = 111; $Q_1$ = 182; median = 221.5; $Q_3$ = 319; high value = 439

(b) We select an appropriate vertical scale and make the plot (Figure 3-7).
(Figure 3-7: Box-and-Whisker Plot for Calories in Vanilla-Flavored Ice Cream Bars)

(c) Interpretation
A quick glance at the box-and-whisker plot reveals the following:
(i) The box tells us where the middle half of the data lies, so we see that half of the ice cream bars have between 182 and 319 calories, with an interquartile range of 137 calories.
(ii) The median is slightly closer to the lower part of the box. This means that the lower calorie counts are more concentrated. The calorie counts above the median are more spread out, indicating that the distribution is slightly skewed toward the higher values.

(iii) The upper whisker is longer than the lower, which again emphasizes skewness toward the higher values.
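The interpretation in (i) to (iii) rests on a few differences between the five numbers. This Python sketch computes them for the vanilla ice cream bar summary. `box_plot_features` is our own hypothetical helper. It reproduces the IQR of 137 and the observations that the upper half of the box and the upper whisker are the longer ones.

```python
def box_plot_features(low, q1, med, q3, high):
    """Lengths read off a box-and-whisker plot from a five-number summary."""
    return {
        "iqr": q3 - q1,              # width of the box: the middle half of the data
        "lower_whisker": q1 - low,   # from Q1 down to the low value
        "upper_whisker": high - q3,  # from Q3 up to the high value
        "median_to_q1": med - q1,    # lower part of the box
        "median_to_q3": q3 - med,    # upper part of the box
    }

f = box_plot_features(111, 182, 221.5, 319, 439)
print(f)
# {'iqr': 137, 'lower_whisker': 71, 'upper_whisker': 120,
#  'median_to_q1': 39.5, 'median_to_q3': 97.5}
print(f["upper_whisker"] > f["lower_whisker"] and f["median_to_q3"] > f["median_to_q1"])  # True
```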