B.2 Measures of Central Tendency and Dispersion


Appendix B.2 Measures of Central Tendency and Dispersion

What you should learn
• Find and interpret the mean, median, and mode of a set of data.
• Determine the measure of central tendency that best represents a set of data.
• Find the standard deviation of a set of data.
• Create and use box-and-whisker plots.

Why you should learn it
Measures of central tendency and dispersion provide a convenient way to describe and compare sets of data. For instance, in Exercise 36, the mean and standard deviation are used to analyze the price of gold over a period of years from the 1980s to the 2000s.

Mean, Median, and Mode
In many real-life situations, it is helpful to describe data by a single number that is most representative of the entire collection of numbers. Such a number is called a measure of central tendency. The most commonly used measures are as follows.
1. The mean, or average, of n numbers is the sum of the numbers divided by n.
2. The numerical median of n numbers is the middle number when the numbers are written in order. If n is even, the median is the average of the two middle numbers.
3. The mode of n numbers is the number that occurs most frequently. If two numbers tie for most frequent occurrence, the collection has two modes and is called bimodal.

Example 1 Comparing Measures of Central Tendency
On an interview for a job, the interviewer tells you that the average annual income of the company's employees is $60,849. The actual annual incomes of the 25 employees are shown below. What are the mean, median, and mode of the incomes?
$17,305, $478,320, $45,678, $18,980, $17,408, $25,676, $28,906, $12,500, $24,540, $33,450, $12,500, $33,855, $37,450, $20,432, $28,956, $34,983, $36,540, $250,921, $36,853, $16,430, $32,654, $98,213, $48,980, $94,024, $35,671
The mean of the incomes is
\[ \text{Mean} = \frac{17{,}305 + 478{,}320 + 45{,}678 + 18{,}980 + \cdots + 35{,}671}{25} = \frac{1{,}521{,}225}{25} = \$60{,}849. \]
To find the median, order the incomes as follows.
$12,500, $12,500, $16,430, $17,305, $17,408, $18,980, $20,432, $24,540, $25,676, $28,906, $28,956, $32,654, $33,450, $33,855, $34,983, $35,671, $36,540, $36,853, $37,450, $45,678, $48,980, $94,024, $98,213, $250,921, $478,320
From this list, you can see that the median income (the 13th of the 25 values) is $33,450. You can also see that $12,500 is the only income that occurs more than once. So, the mode is $12,500.
Now try Exercise [illegible].

In Example 1, was the interviewer telling you the truth about the annual incomes? Technically, the person was telling the truth because the average is (generally) defined to be the mean. However, of the three measures of central tendency (mean: $60,849; median: $33,450; mode: $12,500), it seems clear that the median is most representative. The mean is inflated by the two highest salaries.
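As a quick check of Example 1, the following Python sketch (an illustration, not part of the original text) recomputes the three measures with the standard-library statistics module, assuming the 25 incomes listed in the example. Because median sorts its input internally, the incomes can be entered in any order.

```python
from statistics import mean, median, mode

# Annual incomes of the 25 employees in Example 1 (unordered)
incomes = [
    17_305, 478_320, 45_678, 18_980, 17_408,
    25_676, 28_906, 12_500, 24_540, 33_450,
    12_500, 33_855, 37_450, 20_432, 28_956,
    34_983, 36_540, 250_921, 36_853, 16_430,
    32_654, 98_213, 48_980, 94_024, 35_671,
]

income_mean = mean(incomes)      # sum of the incomes divided by n = 25
income_median = median(incomes)  # middle value after sorting (n is odd, so the 13th value)
income_mode = mode(incomes)      # most frequently occurring value

print(income_mean, income_median, income_mode)
```

The gap between the mean and the median reflects the two very large incomes, which pull the mean upward but do not move the median.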

Appendix B Concepts in Statistics

Choosing a Measure of Central Tendency
Which of the three measures of central tendency is the most representative? The answer is that it depends on the distribution of the data and the way in which you plan to use the data. For instance, in Example 1, the mean salary of $60,849 does not seem very representative to a potential employee. To a city income tax collector who wants to estimate a fixed percentage of the total income of the employees, however, the mean is precisely the right measure, because the total income equals the mean times the number of employees.

Example 2 Choosing a Measure of Central Tendency
Which measure of central tendency is the most representative of the data shown in each frequency distribution?
a. Number  1   2   3   4   5   6   7   8   9
   Tally   7  20  15  11   8   3   2   0  15
b. Number  1   2   3   4   5   6   7   8   9
   Tally   9   8   7   6   5   6   7   8   9
c. Number  1   2   3   4   5   6   7   8   9
   Tally   [illegible]
a. For these data, the mean is 4.23, the median is 3, and the mode is 2. Of these, the mode is probably the most representative measure.
b. For these data, the mean and median are each 5 and the modes are 1 and 9 (the distribution is bimodal). Of these, the mean or median is the most representative measure.
c. For these data, the mean is [illegible], the median is [illegible], and the mode is [illegible]. Of these, the mean or median is the most representative measure.
Now try Exercise [illegible].

Variance and Standard Deviation
Very different sets of numbers can have the same mean. You will now study two measures of dispersion, which give you an idea of how much the numbers in a data set differ from the mean of the set. These two measures are called the variance of the set and the standard deviation of the set.

Definitions of Variance and Standard Deviation
Consider a set of numbers \(x_1, x_2, \ldots, x_n\) with a mean of \(\bar{x}\). The variance of the set is
\[ v = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n} \]
and the standard deviation of the set is
\[ \sigma = \sqrt{v} \]
(σ is the lowercase Greek letter sigma).
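The definitions translate directly into code. This minimal Python sketch uses hypothetical function names (variance, std_dev) and, like the definition, divides by n. Note that Python's statistics.variance divides by n − 1 (the sample variance), whereas statistics.pvariance and statistics.pstdev divide by n and so match the definition here.

```python
import math

def variance(xs):
    """Variance v: the mean of the squared deviations from the mean (divide by n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / n

def std_dev(xs):
    """Standard deviation sigma = sqrt(v)."""
    return math.sqrt(variance(xs))

print(variance([3, 3, 7, 7]), std_dev([3, 3, 7, 7]))  # 4.0 2.0
```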

The standard deviation of a data set is a measure of how much a typical number in the set differs from the mean. The greater the standard deviation, the more the numbers in the set vary from the mean. For instance, each of the following data sets has a mean of 5.
{5, 5, 5, 5}, {4, 4, 6, 6}, and {3, 3, 7, 7}
The standard deviations of the data sets are 0, 1, and 2, respectively.

Example 3 Estimations of Standard Deviation
Consider the three frequency distributions represented by the bar graphs in Figure B.4. Which data set has the smallest standard deviation? Which has the largest?
[Figure B.4: bar graphs of Data Set A, Data Set B, and Data Set C; horizontal axis: Number]
Of the three data sets, the numbers in data set A are grouped most closely to the center and the numbers in data set C are the most dispersed. So, data set A has the smallest standard deviation and data set C has the largest standard deviation.
Now try Exercise 17.
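The three small data sets at the start of this discussion can be checked with statistics.pstdev from Python's standard library, which computes the population standard deviation (division by n) used in this appendix. A short sketch:

```python
from statistics import mean, pstdev

# The three data sets from the text; each has a mean of 5.
data_sets = [[5, 5, 5, 5], [4, 4, 6, 6], [3, 3, 7, 7]]

# pstdev divides by n, matching the definition of sigma in this appendix.
sigmas = [pstdev(xs) for xs in data_sets]
for xs, s in zip(data_sets, sigmas):
    print(xs, "mean =", mean(xs), "sigma =", s)
```

The means are identical, yet the standard deviations grow from 0 to 2 as the values move farther from 5, which is exactly the distinction the variance and standard deviation are meant to capture.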

Example 4 Finding Standard Deviation
Find the standard deviation of each data set shown in Example 3.
Because of the symmetry of each bar graph, you can conclude that each has a mean of \(\bar{x} = 4\). The standard deviations of data sets A, B, and C are [illegible], [illegible], and [illegible], respectively. These values confirm the results of Example 3. That is, data set A has the smallest standard deviation and data set C has the largest.
Now try Exercise 19.

The following alternative formula provides a more efficient way to compute the standard deviation.

Alternative Formula for Standard Deviation
The standard deviation of \(x_1, x_2, \ldots, x_n\) is
\[ \sigma = \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \bar{x}^2}. \]

Because of lengthy computations, this formula is difficult to verify. Conceptually, however, the process is straightforward. It consists of showing that the expressions
\[ \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}} \quad\text{and}\quad \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \bar{x}^2} \]
are equivalent. Try verifying this equivalence for the set \(\{x_1, x_2, x_3\}\) with \(\bar{x} = \frac{x_1 + x_2 + x_3}{3}\).
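Alongside the algebraic verification, the equivalence can be spot-checked numerically. This Python sketch (function names are hypothetical) implements both expressions, applies them to the data set used in Example 5, and compares them on many random three-element sets {x1, x2, x3}. A numerical check is evidence, not a proof.

```python
import math
import random

def sigma_definition(xs):
    """sqrt of the mean squared deviation from the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)

def sigma_alternative(xs):
    """Alternative formula: sqrt((x1^2 + ... + xn^2)/n - x_bar^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    # max(0.0, ...) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, sum(x * x for x in xs) / n - x_bar ** 2))

# Data set used in Example 5
example5 = [5, 6, 6, 7, 7, 8, 8, 8, 9, 10]
print(sigma_definition(example5), sigma_alternative(example5))

# Compare the two expressions on random three-element sets.
rng = random.Random(0)
for _ in range(1000):
    xs = [rng.uniform(-100, 100) for _ in range(3)]
    assert math.isclose(sigma_definition(xs), sigma_alternative(xs),
                        rel_tol=1e-6, abs_tol=1e-6)
```

In floating-point arithmetic the alternative formula subtracts two nearly equal quantities when the values are large relative to their spread, so it can lose precision; for hand computation with small numbers, as in this appendix, that is not a concern.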

Example 5 Using the Alternative Formula
Use the alternative formula for standard deviation to find the standard deviation of the following set of numbers.
5, 6, 6, 7, 7, 8, 8, 8, 9, 10
Begin by finding the mean of the set, which is 7.4. So, the standard deviation is
\[ \sigma = \sqrt{\frac{5^2 + 2(6^2) + 2(7^2) + 3(8^2) + 9^2 + 10^2}{10} - (7.4)^2} = \sqrt{\frac{568}{10} - 54.76} = \sqrt{2.04} \approx 1.43. \]
You can use the one-variable statistics feature of a graphing utility to check this result.
Now try Exercise 27.

A well-known theorem in statistics, called Chebychev's Theorem, states that for any k > 1, at least \(1 - \frac{1}{k^2}\) of the numbers in a distribution must lie within k standard deviations of the mean. So, at least 75% of the numbers in a data set must lie within two standard deviations of the mean, and at least 88.9% of the numbers must lie within three standard deviations of the mean. For most distributions, these guaranteed percentages are lower than the actual ones. For instance, in all three distributions shown in Example 3, 100% of the numbers lie within two standard deviations of the mean.

Example 6 Describing a Distribution
The table shows the number of outpatient visits to hospitals (in millions) in each state and the District of Columbia in [illegible year]. Find the mean and standard deviation of the data. What percent of the data values lie within two standard deviations of the mean? (Source: Health Forum)
[Table: outpatient visits to hospitals (in millions) for each of the 50 states and DC, listed by postal abbreviation from AK to WY; values illegible]
Begin by entering the numbers into a graphing utility. Then use the one-variable statistics feature to obtain \(\bar{x} \approx\) [illegible] and \(\sigma \approx\) [illegible]. The interval that contains all numbers that lie within two standard deviations of the mean is \([\bar{x} - 2\sigma,\ \bar{x} + 2\sigma]\), which here is [illegible].
From the table you can see that all but two of the data values (49 of 51, or about 96%) lie in this interval: all but the data values that correspond to the numbers of outpatient visits to hospitals in California and New York.
Now try Exercise 36.
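Chebychev's guaranteed fraction can be compared with the fraction actually observed in a data set. Because the state values for Example 6 are illegible, this Python sketch uses the Example 5 data as a stand-in (an assumption for illustration); the function names are hypothetical.

```python
from statistics import mean, pstdev

def chebychev_lower_bound(k):
    """Chebychev's Theorem: at least 1 - 1/k**2 of the values lie within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def fraction_within(xs, k):
    """Observed fraction of values in [x_bar - k*sigma, x_bar + k*sigma]."""
    x_bar, sigma = mean(xs), pstdev(xs)
    lo, hi = x_bar - k * sigma, x_bar + k * sigma
    return sum(lo <= x <= hi for x in xs) / len(xs)

print(chebychev_lower_bound(2), chebychev_lower_bound(3))  # 0.75 and about 0.889

example5 = [5, 6, 6, 7, 7, 8, 8, 8, 9, 10]
print(fraction_within(example5, 2))  # all ten values lie within two SDs
```

As the text notes, the observed fraction (here all of the data) is typically well above Chebychev's guaranteed minimum, because the theorem must hold for every possible distribution.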

Box-and-Whisker Plots
Standard deviation is the measure of dispersion that is associated with the mean. Quartiles measure dispersion associated with the median.

Definition of Quartiles
Consider an ordered set of numbers whose median is m. The lower quartile is the median of the numbers that occur before m. The upper quartile is the median of the numbers that occur after m.

Example 7 Finding Quartiles of a Data Set
Find the lower and upper quartiles for the data set.
42, 14, 24, 16, 12, 18, 20, 24, 16, 26, 13, 27
Begin by ordering the data.
12, 13, 14 | 16, 16, 18 | 20, 24, 24 | 26, 27, 42
(1st 25% | 2nd 25% | 3rd 25% | 4th 25%)
The median of the entire data set is 19. The median of the six numbers that are less than 19 is 15. So, the lower quartile is 15. The median of the six numbers that are greater than 19 is 25. So, the upper quartile is 25.
Now try Exercise 39(a).

Quartiles are represented graphically by a box-and-whisker plot, as shown in Figure B.5. In the plot, notice that five numbers are listed: the smallest number, the lower quartile, the median, the upper quartile, and the largest number. Also notice that the numbers are spaced proportionally, as though they were on a real number line.
[Figure B.5: box-and-whisker plot]
The next example shows how to find quartiles when the number of elements in a data set is not divisible by 4.
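The definition of quartiles can be sketched in Python. The helper names are hypothetical; the sketch follows this appendix's convention that, when n is odd, the median itself belongs to neither half. Other conventions exist (for example, the methods offered by statistics.quantiles), so graphing utilities and libraries may report slightly different quartiles for the same data.

```python
from statistics import median

def quartiles(data):
    """Return (lower quartile, median, upper quartile) per the definition in the text.

    The lower (upper) quartile is the median of the values before (after) the
    median m. When n is odd, the middle value is excluded from both halves.
    Requires at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def five_number_summary(data):
    """The five values marked on a box-and-whisker plot."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)

example7 = [42, 14, 24, 16, 12, 18, 20, 24, 16, 26, 13, 27]
print(five_number_summary(example7))
```

For the Example 7 data, this gives 12, 15, 19, 25, and 42, the five numbers a box-and-whisker plot of the set would display.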

Example 8 Sketching Box-and-Whisker Plots
Sketch a box-and-whisker plot for each data set.
a. 27, 28, 30, 42, 45, 50, 50, 61, 62, 64, 66
b. 82, 82, 83, 85, 87, 89, 90, 94, 95, 95, 96, 98, 99
c. 11, 13, 13, 15, 17, 18, 20, 24, 24, 27
a. This data set has 11 numbers. The median is 50 (the sixth number). The lower quartile is 30 (the median of the first five numbers). The upper quartile is 62 (the median of the last five numbers). See Figure B.6.
[Figure B.6: box-and-whisker plot marking 27, 30, 50, 62, and 66]
b. This data set has 13 numbers. The median is 90 (the seventh number). The lower quartile is 84 (the median of the first six numbers). The upper quartile is 95.5 (the median of the last six numbers). See Figure B.7.
[Figure B.7: box-and-whisker plot marking 82, 84, 90, 95.5, and 99]
c. This data set has 10 numbers. The median is 17.5 (the average of the fifth and sixth numbers). The lower quartile is 13 (the median of the first five numbers). The upper quartile is 24 (the median of the last five numbers). See Figure B.8.
[Figure B.8: box-and-whisker plot marking 11, 13, 17.5, 24, and 27]
Now try Exercise [illegible](b).

B.2 Exercises

VOCABULARY CHECK: Fill in the blanks.
1. A single number that is the most representative of a data set is called a ________ of ________.
2. The ________ of n numbers is the sum of the numbers divided by n.
3. If there is an even number of data values in a data set, then the ________ is the average of the two middle numbers.
4. If two numbers of a data set are tied for the most frequent occurrence, the collection has two ________ and is called ________.
5. Two measures of dispersion associated with the mean are called the ________ and the ________ of a data set.
6. ________ measure dispersion associated with the median.
7. You can represent quartiles graphically by creating a ________.

In Exercises 1–6, find the mean, median, and mode of the set of measurements.
1.–6. [data illegible]
7. Reasoning Compare your answers for Exercises [illegible] and [illegible] with those for Exercises [illegible] and [illegible]. Which of the measures of central tendency is sensitive to extreme measurements? Explain your reasoning.
8. Reasoning (a) Add 6 to each measurement in Exercise [illegible] and calculate the mean, median, and mode of the revised measurements. How are the measures of central tendency changed? (b) If a constant k is added to each measurement in a set of data, how will the measures of central tendency change?
9. Electric Bills A person had the following monthly bills for electricity, January through December [amounts illegible]. What are the mean and median of the collection of bills?
10. Car Rental A car rental company kept the following record of the numbers of miles a rental car was driven, Monday through Saturday [values illegible]. What are the mean, median, and mode of the data?
11. Families A study was done on families having six children. The table shows the numbers of families in the study with the indicated numbers of girls (0 through 6) [frequencies illegible]. Determine the mean, median, and mode of this set of data.
12. Sports A baseball fan examined the records of a favorite baseball player's performance during his last [illegible] games. The numbers of games in which the player had 0, 1, 2, 3, and 4 hits are recorded in the table [frequencies illegible]. (a) Determine the average number of hits per game. (b) Determine the player's batting average if he had [illegible] at-bats during the [illegible]-game series.
13. Think About It Construct a collection of numbers that has the following properties. If this is not possible, explain why it is not. Mean: 6, median: [illegible], mode: [illegible]
14. Think About It Construct a collection of numbers that has the following properties. If this is not possible, explain why it is not. Mean: 6, median: 6, mode: [illegible]
15. Test Scores A professor records the following scores for a 100-point exam: [scores illegible]. Which measure of central tendency best describes these test scores?
16. Shoe Sales A salesman sold eight pairs of men's black dress shoes. The sizes of the eight pairs were as follows: [sizes illegible]. Which measure (or measures) of central tendency best describes the typical shoe size for the data?
In Exercises 17 and 18, line plots of sets of data are given. Determine the mean and standard deviation of each set.
17. [Figure: line plots (a)–(d)]
18. [Figure: line plots (a)–(d)]

In Exercises 19–26, find the mean \(\bar{x}\), variance v, and standard deviation σ of the data set.
19.–26. [data illegible]
In Exercises 27–32, use the alternative formula to find the standard deviation of the data set.
27.–32. [data illegible]
33. Reasoning Without calculating the standard deviation, explain why the data set {4, 4, 20, 20} has a standard deviation of 8.
34. Reasoning If the standard deviation of a data set of numbers is 0, what does this imply about the set?
35. Test Scores An instructor adds five points to each student's exam score. Will this change the mean or standard deviation of the exam scores? Explain.
36. Price of Gold The following data represent the average prices of gold (in dollars per fine ounce) for a period of years from the 1980s to the 2000s [data illegible]. Use a computer or graphing utility to find the mean, variance, and standard deviation of the data. What percent of the data lies within two standard deviations of the mean? (Source: U.S. Bureau of Mines and U.S. Geological Survey)
37. Think About It The histograms represent the test scores of two classes of a college course in mathematics. Which histogram has the smaller standard deviation?
[Figure: two histograms of test scores; horizontal axis: Score]
38. Test Scores The scores of a mathematics exam given to 600 science and engineering students at a college had a mean and standard deviation of [illegible] and [illegible], respectively. Use Chebychev's Theorem to determine the intervals containing at least 3/4 and at least 8/9 of the scores. How would the intervals change if the standard deviation were [illegible]?
In Exercises 39–42, (a) find the lower and upper quartiles of the data and (b) sketch a box-and-whisker plot for the data without the aid of a graphing utility.
39.–42. [data illegible]
In Exercises 43–46, use a graphing utility to create a box-and-whisker plot for the data.
43.–46. [data illegible]
47. Product Lifetime A company has redesigned a product in an attempt to increase the lifetime of the product. The two sets of data list the lifetimes (in months) of [illegible] units with the original design and [illegible] units with the new design. Create a box-and-whisker plot for each set of data, and then comment on the differences between the plots.
Original Design: [data illegible]
New Design: [data illegible]