STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

Similar documents
STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

Chapter 3. Descriptive Measures. Slide 3-2. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 3 - Displaying and Summarizing Quantitative Data

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 1. Looking at Data-Distribution

Averages and Variation

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation

CHAPTER 3: Data Description

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

+ Statistical Methods in

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

15 Wyner Statistics Fall 2013

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

STA 570 Spring Lecture 5 Tuesday, Feb 1

To calculate the arithmetic mean, sum all the values and divide by n (equivalently, multiple 1/n): 1 n. = 29 years.

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots

UNIT 1A EXPLORING UNIVARIATE DATA

Univariate Statistics Summary

3.3 The Five-Number Summary Boxplots

CHAPTER 2: SAMPLING AND DATA

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

Measures of Position

Measures of Dispersion

Chapter 6: DESCRIPTIVE STATISTICS

Measures of Position. 1. Determine which student did better

CHAPTER 2 DESCRIPTIVE STATISTICS

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

AND NUMERICAL SUMMARIES. Chapter 2

Section 6.3: Measures of Position

2.1: Frequency Distributions and Their Graphs

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

Measures of Central Tendency

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

MATH& 146 Lesson 8. Section 1.6 Averages and Variation

Numerical Summaries of Data Section 14.3

Lecture 3: Chapter 3

Chapter 5snow year.notebook March 15, 2018

No. of blue jelly beans No. of bags

LESSON 3: CENTRAL TENDENCY

Basic Statistical Terms and Definitions

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

DAY 52 BOX-AND-WHISKER

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Name Date Types of Graphs and Creating Graphs Notes

NAME: DIRECTIONS FOR THE ROUGH DRAFT OF THE BOX-AND WHISKER PLOT

Day 4 Percentiles and Box and Whisker.notebook. April 20, 2018

Lecture 3 Questions that we should be able to answer by the end of this lecture:

STA Module 4 The Normal Distribution

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

Lecture Notes 3: Data summarization

Chapter 2 Modeling Distributions of Data

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Create a bar graph that displays the data from the frequency table in Example 1. See the examples on p Does our graph look different?

SCHOOL OF BUSINESS, ECONOMICS AND MANAGEMENT BBA240 STATISTICS/ QUANTITATIVE METHODS FOR BUSINESS AND ECONOMICS

Box and Whisker Plot Review A Five Number Summary. October 16, Box and Whisker Lesson.notebook. Oct 14 5:21 PM. Oct 14 5:21 PM.

1.2. Pictorial and Tabular Methods in Descriptive Statistics

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

MATH NATION SECTION 9 H.M.H. RESOURCES

Sections 2.3 and 2.4

Understanding and Comparing Distributions. Chapter 4

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

IT 403 Practice Problems (1-2) Answers

How individual data points are positioned within a data set.

Table of Contents (As covered from textbook)

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

Numerical Descriptive Measures

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Section 9: One Variable Statistics

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Name Geometry Intro to Stats. Find the mean, median, and mode of the data set. 1. 1,6,3,9,6,8,4,4,4. Mean = Median = Mode = 2.

Frequency Distributions

1. To condense data in a single value. 2. To facilitate comparisons between data.

Chapter 2: Descriptive Statistics

Measures of Central Tendency:

Chapter 3: Describing, Exploring & Comparing Data

Mean,Median, Mode Teacher Twins 2015

IQR = number. summary: largest. = 2. Upper half: Q3 =

Lecture 6: Chapter 6 Summary

Measures of Central Tendency

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

AP Statistics Prerequisite Packet

AP Statistics Summer Assignment:

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

SLStats.notebook. January 12, Statistics:

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter 5: The standard deviation as a ruler and the normal model p131

6-1 THE STANDARD NORMAL DISTRIBUTION

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Transcription:

STP 6 ELEMENTARY STATISTICS NOTES PART - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use numbers called descriptive measures to describe data sets. 3.1 Measures of Center Measures of central tendency / measures of center/averages. - where center or most typical value of a data set lies Mean sum of the observations divided by the number of observations. Example:,, 7, 8, 1 Mean = ( + + 7 + 8 + 1)/ = 3/ = 6.8 Median divides the bottom % of the data from the top % Arrange the data in increasing order 1. If the # of observations is odd, the median is the observation exactly in the middle.. If the # of observations is even, the median is the mean of the two middle observations. For n observations, the position of the median is the (n + 1)/ th position in the ordered distribution. Example:,, 7, 8, 1 (in order) Median = 7 Example:,, 7, 8, 1, 1 Median = (7 + 8)/ = 1/ = 7. Mode the value that occurs most frequently in the data set Obtain the frequency of each value 1. If the greatest frequency is 1, then there is no mode.. If the greatest frequency is or greater, then any value with that greatest frequency is the mode of the data set. 1

Example:, 3, 3, 3,,, Mode = 3 Example:, 3, 3, 3,,,, Mode = 3 and Example: 1, 3,,, 7, 9, Mode = none Mode is the only appropriate measure of center for qualitative data. Example: A sample of 1 ASU students had % Democrats, 6% Republicans and 1% Libertarians. Mode = Republicans Mean is affected by extreme observations - nonresistant Median is not sensitive to extreme observations - resistant In a right skewed distribution the mean is pulled more towards the right, no effect on the median (Median<Mean) In a left skewed distribution,the mean is pulled more towards the left, no effect on the median (Median>Mean) In a symmetric distribution, mean and median are at the same position (same value)

Trimmed means cut off a small percentage of the observations on both ends of the distribution and then compute the mean. this makes the mean a more resistant measure since the extreme observations may be deleted from the data set Mean of a data set Data set is population data Data set is sample data Population mean Sample mean This is the same for the median, the mode, or any other descriptive measure. If the data set is population data then the measures are called population measures. If the data set is sample data then the measures are called sample measures. The Sample Mean ( x ) Observations labeled with subscripts, x 1, x, etc Summation notation n i=1 x i = x 1 + x + + x n Σ x = x 1 + x, + Sample mean mean of observations of a sample. x = Σ x i /n Example: 16 1 1 13 118 11 138 131 18 1 119 13 Compute x, Σ x, Σ (x - x ), Σ (x - x ), median, mode 3

x x - x (x- x ) 16-16.7 8.6 11-8.7 76.6 118 -.7.6 119-3.7 1.6 1 -.7 7.6 1 -.7.6 13..6 1 1. 1.6 18. 7.6 13 7..6 131 8. 68.6 138 1. 3.6 173 78. Σ x Σ (x- x ) Σ (x- x ) 1 n 1.7 x 1. median none mode Computing Sample Mean for grouped data: x = Σ (x*f)/n where n=σ f Number of school children=x 1 3 Frequency= f 3 x*f 8 9 16 1 8 x =8/=. If the data is grouped using classes that are not single value, we can estimate the mean by using Midpoints of each class:

Days to maturity Frequency =f Midpoint =x x*f 3< < <6 6<7 7<8 8<9 9<1 3 1 8 1 7 7 3 6 7 8 9 1 6 9 38 7 x =7/=68. (estimate of the sample mean) Using raw data (below) true sample mean is x =731/=68.7, so we did OK 7 6 99 89 87 6 6 38 67 7 6 69 78 39 7 6 71 1 99 68 9 86 7 3 7 81 8 98 1 36 63 66 8 79 83 7 6 3. Measures of Variation; The Sample Standard Deviation Measures of spread (measures of variation) Range = Max. value Min. value - the difference between the largest and smallest observations. Two data sets may have the same mean but different ranges. The Sample Standard Deviation measures variation - tells how far, on average, the observations are from the mean - like the mean, it is not a resistant measure Deviations from the mean: x i x how far each observation is from the mean Sum of squared deviations x i x For a variable x, the standard deviation of the observations for a sample is called a sample standard deviation. It is denoted by s x or, when no confusion will arise simply by s. We have s = x i x n 1 where n is the sample size.

Steps to calculate standard deviation 1. Calculate the sample mean, x. Construct a table to obtain the sum of squared deviations, x i x 3. Substitute into the formula for s above. The more variation there is in a data set, the larger its standard deviation. Computational formula for s is s = x i x i n n 1 Example: (a) Compute standard deviation using Definition: 1 1 11 1 1 1 13 16 1 1 1 16 x x - x (x- x) 1-3 9 1 11-1 1-3 9 1 1 1 13 16 3 9 1-1 1 1 1 1 1-3 9 16 3 9 Σ x = 16 Σ (x- x) = Σ (x- x) = 6 n=1, x=13 S = 6/11=. S=.3 6

(b) Using Computational formula for s: x x 1 1 1 11 11 1 1 1 1 196 13 169 16 6 1 1 1 196 1 1 16 6 Σ x=16 Σ x =88 n=1, S = [88-(16) /1]/(11)=6/11=. and S=.3 If data is grouped, we compute S as follows: s= x f x f /n n 1 where n= f Number of school children=x 1 3 Frequency= f 3 x*f x x *f 8 9 16 1 n= Σ x*f =8 1 9 16 16 7 6 Σx *f = 16 n=, S = [16-(8) /]/19=.6, S=1.7 Three Standard Deviations Rule: Almost all of the observations in any data set lie within three (3) standard deviations to either side of the mean. 7

More Precise Rules for any data set: Chebychev s rule : ~ 89% of the observations in any data set lie within three standard deviations to either side of the mean. Chebychev s rule (more precisely):for any data set and any number k > 1, at least 1(1 1/k )% of the observations lie within k standard deviations to either side of the mean. If the distribution is ~ bell-shaped, the Empirical Rule implies that ~ 99.7% of the observations lie within three standard deviations to either side of the mean. 3.3 The five-number summary; Boxplots Median, Percentiles, Deciles, Quartiles, Interquartile Range are all resistant measures. Percentiles, Deciles, and Quartiles Percentiles divide the distribution into 1 equal parts (P 1, P,,P 99 ) P 1 divides the bottom 1% of the data from the top 99% P divides the bottom % of the data from the top 98% Etc, Thus, the median is the th percentile Deciles divide the distribution into 1 equal parts (D 1, D,, D 9 ) D 1 divides the bottom 1% of the data from the top 9% D divides the bottom % of the data from the top 8% Etc, Thus, the median is D Quartiles divide the distribution into equal parts (Q 1, Q, Q 3 ) Q 1 divides the bottom % of the data from the top 7% Q divides the bottom % of the data from the top % Q 3 divides the bottom 7% of the data from the top % Thus the median is Q To find the Quartiles Arrange the data in increasing order. 1. Q 1 is the median of the data set that lies at or below the median of the entire data set.. Q is the median of the entire data set. 3. Q 3 is the median of the data set that lies at or above the median of the entire data set. 8

Examples: 1. n=7 Data: 3,,, 6, 1, 13, 1 Q 1 =(+)/=. Q = 6 Q 3 = (1+13)/=1.. n=1 Data: 1, 3,,, 6, 1, 13, 1, 1, 18 Q 1 = Q = (6+1)/=9 Q 3 = 1 Interquartile Range (IQR) difference between the first and third quartiles. IQR = Q 3 Q 1 IQR gives the range of the middle % of the observations (approximately) The five-number summary of a data set consists of the minimum, maximum, and the quartiles in increasing order. Min., Q 1, Q, Q 3, Max. Outliers observations well outside of the overall pattern of the data Lower limit = Q 1 1. * IQR Upper limit = Q 3 + 1. * IQR Potential outliers are observations outside of the Lower and Upper Limits. Adjacent values - most extreme obs. that are still lying within the upper and lower limits - most extreme obs. that are not outliers. Boxplot (box-and-whisker diagram) and the modified boxplot To construct a boxplot 1. Determine the number summary (Min, Q 1, Q, Q 3, Max.). Draw a horizontal axis on which the numbers obtained in step 1can be located. Above this axis, mark the quartiles and the minimum and maximum with vertical lines. 3. Connect the quartiles to each other to make a box, and then connect the box to the minimum and maximum with lines. 9

The following is Boxplot for example 1 (top of previous page): 3. 6 1. 1 To construct a modified boxplot 1. Determine the quartiles.. Determine potential outliers and the adjacent values. 3. Draw a horizontal axis on which the numbers obtained in steps 1 and can be located. Above this axis, mark the quartiles and the adjacent values with vertical lines.. Connect the quartiles to each other to make a box, and then connect the box to the adjacent values with lines.. Plot each potential outlier with an asterisk. The two lines stretching out on both sides are the whiskers. 3. Descriptive Measures for Populations; Use of Samples Notation: Size Mean Sample n x Population N µ Population Mean (Mean of a Variable) computed in same manner as for a sample For a variable x, the mean of all possible obs. for the entire population is called the population mean or mean of the variable x. It is denoted by µ x or when no confusion will arise, simply by µ. For a finite population, we have µ = x N where N is the population size. 1

Using a Sample Mean to Estimate a Population Mean Take a sample from the population and compute its mean. This mean is used to estimate the mean of the population (chapter 8) Population Standard Deviation (Standard Deviation of the Variable) For a variable x, the standard deviation of all possible obs. for the entire population is called the population standard deviation or standard deviation of the variable x. It is denoted by σ x or, when no confusion will arise, simply by σ. For a finite population, we have σ = x i N where N is the population size. The population standard deviation can also be obtained from the computing formula σ = x i N Population Variance = σ Using a Sample Standard Deviation to Estimate a Population Standard Deviation: Take a sample. Compute the sample standard deviation, s and then estimate the population standard deviation, σ. Parameter A descriptive measure for a population. Examples: µ, σ Statistic A descriptive measure for a sample. Examples: x, s Standardized Variable For a variable x, the variable z = x μ is called the standardized version of x or the standardized σ variable corresponding to the variable x, z is also called the standard score or z-score µ z = z i N = σ z= z i z N =1 The same rules for the proportions of observations being within 3 standard deviations of the mean apply. 11

Standard score z is negative if x is below the mean, positive if x is above the mean and zero if x is equal to the mean. The value of the z score tells us how many standard deviations above or below the mean is a particular value of x. 1