Chpt 3. Data Description. 3-2 Measures of Central Tendency


Chpt 3 Homework 3-2
Read pages 96-109.
p. 109: Applying the Concepts
p. 110: 1, 8, 11, 15, 27, 33

Chpt 3, 3.2 Objectives
- Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange.

Data
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using the data values from a specific population.

Central Tendency
The center of the data is often called the average, but is better referred to as a measure of central tendency.
- mean
- median
- mode
- midrange

Variance
Measures of variation describe the spread of the data (measures of dispersion).
- Variance
- Standard Deviation
- Range
- IQR

Position
Measures of position give the relative position of a datum value, relative to other values in the population.
- Percentiles (Obvious)
- Quartiles (Quarters)
- Deciles (Class size 10)

Remember
Measures from a population are called population parameters. Measures found from a sample are called, appropriately, sample statistics.

Rounding Rule
The general rounding rule is that rounding should not be done until the final answer is calculated. Using parentheses on calculators helps to avoid early rounding error. Let your calculator remember values.
If you insist on keying in values that you have calculated in previous steps, use at minimum four (4) decimal places until you round the final value to the appropriate number of decimal places.
Except in unusual cases, I expect you to give me 4 decimal places in your solutions. Fewer than 4 decimal places risks being marked incorrect.

Mean
The mean is defined to be the sum of the data values divided by the total number of values. It is also known as the arithmetic mean.
Find the sum of all data values in the sample and divide by the number of values.
Denoted $\bar{X}$ for a sample: $\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$
Denoted $\mu$ for a population: $\mu = \frac{\sum X}{N}$
The mean, in most cases, is not an actual data value.

Mean
The age (in weeks) of kittens found in an animal shelter: 12, 15, 18, 22, 25, 32, 16, 28, 29, 18, 20
$\bar{X} = \frac{12 + 15 + 18 + 22 + 25 + 32 + 16 + 28 + 29 + 18 + 20}{11} = 21.4$
Round final solutions to one decimal place more accurate than the original data. (This is one of those exceptions.)
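The kitten calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the variable names `ages` and `mean_age` are made up for the illustration.

```python
# Kitten ages in weeks, copied from the slide.
ages = [12, 15, 18, 22, 25, 32, 16, 28, 29, 18, 20]

# Sample mean: sum of the data values divided by the number of values.
mean_age = sum(ages) / len(ages)

# Round only at the very end, to one decimal place more than the data.
print(round(mean_age, 1))  # 21.4
```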

Mean for Grouped Data
If the data is listed in a grouped frequency distribution, use the class midpoints to find the mean.
Caution: The mean cannot be calculated from grouped data with open-ended classes.
Multiply each midpoint by the frequency of the class, find the sum, and divide by the total frequency:
$\bar{X} = \frac{\sum X_m \cdot f}{\sum f}$

Example: Mean for Grouped Data
We let the class midpoints represent the actual data values. For example:

Class Limits   f   X_m
180-204        4   192
205-229        5   217
230-254        4   242
255-279        1   267
280-304        3   292
305-329        3   317

We could use the class midpoints and list the appropriate number of data values:
192, 192, 192, 192, 217, 217, 217, 217, 217, 242, 242, 242, 242, 267, 292, 292, 292, 317, 317, 317
Then calculate the mean. But what if we had hundreds of values?

Example: Mean for Grouped Data
We let the class midpoints represent the actual data values and multiply each by the frequency of its class.

Class Limits   f   X_m   f·X_m
180-204        4   192    768
205-229        5   217   1085
230-254        4   242    968
255-279        1   267    267
280-304        3   292    876
305-329        3   317    951

$\bar{X} = \frac{\sum (f \cdot X_m)}{n} = \frac{768 + 1085 + 968 + 267 + 876 + 951}{20} = \frac{4915}{20} = 245.75 \approx 245.8$

Example
Given the table below, find the mean of the distribution.

Class       f   X_m   f·X_m
15.5-20.5   4   18     72
20.5-25.5   5   23    115
25.5-30.5   4   28    112
30.5-35.5   1   33     33
35.5-40.5   3   38    114

$\bar{X} = \frac{\sum (f \cdot X_m)}{n} = \frac{4 \cdot 18 + 5 \cdot 23 + 4 \cdot 28 + 1 \cdot 33 + 3 \cdot 38}{17} = \frac{72 + 115 + 112 + 33 + 114}{17} = \frac{446}{17} \approx 26.23529$
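The midpoint-times-frequency method from the two grouped examples can be sketched in Python as follows. The function name `grouped_mean` is a hypothetical helper, not something from the textbook or the calculator.

```python
def grouped_mean(midpoints, freqs):
    """Mean of a grouped frequency distribution, using class midpoints."""
    # Multiply each midpoint by its class frequency and add the products.
    total = sum(m * f for m, f in zip(midpoints, freqs))
    # Divide by the total frequency n.
    return total / sum(freqs)

# Classes 180-204 through 305-329.
first = grouped_mean([192, 217, 242, 267, 292, 317], [4, 5, 4, 1, 3, 3])
# Classes 15.5-20.5 through 35.5-40.5.
second = grouped_mean([18, 23, 28, 33, 38], [4, 5, 4, 1, 3])

print(first)             # 245.75
print(round(second, 4))  # 26.2353
```

Writing the frequencies as a second list mirrors the two-list setup (midpoints in one list, frequencies in the other) used on the calculator.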

Caution
We use the method for finding mean, standard deviation, median, and IQR from a grouped frequency distribution ONLY WHEN WE DO NOT HAVE THE DATA. If we have the data, WE USE THE DATA.

Median (MD)
When a data set is ordered, it is called a data array. We use a data array to find another center, the median, abbreviated MD.
The median (MD) is the datum in the exact center of the values: an equal number of values fall above and below the median. The median is defined to be the midpoint of the data array.
The position of the median does not depend on the magnitudes of the data values; it is determined solely by the number of data values.
To find the median, the data must be ordered.

Median (MD)
Arrange the data in ascending order. A good way to order data is to use a stem-and-leaf display.
Find the value in the center of the distribution.
- If n is odd, the median will be a datum value.
- If n is even, the median will be the mean of the two central data values.
The location of the median can be found by: Md = $\frac{n + 1}{2}$th score

Example for Median
24 18 26 32 21 53 19 28 24
Order the 9 values:

1 | 8 9
2 | 1 4 4 6 8
3 | 2
4 |
5 | 3

Md = $\frac{9 + 1}{2}$th score = 5th score. Four scores fall before it and four after it.
The median is 24.

Example - Median
The weights (in pounds) of eight army recruits are 180, 201, 220, 191, 213, 219, 209, and 186. Find the median.
- Data array: 180, 186, 191, 201, 209, 213, 219, 220.
Md = $\frac{8 + 1}{2}$th score = $4\tfrac{1}{2}$th score
Md = $\frac{201 + 209}{2} = 205$
The median weight of the eight recruits is 205 lbs.
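Both median examples can be reproduced with Python's standard `statistics` module, which sorts the data itself. The helper `median_position` is a hypothetical name for the $(n+1)/2$ location rule from the slides.

```python
import statistics

def median_position(n):
    """Location of the median in the ordered data: the (n + 1) / 2 th score."""
    return (n + 1) / 2

scores = [24, 18, 26, 32, 21, 53, 19, 28, 24]          # n = 9 (odd)
weights = [180, 201, 220, 191, 213, 219, 209, 186]      # n = 8 (even)

# Odd n: the median is an actual datum value.
print(median_position(len(scores)), statistics.median(scores))    # 5.0 24
# Even n: the median is the mean of the two central values.
print(median_position(len(weights)), statistics.median(weights))  # 4.5 205.0
```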

Median - Ungrouped
For an ungrouped frequency distribution, find the median by examining the cumulative frequencies to locate the middle value.
If n is the sample size, compute (n+1)/2. Locate the value whose cumulative frequency first reaches position (n+1)/2, so that an equal number of values fall below and above it.
Alonzo Appliance recorded the number of iPods sold per week over a one-year period. The data is given below.

x   f   cf
1   4    4
2   9   13
3   6   19
4   2   21
5   3   24

n = 24, so $\frac{n + 1}{2} = \frac{25}{2} = 12\tfrac{1}{2}$: 12 scores above and 12 below.
The $12\tfrac{1}{2}$th score is the median.
The median is 2.
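The cumulative-frequency lookup for the iPod table might be sketched like this. The helper `score_at` is a hypothetical name; it simply walks the cumulative frequencies until it reaches the requested position.

```python
import math
from itertools import accumulate

values = [1, 2, 3, 4, 5]   # x: iPods sold in a week
freqs = [4, 9, 6, 2, 3]    # f: number of weeks

cum_freqs = list(accumulate(freqs))  # running totals: 4, 13, 19, 21, 24
n = cum_freqs[-1]
position = (n + 1) / 2               # 12.5

def score_at(k):
    """Return the k-th score (1-based) of the ordered data."""
    for value, cf in zip(values, cum_freqs):
        if k <= cf:
            return value
    raise ValueError("k is larger than n")

# A half position means averaging the two neighbouring scores (12th and 13th).
median = (score_at(math.floor(position)) + score_at(math.ceil(position))) / 2
print(position, median)  # 12.5 2.0
```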

Median - Grouped
The median for a grouped frequency distribution can be computed by
$Md = \frac{\frac{n + 1}{2} - cf}{f_{mc}}(w) + L_{mc}$
Do not let the apparent complexity intimidate you. This is not as complicated as it looks.
Where
n = sum of the frequencies (the median is the $\frac{n + 1}{2}$th score)
cf = cumulative frequency of the class immediately preceding the median class
$f_{mc}$ = frequency of the median class
w = width of the median class
$L_{mc}$ = lower limit of the median class

Median - Grouped
$Md = \frac{\frac{n + 1}{2} - cf}{f_{mc}}(w) + L_{mc}$
In words:
$\text{Median} = \frac{\text{position of median} - \text{cumulative frequency before median class}}{\text{frequency of median class}}(\text{width of median class}) + \text{lower limit of median class}$
The numerator is the number of data values in the median class before the median. The denominator is the number of data values in the class containing the median.

Class   f
16-20   3
21-25   5
26-30   4   (Somewhere in this class lies the median.)
31-35   4
36-40   3

Example - Grouped Frequency
Find the median of the grouped frequency distribution.

Class   f   cf
16-20   3    3
21-25   5    8
26-30   4   12
31-35   4   16
36-40   3   19

n = 19, and the median position is taken as 9.5: 9 scores above and 9 below. So, the 9.5th score is the median.
Unfortunately we do not have that score. We need 1.5 scores from the 4 in the third class.
Substituting position 9.5, cf = 8, f = 4, w = 5, and lower limit 26 into the grouped median formula:
$Md = \frac{9.5 - 8}{4}(5) + 26 = 27.875$
The median is 27.9.

Example - Grouped Frequency
The possible scores in the class 26-30 are 26, 27, 28, 29, 30. There are 4 scores somewhere in that range. We will assume those scores are evenly spaced.

Class   f   cf
16-20   3    3
21-25   5    8
26-30   4   12
31-35   4   16
36-40   3   19

To get to the ninth score we need one of those 4, and to get to the 9.5th score we need 1.5 of them. So we move 1.5/4 of the distance from 26 to 30.
$\frac{9.5 - 8}{4}(5) + 26 = 27.875$
The median is 27.875 or 27.9.
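The interpolation step can be written as a small function. This is a sketch under stated assumptions: `grouped_median` is a hypothetical helper, and the median position is passed in directly (9.5, the value used in the worked example) rather than computed from n.

```python
def grouped_median(position, cf_before, f_mc, width, lower):
    """Interpolate the median inside the median class.

    position  -- location of the median among the ordered scores
    cf_before -- cumulative frequency of the class before the median class
    f_mc      -- frequency of the median class
    width     -- width of the median class
    lower     -- lower limit of the median class
    """
    # Fraction of the median class that must be crossed to reach the median.
    fraction = (position - cf_before) / f_mc
    return fraction * width + lower

# Values from the worked example: median class 26-30, f = 4, cf before = 8.
md = grouped_median(position=9.5, cf_before=8, f_mc=4, width=5, lower=26)
print(md)            # 27.875
print(round(md, 1))  # 27.9
```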

Mode
The mode is usually defined as the datum appearing most frequently. We will loosen that definition to include a range of values where the frequency bumps up (humps in the data).
If all frequencies are essentially the same (uniform distribution), there is no mode.
- One mode = unimodal
- Two modes = bimodal
- Several modes = multimodal

Example
In the data array {24 18 26 32 21 53 19 28 24} the mode is 24.

1 | 8 9
2 | 1 4 4 6 8
3 | 2
4 |
5 | 3

Mode - Grouped Frequency Distribution
The mode for grouped data is the modal class. The modal class is the class with the largest frequency. Sometimes the midpoint of the class is used rather than the boundaries.

Class       f
15.5-20.5   3
20.5-25.5   5
25.5-30.5   7   (Modal Class)
30.5-35.5   3
35.5-40.5   2

The mode is 28.
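Both kinds of mode can be found with a short Python sketch. `statistics.multimode` (Python 3.8 and later) returns every most-frequent value, which fits the idea that a data set may have more than one mode; the names `classes`, `modal_class`, and `modal_midpoint` are chosen just for this illustration.

```python
import statistics

# Raw data: the most frequent datum.
data = [24, 18, 26, 32, 21, 53, 19, 28, 24]
print(statistics.multimode(data))  # [24]

# Grouped data: the class with the largest frequency.
classes = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [3, 5, 7, 3, 2]

modal_class = classes[freqs.index(max(freqs))]
modal_midpoint = (modal_class[0] + modal_class[1]) / 2
print(modal_class, modal_midpoint)  # (25.5, 30.5) 28.0
```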

Midrange (MR)
$MR = \frac{\text{Highest Value} + \text{Lowest Value}}{2}$
Simply the mean of the highest and lowest data values.
Rarely used, since it is rarely reliable: it is affected by outliers (extremely high or low values).
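As a quick illustration, the midrange of the army recruits' weights from the earlier median example could be computed like this. The function name `midrange` is a hypothetical helper.

```python
def midrange(data):
    """Mean of the highest and lowest data values."""
    return (max(data) + min(data)) / 2

# Weights (in pounds) of the eight army recruits.
weights = [180, 201, 220, 191, 213, 219, 209, 186]
print(midrange(weights))  # 200.0
```

Only the two extreme values enter the calculation, which is why a single outlier can move the midrange a long way.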

Weighted Mean
The weighted mean is used when the values in a data set are not all equally represented.
The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.
Identical to the mean from a grouped frequency distribution, replacing frequency with weight:
$\bar{X} = \frac{\sum X \cdot w}{\sum w} = \frac{x_1 w_1 + x_2 w_2 + \cdots + x_n w_n}{w_1 + w_2 + \cdots + w_n}$

Example - Weighted Mean
We plan to buy several pizzas for rewarding perfect attendance. We buy 3 small, 4 medium, and 2 large pizzas.
Small = $8, Medium = $10, Large = $12
To find the mean price of a pizza we cannot simply add the 3 prices and divide by 3. We must multiply each price by the weight of its size.

Example - Weighted Mean
Replace frequency with weight and find the weighted mean. Each pizza is weighted by how many were purchased.
$\bar{X} = \frac{\sum X \cdot w}{\sum w} = \frac{8 \cdot 3 + 10 \cdot 4 + 12 \cdot 2}{9} = \$9.78$
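The pizza example can be verified in a few lines of Python. This is a sketch of the arithmetic only; the list names are illustrative.

```python
prices = [8, 10, 12]   # small, medium, large (dollars)
counts = [3, 4, 2]     # weights: how many of each size were bought

# Weighted mean: sum of price * weight divided by the sum of the weights.
weighted = sum(p * w for p, w in zip(prices, counts)) / sum(counts)

# Unweighted mean of the three prices, for comparison.
unweighted = sum(prices) / len(prices)

print(round(weighted, 2))  # 9.78
print(unweighted)          # 10.0
```

The two results differ because more of the cheaper pizzas were purchased, which pulls the weighted mean below the simple average of the three prices.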

Weighted Mean
Should we have a grouped frequency distribution, we do not have the actual data values, so we use the midpoint. Exactly as before:

Class   f   X_mp   f·X_mp
16-20   3   18      54
21-25   5   23     115
26-30   4   28     112
31-35   4   33     132
36-40   3   38     114

$\bar{X} = \frac{54 + 115 + 112 + 132 + 114}{19} = 27.7368 \approx 27.7$
How about we use the calculator?
Enter the X_mp in L1 and the frequencies in L2.
Find the mean: STAT, CALC, 1:1-Var Stats

Mean
- The mean uses all data values.
- The mean will vary less than the median from sample to sample, for samples of the same size taken from a given population.
- The mean will not often be an actual datum value.
- High or low values (outliers) will affect the mean, since it uses the actual data values.
- We cannot use the mean if our frequency distribution has open-ended classes.

Median
- Finds the center of the data spread.
- Does not use the actual data values; thus, it is not affected by outliers.
- Used when the data has an open-ended class.
- Best used when the center, or mid value, is needed.
- Also best when the data set has extreme values (outliers).

Mode
- Used when the most frequently occurring value is needed.
- Used for nominal data.
- Is not necessarily a unique value; there can be several modes.

Effects on Measures
Skewed data affects the mean. If the data is positively skewed, the mean will most likely be greater than the median, and greater than a majority of the values. The mode will probably be less than the median.
positively skewed: mean > median
[Figure: positively skewed distribution, with Md marked to the left of Mean.]
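The pull of a high value on the mean can be seen with the nine scores from the earlier median example, where 53 sits well above the rest. This is an illustrative check, not a general proof.

```python
import statistics

# Data from the earlier median example; 53 is a high value in the right tail.
data = [24, 18, 26, 32, 21, 53, 19, 28, 24]

mean = statistics.mean(data)      # 245 / 9
median = statistics.median(data)

print(round(mean, 1), median)  # 27.2 24
print(mean > median)           # True
```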

Effects on Measures
If the data is negatively skewed, the mean is less than the median, the mode is probably greater than the median, and the mean is less than the majority of the data values.
negatively skewed: mean < median
[Figure: negatively skewed distribution, with Mean marked to the left of Md.]

Unimodal and Symmetric
In a unimodal, symmetric distribution (such as an approximately normal one), the mean, median, and mode are all approximately equal.
Since mean = median, the mean is in the center of the distribution.
[Figure: symmetric distribution, with Md and Mean marked at the center.]

TI-84
You already know how to find the mean and median on the TI-84. Simply put the data into a list and ask for the stats.
STAT, CALC, 1:1-Var Stats
List: L1
FreqList: (blank)
Calculate, ENTER
The output lists: $\bar{x}$=, $\Sigma x$=, $\Sigma x^2$=, Sx=, σx=, n=, minX=, Q1=, Med=, Q3=, maxX=

TI-84
To find the statistics for a grouped frequency distribution we must add a frequency list. Enter the data from the table into two lists, L1 and L2.

x   f
1   4
2   9
3   6
4   2
5   3

STAT, CALC, 1:1-Var Stats
List: L1 (2nd 1)
FreqList: L2 (2nd 2)
Calculate, ENTER

$\bar{x}$=2.625, $\Sigma x$=63, $\Sigma x^2$=201, Sx=1.244553351, σx=1.218349293, n=24, minX=1, Q1=2, Med=2, Q3=3, maxX=5
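The same summary values can be cross-checked away from the calculator. The sketch below expands the frequency table into the 24 individual scores and uses Python's `statistics` module; it is not how the TI-84 works internally, only a way to confirm the output.

```python
import statistics

x = [1, 2, 3, 4, 5]   # L1: data values
f = [4, 9, 6, 2, 3]   # L2: frequencies

# Expand the table into one entry per observation.
data = [xi for xi, fi in zip(x, f) for _ in range(fi)]

n = len(data)                          # 24
sum_x = sum(data)                      # 63
sum_x2 = sum(v * v for v in data)      # 201
mean = statistics.mean(data)           # 2.625
sx = statistics.stdev(data)            # sample standard deviation (Sx)
sigma_x = statistics.pstdev(data)      # population standard deviation (sigma x)
med = statistics.median(data)          # 2.0

print(n, sum_x, sum_x2, mean, med)
print(round(sx, 4), round(sigma_x, 4))  # 1.2446 1.2183
```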

Handedness
Please indicate which hand you use for each of the following activities by putting a + in the appropriate column, or ++ if you would never use the other hand for that activity. If in any case you are really indifferent, put + in both columns.
Some of the activities require both hands. In these cases the part of the task, or object, for which hand preference is significant is indicated in parentheses.
Thus you will mark a + in one column (preferred hand, but occasionally use the other hand), a ++ in one column (never use the other hand), or a + in both columns (use both hands approximately the same for the task).

Task                                      Left   Right
Writing
Drawing
Throwing
Scissors
Toothbrush
Knife (without fork)
Spoon
Broom (upper hand)
Striking match (hand holding the match)
Opening box (hand holding lid)
Total

Right - Left:
Right + Left:

Create a Left and a Right score by counting the total number of + signs in each column. Your handedness score is $\frac{\text{Right} - \text{Left}}{\text{Right} + \text{Left}}$: thus, a pure right-hander will have a score = 1, and a pure left-hander will score = -1.
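The scoring rule could be sketched as a small function. `handedness_score` is a hypothetical name, and the inputs are the column totals of + signs; the function assumes at least one + was marked, so the denominator is not zero.

```python
def handedness_score(right, left):
    """Handedness score (Right - Left) / (Right + Left), from -1 to 1."""
    return (right - left) / (right + left)

# Ten tasks marked ++ in one column only give a total of 20 in that column.
print(handedness_score(20, 0))   # 1.0  (pure right-hander)
print(handedness_score(0, 20))   # -1.0 (pure left-hander)
print(handedness_score(10, 10))  # 0.0  (no preference)
```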

Handedness
Record your results on the board. Using the class data, find the mean and median handedness values.
Next, create a grouped frequency distribution (by 10ths) and draw a histogram of the class data.
Finally, find the mean and median from the grouped frequency distribution. How do the results compare?

Handedness
[Figure: histogram of the class handedness scores, horizontal axis from -1 to 1.]
Mean = .56
Median = .67
Mode = .60-.89