Descriptive Statistics (Part 2) Lecture 4 Justin Kern January 31, 2018
Mean Interpretation: What can we say about a data set based on the mean?
Example: In Fall 2016, the average final grade for this course was 83. Are the following claims definitely true?
  In Fall 2016, some students received greater than 72: YES
  In Fall 2016, all students received at least 83: NO
  In Fall 2016, some students received 83 or higher: YES
  This semester, some students will receive at least 83: NO
  In Fall 2016, at least one student received exactly 83: NO
  In Fall 2016, the number of students who received 83 or higher was equal to the number of students who received 83 or lower: NO
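A small counterexample shows why several of these claims fail. The sketch below uses five made-up grades (hypothetical values, not the actual Fall 2016 data) whose mean is exactly 83. No student scored exactly 83, and the counts on either side of 83 differ.

```python
from statistics import mean

# Hypothetical grades chosen only so that the mean is 83; not real course data.
grades = [70, 75, 80, 90, 100]

avg = mean(grades)                              # (70 + 75 + 80 + 90 + 100) / 5
n_exactly_83 = sum(g == 83 for g in grades)     # students with exactly 83
n_at_least_83 = sum(g >= 83 for g in grades)    # students with 83 or higher
n_at_most_83 = sum(g <= 83 for g in grades)     # students with 83 or lower

print(avg, n_exactly_83, n_at_least_83, n_at_most_83)
```

The two YES claims hold for any data set with mean 83: if every grade were below 83, the mean would also be below 83, so at least one grade must be 83 or higher (and therefore greater than 72).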
Comparisons of Mean, Median & Mode: Measures of Central Tendency
Mean
  Only one possible value for a data set
  Very sensitive to extreme values or any change in data
Median
  Only one possible value for a data set
  Not sensitive to extreme values in data or replacement of values at extremes
  More sensitive to removal or addition of values in data set
Mode
  None, one, or more possible values for a data set
  Not sensitive to extreme values
  Not sensitive to replacement of values at extremes (unless the mode is an extreme value)
  Slightly sensitive to removal or addition of values in data set
Comparisons of Mean, Median & Mode: Sensitivity to changes in data
Original data: 57, 63, 75, 75, 75, 75, 92
  Mean = 73.1, Median = 75, Mode = 75
Largest value replaced (92 becomes 96): 57, 63, 75, 75, 75, 75, 96
  Mean = 73.7, Median = 75, Mode = 75
Largest value removed: 57, 63, 75, 75, 75, 75
  Mean = 70, Median = 75, Mode = 75
Comparisons of Mean, Median & Mode: Sensitivity to changes in data
Original data: 57, 63, 63, 63, 75, 75, 92
  Mean = 69.7, Median = 63, Mode = 63
Smallest value replaced (57 becomes 47): 47, 63, 63, 63, 75, 75, 92
  Mean = 68.3, Median = 63, Mode = 63
Smallest value removed: 63, 63, 63, 75, 75, 92
  Mean = 71.8, Median = 69, Mode = 63
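The statistics on this slide can be checked with Python's standard `statistics` module. This is a minimal sketch: the helper name `summarize` is made up for illustration. The third data set is read as the second with its smallest value removed, an interpretation consistent with the reported mean of 71.8 and median of 69.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return (mean rounded to 1 decimal, median, list of modes)."""
    return round(mean(data), 1), median(data), multimode(data)

original = [57, 63, 63, 63, 75, 75, 92]
replaced = [47, 63, 63, 63, 75, 75, 92]   # smallest value 57 replaced by 47
removed = [63, 63, 63, 75, 75, 92]        # smallest value dropped entirely

for label, data in [("original", original),
                    ("replaced", replaced),
                    ("removed", removed)]:
    print(label, summarize(data))
```

Replacing the extreme value moves only the mean. Removing it shifts the middle position of the sorted data, so the median moves as well, while the mode stays at 63.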
Comparisons of Mean, Median & Mode: Distributions
  Symmetrical: Mean = Median
  Positively Skewed: Mean > Median
  Negatively Skewed: Mean < Median
Which measure to use?
Depends on the situation and what you are interested in.
Mean is the most widely used statistic
  Familiar to most people
  Has the most desirable statistical properties
  Can be misleading (very sensitive to extreme values or outliers)
Example: think about mean household income. The median is more indicative of the typical (average?) American family, because a few extremely wealthy households pull the mean upward.
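The household income point can be illustrated with a short sketch. The incomes below are invented for illustration and are not real survey data. One very large value is enough to pull the mean far above what most households in the list earn.

```python
from statistics import mean, median

# Hypothetical annual household incomes in dollars; illustrative values only.
incomes = [32_000, 41_000, 48_000, 55_000, 63_000, 70_000, 2_500_000]

avg = mean(incomes)
mid = median(incomes)
n_above_mean = sum(x > avg for x in incomes)  # households earning more than the mean

print("mean:  ", round(avg))
print("median:", mid)
print("households above the mean:", n_above_mean)
```

Here only one of the seven households earns more than the mean, while the median sits in the middle of the other six. This is the positively skewed case, where the mean exceeds the median.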
Measures of Spread
When describing a distribution of scores, a measure that describes the center of the distribution (or most common value or values) is key.
It is also important to know how the values in the distribution are dispersed around the center.
Common terms:
  Measures of Dispersion
  Measures of Spread
  Variability
The Range
Range: The range is the difference between the largest value (L) and the smallest value (S) in a data set.
  Range = L - S
Example: Suppose we have five scores: 1, 2, 3, 4, 5. What is the range?
  5 - 1 = 4
This is a very simple measure of spread that only accounts for two values in the data set. It is not very useful for data with outliers.
Example: 1, 2, 3, 4, 100
  The range here is 100 - 1 = 99.
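The range formula translates directly into code. In this minimal sketch the function name `value_range` is chosen for illustration, and it is applied to the two example data sets from this slide.

```python
def value_range(data):
    """Range = largest value (L) minus smallest value (S)."""
    return max(data) - min(data)

print(value_range([1, 2, 3, 4, 5]))    # 5 - 1
print(value_range([1, 2, 3, 4, 100]))  # 100 - 1
```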
Percentiles (and Quartiles)
Percentile: The value below which a given percentage of observations in a group of observations fall. Percentiles divide the data into 100 equal parts.
  Example: the 20th percentile is the value below which 20 percent of the observations may be found.
Quartile: A measure that divides the data into four equal parts.
Special Percentiles:
  Minimum = 0th percentile
  Lower Quartile = 25th percentile (Q1)
  Median = 50th percentile (Q2)
  Upper Quartile = 75th percentile (Q3)
  Maximum = 100th percentile
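The percentile definition can be read in reverse: given a value, what percentage of observations fall below it? The sketch below assumes a "strictly below" convention. The function name `percent_below` and the scores are hypothetical, and textbooks and software differ on how ties and endpoints are handled.

```python
def percent_below(data, value):
    """Percentage of observations strictly below `value`."""
    return 100 * sum(x < value for x in data) / len(data)

# Hypothetical scores, used only for illustration.
scores = [52, 58, 61, 67, 70, 74, 79, 83, 88, 95]

# Two of the ten scores (52 and 58) are below 61.
print(percent_below(scores, 61))
```

Under this convention the minimum has 0 percent of observations below it, which matches the slide's "Minimum = 0th percentile".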
Quartiles (Five-number Summary)
  Minimum: Smallest value of data set
  Lower Quartile (Q1): Median of lower half of data set (left of median)
  Median (Q2): Middle value of data set
  Upper Quartile (Q3): Median of upper half of data set (right of median)
  Maximum: Largest value of data set
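The median-of-halves rule above can be sketched as follows. Two assumptions are made here: the data set has at least two values, and when the number of values is odd the median itself is excluded from both halves (one reading of "left of median" and "right of median"). The function name and the scores are hypothetical. Statistical software often uses interpolation rules that give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]        # values left of the median
    upper = values[n - half:]    # values right of the median
    return values[0], median(lower), median(values), median(upper), values[-1]

# Hypothetical scores, used only for illustration.
scores = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_summary(scores))
```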
Interquartile Range
Interquartile range: It is the difference between the upper and lower quartiles.
  IQR = Q3 - Q1
It gives a measure of the spread within the middle 50% of a distribution.
Its corresponding measure of central tendency is the median.
Because the IQR excludes the top and bottom 25% of scores, outliers have little influence on it.
On the other hand, its calculation implicitly only uses 50% of all values. The other half are left out.
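To see the IQR's resistance to outliers next to the range, the sketch below computes both for two hypothetical score lists that differ only in their largest value. It assumes quartiles are taken as medians of the lower and upper halves, with the median excluded when the count is odd. The function name `iqr` is chosen for illustration.

```python
from statistics import median

def iqr(data):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                  # median of the lower half
    q3 = median(values[len(values) - half:])    # median of the upper half
    return q3 - q1

# Hypothetical scores; the second list swaps the largest value for an outlier.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(iqr(scores), max(scores) - min(scores))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

The range jumps from 8 to 99 when the outlier is introduced, while the IQR stays at 5 because the outlier lies outside the middle 50% of the data. With very small data sets an outlier can still enter a quartile calculation, so this robustness is a tendency, not a guarantee.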