Measures of Central Tendency

A measure of central tendency is a value used to represent the typical or average value in a data set.

The Mean

The mean is the sum of all data values divided by the number of values in the data set. The mean of a sample data set is denoted by $\bar{X}$ and the mean of a population data set by the Greek letter $\mu$.

$$\bar{X} = \frac{\sum x}{n} \qquad \mu = \frac{\sum x}{N}$$

Example 1: Find the mean of the following data set:

Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8

$$\bar{X} = \frac{\sum x}{n} = \frac{76}{11} \approx 6.9$$

When calculating the mean from a frequency distribution, this becomes

$$\bar{X} = \frac{\sum x f}{n} = \frac{\sum x f}{\sum f}$$

Mean for Grouped Data

The mean for grouped data is calculated by multiplying each class midpoint $X_m$ by its class frequency $f$, summing the products, and dividing by the total frequency $n$:

$$\bar{X} = \frac{\sum f X_m}{n}$$
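The mean formulas above can be sketched in Python. This is a minimal illustration rather than part of the notes: the function names `mean` and `mean_from_frequencies` are hypothetical, and the quiz scores are assumed to be the eleven values whose sum is 76.

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def mean_from_frequencies(freq):
    """Mean of a frequency distribution {value: frequency}: sum(x*f) / sum(f)."""
    total = sum(x * f for x, f in freq.items())
    n = sum(freq.values())
    return total / n

# Quiz scores, assumed to be the eleven values whose sum is 76.
scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]
print(round(mean(scores), 1))  # 6.9

# The same data tallied as a frequency distribution gives the same mean.
freq = Counter(scores)
print(round(mean_from_frequencies(freq), 1))  # 6.9
```

Tallying first and then weighting each distinct score by its count is the same calculation as summing the raw scores, which is why the two results agree.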
Example 2: Miles Run

Below is a frequency distribution of miles run per week. Find the mean.

Class Boundaries    Frequency
5.5-10.5            1
10.5-15.5           2
15.5-20.5           3
20.5-25.5           5
25.5-30.5           4
30.5-35.5           3
35.5-40.5           2
                    Σf = 20

Solution

Class Boundaries    Frequency f    Midpoint X_m    f · X_m
5.5-10.5            1              8               8
10.5-15.5           2              13              26
15.5-20.5           3              18              54
20.5-25.5           5              23              115
25.5-30.5           4              28              112
30.5-35.5           3              33              99
35.5-40.5           2              38              76
                    Σf = 20                        Σ f·X_m = 490

$$\bar{X} = \frac{\sum f X_m}{n} = \frac{490}{20} = 24.5 \text{ miles}$$

Weighted Mean

Sometimes you must find the mean of a data set in which not all values are equally represented. In such cases we compute the weighted mean: we multiply each value by its corresponding weight and divide the sum of the products by the sum of the weights.
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n}{w_1 + w_2 + w_3 + \dots + w_n} = \frac{\sum wx}{\sum w}$$

where $w_1, w_2, \dots, w_n$ are the weights and $x_1, x_2, \dots, x_n$ are the values.

Examples:

Grade point average. We assign the letter grades the number values A=4, B=3, C=2, D=1, F=0, and then each grade value is counted into the GPA according to the number of credits earned with that grade.

Course grade. Suppose the final grade in a course is calculated according to the following scale: Homework counts for 5%, 3 exams count 0% each, and the final exam is worth 5%. We can weight the score for each component of the final grade with its percentage to calculate the final grade.

Properties of the Mean

1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero.

2. If $\bar{x}$ is the mean of a set $x_1, \dots, x_n$ of $n$ numbers and $\bar{y}$ is the mean of another set $y_1, \dots, y_m$ of $m$ numbers, then $\bar{x}_c$, the mean of the combined set, is given by:

$$\bar{x}_c = \frac{n\bar{x} + m\bar{y}}{n + m}$$

The Median

The median is the value of the middle term when all values are arranged in ascending or descending order. It is the value which separates the largest 50% of data values from the lowest 50%. In a histogram, half of the area is on either side of the median.
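The weighted mean and the combined-mean property of the mean can be sketched in Python. This is an illustrative sketch only: `weighted_mean` is a hypothetical helper, the credits and the two set means are invented for the example, and the Miles Run midpoints and frequencies are assumed to be the values that give a total of 490 over 20 runners.

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade values weighted by course credits.
grade_values = [4, 3, 2, 4]   # A, B, C, A
credits = [3, 4, 3, 2]
gpa = weighted_mean(grade_values, credits)
print(round(gpa, 2))  # 3.17

# The grouped-data mean is a weighted mean of class midpoints,
# with the class frequencies as weights (Miles Run example).
midpoints = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]
miles_mean = weighted_mean(midpoints, frequencies)
print(miles_mean)  # 24.5

# Combined mean of two sets: weight each set's mean by its size.
# Hypothetical sets: 10 values with mean 6.0 and 20 values with mean 9.0.
combined = weighted_mean([6.0, 9.0], [10, 20])
print(combined)  # 8.0
```

Writing the grouped mean and the combined mean through the same helper shows that both are weighted means, with frequencies or set sizes acting as the weights.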
If the number of values, $n$, is odd, the middle value is the median. If $n$ is even, the mean of the two middle values is the median.

Example 3: The following data set represents the quiz scores of a group of students.

Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8

Find the median value for the set of quiz scores. Find the median if the low score of 1 is dropped.

Example 4: Find the median of the following set of data.

Marks: 4 5 6 7 8 9 10
Frequency: 3 0 8 3

Median with Grouped Data

Since the median divides the frequency histogram into two equal areas, it can be located on the histogram as the value at which a vertical line splits the total area in half. The median can also be estimated using the following formula:

$$\text{median} = l + \left(\frac{\frac{n}{2} - f}{f_m}\right) c$$

where
l = lower class boundary of the median group
n = total frequency
f = cumulative frequency of the groups before the median group
f_m = frequency of the median group
c = class width of the median group

Example 5: The temperature of a component was monitored at regular intervals on 80 occasions. The frequency distribution was as follows:
Temperature x (°C)    Frequency f
30.0-30.2             6
30.3-30.5             12
30.6-30.8             15
30.9-31.1             20
31.2-31.4             13
31.5-31.7             9
31.8-32.0             5

Find the median using:
a) the histogram method
b) the formula

The Mode

The mode of a data set is the value of the variable that occurs most often. A data set can also have more than one mode or no mode at all.

Example 6: The set .3 .4 .8 .3 4.5 3. has .3 as the mode. It is unimodal.

Example 7: The set 3 4 7 8 3 has no mode.

Example 8: The set 3 5 5 5 7 7 8 8 8 has two modes, 5 and 8. It is bimodal.

For grouped data, the mode is computed as follows. First the modal group is identified. Let

l = lower class boundary of the modal group
c = class width
f_m = frequency of the modal group
f_{m-1} = frequency of the group preceding the modal group
f_{m+1} = frequency of the group after the modal group

and write $\Delta_1 = f_m - f_{m-1}$ and $\Delta_2 = f_m - f_{m+1}$. Then

$$\text{mode} = l + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$$

The mode can also be found using a histogram. Once the modal class has been identified, the value of the mode itself lies within that range and can be found by a simple construction.
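For ungrouped data, the median and mode rules can be checked with Python's standard `statistics` module. The sketch below is illustrative only; the quiz scores are assumed to be the eleven values whose sum is 76, and the second list is the bimodal set from Example 8.

```python
import statistics

scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]  # assumed quiz scores (sum 76)

# Odd number of values: the median is the middle value of the sorted data.
median_all = statistics.median(scores)
print(median_all)  # 7

# Dropping the lowest score leaves an even number of values,
# so the median is the mean of the two middle values.
trimmed = sorted(scores)[1:]
median_trimmed = statistics.median(trimmed)
print(median_trimmed)  # 7.5

# multimode returns every value tied for the highest count.
modes = statistics.multimode([3, 5, 5, 5, 7, 7, 8, 8, 8])
print(modes)  # [5, 8]
```

One difference in convention is worth noting: when every value occurs exactly once, `statistics.multimode` returns all of the values, whereas these notes describe such a set as having no mode.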
Example 9: The masses of 50 castings gave the following frequency distribution.

Mass x (kg)    Frequency f
10-12          3
13-15          7
16-18          16
19-21          10
22-24          8
25-27          5
28-30          1

If we draw the histogram, using central values as the midpoints of the bases of the rectangles, we obtain

[Figure: histogram of the casting masses, with diagonals AD and BC drawn across the modal class]

The modal class is the third class, with boundaries 15.5 and 18.5 kg. The two diagonal lines AD and BC are drawn as shown. The x value of their point of intersection is taken as the mode of the set of observations. For this case the mode = 17.3 kg.

Exercise: Find the mode of the frequency distribution above using the formula.

Note: The mode is the only measure of central tendency that can be used to find the most typical case when the data are categorical. The mode is not a very good measure of center, as it is not based on all observations.
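Both grouped-data formulas, for the median and for the mode, interpolate inside a single class. The sketch below is one possible way to code them, not a definitive implementation: `grouped_median` and `grouped_mode` are hypothetical names, classes are passed as (lower boundary, upper boundary) pairs, and the castings frequencies are assumed to be 3, 7, 16, 10, 8, 5, 1 (which total 50).

```python
def grouped_median(boundaries, freqs):
    """Median = l + ((n/2 - f) / f_m) * c, using class boundaries."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f_m in zip(boundaries, freqs):
        if f_m > 0 and cumulative + f_m >= n / 2:
            return lower + (n / 2 - cumulative) / f_m * (upper - lower)
        cumulative += f_m
    raise ValueError("empty frequency distribution")

def grouped_mode(boundaries, freqs):
    """Mode = l + (d1 / (d1 + d2)) * c, d1 = f_m - f_(m-1), d2 = f_m - f_(m+1)."""
    m = freqs.index(max(freqs))                  # first class with the largest frequency
    before = freqs[m - 1] if m > 0 else 0        # no preceding class: treat as 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Castings data: classes 10-12, 13-15, ..., 28-30 kg have boundaries 9.5-12.5, ...
boundaries = [(9.5 + 3 * i, 12.5 + 3 * i) for i in range(7)]
freqs = [3, 7, 16, 10, 8, 5, 1]  # assumed frequencies, total 50

mode = grouped_mode(boundaries, freqs)
median = grouped_median(boundaries, freqs)
print(round(mode, 1))    # 17.3
print(round(median, 2))  # 18.31
```

The formula value for the mode agrees with the 17.3 kg read from the histogram construction. As sketched, `grouped_mode` divides by zero when the modal class has the same frequency as both of its neighbours, so a production version would need to handle that case.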
Properties of Mean, Median, and Mode

The mean is the most commonly used measure of central tendency. One drawback of the mean is that it is heavily influenced by a few very high or very low data values (extremes or outliers). In these cases it is more common to use the median, e.g. for household income in Kenya.

The mode has the advantage that it can be used even when a data set contains only qualitative data. A disadvantage is that a data set may not have a mode.

Of the three measures of center, only the mean is based on all observations.

Shapes of Data Distributions

1. Symmetric

The data distribution has approximately the same shape on either side of a central dividing line. The mean and median (and mode, if unimodal) are equal in a symmetric distribution.

[Figure: histogram of a symmetric distribution]

Examples: Men's heights, SAT Math scores
2. Left-Skewed

A few data values are much lower than the majority of values in the set (the tail extends to the left). Generally the mean is less than the median (and mode) in a left-skewed distribution.

[Figure: histogram of a left-skewed distribution]

Example: Exam scores with a few students doing poorly
3. Right-Skewed

A few data values are much higher than the majority of values in the set (the tail extends to the right). Generally the mean is greater than the median (and mode) in a right-skewed distribution.

[Figure: histogram of a right-skewed distribution]

Examples: Personal income in Kenya, men's weights

Question: Homes in a certain area have a mean price of Kshs 0 million but a median price of Kshs .5 million. How can you explain this best?
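The pull of a long right tail on the mean can be checked numerically. The sketch below uses invented figures (hypothetical monthly incomes in thousands of shillings), chosen only so that one value lies far above the rest.

```python
import statistics

# Hypothetical incomes; the last value is an extreme high outlier.
incomes = [20, 22, 25, 27, 30, 32, 35, 400]

mean_with = statistics.mean(incomes)
median_with = statistics.median(incomes)
print(mean_with)    # 73.875
print(median_with)  # 28.5

# Removing the outlier moves the mean a long way but barely moves the median.
without = incomes[:-1]
mean_without = statistics.mean(without)
median_without = statistics.median(without)
print(round(mean_without, 2))  # 27.29
print(median_without)          # 27
```

In this made-up sample the single large value drags the mean well above the median, which is the pattern described for right-skewed data such as incomes or house prices.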
Measures of Position

Fractiles divide a data set into consecutive intervals so that each interval has (at least approximately) the same number of data values. The most common fractiles are:

Quartiles divide a data set into fourths. For example, the lower quartile, $Q_1$, is found a quarter of the way through the observations when they are arranged in ascending order, while the upper quartile, $Q_3$, is found three quarters of the way through.

Example: The set 6 20 : 30 4 : 48 50 : 56 6 has

$$Q_1 = \frac{20 + 30}{2} = 25 \qquad \text{and} \qquad Q_3 = \frac{50 + 56}{2} = 53$$

For grouped data, the values of $Q_1$ and $Q_3$ are computed using the following formulae:

$$Q_1 = l_{Q_1} + \left(\frac{\frac{n}{4} - f}{f_{Q_1}}\right) c \qquad Q_3 = l_{Q_3} + \left(\frac{\frac{3n}{4} - f}{f_{Q_3}}\right) c$$

The symbols in these two equations have the same meanings as in the median formula, applied to the class that contains the quartile.

Example: Find the values of $Q_1$ and $Q_3$ of the following hypothetical data.

Class: 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency: 0 5 36 4 9 5
Percentiles divide an ordered data set into 100 equal parts. For example, the 36th percentile is the value which separates the lowest 36% of data values from the highest 64% of data values and is denoted by $P_{36}$. A percentile rank for a datum represents the percentage of data values below the datum.

$$\text{Percentile rank of } X = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$$

Deciles divide a data set into 10 equal parts. For example, the 7th decile is the value which separates the lowest 7/10 of ordered data values from the highest 3/10 of data values and is denoted $D_7$.

Note: There are 99 percentiles $P_1$-$P_{99}$, 3 quartiles $Q_1$-$Q_3$, and 9 deciles $D_1$-$D_9$.

$$P_{50} = Q_2 = D_5 = \text{Median}$$
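The percentile-rank formula, and the fact that quartiles, deciles, and percentiles of grouped data all use the same interpolation as the median, can be sketched in Python. This is an illustration under stated assumptions: `percentile_rank` and `grouped_fractile` are hypothetical names, the quiz scores are assumed to be the eleven values whose sum is 76, and the grouped data are the Miles Run classes with assumed frequencies 1, 2, 3, 5, 4, 3, 2.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100

def grouped_fractile(boundaries, freqs, p):
    """Value below which a fraction p (0 < p < 1) of a grouped distribution lies."""
    n = sum(freqs)
    target = p * n          # n/4 for Q1, n/2 for the median, 3n/4 for Q3
    cumulative = 0          # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

scores = [1, 5, 5, 6, 7, 7, 8, 8, 9, 10, 10]  # assumed quiz scores, sorted
rank_of_9 = percentile_rank(scores, 9)
print(round(rank_of_9))  # 77

# Miles Run classes as (lower boundary, upper boundary) pairs.
boundaries = [(5.5 + 5 * i, 10.5 + 5 * i) for i in range(7)]
freqs = [1, 2, 3, 5, 4, 3, 2]

q1 = grouped_fractile(boundaries, freqs, 0.25)
q2 = grouped_fractile(boundaries, freqs, 0.50)   # also P50 and D5
q3 = grouped_fractile(boundaries, freqs, 0.75)
print(round(q1, 2), q2, q3)  # 18.83 24.5 30.5
```

Using one function with a fraction `p` is a design choice for the sketch: it makes the identity $P_{50} = Q_2 = D_5$ = median immediate, since all four correspond to `p = 0.5`.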
Measures of Dispersion (Spread)

The mean, median and mode give important information about where the general mass of the data lies; however, they do not tell us anything about how spread out the observations are from the central values.

The set 26, 27, 28, 29, 30 has a mean of 28, and the set 5, 19, 20, 36, 60 also has a mean of 28.

These two sets have the same mean, but clearly the first is more tightly arranged around the mean than the second. We therefore need a measure to indicate the spread of the values about the mean.

Common Measures of Spread

1. Range

The range is the difference between the largest and smallest data values in a data set.

range = (highest value - lowest value)

For a grouped frequency distribution, the range is the difference between the lower limit of the lowest class and the upper limit of the highest class.

The range deals only with the extreme values, which may be outliers. It takes no account of the intermediate values and is therefore considered the poorest measure of dispersion.

2. Quartile Deviation

Let $Q_1$ and $Q_3$ be the lower and upper quartiles. The difference $Q_3 - Q_1$ is called the Interquartile Range. Half the interquartile range, denoted by $Q$, is the quartile deviation, i.e.

$$Q = \frac{1}{2}(Q_3 - Q_1)$$

The quartile deviation deals only with the middle 50 percent of the data and ignores the rest. It is therefore not a very good measure of spread, though better than the range.
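The range and the quartile deviation of the two sets with mean 28 can be compared in a short Python sketch. The helper names are hypothetical, the second set is assumed to read 5, 19, 20, 36, 60, and the quartiles are taken as the medians of the lower and upper halves of the ordered data (excluding the middle value when the count is odd), which is only one of several quartile conventions.

```python
import statistics

def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

def quartile_deviation(values):
    """Half the interquartile range, with quartiles as medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2          # size of each half; middle value excluded if odd
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return (q3 - q1) / 2

tight = [26, 27, 28, 29, 30]
wide = [5, 19, 20, 36, 60]   # assumed values; both sets have mean 28

print(data_range(tight), data_range(wide))                  # 4 55
print(quartile_deviation(tight), quartile_deviation(wide))  # 1.5 18.0
```

Both measures are much larger for the second set, even though the two means are identical.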
3. Standard Deviation

The standard deviation is the most commonly used measure of dispersion. It takes into account the deviation of every data value from the mean. Standard deviation is the root mean square (r.m.s.) of the deviations from the mean and is calculated as follows:

1. Calculate the mean of the data set.
2. Subtract the mean from each data value in the set. These values are called the deviations of the data values.
3. Square each of the deviations calculated in Step 2.
4. Take the mean of the squares calculated in Step 3.
5. Take the square root of the result of Step 4.

Example: Find the standard deviation of the data set of quiz scores:

Quiz Scores: 1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8

Definition: Standard Deviation

Let $X_1, X_2, X_3, \dots, X_n$ be observations with arithmetic mean $\bar{X}$; then the standard deviation, $S$ (or $\sigma$), is

$$S = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

If $X_1, X_2, X_3, \dots, X_n$ occur with respective frequencies $f_1, f_2, f_3, \dots, f_n$, then

$$S = \sqrt{\frac{1}{N}\sum_{i=1}^{n} (X_i - \bar{X})^2 f_i} \qquad \text{where } N = \sum_{i=1}^{n} f_i$$
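The five steps for the standard deviation translate directly into code. The sketch below is illustrative: `standard_deviation` is a hypothetical name, and the quiz scores are assumed to be the eleven values whose sum is 76. The result is cross-checked against `statistics.pstdev`, which also divides by n.

```python
import math
import statistics

def standard_deviation(values):
    """Root mean square of the deviations from the mean (divisor n)."""
    m = sum(values) / len(values)               # Step 1: the mean
    deviations = [x - m for x in values]        # Step 2: deviations from the mean
    squares = [d ** 2 for d in deviations]      # Step 3: squared deviations
    mean_square = sum(squares) / len(squares)   # Step 4: mean of the squares
    return math.sqrt(mean_square)               # Step 5: square root

scores = [1, 5, 7, 7, 6, 8, 10, 9, 5, 10, 8]  # assumed quiz scores
s = standard_deviation(scores)
print(round(s, 2))                          # 2.5
print(round(statistics.pstdev(scores), 2))  # 2.5
```

Note that `statistics.stdev` would give a slightly larger value, because it divides by n - 1 (the sample formula) rather than by n as in the steps above.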
For a grouped frequency distribution, the $X_i$ represent class midpoints (class marks) in the above formula.

Using the above formula can be quite tedious, especially when large sets of data are involved. An equivalent but simpler formula is:

$$S = \sqrt{\frac{\sum_{i=1}^{n} X_i^2 f_i}{N} - (\bar{X})^2}$$

Example: Determine the standard deviation of the classified data below:

Class: 11-15 16-20 21-25 26-30 31-35 36-40
Frequency: 7 5 4 6

Note: If there are several sets of data of the same size but with different standard deviations, then the set with the smallest standard deviation has its observations most closely clustered around their arithmetic mean. This set of data has the lowest variability and is therefore the most consistent. Such a set of data is usually recommended for further analysis.

4. Variance

The variance is the square of the standard deviation, represented by $S^2$.

Exercises:

1. Find the standard deviation of the data set whose frequency distribution is given by:

Class    Frequency (f)
90-99    4
80-89    6
70-79    4
60-69    3
50-59
40-49
2. The lengths of 70 bars were measured and the following frequency distribution obtained:

Length x (mm): .-.4 .5-.7 .8-.0 .-.3
Frequency f: 3 5 0 6

Length x (mm): .4-.6 .7-.9 3.0-3.
Frequency f: 8 6

Find the mean and standard deviation of the data.

3. A set of 0 observations was found to have mean $\bar{X} = 40$ and $S = 5$. Subsequent verification revealed that two observations, 30 and 45, were wrong, while the correct observations were 54 and 4. Determine the correct values of the mean and standard deviation if
a) the wrong values were discarded and not replaced
b) the wrong values were replaced with the correct ones.

4. The mean height of students in a class is 5 cm. The mean height of the boys is 58 cm. The mean height of the girls is 48 cm. Determine the percentage of boys in the class.

5. Find the variance of the following data:

Length x (cm)    Frequency f
18-26            3
27-35            5
36-44            9
45-53
54-62            5
63-71            4
72-80
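As a closing check, the definition of the standard deviation for frequency data and the simpler equivalent formula can be compared numerically. The sketch below is illustrative: both function names are hypothetical, and the data are the Miles Run class midpoints with assumed frequencies 1, 2, 3, 5, 4, 3, 2 (mean 24.5).

```python
import math

def grouped_sd_definition(midpoints, freqs):
    """S = sqrt( sum(f * (X - mean)^2) / N ), with N = sum(f)."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n)

def grouped_sd_shortcut(midpoints, freqs):
    """S = sqrt( sum(f * X^2) / N - mean^2 )."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    mean_of_squares = sum(f * x ** 2 for x, f in zip(midpoints, freqs)) / n
    return math.sqrt(mean_of_squares - mean ** 2)

midpoints = [8, 13, 18, 23, 28, 33, 38]   # Miles Run class midpoints
freqs = [1, 2, 3, 5, 4, 3, 2]             # assumed frequencies, N = 20

s_def = grouped_sd_definition(midpoints, freqs)
s_short = grouped_sd_shortcut(midpoints, freqs)
print(round(s_def, 2), round(s_short, 2))  # 8.08 8.08
print(round(s_short ** 2, 2))              # 65.25 (the variance)
```

The shortcut needs only the running totals of f·X and f·X², which is why it is less tedious by hand, and the two results agree up to floating-point rounding.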