Teaching univariate measures of location-using loss functions


Original Article

Jon-Paul Paolino
Department of Mathematics and Computer Sciences, Mercy College, Dobbs Ferry, 10522, NY USA

Summary

This article presents a new method for introductory teaching of the sample mean, median and mode(s) from a univariate dataset. These basic statistical concepts are taught at various levels of education, from elementary school curriculums to courses at the tertiary level. These descriptive measures of location can be taught as optimized solutions to certain loss functions. Although proving these results requires some understanding of derivatives, as used in a first-year calculus course, the insight gained is valuable for higher level statistical thinking. Using the statistical computing software R, we visually illustrate the minimization of these loss functions using some example datasets.

Keywords: teaching statistics; mean; median; mode; loss function.

BACKGROUND AND MOTIVATION

The sample mean, median and mode(s) are descriptive measures of location that are introduced to students in elementary school and are also taught at the tertiary level across the world. Indeed, many tertiary level students need at least one introductory course in statistics to complete an undergraduate degree. At the elementary school level, the prerequisite for successfully calculating the sample mean, median and mode(s) is a basic working knowledge of counting and arithmetic. In the introductory tertiary level statistics course, this skill set is still assumed, as these skills are considered necessary for tertiary school admittance. Some institutions, however, offer introductory statistics courses that require differential and integral calculus as a prerequisite for enrollment. These courses teach topics such as probability as area under a curve and simple linear regression using least squares minimization.
The new teaching method proposed in this article is most suitable for students in tertiary learning environments, as it assumes a higher level of mathematical maturity. This article also demonstrates the application of this method by using different sequences of univariate data with varying distribution characteristics. Using R (R Core Team 2017), we show graphically how each of these three measures of location optimizes a particular loss function.

REVIEW OF THE MEASURES OF LOCATION FOR A SAMPLE

Sample mean

The sample mean is a descriptive measure that is calculated by adding all of the observed values and then dividing by the total number of observations. It is also commonly known as the average. The mean is also called the location of balance because the sum of the deviations above the mean equals the sum of the deviations below the mean. It can be calculated by using the formula below, where $x_i$ represents the $i$th observed data point from a sample of size $n$.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

Sample median

The sample median is a descriptive measure that separates the bottom 50% of the data from the top 50% of the data. The median is also referred to as the 50th percentile or the second quartile. Typically, it is taught that the data must be arranged in increasing order, where $x_{(i)}$ indicates the $i$th ordered data value. The formula below can be used to determine the location of the median:

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} \\[4pt] \dfrac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right) \end{cases}$$

The sample size $n$ determines the location of the median in the dataset. If $n$ is odd, then the top formula must be used. If $n$ is even, then the bottom formula must be used. Calculating the median is certainly not as onerous as calculating the mean.

Sample mode(s)

A sample mode is a descriptive measure, defined as one or more values that occur most frequently in the dataset. A dataset can have multiple modes, since there may be ties for the most frequently occurring data values. A mode can be used for qualitative or quantitative variables, unlike the mean and median, which can only be used for quantitative variables. Unless at least one data value occurs more than once, there is no mode. It must be remembered that if quantitative data are grouped, as in a histogram or stem-and-leaf presentation, the appearance of the graph depends on the grouping choice, and hence the most commonly occurring group is not uniquely defined. In this article, we use the symbol $\overset{\frown}{x}$ (x with an arc on the top) to indicate a sample mode. The calculation of sample mode(s) is rudimentary.

MEASURES OF LOCATION EXPRESSED AS LOSS FUNCTIONS

Mean as the balancing point of a distribution

As mentioned previously, the sample mean is the balancing point of the distribution. The function below can be used to illustrate this point of balance (i.e. where the deviation above equals the deviation below):

$$f(c) = \sum_{i=1}^{n} (x_i - c).$$

Subsequently, $f(c)$ must be set equal to zero and solved for $c$. By using basic summation properties, $\sum_{i=1}^{n} (x_i - c) = \sum_{i=1}^{n} x_i - nc$, so $f(c) = 0$ exactly when $c = \bar{x}$. The sample mean is therefore the point that makes the deviation scores sum to zero.

Mean as the minimizer of the squared loss deviations

The sample mean can be shown to be the point that minimizes the squared loss function. This point of minimization can be found either by setting the first derivative equal to zero or by using the vertex formula for parabolas. The squared loss function is shown below:

$$f(c) = \sum_{i=1}^{n} (x_i - c)^2.$$

Differentiating gives $f'(c) = -2\sum_{i=1}^{n} (x_i - c)$, which equals zero when $c = \bar{x}$. Equivalently, expanding the square gives the parabola $f(c) = nc^2 - 2c\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2$, whose vertex is at $c = \sum_{i=1}^{n} x_i / n = \bar{x}$.

Median as the minimizer of the absolute loss deviations

Next, the sample median minimizes the absolute loss function. This property is more difficult to show because it involves taking the derivative of an absolute value function. Consider the absolute loss function below:

$$f(c) = \sum_{i=1}^{n} |x_i - c|.$$

Taking the first derivative with respect to $c$, wherever it exists (that is, when $c$ is not equal to a data value), gives $f'(c) = -\sum_{i=1}^{n} \operatorname{sgn}(x_i - c)$. The next step is to find the value of $c$ which makes the summation equal to zero. Upon inspection of the terms $\operatorname{sgn}(x_i - c)$, it becomes clear that $c$ should equal the sample median. When $c = \tilde{x}$, $\operatorname{sgn}(x_i - c)$ will equal $+1$ the same number of times as it will equal $-1$, because the median separates the bottom 50% of the data from the top 50% of the data. It is also noteworthy that when the sample size is even, there may be multiple points around the median that minimize the sum of absolute deviations. This occurs because the derivative of the absolute loss function is zero everywhere between the two middle ordered values, so the function does not attain its minimum at a unique point.

Table 1. Sequence datasets with descriptive statistics used for demonstration

| Dataset (#) | Sample | Sample mean, $\bar{x}$ | Sample median, $\tilde{x}$ | Sample mode, $\overset{\frown}{x}$ |
|---|---|---|---|---|
| 1 | {2,3,3,4,4,4,5,5,5,5,6,6,6,7,7,8} | 5 | 5 | 5 |
| 2 | {3,3,4,4,4,4,4,5,5,5,6,6,6,6,6,7,7} | 5 | 5 | 4 and 6 |
| 3 | {1,1,1,1,2,2,2,3,3,4,4,4,5,5,5,5} | 3 | 3 | 1 and 5 |
| 4 | {0,1,2,4,5,8,8,8,9,9,9,9,9,10,10,11} | 7 | 8.5 | 9 |
| 5 | {0,1,1,2,2,2,2,2,3,3,3,6,7,9,10,11} | 4 | 2.5 | 2 |
| 6 | {0,0,0,0,0,0,0,1,1,1,1,1,2,2,2,3,3} | 1 | 1 | 0 |
| 7 | {0,0,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3} | 2 | 2 | 3 |
| 8 | {0,1,2,3,4,5,6,7,8,9,10} | 5 | 5 | No Mode |
| 9 | {0,1,2,3,4,5,6,7,8,9} | 4.5 | 4.5 | No Mode |
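The properties of the mean and median described in this section can also be checked numerically. The following is a minimal sketch rather than the article's own code: it takes Dataset 2 from Table 1 and uses base R's `optimize()` to search for the minimizer of each loss over the range of the data. The helper names `squared_loss` and `absolute_loss` are hypothetical and are introduced only for this illustration.

```r
# Dataset 2 from Table 1 (n = 17, an odd sample size, so the median is unique)
x <- c(3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7)

# Hypothetical helpers: each returns the loss at a single candidate value
squared_loss  <- function(cand, x) sum((x - cand)^2)
absolute_loss <- function(cand, x) sum(abs(x - cand))

# Balancing-point property: deviations about the mean sum to zero
balance <- sum(x - mean(x))

# One-dimensional numerical searches over the range of the data
fit_sq  <- optimize(squared_loss,  interval = range(x), x = x)
fit_abs <- optimize(absolute_loss, interval = range(x), x = x)

cat("Sum of deviations about the mean:", balance, "\n")
cat("Squared-loss minimizer: ", fit_sq$minimum,  " Sample mean:  ", mean(x),   "\n")
cat("Absolute-loss minimizer:", fit_abs$minimum, " Sample median:", median(x), "\n")
```

The numerical minimizers should agree with `mean(x)` and `median(x)` only up to the tolerance of the search. For a dataset with an even sample size, a search of this kind would be expected to return just one point from the interval of minimizers of the absolute loss.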

Mode as the minimizer of the zero-one loss indicator function

Next, we explain how a sample mode minimizes the zero-one loss indicator function. However, there are some drawbacks when explaining the loss function that it optimizes. Consider the zero-one loss indicator function below:

$$f(c) = \sum_{i=1}^{n} I_{x_i}(c), \qquad I_{x_i}(c) = \begin{cases} 1 & \text{if } x_i \neq c \\ 0 & \text{if } x_i = c. \end{cases}$$

The main shortcoming is that the indicator function, as shown above, is not continuous, and therefore, it is not differentiable. Consequently, the minimum value cannot be found by simply differentiating the function and then setting the derivative equal to zero. In order for $f(c)$ to be as small as possible, $I_{x_i}(c)$ must be zero as many times as possible, and this can be achieved by letting $c = \overset{\frown}{x}$. Choosing the data value that occurs most frequently will yield the greatest number of observed zeros, and therefore, minimize $f(c)$.

ILLUSTRATIVE EXAMPLES

Using some example datasets, we visually explain the loss function minimization principles using R software. These datasets are presented in Table 1. The first three datasets (Dataset 1, Dataset 2 and Dataset 3) are symmetric distributions. However, these datasets differ in that Dataset 1 is unimodal, Dataset 2 is bimodal, and Dataset 3 is bimodal with the modes at the extreme ends of the distribution. The next two datasets (Dataset 4 and Dataset 5) are distributions that are negatively skewed and positively skewed, respectively. Dataset 6 is an exponential distribution, and Dataset 7 is its mirror image, sometimes referred to as a J-shaped distribution. Finally, Dataset 8 and Dataset 9 are both uniform distributions, with an odd and even number of data points, respectively. Nine figures are shown below, one for each dataset, all generated in R. Within each figure, there are three separate panels.
The left panel shows the squared loss function, the centre panel shows the absolute loss function, and the right panel shows the zero-one loss function. Each graph shows the location where the loss function attains its minimum, and each of these locations corresponds to the value shown in Table 1.

Fig. 1. Graphical displays of loss functions for Dataset 1. Left panel: Squared Loss. Center panel: Absolute Loss.
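The zero-one loss can also be evaluated directly, without any calculus, by counting how many observations differ from each candidate value. The sketch below is an illustration only. It assumes that just the observed data values need to be tried as candidates, since any other value of c differs from every observation, and `zero_one_loss` and `sample_modes` are hypothetical helper names rather than functions from the article or from base R.

```r
# Zero-one loss: number of observations that differ from the candidate value
zero_one_loss <- function(cand, x) sum(x != cand)

# Return every observed value that attains the minimum zero-one loss.
# Following the article's convention, return NULL (no mode) when no value repeats.
sample_modes <- function(x) {
  candidates <- sort(unique(x))
  losses <- vapply(candidates, zero_one_loss, numeric(1), x = x)
  if (min(losses) == length(x) - 1) return(NULL)
  candidates[losses == min(losses)]
}

d2 <- c(3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7)  # Dataset 2 from Table 1
d8 <- 0:10                                                   # Dataset 8 from Table 1

print(sample_modes(d2))
print(sample_modes(d8))
```

Because ties are kept, a bimodal dataset such as Dataset 2 yields both of its modes, which matches the two equally deep dips that the zero-one loss would show for such data.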

Fig. 2. Graphical displays of loss functions for Dataset 2. Left panel: Squared Loss. Center panel: Absolute Loss.

Considering Dataset 1, the mean, median and mode of the dataset all equal 5, which corresponds to the minima of the loss functions shown in Figure 1. Considering Dataset 5, the mean, median and mode of the dataset are 4, 2.5 and 2, respectively, which correspond to the minima of the loss functions shown in Figure 5 (see also Figures 2-4).

Fig. 3. Graphical displays of loss functions for Dataset 3. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 4. Graphical displays of loss functions for Dataset 4. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 5. Graphical displays of loss functions for Dataset 5. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 6. Graphical displays of loss functions for Dataset 6. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 7. Graphical displays of loss functions for Dataset 7. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 8. Graphical displays of loss functions for Dataset 8. Left panel: Squared Loss. Center panel: Absolute Loss.

Fig. 9. Graphical displays of loss functions for Dataset 9. Left panel: Squared Loss. Center panel: Absolute Loss.

DISCUSSION

In this article, we discuss a different approach to teaching the mean, median and mode from a sample of data using loss function minimization. We explain how the sample mean and the sample median can be derived using differential calculus techniques, which are taught in a first-year calculus course. The graphs can also be used to demonstrate how the sample mode(s) should not be considered a measure of centre. In addition, the sample mode(s) can be poorly defined if other graphs such as histograms are used to present the data. We use the software R to explain these properties visually. Technology can often be a beneficial pedagogical resource in a statistics class, and its efficacy is shown in this article (Figures 6-9). The classical approaches to teaching measures of location, similar to those seen in an elementary statistics class, show how to compute the mean, median and mode(s) by hand, but these approaches typically do not include a demonstration such as the examples given in this article.

The loss function optimization methods discussed in this article offer additional insight when teaching the properties of the mean, median and mode. Finally, we hope that understanding these principles will help develop higher order statistical thinking skills that may become useful in future studies.

Reference

R Development Core Team. (2017), R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, Available at: org/

Appendix A: R Code Used for Producing Figures

The following R code can be used to produce the figures displayed in this article. We provide the code for Figure 1.

```r
# Save the current graphical parameters and set up a 1 x 3 panel layout
option <- par(mfrow = c(1, 3))

# Left panel: squared loss (data values as given in this listing: 3,4,4,5,5,5,6,6,7)
mean.1 = curve((3-x)^2 + (4-x)^2 + (4-x)^2 + (5-x)^2 + (5-x)^2 + (5-x)^2 +
               (6-x)^2 + (6-x)^2 + (7-x)^2,
               0, 10, n = 101, add = FALSE, type = "l",
               xlab = "c", ylab = "Squared Loss",
               cex.lab = 1.5, cex.axis = 1.5, cex.sub = 1.5)

# Centre panel: absolute loss
median.1 = curve(abs(3-x) + abs(4-x) + abs(4-x) + abs(5-x) + abs(5-x) + abs(5-x) +
                 abs(6-x) + abs(6-x) + abs(7-x),
                 0, 10, n = 101, add = FALSE, type = "l",
                 xlab = "c", ylab = "Absolute Loss",
                 cex.lab = 1.5, cex.axis = 1.5, cex.sub = 1.5)

# Right panel: zero-one loss. Rounding the grid keeps the exact comparisons
# (x == 3, x == 4, ...) reliable under floating-point arithmetic.
x <- round(seq(0, 10, 0.01), 2)
mode.1 <- (x >= 0 & x < 3) * 9 + (x == 3) * 8 + (x > 3 & x < 4) * 9 +
          (x == 4) * 7 + (x > 4 & x < 5) * 9 + (x == 5) * 6 +
          (x > 5 & x < 6) * 9 + (x == 6) * 7 + (x > 6 & x < 7) * 9 +
          (x == 7) * 8 + (x > 7 & x <= 10) * 9
plot(x, mode.1, type = "l", cex = 1.5, xlab = "c", ylab = "0-1 Loss",
     cex.lab = 1.5, cex.axis = 1.5, cex.sub = 1.5)

# Restore the original graphical parameters
par(option)
```
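The Figure 1 code writes out each data value by hand. As a possible generalization, and not part of the original appendix, the sketch below evaluates the same three loss functions for any numeric vector. The function names `loss_curves` and `plot_losses`, the default grid and the column names are all assumptions made for this illustration.

```r
# Evaluate the three loss functions on a grid of candidate values.
# The observed data values are added to the grid so that the downward spikes
# of the zero-one loss are not missed by a coarse grid.
loss_curves <- function(x, grid = seq(min(x) - 1, max(x) + 1, length.out = 201)) {
  grid <- sort(unique(c(grid, x)))
  data.frame(
    cand     = grid,
    squared  = vapply(grid, function(g) sum((x - g)^2), numeric(1)),
    absolute = vapply(grid, function(g) sum(abs(x - g)), numeric(1)),
    zero_one = vapply(grid, function(g) sum(x != g), numeric(1))
  )
}

# Draw the three panels side by side, then restore the graphical parameters.
plot_losses <- function(x) {
  curves <- loss_curves(x)
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))
  plot(curves$cand, curves$squared,  type = "l", xlab = "c", ylab = "Squared Loss")
  plot(curves$cand, curves$absolute, type = "l", xlab = "c", ylab = "Absolute Loss")
  plot(curves$cand, curves$zero_one, type = "l", xlab = "c", ylab = "0-1 Loss")
  invisible(curves)
}
```

Under these assumptions, calling `plot_losses(c(3, 4, 4, 5, 5, 5, 6, 6, 7))` should give panels similar in shape to those produced by the Figure 1 code, although the axis range and text sizes would differ because the grid and graphical settings are not the same.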
