Name:                    Date:                    Period:

Chapter 2 Section 1: Describing Location in a Distribution

Suppose you earned an 86 on a statistics quiz. Should you be satisfied with this score? What if it is the highest score in the class? What if it is below the class average? The teacher might even curve the grade. We will focus on describing the location of an individual within a distribution.

Let's consider the class scores below:

79 81 80 77 73 83 74 93 78 80 75 67 73
77 83 86 90 79 85 83 89 84 82 77 72

Here is a stemplot of the data. Notice that the distribution is roughly symmetric with no apparent outliers. Where does your score fall in comparison to everyone else's?

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03

Measuring Position: Percentiles

One way to describe your position is to tell what percent of students in the class earned scores below yours. That is, we can calculate the percentile.

Definition: Percentile
The pth percentile of a distribution is the value with p percent of the observations less than it.

Here, our score is the fourth from the top of the class. Since 21 of the 25 observations are below our score, it is at the 84th percentile of the test score distribution (21/25 = 0.84).

Using these scores, let's calculate the percentile of each of the following:
a) The score at 72.
b) The score at 93.
c) The two students at 80.

*Note: Some may define the pth percentile as the value with p percent of the observations less than or equal to it.
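The percentile calculation above can also be sketched in a few lines of Python. This is only an illustration of the worksheet's definition (percent of observations strictly less than the value); the function name percentile_of is made up for this example and is not part of any library.

```python
# Quiz scores from the worksheet (25 students).
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]


def percentile_of(value, data):
    """Percent of observations strictly less than value.

    This follows the worksheet's definition. Under the alternative
    "less than or equal to" definition in the note, the comparison
    would be x <= value instead.
    """
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


print(percentile_of(86, scores))  # 84.0, since 21 of 25 scores are below 86
```

The same function can be used to check your answers to parts a) through c) after you have worked them by hand.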
Cumulative Relative Frequency Graphs

There are some interesting graphs that can be made using percentiles. One of them starts with a frequency table for a quantitative variable. Here is a frequency table that summarizes the ages of the first 44 U.S. presidents when they were inaugurated:

Age     Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
40-44       2
45-49       7
50-54      13
55-59      12
60-64       7
65-69       3

The extra columns will be used to determine the relative frequency, cumulative frequency, and cumulative relative frequency.

To determine the relative frequency, divide the count in each class by the total and multiply by 100 to get a percentage.

To determine the cumulative frequency, add the counts in the frequency column for the current class and all classes with smaller values of the variable.

To determine the cumulative relative frequency, divide each entry in the cumulative frequency column by the total and multiply by 100 to get a percentage.

We can make a cumulative relative frequency graph of the data using the table.
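As a way to check a completed table, the three rules above can be sketched in Python. The variable names are chosen for this example only; the class labels and counts are the ones from the presidents table.

```python
from itertools import accumulate

# Age classes and counts from the presidents table.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freq = [2, 7, 13, 12, 7, 3]

total = sum(freq)

# Relative frequency: each count divided by the total, as a percentage.
rel = [100 * f / total for f in freq]

# Cumulative frequency: running total of the counts.
cum = list(accumulate(freq))

# Cumulative relative frequency: running total divided by the total, as a percentage.
cum_rel = [100 * c / total for c in cum]

print("Age    Freq  Rel%   Cum  CumRel%")
for row in zip(classes, freq, rel, cum, cum_rel):
    print("{:<6} {:>4} {:>5.1f} {:>5} {:>7.1f}".format(*row))
```

The last cumulative relative frequency should always come out to 100%, which is a quick check that no class was skipped.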
What can we learn from this graph?

Barack Obama was inaugurated at the age of 47. Is this unusually young?

Estimate and interpret the 65th percentile of the distribution.

Measuring Position: z-scores

Looking back at your test score, we know that it is above what seems to be the average. Let's use the test score data to determine the 1-variable statistics.

Mean:
Median:
Standard Deviation:

We can describe the location of your score by telling how many standard deviations above or below the mean it is. Since the mean is 80 and the standard deviation is about 6, the score of 86 is about one standard deviation above the mean. Converting observations in this manner is called standardizing.

Definition: Standardized value (z-score)
If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is

z = (x - mean) / (standard deviation)

A standardized value is often called a z-score.

Let's revisit the scores we calculated percentiles for and determine their z-scores.
a) The grade at 93
b) The grade at 72

We can also use z-scores for comparisons. Suppose you took a Chemistry test and got an 82. At first you might be disappointed, but your teacher described the scores as fairly symmetric with a mean of 76 and a standard deviation of 4. How does your score compare to your statistics grade?
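The standardizing formula can be sketched in Python to check work done by hand or on the calculator. This sketch assumes the sample standard deviation (the Sx value on a calculator), which statistics.stdev computes; the helper name z_score is made up for this example.

```python
import statistics

# Quiz scores from the worksheet (25 students).
scores = [79, 81, 80, 77, 73, 83, 74, 93, 78, 80, 75, 67, 73,
          77, 83, 86, 90, 79, 85, 83, 89, 84, 82, 77, 72]

mean = statistics.mean(scores)   # 80
sd = statistics.stdev(scores)    # sample standard deviation, about 6.07


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_stats = z_score(86, mean, sd)  # statistics quiz: 86 in this class
z_chem = z_score(82, 76, 4)      # chemistry test: mean 76, standard deviation 4

print(round(z_stats, 2), z_chem)
```

Because both scores are now on the same standardized scale, the two z-scores can be compared directly even though the tests had different means and spreads.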
Transforming Data

To find the standardized score (z-score) for an individual observation, we transformed the data value by subtracting the mean and dividing by the standard deviation. Transforming converts the observations from the original units of measurement to a standardized scale. What effect does transforming (adding or subtracting, multiplying or dividing) have on the shape, center, and spread of the entire distribution? Let's investigate.

Soon after the metric system was introduced in Australia, a group of students was asked to guess the width of their classroom to the nearest meter. Here are the guesses in order from lowest to highest:

8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13 14 14 14 15 15
15 15 15 15 15 15 16 16 16 17 17 17 17 18 18 20 22 25 27 35 38 40

Let's create a dotplot and examine the 1-variable statistics to describe the SOCS.

Shape:
Center:
Spread:
Outliers:

Effect of adding or subtracting a constant

The actual width of the room was 13 meters. How close were the students' guesses? We can examine the distribution of the students' guessing errors by defining a new variable:

error = guess - 13

That is, we subtract 13 from each observation. What do you think will happen to our distribution? How will it affect the SOCS? Let's use the calculator to display the effect.
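The same comparison can be sketched in Python instead of on the calculator. This is only an illustration: the summary helper is made up for this example, and it reports the sample standard deviation, which may be labeled differently on your calculator.

```python
import statistics

# Students' guesses of the classroom width, in meters (44 guesses).
guesses = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13,
           13, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17,
           17, 17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40]

# New variable: error = guess - 13 (the actual width was 13 meters).
errors = [g - 13 for g in guesses]


def summary(data):
    """A few measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "range": max(data) - min(data),
    }


before = summary(guesses)
after = summary(errors)
for name in before:
    print(f"{name:>6}: guesses {before[name]:7.2f}   errors {after[name]:7.2f}")
```

Compare the two columns of output: look at which measures shift by 13 and which stay the same.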
Effect of Adding (or Subtracting) a Constant
Adding the same number a (either positive, zero, or negative) to each observation
- adds a to measures of center and location (mean, median, quartiles, percentiles), but
- does not change the shape of the distribution or measures of spread (range, IQR, standard deviation).

Effect of multiplying or dividing by a constant

Since the metric system had only just been introduced, it may not be useful to tell the students they were wrong by a few meters. To put it in terms they may understand, we can convert the data into feet. There are roughly 3.28 feet in a meter, so for the student who had an error of -5 meters, this translates to

-5 meters x (3.28 feet / 1 meter) = -16.4 feet

So let's change the units of measurement from meters to feet. We need to multiply the error values by 3.28. What effect do you think this will have on the graph?

Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the same nonzero number b (positive or negative)
- multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) by b,
- multiplies (divides) measures of spread (range, IQR, standard deviation) by |b|, but
- does not change the shape of the distribution.

Connecting transformations and z-scores

How do transformations relate to z-scores? Finding z-scores combines two transformations: subtracting the mean from every score and then dividing by the standard deviation. Let's use the calculator to plot the z-scores. How do you think the distribution will change?

Density Curves

Since the very beginning, we have had a few steps for approaching data:
1) Plot your data: make a graph, usually a dotplot, stemplot, or histogram.
2) Look for the overall pattern (SOCS).
3) Calculate a numerical summary to describe the center and spread (mean/standard deviation or median/IQR).
We will add the following:
4) Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.
The following is a histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills (ITBS). A smooth curve is drawn on top as a good description of the overall pattern of the data. The region of scores of 6.0 or less is shaded so that it can be compared with the corresponding area in the graph on the right.

The total area of the histogram bars is 100% (a proportion of 1), since all the observations are represented. In moving from histogram bars to a smooth curve, we make a specific choice: adjust the scale of the graph so that the total area under the curve is exactly 1. Now the total area represents all the observations, just like the histogram. We can interpret areas under the curve as proportions of the observations.

Definition: Density Curve
A density curve is a curve that
- is always on or above the horizontal axis, and
- has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

Density curves come in many shapes. A density curve can give a good approximation of the overall pattern. Outliers, which are departures from the pattern, are not described by the curve.

*Note: No set of data is exactly described by a density curve. The curve is an approximation that is easy to use and accurate enough for practical use.

Describing Density Curves

Our measures of center and spread apply to density curves as well as to actual sets of observations. Areas under a density curve represent proportions of the total number of observations. The median of a data set is the point with half the observations on either side. So the median of a density curve is the equal-areas point: the point with half the area under the curve to its left and the remaining half to its right.
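To make the equal-areas idea concrete, here is a small Python sketch. It uses a made-up density curve, f(x) = 2x on the interval from 0 to 1, which is not the ITBS curve from the worksheet; it was chosen only because its area is easy to verify. The helper names density and area are also invented for this example.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b using n thin rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# A density curve must have total area 1 underneath it.
total = area(density, 0, 1)

# Find the equal-areas point by bisection: narrow in on the value m
# that has half of the area under the curve to its left.
lo, hi = 0.0, 1.0
for _ in range(40):
    mid = (lo + hi) / 2
    if area(density, 0, mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

print(round(total, 4), round(median, 4))
```

For this particular curve the area to the left of m is m squared, so the median should come out close to the square root of 0.5, which is about 0.707.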
Because density curves are idealized patterns, a symmetric density curve is exactly symmetric. The median and mean of a symmetric density curve are exactly the same. We can see below how a skewed distribution affects the location of the mean.

The mean of a set of observations is their arithmetic average. The mean of a density curve is the point at which the curve would balance if it were made of solid material.

In the previous section we described the mean and standard deviation of a set of data with the symbols x̄ and s_x, respectively. For a density curve we will denote the mean with the Greek letter mu (µ) and the standard deviation with the Greek letter sigma (σ).
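The balance-point idea can be illustrated numerically. The sketch below uses a made-up skewed density curve, f(x) = 2x on the interval from 0 to 1 (not a curve from the worksheet), whose long tail is on the left. The balance point is computed as the area under x times f(x), which is the standard way to find the mean of a density curve; the helper names are invented for this example.

```python
def density(x):
    """Hypothetical left-skewed density curve: f(x) = 2x on [0, 1]."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b using n thin rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Mean (balance point) of the density curve: area under x * f(x).
mu = area(lambda x: x * density(x), 0, 1)

# Median (equal-areas point): for this curve the area to the left of m
# is m squared, so the median is the square root of 0.5.
median = 0.5 ** 0.5

print(round(mu, 4), round(median, 4))
```

In this example the mean lands to the left of the median, on the side of the long tail, which matches the usual pattern for a skewed distribution.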