CHAPTER 2: Describing Location in a Distribution

2.1 Goals:
1. Compute and use z-scores given the mean and standard deviation
2. Compute and use the pth percentile of an observation
3. Intro to density curves
4. More on skewed data

(ex) Suppose that a professional soccer team has the money to sign one additional player and they are considering adding either a goalie or a forward. The goalie has a 90% save percentage and the forward averages 1.2 goals a game. In this league, the average goalie saves 86% of shots with a standard deviation of 5%, while the average forward scores 0.9 goals per game with a standard deviation of 0.2. Who is the better player at his position?

NOTE: The goalie is 4 percentage points higher than average and the forward is 0.3 goals higher than average. But, since we are comparing different units, we cannot just say the goalie is better since 4 > 0.3. To be continued...

So, to make comparisons possible, we need to compare each player with his respective distribution. The tool we will use is the STANDARDIZED SCORE, or z-score:

z = (x - mean) / (standard deviation)

Advantages of standardized scores:
z-scores have no units
we can compare values that are measured on different scales, with different units, or from different populations

(ex cont.)
The goalie:
The forward:
Therefore the team should sign the ______, as they are the better player.
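The goalie/forward comparison above can be checked numerically. The sketch below is only an illustration: the helper name `z_score` is hypothetical, and the numbers are the ones given in the example.

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Values taken from the soccer example.
goalie_z = z_score(90, 86, 5)        # save percentage, in percentage points
forward_z = z_score(1.2, 0.9, 0.2)   # goals per game

print(f"goalie z  = {goalie_z:.2f}")
print(f"forward z = {forward_z:.2f}")

# The larger z-score marks the player who stands out more at his position.
better = "forward" if forward_z > goalie_z else "goalie"
print(f"higher standardized score: {better}")
```

Because both z-scores are unit-free, they can be compared directly even though one player is measured in save percentage and the other in goals per game.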
Exploring data from a single quantitative variable:
1.
2.
3.
4.

Vocabulary

Density Curve: A curve that ______

NOTE: A density curve has the following properties:

Median of a Density Curve:

Mean of a Density Curve:

Sketches of possible mean/median relationships:

A density curve is an idealized description of the distribution of data.

Notation for the mean of a density curve:
Notation for the standard deviation of a density curve:
CHAPTER 2: Describing Location in a Distribution (continued)

2.2 Goals:
1. Defining normal distributions
2. The 68-95-99.7 rule (Empirical Rule)
3. Reading the Standard Normal Table
4. Normal probability plots

Vocabulary

Normal Distributions:

(ex)

The 68-95-99.7 Rule (also called the EMPIRICAL RULE):
In the normal distribution with mean µ and standard deviation σ:
Approximately ______ of the observations fall within ______ of the mean.
Approximately ______ of the observations fall within ______ of the mean.
Approximately ______ of the observations fall within ______ of the mean.

Notation: We abbreviate the Normal distribution with mean µ and standard deviation σ as: ______

(ex) Determine the percentage of data under the standard Normal curve in each region:
a. Between µ and σ:
b. Between σ and 2σ:
c. Between -3σ and σ:
d. Right tail from µ:
e. Left tail from σ:
f. Left tail from -2σ:
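The percentages in the 68-95-99.7 rule can be verified against the standard Normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_between` is hypothetical.

```python
from statistics import NormalDist

# Standard Normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_between(lo, hi, dist=std_normal):
    """Area under the curve between lo and hi (in standard-deviation units here)."""
    return dist.cdf(hi) - dist.cdf(lo)

# Area within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {area_between(-k, k):.4f}")

# The same helper handles one-sided pieces, e.g. between the mean and +1 sd.
print(f"between mean and +1 sd: {area_between(0, 1):.4f}")
```

The exact areas round to the familiar 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.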
The heights of students are approximately normal with mean 66 and standard deviation 3.
a. Sketch the curve:
b. Approximately 95% of heights will be within which 2 values?
c. What proportion of students will have heights between 63 and 69?
d. Between 63 and 72?
e. Between 57 and 69?

Note: The empirical rule only works for distributions that are approximately normal. Also, it is very rare for an observation to be more than 3 standard deviations from the mean in a distribution that is approximately normal!

THE Z-TABLE:
The z-table is used to find desired probabilities given z-scores and to find corresponding z-scores given a specific probability.

(ex) Use the following portion of the table of standard Normal curve areas to answer the questions on the next page.

z*    .00    .01    .02    .03    .04    .05
0.8  .7881  .7910  .7939  .7967  .7995  .8023
0.9  .8159  .8186  .8212  .8238  .8264  .8289
1.0  .8413  .8438  .8461  .8485  .8508  .8531
1.1  .8643  .8665  .8686  .8708  .8729  .8749
1.2  .8849  .8869  .8888  .8907  .8925  .8944
1.3  .9032  .9049  .9066  .9082  .9099  .9115
1.4  .9192  .9207  .9222  .9236  .9251  .9265
1.5  .9332  .9345  .9357  .9370  .9382  .9394
1.6  .9452  .9463  .9474  .9484  .9495  .9505
1.7  .9554  .9564  .9573  .9582  .9591  .9599
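For the heights example, the empirical-rule answers can be compared with areas computed directly from the Normal model. The sketch below assumes the model N(66, 3) stated in the example; `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Heights model from the example: mean 66, standard deviation 3.
heights = NormalDist(mu=66, sigma=3)

def proportion_between(lo, hi, dist=heights):
    """Proportion of the distribution lying between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)

# About 95% of observations lie within 2 standard deviations of the mean.
low = heights.mean - 2 * heights.stdev
high = heights.mean + 2 * heights.stdev
print(f"about 95% of heights fall between {low:g} and {high:g}")

for lo, hi in [(63, 69), (63, 72), (57, 69)]:
    print(f"P({lo} < height < {hi}) = {proportion_between(lo, hi):.4f}")

# A table entry is just a rounded cumulative area, e.g. row 1.4, column .03.
print(f"table entry for z = 1.43: {NormalDist().cdf(1.43):.4f}")
```

The computed proportions differ slightly from the empirical-rule answers (for instance, 68% + 13.5% = 81.5% for part d) because the rule uses rounded percentages.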
a. The probability P(z < 1.43) is found at the intersection of the 1.4 row and the .03 column of the z-table. The result is:
b. Find the probability that z is between 0.91 and 1.12.
c. Find the probability that z exceeds 1.74.

(ex) Data from the article "Determining Statistical Characteristics of a Vehicle Emissions Audit Procedure" (Technometrics [1980]: 483-493) suggest that the emissions of nitrogen oxides, which are major constituents of smog, can be plausibly modeled using a normal distribution. Let x denote the amount of this pollutant emitted by a randomly selected vehicle. The distribution of x can be described by a normal distribution with µ = 1.6 and σ = 0.4. Suppose that the EPA wants to offer some sort of incentive to get the worst polluters off the road. What emission levels constitute the worst 10% of the vehicles?

Step 1: Draw a picture
Step 2: Use the table
Step 3: Unstandardize
Step 4: State conclusion in context of the problem
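The table lookups and the four-step emissions problem above can be sketched in code. This is an illustration only: it assumes µ = 1.6 and σ = 0.4 for the emissions model, and it reads "worst 10%" as the highest 10% of emission levels, so the cutoff is the 90th percentile.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal: mean 0, sd 1

# Table-style lookups: cumulative area to the left of z.
p_a = std_normal.cdf(1.43)                         # P(z < 1.43)
p_b = std_normal.cdf(1.12) - std_normal.cdf(0.91)  # P(0.91 < z < 1.12)
p_c = 1 - std_normal.cdf(1.74)                     # P(z > 1.74)
print(f"a. {p_a:.4f}   b. {p_b:.4f}   c. {p_c:.4f}")

# Emissions example. Step 2: find the z-score with 90% of the area below it
# (assumption: the "worst" vehicles are the top 10% of emitters).
z_cut = std_normal.inv_cdf(0.90)

# Step 3: unstandardize with x = mean + z * sd.
mu, sigma = 1.6, 0.4
x_cut = mu + z_cut * sigma
print(f"z cutoff = {z_cut:.2f}, emission cutoff = {x_cut:.2f}")
```

Reading the printed table backwards gives nearly the same z-score, since it is the entry closest to .9000; software simply carries more decimal places.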
Recall: To use the table we must have data that is ______. We could be told that it is, state that it is assumed (when appropriate), or we can use data to show that it is. To show that data is approximately normal we ______.

Graphs used to check normality:
Histograms and bar graphs
Box plots
Stem and Leaf Plots
Normal Probability Plots (also called Normal Quantile Plots)

Normal Probability Plots:
If the points on the Normal Probability Plot lie close to ______, the plot indicates that the data are ______.

Note: Systematic deviations, or ______, indicate a non-normal distribution. Outliers are shown as points that lie ______.

How to construct a Norm. Prob. Plot can be found on pg. 149 in your textbook. This is not an AP Statistics standard. You will only be required to construct these with the aid of your calculator.

(ex)
Norm. Prob. Plot of data that is approximately normal:
Norm. Prob. Plot of data that is not normal:
(ex) Check the following data for normality:

L4: 77 98 88 72 59 73 65 72 87 72
L5: 93 87 78 76 61 66 75 69 88 58

Histograms:
Box Plots:
Stem and Leaf Plots:
Normal Prob. Plots:
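As a rough companion to the calculator plots, the points of a normal probability plot for the two lists above can be computed by hand. This sketch makes assumptions that are not from the notes: it uses the plotting position (i - 0.5)/n, which may differ from the convention your calculator uses, and the helper names `normal_plot_points` and `correlation` are hypothetical.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (expected z-score, observed value) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Assumed plotting position (i - 0.5) / n; other conventions exist.
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))

def correlation(points):
    """Pearson correlation of the plotted points; near 1 suggests a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# The two data lists from the example (calculator lists L4 and L5).
list_4 = [77, 98, 88, 72, 59, 73, 65, 72, 87, 72]
list_5 = [93, 87, 78, 76, 61, 66, 75, 69, 88, 58]

for name, data in (("L4", list_4), ("L5", list_5)):
    points = normal_plot_points(data)
    print(f"{name}: r = {correlation(points):.3f}")
    for z, x in points:
        print(f"  z = {z:6.2f}   x = {x}")
```

The correlation is only a numerical summary of how straight the plot is; the graph itself should still be inspected for curvature and outliers.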