11.1 Find Measures of Central Tendency and Dispersion

STATISTICS
Numerical values used to summarize and compare sets of data.

MEASURE OF CENTRAL TENDENCY
A number used to represent the center or middle of a set of data values. The mean, median, and mode are measures of central tendency.

MEASURE OF DISPERSION
A statistic that tells you how dispersed, or spread out, data values are.

STANDARD DEVIATION
A measure that describes the typical difference (or deviation) between a data value and the mean. It tells you how tightly the data values are clustered around the mean.

OUTLIER
A value that is much greater than or much less than most of the other values in a data set.

RANGE
The spread of the data set, found by subtracting the smallest number in the set from the largest number. It tells you how wide an interval the data values cover.

MEASURES OF CENTRAL TENDENCY
The mean, or ________, of n numbers is the ________ of the numbers ________ by n. The mean is denoted by x̄, which is read as "x-bar." For the data set x_1, x_2, ..., x_n, the mean is

    x̄ = (x_1 + x_2 + ... + x_n) / n

The median of n numbers is the ________ number when the numbers are written in order. (If n is even, the median is the ________ of the two middle numbers.)

The mode of n numbers is the number or numbers that occur ________. There may be ________ mode, ________ mode, or ________ modes.

Example 1: Find measures of central tendency
Quiz Scores: The data sets give quiz scores for two different biology classes. Find the mean, median, and mode of each data set.

Class A: 15, 17, 17, 17, 18, 19, 21, 22, 25
Class B: 16, 18, 19, 21, 22, 22, 22, 24, 25

Class A:  Mean: x̄ = ________  Median: ________  Mode: ________
Class B:  Mean: x̄ = ________  Median: ________  Mode: ________
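To check hand calculations for Example 1, the three measures can be computed with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `central_tendency` is made up for illustration and is not part of the worksheet.

```python
from statistics import mean, median, multimode

# Quiz scores from Example 1
class_a = [15, 17, 17, 17, 18, 19, 21, 22, 25]
class_b = [16, 18, 19, 21, 22, 22, 22, 24, 25]


def central_tendency(scores):
    """Return (mean, median, list of modes) for a list of numbers."""
    # multimode returns every value tied for most frequent,
    # so it also handles data sets with more than one mode.
    return mean(scores), median(scores), multimode(scores)


for name, scores in (("Class A", class_a), ("Class B", class_b)):
    m, med, modes = central_tendency(scores)
    print(f"{name}: mean = {m}, median = {med}, mode(s) = {modes}")
```

Work the problem by hand first, then run the script to compare.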
WHAT IS IT ASKING YOU TO DO?

Example 2: Find the range and standard deviation
Find the range and standard deviation for the quiz scores in each data set from Example 1.

Class A:  Range = ________ = ________   Std. Dev.: ________
Class B:  Range = ________ = ________   Std. Dev.: ________

Because the range and standard deviation for Class ____ are greater, its quiz scores are ________ spread out.

Example 3: Examine the effect of an outlier
Soccer: The winning scores for the first 9 games of the soccer season are: 3, 4, 2, 5, 3, 1, 4, 3, 2.
a. Find the mean, median, mode, range, and standard deviation of the data set.
b. The winning score in the next game is an outlier, 9. Find the new mean, median, mode, range, and standard deviation.
c. Which measure of central tendency does the outlier affect the most? The least?
d. What effect does the outlier have on the range and standard deviation?

a. Mean: ________  Median: ________  Mode: ________  Range: ________  Std. Dev.: ________
b. Mean: ________  Median: ________  Mode: ________  Range: ________  Std. Dev.: ________
c. The ________ is most affected by the outlier. The ________ and ________ are not affected by the outlier.
d. The outlier caused both the range and standard deviation to ________.
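The range and standard deviation in Examples 2 and 3 can be checked with a short script. This sketch assumes the population standard deviation (divide by n, then take the square root), which is what `statistics.pstdev` computes; if your class uses the sample version (divide by n - 1), swap in `statistics.stdev`. The helper names `spread` and `summary` are made up for illustration.

```python
from statistics import mean, median, multimode, pstdev


def spread(data):
    """Return (range, population standard deviation) for a list of numbers."""
    return max(data) - min(data), pstdev(data)


def summary(data):
    """Collect the five statistics asked for in Example 3."""
    data_range, std_dev = spread(data)
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),
        "range": data_range,
        "std_dev": round(std_dev, 2),
    }


class_a = [15, 17, 17, 17, 18, 19, 21, 22, 25]
class_b = [16, 18, 19, 21, 22, 22, 22, 24, 25]
soccer = [3, 4, 2, 5, 3, 1, 4, 3, 2]
soccer_with_outlier = soccer + [9]  # the next game's winning score

for name, data in (("Class A", class_a), ("Class B", class_b)):
    data_range, std_dev = spread(data)
    print(f"{name}: range = {data_range}, std. dev. = {std_dev:.2f}")

print("Without outlier:", summary(soccer))
print("With outlier:   ", summary(soccer_with_outlier))
```

Comparing the two printed summaries side by side shows which statistics move when the outlier is added.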
Clearing the list memory
Press [key]. The word EDIT should be highlighted (if not, arrow over to it). You should see five choices; the fourth is 4:ClrList. Press [key]. The screen will now say ClrList. Specify lists one and two by pressing [key] (you should see L1 above the key), then [key] (you should see L2 above the key). The screen will now say ClrList L1, L2. Press [key]. The calculator will say Done, signifying a clear memory.

Entering data for 1-Variable statistics
Press [key]. Press [key] (you should see 1:Edit on the screen). You should see 3 columns: L1, L2, L3. The cursor should be at L1 (if not, arrow over to it). Type in the first number, then [key]. Type in the second number, then [key]. When finished, press [key] (you should see the word QUIT above the key).

Calculating 1-Variable statistics
Press [key]. Use the blue [key] to move the highlighted bar over the CALC menu. Choose the 1-Var Stats option (that is, press [key]). You'll see the words 1-Var Stats on the screen. Press [key] (you should see L1 above the key). You'll see the words 1-Var Stats L1 on the screen. Press [key]. The mean is the top value on the screen.

Clearing the list memory
To clear the entire memory, press [key] (says EDIT on the screen above the key), then [key] (says CLRxy on the screen above the key).

Entering data for 1-Variable statistics
Press [key] to enter statistics editing mode. We will use the name xstat for our list, so press [key]. Enter the first number, then press [key] twice (to get past the y-values). Enter the next number, then [key]. Continue until all the data has been entered. As your final step, press [key] to signal the end of the data set.

Calculating 1-Variable statistics
Press [key]. Hit [key] twice. Press the 1-VAR option ([key]). The population standard deviation is listed as σx.
Clearing the memory
To clear the entire memory, press [key] (the word MEM is above the key), [key] (for RESET), [key] (for stored memory; it says MEM). A new screen will ask you Are you sure? Press [key] (for Yes).

Entering data for 1-Variable statistics
Press [key] to enter statistics editing mode. Type in the first number. Press [key]. Type in the second number. Press [key]. Continue until all the data has been entered. Then press [key].

Calculating 1-Variable statistics
Press [key] and/or [key] until the screen is empty. Press [key] (CALC on screen), [key] (OneVa on screen), [key] (LIST), [key] (NAMES on screen), [key] (xstat on screen), and lastly [key]. On the screen you will see a list of statistics. x̄ is the first thing on the list.
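If no calculator is at hand, the same kind of one-variable summary can be produced with a short script. This is a sketch only: the function name `one_var_stats` and the dictionary keys are made up for illustration, and the set of statistics shown (mean, sums, sample and population standard deviation, count) is an assumption about what a typical 1-variable statistics screen lists.

```python
from math import sqrt


def one_var_stats(data):
    """Return common one-variable statistics for a list of numbers."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    x_bar = sum_x / n
    # Sum of squared deviations from the mean
    ss = sum((x - x_bar) ** 2 for x in data)
    return {
        "x_bar": x_bar,              # mean
        "sum_x": sum_x,              # sum of the data values
        "sum_x2": sum_x2,            # sum of the squared data values
        "Sx": sqrt(ss / (n - 1)),    # sample standard deviation
        "sigma_x": sqrt(ss / n),     # population standard deviation
        "n": n,                      # number of data values
    }


class_a = [15, 17, 17, 17, 18, 19, 21, 22, 25]
for label, value in one_var_stats(class_a).items():
    print(f"{label} = {value}")
```

The two standard deviations differ only in the divisor (n - 1 versus n), so the sample value is always slightly larger.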
11.3 USE NORMAL DISTRIBUTIONS

The standard deviation can help you find the story behind the data. To understand this concept, it helps to learn about what statisticians call a normal distribution of data. A normal distribution means that most of the values in a set of data are close to the "average," or mean, while relatively few values tend to one extreme or the other.

Let's say you collected data about a person's daily calorie intake from a sample of people. You use your data to draw a bar graph and notice that the tops of the bars could be smoothed into a bit of a bell shape. You realize that the numbers for people's typical calorie consumption will probably turn out to be normally distributed: most people take in close to the mean number of calories, while very few take in a lot less or a lot more than that. When you think about it, that's just common sense. Not many people are getting by on a single serving of kelp and rice, or on eight meals of steak and milkshakes. Most people consume somewhere in between.
Your normally distributed data looks something like this. Your mean (what the average person consumes), marked by the vertical line at the center of the curve, is 1800 calories, and your standard deviation is 300 calories.

3 s.d. below the mean = 1800 - 300 - 300 - 300 = 900 calories
2 s.d. below the mean = 1800 - 300 - 300 = 1200 calories
1 s.d. below the mean = 1800 - 300 = 1500 calories
1 s.d. above the mean = 1800 + 300 = 2100 calories
2 s.d. above the mean = 1800 + 300 + 300 = 2400 calories
3 s.d. above the mean = 1800 + 300 + 300 + 300 = 2700 calories

[Figure: bell curve with the mean, 1800, at the center and 1, 2, and 3 standard deviations marked on each side]

You'll notice that the curve is separated into 3 standard deviations on each side of the mean. The ones on the right are for values (daily calories) greater than the mean (1800), and the ones on the left are for values less than the mean (1800).

Understanding one standard deviation: One standard deviation away from the mean is found by adding 300 to 1800 and by subtracting 300 from 1800. In a normal distribution, one standard deviation in either direction accounts for about 68% of the data. In this case, it means that about 68% of the people consume between 1500 and 2100 calories per day. Specifically, 34% consume between 1800 and 2100 calories, and 34% consume between 1500 and 1800 calories.

Understanding two standard deviations: Two standard deviations from the mean are found by adding 300 and another 300 to 1800 and by subtracting 300 and another 300 from 1800. Two standard deviations in either direction cover more of the data and, thus, a higher percentage. Two s.d.s account for about 95% of the data, meaning that about 95% of the people surveyed consume between 1200 and 2400 calories per day. Specifically, 47.5% consume between 1800 and 2400 calories, and 47.5% consume between 1200 and 1800 calories.

Understanding three standard deviations: Three standard deviations from the mean are found by adding 300 three times to 1800 and by subtracting 300 three times from 1800. Three standard deviations in either direction cover even more of the data and, thus, a higher percentage. Three s.d.s account for about 99.7% of the data, meaning that about 99.7% of the people surveyed consume between 900 and 2700 calories per day. Specifically, 49.85% consume between 1800 and 2700 calories, and 49.85% consume between 900 and 1800 calories.
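The 68%, 95%, and 99.7% figures for the calorie example can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

MEAN = 1800  # mean daily calories from the example
SD = 300     # standard deviation from the example


def share_within(k, mean=MEAN, sd=SD):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    low, high = mean - k * sd, mean + k * sd
    # Area under the curve between low and high
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"within {k} s.d. ({low} to {high} calories): about {share_within(k):.1%}")
```

The computed shares are slightly more precise than the rounded 68%, 95%, and 99.7% values used on this page, which is why those values are described as approximate.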
Example 1 (Math Scores): The math scores on the 2004 SAT exam are normally distributed with a mean of 518 and a standard deviation of 114.
a. About what percent of the test-takers have scores between 518 and 746?
b. About what percent of the test-takers have scores less than 404?

a. The scores of 518 and 746 represent ________ standard deviations to the ________ of the mean. So, the percent of test-takers with scores between 518 and 746 is ____% + ____% = ____%.
b. A score of 404 is one standard deviation to the left of the mean. So, the percent of scores less than 404 is ____% + ____% + ____% = ____%.

Example 2: A normal distribution has a mean of 63.7 and a standard deviation of 2.9. Find the probability that a randomly selected x-value from the distribution is in the given interval.
a. Between 57.9 and 66.6
b. At least 66.6

a. ________
b. ________

Example 3: A normal distribution has mean x̄ and standard deviation σ. For a randomly selected x-value from the data, find ________ and ________.
a. The probability that a randomly selected x-value lies between ________ and ________ is the shaded area under the normal curve. Therefore: ________ = ____ + ____ + ____ = ____
b. The probability that a randomly selected x-value is less than ________ is the shaded area under the normal curve. Therefore: ________ = ____ + ____ + ____ = ____

Z-score: Given a data value x, we find its z-value by

    z = (x - x̄) / σ

We then use this z-value and the provided chart to find the probability that a value is less than or equal to x.

Height: A survey of a group of women found that the height of the women is normally distributed with a mean height of 64.5 inches and a standard deviation of 2.5 inches. Find the probability that a woman is at most 58 inches tall.
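The height problem can be checked with a short script that computes the z-score, z = (x - mean) / (standard deviation), and then looks up the probability. This sketch uses `statistics.NormalDist` from Python's standard library as a stand-in for the printed chart; the function names `z_score` and `probability_at_most` are made up for illustration.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def probability_at_most(x, mean, sd):
    """P(value <= x) for a normal distribution, found from the z-score."""
    z = z_score(x, mean, sd)
    # The standard normal distribution (mean 0, s.d. 1) plays the role of the chart.
    return NormalDist().cdf(z)


z = z_score(58, mean=64.5, sd=2.5)
p = probability_at_most(58, mean=64.5, sd=2.5)
print(f"z = {z:.1f}, P(height <= 58 in.) is about {p:.4f}")
```

A printed chart is usually rounded to four decimal places, so the chart value and the computed value may differ slightly in the last digit.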
11.4 SELECT AND DRAW CONCLUSIONS FROM SAMPLES

POPULATION
A group of people or objects that you want information about.

SAMPLE
A subset of the population being studied. When it is too difficult, time-consuming, or expensive to survey everyone in a population, information is gathered from a sample.

UNBIASED SAMPLE
In order to draw accurate conclusions about a population, you should select an unbiased sample. An unbiased sample is representative of the entire population you want information about. Although there are many ways to sample a population, a random sample is preferred because it is most likely to represent the population in an unbiased way.

BIASED SAMPLE
A biased sample is one that over-represents or under-represents certain parts or groups of the population.

Example: A teacher wants to survey everyone at her school about the quality of the school lunches. Identify the type of sample described and tell if the sample is biased.
1. The teacher surveys every 7th student that goes through the lunch line.
2. From a random name lottery that includes every student's and teacher's name in the school, the teacher randomly selects 150 students and teachers to survey.
3. The teacher walks into the lunchroom and surveys the first 25 people that she sees.

1. Type of Sample: ________  Biased or Unbiased: ________  Why? ________
2. Type of Sample: ________  Biased or Unbiased: ________  Why? ________
3. Type of Sample: ________  Biased or Unbiased: ________  Why? ________
Example: A local politician wants to survey all of his constituents.
4. He calls the constituents that are members of his political party and asks if they will complete the survey. He then mails them the survey, which they mail back to him for use in his study.

4. Type of Sample: ________  Biased or Unbiased: ________  Why? ________

SAMPLE SIZE
When conducting a survey, you need to make sure the size of your sample is large enough that it accurately represents the population. As the sample size increases, your margin of error decreases.

MARGIN OF ERROR
The number that gives a limit on how much the responses of the sample would differ from the responses of the population. For example, if 30% of the people in a poll prefer vanilla ice cream over chocolate and the margin of error is ±2.6%, then it is likely that between 27.4% and 32.6% of the actual population prefers vanilla ice cream.

When a random sample of size n is taken from a large population, the margin of error is approximated by:

    Margin of error = ±1 / √n

Example: In a survey of 1432 people, 26% said that they read the newspaper every day.
(a) What is the margin of error for the survey?
(b) Give an interval that is likely to contain the exact percent of all people who read the newspaper every day.

a. ________
b. ________

Example: In a poll about which movie channel its customers prefer to watch, a cable company wants the margin of error to be ±3%. How many people would they need to survey?

Example: A group of students survey the local community about their favorite beverage. How many people did they survey if the margin of error is ±7%?
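The margin-of-error approximation, 1 / √n, and its rearrangement for the sample size, n = 1 / (margin of error)^2, can be tried out in a short script. This is a minimal sketch; the function names are made up for illustration, and margins are written as decimals (0.03 for 3%).

```python
from math import sqrt


def margin_of_error(n):
    """Approximate margin of error (as a decimal) for a random sample of size n."""
    return 1 / sqrt(n)


def sample_size_for(margin):
    """Approximate sample size that gives the stated margin of error (as a decimal)."""
    # Solving margin = 1 / sqrt(n) for n gives n = 1 / margin**2.
    return 1 / margin ** 2


def likely_interval(sample_percent, n):
    """Interval (as decimals) likely to contain the true population percent."""
    me = margin_of_error(n)
    return sample_percent - me, sample_percent + me


me = margin_of_error(1432)
low, high = likely_interval(0.26, 1432)
print(f"n = 1432: margin of error is about {me:.1%}, interval {low:.1%} to {high:.1%}")
print(f"margin of 3% needs about {sample_size_for(0.03):.0f} people")
print(f"margin of 7% corresponds to about {sample_size_for(0.07):.0f} people")
```

`sample_size_for` returns an unrounded value on purpose. If the margin must not exceed the target, round the result up to the next whole person.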
11.5 Choose the Best Model for Two-Variable Data

Types of Models

Linear: y = ax + b
Use when the data appear to increase or decrease at a constant rate, following the same pattern over and over again. Here a is the slope and b is the y-intercept.

Quadratic: y = ax^2 + bx + c
Use when the points appear to make either a U shape or a horseshoe shape. Remember that quadratics are symmetric about the axis of symmetry, so look for the points to mirror one another after the graph hits its maximum or minimum (vertex).

Cubic: y = ax^3 + bx^2 + cx + d
The graph of a cubic function can have up to two turning points, though sometimes those turning points will not be as pronounced as in the graphs shown.

Exponential Growth or Decay: y = ab^x
Use when the points decrease rapidly and then appear to level off and get closer together (exponential decay), or when the points start off close together and then begin to gain value very rapidly (exponential growth).
THE GOAL: Given a table of information, determine whether it can be best represented with a linear, quadratic, cubic, exponential, or power equation. Then, find that equation and verify that it was the best model for the data.

To figure this out, we will:
1. Make a scatter plot on the calculator.
2. Assess the plot. Using our above descriptions, we will look at the data points and determine which of the above patterns they follow.
3. Use the regression applications on the calculator to determine the equation for our data.
4. Graph the equation to make sure that it is appropriate for our model.

Example 1: The table shows the secretaries' salaries y (in dollars) for a certain bank, where x is the number of years of experience. Use a graphing calculator to find a model for the data.

x   1        2        3        4        5        6        7
y   30,624   32,436   34,167   35,989   37,684   39,311   41,098

1. Make a scatter plot. The points lie approximately ________.
2. Use the regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.
A model for the data is y = ________

Example 2: An environmental group observes a deer population in a park where hunting has been banned. The table shows the population y counted x years after the ban began. Use a graphing calculator to find a model for the data.

x   0     5     10     15     20
y   500   729   1271   2206   3765

1. Make a scatter plot. The points are level at first and then begin to ________ rapidly.
2. Use the regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.
A model for the data is y = ________
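To see what a regression feature is doing, the linear and exponential fits for Examples 1 and 2 can be sketched without a calculator. The code below uses ordinary least squares for the line, and fits the exponential model by fitting a line to (x, ln y). That log-transform approach is an assumption about how an exponential regression is carried out, so a particular calculator may report slightly different coefficients. The function names are made up for illustration.

```python
from math import exp, log


def linear_fit(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx                 # slope
    return a, y_mean - a * x_mean  # slope, y-intercept


def exponential_fit(xs, ys):
    """Fit y = a * b**x by fitting a line to (x, ln y); returns (a, b)."""
    slope, intercept = linear_fit(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)


# Example 1: years of experience vs. salary
years = [1, 2, 3, 4, 5, 6, 7]
salaries = [30624, 32436, 34167, 35989, 37684, 39311, 41098]
a, b = linear_fit(years, salaries)
print(f"Linear model: y = {a:.2f}x + {b:.2f}")

# Example 2: years after the hunting ban vs. deer population
ban_years = [0, 5, 10, 15, 20]
deer = [500, 729, 1271, 2206, 3765]
a, b = exponential_fit(ban_years, deer)
print(f"Exponential model: y = {a:.1f}({b:.4f})^x")
```

As in step 4 of the procedure, plug a few x-values back into each model and compare with the table to judge the fit.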
Example 3: A manager at a local amusement park kept a record of the number of people to ride the most popular roller coaster at the park. The table shows the number of people y that rode the roller coaster x hours after the park had opened. Use a graphing calculator to find a model for the data.

x   0    2     4     6     8     10    12
y   85   163   282   341   398   381   304

1. Make a scatter plot.
2. Use the regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.
A model for the data is y = ________

Example 4:

x   -5    -4   -3   -2   -1   1   2
y   -20   0    3    0    -4   0   18

1. Make a scatter plot. The points appear to ________.
2. Use the regression feature to find an equation of the model.
3. Graph the model along with the data to verify that the model fits the data well.
A model for the data is y = ________
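Quadratic and cubic regression for Examples 3 and 4 can be sketched the same way. The code below is a minimal least-squares polynomial fit that builds the normal equations and solves them by Gaussian elimination. It is one standard way to do this, not necessarily the method a calculator uses, and the function name `poly_fit` is made up for illustration.

```python
def poly_fit(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns coefficients from the highest power down to the constant,
    e.g. [a, b, c] for y = a*x^2 + b*x + c.
    """
    size = degree + 1
    # Normal equations: matrix[i][j] = sum of x^(i+j), rhs[i] = sum of y * x^i
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]

    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, size):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution (coefficients come out lowest power first)
    coeffs = [0.0] * size
    for i in range(size - 1, -1, -1):
        known = sum(matrix[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (rhs[i] - known) / matrix[i][i]
    return coeffs[::-1]


# Example 3: hours after opening vs. number of riders
hours = [0, 2, 4, 6, 8, 10, 12]
riders = [85, 163, 282, 341, 398, 381, 304]
print("Quadratic coefficients:", [round(c, 3) for c in poly_fit(hours, riders, 2)])

# Example 4
xs4 = [-5, -4, -3, -2, -1, 1, 2]
ys4 = [-20, 0, 3, 0, -4, 0, 18]
print("Cubic coefficients:", [round(c, 3) for c in poly_fit(xs4, ys4, 3)])
```

The sign of the leading coefficient is a quick sanity check: a negative value for the quadratic matches data that rise and then fall.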
TI 83-84

Linear Regression
1. Press STAT.
2. Arrow over to CALC.
3. Choose #4 LinReg(ax+b).
4. You should see LinReg(ax+b) on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the linear equation.

Quadratic Regression
1. Press STAT.
2. Arrow over to CALC.
3. Choose #5 QuadReg.
4. You should see QuadReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the quadratic equation.

Cubic Regression
1. Press STAT.
2. Arrow over to CALC.
3. Choose #6 CubicReg.
4. You should see CubicReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the cubic equation.

Exponential Regression
1. Press STAT.
2. Arrow over to CALC.
3. Choose #0 ExpReg.
4. You should see ExpReg on the screen.
5. Press ENTER. The screen will show you the coefficients and constant of the exponential equation.