Step by step box plot

Height in centimeters of players on the 2003 Women's World Cup soccer team:

157 161 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180

Determine the 5 number summary for this data set and create a box plot of the results. You should get IQR = 7.5.

5 number summary:
1. Put data in order from smallest to largest.
2. Determine the median (Q2) by choosing the center number if there is an odd number of data points, or by averaging the two center numbers if there is an even number of data points. Here the center numbers are 168 and 168, so $Q_2 = \frac{168 + 168}{2} = \frac{336}{2} = 168$.
3. Divide the data set in half. If there is an even number of data points, the data separates into two halves. If there is an odd number of data points, the center number is included in each half.
4. Determine the median of the lower half of data (Q1) and the median of the upper half of data (Q3).
   Lower half: $Q_1 = \frac{164 + 165}{2} = \frac{329}{2} = 164.5$
   Upper half: $Q_3 = \frac{171 + 173}{2} = \frac{344}{2} = 172$
5. To complete the 5 number summary, find the minimum and maximum values: 157, 164.5, 168, 172, 180.

Determine outliers:
1. Calculate $IQR = Q_3 - Q_1 = 172 - 164.5 = 7.5$.
2. Find the lower limit (LL) and upper limit (UL) using the following equations. Any value in the data set that is less than LL or greater than UL is an outlier.
   $LL = Q_1 - (1.5 \times IQR) = 164.5 - (1.5 \times 7.5) = 164.5 - 11.25 = 153.25$
   $UL = Q_3 + (1.5 \times IQR) = 172 + (1.5 \times 7.5) = 172 + 11.25 = 183.25$
   No number in the data set is less than 153.25 or greater than 183.25, so there are no outliers.

Graph the box plot:
1. Draw an x axis with a scale that can include all data points from least to greatest. Here the axis runs from 150 to 190 in steps of 10.
2. Draw large vertical bars above the values for Q1, Q2, and Q3.
3. Draw horizontal bars above and below the vertical lines to complete the box.
4. There are no outliers, so draw smaller vertical bars above the values for min and max and connect them to the box.
5. If there is an outlier, draw an asterisk above its value and place the smaller vertical bar above the next value closest to the box (the adjacent value) that is not an outlier.
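The 5 number summary and outlier limits above can be checked with a short Python sketch. The function names are illustrative and not part of the handout, and the second height is read as 161. The quartiles follow the handout's halving rule (center value included in both halves when the count is odd), which is not the default in every statistics package.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the handout's halving rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # With an odd count, the center value is included in both halves.
    lower = data[:half + (n % 2)]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outlier_limits(q1, q3):
    """Return (LL, UL) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

heights = [157, 161, 163, 163, 164, 165, 165, 165, 168, 168,
           168, 170, 170, 170, 171, 173, 173, 175, 180, 180]

minimum, q1, q2, q3, maximum = five_number_summary(heights)
ll, ul = outlier_limits(q1, q3)
outliers = [h for h in heights if h < ll or h > ul]
print(minimum, q1, q2, q3, maximum)
print(ll, ul, outliers)
```

An empty `outliers` list corresponds to drawing the whiskers all the way to the minimum and maximum.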
Step by step regression

Sports analysts were interested to see if golfers' average driving distance (how far they hit the ball off the tee) is related to their driving accuracy (% of drives that land in the fairway). They took data from 10 of the world's top golfers.

1. Multiply each x and y value: 316 × 54.6 = 17253.6, 304 × 63.4 = 19273.6, etc. Record the products in the xy column.
2. Square each x value and record the result in the last column: 316² = 99856, 304² = 92416, etc.
3. Sum each column and record the number at the bottom of each column.

Distance (x)   Accuracy (y)   xy          x²
316            54.6           17253.6     99856
304            63.4           19273.6     92416
310            57.9           17949.0     96100
312            56.6           17659.2     97344
295            68.5           20207.5     87025
291            66.0           19206.0     84681
300            58.7           17610.0     90000
298            59.4           17701.2     88804
295            61.8           18231.0     87025
309            50.6           15635.4     95481
Sum: 3030      597.5          180726.5    918732

The symbols used to indicate each sum are: $\sum x = 3030$, $\sum y = 597.5$, $\sum xy = 180726.5$, $\sum x^2 = 918732$.

4. Use these values to replace symbols in the equation for slope ($b_1$), where n = sample size = 10.

$$b_1 = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \frac{180726.5 - (3030)(597.5)/10}{918732 - (3030)^2/10}$$

Solving the equation for slope:
1. Multiply 3030 × 597.5 on top and square 3030 on bottom.
   $b_1 = \frac{180726.5 - 1810425/10}{918732 - 9180900/10}$
2. Divide each of these numbers by 10 (n).
   $b_1 = \frac{180726.5 - 181042.5}{918732 - 918090}$
3. Subtract on top and bottom, then divide.
   $b_1 = \frac{-316}{642} = -0.492$

Solving the equation for y intercept ($b_0$):
1. Replace the symbols with the appropriate values.
   $b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{597.5 - (-0.492)(3030)}{10}$
2. Multiply 0.492 × 3030.
   $b_0 = \frac{597.5 - (-1490.8)}{10}$
3. If the slope is negative, replace the two negatives with a positive and sum 597.5 + 1490.8 (if the slope is positive, subtract). Then divide by 10 (n).
   $b_0 = \frac{2088.3}{10} = 208.8$
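As a check on the slope and intercept arithmetic, here is a minimal Python sketch of the same sum-based formulas. The function name `least_squares` is illustrative, not from the handout. Because the code carries the unrounded slope into the intercept, it gives an intercept near 208.9 rather than the hand-rounded 208.8.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the column sums, as in the hand calculation."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1

distance = [316, 304, 310, 312, 295, 291, 300, 298, 295, 309]
accuracy = [54.6, 63.4, 57.9, 56.6, 68.5, 66.0, 58.7, 59.4, 61.8, 50.6]

b0, b1 = least_squares(distance, accuracy)
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.1f}")
```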
Graphing a scatterplot:
1. Create an x and y axis that will encompass all data points. For x values, a range of 290 to 320 will be sufficient, and for y values a range of 50 to 70 will do.
2. Plot the data points. For the first one, go to 316 on the x axis and up to 54.6 (about 55) on the y axis (this is the data point on the far right of the graph).
Plotting the regression equation:

The regression equation, with the slope rounded to two decimals, is $\hat{y} = 208.8 - 0.49x$.

1. Choose at least 3 values for x to put into the equation. They should cover a range that will produce a line that stretches across most of the scatterplot. Values of 295, 305, and 315 will provide such a line with this data set.
2. Put each of these x values in the equation and solve for ŷ. For x = 295, multiply 0.49 × 295, then subtract the result from 208.8.

x      ŷ
295    64.3
305    59.4
315    54.5

3. As with the scatterplot data points, place each of these points on the graph. Use different shapes than what was used for the scatterplot points.
4. Connect the 3 regression line points with a line.
Step by step regression coefficient (r²)

Use the same data on golfers to calculate r², also called the coefficient of determination.

Calculate SST:
1. Calculate the mean of y ($\bar{y} = 59.8$), then subtract the mean from each y value. For the data below, 54.6 - 59.8 = -5.2, 63.4 - 59.8 = 3.6, etc.
2. Square each of these values. For the data below, (-5.2)² = 27.0, 3.6² = 13.0, etc. Sum these values to get $SST = \sum (y - \bar{y})^2 = 258.0$.

Distance (x)   Accuracy (y)   y - ȳ    (y - ȳ)²
316            54.6           -5.2     27.0
304            63.4            3.6     13.0
310            57.9           -1.9      3.6
312            56.6           -3.2     10.2
295            68.5            8.7     75.7
291            66.0            6.2     38.4
300            58.7           -1.1      1.2
298            59.4           -0.4      0.2
295            61.8            2.0      4.0
309            50.6           -9.2     84.6
               ȳ = 59.8                258.0
Calculate SSR:
1. Place each x value into the regression equation $\hat{y} = 208.8 - 0.49x$ to calculate ŷ values.
   ŷ = 208.8 - (0.49)(316) = 208.8 - 154.8 = 54.0
   ŷ = 208.8 - (0.49)(304) = 208.8 - 149.0 = 59.8, etc.
2. Subtract the mean of y (59.8) from each ŷ value. Below, 54.0 - 59.8 = -5.8, 59.8 - 59.8 = 0.0, etc.
3. Square each of these values. For the data below, the first two squares are 34.1 and 0.0, etc. (The squares are computed from the unrounded differences, so they can differ slightly from squaring the rounded values shown.) Sum these values to get $SSR = \sum (\hat{y} - \bar{y})^2 = 157.0$.

Distance (x)   Accuracy (y)   ŷ       ŷ - ȳ    (ŷ - ȳ)²
316            54.6           54.0    -5.8     34.1
304            63.4           59.8     0.0      0.0
310            57.9           56.9    -2.9      8.4
312            56.6           55.9    -3.9     15.1
295            68.5           64.3     4.5     19.8
291            66.0           66.2     6.4     41.1
300            58.7           61.8     2.0      4.0
298            59.4           62.8     3.0      8.9
295            61.8           64.3     4.5     19.8
309            50.6           57.4    -2.4      5.8
                                               157.0

Calculate r²:

$$r^2 = \frac{SSR}{SST} = \frac{157.0}{258.0} = 0.608$$
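The SST, SSR and r² steps can be sketched in Python as below. The function name `r_squared` is illustrative. The code keeps full precision for the mean, slope and intercept, so it gives r² of about 0.603 rather than the 0.608 that comes from the rounded values used in the hand calculation.

```python
def r_squared(xs, ys):
    """Return (SSR, SST, r^2) for a least-squares line fitted to xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)       # total sum of squares
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)  # regression sum of squares
    return ssr, sst, ssr / sst

distance = [316, 304, 310, 312, 295, 291, 300, 298, 295, 309]
accuracy = [54.6, 63.4, 57.9, 56.6, 68.5, 66.0, 58.7, 59.4, 61.8, 50.6]

ssr, sst, r2 = r_squared(distance, accuracy)
print(f"SSR = {ssr:.1f}, SST = {sst:.1f}, r^2 = {r2:.3f}")
```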
Step by step calculate population mean (μ) and standard deviation (σ) using table data

Let's consider the age distribution for Major champions during the Open Era for male professional tennis players. There have been 180 Open champions since 1968. What are the population mean (μ) and standard deviation (σ) of age for male champions in Majors?

Calculate μ:
1. Sum the Count column, then divide each number in the column by the sum.
   P(X = 17) = 3/180 = 0.0167
   P(X = 18) = 2/180 = 0.0111, etc.
2. Multiply each Age (x) by its probability [P(X = x)].
   x · P(X = x) = 17 × 3/180 = 0.2833
   x · P(X = x) = 18 × 2/180 = 0.2000, etc.
3. Sum the 4th column [Σ x · P(X = x)] to get μ = 24.47.

Calculate σ:
1. In the 5th column, subtract the population mean (μ) from each Age (x).
   x - μ = 17 - 24.47 ≈ -7.5
   x - μ = 18 - 24.47 ≈ -6.5, etc.
2. In the 6th column, square each of these values.
   (x - μ)² = (-7.5)² = 56.25
   (x - μ)² = (-6.5)² = 42.25, etc.
3. In the last column, multiply these values by the values in the 3rd column [P(X = x)]. Shorthand notation for P(X = x) is P(x), and that is used in the last column to save space.
   (x - μ)² · P(x) = 0.0167 × 56.25 ≈ 0.938
   (x - μ)² · P(x) = 0.0111 × 42.25 ≈ 0.469, etc.
4. Sum the last column [Σ (x - μ)² · P(x)] to get the population variance σ² = 13.194, then take the square root to get the population standard deviation σ = 3.63.

Age (x)   Count   Rel Freq P(X = x)   x · P(X = x)   x - μ    (x - μ)²   (x - μ)² · P(x)
17        3       0.0167              0.2833         -7.5     56.25      0.938
18        2       0.0111              0.2000         -6.5     42.25      0.469
19        5       0.0278              0.5278         -5.5     30.25      0.840
20        11      0.0611              1.2222         -4.5     20.25      1.238
21        16      0.0889              1.8667         -3.5     12.25      1.089
22        20      0.1111              2.4444         -2.5      6.25      0.694
23        15      0.0833              1.9167         -1.5      2.25      0.188
24        27      0.1500              3.6000         -0.5      0.25      0.038
25        23      0.1278              3.1944          0.5      0.25      0.032
26        16      0.0889              2.3111          1.5      2.25      0.200
27        10      0.0556              1.5000          2.5      6.25      0.347
28        5       0.0278              0.7778          3.5     12.25      0.340
29        8       0.0444              1.2889          4.5     20.25      0.900
30        9       0.0500              1.5000          5.5     30.25      1.513
31        4       0.0222              0.6889          6.5     42.25      0.939
32        1       0.0056              0.1778          7.5     56.25      0.313
33        1       0.0056              0.1833          8.5     72.25      0.401
34        1       0.0056              0.1889          9.5     90.25      0.501
35        1       0.0056              0.1944         10.5    110.25      0.613
36        1       0.0056              0.2000         11.5    132.25      0.735
37        1       0.0056              0.2056         12.5    156.25      0.868
38        0       0.0000              0.0000         13.5    182.25      0.000
39        0       0.0000              0.0000         14.5    210.25      0.000
Sum       180     1.0000              24.47                              13.194
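The table calculation can be reproduced with a few lines of Python. This is a minimal sketch: the name `mean_and_sd` and the dictionary layout (age mapped to count) are illustrative choices, and the code uses the unrounded mean, so its results may differ from the hand-rounded columns in the third decimal place.

```python
from math import sqrt

def mean_and_sd(counts):
    """counts maps each value x to its count; returns (mu, sigma)."""
    total = sum(counts.values())
    # mu = sum of x * P(X = x), with P(X = x) = count / total
    mu = sum(x * c / total for x, c in counts.items())
    # variance = sum of (x - mu)^2 * P(X = x)
    variance = sum((x - mu) ** 2 * c / total for x, c in counts.items())
    return mu, sqrt(variance)

age_counts = {
    17: 3, 18: 2, 19: 5, 20: 11, 21: 16, 22: 20, 23: 15, 24: 27,
    25: 23, 26: 16, 27: 10, 28: 5, 29: 8, 30: 9, 31: 4, 32: 1,
    33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 0, 39: 0,
}

mu, sigma = mean_and_sd(age_counts)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```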
Steps for finding areas under the Normal Curve

1. Determine if you are starting with x values of interest or an area (or %) of interest.
   a) If you see >, <, or between x values, you will be calculating an area (%). Go to step 2.
   b) If you see top or bottom %, you will be calculating an x value. Go to step 3.
2. If you begin with x values:
   a) For each x value (you will have one or two), use the z formula to get z scores, then get the area(s) from the tables.
      $z = \frac{x - \mu}{\sigma}$
   b) Decide if you need > (greater than, to the right on the area curve), < (less than, to the left on the area curve), or between two values.
      i) If you have >, calculate 1 - area and you are done.
      ii) If you have <, you are done once you get the area from the table.
      iii) If you need an area between values, subtract the smaller area from the larger one to get a positive number. Then you are done.
3. If you begin with %, you will be working the problem in the opposite direction compared to step 2.
   a) First, determine if you need top % or bottom %.
      i) If you need top %, transform % to a decimal (divide by 100), then calculate 1 - decimal. Example: you need the top 15%. The decimal is 0.15 and 1 - 0.15 = 0.85. This is the area that you need to find in the table (among the many numbers in the center, not in the z column). Next, find 0.85 in the table. The exact value may not be in the table, so find the one that is nearest. Next, go to the z column and find the corresponding z score. This is the number you will use in the z formula (above).
      ii) If you need bottom %, transform % to a decimal. This is the area that you need to find in the table. Find the corresponding z score in the z column to use in the formula.
   b) You should have a number for all but one variable in the z formula. Solve the formula to find the x value for your final answer. You can use the rearranged formula I gave you in class that has x isolated on one side:
      $x = \mu + (z \times \sigma)$
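The table lookups in these steps can be mirrored in Python with the standard library's `statistics.NormalDist` (Python 3.8 or later), where `cdf` plays the role of reading an area from the z table and `inv_cdf` plays the role of finding z from an area. The helper names below are illustrative, not part of the handout.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_less_than(x, mu, sigma):
    """Step 2, '<' case: area to the left of x."""
    z = (x - mu) / sigma
    return STANDARD_NORMAL.cdf(z)

def area_greater_than(x, mu, sigma):
    """Step 2, '>' case: 1 - area to the left."""
    return 1 - area_less_than(x, mu, sigma)

def area_between(x_low, x_high, mu, sigma):
    """Step 2, 'between' case: larger area minus smaller area."""
    return area_less_than(x_high, mu, sigma) - area_less_than(x_low, mu, sigma)

def x_for_top_percent(percent, mu, sigma):
    """Step 3, top %: look up area 1 - decimal, then x = mu + z * sigma."""
    z = STANDARD_NORMAL.inv_cdf(1 - percent / 100)
    return mu + z * sigma

def x_for_bottom_percent(percent, mu, sigma):
    """Step 3, bottom %: look up area = decimal, then x = mu + z * sigma."""
    z = STANDARD_NORMAL.inv_cdf(percent / 100)
    return mu + z * sigma

# With mu = 0 and sigma = 1 the x value is the z score itself.
print(round(x_for_top_percent(15, 0, 1), 2))
```

For the top 15% example the z score found this way is close to 1.04, which is the nearest table entry to an area of 0.85.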
Step by step confidence intervals

The current mean height (μ) of men in the U.S. that are 20 years of age is 69.5 inches with a standard deviation (σ) of 2.8 inches.

Using population SD (σ):

I randomly chose 30 male heights (I estimated height from the first 30 men on my cell phone contact list). From this sample, I calculated a sample mean ($\bar{x}$) of 70.5 and a sample SD (s) of 2.3. Calculate a 90% CI using the population SD, σ.

1. Identify the proper numbers that will replace symbols in the equation. Pay close attention to whether you need to use population or sample mean and standard deviation. For the equation using σ,
   $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
   the values are $\bar{x}$ = 70.5, z for 90% = 1.64, σ = 2.8, and n = 30.
2. Replace symbols with numbers.
   $70.5 \pm 1.64 \left( \frac{2.8}{\sqrt{30}} \right)$
3. Solve.
   a) Take the square root of n: $70.5 \pm 1.64 \left( \frac{2.8}{5.48} \right)$
   b) Divide σ by that number: $70.5 \pm 1.64 \times 0.51$
   c) Multiply z by that number: $70.5 \pm 0.84$
   d) Subtract that number from $\bar{x}$: 70.5 - 0.84 = 69.7
   e) Add the number to $\bar{x}$: 70.5 + 0.84 = 71.3

The 90% Confidence Interval (CI) is 69.7 to 71.3 inches.
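A minimal Python sketch of the same interval follows. The function name `z_interval` is illustrative, and the critical z is computed with `statistics.NormalDist` instead of being read from a table, so it uses about 1.645 rather than 1.64.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.90):
    """Confidence interval for the mean when the population SD is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for alpha/2 per tail
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(70.5, 2.8, 30, confidence=0.90)
print(f"90% CI: {low:.1f} to {high:.1f} inches")
```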
Using sample SD (s):

Calculate a 90% CI using the sample SD, s.

1. Identify the proper numbers that will replace symbols in the equation. Pay close attention to whether you need to use population or sample mean and standard deviation. For the equation using s,
   $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
   the values are $\bar{x}$ = 70.5, t for 90% = 1.699 (df = n - 1 = 30 - 1 = 29, use the column for $t_{0.05}$), s = 2.3, and n = 30.
2. Replace symbols with numbers.
   $70.5 \pm 1.699 \left( \frac{2.3}{\sqrt{30}} \right)$
3. Solve.
   a) Take the square root of n: $70.5 \pm 1.699 \left( \frac{2.3}{5.48} \right)$
   b) Divide s by that number: $70.5 \pm 1.699 \times 0.42$
   c) Multiply t by that number: $70.5 \pm 0.71$
   d) Subtract that number from $\bar{x}$: 70.5 - 0.71 = 69.8
   e) Add the number to $\bar{x}$: 70.5 + 0.71 = 71.2

The 90% Confidence Interval (CI) is 69.8 to 71.2 inches.
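The t-based interval can be sketched the same way. Python's standard library has no t table, so this sketch takes the critical value 1.699 (df = 29, $t_{0.05}$ column) as an input. The function name `t_interval` and its parameter names are illustrative. With t = 1.699 the margin works out to about 0.71.

```python
from math import sqrt

def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for the mean using the sample SD.

    t_crit is the critical t value read from a t table for df = n - 1.
    """
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = t_interval(70.5, 2.3, 30, t_crit=1.699)
print(f"90% CI: {low:.1f} to {high:.1f} inches")
```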
Step by step hypothesis tests using z and t

For a population of female college students (N = 3,64) at a small Midwestern college, the population mean height (μ) is 64.4 inches and the population SD (σ) is 2.4.

Using population SD (σ):

We collected height for women at another college on the East Coast. We would like to determine if the average height for women at this college is the same as the average height at the Midwestern college above. We randomly measured height for 20 women at the East Coast college and calculated an average height of 65.2 inches ($\bar{x}$) with a sample SD of 2.68 (s). Using a significance level of 10% (α = 0.10), is the mean height the same at both colleges? First, test the hypothesis assuming the standard deviation at the East Coast college is the same as at the Midwestern college (use σ).

1. Identify the proper numbers that will replace symbols in the equation. Pay close attention to whether you need to use population or sample mean and standard deviation. For the equation using σ,
   $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
   the values are $\bar{x}$ = 65.2, μ = 64.4, σ = 2.4, and n = 20. The critical z for 90% is 1.64 (compare your test z score to this number to see if there is a significant difference in mean heights).
2. Replace symbols with numbers.
   $z = \frac{65.2 - 64.4}{2.4 / \sqrt{20}}$
3. Solve.
   a) On bottom, take the square root of n.
   b) Divide σ by that number: $z = \frac{65.2 - 64.4}{0.54}$
   c) On top, subtract: $z = \frac{0.8}{0.54}$
   d) Divide the top number by the bottom number: z = 1.48
4. Compare the test z score you calculated with the critical z score of 1.64. If your number is > 1.64 or < -1.64, then there is a significant difference between mean heights of women at the colleges. The test z of 1.48 falls between these values, so there is not a significant difference and we do not reject the null hypothesis (H0) that they are the same.
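The z test above can be sketched in Python as follows. The function name `z_test` is illustrative, and the critical value is computed with `statistics.NormalDist` (about 1.645) rather than read from a table. Keeping full precision in the denominator gives a test z near 1.49 instead of the hand-rounded 1.48, and the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.10):
    """Two-sided z test; returns (z, critical z, reject H0?)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test(65.2, 64.4, 2.4, 20, alpha=0.10)
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, reject H0: {reject}")
```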
Using sample SD (s):

Using a significance level of 10% (α = 0.10), is the mean height the same at both colleges? This time, test the hypothesis using the sample standard deviation at the East Coast college (use s). Our critical value of t is 1.729. This is from the t table with df = 19 (n - 1, or 20 - 1) and the column for the 90% confidence level ($t_{0.05}$). So if our test t value is < -1.729 or > 1.729, we will reject our null hypothesis.

1. Identify the proper numbers that will replace symbols in the equation. Pay close attention to whether you need to use population or sample mean and standard deviation. For the equation using s,
   $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
   the values are $\bar{x}$ = 65.2, μ = 64.4, s = 2.68, and n = 20. The critical t for 90% and n of 20 is 1.729 (above).
2. Replace symbols with numbers.
   $t = \frac{65.2 - 64.4}{2.68 / \sqrt{20}}$
3. Solve.
   a) On bottom, take the square root of n.
   b) Divide s by that number: $t = \frac{65.2 - 64.4}{0.6}$
   c) On top, subtract: $t = \frac{0.8}{0.6}$
   d) Divide the top number by the bottom number: t = 1.33
4. Compare the test t score you calculated with the critical t score of 1.729. Because 1.33 is between -1.729 and 1.729, we do not reject the null hypothesis that average height is the same at both colleges. Our data show no significant difference between them.
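A matching sketch for the t test is below. As with the t interval, the critical value (1.729 for df = 19, $t_{0.05}$ column) is passed in from the t table because the standard library does not provide t quantiles. The function name `t_test` and its parameters are illustrative.

```python
from math import sqrt

def t_test(sample_mean, mu0, s, n, t_crit):
    """Two-sided one-sample t test; returns (t, reject H0?).

    t_crit is the critical t value from a t table for df = n - 1.
    """
    t = (sample_mean - mu0) / (s / sqrt(n))
    return t, abs(t) > t_crit

t, reject = t_test(65.2, 64.4, 2.68, 20, t_crit=1.729)
print(f"t = {t:.2f}, reject H0: {reject}")
```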