12 Statistics. Exercise Set 12-1


Alvin Maurice Garrison
5 years ago

1. Measurements or observations that are gathered for an event under study are called data.
2. The branch of mathematics that involves collecting, organizing, summarizing, and presenting data, and then drawing general conclusions from the data, is called statistics.
3. A population consists of all subjects under study; a sample is a representative subgroup, or subset, of the population.
4. Subjects in the population are numbered, and then they are selected according to corresponding random numbers.
5. Number each member of the population, and then select every kth member. The starting number, though, must be selected at random.
6. Divide the population into groups where the members of any group have similar characteristics. Select members from each group at random.
7. Use an existing group of subjects that represents the population.
8. They both group data into ranges of values.
9. Descriptive statistics are used to describe a set of data. These statistics are not used to draw any conclusions about anything other than the data at hand. Inferential statistics involve techniques that are used to describe a population when you only have data from a sample of the population you are trying to describe.
10. Descriptive statistics uses deductive reasoning, and inferential statistics uses inductive reasoning.
11. It is a cluster sample since an existing group of subjects that represents the population is used for a sample.
12. It is a systematic sample since every seventh customer is selected.
13. It is a random sample since each subject of the population has an equal chance of being selected.
14. It is a systematic sample since every hundredth hamburger is checked.
15. It is a stratified sample since the population is divided into groups and members from each group are randomly selected.
16. It is a random sample since every day of the year has an equal chance of being selected.
17. No; students who are well-off might be underrepresented.
18. Yes; the IDs are randomly chosen, and prison IDs are unlikely to be based on specific characteristics of the prisoner.
19. Yes; the target group is everyone who has a phone, and everyone surveyed obviously has a phone.
20. No; poor or homeless people are far less likely to have a phone.
21. No; the first five products on a shelf are likely to be older so that the store can get rid of older food first.
22. Yes; the color of an M&M shouldn't affect its weight.
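The random, systematic, and stratified procedures described in items 4 to 6 can be sketched in code. This is a minimal illustration, not part of the original solutions; the function names, the `seed` parameter, and the 70-customer example are assumptions made for the sketch.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Random sample: every subject has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=None):
    """Systematic sample: random start among the first k subjects, then every kth."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, per_group, seed=None):
    """Stratified sample: draw per_group subjects at random from each group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group) for name, members in strata.items()}

# Hypothetical example: 70 numbered customers, every seventh one selected.
customers = list(range(1, 71))
every_seventh = systematic_sample(customers, 7, seed=0)
print(every_seventh)
```

A cluster sample needs no selection code at all: it simply uses an existing group as the sample.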

23.
Rank    Frequency
Fr      18
So      1
Jr      6
Se      4

24.
Source  Frequency
I       13
N       5
R       3
T       4

25.
Show    Frequency
S       6
D       5
B       7
A       7

26. Compute $\dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$ and round this up to 66. Start with the lowest value and add 66 to get the lower class limits: 21, 87, 153, 219, 285, 351. Set up the classes by subtracting one from each lower class limit except the first lower class limit. (Table columns: Class, Tally, Frequency.)

27. Compute $\dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$. Start with the lowest value and add 7 to get the lower class limits: 27, 34, 41, 48, 55, 62, 69. Set up the classes by subtracting one from each lower class limit except the first lower class limit. (Table columns: Class, Tally, Frequency.)

28. $\dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}} = \dfrac{\text{difference}}{6}$, which rounds up to 3.9. Start with the lowest value and add 3.9 to get the lower class limits: 24.5, 28.4, 32.3, 36.2, 40.1, and 44.0. Set up the classes by subtracting 0.1 from each lower class limit except the first lower class limit. (Table columns: Class, Tally, Frequency.)

29. The lower limit for the next class will be 60, so the class width is 60. Successively add 60 to get the lower limits: 0, 60, 120, 180, 240, 300, 360, and 420. (Table columns: Class, Tally, Frequency.)

30. Compute $\dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}}$. Start with the lowest value and add 97 to get the lower limits: 5, 102, 199, 296, 393, 490, 587, 684. Set up the classes by subtracting one from each lower class limit except the first lower class limit. (Table columns: Class, Tally, Frequency.)
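The class-building recipe used in items 26 to 30 (round the width up, add it repeatedly to the lowest value, then subtract one to get each upper limit) can be sketched as follows. The function names are illustrative assumptions, and the sketch assumes whole-number data.

```python
import math

def class_width(highest, lowest, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / num_classes)

def class_limits(lowest, width, num_classes):
    """Lower limits step by `width`; each upper limit is one less than the next lower limit."""
    lowers = [lowest + i * width for i in range(num_classes)]
    return [(lo, lo + width - 1) for lo in lowers]

# Item 30: lowest value 5, class width 97, eight classes.
limits = class_limits(5, 97, 8)
print([lo for lo, _ in limits])

# Item 31: highest value 11,413, lowest value 150, ten classes.
print(class_width(11413, 150, 10))
```

For data recorded to one decimal place, the same idea applies with 0.1 subtracted instead of 1, as in item 28.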

31. highest value - lowest value = 11,413 - 150 = 11,263. Start with the lowest value and add 1,127 to get the lower class limits: 150, 1,277, 2,404, 3,531, 4,658, 5,785, 6,912, 8,039, 9,166, 10,293. Set up the classes by subtracting one from each lower class limit except the first lower class limit. Classes: 150 to 1,276; 1,277 to 2,403; 2,404 to 3,530; 3,531 to 4,657; 4,658 to 5,784; 5,785 to 6,911; 6,912 to 8,038; 8,039 to 9,165; 9,166 to 10,292; 10,293 to 11,419. (Table columns: Class, Tally, Frequency.)

32. $\dfrac{\text{combined highest} - \text{combined lowest}}{\text{number of classes}} = \dfrac{\text{difference}}{8}$, which rounds up to 31. Start with the lowest value and add 31 to get the lower class limits: 306, 337, 368, 399, 430, 461, 492, 523. Set up the classes by subtracting one from each lower class limit except the first lower class limit. (Table columns: Class; McGwire: Tally & Frequency; Sosa: Tally & Frequency.)

33. $\dfrac{\text{highest value} - \text{lowest value}}{\text{number of classes}} = \dfrac{\text{difference}}{6}$, which rounds up to 188. Start with the lowest value and add 188 to get the lower class limits: 725, 913, 1,101, 1,289, 1,477, and 1,665. Set up the classes by subtracting one from each lower class limit except the first lower class limit. Classes: 725 to 912; 913 to 1,100; 1,101 to 1,288; 1,289 to 1,476; 1,477 to 1,664; 1,665 to 1,852. (Table columns: Class, Tally, Frequency.)

34. Arrange the data in order. (Note that 3, 8, and 9 are written as 03, 08, and 09.) Separate the data according to the first digit. Use the first digits as stems and the second digits as leaves. (Stem-and-leaf plot: Stems, Leaves.) Analysis: The greatest number of calls was made by ten executives in the […] interval. The fewest were made by two executives in the […] group. The most common numbers were 1 calls and 14 calls, made by three executives each.

35. Arrange the data in order. Separate the data according to the first digit. Use the first digits as stems and the second digits as leaves. (Stem-and-leaf plot: Stems, Leaves.) Analysis: Most registered vehicles per car stolen are in the range 80-89, while the fewest are in the 0-49 range, and the most common are 84 and […].
36. Arrange the data in order. Separate the data according to the first digit. Use the first digits as stems and the second digits as leaves. (Stem-and-leaf plot: Stems, Leaves.)
37. Arrange the data in order. Separate the data according to the whole number. (Stem-and-leaf plot: Stems, Leaves.)
38. Arrange the data in order. Separate the data according to the whole number. (Stem-and-leaf plot: Stems, Leaves.)
39. Inferential statistics is used since an inference is made.
40. Descriptive statistics is used since the average describes the data.
41. Inferential statistics is used since a prediction or inference is made.
42. Descriptive statistics is used since the total attendance describes the data.
43. Inferential statistics is used since an inference is made.


44. Inferential statistics is used since a prediction or inference is made.
45. Answers may vary.
46. Answers may vary.
47. Answers may vary.
48. Answers may vary.
49. Answers may vary.
50. Answers may vary.
51. The majority of states have taxes below $1.80, and just a handful are over $3.00. Explanations may vary.
52. The vast majority of cities have theft rates between 1 in […] and 1 in […]. Explanations may vary.

Exercise Set 12-2

1. Bar graph: Label the horizontal axis with data values and the vertical axis with frequencies. Draw a bar the height of the corresponding frequency over the given data value. Pie chart: Find the degree measures by dividing the frequencies by the total number of data values and multiplying by 360. Then divide the pie chart up accordingly.
2. Histograms are continuous and represent the data without gaps. This is perfect for grouped data since the lower class boundary of one class is the upper class boundary of the previous class. Bar graphs and pie charts are best for categorical data since we are typically comparing individual items, which can be easily visualized with a bar or pie chart.
3. Histograms and frequency polygons both represent the frequency of grouped data. A histogram is similar to a bar graph, whereas a frequency polygon is similar to a line graph.
4. A time series graph is used to see how something changes over time.
5. Draw the bars with heights corresponding to the number of transplants. By far the most common type of transplant is kidney, while pancreas is easily the least common.
6. Draw the bars with heights corresponding to the number of students. While flu was the most common condition, the ailments were fairly spread out, with all but flu and ear infection having between 10 and 40 patients.


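7. Draw the bars with heights corresponding to the number of taxicabs.
8. Draw the bars with heights corresponding to the number of unemployed people.
9. For each rank, degrees $= \frac{f}{n} \cdot 360^\circ$ and percent $= \frac{f}{n} \cdot 100\%$.

Rank        Frequency, f   Degrees   Percent
Freshman    12             48°       13%
Sophomore   25             100°      28%
Junior      36             144°      40%
Senior      17             68°       19%
Total       n = 90         360°      100%

10. For each cause, degrees $= \frac{f}{n} \cdot 360^\circ$ and percent $= \frac{f}{n} \cdot 100\%$.

Cause of death   Number of deaths, f   Degrees    Percent
Heart disease    432                   ≈ 155.5°   43.2%
Cancer           227                   ≈ 81.7°    22.7%
Stroke           93                    ≈ 33.5°    9.3%
Accidents        24                    ≈ 8.6°     2.4%
Other            224                   ≈ 80.6°    22.4%
Total            n = 1,000             ≈ 360°     100%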
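The pie chart rule from item 1 (divide each frequency by the total and multiply by 360 for degrees, or by 100 for percent) can be checked with a short sketch. The function name is an illustrative assumption, and the class-rank frequencies 12, 25, 36, 17 (n = 90) are the item 9 values as reconstructed from the printed percentages, so treat them as an assumption too.

```python
def pie_chart_table(frequencies):
    """Return {label: (degrees, percent)} using f/n * 360 and f/n * 100."""
    n = sum(frequencies.values())
    return {label: (f * 360 / n, f * 100 / n) for label, f in frequencies.items()}

# Assumed item 9 frequencies (n = 90).
ranks = {"Freshman": 12, "Sophomore": 25, "Junior": 36, "Senior": 17}
table = pie_chart_table(ranks)
for label, (degrees, percent) in table.items():
    print(f"{label}: {degrees:.0f} degrees, {percent:.0f}%")
```

The degree measures always total 360 and the percents total 100, up to rounding, which is a quick sanity check on any pie chart table.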


11. For each reason, degrees $= \frac{f}{n} \cdot 360^\circ$ and percent $= \frac{f}{n} \cdot 100\%$.

Reason                     Number, f   Degrees   Percent
Interest in subject        62          223.2°    62%
Future earning potential   18          64.8°     18%
Pressure from parents      12          43.2°     12%
Good job prospects         8           28.8°     8%
Total                      n = 100     360°      100%

12. For each major field, degrees $= \frac{f}{n} \cdot 360^\circ$ and percent $= \frac{f}{n} \cdot 100\%$.

Major field   Number, f    Degrees    Percent
Preschool     893          ≈ 113.2°   31.5%
Elementary    605          ≈ 76.7°    21.3%
Middle        245          ≈ 31.1°    8.6%
Secondary     1,096        ≈ 139.0°   38.6%
Total         n = 2,839    360°       100%

13. For the histogram, draw vertical bars corresponding to the frequencies for each class. For the frequency polygon, find the midpoints for each class: 94, 103, 112, 121, and 130. Label the horizontal axis with the midpoints. Connect adjacent midpoints with straight lines. Finish the graph by drawing lines back to the horizontal axis at the beginning and end of the graph.


14. For the histogram, draw vertical bars corresponding to the frequencies for each class. For the frequency polygon, find the midpoints for each class: 3, 8, 13, 18, 23, and 28. Label the horizontal axis with the midpoints. Connect adjacent midpoints with straight lines. Finish the graph by drawing lines back to the horizontal axis at the beginning and end of the graph. (Graphs: Frequency versus Years of Service.)
15. For the histogram, draw vertical bars corresponding to the frequencies for each class. For the frequency polygon, find the midpoints for each class: 10, 15, 20, 25, and 30. Label the horizontal axis with the midpoints. Connect adjacent midpoints with straight lines. Finish the graph by drawing lines back to the horizontal axis at the beginning and end of the graph. (Graphs: Frequency versus Miles per Gallon.)
16. For the histogram, draw vertical bars corresponding to the frequencies for each class.

For the frequency polygon, find the midpoints for each class: 2.6, 3.3, 4.0, 4.7, 5.4, and 6.1. Label the horizontal axis with the midpoints. Connect adjacent midpoints with straight lines. Finish the graph by drawing lines back to the horizontal axis at the beginning and end of the graph. (Graphs: Frequency versus Seconds.) Analysis: The most frequently occurring times are from 2.3 to 3.6 seconds. Except for a neighborhood of 4.7 seconds, the frequency decreases fast from a neighborhood of 3.3 seconds to greater times.
17. Represent the years on the x axis and the number of people on the y axis, and then draw lines connecting the points.
18. Represent the years on the x axis and the number sold on the y axis, and then draw lines connecting the points.
19. Represent the years on the x axis and the downloads on the y axis, and then draw lines connecting the points. Downloads increased very steadily from 2004 to 2010, then decreased from 2010 to […].
20. Represent the years on the x axis and the sales on the y axis, and then draw lines connecting the points. Restaurant sales grew dramatically from 1970 to 2012, with the rate of growth getting faster.


21. a) Frequency charts and histograms may vary. One with 7 classes is shown. b) Most states have fewer than 50,000 employed registered nurses, and very few have more than 100,000.
22. a) Frequency charts and histograms will vary. One with 7 classes is shown. b) The average high for Honolulu in May is remarkably consistent, especially compared to Las Vegas. It was between 73 and 80 every day, with most days falling between 75 and […].
23. a) Frequency charts and histograms may vary. One with 6 classes is shown. b) More than half of all states have fewer than 3,000 employed LPNs, and very few have more than 6,000. The histogram shape is strikingly similar to the one for RNs; the major difference is the labeling.
24. a) Frequency charts and histograms may vary. One with 6 classes is shown. b) The most likely high temperature in May is between 80 and 85 degrees; highs less than 75 are very unusual, and the 90s occur occasionally.
25. Answers may vary.
26. Answers may vary.
27. Time series graph
28. Pie chart
29. Bar graph
30. Time series graph
31. Bar graph
32. Pie chart
33. Answers may vary.
34. The scale makes the increase look greater.
35. Answers may vary.
36. Answers may vary.

Exercise Set 12-3

1. Answers may vary.
2. The mean is calculated by adding the data values and dividing by the total number of values.
3. The median is the middle value (or the mean of the middle two values) in a set of data when the data are in either ascending or descending order.
4. The mode of a data set is the value that occurs most often.
5. The midrange is found by averaging the highest and lowest data values.
6. The mode can be used to measure the average of categorical data since it doesn't depend on numerical values.
7. Answers may vary.
8. Answers may vary.
9. Answers may vary.
10. Answers may vary.
11. mean: $\bar{X} = \frac{\sum X}{n}$ = […]. Arrange the data in order. The median is the middle value: median = 7. The value that occurs most often is 3: mode = 3. midrange $= \frac{L + H}{2}$ = 31.
12. mean: $\bar{X} = \frac{\sum X}{n}$ = […]. Arrange the data in order. The median is the middle value: median = 98. The value that occurs most often is 80: mode = 80. midrange $= \frac{L + H}{2}$ = 139.
13. mean: $\bar{X} = \frac{\sum X}{n}$ = […]; the data are in thousands, so the mean is stated in thousands. Arrange the data in order. The median is the middle value: median = 475. Since each value occurs only once, there is no mode. midrange $= \frac{L + H}{2}$ = […].

14. mean: $\bar{X} = \frac{\sum X}{n}$ = […]/35 ≈ 9.54. Arrange the data in order. The median is the middle value: median = 10. There are 4 values that occur 6 times: modes = 7, 8, 10, and 11. midrange $= \frac{L + H}{2}$ = 10.
15. mean: $\bar{X} = \frac{\sum X}{n}$ = 29,077/10 = 2,907.7. The data are already in order. The median is the average of the middle values 2,779 and 2,668: median = (2,779 + 2,668)/2 = 2,723.5. Since each data value occurs only once, there is no mode. midrange $= \frac{L + H}{2}$ = (4,313 + 2,394)/2 = 3,353.5.
16. mean: $\bar{X} = \frac{\sum X}{n}$ = 82.65/19 = 4.35. Arrange the data in order. The median is the middle value: median = […]. No data value occurs most often, so there is no mode. midrange $= \frac{L + H}{2}$ = […].
17. mean: $\bar{X} = \frac{\sum X}{n}$ = 948/5 = […]. Arrange the data in order. The median is the middle value: median = 151. Each data value occurs only once, so there is no mode. midrange $= \frac{L + H}{2}$ = 207.5.
18. mean: $\bar{X} = \frac{\sum X}{n}$ = 175,908/19 ≈ 9,258.3. Arrange the data in order. The median is the middle value: median = 6,908. Each data value occurs only once, so there is no mode. midrange $= \frac{L + H}{2}$ = (1,364 + 30,425)/2 = 15,894.5.
19. mean: $\bar{X} = \frac{\sum X}{n}$ = 112/15 ≈ 7.5. Arrange the data in order. The median is the middle value: median = 7. The data values 5 and 9 occur most often: the modes are 5 and 9. midrange $= \frac{L + H}{2}$ = […].

20. mean: $\bar{X} = \frac{\sum X}{n}$ = 101,476/7 ≈ 14,496.6. Arrange the data in order: 11,047 11,970 14,009 15,105 16,111 16,122 17,112. The median is the middle value: median = 15,105. Each data value occurs only once, so there is no mode. midrange $= \frac{L + H}{2}$ = (11,047 + 17,112)/2 = 14,079.5.
21. mean: $\bar{X} = \frac{\sum X}{n}$ = […]/16 = […]. Arrange the data in order. The median is the average of the two middle values: median = (16,442 + 16,528)/2 = 16,485. Each data value occurs only once, so there is no mode. midrange $= \frac{L + H}{2}$ = (14,748 + 21,610)/2 = 18,179.
22. mean: $\bar{X} = \frac{\sum X}{n}$ = […]/12 ≈ 9.48. Arrange the data in order. The median is the average of the two middle values: median = […]. The data value 8.6 occurs most often, so the mode is 8.6. midrange $= \frac{L + H}{2}$ = […].
23. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = […].
24. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = 5.15 mpg.
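The four measures of average defined in items 2 to 5 can be computed directly. This is a small sketch with illustrative function names; the seven data values are item 20's values as reconstructed from the printed sum of 101,476, which is an assumption about the garbled source.

```python
from collections import Counter

def mean(data):
    """Sum of the values divided by the number of values."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    """Values that occur most often; empty list if every value occurs once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def midrange(data):
    """Average of the lowest and highest values."""
    return (min(data) + max(data)) / 2

# Assumed reconstruction of the item 20 data.
data = [11047, 11970, 14009, 15105, 16111, 16122, 17112]
print(round(mean(data), 2), median(data), modes(data), midrange(data))
```

Returning an empty list for "no mode" mirrors the wording used in the solutions when each value occurs only once.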

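25. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = 4.4 seconds.
26. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = […] hours.
27. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = 4.87, or $4.87 million.
28. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = […].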
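The grouped-data tables in these items all follow one pattern: multiply each class frequency by the class midpoint, add the products, and divide by the total frequency. A minimal sketch follows; the function name and the four classes with their frequencies are hypothetical, since the original tables did not survive.

```python
def grouped_mean(classes):
    """classes: list of ((lower_limit, upper_limit), frequency) pairs."""
    n = sum(f for _, f in classes)
    # Midpoint of each class is the average of its limits.
    total = sum(f * (lo + hi) / 2 for (lo, hi), f in classes)
    return total / n

# Hypothetical frequency distribution (not from the original exercises).
classes = [((1, 5), 4), ((6, 10), 6), ((11, 15), 7), ((16, 20), 3)]
print(grouped_mean(classes))
```

The result is an estimate, because it treats every value in a class as if it sat at the class midpoint.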

29. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = $[…].
30. (Table columns: Class, Frequency, Midpoint, Frequency · Midpoint.) $\bar{X} = \frac{\sum (f \cdot X_m)}{n}$ = 3.7 days.
31. Answers may vary.
32. (a) Mode (b) Mode (c) Median
33. Median
34. Mean
35. Mode
36. Mode
37. Mode
38. Mean
39. Answers may vary.
40. Answers may vary.
41. Mean and median tend to be close when the data set doesn't have one or two values that are unusually high or low compared to the others. But when there are outliers like that, they skew the mean up or down without affecting the median much.
42. Answers may vary.
43. By estimating that the values in a class will average out to be in the middle of the class.
44. Answers may vary.
45. It must be in the range of […]. The 9th value is […]% of the way through the class. The 10th value is […]% of the way through the class. If the values are equally spread out, then the first value would be 177, the 9th value would be about 180, and the 10th value would be about 181. The average of these two values is […].

46. There are 35 data values, so the median is the 18th data value. The 18th data value will fall in the class […]. The first data value in this class is the 10th data value, so the 18th data value is the 9th data value in this class, or 9/12 = 75% of the way through the class. If the values are equally spread out, then the first value would be 7 and the 9th value would be […].

Exercise Set 12-4

1. Range, variance, and standard deviation.
2. The range is the difference between the highest value and the lowest value in a data set.
3. Because (a) the range uses only two of the values in the data set, and (b) an extremely large and/or an extremely low value can make the range very large, thus giving the impression of more variability than is actually the case.
4. Standard deviation = square root of variance.
5. Find the mean and subtract it from each value in the data set. Square each difference, and find the sum of the squares. Divide this sum by (n - 1), where n = number of values in the data set. Take the square root of the quotient to obtain the standard deviation.
6. For a given data set, its standard deviation provides an indication of how far from the mean the individual values (members of the set) are. If two data sets $D_1$ and $D_2$ have the same mean but different standard deviations $s_1$ and $s_2$ with, say, $s_1 < s_2$, it can be concluded that the members of $D_2$ are more variable than those of $D_1$.
7. In data set 1 the data are fairly close together with a range of 6; in data set 2 they are more spread out with a range of 31; consequently, the data in set 1 will have a smaller standard deviation than that of set 2.
8. To be an NFL running back you must be pretty fast, so the times to complete the 40-yard dash will be more consistent. Therefore data set 1 will have a smaller standard deviation than data set 2.
9. Since winters in Chicago are much colder than summers, and winters in Los Angeles are not as cold, the standard deviation in data set 1 will be greater than that of data set 2.
10. Weights of dogs of the same breed will tend to be more consistent than weights of dogs of varying breeds; therefore data set 1 will have a smaller standard deviation than data set 2.
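The procedure in item 5 translates step by step into code. The sketch below uses illustrative function names and a hypothetical five-value data set; it divides by n - 1, as the item specifies.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

def sample_std(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))

def data_range(data):
    """Highest value minus lowest value."""
    return max(data) - min(data)

# Hypothetical data (not from the original exercises).
data = [3, 5, 8, 9, 10]
print(data_range(data), sample_variance(data), round(sample_std(data), 2))
```

Here the mean is 7, the squared deviations are 16, 4, 1, 4, and 9, and their sum 34 divided by 4 gives the variance.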

11. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/9 = 14. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/8 = […]; $s = \sqrt{\text{variance}}$ = […]. The number of junk e-mails varies pretty widely.
12. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]; $s = \sqrt{\text{variance}}$ = […]. The number of hospitals varies quite widely.
13. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]; $s = \sqrt{\text{variance}}$ = […]. The odometer readings vary pretty widely.

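14. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/14 = […]; $s = \sqrt{\text{variance}}$ = […]. The number of hours varies pretty widely.
15. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/9 = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/8 = […]; $s = \sqrt{\text{variance}}$ = […]. The weights don't vary all that much.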

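16. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = 9.8. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/10 = […]; $s = \sqrt{\text{variance}}$ = […]. The number of stories is fairly uniform.
17. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/9 = 7.7. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = 72.01/8 ≈ 9; $s = \sqrt{9} = 3$. The heights are pretty uniform.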

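18. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/11 = […]; $s = \sqrt{\text{variance}}$ = […]. The number of calories varies pretty widely.
19. R = highest value - lowest value = $9.40 - $0.72 = $8.68. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/5 = […]; $s = \sqrt{\text{variance}}$ = […].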

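20. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/9 = […]; $s = \sqrt{\text{variance}}$ = […].
21. R = highest value - lowest value = $48.84 - $9.34 = $39.50. $\bar{X} = \frac{\sum X}{n}$ = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/7 = […]; $s = \sqrt{\text{variance}}$ = […].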

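22. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/9 = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = 560,354.4/8 ≈ 70,044.3; $s = \sqrt{70{,}044.3}$ ≈ 264.7.
23. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/9 = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/8 = […]; $s = \sqrt{\text{variance}}$ = […].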

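24. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/5 = 84. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = 558/4 = […]; $s = \sqrt{\text{variance}}$ = […].
25. R = highest value - lowest value = 5,840 - 4,551 = 1,289. $\bar{X} = \frac{\sum X}{n}$ = […]/8 = […]. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = 1,720,726/7 = 245,818; $s = \sqrt{245{,}818}$ ≈ 495.8.
26. R = highest value - lowest value = […]. $\bar{X} = \frac{\sum X}{n}$ = […]/12 = 6.4. variance $= \frac{\sum (X - \bar{X})^2}{n - 1}$ = […]/11 = 3.7; $s = \sqrt{\text{variance}}$ = […].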

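27. The variation is not the same. Find the standard deviation for each data set. (Table columns for each part: $X$, $X - \bar{X}$, $(X - \bar{X})^2$.)
(a) R = […]; $\bar{X} = 11$; variance = 18.7; s = […].
(b) R = […]; $\bar{X} = 11$; variance = 36; s = […].
(c) R = […]; $\bar{X}$ = […]; variance = 5.7; s = […].
28. (a) $\bar{X} = 30$; $\sum (X - \bar{X})^2 = 1{,}000$; variance = 1,000/4 = 250; $s = \sqrt{250}$ ≈ 15.8.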

(b) $\bar{X} = 35$; $\sum (X - \bar{X})^2 = 1{,}000$; variance = 250; $s = \sqrt{250}$ ≈ 15.8.
(c) $\bar{X} = 25$; $\sum (X - \bar{X})^2 = 1{,}000$; variance = 250; $s = \sqrt{250}$ ≈ 15.8.
(d) $\bar{X} = 150$; $\sum (X - \bar{X})^2 = 25{,}000$; variance = 25,000/4 = 6,250; $s = \sqrt{6{,}250}$ ≈ 79.1.
(e) $\bar{X} = 6$; variance = 10; $s = \sqrt{10}$ ≈ 3.16.
(f) The standard deviation remains unchanged when a constant is added to or subtracted from the data values. The standard deviation is multiplied (or divided) by the same constant by which the data values are multiplied (or divided).
29. a) The average would be close to the hole for Pat and close to the center of the green for Ron. b) Pat has a large variation; Ron has a very small variation. c) Answers may vary, but this example shows that variation can sometimes be more meaningful than average.
30. a) Helena: $\bar{X} = \frac{\sum X}{n}$ = […]. Juanita: $\bar{X} = \frac{\sum X}{n}$ = 3.0. b) The variation for Juanita is much greater. c) Answers may vary.
31. Answers may vary.
32. Answers may vary.
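The conclusion in part (f) can be checked numerically. The sketch below assumes a base data set of 10, 20, 30, 40, 50, chosen because it is consistent with the results printed for part (a) (mean 30, sum of squares 1,000, variance 250); the actual exercise data did not survive, so this data set is an assumption.

```python
import math
import statistics

# Assumed base data, consistent with the part (a) results.
base = [10, 20, 30, 40, 50]
s = statistics.stdev(base)  # sample standard deviation (divides by n - 1)

shifted_up = [x + 5 for x in base]    # like part (b): mean 35
shifted_down = [x - 5 for x in base]  # like part (c): mean 25
scaled = [x * 5 for x in base]        # like part (d): mean 150
divided = [x / 5 for x in base]       # like part (e): mean 6

# Adding or subtracting a constant leaves the standard deviation unchanged.
print(math.isclose(statistics.stdev(shifted_up), s))
print(math.isclose(statistics.stdev(shifted_down), s))
# Multiplying or dividing by a constant scales the standard deviation by it.
print(math.isclose(statistics.stdev(scaled), 5 * s))
print(math.isclose(statistics.stdev(divided), s / 5))
```

Note that the variance scales by the square of the constant, which is why multiplying by 5 takes the variance from 250 to 6,250.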

Exercise Set 12-5

1. If your score is in the 60th percentile, that means you scored higher than 60% of the students in the class.
2. A score in the 90th percentile does not mean you got 90% of the questions right; it means you scored better than 90% of the people who took the test.
3. Quartiles are the 25th, 50th, and 75th percentiles.
4. The second quartile and the median are the same. Since the second quartile is the 50th percentile, that makes it the middle value in the data set, which is how the median is defined.
5. The portion inside the box in a box plot represents the data values between the 1st and 3rd quartiles. It is the middle portion of all of the data.
6. An outlier is a data point that doesn't fit well with the other data points. You can casually spot outliers, but there is also a specific definition when you look at a box plot: an outlier is any data point that is at least a specific distance from the box. The distance is 1.5 times the difference between the 1st and 3rd quartiles.
7. Arrange the 20 data values in order.
(a) There are 4 values below the score: $\frac{4}{20} \cdot 100\% = 20\%$. The score is equivalent to the 20th percentile.
(b) There are 15 values below 44: $\frac{15}{20} \cdot 100\% = 75\%$. A score of 44 is equivalent to the 75th percentile.
(c) There are 7 values below 36: $\frac{7}{20} \cdot 100\% = 35\%$. A score of 36 is equivalent to the 35th percentile.
(d) There is 1 value below the score: $\frac{1}{20} \cdot 100\% = 5\%$. The score is equivalent to the 5th percentile.
(e) There are 18 values below 49: $\frac{18}{20} \cdot 100\% = 90\%$. A score of 49 is equivalent to the 90th percentile.
8. Arrange the 12 data values in order.
(a) There are 6 values below 67: $\frac{6}{12} \cdot 100\% = 50\%$. A height of 67 in. is equivalent to the 50th percentile.
(b) There are 8 values below 70: $\frac{8}{12} \cdot 100\% \approx 67\%$. A height of 70 in. is equivalent to the 67th percentile.
(c) There are 2 values below 63: $\frac{2}{12} \cdot 100\% \approx 17\%$. A height of 63 in. is equivalent to the 17th percentile.
(d) There are 7 values below 68: $\frac{7}{12} \cdot 100\% \approx 58\%$. A height of 68 in. is equivalent to the 58th percentile.
(e) There are 5 values below 66: $\frac{5}{12} \cdot 100\% \approx 42\%$. A height of 66 in. is equivalent to the 42nd percentile.
9. There are […] values below Carveta's rank: […]%, or the 75th percentile.
10. There are […] values below John's rank: […]%, or the 80th percentile.
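The percentile computations in items 7 and 8 all use the same rule: divide the number of values below the given value by the total number of values and multiply by 100%. A minimal sketch, with an illustrative function name and a hypothetical data list for the second helper:

```python
def percentile_rank(values_below, n):
    """Percentile = (number of values below / total number of values) * 100, rounded."""
    return round(values_below / n * 100)

def percentile_rank_in(data, value):
    """Count the values strictly below `value`, then apply the same rule."""
    below = sum(1 for x in data if x < value)
    return percentile_rank(below, len(data))

# Item 7 (n = 20) and item 8 (n = 12).
print(percentile_rank(15, 20))  # score of 44
print(percentile_rank(8, 12))   # height of 70 in.

# Hypothetical data list (not from the original exercises).
print(percentile_rank_in([61, 63, 64, 66, 67, 68, 70, 72], 67))
```

Python's `round` resolves exact halves to the nearest even integer, which may differ from textbook rounding in rare tie cases; none of the cases above involve a tie.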

11. There are […] places after her: […]%, or 79%, or the 79th percentile.
12. There are 1,861 stocks with lower dividends: $\frac{1{,}861}{1{,}941} \cdot 100\% \approx 96\%$, or the 96th percentile; so yes, the stock qualifies for her portfolio.
13. […]% of […] = […]. So, 10 students scored lower than Angela.
14. […]% of […] = […]. So, 150 scored below her.
15. […]% of […] = […]. So, […] scored above him.
16. […]% of […] = […]. So, […] applicants scored above her, and she will not get an offer.
17. Find Maurice's percentile rank and compare it to Lea's rank of […]: […]/600 · 100% ≈ 63%. Maurice is ranked higher because he is in the 63rd percentile.
18. Find Maranda's percentile rank and compare it to Audrelia's rank of […]: […]/30 · 100% ≈ 33%. Maranda's rank is higher because she is in the 33rd percentile.
19. Find the basketball team's percentile rank and compare it to the football team's rank. Basketball team: […]/344 · 100% = […]. Football team: […]/47 · 100% = […]. The football team had a better ranking.
20. Find the brother's percentile rank and compare it to the sister's rank. Brother: […]/30 · 100% = […]. Sister: […]/193 · 100% = […]. The brother has a better ranking.
21. Arrange the data in order.
(a) There are 12 values below 33: $\frac{12}{20} \cdot 100\% = 60\%$. The age of 33 is equivalent to the 60th percentile.
(b) 8, since 33 is the 8th value from the top.
(c) 20% of 20 = 4. The age corresponding to the 20th percentile is the one with 4 values below it.
22. Arrange the data in order.
(a) There are 5 values below 96: $\frac{5}{20} \cdot 100\% = 25\%$. The IQ score of 96 is in the 25th percentile.
(b) 15, since 96 is the 15th value from the top.
(c) 40% of 20 = 8. The IQ score of 101 corresponds to the 40th percentile since there are 8 values below it.
23. Arrange the data in order. The median is the middle value: $Q_2 = 34$. Find the median of the values less than $Q_2$: $Q_1$ = […]. Find the median of the values above $Q_2$: $Q_3$ = […].

24. Arrange the data in order. The median is the middle point: $Q_2$ = […]. Find the median of the values less than $Q_2$: $Q_1$ = […]. Find the median of the values above $Q_2$: $Q_3$ = […].
25. Arrange the data in order. The median is the middle point: $Q_2 = 88.5$. Find the median of the values less than $Q_2$: $Q_1 = 78$. Find the median of the values above $Q_2$: $Q_3$ = […].
26. Arrange the data in order. The median is the middle point: $Q_2$ = […]. Find the median of the values less than $Q_2$: $Q_1$ = […]. Find the median of the values above $Q_2$: $Q_3$ = […].
27. Arrange the data in order. The median is the middle point: $Q_2$ = […]. Find the median of the values less than $Q_2$: $Q_1$ = […]. Find the median of the values above $Q_2$: $Q_3$ = […].
28. Arrange the data in order. The median is the middle point: $Q_2$ = […]. Find the median of the values less than $Q_2$: $Q_1$ = […]. Find the median of the values above $Q_2$: $Q_3$ = […].
29. Find the quartiles $Q_1$, $Q_2$, and $Q_3$. Draw a number line ranging from the least to the greatest value and label $Q_1$, $Q_2$, and $Q_3$ on it. Draw in the rectangular box and vertical line. a) The majority of the values are on the low end of the distribution. b) […] and […] are outliers; there are two countries that have far more Internet users than all others.
30. Find the quartiles $Q_1$, $Q_2$, and $Q_3$. Draw a number line ranging from the least to the greatest value and label $Q_1$, $Q_2$, and $Q_3$ on it. Draw in the rectangular box and vertical line. a) The data set is distributed fairly evenly. b) No; none of the years had unusually high or low prices.
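The quartile procedure used in items 23 to 30 (median of all values, then the medians of the values below and above it) and the 1.5-times-box-length outlier rule from item 6 can be sketched as follows. The function names and the eight data values are hypothetical, and the sketch follows the wording of the solutions literally: values equal to the median are excluded from both halves, and a point counts as an outlier if it is at least 1.5 box lengths from the box.

```python
def median(ordered):
    """Median of an already sorted list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Q2 is the median; Q1 and Q3 are the medians of the values below and above Q2."""
    ordered = sorted(data)
    q2 = median(ordered)
    q1 = median([x for x in ordered if x < q2])
    q3 = median([x for x in ordered if x > q2])
    return q1, q2, q3

def outliers(data):
    """Points at least 1.5 * (Q3 - Q1) away from the box."""
    q1, _, q3 = quartiles(data)
    reach = 1.5 * (q3 - q1)
    return [x for x in data if x <= q1 - reach or x >= q3 + reach]

# Hypothetical data (not from the original exercises).
data = [5, 7, 8, 10, 12, 13, 15, 40]
print(quartiles(data))
print(outliers(data))
```

Other quartile conventions exist (for example, splitting by position rather than by value), so results from software may differ slightly from the hand method shown here.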

31. Use the quartile values found in the solution to exercise 27. Draw a number line ranging from the least to the greatest value and label $Q_1$, $Q_2$, and $Q_3$ on it. Draw in the rectangular box and vertical line. a) The majority of the years have homicides on the higher end of the range. b) 66 is an outlier; one year had an exceptionally low homicide total.
32. Use the quartile values found in the solution to exercise 28. Draw a number line ranging from the least to the greatest value and label $Q_1$, $Q_2$, and $Q_3$ on it. Draw in the rectangular box and vertical line. a) At least half of the top ten are between […] and 4 billion. b) To say the least, Google is an outlier. It dwarfs the sizes of the other companies. Groupon is also an outlier.
33. It is possible to score a 90% on a test and have a percentile rank of less than 90 or exactly 90, depending on how the other students did.
34. It is not possible to rank in the 100th percentile because you'd have to do better than everyone, including yourself.
35. Answers may vary.
36. Answers may vary.
37. Find the quartiles in the current order, then switch $Q_1$ and $Q_3$.
38. a) If a data set has 10 values, the value at the 80th percentile is ranked ninth, but 0.8 times 10 is 8. b) When the data set has a very large number of values, the claim is close to true.

Exercise Set 12-6

1. In a normal distribution the data are usually centered about the middle of the range.
2. Answers may vary.
3. The area under a portion of the normal curve corresponds to the percentage of data that fall in that range.
4. The empirical rule says that about 68% of the data are within one standard deviation of the mean, about 95% are within two standard deviations of the mean, and about 99.7% are within three standard deviations of the mean.
5. Since the normal curve is symmetric about the mean, 50% of the area lies to the left of the mean and 50% of the area lies to the right of the mean.
6. The total area under the normal curve is 1 since it represents the probability of the entire sample space, which should equal 1.
7. It is a normal distribution with mean zero and standard deviation 1.
8. Obviously, not all sets of data that are normally distributed have a mean of zero and a standard deviation of 1, which means that we can't apply the standard normal distribution directly. But finding z scores converts your data set into a data set that does have a mean of zero and a standard deviation of 1.
9. To find the z score of a data value, you subtract the mean from it and then divide by the standard deviation.
10. Because the area between z = 0 and a positive z score is the same as the area between z = 0 and the negative of that z score.
11. […] and […], so the values are one standard deviation below and above the mean respectively. Since about 68% of the data are within one standard deviation of the mean and […], approximately 34 data values fall between 10.5 and […].

12. […] and […], so the values are two standard deviations below and above the mean respectively. Since about 95% of the data are within two standard deviations of the mean and […], approximately 190 data values fall between 84 and […].
13. […] and […], so the values are three standard deviations below and above the mean respectively. Since about 99.7% of the data are within three standard deviations of the mean and […], approximately 499 data values fall between 35 and […].
14. […] and […], so the values are two standard deviations below and above the mean respectively. Since about 95% of the data are within two standard deviations of the mean and […], approximately 114 data values fall between 17.4 and […].
15. […] and […], so the values are one standard deviation below and above the mean respectively. Since about 68.2% of the data are within one standard deviation of the mean and […], approximately 54 data values fall between […] and […].
16. […] and […], so the values are two standard deviations below and above the mean respectively. Since about 95% of the data are within two standard deviations of the mean and […], approximately 713 data values fall between 175 and […].
17. The area between z = 0 and z = […] is […].
18. The area between z = 0 and z = 0.5 is […].
19. The area between z = 0 and z = .05 is […].
20. The area to the right of z = 1.0 is […].
21. The area to the right of z = 0.5 is […].
22. The area between z = 0 and z = 1.95 is […].
23. The area to the left of z = 0.40 is […].

35 The area to the left of z 1.45 is The area between z 0.5 and z 1.10 is The area between z 1.5 and z 1.90 is The area between z.45 and z 1.05 is The area between z 0.8 and z 1.3 is The area to the left of z 1.0 is The area between z 0.85 and z 0.0 is The area to the left of z.15 is The area between z 1.55 and z 1.85 is The area to the right of z 1.90 is

34. The area to the right of z 0.0 is The area to the left of z 0.60 is The area to the right of z 1.10 is The area between z 1.90 and z 1.95 is The area between z 0.1 and z 0. is a) , so 606 is two standard deviations below the mean. Since 2.2% of the data fall more than two standard deviations below the mean, about 97.8% fall above that value. b) and , so the values are one standard deviation below and above the mean respectively. Since about 68.2% of the data are within one standard deviation of the mean and , approximately 341 data values fall between 3. and a) , so the value is two standard deviations above the mean. Since about 2.5% of the data are more than two standard deviations above the mean, the probability that you were born more than 98 days after you were conceived is b) There are 83 days between April 19 and January 7, so we are talking about 83 days after conception , so 83 is one standard deviation above the mean. Since 50% + 34.1% = 84.1% of the data are less than one standard deviation above the mean, the probability that the baby will be born before January 7 is about If the mean is 8 ounces, then only 50% of the bags would have at least 8 ounces and 97% of the bags would have ounces. So, if the mean were set to 8.8, then the manager would have his wish. 4. a) and , so the values are one standard deviation above and one below the mean. Since 68.2% of the data are less than one standard deviation away from the mean, the probability that a student will get a score between 63 and 85 is 0.6. b) , so the values are greater than two standard deviations below the mean. Since 97.5 percent of the data are greater than two standard deviations below the mean, the probability of a score over 53 is Find a z value for which the area between it and 0 is From z-table, z Find a z value for which the area between it and 0 is From z-table, z (a) From z-table, z ±.05. (b) From z-table, z ±1.75. (c) From z-table, z ± Answers may vary.
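The z-score and area calculations used throughout this exercise set can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` is used in place of the printed z-table; the function names are illustrative and do not come from the text.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(value, mean, std_dev):
    """Subtract the mean from the value, then divide by the standard deviation."""
    return (value - mean) / std_dev


def area_between_zero_and(z):
    """Area under the standard normal curve between z = 0 and the given z."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)


def area_left_of(z):
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def area_right_of(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Symmetry: the area from 0 to z equals the area from 0 to -z.
    print(round(area_between_zero_and(1.95), 4))
    print(round(area_between_zero_and(-1.95), 4))
    # Empirical rule check: area within one standard deviation of the mean.
    print(round(area_left_of(1.0) - area_left_of(-1.0), 4))
```

Table lookups round z to two decimal places, so answers obtained this way may differ from table-based answers in the last digit.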

37 Exercise Set Many real-life situations, with a large and random population, closely resemble the normal distribution (that is, the theoretical one). Since the mathematics-statistics of this distribution are well known, certain conclusions or probabilities can be drawn from an appropriate real-life situation.. Plot the data, and see if the graph has properties similar to those of the normal distribution. (In more advanced texts, certain tests do exist for deciding.) 3. The area under a normal distribution between two data values is the probability that a randomly selected value is between those two data values. 4. In order to calculate the probabilities of a variable being in some range, you need to find the z scores for the lowest and highest data values for the range. 5. (a) (b) z The area between z 0 and z 0.5 is Since the desired area is in the right tail, subtract from The probability that a randomly selected production worker earns more than $15 is z The area between z 0 and z 0.9 is Since the desired area is in the left tail, subtract 0.31 from The probability that a randomly selected production worker earns less than $14.00 is value mean 6 z The area between z 0 and z 1.33 is Since the desired area is in the right tail, subtract from The probability that it will cost two people more than $6 to go to a movie is (a) z 55, , 547 9, The area between z 0 and z 0.49 is Since the desired area is in the right tail, subtract from The probability that a teacher earns more than $55,000 is (b) z 45, , 547 9, The area between z 0 and z 0.6 is 0.3. Since the desired area is in the left tail, subtract from The probability that the teacher earns less than $45,000 is (a) z The area between z 0 and z 1.3 is Since the area under the curve to the right of z 1.3 is desired, add to The probability that he or she spent more than $60 per purchase is

38 (b) z The area between z 0 and z 1.05 is Since the area under the curve to the left of z 1.05 is desired, add to The probability that he or she spent less than $80 per purchase is (a) value mean z The area between z 0 and z.58 is Since the desired area is in the left tail, subtract from The probability that he or she owned the set less than.5 years is value mean (b) z value mean z The area between z 0 and z.0 is The area between z 0 and (c) z 0.90 is Since the desired area is between z.0 and z 0.90, subtract from The probability that he or she owned the set between 3 and 4 years is value mean z The area between z 0 and z 0.67 is Since the area under the normal curve to the right of z 0.67 is desired, add to The probability that he or she owned the set more than 4. years is value mean (a) z value mean z The area between z 0 and z 0.75 is The area between z 0 and z 0.75 is also The total area is The probability that the CEO is between 53 and 59 years old is value mean (b) z value mean z The area between z 0 and z 0.5 is The area between z 0 and z 1.75 is Since the desired area is between z 0.5 and z 1.75, subtract 0.19 from The probability that the CEO is between 58 and 63 years old is value mean (c) z value mean z The area between z 0 and z 1.5 is The area between z 0 and z 0.5 is Since the desired area is between z 0.5 and z 1.5, subtract from The probability that the CEO is between 50 and 55 years old is

39 11. (a) z1 5, , 000,000.5 z 8, , 000,000 1 The area between z 0 and z.5 is The area between z 0 and z 1 is Since the desired area is between z 1 and z.5, subtract from The probability that a tire s lifetime is between 5,000 and 8,000 miles is (b) z1 7, , 000, z 3, , 000,000 1 The area between z 0 and z 1.5 is The area between z 0 and z 1 is The total area is The probability that a tire s lifetime is between 7,000 and 3,000 miles is (a) 13. (a) (c) z1 31, , 000, z 33, , 000, The area between z 0 and z 0.75 is The area between z 0 and z 1.75 is Since the desired area is between z 0.75 and z 1.75, subtract 0.73 from The probability that a tire s lifetime is between 31,500 and 33,500 miles is value mean z The area between z 0 and z 9.83 is approximately Since the desired area is to the right of z 9.83, subtract from The probability that a visitor spends at least 180 minutes per visit is 0. value mean 50 6 (b) z 1 1 The area between z 0 and z 1 is Since the desired area is to the right of z 1, add to The probability that a visitor spends at least 50 minutes per visit is value mean z 1 6 The area between z 0 and z 1 is Since the desired area is to the left of z 1, add to The probability that at most 50 inches of snow will be received is

40 value mean (b) z The area between z 0 and z 1.5 is Since the desired area is in the right tail, subtract from The probability that at least 53 inches of snow will be received is value mean (a) z value mean z The area between z 0 and z 1.6 is The area between z 0 and z 0.31 is 0.1. The total area is The probability that the customer will have to wait between 5 and 10 minutes is value mean 6 9. (b) z value mean 9 9. z The area between z 0 and z 1.3 is Since the desired area is to the left of z 1.3, subtract from The area between z 0 and z 0.08 is Since the desired area is to the right of z 0.08, add to The total desired area is The probability that a customer will have to wait less than 6 minutes or more than 9 minutes is value mean (a) z value mean z The area between z 0 and z 1.66 is The area between z 0 and z 0.93 is The total area is The probability that it will take a student between 15 and 30 minutes to complete the test is (a) value mean (b) z value mean z The area between z 0 and z 1.14 is Since the desired area is to the left of z 1.14, subtract from The area between z 0 and z 0.59 is 0.. Since the desired area is to the right of z 0.59, subtract 0. from The total desired area is The probability that it will take a student less than 18 minutes or more than 8 minutes to complete the test is value mean z.5 8 The area between z 0 and z.5 is Since the desired area is to the right of z.5, add to The probability that a person burns more than 80 calories is value mean (b) z The area between z 0 and z 0.88 is Since the desired area is in the left tail, subtract from The probability that a person burns less than 93 calories is (c) z z The area between z 0 and z 1.88 is The area between z 0 and z.5 is The total area is The probability that a person burns between 85 and 30 calories is

41 value mean (a) z The area between z 0 and z 0.69 is Since the desired area is to the right of z 0.69, add to The probability that the temperature will be above 6 is (a) value mean (b) z The area between z 0 and z 0.88 is Since the desired area is to the left of z 0.88, add to The probability that the temperature will be below 67 is value mean (c) z The area between z 0 and z 0.5 is The area between z 0 and z 1.19 is Since the desired area is between z 0.5 and z 1.19, subtract from The probability that the temperature will be between 65 and 68 is value mean z The area between z 0 and z 0.5 is Since the desired area is to the right of z 0.5, add to The probability that the person s blood pressure is above 130 is value mean (b) z 1 8 The area between z 0 and z 1 is Since the desired area is to the left of z 1, add to The probability that the person s blood pressure is below 140 is (a) value mean (c) z value mean z The area between z 0 and z 0.13 is The area between z 0 and z 0.5 is The total area is The probability that the person s blood pressure is between 131 and 136 is value mean z The area between z 0 and z 0.47 is Since the desired area is in the left tail, subtract from , Therefore 638 people will score below 93. value mean (b) z The area between z 0 and z 1.33 is Since the desired area is in the right tail, subtract from , Therefore 184 people will score above 10. value mean (c) z The area between z 0 and z 1.33 is The area between z 0 and z 0.33 is The total area is ,000 1,074 Therefore 1,074 people will score between 80 and

42 value mean (d) z The area between z 0 and z 1.67 is The area between z 0 and z 1.0 is Since the desired area is between 1.67 and 1.0, subtract from , So 136 people will score between 75 and (a) z1 1,900, z,000, The area between z 0 and z.91 is The area between z 0 and z.6 is The desired area is So 5 homes will have between 1,900 and,000 square feet. (b) z 3,000, The area between z 0 and z 4.3 is 0.5. Since the desired area is to the right of z 4.3, subtract from Therefore no homes will have more than 3,000 square feet. (c) (d) 1. (a) z,000, The area between z 0 and z.6 is Since the desired area is to the left of z.6, subtract from Therefore 6 homes will have less than,000 square feet. z 1,500, The area between z 0 and z 5.55 is 0.5. Since the desired area is to the right of z 5.55, add to All the homes will have more than 1,500 square feet. z 300, , , The area between z 0 and z.09 is Since the desired area is in the right tail, subtract 0.48 from Therefore 14 homes cost more than $300,


43 (b) z1 00,000 14, , z 300, , , The area between z 0 and z 0.35 is The area between z 0 and (c). (a) z.09 is The desired area is Therefore, 495 homes cost between $00,000 and $300,000. z 150, , , The area between z 0 and z 1.57 is Since the desired area is to the left of z 1.57, subtract 0.44 from Approximately 46 homes cost less than $150,000. value mean z The area between z 0 and z 1.49 is Since the desired area is in the left tail, subtract 0.43 from , Therefore, 68 books were sold for less than $8.00. value mean (b) z The area between z 0 and z 0.47 is Since the desired area is in the right tail, subtract from , So, 319 books were sold for more than $ (c) z z The area between z 0 and z 0.0 is The area between z 0 and z 0.96 is The desired area is , So, 340 books were sold for between $9.50 and $ value mean (d) z The area between z 0 and z 0.7 is The area between z 0 and z 0.5 is The desired area is , Therefore, 93 books were sold for between $9.80 and $ We need to find the percentage of workers who make less than Earl. z The area between z 0 and z 0.59 is 0.. Since the desired area is the left half and the portion in the right half below 0.59, add So, 77.% (or approximately 7%) of the workers make less than Earl. Earl s wage is in the 7nd percentile. 68
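The percentile reasoning for Earl's wage (add the left half of the curve to the area between z = 0 and the z score) can be sketched as follows. This is an illustrative check only; the function name `percentile_rank` is an assumption, and the cumulative area is computed with Python's standard library rather than read from the z-table.

```python
from math import floor
from statistics import NormalDist


def percentile_rank(z):
    """Percentage of a normal population lying below the given z score,
    rounded down to a whole percentile."""
    area_below = NormalDist().cdf(z)  # left half plus (or minus) the area from 0 to z
    return floor(100.0 * area_below)


if __name__ == "__main__":
    # z = 0.59 is the z score quoted for Earl's wage in the solution above.
    print(percentile_rank(0.59))
```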


44 4. We need to find the percentage of people who keep their TVs less than 4 years. value mean z The area between z 0 and z 0.90 is Since the desired area is in the left tail, subtract from So 18.4% (or approximately 18%) of people keep their TVs less than 4 years. That puts the amount of time you kept your TV in about the 18th percentile. 5. We need to find the percentage of people who spend fewer than 0 minutes at a time on a social networking site. value mean 0 6 z The area between z 0 and z 3.50 is approximately Since the desired area is in the left tail, subtract from You are in the 0th percentile; that is, no one spends less time than you. 6. We need to find the percentile rank for,000 and 5,000. z,000, The area between z 0 and z.6 is Since the desired area is to the left of z.6, subtract from So,000 square feet is in about the 1st percentile. z 5,000, The area between z 0 and z is Since the desired area is to the left of z 17.49, add to So the 5,000 square foot home is in the 99th percentile (remember there is no 100th percentile). The change in percentile rank is a) b) Mean.34; s 0.58 c) z z z The area between z 0 and z 1.45 is Since the desired area is in the left tail, subtract 0.47 from So the probability of a reaction time less than 1.5 seconds is The area between z 0 and z 0.59 is The area between z 0 and z.86 is Since the desired area is between these two values, the probability of a reaction time between and 4 seconds is So the probability of a reaction time between and 4 seconds is

45 8. a) 9. b) Mean 1.67; s 0.44 c) z z z The area between z 0 and z is Since the desired area is in the left tail, subtract from So the probability of a reaction time less than 1.5 seconds is The area between z 0 and z 0.75 is The area between z 0 and z.86 is Since the desired area is between these two values, the probability of a reaction time between and 4 seconds is So the probability of a reaction time between and 4 seconds is 0.5. b) Mean ; s 5.95 c) z z z The area between z 0 and z 1.75 is Since the desired area is in the right tail, subtract from So the probability of a salary over $60,000 is The area between z 0 and z is The area between z 0 and z is Since the desired area is between these two values, the probability of salary between $40,000 and $50,000 is


46 30. a) b) Mean 55.83; s 5.91 c) z z z The area between z 0 and z is Since the desired area is in the right tail, subtract 0.61 from So the probability of a salary over $60,000 is The area between z 0 and z.679 is The area between z 0 and z is Since the desired area is between these two values, the probability of salary between $40,000 and $50,000 is a) Very unlikely. Gas prices tend to fluctuate pretty wildly. b) This would most likely be normally distributed with mean something a bit more than pounds. c) Possibly, but not necessarily. Since basketball favors tall players, but there are still some shorter players, the heights with the largest number of players would probably be somewhat above the mean. d) Probably, although the number of hits may fluctuate depending on the day of the week, which could affect the distribution. e) Probably not, for pretty much the same reason in part c. The ages probably go from 18 up to the 60s, but the distribution would be very strongly skewed toward the younger side. 3. The regional campus with a z score of 0.90, compared to the main campus with a z score of No, it would have the same shape. 34. The A s and F s will have areas under the standard normal curve of and 0.450, respectively. Let X A score that divides the As and X F score that divides the Fs. The z values corresponding to an area of on either side of z 0 are 1.64 and X A ; A 10(1.64) X + X F ; X F 10( 1.64) The Cs will have an area under the standard normal curve of on either side of z 0. Let X C1 score that divides the Cs from Ds and let X C score that divides Cs from Bs. The z values corresponding to an area of on either side of z 0 are 0.84 and XC ; X C1 10(0.84) XC ; 10 X C 10( 0.84) The As will have scores above 76. The Bs will have scores between 69 and 76. The Cs will have scores between 5 and 68. The Ds will have scores between 44 and 51. The Fs will have scores of 43 and below. 631
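The grading-curve solution above works backward from an area to a z value and then to a score (score = mean + standard deviation × z). A minimal sketch of that step follows. The standard deviation of 10 appears in the solution; the mean of 60 is an assumption inferred from the stated cutoffs and is not given in the surviving text, and the function name is hypothetical.

```python
from statistics import NormalDist


def score_cutoff(mean, std_dev, fraction_below):
    """Score below which the given fraction of a normal population falls."""
    z = NormalDist().inv_cdf(fraction_below)  # z value for the cumulative area
    return mean + std_dev * z


if __name__ == "__main__":
    MEAN = 60.0      # assumed, inferred from the cutoffs quoted in the solution
    STD_DEV = 10.0   # from the 10(1.64) and 10(0.84) terms in the solution
    # An area of 0.450 on each side of z = 0 leaves 5% in each tail.
    print(round(score_cutoff(MEAN, STD_DEV, 0.95), 1))  # A cutoff
    print(round(score_cutoff(MEAN, STD_DEV, 0.05), 1))  # F cutoff
```

The computed z values (about ±1.645) are slightly more precise than the table values of ±1.64, so cutoffs may differ from the table-based ones by a few hundredths before rounding.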

47 35. The areas under the normal curve to the left and right of z 0 are 0.5 and 0.5. The z values corresponding to these areas are z 0.68 and z lower limit lower limit 15( 0.68) upper limit upper limit 15(0.68) The approximate limits are 90 to The area under the normal curve between z 0 and the cutoff time is The z value corresponding to an area of to the left of z 0 is cutoff time cutoff time 4.3( 0.84) The cutoff time is approximately 55 minutes. 37. P(of taking less than 1.5 seconds to react) In Exercise 7, the probability was 5 found to be P(of taking between and 4 seconds to react) In Exercise 7, the probability 5 was found to be 0.7. The probabilities are reasonably close, so our assumption that the data was approximately normal distributed seems reasonable. 38. P(of taking less than 1.5 seconds to react) In Exercise 8, the probability was 5 found to be P(of taking between and 4 seconds to react) In Exercise 8, the probability 5 was found to be 0.5. The probabilities are reasonably close, so our assumption that the data was approximately normal distributed seems reasonable. Exercise Set A scatter plot is a plot of ordered pairs of data from two data sets and is used to predict whether there may be a correlation between the data sets.. Generally, as x increases, so does y. The points would form a straight, or roughly straight, stream from lower left to upper right. 3. Generally, as x increases, y decreases. The points would form a straight or roughly straight, stream from upper left to lower right. 4. The correlation coefficient tells you the strength of the correlation between two data sets. 5. The regression line for two data sets is the line that best fits the scatter plot of the data. 6. If there is a linear correlation between two data sets the regression line can be used to predict the value of a dependent variable by plugging in the value for the corresponding independent variable into the regression equation. 7. 
Correlation shows that two variables are related. A correlation does not explain WHY the variables are related. Causation can explain why. 8. A 5% significance level means that there is a 5% chance of being wrong about the conclusion and a 95% chance of being right about the conclusion. 9. Answers may vary. 10. Answers may vary. For this exercise set, use the following formulas.

\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} \]

\[ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \qquad a = \frac{\sum y - b(\sum x)}{n} \]
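The formulas for r, b, and a can be evaluated directly from the column sums of the x, y, xy, x², y² tables used in the solutions. The sketch below is one way to do that; the function name `linear_fit` and the sample data are illustrative assumptions, and the regression line is taken to be y = a + bx.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Return (r, a, b) using the sum-based formulas for the correlation
    coefficient r, the intercept a, and the slope b of y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    b = numerator / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return r, a, b


if __name__ == "__main__":
    # Hypothetical, perfectly linear data: y = 1 + 2x.
    r, a, b = linear_fit([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
    print(r, a, b)
```

A prediction for a new x value is then a + b * x, which is only meaningful when r has been found to be significant.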

48 11. (a) 1. (a) (b) x y xy x y Σ ,791 7(496) 8(105) r [7(140) (8) ][7(1,791) (105) ] (c) n 7, 5% level, Table 1-4 value n 7, 1% level, Table 1-4 value Since r 9.77 is greater than each value, r is significant at the 5% and the 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 7(496) (8)(105) b (140) (8) (8) a The equation of the regression line is y x. (e) There is a positive linear relationship. (b) x y xy x y , ,600 4 Σ ,006 5, (1, 006) (176)(39) r [6(5,438) (176) ][6(37) (39) ] (c) n 6, 5% level, Table 1-4 value n 6, 1% level, Table 1-4 value Since r is greater than each value, r is significant at the 5% and the 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 6(1, 006) (176)(39) b (5, 438) (176) 39 ( 0.501)(176) a 1. 6 The equation of the regression line is y x. (e) There is a negative linear relationship. 633

49 13. (a) 14. (a) (b) x y xy x y , , , , Σ ,445 7, (, 445) 330(30) r 4(7,350) (330) ][4(6) (30) ] (c) n 4, 5% level, Table 1-4 value n 4, 1% level, Table 1-4 value Since r is not greater than either value, r is not significant at 5% nor at 1% level. (d) Since r is not significant, the computing and drawing of a regression line would be meaningless. (e) No relationship exists. (b) x y xy x y , , , , , , ,600 Σ , ,575 6(4,150) (70)(349) r [6(840) (70) ][6(0,575) (349) ] (c) n 6, 5% level, Table 1-4 value n 6, 1% level, Table 1-4 value Since r is greater than each value, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 6(4,150) (70)(349) b (840) (70) 349 (3.36)(70) a The equation of the regression line is y x. (e) There is a positive linear relationship. 634

50 15. (a) 16. (a) (b) x y xy x y , , , ,04 5 Σ ,6 7, (, 6) (185)(65) r [5(7,131) (185) ][5(919) (65) ] (c) n 5, 5% level, Table 1-4 value n 5, 1% level, Table 1-4 value Since r is greater than each value, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 5(, 6) (185)(65) b 0.5 5(7,131) (185) 65 ( 0.5)(185) a The equation of the regression line is y x. (e) There is a negative linear relationship. (b) x y xy x y , , , ,116 5 Σ ,47 7,0 43 5(1, 47) (188)(38) r [5(7,0) (188) ][5(43) (38) ] (c) n 5, 5% level, Table 1-4 value n 5, 1% level, Table 1-4 value Since r is less than each value, r is not significant at the 5% nor the 1% level. (d) Since r is not significant, the computing and drawing of a regression line would be meaningless. (e) No relationship exists. 635

51 17. (a) (b) x y xy x y , Σ , (734.6) 6.3(13.6) r [5(150.17) (6.3) ][5(3, ) (13.6) ] (c) n 5, 5% level, Table 1-4 value n 5, 1% level, Table 1-4 value Since r is greater than the 5% value but less than the 1% value, r is significant at the 5% level but not significant at the 1% level. (d) Since r is significant, calculate and draw the line. See graph in part (a). 5(734.6) (6.3)(13.6) b 3.1 5(150.17) (6.3) 1306 (3.1)(6.3) a The equation of the regression line is y x. (e) There is a positive linear relationship. 636
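The significance decision in part (c) compares r with the table value at each level. A minimal sketch of that comparison follows, using the n = 5 critical values 0.878 (5% level) and 0.959 (1% level) that are quoted in the solution to the age and hours-per-week exercise. The dictionary and function names are illustrative, and comparing the absolute value of r is an assumption made so that negative correlations are handled the same way.

```python
# Critical values for n = 5 as quoted in the solutions; other sample sizes
# would need their own values from the textbook table.
CRITICAL_VALUES_N5 = {"5%": 0.878, "1%": 0.959}


def significance_levels(r, critical_values):
    """Map each significance level to True if |r| exceeds its table value."""
    return {level: abs(r) > cutoff for level, cutoff in critical_values.items()}


if __name__ == "__main__":
    # Hypothetical r: significant at the 5% level but not at the 1% level.
    print(significance_levels(0.90, CRITICAL_VALUES_N5))
```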

52 18. (a) (b) x y xy x y Σ (51.34) 6.5(48.) r [5(11.95) (6.5) ][5(530.14) (48.) ] (c) n 5, 5% level, Table 1-4 value n 5, 1% level, Table 1-4 value Since r is not greater than either value, r is not significant at either level. (d) Since r is not significant, we can save ourselves some time and not bother calculating or graphing the regression line. (e) No relationship exists. 637

53 19. (a) (b) x y xy x y , ,809 3, , ,64, ,68 43,964, ,160 79,841 1, ,950 0, , , ,30 184, ,90 176, , ,561 1,04 Σ 5, ,309 3,049,155 13,690 9(03, 309) (5, 007)(336) r 0.94 [9(3, 049,155) (5, 007) ][9(13, 690) (336) ] (c) n 9, 5% level, Table 1-4 value n 9, 1% level, Table 1-4 value Since r 0.94 is larger than both values, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 9(03, 309) (5, 007)(336) b (3, 049,155) (5, 007) 336 ( )(5, 007) a.76 9 The equation of the regression line is y x. (e) There is a positive linear relationship. (f) y (500) 33; a 500-foot-tall building would have about 33 stories. 638

54 0. (a) Hours per week Regression Line: y x Age (in years) (b) x y xy x y , , , ,844 1 Σ , (671) ()(18) r [5(10,94) () ][5(83.5) (18) ] (c) n 5, 5% level, Table 1-4 value n 5, 1% level, Table 1-4 value Since r is larger than 0.878, but not larger than 0.959, r is significant at the 5% level, but not at the 1% level. (d) Since r is significant at the 5% level, draw the line. See graph in part (a). 5(671) ()(18) b 0.1 5(10, 94) () 18 ( 0.1)() a The equation of the regression line is y x. (e) There is a negative linear relationship. (f) y (35) 4.73; the number of hours for a 35-year-old would be about

55 1. (a) Amount per month (in dollars) Regression Line: y x ,000 1,100 1,00 1,300 x Income (in dollars) (b) x y xy x y , ,000 5,600 1, ,000 1,440,000 90,000 1, ,000 1,000,000 67, , ,000 55, ,50 7,500 1, ,330 8,649 36,100 1, ,000 1,10,000 6,500 Σ 6,757 1,540 1,530,080 6,645, ,050 7(1, 530, 080) (6, 757)(1, 540) r [7(6, 645,149) (6, 757) ][7(358, 050) (1, 540) ] (c) n 7, 5% level, Table 1-4 value n 7, 1% level, Table 1-4 value Since r is larger than both values, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 7(1, 530, 080) (6, 757)(1, 540) b (6, 645,149) (6, 757) 1, 540 (0.3548)(6, 757) a The equation of the regression line is y x. (e) There is a positive linear relationship. (f) y (95) ; a student earning $95 per month would spend about $05.88 on recreation. 640

56 . (a) (b) x y xy x y , , , , , , Σ ,464 18, (, 464) (407)(49) r [10(18,65) (407) ][10(391) (49) ] 0.84 (c) n 10, 5% level; Table 1-4 value 0.63 n 10, 1% level; Table 1-4 value Since r 0.84 is larger than both values, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 10(, 464) (407)(49) b (18, 65) (407) 49 (0.8)(407) a The equation of the regression line is y x. (e) There exists a positive linear relationship, except for two points. (f) y (56) 8.388; a 56-year-old employee would miss about 8 days. 641

57 3. (a) (b) x y xy x y ,1 7,569 6, ,096 8,464 7, ,760 4,64 4, ,38 5,184 5, ,550 9,05 8, ,77 6,084 5, ,889 6,889 6, ,70 9,604 9,801 Σ ,318 57,443 55,75 8(56, 318) (673)(661) r [8(57, 443) (673) ][8(55, 75) (661) ] (c) n 8, 5% level, Table 1-4 value n 8, 1% level, Table 1-4 value Since r is larger than both values, r is significant at the 5% and 1% levels. (d) Since r is significant, draw the line. See graph in part (a). 8(56, 318) (673)(661) b (57, 443) (673) 661 (0.8603)(673) a The equation of the regression line is y x. (e) There is a positive linear relationship. (f) y (90) 87.65; a student who got a 90 on the Stat 101 one final would be expected to get an 88 on the Stat 10 final. 64

58 4. (a) (b) x y xy x y Σ , ,46 15(,16) (110)(74) r [15(894) (110) ][15(5, 46) (74) ] (c) n 15, 5% level, Table 1-4 value n 15, 1% level, Table 1-4 value Since r is larger than both values, r is significant at the 5% and 1% levels. 643

59 (d) Since r is significant, draw the line. See graph in part (a). 15(,16) (110)(74) b (894) (110) 74 (1.748)(110) a The equation of the regression line is y x. (e) There is a positive linear relationship. (f) y (8) 19.45; a team with eight wins would expect to get 19 goals. 5. (a) (b) x y xy x y , , ,37,610, , , ,998,090, , , ,916,000, ,800 91, ,84,840, , , ,806,50, , , ,339,560, ,300 68, ,664,890, , , ,061,160,000 Σ ,00 3,111, ,308,400,000 8(3,111,50) (6.5)(396, 00) r 0.1 [8(496.83) (6.5) ][8(0, 308, 400, 000) (396, 00) ] (c) n 8, 5% level, Table 1-4 value n 8, 1% level, Table 1-4 value Since r 0.1 is less than both values, r is not significant at either the 5% or 1% level. (d) Since r is no significant, move one to part (e) 644

60 (e) There is no relationship. (f) Since there is no relationship, we can t make a prediction. 6. (a) (b) x y xy x y ,63, , ,660 3, , , , , , , , ,09 39,10 1,444 1,058, ,345,05 9, , ,16 Σ 3.4 4,706 09, ,89.4 3,4,838 7(09, 091.) (3.4)(4, 706) r [7(15,89.4) (3.4) ][7(3, 4,838) (4, 706) ] (c) n 7, 5% level, Table 1-4 value n 7, 1% level, Table 1-4 value Since r is less than both values, r is not significant at either the 5% or 1% level. (d) Since r is no significant, move one to part (e) (e) There is no relationship. (f) Since there is no relationship, we can t make a prediction. 645

61 7. x y xy x y Σ (15) (15)(35) r 1 [5(55) (15) ][5(85) (35) ] Interchange the values for x and y. x y xy x y b) Answers may vary. c) x y xy x y Σ (0) (0)(8) r 0 [7(8) (0) ][7(196) (8) ] r 0 8.a) Σ (15) (35)(15) r 1 [5(85) (35) ][5(55) (15) ] The value of r is the same. a) The variables seem to have a positive linear relationship. b) Answers may vary. x y xy x y c) 3(36) (6)(14) r [3(14) (6) ][3(98) (14) ] The variables show a definite pattern, so that should mean that there is a relationship. But it s definitely not a linear relationship. If you only use the x values that are greater than zero, then there is a positive linear relationship and the r value is going to back that up. 646
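The last few solutions make two points: interchanging x and y leaves r unchanged, and a clear nonlinear pattern can still give r = 0. The sketch below illustrates both. The data set y = x² for x from −3 to 3 is an assumption reconstructed from the sums that survive in the solution (it is consistent with them, but the original table is not legible), and the function name is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Linear correlation coefficient r computed from column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


if __name__ == "__main__":
    xs = [-3, -2, -1, 0, 1, 2, 3]
    ys = [x * x for x in xs]          # a definite pattern, but not a linear one
    print(correlation(xs, ys))        # no linear correlation over the full range
    print(correlation(ys, xs))        # interchanging x and y gives the same r
    # Restricting to x > 0 leaves a strong positive linear correlation.
    print(round(correlation([1, 2, 3], [1, 4, 9]), 4))
```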


30. Answers may vary. 31. Answers may vary. 32. Answers may vary. Some of the answers for 33 to 40 are open to interpretation and the explanation of reasoning will vary. 33. Positive 34. Negative 35. None; some people might argue that a small-town school district with fewer primary schools would have students who get a better education than students in big urban districts with a lot of primary schools. 36. Negative 37. Positive 38. None, though surprisingly studies show that there is a positive correlation. 39. Negative 40. None Review Exercises 1. Item Tally Frequency 3. highest value lowest value B 4 F 5 G 5 S 5 T 6. Arrange the data in order. Separate the data according to the first digit. Use the first digit as the stems and the second digit as the leaves. Stems Leaves The number of minutes spent on the computers ranged from 1 to 39. The biggest group was the middle group (i.e., in the 20s). The rest were evenly divided between the 10s and the 30s. Start with the lowest value and add 15 to get the lower class limits: 102, 117, 132, 147, 162, 177. Set up the classes by subtracting one from each lower class limit except the first lower class limit. Rank Tally Frequency


4. Draw the bars with heights corresponding to the number. 7. Represent the years on the x axis and the amount earned on the y axis, and then draw lines connecting the points. For the histogram, draw vertical bars corresponding to the frequencies for each class. For the frequency polygon, find the midpoints for each class: 109, 124, 139, 154, 169, and 184. Label the horizontal axis with the midpoints. Connect adjacent midpoints with straight lines. Finish the graph by drawing lines back to the horizontal axis at the beginning and end of the graph. 8. Janine's earnings increased at an increasing rate throughout the five years. X X n ,08 9. The data is already in order. The median is the middle value Median 45 Since each value occurs only once, there is no mode Midrange 187 The four measures are completely different. This is probably due to the fact that the handful of very large numbers at the beginning skews the mean and the midrange a lot.
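The four measures of average discussed above (mean, median, mode, midrange) can be computed as in the sketch below. The data set is hypothetical, chosen only to show how one very large value pulls the mean and midrange away from the median; it is not the data from the exercise, and the helper names are illustrative.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Values that occur most often; an empty list means there is no mode
    because every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2


if __name__ == "__main__":
    data = [3, 5, 8, 13, 90]   # hypothetical data with one very large value
    print(mean(data), median(data), modes(data), midrange(data))
```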

64 9. Hours Frequency Midpoint Frequency Midpoint X Answers may vary X X n , R highest value lowest value X X X ( X X) , , , , , , , ( X X) variance n 1 77, , s s 10, ,597.5 The range of 66 tells us that the numbers vary from lowest to highest by a good amount compared to their sizes. The standard deviation tells us that overall the numbers are fairly spread out. 1. Answers may vary. 13. The data values are already in order. (a) Value Number of data values below Percentile 193, % or 8 nd percentile 33, % or 41 st percentile (b) 75% of or 17. The value 171,000 corresponds to the 75th percentile because there are 17 values below it. 14. The data is already in order The median is the middle value Q 45 Find the median of the values less than Q. Q 1 7 Find the median of the values above Q. Q (a) Most of the states have Native American populations less than 100,000; the three biggest states are all outliers, which is what skewed the measures of average so much in question 8. The area between z 0 and z 1.95 is

(b) The area between z = 0 and z = 0.40 is 0.155.
(c) The area between z = 1.30 and z = 1.80
(d) The area between z = 1.05 and z = 2.05
(e) The area between z = 0.05 and z = 0.55
(f) The area between z = 1.10 and z = 1.80
(g) The area to the right of z = 2.00
(h) The area to the right of z = 1.35
(i) The area to the left of z = 2.10
(j) The area to the left of z = 1.70

The values 135 and 235 represent 2 standard deviations below and above the mean, respectively. Since 95% of the data falls within 2s of the mean, there would be 45(0.95) ≈ 43 patients who weigh between 135 and 235 pounds.
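Areas like the ones listed above are normally read from a standard normal table. As a cross-check, they can also be computed directly; the sketch below uses Python's `math.erf`, and the helper names (`phi`, `area_between`, `area_right`, `area_left`) are illustrative, not from the text.

```python
import math


def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(z1, z2):
    """Area under the curve between two z values (order does not matter)."""
    return abs(phi(z2) - phi(z1))


def area_right(z):
    """Area in the tail to the right of z."""
    return 1.0 - phi(z)


def area_left(z):
    """Area to the left of z."""
    return phi(z)


print(round(area_between(0, 1.95), 3))   # between z = 0 and z = 1.95
print(round(area_between(0, 0.40), 3))   # between z = 0 and z = 0.40
print(round(area_between(-2, 2), 3))     # within 2 standard deviations
```

The last line shows where the "95% within 2s" rule of thumb comes from: the exact area within two standard deviations is a little over 0.95.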

In each part, $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$.

(a) The two values give z = −0.40 and z = 0.40. The area between z = 0 and z = −0.40 is 0.155, and the area between z = 0 and z = 0.40 is 0.155. The desired area is 0.155 + 0.155 = 0.310, and 0.310(90) = 27.9. Approximately 28 values will be between 190 and 210.
(b) z = 1.6. The area between z = 0 and z = 1.6 is 0.445. Since the desired area is to the right of 240, subtract from 0.500: 0.500 − 0.445 = 0.055, and 0.055(90) = 4.95. Approximately 5 values are greater than 240.

(a) For 4 years, with a mean of 3, z = 3. Find the area between z = 0 and z = 3. Since the desired area is in the right tail, subtract that area from 0.500 to get the probability that it will take more than 4 years.
(b) For 3 years, z = 0. The area less than z = 0 is 0.5. Hence, the probability that it will take less than 3 years is 0.5.
(c) For 3.8 and 4.5 years, z = 2.4 and z = 4.5. The area between z = 0 and z = 4.5 is about 0.500. Subtracting the area between z = 0 and z = 2.4 from it gives the area between z = 2.4 and z = 4.5, which is the probability that it will take between 3.8 and 4.5 years.
(d) For 2.5 and 3.1 years, z = −1.5 and z = 0.3. Adding the area between z = 0 and z = −1.5 to the area between z = 0 and z = 0.3 gives the area between z = −1.5 and z = 0.3, which is the probability that it will take between 2.5 and 3.1 years.

(a) For 36 and 40 passengers, with a mean of 48 and a standard deviation of 3, z = −4 and z = −2.67. The area between z = 0 and z = −4 is about 0.500. Subtracting the area between z = 0 and z = −2.67 from it gives the area between z = −4 and z = −2.67, which is the probability that the bus will have between 36 and 40 passengers.
(b) For 42 passengers, z = (42 − 48)/3 = −2. Find the area between z = 0 and z = −2. Since the desired area is in the left tail, subtract that area from 0.500 to get the probability that the bus will have fewer than 42 passengers.
(c) For 48 passengers, z = (48 − 48)/3 = 0. The area above z = 0 is 0.5. Hence, the probability that the bus will have more than 48 passengers is 0.5.
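The bus answers above follow one pattern: convert each value to a z value, then turn the z values into an area. A minimal Python sketch of that pattern is below. It assumes a mean of 48 passengers and a standard deviation of 3, which is how the garbled fractions on this page read; `statistics.NormalDist` (Python 3.8+) does the table lookup, so results differ slightly from three-decimal table arithmetic.

```python
from statistics import NormalDist


def z_value(value, mean, sd):
    """z = (value - mean) / standard deviation."""
    return (value - mean) / sd


# Assumed parameters: mean 48 passengers, standard deviation 3.
passengers = NormalDist(mu=48, sigma=3)

# Left tail: fewer than 42 passengers (z = -2).
p_fewer_than_42 = passengers.cdf(42)

# Between two values: 36 to 40 passengers (z = -4 to z = -2.67).
p_between_36_and_40 = passengers.cdf(40) - passengers.cdf(36)

# Right tail starting at the mean: more than 48 passengers (z = 0).
p_more_than_48 = 1 - passengers.cdf(48)

print(z_value(42, 48, 3))
print(round(p_fewer_than_42, 3))
print(round(p_between_36_and_40, 3))
print(p_more_than_48)
```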

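(d) The two values give z = −1.67 and z = −0.33. The area between z = 0 and z = −0.33 is 0.129. Since the desired area is between z = −1.67 and z = −0.33, subtract 0.129 from the area between z = 0 and z = −1.67.

z = −0.75. The area between z = 0 and z = −0.75 is 0.273. Since the desired area is in the left tail, subtract 0.273 from 0.500 to get 0.227. Hence, 454 will weigh less than 43.5 pounds.

Find the z value for each price. Combine the area between z = 0 and z = 0.38 with the area between z = 0 and z = 0.5 to get the desired area, which gives the number that will cost between $80.00 and $

The weights are approximately normally distributed. The heights aren't even close.

There appears to be a negative correlation. From the table of x, y, xy, x², and y², the sums are Σx = 62, Σy = 21.6, Σxy = 174.3, Σx² = 630, and Σy² = 70.94, with n = 7.

$r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{\sqrt{[n(\Sigma x^2) - (\Sigma x)^2][n(\Sigma y^2) - (\Sigma y)^2]}} = \frac{7(174.3) - (62)(21.6)}{\sqrt{[7(630) - (62)^2][7(70.94) - (21.6)^2]}} \approx -0.914$

For n = 7 and the 5% level, compare |r| with the value in Table 12-4. Since |r| is larger than this value, r is significant at the 5% level.

$b = \frac{7(174.3) - (62)(21.6)}{7(630) - (62)^2} \approx -0.21$

$a = \frac{21.6 - (-0.21)(62)}{7} \approx 4.95$

The equation of the regression line is y = 4.95 − 0.21x. When x = 9, y = 4.95 − 0.21(9) = 3.06, so 3.06 is the best predicted GPA for someone who watches 9 hours of TV a week.

4. (a) Answers may vary. (b) r = 0.16 is not significant; answers may vary.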
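The correlation and regression arithmetic above can be verified from the column sums alone. The Python sketch below assumes the sums read off this page are n = 7, Σx = 62, Σy = 21.6, Σxy = 174.3, Σx² = 630, and Σy² = 70.94; the function name is illustrative.

```python
import math


def correlation_and_line(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (r, a, b) for the regression line y = a + b*x.

    Uses the computational formulas based on column sums:
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2][n*Sy2 - Sy^2])
    b = [n*Sxy - Sx*Sy] / [n*Sx2 - Sx^2]
    a = [Sy - b*Sx] / n
    """
    numerator = n * sum_xy - sum_x * sum_y
    x_spread = n * sum_x2 - sum_x ** 2
    y_spread = n * sum_y2 - sum_y ** 2
    r = numerator / math.sqrt(x_spread * y_spread)
    b = numerator / x_spread
    a = (sum_y - b * sum_x) / n
    return r, a, b


# Sums assumed from the TV-hours versus GPA answer above.
r, a, b = correlation_and_line(7, 62, 21.6, 174.3, 630, 70.94)
predicted_gpa = a + b * 9  # prediction for 9 hours of TV per week

print(round(r, 3), round(a, 2), round(b, 2), round(predicted_gpa, 2))
```

Because the sketch keeps full precision for b when computing a, intermediate values can differ in later decimals from hand arithmetic that rounds b to −0.21 first; the rounded results agree.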

Chapter Test

1. Source   Frequency
   W        6
   L        7
   K        7
   E        5

2. Draw the bars with heights corresponding to the frequency.

3. Source   Frequency f   Degrees = (f/n) · 360°   Percent = (f/n) · 100%
   W        6             86.4°                    24%
   L        7             100.8°                   28%
   K        7             100.8°                   28%
   E        5             72°                      20%
            n = 25

4. a) Divide the difference between the highest value and the lowest value by the number of classes, 8; the result rounds up to a class width of 200. Start with 300 and add 200 to get the lower class limits: 500, 700, 900, 1,100, 1,300, 1,500, 1,700, and 1,900. Set up the classes by subtracting one from each lower class limit except the first lower class limit. This gives the classes 300-499, 500-699, 700-899, 900-1,099, 1,100-1,299, 1,300-1,499, 1,500-1,699, and 1,700-1,899 for the Class, Tally, and Frequency table.
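The pie-chart conversions in answer 3 use two formulas, degrees = (f/n) · 360 and percent = (f/n) · 100. A short Python sketch with the four frequencies from the test (W 6, L 7, K 7, E 5) is below; the dictionary and function names are illustrative.

```python
def pie_chart_rows(frequencies):
    """Convert category frequencies to (degrees, percent) for a pie chart."""
    n = sum(frequencies.values())
    rows = {}
    for source, f in frequencies.items():
        degrees = f * 360 / n   # share of the full 360-degree circle
        percent = f * 100 / n   # share of the total, as a percentage
        rows[source] = (degrees, percent)
    return rows


frequencies = {"W": 6, "L": 7, "K": 7, "E": 5}
rows = pie_chart_rows(frequencies)

for source, (degrees, percent) in rows.items():
    print(f"{source}: {degrees:.1f} degrees, {percent:.0f}%")
```

A quick sanity check for any pie chart is that the degrees add up to 360 and the percents add up to 100.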

c) There are 30 data values, and 643 is the 22nd data value, so there are 21 data values below it: (21/30) · 100% = 70%. It is the 70th percentile.
d) 20% of 30 = 6. The value $44 million corresponds to the 20th percentile because there are 6 values below it.
e) The values 1,000, 1,400, and 1,850 are outliers.

5. Arrange the data in order. Separate the data according to the first two digits. Use the first two digits as stems and the last digits as leaves. The scores range from 200 to 260. Except for a small portion at the lower end and at the higher end, the scores are very much evenly spread out.

6. Represent the years on the x axis and the wages on the y axis, and then draw lines connecting the points. The graph shows an increase in the wage for all periods. The two steepest increases were during the jumps from 1975 to 1980 and from 2005 to 2010. Analysis may vary.

7. (a) $\bar{X} = \frac{\Sigma X}{n} \approx 84.1$
(b) Arrange the data in order. The median is the middle value: median = 85.
(c) Since each value occurs only once, there is no mode.
(d) Midrange = (L + H)/2 = 84
(e) R = highest value − lowest value = 90 − 78 = 12
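The percentile computation in parts c) and d) is a single ratio: the number of data values below the given value, divided by the total number of values, times 100. A minimal Python sketch follows; the data list is a hypothetical stand-in with 30 values, since the actual data set is not reproduced on this page.

```python
def percentile_rank(values, x):
    """Percentile of x: (number of values below x) / (number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)


# Hypothetical stand-in data: 30 distinct values, so the 22nd value
# has exactly 21 values below it, as in part c).
data = list(range(1, 31))

print(percentile_rank(data, 22))  # 22nd value of 30
print(percentile_rank(data, 7))   # 6 values below, as in part d)
```

Some textbooks add 0.5 to the count of values below before dividing; the version here follows the plain count used in the answers above.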

(f) X     X − X̄    (X − X̄)²
    87     2.9      8.41
    85     0.9      0.81
    80    −4.1     16.81
    78    −6.1     37.21
    83    −1.1      1.21
    86     1.9      3.61
    90     5.9     34.81
          Σ(X − X̄)² = 102.87

(g) variance = Σ(X − X̄)² / (n − 1) = 102.87/6 ≈ 17.1, and s = √17.1 ≈ 4.1

8. Errors   Frequency   Midpoint   Frequency · Midpoint
   0-2      1           1          1
   3-5      3           4          12
   6-8      4           7          28
   9-11     1           10         10
   12-14    1           13         13
            n = 10                 Σ = 64

   X̄ = 64/10 = 6.4

9. (a) The area between z = 0 and z = 1.50
(b) The area between z = 1.56 and z = 1.96 is 0.475 − 0.441 = 0.034.
(c) The area between z = 0.06 and z = 0.73
(d), (e) The area to the right of z = −1.28 is 0.500 + 0.400 = 0.900; the area to the left of z = 1.36

(a) For 34 and 35 minutes, z = (34 − 30)/4 = 1 and z = (35 − 30)/4 = 1.25. Subtracting the area between z = 0 and z = 1 from the area between z = 0 and z = 1.25 gives the desired area, which is the probability that it will take between 34 and 35 minutes.
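The variance and standard deviation in parts (f) and (g) can be re-computed in a few lines of Python. The sketch assumes the seven data values are 87, 85, 80, 78, 83, 86, and 90, as read from the deviation table on this page. It keeps the mean at full precision, so the sum of squares comes out near 102.86 rather than the 102.87 obtained by rounding the mean to 84.1 first.

```python
from statistics import mean, stdev, variance

# Data values assumed from the deviation table in part (f).
data = [87, 85, 80, 78, 83, 86, 90]

x_bar = mean(data)

# Sum of squared deviations from the mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

# Sample variance divides by n - 1; statistics.variance does the same.
sample_variance = sum_sq / (len(data) - 1)
s = sample_variance ** 0.5

print(round(x_bar, 1), round(sum_sq, 2), round(sample_variance, 1), round(s, 2))

# Cross-check against the standard library.
assert abs(sample_variance - variance(data)) < 1e-9
assert abs(s - stdev(data)) < 1e-9
```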


Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good