Chapter 2: Descriptive Statistics (Part 1)

2.1: Frequency Distributions and their Graphs

Definition

A frequency distribution is something (usually a table) that shows what values a variable can take, and how often it takes each value.

They aren't hard to make (just count!), so I won't demonstrate how to create one; rather, I'll focus on what to do with such a distribution if it is given to you: specifically, make a graph!

Frequency Histograms

Some people (mistakenly) use the terms bar chart and histogram synonymously; they are not the same thing. For a histogram, the horizontal axis is a quantitative variable, and the vertical axis is the frequency (count). The y-axis can also be labeled with Relative Frequency (percent), Cumulative Frequency (total count for this, and all previous groups), or Cumulative Relative Frequency (total percent for this, and all previous groups).

The variable's values are divided into groups (bins, intervals, classes, buckets; there are many different names), and a bar is drawn (of the appropriate height) showing the amount (count, percent, etc.) of the data that fall in that group. The bars will touch, unless there were no data in a group (leaving an empty spot where a bar might have been).

[Figure 1 - A Frequency Histogram. Title: Cabbage Mass; horizontal axis: Mass (kg); vertical axis: Frequency]

We could get into a lot of detail about dividing the data into those groups, and where the boundaries of those groups are, but let's not. Let's use technology to create our histograms, and focus on getting the technology to produce a good histogram. For that, I have a fairly simple algorithm. This procedure works for the TI-83/84 series of calculators.

First (naturally), put the data into your calculator. Next, select a histogram from the Stat Plot menu. From the ZOOM menu, choose Zoom Stat. Alas, this is almost certainly a bad histogram. Let's take a minute to fix a couple of things to create a good histogram.
HOLLOMAN'S PROBABILITY & STATISTICS SCP CHAPTER 02A, PAGE 1 OF 11
Press the WINDOW button. The values you see here were put there by the calculator, and two of them control how the graph looks (and need to be changed).

Scroll down to XSCL and choose a value that you could count in multiples of without much difficulty (5 and 10 are great choices here). Larger values here will create fewer bars in the graph; smaller values will create more bars in the graph.

Next, scroll to XMIN and make this number a multiple of the value you have for XSCL. If you must change the value, be sure to make it smaller than the value that was there in the first place! The calculator automatically puts the smallest datum as this value, so if you put in something higher, you will lose part of the graph.

Finally, press GRAPH. You should now have a good histogram, or at least a better one. You might want to go back to WINDOW and change YMAX (higher) to be able to see the top of every bar, or change XMAX (higher) to see all of the rightmost bar.

Ideally, a good histogram has between 5 and 15 bars. Fewer than 5 bars makes it hard to say anything interesting about the variable; more than 15 bars creates a confusing jumble of bars that are hard to interpret.

All of this creates a frequency histogram: a histogram where frequency (count) is the vertical axis. I said earlier that there were lots of possible options for the y-axis, and each of those options creates a slightly different kind of histogram. It's probably more important that you can read these other types, rather than create them.

For all types, be sure to scale and label each axis. Scale means to write out the values of the variable along the axis; label means to tell the name of the variable being measured along that axis.

That's a lot of detail about histograms; take that as a hint about how important they are in the grand scheme of statistics.
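If you would rather see the XSCL and XMIN rule written out than typed into a calculator, here is a minimal sketch in Python (my choice of language; it is not what the TI-83/84 runs internally). The function name histogram_counts, the made-up scores data, and the convention that each bar includes its left edge but not its right edge are all assumptions for illustration. The sketch also computes the other y-axis options mentioned earlier.

```python
import math
from itertools import accumulate


def histogram_counts(data, xscl):
    """Count data in bins of width xscl.

    The first bin starts at the largest multiple of xscl that is at or
    below the smallest datum (the XMIN rule). Each bin includes its left
    edge and excludes its right edge; that convention is an assumption.
    """
    xmin = math.floor(min(data) / xscl) * xscl
    n_bins = math.floor((max(data) - xmin) / xscl) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - xmin) / xscl)] += 1
    return xmin, counts


# Hypothetical data (20 made-up values), not taken from these notes.
scores = [52, 57, 61, 63, 66, 68, 70, 71, 72, 74,
          75, 77, 78, 79, 81, 84, 86, 89, 93, 95]

xmin, frequency = histogram_counts(scores, xscl=10)
total = sum(frequency)

relative = [c / total for c in frequency]              # count divided by total
cumulative = list(accumulate(frequency))               # this group plus all previous
cumulative_relative = [c / total for c in cumulative]  # running total as a proportion

print(xmin, frequency)            # 50 [2, 4, 8, 4, 2]
print(relative)                   # [0.1, 0.2, 0.4, 0.2, 0.1]
print(cumulative)                 # [2, 6, 14, 18, 20]
print(cumulative_relative)        # [0.1, 0.3, 0.7, 0.9, 1.0]
print(5 <= len(frequency) <= 15)  # True: the bar count is in the ideal range
```

Changing xscl from 10 to 5 in the call doubles the number of bars, which is the same trade-off described for XSCL above.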
Frequency Polygons

Take a histogram, make a dot at the top center of each bar, connect the dots, and erase the bars: now you have a frequency polygon.

[Figure 2 - A Frequency Polygon. Title: Cabbage Mass; horizontal axis: Mass (kg); vertical axis: Frequency]

It is not difficult to create a frequency polygon. In fact, I told you exactly what to do in the paragraph above!
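The "dot at the top center of each bar" step is just a midpoint calculation. As a small sketch (Python again, with a hypothetical helper named polygon_vertices and made-up bar heights), the dots for bars of equal width could be found like this:

```python
def polygon_vertices(xmin, xscl, counts):
    """Return the (x, y) point at the top center of each histogram bar.

    Bar i covers xmin + i*xscl up to xmin + (i+1)*xscl, so its center
    is half a bar width past its left edge.
    """
    return [(xmin + (i + 0.5) * xscl, c) for i, c in enumerate(counts)]


# Hypothetical histogram: five bars of width 10, starting at 50.
vertices = polygon_vertices(xmin=50, xscl=10, counts=[2, 4, 8, 4, 2])
print(vertices)
# [(55.0, 2), (65.0, 4), (75.0, 8), (85.0, 4), (95.0, 2)]
```

Connecting those points in order, left to right, gives the polygon.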
Relative and Cumulative Frequency

The word frequency occasionally takes a modifier. The word relative changes the meaning from count to percent. To change a count to a percent, divide the count by the total. Relative frequency histograms and polygons look exactly like plain frequency histograms and polygons; only the vertical scale changes.

[Figure 3 - A Relative Frequency Histogram. Title: Cabbage Mass; horizontal axis: Mass (kg); vertical axis: Relative Frequency]

The word cumulative means to accumulate (add) the prior groups. Graphs of this type will never go down as you look at them from left to right. The number for each group is the total for that group plus all previous groups. You really have to write this out to create one; it is difficult to get your calculator to make one. It is probably more important that you are able to read this type.

Examples

[1.] During a study on car safety, the braking distance (feet) was measured for a car traveling at several different speeds. The data are as follows:

Table 1 - Distances for Example 1
  2  10   4  22  16  10  18  26  34  17
 28  14  20  24  28  26  34  26  36  60
 80  20  26  54  32  40  32  40  50  42
 56  76  84  36  32  48  52  56  64  66
 54  70  92  93 120  85  46  68  46  34

Construct a histogram of these data.

Let your calculator do the work for you. Here's what I got:
[Figure 4 - Histogram for Example 1. Title: Car Safety Study; horizontal axis: Braking Distance (ft); vertical axis: Frequency]

(By the way, you do not need to include a title on your graphs in this class.)

[2.] Thirty-one black cherry trees were harvested and the amount of usable lumber (cubic feet) from each tree was measured. The results are as follows:

Table 2 - Volume of Lumber for Example 2
10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9
24.2 31.7 36.3 38.3 42.6 55.4 55.7 58.3 51.5 51.0
77.0 33.8 27.4 25.7 24.9 34.5 21.3 19.1 22.2 21.4
21.0

Construct a relative frequency histogram of these data.

Again, let your calculator do the work; all you have to do is change the vertical scale.

[Figure 5 - Relative Frequency Histogram for Example 2. Title: Cherry Tree Lumber; horizontal axis: Volume (cubic feet); vertical axis: Relative Frequency]

[3.] Members of the National Cholesterol Education Program conducted a study of cooking methods designed to lower the amount of cholesterol in cooked beef. The percentage reductions in cholesterol content for 24 samples are given below.
Table 3 - Cholesterol Reductions for Example 3
 7.6 18.2 14.2  5   24.3 16.3 16.3 23.4
24   20   24   26.7 35.2 39.6 32.6 32.8
40.1 40.5 44.8 48.8 45   36.6 34.9 39.9

Construct a relative frequency polygon of these data.

Make the histogram and connect the dots.

[Figure 6 - Relative Frequency Polygon for Example 3. Title: Cholesterol Study; horizontal axis: Reduction (%); vertical axis: Relative Frequency]

2.2: More Graphs and Displays

There are more; lots more!

Stemplots

Also called a stem and leaf plot, this uses the actual data and place value to create a graph (which is not terribly different from a histogram!). Here's an example:

- Heat Emitted -
 7 | 349
 8 | 48
 9 | 36
10 | 3499
11 | 36
*where 7 | 3 means 73 calories per gram of cement.

Turn your head to the right and you get something that looks a lot like a histogram!

The rightmost digit of the data becomes the leaf (on the right of the vertical bar), and the remainder of the digits of the data become the stems. The stems take the place of the groups from histograms, so you want between 5 and 15 stems in a stemplot.

Since stemplots are based on place value, it is a little harder to fix them if they don't look just right. There are two things that you can do to adjust a stemplot: split the stems, or round the digits.
If there are too few stems, then you can split stems. There are 10 digits in our number system, and there are two ways to split 10 things into equal groups (two groups of five, or five groups of two), so there are two ways to split stems. When you split stems, the leaves also get split. For example, if you split the stems into two, then half of the leaves will go with one stem (leaves 0 through 4) and the other leaves will go with the other stem (5 through 9). Here's an example:

- Cabbage Mass -
1 | 0344
1 | 556666777789
2 | 000122222234
2 | 556666788888
3 | 00111223
3 | 567888
4 | 022333
*where 1 | 0 means 1.0 kg

If there are too many stems, then round the data one place value and try again. For example, that stemplot above showing calories per gram was rounded: the datum 73 was originally 72.5.

Dotplots

A dotplot is much like a histogram, except that stacks of dots are used instead of bars. Also, dotplots work best when the data are granular: only certain values (or multiples of certain values) occur. As an example, the data about cabbages are all rounded to the nearest tenth, so they make a nice dotplot:

[Figure 7 - A Dotplot. Horizontal axis: Cabbage Mass (kg)]

I think you can figure out how to make one of these on your own. A vertical axis is rarely included, since one dot usually represents one datum. If one dot represents more than one datum, then a legend needs to be included.

Pie Charts

This is another type that I believe you've seen before. To construct one, extend your frequency table so that you can measure relative frequency: the percent of the total. Convert those percents to angles, and then measure out the pie slices accordingly.

When you're doing this by hand, just make the angle measures close; there's no need to go and use a protractor. If you need something that exact, use software.

Actually, out in reality you should probably try to avoid using pie charts in the first place. They are almost always a poor choice when trying to display data.
If you do use one, make sure to label each pie slice in some manner.
And never ever, under any circumstances, construct a 3D pie chart!

[Figure 8 - A Pie Chart]

Bar Graphs

I would find it hard to believe that you've never seen a bar chart before. There are, though, several varieties! We'll focus on just two: standard, and Pareto.

Like a histogram, the values of the variable are on the horizontal axis and the counts are on the vertical axis. Note that there are spaces left between the bars; that is an important difference.

Two words of caution: first, never ever, under any circumstances, construct a 3D bar chart. Second, be sure to scale and label each axis.

[Figure 9 - A Bar Graph]

Pareto Charts

A Pareto Chart is a bar chart where the values of the variable are arranged so that the bars decrease in height from left to right.

[Figure 10 - A Pareto Chart]
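Turning a bar chart into a Pareto chart is only a sorting step. Here is a minimal sketch in Python that reorders the road rage counts from Table 6 (Example 5, later in this section) from tallest bar to shortest. The variable names and the text-only "bars" are my own illustration; ties are left in their original order, which is an arbitrary choice.

```python
# Counts copied from Table 6 (Road Rage Frequency Distribution).
counts = {"Su": 5, "M": 5, "Tu": 11, "W": 12, "Th": 11, "F": 18, "Sa": 7}

# Sort categories by count, largest first. Python's sort is stable, so
# tied categories (Tu/Th and Su/M) keep the order they had in the table.
pareto = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# Print a rough text version of the chart, one row per bar.
for day, n in pareto:
    print(f"{day:<2} {'#' * n} {n}")

print([day for day, _ in pareto])
# ['F', 'W', 'Tu', 'Th', 'Sa', 'Su', 'M']
```

Reading the list left to right gives the order in which the bars would be drawn.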
Scatterplots

Scatterplots are another graph that I'm sure you've seen before. These plots use two variables: one will be labeled as x and the other will be labeled as y. The data will be paired so that you can look at them as a bunch of (x, y) pairs; plot a point for each pair.

You can get your calculator to make these graphs, but it is probably just as easy to create one by hand. Don't forget to scale and label each axis!

[Figure 11 - A Scatterplot. Title: Vitamin C in Cabbages; horizontal axis: Head Mass (kg); vertical axis: Vitamin C Content]

Examples

[4.] Short-term investments are methods (groups, funds, etc.) that take a sum of money and return that money with interest in a relatively short time (as little as one month). The following are the maturity times (how long you've got to wait to get your money back) in days for 40 short-term investments.

Table 4 - Maturity Times for Example 4
70 62 99 85 51 55 57 75 60 64
64 38 68 79 36 81 53 56 69 89
99 67 95 83 63 80 47 71 78 87
55 70 86 70 66 98 50 51 39 65

Let's make a stemplot. The stem will be the first digit, which is between 3 and 9. The rightmost digit will be the leaf. Here's what I get:

3 | 689
4 | 7
5 | 01135567
6 | 0234456789
7 | 0001589
8 | 0135679
9 | 5899
where 3 | 6 means 36 days
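If you want to check a stemplot like this one by machine, here is a minimal sketch in Python. It assumes whole-number data where the last digit is the leaf and everything before it is the stem; the function name stemplot is hypothetical, and the data are copied from Table 4.

```python
from collections import defaultdict


def stemplot(data):
    """Return stemplot rows for non-negative whole numbers.

    The rightmost digit is the leaf; the remaining digits are the stem.
    Stems with no data are still listed, so gaps stay visible.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    lowest, highest = min(leaves), max(leaves)
    return [f"{stem} | {''.join(leaves[stem])}"
            for stem in range(lowest, highest + 1)]


# Maturity times (days) from Table 4.
maturity = [70, 62, 99, 85, 51, 55, 57, 75, 60, 64,
            64, 38, 68, 79, 36, 81, 53, 56, 69, 89,
            99, 67, 95, 83, 63, 80, 47, 71, 78, 87,
            55, 70, 86, 70, 66, 98, 50, 51, 39, 65]

rows = stemplot(maturity)
print("\n".join(rows))
```

The printed rows match the stemplot above, stem by stem.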
[5.] A report on road rage recorded the day of the week on which each of 69 incidents occurred. The results are shown below.

Table 5 - Road Rage Incidents for Example 5
F  F  F  W  Sa Tu F  Sa M  Th W  Su Tu M  W  Su Th Th W  Sa Tu W  Sa
W  Th M  Th Sa F  Tu Su F  Tu F  F  Su Tu F  Th F  Tu F  W  F  Sa Tu
W  Tu F  F  Th W  W  Tu Tu F  Th Th W  Su F  F  Th W  Th M  Sa M  F

Let's make a bar chart. First, make a frequency distribution of the data.

Table 6 - Road Rage Frequency Distribution
Day                Su         M        Tu         W        Th         F        Sa
#                   5         5        11        12        11        18         7

Now, graph.

[Figure 12 - Bar Graph for Example 5. Title: Road Rage Incidents; horizontal axis: Day of Week; vertical axis: # of Incidents]

[6.] Now let's make a Pie Chart of the road rage data from above.

Use the frequency distribution of the data from the last problem. Extend that to include proportions by dividing by the total.

Table 7 - Road Rage Frequencies (extended)
Day                Su         M        Tu         W        Th         F        Sa
#                   5         5        11        12        11        18         7
Proportion   0.072464  0.072464   0.15942  0.173913   0.15942   0.26087  0.101449

Finally, multiply those proportions by 360 to get angle measures.

Table 8 - Road Rage Frequencies (extended further)
Day                Su         M        Tu         W        Th         F        Sa
#                   5         5        11        12        11        18         7
Proportion   0.072464  0.072464   0.15942  0.173913   0.15942   0.26087  0.101449
Angle        26.08696  26.08696   57.3913   62.6087   57.3913  93.91304  36.52174

[Figure 13 - Pie Chart for Example 6. Title: Road Rage Incidents; slices labeled Su, M, Tu, W, Th, F, Sa]

[7.] Does the size of a home determine its price? Here are data from a sample of nine homes in Phoenix, Arizona. The size is in hundreds of square feet; the price is in thousands of dollars.

Table 9 - Home Data for Example 7
size, x    price, y
26         259
27         274
33         294
29         296
29         325
34         380
30         457
40         523
22         215

Let's make a scatterplot of the data.
[Figure 14 - Scatterplot for Example 7. Title: Phoenix Homes; horizontal axis: Size (sq. ft.); vertical axis: Price (K$)]
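Looking back at Example 6, the arithmetic behind Tables 7 and 8 (divide each count by the total, then multiply by 360) is easy to check with a few lines of Python. This is only a sketch: the function name pie_slices is hypothetical, and the counts are copied from Table 6.

```python
def pie_slices(counts):
    """Map each category to (proportion of total, angle in degrees)."""
    total = sum(counts.values())
    return {category: (n / total, 360 * n / total)
            for category, n in counts.items()}


# Counts copied from Table 6 (Road Rage Frequency Distribution).
road_rage = {"Su": 5, "M": 5, "Tu": 11, "W": 12, "Th": 11, "F": 18, "Sa": 7}

slices = pie_slices(road_rage)
for day, (proportion, angle) in slices.items():
    print(f"{day:<2} {road_rage[day]:>2} {proportion:.6f} {angle:.5f}")

# The angles should use up the whole circle (allowing for rounding error).
print(round(sum(angle for _, angle in slices.values()), 6))  # 360.0
```

The Friday slice, for instance, comes out to about 93.9 degrees, in agreement with Table 8.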