Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

INTRODUCTION

Graphs are among the most important tools for analyzing and presenting your data. They are visual representations of numerical information, and they can reveal patterns and highlight results better and faster than words or even tables. A good, clear graph makes your data easy to understand, and your readers will thank you.

Some general guidelines for making graphs in science are:
1. Use clearly visible symbols. They should be more noticeable than the axis labels.
2. Reduce clutter. For example, use only 4 to 6 tick marks per axis.
3. Offset data labels from the axis labels so that the two are not confused. Appropriate abbreviations can help keep labels short.
4. Design your graph without color; use grayscale instead. A color graph may look impressive on a web page, but color printing is expensive, and many PDF files will be printed on a black-and-white printer, which can lose detail. Oral presentations and posters are an exception.
5. DO NOT put titles on your graph. Titles belong in the figure headings, which go BELOW the graph.
6. Label all axes correctly, including units where appropriate.
7. Never include horizontal gridlines. You want your graphs to look as clean as possible, so remove all gridlines from the graph. To do that, simply right-click on one of the gridlines and choose Delete.

Not all graphs are good, however. Graphs may unintentionally mislead or misrepresent data. The following bar graph misrepresents the data by visually suggesting equal intervals between sampling dates, when the actual intervals were 6 and 23 years, respectively. Furthermore, the accompanying caption does not explain what the error bars represent (standard error? 95% confidence interval?).

Figure 1. An example of a poorly designed graph that misrepresents the sampling of data. Note that there are error bars, but we aren't told what they mean.
Another example of improper interpretation of graphs is using a line graph that shows a correlation to infer or suggest causation. This is an obvious problem, and one encountered far too often in the popular media. Bloomberg Businessweek magazine ran an article on this very topic, with the following graph showing that correlation does not equal causation. These are clearly extreme (and silly) examples, but the point stands.

Figure 2. Graphs illustrating that correlation is not the same as causation. Available online at: http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html

LAB OBJECTIVES
1. To become familiar with several common types of graphs, including when each is used.
2. To interpret data presented in graphical formats.
3. To learn how to use Excel to make graphs using data from class.
TYPES OF GRAPHS

Bar charts or column charts display data as a series of horizontal or vertical bars. The length of each bar (its height, for a vertical column) indicates the number or proportion (%) of values in each category. These are very often the best visual representation of a table. They can also be used to compare two categorical variables.

Figure 3. An example of a column chart showing the amount of different types of trash found on a beach.

Bar charts can also be nested (grouped side by side) or stacked. In a stacked chart, a single bar is constructed for each category of the 1st variable and divided into segments that are proportional to the count or percentage of values in each category of the 2nd variable. In this case, the counts across all segments should sum to the number of values in the dataset, and the percentages within each bar should sum to 100%.

Figure 4. An example of a nested bar chart, showing the percentage of people in 4 different states that have different levels of education. Notice that this is also an example of a bar chart that is difficult to interpret in grayscale.
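For readers who want to check the arithmetic behind a 100% stacked bar, here is a minimal Python sketch (not part of the Excel workflow) that converts the counts in each bar into segment percentages summing to 100%. The state names, education levels, and counts are hypothetical, made up for illustration; they are not the data behind Figure 4, and the function name segment_percentages is likewise hypothetical.

```python
# Convert the counts in one bar into the percentages used for its segments.
# All categories and counts below are hypothetical, for illustration only.

def segment_percentages(counts):
    """Return each segment's share of its bar as a percentage (sums to 100)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a bar needs at least one observation")
    return {level: 100 * n / total for level, n in counts.items()}

# One bar per category of the 1st variable (a hypothetical state),
# split into segments by the 2nd variable (education level).
data = {
    "State A": {"High school": 40, "Bachelor's": 35, "Graduate": 25},
    "State B": {"High school": 60, "Bachelor's": 30, "Graduate": 10},
}

for state, counts in data.items():
    pct = segment_percentages(counts)
    rounded = {level: round(p, 1) for level, p in pct.items()}
    print(state, rounded, "total =", round(sum(pct.values()), 1))
```

Because each bar is divided by its own total, bars with very different counts can still be compared as proportions.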
To interpret, ask yourself: How do the heights of the bars compare? Are some elements/values significantly different from others? Can you generalize about relative proportions?

Practice making a bar chart and a column chart by following the tutorials here:
http://www.excel-easy.com/examples/column-chart.html
http://www.excel-easy.com/examples/bar-chart.html

Special note about error bars
There are three main kinds of error bars, and each gives us a different type of information. Depending on the information available and the question we want to answer, we will choose one of the following:
1. Standard deviation (SD) describes the variability of individual values around the mean. It supports statements such as "Most blonds have an IQ between 70 and 130" or "Most brunettes have an IQ between 90 and 150."
2. Standard error (SE) estimates the certainty (precision) of the mean. It equals the SD divided by the square root of the sample size (SE = SD/√n), so SE bars are smaller than SD bars whenever n > 1. People use them more often because they make their data look better. Generally speaking, the more subjects you have, the more certain your mean will be. And the less variable your population, the more certain your mean will be.
3. A 95% confidence interval (95% CI) is a range built so that, if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true mean.
See Figure 1 for an example of a graph that contains error bars. Practice making column charts with error bars using the tutorial here: http://www.excel-easy.com/examples/error-bars.html

Histograms are used to show the distribution of a continuous variable. Values are divided into a series of intervals (bins), usually of equal width. Data are displayed as a series of vertical bars whose heights indicate the number or proportion (%) of values in each interval. The overall shape of these bars tells us a lot about the distribution of our data.

Figure 5. Examples of the various distributions that histograms can show us.

To interpret, ask yourself: Is it symmetric? Is it skewed? What does the shape mean for your data?
Is there more than one peak? What is the range of the intervals? Is the shape wide or narrow (i.e., how variable are your data)?

Practice making a few histograms by following the tutorial here: http://www.excel-easy.com/examples/histogram.html

Scatterplots graph a response variable (i.e., outcome) along the y-axis and an explanatory variable (i.e., predictor or risk factor) along the x-axis. Each subject or sample is represented by a single point. Scatterplots often include a line depicting an estimate of the linear or non-linear relationship. This line, called a regression line, best-fit line, or trendline, describes the association between the x and y variables. How well the line fits is usually reported as an R² value (the coefficient of determination), which ranges from 0 to 1. An R² value closer to 1 indicates that the points lie close to the line (a stronger association); a value closer to 0 indicates little association captured by the line.

Figure 6. A scatterplot showing the relationship between calories consumed and weight gained. Note the linear regression line with the associated R² value (coefficient of determination) on it.

To interpret, ask yourself: What is the overall pattern? Is there a positive association? A negative association? Is the relationship linear or non-linear (i.e., a curve)? How strong is the association (i.e., how tightly clustered are the points, and how variable is the association)? Are there outliers? Could a third, lurking variable that is related to both variables confound the association?

Practice making a few scatterplots by following the tutorial here: http://www.excel-easy.com/examples/scatter-chart.html
Practice adding trendlines to a scatterplot using the tutorial here: http://www.excel-easy.com/examples/trendline.html
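To see where a linear trendline and its R² come from, here is a minimal Python sketch assuming an ordinary least-squares straight-line fit with a free (not fixed) intercept. It shows that R² approaches 1 as points cluster tightly around the line and that, for a straight-line fit, R² equals the square of Pearson's correlation coefficient r. The function name linear_fit and the sample numbers are hypothetical; Excel's linear trendline should report the same R² for the same data, but treat this as a cross-check rather than a description of Excel's internals.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, r), where r is Pearson's
    correlation coefficient and r_squared is the coefficient of determination.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: how far the points fall from the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    r = sxy / math.sqrt(sxx * syy)  # for a straight-line fit, r * r equals R²
    return slope, intercept, r_squared, r

# Hypothetical data lying exactly on a line: R² is 1.
print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))
# Hypothetical noisier data: a weaker association gives a smaller R².
slope, intercept, r2, r = linear_fit([1, 2, 3, 4], [2, 4, 5, 4])
print(round(slope, 2), round(intercept, 2), round(r2, 3), round(r, 3))
```

A high R² only says the points lie near the line; as Figure 2 illustrates, it says nothing about whether x causes y.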
Boxplots are less frequently used than bar or column charts, but they can convey much more detailed information, including the minimum, first quartile, median, third quartile, and maximum (Figure 7). Bar charts are appropriate for count data or for data that range upward from zero. Boxplots are better for showing the medians and quartiles of data, and they show the range and spread of samples much more clearly than bars with standard error bars do (Figure 8). Interpreting box plots is relatively straightforward once you understand what each part represents.

Figure 7. Description of the parts of a box plot and how to interpret them.

Figure 8. The same three samples plotted using bar charts with SE bars (left) and box-and-whisker plots (right).

Older versions of Excel and many other spreadsheet programs do not plot box-and-whisker plots automatically, but we can trick them into doing so. Practice making a box chart using the tutorial here: http://www.dummies.com/how-to/content/boxandwhisker-charts-for-excel.html
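As a rough sketch of the numbers a box plot displays, the Python code below computes the five-number summary. It also computes the segment heights for one common version of the stacked-column trick, which may differ in detail from the tutorial's steps: an invisible bottom segment up to Q1, two visible box segments, and whiskers drawn to the minimum and maximum. Quartiles can be computed in several slightly different ways. This sketch assumes the "inclusive" method of Python's statistics.quantiles, which we believe matches Excel's QUARTILE.INC, so small differences from other software are possible. The function names and data values are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # 'inclusive' interpolates between data points (assumed to match QUARTILE.INC).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

def stacked_box_segments(summary):
    """Segment heights for drawing a box plot as a stacked column chart.

    The bottom segment (height q1) is formatted as invisible, the next two
    segments form the box, and the whiskers are drawn as error bars.
    """
    return {
        "hidden_base": summary["q1"],
        "lower_box": summary["median"] - summary["q1"],
        "upper_box": summary["q3"] - summary["median"],
        "lower_whisker": summary["q1"] - summary["min"],
        "upper_whisker": summary["max"] - summary["q3"],
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 20]  # hypothetical values
summary = five_number_summary(sample)
print(summary)
print(stacked_box_segments(summary))
```

Note how the single large value (20) stretches only the upper whisker, while the box, which holds the middle half of the data, stays compact.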
LAB REPORT

As always, make your graphs in Excel and copy and paste them into a SINGLE Word document. Submit your homework online through the dropittome website.

Using the dataset from last week's lab ("class data" under Lab 8 on the course website), make each of the graphs listed below. Use the data you think is most appropriate for each graph type. For example, you could make a histogram of the distribution of hours of exercise by creating bins of 0 hours, 1-2 hours, 3-4 hours, etc. You MUST follow all formatting guidelines and give a figure heading under each graph, including your interpretation of what the graph means.
1. A column chart, including standard error bars
2. A histogram
3. A scatterplot with a regression line
4. A box plot
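To double-check the values Excel gives you for the lab report, the Python sketch below computes the mean, SD, SE, and an approximate 95% confidence interval for one group. It also counts hours of exercise in the bins suggested above (0 hours, 1-2 hours, 3-4 hours, and so on), including empty bins so that gaps in the histogram stay visible. It assumes hours are recorded as whole numbers. It approximates the 95% CI as mean ± 1.96 × SE, which is reasonable only for fairly large samples; for small samples, a t-based interval (for example, using Excel's T.INV.2T function) is more appropriate. The function names and exercise values are hypothetical.

```python
import math
import statistics
from collections import Counter

def error_bar_stats(values):
    """Mean, sample SD, SE, and an approximate 95% CI for one group."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate variability")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), like Excel's STDEV.S
    se = sd / math.sqrt(n)         # SE = SD / sqrt(n)
    half_width = 1.96 * se         # normal approximation; assumes a large sample
    return {"n": n, "mean": mean, "sd": sd, "se": se,
            "ci95": (mean - half_width, mean + half_width)}

def exercise_bin(hours):
    """Label whole-number hours as '0', '1-2', '3-4', ..."""
    if hours < 0 or hours != int(hours):
        raise ValueError("expected a non-negative whole number of hours")
    if hours == 0:
        return "0"
    low = 2 * ((int(hours) - 1) // 2) + 1  # 1 or 2 -> 1, 3 or 4 -> 3, ...
    return f"{low}-{low + 1}"

def histogram_counts(hours_list):
    """Count values per bin, including empty bins, from 0 up to the largest value."""
    if not hours_list:
        raise ValueError("need at least one value")
    counts = Counter(exercise_bin(h) for h in hours_list)
    labels = ["0"] + [f"{low}-{low + 1}" for low in range(1, int(max(hours_list)) + 1, 2)]
    return {label: counts.get(label, 0) for label in labels}

hours = [0, 0, 1, 2, 2, 3, 4, 5, 6, 7]  # hypothetical weekly exercise hours
print(error_bar_stats(hours))
print(histogram_counts(hours))
```

The "se" value is what goes into Excel's custom error bar amount for a column chart with standard error bars.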