Lecture 3 - Data Visualization Module 2
A. Data is the raw material used to create information. B. Data collection gathers information measurements. C. Knowledge is measured by the number of data collected.
A. It employs a lot of people. B. It produces data visuals. C. It gathers data measurements for further analysis.
A. True B. False
A. True B. False
A. True B. False
A. Satellite images B. Monitoring networks C. Sampling
A. Data from outer space B. Data with coordinates C. Data referring to environmental resources
http://www.wordle.net/ 5 Words about Climate Change by SCI103 Community 2015/2016
"Having the data is not enough. I have to show it in ways people both enjoy and understand." (Prof. Hans Rosling) Raw data does not provide much insight unless it is processed and presented. The way data is presented has a large impact on how meaningfully it can be analysed and interpreted. The human brain retains information presented in visuals better than information conveyed in written words or in spoken conversation. Visual tools are therefore crucial for effective data communication.
Some of the worst data visualizations are the ones you have to stare at for several minutes before you even comprehend what they're trying to say. Turn data into something more engaging: there should be a swift "aha!" moment within seconds of someone seeing your data. https://youtu.be/adszjzb-ax8
Goal of data visualization: communicate information clearly and effectively using graphics. Graphics: visual images presented on a surface such as paper or a computer screen. Data visualization: techniques for communicating data or information by encoding it as visual objects, which can synthesize large amounts of data. Visuals need to combine functionality and design to convey information intuitively.
HTTPS://YOUTU.BE/MKEXX7SDXAI Charts and tables: synthesize and display data (qualitative and quantitative). Examples: chart, table (Source: Goulburn Murray Water)
Conceptual diagrams and infographics: synthesis, visualisation and context. Examples: conceptual diagram, infographic (Source: Bureau of Meteorology; http://images.wookmark.com/96600_post1_new.jpg)
Satellite images and maps: geographic context. Examples: satellite image, map (Source: Bureau of Meteorology)
Video and photographs: photographs provide unique information; video clips can help to tell a story by capturing motion, perspective and sound. Example: photograph (Source: L Huzzey)
Infographics: graphic visual representations that present complex data/information quickly and clearly. They include traditional data visuals such as charts and tables, and need the right combination of design and content to be effective. https://doms.csu.edu.au/csu/thumbs/2aa657f5-7f7a-45a6-85b4-3610fd378eba/1/ad6a21bc-9c86-4640-b42a-dd470fa45d1a
http://images.sixrevisions.com/2009/05/09-03_coffee_drinks.jpg http://graphs.net/wp-content/uploads/2012/11/worlds-10-popular-books-sold-in-last-50-years.jpg
"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency" (Prof. Edward Tufte). Graphical excellence is particularly important when data is used for analytical tasks such as making comparisons or determining causality. Bad visuals can distort the data, making it harder to understand or compare, and ineffective, poorly presented information can lead to misinformation.
Understand the data you are trying to visualize, including its size and cardinality (the uniqueness of data values in a column). Determine what you are trying to visualize and what kind of information you want to communicate. Know your audience and understand how they process visual information. Use a visual that conveys the information in the best and simplest form for your audience.
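Checking the size and cardinality of a column is a quick first step before choosing a chart. A minimal Python sketch, assuming the column is a plain list of values; the function names and the land-use values are hypothetical, for illustration only:

```python
from collections import Counter

def cardinality(column):
    """Return the number of distinct values in a column."""
    return len(set(column))

def value_counts(column):
    """Return (value, count) pairs, most frequent first."""
    return Counter(column).most_common()

# Hypothetical column of land-use categories (illustrative values only)
land_use = ["crop", "pasture", "crop", "forest", "pasture", "crop"]

print(len(land_use))           # 6 (size)
print(cardinality(land_use))   # 3
print(value_counts(land_use))  # [('crop', 3), ('pasture', 2), ('forest', 1)]
```

As a rough rule of thumb (an assumption, not a rule from this lecture), a low-cardinality column like this suits a bar chart of counts, while a column where nearly every value is unique, such as continuous measurements, is usually better summarised with a histogram.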
Guideline 1: create the simplest graph that conveys the information you want to convey.
Guideline 2: consider the type of encoding object (points, lines and bars) and attribute (point position, line length, color) used to create a plot.
Guideline 3: focus on visualizing patterns or on visualizing details, depending on the purpose of the plot.
Guideline 4: select meaningful axis ranges.
Guideline 5: data transformations and carefully chosen graph aspect ratios can be used to emphasize rates of change in time-series data.
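For the transformation part of Guideline 5, one common choice is a logarithmic scale: a series that grows at a constant rate plots as a straight line on a log scale, so any change in the rate stands out. A sketch using a hypothetical series growing by 10% per time step (aspect ratios are not covered here):

```python
import math

# Hypothetical time series growing at a constant 10% per time step
values = [100 * 1.1 ** t for t in range(5)]

# On a linear scale the step-to-step differences keep getting larger...
linear_steps = [b - a for a, b in zip(values, values[1:])]

# ...but on a log10 scale every step is the same size, revealing the constant rate
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print([round(s, 2) for s in linear_steps])  # [10.0, 11.0, 12.1, 13.31]
print([round(s, 4) for s in log_steps])     # [0.0414, 0.0414, 0.0414, 0.0414]
```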
Guideline 6: in scatter plots, plot overlapping points so that differences in density become apparent.
Guideline 7: use lines to connect sequential data in time-series plots.
Guideline 8: aggregate larger datasets in meaningful ways.
Guideline 9: keep axis ranges as similar as possible when comparing variables.
Guideline 10: select an appropriate color scheme based on the type of data.
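Guideline 8 can be illustrated by aggregating daily measurements into monthly values before plotting. A minimal sketch, assuming records of (date, rainfall in mm); the records and the `monthly_totals` helper are hypothetical:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily rainfall records in mm (illustrative values only)
daily_rain = [
    (date(2005, 9, 1), 2.0),
    (date(2005, 9, 2), 0.0),
    (date(2005, 9, 15), 5.5),
    (date(2005, 10, 3), 1.5),
    (date(2005, 10, 20), 3.5),
]

def monthly_totals(records):
    """Sum daily values into (year, month) buckets, returned in date order."""
    totals = defaultdict(float)
    for day, value in records:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))

print(monthly_totals(daily_rain))  # {(2005, 9): 7.5, (2005, 10): 5.0}
```

"Meaningful" depends on the variable: a monthly total suits rainfall, whereas a variable such as temperature would normally be aggregated as a monthly mean.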
Charts summarize and describe vast amounts of information in a compact, efficient and eye-catching way (Ducklan & Martin, 2002). They are useful for data analysis, visualization and communication: to present raw data, present the results of fairly complex analyses, summarise information, expose unanticipated characteristics of data, and suggest hypotheses that may be further investigated.
Types of charts: bar charts, histograms, pie charts, graphs. http://www.statmethods.net/graphs/images/pie2.jpg
Bar charts display categorical data, with each category independent of the others. They are used to compare a variable across a number of different groups, showing the size of each group (the length of each bar is proportional to the value it represents). There are two main types of bar charts: horizontal and vertical (column). Horizontal bars usually represent a single period of time, whereas column bars may represent similar items at different times. Bar charts can also be displayed as subdivided bar charts, where different variables are represented in the same column. In the example: X-axis: discrete data; Y-axis: continuous data (count, value or percent); gaps between bars. http://www.statmethods.net/graphs/images/barplot3.jpg
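The rule that each bar's length is proportional to its value can be sketched as a text-only horizontal bar chart. This assumes positive values; the group names, counts and the `width` parameter (characters for the longest bar) are hypothetical:

```python
def text_bar_chart(data, width=20):
    """Return the lines of a horizontal text bar chart.

    Each bar starts at zero and its length is proportional to its value,
    scaled so that the largest value gets a bar `width` characters long.
    """
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        length = round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {'#' * length} {value}")
    return lines

# Hypothetical counts for three groups (illustrative values only)
group_counts = {"Group A": 40, "Group B": 30, "Group C": 10}
for line in text_bar_chart(group_counts):
    print(line)
```

Because every bar starts at zero, a bar twice as long really does represent a value twice as large.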
A histogram is a type of bar chart, since data is displayed using bars/columns, but the bars are placed next to each other. Histograms are used to display frequency values: the number of values that fall within the same category or interval (represented on the x-axis). In statistics, histograms are a graphical representation of the distribution of data. To construct a histogram: divide the entire range of values into a series of categories (intervals); count how many values fall into each category; draw a rectangle with height proportional to the count and width equal to the category size.
Histogram axes: X-axis: interval data; Y-axis: continuous data (frequency values).
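The three construction steps can be sketched in Python. The temperatures and the `histogram_counts` helper are hypothetical; values lying exactly on a boundary are assumed to belong to the upper interval:

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Count how many values fall into each interval of equal width.

    Interval i covers [start + i*bin_width, start + (i+1)*bin_width); the last
    interval also includes its upper edge so the maximum value is counted.
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    end = start + bin_width * n_bins
    for v in values:
        if v < start or v > end:
            continue
        i = int((v - start) // bin_width)
        if i == n_bins:  # v sits exactly on the upper edge of the last interval
            i -= 1
        counts[i] += 1
    return counts

# Hypothetical daily maximum temperatures in degrees C (illustrative values only)
daily_max_temps = [12.5, 14.0, 15.2, 17.8, 18.1, 19.9, 21.3, 22.0, 24.6, 25.0]

# Step 1: four 5-degree intervals covering 10-30 degrees C
# Step 2: count the values in each interval
temp_counts = histogram_counts(daily_max_temps, start=10, bin_width=5, n_bins=4)

# Step 3: each count becomes the height of a bar whose width is the interval size
for i, c in enumerate(temp_counts):
    low = 10 + i * 5
    print(f"{low}-{low + 5} C: {c}")
```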
Pie charts are mainly used to compare proportions. To construct a pie chart, calculate the relative proportion of data in each category; the divisions of a pie (proportions or segments) add up to a whole (100%). Labels are nominal or ordinal data; proportions are ratio data. It is generally recommended to avoid pie charts for data analysis and visualization: humans perceive differences in line length more easily than differences in surface area, so it is more effective to use a bar chart (which uses line length to show comparison) than a pie chart (which uses surface area). Pie charts also usually encode only a handful of numbers, and a table is usually a much more efficient way to present such information.
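The construction step (relative proportions that add up to 100%) translates directly into segment angles, since a full circle is 360 degrees. A sketch with hypothetical land-use areas; `pie_segments` is an illustrative helper:

```python
def pie_segments(counts):
    """Return (category, proportion, angle in degrees) for each category."""
    total = sum(counts.values())
    segments = []
    for category, count in counts.items():
        proportion = count / total  # relative proportion of the whole
        segments.append((category, proportion, proportion * 360))
    return segments

# Hypothetical land-use areas in ha (illustrative values only)
areas = {"Cropping": 50, "Grazing": 30, "Forest": 20}
for category, proportion, angle in pie_segments(areas):
    print(f"{category}: {proportion:.0%} ({angle:.0f} degrees)")
```

The same proportions could be drawn as bar lengths instead, which is the comparison the slide recommends.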
Graphs show the relationship between two or more variables (line graphs and scatterplots). Advantages: display of high information density, sometimes with no loss of data; rapid assimilation of the overall result; clear display of complex relationships among multivariate data. Graph interpretation: height of the line (or series of lines); patterns (seasonal pattern, trend or a combination of both).
Line graph: shows a trend. X-axis: discrete and continuous data; Y-axis: discrete and continuous data; the scale should start at zero. http://www.statmethods.net/graphs/images/linechart1.png
Scatterplot: shows the relationship between two variables by plotting their (x,y) positions. X-axis: continuous data; Y-axis: continuous data.
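To read a trend out of a series that also has a seasonal pattern, one common technique (not described in this lecture) is a moving average whose window spans one full seasonal cycle, so the seasonal swings cancel out. A sketch with a hypothetical quarterly series made of a +1-per-quarter trend plus a repeating seasonal pattern:

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

# Hypothetical quarterly series: trend of +1 per quarter plus a seasonal pattern
seasonal = [3, -1, -3, 1]  # sums to zero over one cycle of 4 quarters
series = [10 + t + seasonal[t % 4] for t in range(12)]

# A 4-quarter window covers one full cycle, so only the trend remains
smoothed = moving_average(series, window=4)
print(smoothed)  # [11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5]
```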
Tables: columns and rows filled with data, used for summarising results and comparing data. Tables vs charts: tables are usually a better option than a chart when only a few data points need to be displayed. If exact numeric values are required, a table is best, since exact values can be hard to read from a chart axis. In a thesis or research report, the detail and precision of tables may be more important, since they act as a repository of information.
(Swires-Hennessy, 2014)
Swires-Hennessy, E. (2014). How to Communicate your Message Effectively.
Hay, I. (2012). Communicating in geography and the environmental sciences.
Thomas, J. E., Saxby, T. A., Jones, A. B., Carruthers, T. J. B., Abal, E. G. & Dennison, W. C. (2006). Communicating science effectively: a practical handbook for integrating visual elements.
Kelly, D., Jasperse, J. & Westbrooke, I. (2005). Designing science graphs for data analysis and presentation: the bad, the good and the better.
Schwabish, J. A. (2014). An Economist's Guide to Visualizing Data.
Good graphics should: show the data; reveal content; avoid distorting the data; present many numbers in a small space; encourage comparison of datasets; reveal data at several levels of detail; serve a clear purpose; be integrated with verbal and statistical descriptions of the data; use a consistent style and format.
Concise and comprehensible: present only the information that is relevant and required to support the content (Who? What? How?). Independent: someone who has not read the document associated with the graphic should be able to look at it and understand what it means. Referenced: the source of the data is acknowledged.
http://abacus.bates.edu/~ganderso/biology/resources/writing/graphparts2003.gif http://abacus.bates.edu/~ganderso/biology/resources/writing/population_variation_table_gif.gif
Chart axis: axis labels should be legible, easy to find and easily associated with the axis/object depicted. Labels on the graph should be clearly offset from the data or placed outside the axes. When appropriate, the units of measurement should be displayed. Start the y-axis at zero when graphing numbers. http://www.owlnet.rice.edu/~labgroup/pdf/excelplot.pdf
Chart axis: the range of the axis scale should allow the full range of data to be included, with data points properly spaced. Tick marks should be placed at sufficiently frequent intervals for a reader to work out the value of each data point accurately. Time should be shown on the x-axis, progressing from left to right, with equal time intervals. http://www.owlnet.rice.edu/~labgroup/pdf/excelplot.pdf
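The axis guidelines above (start the y-axis at zero, include the full data range, place tick marks at regular, sufficiently frequent intervals) can be sketched as a small function. The "nice" step sizes (1, 2, 5 or 10 times a power of ten), the target of about five intervals and the `zero_based_ticks` name are assumptions, not rules from the lecture:

```python
import math

def zero_based_ticks(data_max, target_intervals=5):
    """Return equally spaced tick positions from 0 up to at least data_max."""
    if data_max <= 0:
        raise ValueError("data_max must be positive")
    raw_step = data_max / target_intervals
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Pick the smallest 'nice' multiple of the magnitude that is at least raw_step
    for factor in (1, 2, 5, 10):
        step = factor * magnitude
        if step >= raw_step:
            break
    n_intervals = math.ceil(data_max / step)
    # Rounding removes floating-point noise such as 0.6000000000000001
    return [round(i * step, 10) for i in range(n_intervals + 1)]

print(zero_based_ticks(87))  # [0, 20, 40, 60, 80, 100]
```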
Chart type: two or more datasets must be easily distinguished from one another. Use no more than 4 simultaneous symbols, values or lines, and make each line or symbol sufficiently different from the others. Overlapping symbols or lines must be visually separable. When comparing datasets with different measurements, use vertical axes on the left and right sides of the graph to depict the different scales. Example figure: O.P. Yakutina, T.V. Nechaeva, N.V. Smirnova, Consequences of snowmelt erosion: Soil fertility, productivity and quality of wheat on Greyzemic Phaeozem in the south of West Siberia, Agriculture, Ecosystems & Environment, Volume 200, 1 February 2015, Pages 88-93, ISSN 0167-8809, http://dx.doi.org/10.1016/j.agee.2014.10.021.
Table number: a unique number for each table, so it can be easily identified. Table title: self-explanatory, placed above the table. Column headings: explain the meaning of the data, including units of measurement. Table notes: supplementary information, placed below the table. Table source: references.
Excel table with Wagga Wagga climate data, September 2005 to August 2006 (Bureau of Meteorology). Formatting the table to make it effective: the highlighted rows show the days in September 2005 when rainfall was greater than evaporation.
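The highlighting rule (rainfall greater than evaporation) is a simple row filter; in Excel the same test can be applied with a conditional-formatting rule based on a formula. A Python sketch of the rule, where the daily values are hypothetical placeholders, not the Bureau of Meteorology data:

```python
# Hypothetical daily records: (day of September 2005, rainfall mm, evaporation mm)
sept_records = [
    (1, 0.0, 2.4),
    (2, 6.2, 1.8),
    (3, 1.0, 2.6),
    (4, 12.4, 1.2),
]

def days_to_highlight(rows):
    """Return the days on which rainfall exceeded evaporation."""
    return [day for day, rain, evap in rows if rain > evap]

for day, rain, evap in sept_records:
    marker = "*" if rain > evap else " "  # '*' stands in for a highlighted row
    print(f"{marker} {day:>2} {rain:6.1f} {evap:6.1f}")
```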
Figures and tables are numbered separately (e.g. Table 1, Table 2, Figure 1, Table 3, Figure 2). Figures and tables must always be referred to in the text and properly captioned: a figure or table must be mentioned in the text before it appears. Figure and table captions should be informative without being too long; if the data is sourced from elsewhere, the source should be referenced in the caption.
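Separate numbering means each kind of item keeps its own counter. A sketch that reproduces the example sequence above; `number_captions` is a hypothetical helper, assuming the items are listed in order of appearance:

```python
def number_captions(items):
    """Number figures and tables separately, in order of appearance."""
    counters = {"Figure": 0, "Table": 0}
    labels = []
    for kind in items:
        counters[kind] += 1  # each kind has its own independent counter
        labels.append(f"{kind} {counters[kind]}")
    return labels

print(number_captions(["Table", "Table", "Figure", "Table", "Figure"]))
# ['Table 1', 'Table 2', 'Figure 1', 'Table 3', 'Figure 2']
```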
(Thomas et al., 2006)
Table captions go above the table; figure captions go below the figure. Example: Graham S. Leonard, Carol Stewart, Thomas M. Wilson, Jonathan N. Procter, Bradley J. Scott, Harry J. Keys, Gill E. Jolly, Johnny B. Wardman, Shane J. Cronin, Sara K. McBride, Integrating multidisciplinary science, modelling and impact data into evolving, syn-event volcanic hazard mapping and communication: A case study from the 2012 Tongariro eruption crisis, New Zealand, Journal of Volcanology and Geothermal Research, Volume 286, 1 October 2014, Pages 208-232, ISSN 0377-0273, http://dx.doi.org/10.1016/j.jvolgeores.2014.08.018.
A title is not always used if the chart appears as a figure in the text and the title is included as part of the caption (a stand-alone chart in a presentation would normally have a title). Choose the right typeface style. Avoid using different typefaces: use bold, italics, capitals, small caps or contrasting colours to create contrast and emphasis. If possible, avoid putting values on charts (if numbers are needed, use tables instead). Eliminate all redundant terms. For tables, ensure that all values for the same variable have the same number of decimal places and that the decimal points are aligned.
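The last table rule (same number of decimal places, aligned decimal points) can be sketched with Python's format specifications: fixing the number of decimal places and right-aligning every cell to the same width lines the decimal points up. The values and the `align_decimals` name are hypothetical:

```python
def align_decimals(values, places=2):
    """Format numbers with the same number of decimal places, right-aligned
    to a common width so the decimal points line up in a column."""
    formatted = [f"{v:.{places}f}" for v in values]
    width = max(len(s) for s in formatted)
    return [s.rjust(width) for s in formatted]

for cell in align_decimals([3.14159, 12.5, 104.0, 0.25]):
    print(cell)
```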
Use 2D charts for 2D data (i.e. when plotting two variables): adding a 3D visual appearance distorts the data and prevents a clear interpretation.
Use colour, but use it with caution. Try to design your chart without colour first: if it reproduces well in black and white, it can be reproduced in any medium. In a black-and-white design, make the patterns in columns as contrasting as possible, and don't make shading too gradual.
Use the right resolution for your graphics. Resolution is measured as the number of dots per inch (dpi) or the number of pixels (given as the width and height of the image, or as the total number of pixels in the image). More pixels means higher resolution and a larger file size. Figure formats (*.jpg, *.tif, *.gif, *.png, *.eps) are either resolution-dependent or resolution-independent. Resolution-dependent (raster) formats such as *.jpg, *.tif, *.gif and *.png lose quality when enlarged, and lossy compression (as in *.jpg) also reduces quality; resolution-independent (vector) graphics, typically saved as *.eps, keep the same quality when resized.
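The link between print size, dpi and pixel count is simple arithmetic: pixels = inches × dpi. A sketch where the 6 × 4 inch size and the dpi values are example assumptions, not requirements from the lecture:

```python
def pixels_needed(width_in, height_in, dpi):
    """Pixel dimensions needed to print an image at a given size and resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def print_size_inches(width_px, height_px, dpi):
    """Print size (inches) an image reaches at a given resolution."""
    return width_px / dpi, height_px / dpi

w, h = pixels_needed(6, 4, 300)
print(w, h, w * h)                         # 1800 1200 2160000
print(print_size_inches(1800, 1200, 150))  # (12.0, 8.0)
```

Halving the dpi doubles the printed size of the same raster image, which is why a low-resolution image looks coarse when it is enlarged.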
Multivariate data refers to data measured for more than two variables (bivariate data refers to two variables). Multivariate charts include scatterplot matrices (continuous data) and mosaic plots (categorical data). Example: multivariate data.
Mosaic plots display the relationship among two or more categorical variables and are used to represent frequency tables (i.e. the number of times a data value occurs). Example: mortality rates aboard the Titanic varied between males and females. Among females, 67% survived (coded as 1) and 33% died (coded as 0). https://medschool.vanderbilt.edu/cqs/files/cqs/media/drtsai2_0.pdf
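A two-variable mosaic plot splits the width in proportion to one variable, then splits each column's height in proportion to the second variable within that column, so each tile's area is proportional to its cell count. A sketch of that calculation from a frequency table; the groups and counts are hypothetical (not the Titanic data), with outcomes coded 1 and 0 as in the example:

```python
def mosaic_tiles(table):
    """Compute tile (width, height) pairs for a mosaic plot.

    `table` maps each column category to a dict of {row category: count}.
    Widths are marginal proportions; heights are proportions within each column.
    """
    grand_total = sum(sum(col.values()) for col in table.values())
    tiles = {}
    for col_name, col in table.items():
        col_total = sum(col.values())
        width = col_total / grand_total  # share of all observations in this column
        for row_name, count in col.items():
            height = count / col_total   # conditional proportion within the column
            tiles[(col_name, row_name)] = (width, height)
    return tiles

# Hypothetical counts, outcome coded 1 or 0 (illustrative values only)
survival_counts = {
    "Group A": {"1": 30, "0": 10},
    "Group B": {"1": 15, "0": 45},
}
for key, (w, h) in mosaic_tiles(survival_counts).items():
    print(key, f"width={w:.2f}", f"height={h:.2f}", f"area={w * h:.3f}")
```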
Multidimensional data refers to measurements of variables in more than two dimensions (2D normally refers to the Cartesian plot with x and y axes). Common multidimensional charts are 3D plots, where time (t) or depth (z) is frequently chosen as the third dimension.
Next lecture: What is spatial data? Spatial data collection (GPS). Go through Module 2 in your Learning Modules. The information presented here is important for Assessments 2a) and 2b).