Data Visualization Principles for Scientific Communication

Data Visualization Principles for Scientific Communication 8-888 Introduction to Linguistic Data Analysis Using R Jerzy Wieczorek 11//15

Follow along: these slides and a summary checklist are at http://www.stat.cmu.edu/~jwieczor/

Overview
There is a spectrum between exploratory graphics (for your own understanding as you analyze the data) and explanatory graphics (for communicating results to an audience). Today's advice applies to both, but we'll focus on explanatory graphs.

Overview
What should I graph? (let function constrain form, show both raw data and summaries, show precision/uncertainty)
Can people read my graphs? (guides and captions, image format and quality, text and color)
Can people understand my graphs? (comparisons, grouping and search, cognition, consistency)
Where can I learn more about dataviz? (books, blogs to follow)

Principles
In general:
  Design each graph with a clear task in mind.
  Use common sense to help the audience read the graph.
  Use knowledge of visual perception to help the audience perform the task efficiently.
For scientific communication:
  Show the data foremost, and overlay summaries as appropriate.
  Show the statistical precision of any summaries.
  Use tables to show exact values; use graphs to show patterns.

Design for a clear task
A visualization is a tool for showing order and patterns in data. As the creator, you can generate order before people's brains try to do it on their own.
Cairo, The Functional Art: Cairo read a claim that higher education is related to lower obesity, but no evidence was shown. He found a US state-level dataset for rates of obesity and of college education.
What graphical form can help show evidence for or against this claim?

Choice of graphical form
Bubbles on a map? (larger = higher percentage)
[Figure: bubble maps titled "Percentage of people with a BA degree or higher" and "Percentage of obese people"]

Choice of graphical form Choropleth (colored thematic map)? (darker = higher percentage)

Choice of graphical form
A scatterplot shows the relationship directly. It doesn't show spatial patterns, but that wasn't our goal, so that's OK.

Choice of graphical form: Lessons
Decide on the visual task, and choose a form that supports it.
Just because you have certain variables (geography, time, social ties) doesn't mean it's helpful to show them (map, time series, network diagram).

Most useful graphic forms and visual variables
Data forms: point, line, bar (and small multiples)
Statistical summaries: histogram, boxplot, smoothing line, error bar
Visual variables: color, point shape, point size, line width, line type

Consistency
Which age group weighs the least?
[Figure: small multiples of Weight (kg) by Age (months)]

Consistency
Give all small multiples the same structure, usually including axis limits, to make comparisons easier and reduce cognitive load.
[Figure: small multiples of Weight (kg) by Age (months)]

Consistency Ensure design changes are meaningful (tied to data changes)

Consistency More consistent redesign, Stephen Few

Consistency Avoid meaningless visual variables like shadow or 3D

Consistency: Lessons
Use consistent mappings (colors and shapes, axis limits) across graphs.
Don't reuse the same mappings across different data variables.
Avoid meaningless variety in design.
Avoid shadow, 3D, and other variables not mapped to data.
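
One way to keep axis limits consistent across small multiples is to compute a single range from all panels before drawing any of them. The sketch below is only an illustration in plain Python: the function name `shared_limits`, the `pad` parameter, and the weights are all made up for this example and do not come from the slides.

```python
def shared_limits(panels, pad=0.05):
    """Return one (low, high) axis range that covers the values of every panel.

    `panels` maps a panel name to a list of numbers. `pad` widens the range
    by that fraction of the data span on each side.
    """
    values = [v for panel in panels.values() for v in panel]
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return low - margin, high + margin


# Hypothetical weights (kg) for three age groups; illustrative numbers only.
panels = {
    "0 months": [3.1, 3.4, 3.8],
    "6 months": [6.9, 7.4, 8.0],
    "12 months": [8.8, 9.6, 10.1],
}

# With no padding, the shared range is simply the overall min and max.
limits = shared_limits(panels, pad=0.0)
print(limits)
```

Passing the same `limits` to every panel means a point at the same height always represents the same weight, whichever panel it sits in.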

Semantic associations
Orange vs blue crab species: I've actually seen this in a talk (crabs dataset)
[Figure: scatterplot of Frontal lobe (mm) against Rear width (mm), with a legend for Blue crab and Orange crab]

Semantic associations: Lessons
Use meaningful mappings: orange vs blue crab species = orange and blue symbols.
Use conventional mappings: blue = cold, red = hot.
"More = more": deeper saturation or larger size = higher value of variable.

Legibility What is done well here? What could be improved?

Guides
Title or caption: explain anything not already on axes or legend. What are the individual plotted points (raw data by subject, vs. averages by group)? Is this the full dataset or a subset?
Axis labels: give units ("number of events", "$", "%", "m/s"). Give readable labels, not default variable names ("Pretest Score", not "PRE_TEST_SCORE").
Tick marks: use round, readable numbers and avoid scientific notation. If a Dollars axis has ticks like 1e6, plot Dollars in millions instead.
Grid lines: omit or make very light. Do not overwhelm the data.
Legend: use the same ordering as on the plot. Or, if possible, omit it and use direct labels instead.
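
The axis-label and tick-mark advice can be applied before the data ever reaches the plotting code. The following is a small sketch in plain Python; the helper names `readable_label` and `to_millions` are hypothetical, chosen only for this illustration.

```python
def readable_label(name, unit=None):
    """Turn a raw variable name such as PRE_TEST_SCORE into an axis label.

    Underscores become spaces and each word is capitalized. The unit, if
    given, is appended in parentheses.
    """
    words = name.replace("_", " ").strip().lower().split()
    label = " ".join(word.capitalize() for word in words)
    return f"{label} ({unit})" if unit else label


def to_millions(dollars):
    """Rescale dollar amounts so ticks read 1, 2.5, ... instead of 1e6, 2.5e6."""
    return [d / 1_000_000 for d in dollars]


print(readable_label("PRE_TEST_SCORE"))         # Pre Test Score
print(readable_label("speed", unit="m/s"))      # Speed (m/s)
print(to_millions([1_000_000, 2_500_000]))      # [1.0, 2.5]
```

A mechanical rename like this is only a starting point: it produces "Pre Test Score" rather than "Pretest Score", so final labels still deserve a manual check. After rescaling, the axis label should say so, for example "Dollars (millions)".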

Color
Ensure your colors can be distinguished from each other:
  Around 10% of men and 1% of women are colorblind.
  Some color palettes do not photocopy well.
  Use Color Brewer to choose a well-tested palette.

Image quality
Understand when to use bitmap vs. vector formats.
Ensure the graph's text size is similar to the surrounding body text.
Don't stretch graphs! Create and save them at the right size in the first place.

Vector vs bitmap
Vector vs bitmap explained
Bitmap: common formats
  jpg/jpeg is lossy, designed for photos but not text/charts
  png is lossless, good for text/charts, common on web
Vector: common formats
  svg can display in browser, common on web
  pdf is for standalone doc (or to put inside another pdf)

Recommended formats and resolutions
Software      Recommended graphics device
Illustrator   svg
pdflatex      pdf, png (600 ppi)
Office        png (600 ppi)
web           png (72 ppi)
(ppi = pixels per inch)
From Wickham, ggplot2, Table 8.3

Save images at intended final size Decide on target size and create the graph that way from the start. Gives better quality than changing size after saving, and avoids stretching/distorting the graph. For example: Default textwidth in a LaTeX article is around 5. inches My Wordpress blog has a default width of 500 pixels
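
Choosing the target size up front comes down to simple arithmetic between inches, pixels, and resolution. The sketch below shows that conversion in plain Python. The helper names are hypothetical, the 5 x 3 inch figure is a made-up example, and 72 ppi is assumed here as a typical screen resolution.

```python
def pixels_for(width_in, height_in, ppi):
    """Pixel dimensions of a bitmap saved at the given physical size and resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def inches_for(width_px, ppi):
    """Physical width of a bitmap that is `width_px` pixels wide when shown at `ppi`."""
    return width_px / ppi


# A hypothetical 5 x 3 inch print figure saved as a 600 ppi png.
print_size = pixels_for(5, 3, 600)
print(print_size)

# A 500-pixel-wide web column, assuming a 72 ppi screen.
web_width_in = inches_for(500, 72)
print(round(web_width_in, 2))
```

Knowing both numbers before plotting lets you set the figure size once in the plotting software, so text and line widths are rendered at their final scale instead of being stretched afterwards.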

Visual Perception

Quantitative comparisons
We'll try an experiment on the next few slides:
            A   B   C   D
Positions   1   ?   ?   ?
Lengths     1   ?   ?   ?
Angles      1   ?   ?   ?
Areas       1   ?   ?   ?

Quantitative perceptual tasks: position, aligned A B C D

Quantitative perceptual tasks: length A B C D

Quantitative perceptual tasks: angle A B C D

Quantitative perceptual tasks: area A B C D

Quantitative perceptual tasks: answers
            A   B    C    D
Positions   1   3/   1/   /
Lengths     1   /    3/   1/
Angles      1   /3   1/3  /3
Areas       1   /    1/   3/
Cleveland and McGill (1984); Cleveland, The Elements of Graphing Data

Quantitative perceptual tasks: effect of angle orientation
[Figure: the same angle drawn with a vertical and with a horizontal bisector]
The same angle looks wider when its bisector is horizontal.

Ordering of perceptual tasks
Cleveland and McGill's ordering (split over slides)

Ordering of perceptual tasks

Distance
Cleveland and McGill (1984)

Quantitative perceptual tasks: Lessons
Best to show quantitative variables with position or length. Use points, lines, or bars.
Bars encode length, so start bars at 0. If you must zoom in, use dotplots (encoding position) instead.
Avoid stacked bars (not aligned). Use dots or lines (aligned baselines) instead.
Avoid pies, area, and volume entirely.
Choose and order hues sensibly. Use Color Brewer.
Place things-to-be-compared near each other.
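
The reason bars should start at 0 can be shown with a little arithmetic: readers compare bar lengths, and a truncated baseline changes the ratio of those lengths. The sketch below is plain Python with made-up values; `apparent_ratio` is a hypothetical helper name.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Two hypothetical values that differ by about 11%.
honest = apparent_ratio(100, 90)                   # axis starts at 0
truncated = apparent_ratio(100, 90, baseline=80)   # axis starts at 80

print(round(honest, 2))   # 1.11
print(truncated)          # 2.0
```

With the axis cut at 80, the first bar is drawn twice as long as the second even though the values differ by only about 11%. A dotplot avoids the problem because position along the axis, not length, carries the value.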

Preattentive processing: example task
Find and count the 6s.
[Figure: grid of random digits]

Preattentive processing: example task
Find and count the 6s now.
[Figure: the same grid of digits]

Preattentive processing
We automatically process and notice certain features, while others require conscious thought to find.
We process faster when there are few categories to distinguish.

Preattentive processing: features Colin Ware, Information Visualization

Preattentive processing: features

Preattentive processing: Lessons
Distinguish categorical groups by features like hue and shape. Hue also lets you use direct labels instead of a legend.
Don't try to show too many groups on one plot. Use small multiples to show more sub-groups.
If highlighting one group, use a preattentive attribute.

Separable dimensions Some examples from Colin Ware, Information Visualization <- More integral... More separable ->

Integral dimensions example US Census Bureau map using hue and saturation

Separable dimensions: Lessons
Use color and another variable (shape, size, orientation, motion).
Use small multiples rather than different plotting symbols.
Avoid mixing aspects of color, or aspects of size.
Don't show too many grouping variables at once.

Show data, then summaries

Show data foremost
Weissgerber et al. (2015): summaries hide important data structure and sample size

Show data first, then add summaries

Show statistical precision
How well do n=3 observations estimate the population mean? n=10? n=30?
[Figure: sampled newborn weights (kg) for n = 3, 10, and 30, with estimated means in red]

Show statistical precision
[Figure: sampled newborn weights (kg) for n = 3, 10, and 30, with 95% CIs for the means in red]

Show statistical precision
Be clear about whether error bars show SD, SE, or CI.
SD (standard deviation) summarizes data spread. No need to show it if you already show the data.
SE (standard error) summarizes precision of the sample mean estimate. Useful for calculations, but not directly interpretable.
CI (confidence interval) summarizes plausible values of the unknown population mean, based on our observed sample. CIs are most interpretable and best to plot.
Specify the confidence level, e.g. "Error bars show 95% confidence intervals."
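
To make the difference between SD, SE, and CI concrete, here is a minimal sketch that computes all three for one sample using only Python's standard library. The function name `summarize` and the weights are made up for illustration. The interval uses a normal approximation (critical value about 1.96 for 95%), which is an assumption of this sketch; for samples this small a t critical value would give a wider, more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(sample, level=0.95):
    """Return n, mean, SD, SE, and an approximate confidence interval for the mean."""
    n = len(sample)
    m = mean(sample)
    sd = stdev(sample)                          # spread of the data (n - 1 denominator)
    se = sd / sqrt(n)                           # precision of the sample mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation, about 1.96 for 95%
    return {"n": n, "mean": m, "sd": sd, "se": se, "ci": (m - z * se, m + z * se)}


# Hypothetical newborn weights (kg); illustrative numbers only.
result = summarize([3.0, 3.5, 4.0, 4.5, 5.0])
print(result)
```

The SD stays roughly the same as more observations are collected, while the SE and the width of the CI shrink with the square root of n. That is why a caption has to say which of the three the error bars show.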

Use tables for lookup
Graphs are better for making comparisons and finding patterns.
Tables are better for showing precise values, if the audience needs them:
  Experimental design: sample size, number of repeated measurements, or variable settings in each group
  Summary statistics for further calculation: group means and SDs allow the reader to test differences between specific groups of interest
  Reference tables: Celsius-to-Fahrenheit conversions, normal distribution critical values, etc.

Principles
In general:
  Design each graph with a clear task in mind.
  Use common sense to help the audience read the graph.
  Use knowledge of visual perception to help the audience perform the task efficiently.
For scientific communication:
  Show the data foremost, and overlay summaries as appropriate.
  Show the statistical precision of any summaries.
  Use tables to show exact values; use graphs to show patterns.

Further resources
My checklist for the material covered today
Cairo, The Functional Art: great overview from a data journalism perspective
Robbins, Creating More Effective Graphs: short, accessible summary of classic advice by Tufte and Cleveland; includes a graph design checklist
Few, Show Me The Numbers: examples in a business context; excellent advice on table design
Ware, Information Visualization: thorough textbook on insights from perception research
Weissgerber et al. (2015): article on showing all the data, not just statistical summaries
Donahue, Fundamental Statistical Concepts in Presenting Data: real case studies from a statistical consultant

Thanks!

A few more principles, if there's time
Align small multiples for the task
Rank/order informatively
Show derived variables directly
Use Gestalt principles

Alignment
Among male newborns, compare by race.
[Figure: small multiples of Weight (kg) by Age (months), one panel for each combination of race (Hisp, Black, White) and sex (Female, Male)]

Alignment
Among male newborns, compare by race: easier search now, though harder comparison.
[Figure: Weight (kg) by Age (months), with panels for race (Hisp, Black, White) and sex (Female, Male)]

Ranking: alphabetical
[Figure: Western state areas (1000s of sq miles), states in alphabetical order: Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, Wyoming]

Ranking: informative
[Figure: Western state areas (1000s of sq miles), states ranked by area: Alaska, California, Montana, New Mexico, Arizona, Nevada, Colorado, Wyoming, Oregon, Utah, Idaho, Washington, Hawaii]

Derived variables
William Playfair, one of the earliest line charts
What does the difference look like?

Derived variables Differences shown directly, by Cleveland and McGill

Alignment, ranking, and derived variables: Lessons
Decide on the visual task, and helpfully align elements to be compared. As you explore data, try several arrangements.
Order your dots/bars meaningfully: rank by a variable, not alphabetically.
If differences or ratios are interesting, compute and plot them directly.
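
Both the ranking and the derived-variable lessons are data-preparation steps that happen before plotting. The sketch below shows each in plain Python. All names and numbers are hypothetical placeholders, not values from the slides.

```python
# Hypothetical category values, made up for illustration.
areas = {"Alpha": 120, "Bravo": 45, "Charlie": 300, "Delta": 80}

# Rank by the plotted variable (largest first) instead of alphabetically.
ranked = sorted(areas.items(), key=lambda item: item[1], reverse=True)
print([name for name, _ in ranked])

# Two hypothetical series measured at the same time points.
series_a = [10, 12, 15, 14]
series_b = [8, 13, 18, 20]

# If the difference is the quantity of interest, compute it and plot it directly
# rather than asking readers to judge the gap between two curves.
difference = [b - a for a, b in zip(series_a, series_b)]
print(difference)
```

Plotting `difference` as its own line puts the comparison on a common baseline, which is easier to read than the vertical distance between two sloping curves.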

Gestalt
Gestalt = "pattern" in German
We automatically structure data into patterns / groups using certain features.

Gestalt: Lessons
Distinguish categorical groups by similarity, proximity, or enclosure.
Use proximity to structure your layout (arrange small multiples).
Use connection to show groups on a line chart, parallel coordinates chart, or network graph.
To highlight one group, use enclosure or similarity.