Data Visualization (DSC 530/CIS )

Similar documents
Data Visualization (CIS/DSC 468)

CIS 467/602-01: Data Visualization

DSC 201: Data Analysis & Visualization

Data Visualization (CIS/DSC 468)

Data Visualization (CIS 468)

Telecommunications and Internet Access By Schools & School Districts

DSC 201: Data Analysis & Visualization

CS-5630 / CS-6630 Visualization for Data Science The Visualization Alphabet: Marks and Channels

Accommodating Broadband Infrastructure on Highway Rights-of-Way. Broadband Technology Opportunities Program (BTOP)

DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT. [Docket No. FR-6090-N-01]

CostQuest Associates, Inc.

MAKING MONEY FROM YOUR UN-USED CALLS. Connecting People Already on the Phone with Political Polls and Research Surveys. Scott Richards CEO

2018 NSP Student Leader Contact Form

Figure 1 Map of US Coast Guard Districts... 2 Figure 2 CGD Zip File Size... 3 Figure 3 NOAA Zip File Size By State...

The Lincoln National Life Insurance Company Universal Life Portfolio

Fall 2007, Final Exam, Data Structures and Algorithms

Ocean Express Procedure: Quote and Bind Renewal Cargo

B.2 Measures of Central Tendency and Dispersion

Panelists. Patrick Michael. Darryl M. Bloodworth. Michael J. Zylstra. James C. Green

Representation: Design Idioms 1

2018 Supply Cheat Sheet MA/PDP/MAPD

Team Members. When viewing this job aid electronically, click within the Contents to advance to desired page. Introduction... 2

A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis

CP SC 8810 Data Visualization. Joshua Levine

MERGING DATAFRAMES WITH PANDAS. Appending & concatenating Series

Distracted Driving- A Review of Relevant Research and Latest Findings

Marks. Marks can be classified according to the number of dimensions required for their representation: Zero: points. One: lines.

NSA s Centers of Academic Excellence in Cyber Security

State IT in Tough Times: Strategies and Trends for Cost Control and Efficiency

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

Global Forum 2007 Venice

Silicosis Prevalence Among Medicare Beneficiaries,

Visualization Analysis & Design Full-Day Tutorial Session 1

We will start at 2:05 pm! Thanks for coming early!

Name: Business Name: Business Address: Street Address. Business Address: City ST Zip Code. Home Address: Street Address

Presented on July 24, 2018

cs6630 September VISUAL ENCODING Miriah Meyer University of Utah

Visual Encoding Design

IT Modernization in State Government Drivers, Challenges and Successes. Bo Reese State Chief Information Officer, Oklahoma NASCIO President

Visual Encoding Design

Data Visualization Principles for Scientific Communication

Tina Ladabouche. GenCyber Program Manager

Moonv6 Update NANOG 34

Post Graduation Survey Results 2015 College of Engineering Information Networking Institute INFORMATION NETWORKING Master of Science

DSC 201: Data Analysis & Visualization

2015 DISTRACTED DRIVING ENFORCEMENT APRIL 10-15, 2015

cs6964 February TABULAR DATA Miriah Meyer University of Utah

Charter EZPort User Guide

Visualization Analysis & Design

Department of Business and Information Technology College of Applied Science and Technology The University of Akron

CSE 781 Data Base Management Systems, Summer 09 ORACLE PROJECT

DTFH61-13-C Addressing Challenges for Automation in Highway Construction

HYPERVARIATE DATA VISUALIZATION

Best Practices in Rapid Deployment of PI Infrastructure and Integration with OEM Supplied SCADA Systems

Touch Input. CSE 510 Christian Holz Microsoft Research February 11, 2016

State HIE Strategic and Operational Plan Emerging Models. February 16, 2011

Geographic Accuracy of Cell Phone RDD Sample Selected by Area Code versus Wire Center

AASHTO s National Transportation Product Evaluation Program

Homework Assignment #5

Week 4: Facet. Tamara Munzner Department of Computer Science University of British Columbia

The Outlook for U.S. Manufacturing

SAS Visual Analytics 8.2: Working with Report Content

Week 6: Networks, Stories, Vis in the Newsroom

Prizm. manufactured by. White s Electronics, Inc Pleasant Valley Road Sweet Home, OR USA. Visit our site on the World Wide Web

ACCESS PROCESS FOR CENTRAL OFFICE ACCESS

Amy Schick NHTSA, Occupant Protection Division April 7, 2011

Visual Representation from Semiology of Graphics by J. Bertin

Presentation Outline. Effective Survey Sampling of Rare Subgroups Probability-Based Sampling Using Split-Frames with Listed Households

Sideseadmed (IRT0040) loeng 4/2012. Avo

Data Visualization Principles for Dashboard Design

PulseNet Updates: Transitioning to WGS for Reference Testing and Surveillance

3 Visualizing quantitative Information

What Did You Learn? Key Terms. Key Concepts. 68 Chapter P Prerequisites

A Capabilities Presentation

Visualization Analysis & Design Full-Day Tutorial Session 2

Multivariate Data More Overview

SAS Visual Analytics 8.2: Getting Started with Reports

InfoVis: a semiotic perspective

Data Visualization (CIS/DSC 468)

Contact Center Compliance Webinar Bringing you the ANSWERS you need about compliance in your call center.

CP SC 8810 Data Visualization. Joshua Levine

CIE L*a*b* color model

Lecture 5: DATA MAPPING & VISUALIZATION. November 3 rd, Presented by: Anum Masood (TA)

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

Perception Maneesh Agrawala CS : Visualization Fall 2013 Multidimensional Visualization

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Expanding Transmission Capacity: Options and Implications. What role does renewable energy play in driving transmission expansion?

Chap 12: Facet Into Multiple Views Paper: Multiform Matrices and Small Multiples

MIS2502: Review for Exam 2. Jing Gong

Visual Computing. Lecture 2 Visualization, Data, and Process

Multiple Dimensional Visualization

MOVR. Mobile Overview Report January March The first step in a great mobile experience

8. MINITAB COMMANDS WEEK-BY-WEEK

Steve Stark Sales Executive Newcastle

Strengthening connections today, while building for tomorrow. Wireless broadband, small cells and 5G

Chapter 2 - Graphical Summaries of Data

Making Science Graphs and Interpreting Data

Organisation and Presentation of Data in Medical Research Dr K Saji.MD(Hom)

cs6964 March TREES & GRAPHS Miriah Meyer University of Utah

Transcription:

Data Visualization (DSC 530/CIS 602-02) Tabular Data Dr. David Koop

Visual Encoding How should we visualize this data? Name Region Population Life Expectancy Income China East Asia & Pacific 1335029250 73.28 7226.07 India South Asia 1140340245 64.01 2731 United States America 306509345 79.43 41256.08 Indonesia East Asia & Pacific 228721000 71.17 3818.08 Brazil America 193806549 72.68 9569.78 Pakistan South Asia 176191165 66.84 2603 Bangladesh South Asia 156645463 66.56 1492 Nigeria Sub-Saharan Africa 141535316 48.17 2158.98 Japan East Asia & Pacific 127383472 82.98 29680.68 Mexico America 111209909 76.47 11250.37 Philippines East Asia & Pacific 94285619 72.1 3203.97 Vietnam East Asia & Pacific 86970762 74.7 2679.34 Germany Europe & Central Asia 82338100 80.08 31191.15 Ethiopia Sub-Saharan Africa 79996293 55.69 812.16 Turkey Europe & Central Asia 72626967 72.06 8040.78 2

Potential Solution [Gapminder, Wealth & Health of Nations] 3

Visual Encoding How do we encode data visually? - Marks are the basic graphical elements in a visualization - Channels are ways to control the appearance of the marks Marks classified by dimensionality: Points Lines Areas Also can have surfaces, volumes Think of marks as a mathematical definition, or if familiar with tools like Adobe Illustrator or Inkscape, the path & point definitions 4

Visual Channels Position Color Horizontal Vertical Both Shape Tilt Size Length Area Volume [Munzner (ill. Maguire), 2014] 5

Channel Types Identity => what or where, Magnitude => how much Magnitude Channels: Ordered Attributes Identity Channels: Categorical Attributes Position on common scale Position on unaligned scale Length (1D size) Tilt/angle Spatial region Color hue Motion Shape Area (2D size) Depth (3D position) Color luminance Color saturation Curvature Volume (3D size) [Munzner (ill. Maguire), 2014] 6

Mark Types Can have marks for items and links - Connection => pairwise relationship - Containment => hierarchical relationship Marks as Items/Nodes Points Lines Areas Marks as Links Containment Connection [Munzner (ill. Maguire), 2014] 7

Expressiveness and Effectiveness Expressiveness Principle: all data from the dataset and nothing more should be shown - Do encode ordered data in an ordered fashion - Don t encode categorical data in a way that implies an ordering Effectiveness Principle: the most important attributes should be the most salient - Saliency: how noticeable something is - How do the channels we have discussed measure up? - How was this determined? 8

Studying Visualization Perception Cleveland & McGill s Results Positions 1.0 1.5 2.0 2.5 3.0 Log Error Crowdsourced Results Angles Circular areas Rectangular areas (aligned or in a treemap) 1.0 1.5 2.0 2.5 3.0 Log Error [Munzner (ill. Maguire) based on Heer & Bostock, 2014] 9

Ranking Channels by Effectiveness Channels: Expressiveness Types and Effectiveness Ranks Magnitude Channels: Ordered Attributes Position on common scale Position on unaligned scale Length (1D size) Tilt/angle Identity Channels: Categorical Attributes Spatial region Color hue Motion Shape Area (2D size) Depth (3D position) Color luminance Color saturation Curvature Volume (3D size) [Munzner (ill. Maguire), 2014] 10

Assignment 2 http://www.cis.umassd.edu/~dkoop/ dsc530-2017sp/assignment2.html Due next Monday Use D3 1. Repeat Part 3b of A1 using D3 2. Extend Part 1 to create a stacked bar chart 3. Create a line chart that shows a region's numbers that is linked to a dropdown menu allowing you to select the region. Use transitions! 11

Discriminability What is problematic here? File vtkdatasetreader PythonSource vtkimageclip vtkimagedatageometryfilter vtkimageresample vtkimagereslice vtkwarpscalar vtkcolortransferfunction vtklookuptable vtkelevationfilter vtkoutlinefilter PythonSource vtkpolydatanormals vtkpolydatamapper vtkproperty vtkdatasetmapper vtkscalarbaractor vtkactor vtkcamera vtkactor vtklodactor vtkcubeaxesactor2d vtkrenderer VTKCell [Koop et al., 2013] 12

Separability Cannot treat all channels as independent! Separable means each individual channel can be distinguished Integral means the channels are perceived together Position Hue (Color) Size Hue (Color) Width Height Red Green Fully separable Some interference Some/significant interference Major interference [Munzner (ill. Maguire) based on Ware, 2014] 13

Integral Channels [http://magazine.good.is/infographics/america-s-richest-counties-and-best-educated-counties] 14

Visual Popout [C. G. Healey, http://www.csc.ncsu.edu/faculty/healey/pp/] 15

Visual Popout: Parallel Lines Require Search [Munzner (ill. Maguire), 2014] 16

Relative vs. Absolute Judgments Weber s Law: - We judge based on relative not absolute differences - The amount of perceived difference depends is relative to the object s magnitude! A B A B A B Unframed Unaligned Framed Unaligned Unframed Aligned [Munzner (ill. Maguire), 2014] 17

Luminance Perception [E. H. Adelson, 1995] 18

Luminance Perception [E. H. Adelson, 1995] 19

Tables 20

Visualization of Tables Items and attributes For now, attributes are not known to be positions Keys and values - key is an independent attribute that is unique and identifies item - value tells some aspect of an item Keys: categorical/ordinal Values: +quantitative Levels: unique values of categorical or ordered attributes Items (rows) Attributes (columns) Cell containing value Multidimensional Table Value in cell [Munzner (ill. Maguire), 2014] 21

Arrange Tables Arrange Tables Express Values Separate, Order, Align Regions Separate Order Align 1 Key 2 Keys 3 Keys Many Keys List Matrix Volume Recursive Subdivision Axis Orientation Rectilinear Parallel Radial Layout Density Dense Space-Filling [Munzner (ill. Maguire), 2014] 22

Express Values: Scatterplots Prices in 1980 450 400 350 300 250 200 150 100 50 0 Fish Prices over the Years 0 50 100 150 200 250 300 350 400 450 Prices in 1970 Data: two quantitative values Task: find trends, clusters, outliers How: marks at spatial position in horizontal and vertical directions Correlation: dependence between two attributes - Positive and negative correlation - Indicated by lines Coordinate system (axes) and labels are important! 23

0.000 0.002 0.004 0.006 0 10000 20000 30000 40000 n dist Coordinate Systems 24 [Wickham, 2014]

0.001 0.0001 0.00001 100 1000 10000 n dist Log-Log Plot 25 [Wickham, 2014]

Bubble Plot [Gapminder, Wealth & Health of Nations] 26

Scatterplot Data: two quantitative values Task: find trends, clusters, outliers How: marks at spatial position in horizontal and vertical directions Scalability: hundreds of items Cool recent result from Harrison et al., "Ranking Visualizations of Correlation Using Weber s Law", 2014: - Correlation perception can be modeled via Weber s Law - Scatterplots are one of the best visualizations for both positive and negative correlation - Further analysis: M. Kay and J. Heer, "Beyond Weber's Law", 2015 27

Separate, Order, and Align: Categorical Regions Categorical: =,!= Spatial position can be used for categorical attributes Use regions, distinct contiguous bounded areas, to encode categorical attributes Three operations on the regions: - Separate (use categorical attribute) - Align - Order (use some other ordered attribute) Alignment and order can use same or different attribute 28

List Alignment: Bar Charts Data: one quantitative attribute, one categorical attribute Task: lookup & compare values How: line marks, vertical position (quantitative), horizontal position (categorical) What about length? Ordering criteria: alphabetical or using quantitative attribute Scalability: distinguishability - bars at least one pixel wide - hundreds 100 75 50 25 0 100 75 50 25 0 Animal Type Animal Type [Munzner (ill. Maguire), 2014] 29

Stacked Bar Charts 35M 30M 25M Population 65 Years and Over 45 to 64 Years 25 to 44 Years 18 to 24 Years 14 to 17 Years 5 to 13 Years Under 5 Years 20M 15M 10M 5.0M 0.0 CA TX NY FL IL PA OH MI GA NC NJ VA WA AZ MA IN TN MOMD WI MN CO AL SC LA KY OR OK CT IA MS AR KS UT NV NMWV NE ID ME NH HI RI MT DE SD AK ND VT DC WY [Bostock, 2012] 30

Grouped Bar Chart 10M 9.0M 8.0M 7.0M Population 65 Years and Over 45 to 64 Years 25 to 44 Years 18 to 24 Years 14 to 17 Years 5 to 13 Years Under 5 Years 6.0M 5.0M 4.0M 3.0M 2.0M 1.0M 0.0 CA TX NY FL IL PA [M. Bostock, http://bl.ocks.org/mbostock/3887051] 31

Stacked Bar Charts Data: multidimensional table: one quantitative, two categorical Task: lookup values, part-to-whole relationship, trends How: line marks: position (both horizontal & vertical), subcomponent line marks: length, color Scalability: main axis (hundreds like bar chart), bar classes (<12) Orientation: vertical or horizontal (swap how horizontal and vertical position are used. 32

Streamgraphs Include a time attribute Data: multidimensional table, one quantitative attribute (count), one ordered key attribute (time), one categorical key attribute + derived attribute: layer ordering (quantitative) Task: analyze trends in time, find (maxmial) outliers How: derived position+geometry, length, color Scalability: more categories than stacked bar charts fig 2 films from the summer of 2007 [Byron and Wattenberg, 2012] 33

NYTimes Ebb and Flow of Movies" [http://www.nytimes.com/interactive/2008/02/23/movies/20080223_revenue_graphic.html] 34

Dot and Line Charts 20 15 10 5 0 20 15 10 5 0 Year Data: one quantitative attribute, one ordered attribute Task: lookup values, find outliers and trends How: point mark and positions Line Charts: add connection mark (line) Similar to scatterplots but allow ordered attribute Year [Munzner (ill. Maguire), 2014] 35

Proper Use of Line and Bar Charts 60 60 50 50 40 40 30 30 20 20 10 10 0 Female Male 0 Female Male 60 50 40 30 20 10 0 10-year-olds 12-year-olds 60 50 40 30 20 10 0 10-year-olds 12-year-olds [Zacks and Tversky, 1999, Munzner (ill. Maguire), 2014] 36

Proper Use of Line and Bar Charts 60 60 50 50 40 40 30 30 20 20 10 10 0 Female Male 0 Female Male [Zacks and Tversky, 1999, Munzner (ill. Maguire), 2014] 37

Aspect Ratio Trends in line charts are more apparent because we are using angle as a channel Perception of angle (and the relative difference between angles) is important Initial experiments found people best judge differences in slope when angles are around 45 degrees (Cleveland et al., 1988, 1993) 38

Multiscale Banking 706 IEEE TRANSACTIONS ON VISUALIZATIO Sunspot Cycles Aspect Ratio = 3.96 Aspect Ratio = 22.35 Power Spectrum [Heer and Agrawala, 2006] 39

Multiscale Banking ON AND COMPUTER GRAPHICS, VOL. 12, NO. 5, SEPTEMBER/OCTOBER 2006 PRMTX Mutual Fund Aspect Ratio = 4.23 Aspect Ratio = 14.55 Power Spectrum [Heer and Agrawala, 2006] 40

Expanding the Study Cleveland et al. did not study the entire space of slope comparisons and 45 degrees was at the low end of their study (blue marks on right) Talbot et al. compared more slopes and found that people do better with smaller slopes Baselines may aid with this Slope ratio (pij) 87% 65% 48% 35% 25% 17% 11% 9 18 30 45 60 72 81 Mid angle (θ m ) [Talbot et al., 2013] 41

Heatmaps Data: Two keys, one quantitative attribute Task: Find clusters, outliers, summarize How: area marks in grid, color encoding of quantitative attribute Scalability: number of pixels for area marks (millions), <12 colors Red-green color scales often used - Be aware of colorblindness! Fast-Pitch Softball Slugging Percentage [fastpitchanalytics.com] 42

Bertin Matrices Must we only use color? - What other marks might be appropriate? [C.Perrin et al., 2014] 43

Bertin Matrices Must we only use color? - What other marks might be appropriate? [C.Perrin et al., 2014] 43

Bertin s Encodings Text 0 1 2 3 4 5 6 7 8 9 10 11 text Grayscale QUANTITY OF INK ENCODINGS POSITIONAL ENCODING MEAN-BASED ENCODINGS Circle Dual bar chart Bar chart Line Black and white bar chart Average bar chart [C.Perrin et al., 2014] 44

Matrix Reordering [Bertin Exhibit (INRIA, Vis 2014), Photo by Robert Kosara] 45

Cluster Heatmap [File System Similarity, R. Musăloiu-E., 2009] 46

Cluster Heatmap Data & Task: Same as Heatmap How: Area marks but matrix is ordered by cluster hierarchies Scalability: limited by the cluster dendrogram Dendrogram: a visual encoding of tree data with leaves aligned 47

Sepal.Length 2.0 3.0 4.0 0.5 1.5 2.5 4.5 5.5 6.5 7.5 2.0 3.0 4.0 Sepal.Width Petal.Length 1 2 3 4 5 6 7 4.5 5.5 6.5 7.5 0.5 1.5 2.5 1 2 3 4 5 6 7 Petal.Width Iris Data (red=setosa,green=versicolor,blue=virginica) Scatterplot Matrix (SPLOM) Data: Many quantitative attributes Derived Data: names of attributes Task: Find correlations, trends, outliers How: Scatterplots in matrix alignment Scale: attributes: ~12, items: hundreds? Visualizations in a visualization: at high level, marks are themselves visualizations 48 [R command from Wikipedia]

Spatial Axis Orientation So far, we have seen the vertical and horizontal axes (a rectilinear layout) used to encode almost everything What other possibilities are there for axes? [Munzner (ill. Maguire), 2014] 49

Spatial Axis Orientation So far, we have seen the vertical and horizontal axes (a rectilinear layout) used to encode almost everything What other possibilities are there for axes? - Parallel axes Math Physics Dance Drama 100 90 80 70 60 50 40 30 20 10 0 Parallel Coordinates [Munzner (ill. Maguire), 2014] 49

Spatial Axis Orientation So far, we have seen the vertical and horizontal axes (a rectilinear layout) used to encode almost everything What other possibilities are there for axes? - Parallel axes - Radial axes Math Physics Dance Drama 100 90 80 70 60 50 40 30 20 10 0 Parallel Coordinates [Munzner (ill. Maguire), 2014] 49

Radial Axes 120 90 60 150 30 180 0 210 330 240 270 300 50

Radial Axes Polar Coordinates (angle + position along the line at that angle) 150 120 90 60 30 What types of encodings are possible for tabular data in polar coordinates? 180 0 210 330 240 270 300 50

Radial Axes Polar Coordinates (angle + position along the line at that angle) 150 120 90 60 30 What types of encodings are possible for tabular data in polar coordinates? - Radial bar charts 180 0 - Pie charts - Donut charts 210 330 240 270 300 50

Part-of-whole: Relative % comparison? 40M 35M 30M Population 65 Years and Over 45 to 64 Years 25 to 44 Years 18 to 24 Years 14 to 17 Years 5 to 13 Years Under 5 Years 25M 20M 15M 10M 5M 0M CA TX NY FL IL PA OH MI GA NC NJ VA WA AZ MA IN TN MOMD WI MN CO AL SC LA KY OR OK CT IA MS AR KS UT NV NMWV NE ID ME NH HI RI MT DE SD AK ND VT DC WY [Stacked Bar Chart, Bostock, 2017] 51

Normalized Stacked Bar Chart 100% 90% 65 Years and Over 80% 70% 45 to 64 Years 60% 50% 40% 25 to 44 Years 30% 18 to 24 Years 20% 14 to 17 Years 10% 5 to 13 Years 0% UT TX ID AZ NV GA AK MSNM NE CA OK SD CO KS WYNC AR LA IN IL MN DE HI SCMO VA IA TN KY AL WAMDND OH WI OR NJ MT MI FL NY DC CT PA MAWV RI NH ME VT Under 5 Years [Normalized Stacked Bar Chart, Bostock, 2017] 52

Pie Chart 65 <5 45-64 5-13 14-17 18-24 25-44 [Pie Chart, Bostock, 2017] 53

Pie Charts vs. bar charts [Munzner's Textbook, 2014] - Angle channel is lower precision then position in bar charts What about donut charts? Are we judging angle, or are we judging area, or arc length? - "Arcs, Angles, or Areas: Individual Data Encodings in Pie and Donut Charts", D. Skau and R. Kosara, 2016 - "Judgment Error in Pie Chart Variations", R. Kosara and D. Skau, 2016 - Summary: "An Illustrated Study of the Pie Chart Study Results" 54

Study Setup Three studies 80-100 participants each Each answered ~60 questions Computed results using 95% Confidence Intervals 95% CI Mean [R. Kosara and D. Skau, 2016] 55

Arcs, Angles, or Areas? [R. Kosara and D. Skau, 2016] 56

Signed Error [R. Kosara and D. Skau, 2016] 57

Absolute Error [R. Kosara and D. Skau, 2016] 58

Absolute Error Relative to Pie Chart [R. Kosara and D. Skau, 2016] 59

Donut Charts Width [R. Kosara and D. Skau, 2016] 60

Pie Chart Variations [R. Kosara and D. Skau, 2016] 61

Pie Chart Variations [R. Kosara and D. Skau, 2016] 62

Conclusion: We do not read pie charts by angle [R. Kosara and D. Skau, 2016] 63

Pies vs. Bars but area is still harder to judge than position Screens are usually not round 64