DSC 201: Data Analysis & Visualization Visualization Design Dr. David Koop
Definition Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively. T. Munzner 2
Attribute Types: Categorical; Ordered (Ordinal, Quantitative). Ordering Direction: Sequential, Diverging, Cyclic [Munzner (ill. Maguire), 2014] 3
Categorical, Ordinal, and Quantitative [Figure: example attributes labeled 1 = Quantitative, 2 = Nominal (categorical), 3 = Ordinal] 4
Tableau High-level GUI that connects to data, helps organize it, and provides intuitive routines for visualizing it plus customization Lots of possibilities Great for exploration 6
matplotlib %matplotlib inline (show plots in the notebook!) Always create a figure first, then draw plots Lots of high-level plotting types (line plots, scatterplots, histograms) Lots of customizability 7
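A minimal sketch of the figure-first workflow described above. The data values are made up for illustration, and in a notebook the `%matplotlib inline` line would come first so the figure renders automatically.

```python
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Create the figure (and its axes) first ...
fig, ax = plt.subplots(figsize=(4, 3))

# ... then draw plots on it
ax.plot(x, y, marker="o", label="y = x^2")

# Customization: axis labels, title, legend
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot")
ax.legend()
```

Swapping `ax.plot` for `ax.scatter` or `ax.hist` gives the other high-level plot types listed on the slide.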
matplotlib shortcuts in pandas Connect directly to pandas data frames - df.plot http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.plot.html 8
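A short sketch of the `df.plot` shortcut. The data frame and its column names (`year`, `sales`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data frame, for illustration only
df = pd.DataFrame({"year": [2010, 2011, 2012, 2013],
                   "sales": [10, 12, 9, 15]})

# df.plot draws with matplotlib and returns the Axes it drew on,
# so the usual matplotlib customization still applies afterwards
ax = df.plot(x="year", y="sales", kind="line", title="Sales by year")
ax.set_ylabel("sales")
```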
What ingredients go into visualizations? 9
Visual Encoding How do we encode data visually? - Marks are the basic graphical elements in a visualization - Channels are ways to control the appearance of the marks Marks classified by dimensionality: Points (0D), Lines (1D), Areas (2D); surfaces and volumes are also possible. Think of marks as a mathematical definition or, if you are familiar with tools like Adobe Illustrator or Inkscape, as the path & point definitions 10
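As a rough illustration of the three mark types, the sketch below draws the same made-up values as point, line, and area marks. Matching each matplotlib call to a mark type is an interpretation for teaching purposes, not a definition from the slides.

```python
import matplotlib.pyplot as plt

# Hypothetical values, for illustration only
x = [1, 2, 3, 4]
y = [2, 3, 1, 4]

fig, (ax_pt, ax_ln, ax_ar) = plt.subplots(1, 3, figsize=(9, 3))

ax_pt.scatter(x, y)        # point marks: location only
ax_pt.set_title("Points")

ax_ln.plot(x, y)           # line mark: connects the locations
ax_ln.set_title("Lines")

ax_ar.fill_between(x, y)   # area mark: region between the curve and zero
ax_ar.set_title("Areas")
```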
Channels: Visual Appearance How should we encode this data?
Name            Region                   Population    Life Expectancy   Income
China           East Asia & Pacific      1335029250    73.28             7226.07
India           South Asia               1140340245    64.01             2731
United States   America                  306509345     79.43             41256.08
Indonesia       East Asia & Pacific      228721000     71.17             3818.08
Brazil          America                  193806549     72.68             9569.78
Pakistan        South Asia               176191165     66.84             2603
Bangladesh      South Asia               156645463     66.56             1492
Nigeria         Sub-Saharan Africa       141535316     48.17             2158.98
Japan           East Asia & Pacific      127383472     82.98             29680.68
Mexico          America                  111209909     76.47             11250.37
Philippines     East Asia & Pacific      94285619      72.1              3203.97
Vietnam         East Asia & Pacific      86970762      74.7              2679.34
Germany         Europe & Central Asia    82338100      80.08             31191.15
Ethiopia        Sub-Saharan Africa       79996293      55.69             812.16
Turkey          Europe & Central Asia    72626967      72.06             8040.78
11
Potential Solution [Gapminder, Wealth & Health of Nations] 12
Another Solution [Gapminder, Wealth & Health of Nations] 13
Visual Channels: Position (Horizontal, Vertical, Both), Color, Shape, Tilt, Size (Length, Area, Volume) [Munzner (ill. Maguire), 2014] 14
Channels Usually map an attribute to a single channel - Could use multiple channels, but there is a limited number of channels Restrictions on size and shape - Points are nothing but location, so size and shape are ok - Lines have a length, so cannot easily encode an attribute as length - Maps with boundaries have area, so changing size can be problematic 15
Cartograms [Election Results by Population, M. Newman, 2012] 16
Channel Types Identity => what or where, Magnitude => how much. Magnitude Channels (Ordered Attributes): Position on common scale, Position on unaligned scale, Length (1D size), Tilt/angle, Area (2D size), Depth (3D position), Color luminance, Color saturation, Curvature, Volume (3D size). Identity Channels (Categorical Attributes): Spatial region, Color hue, Motion, Shape [Munzner (ill. Maguire), 2014] 17
Mark Types Can have marks for items and links - Connection => pairwise relationship - Containment => hierarchical relationship Marks as Items/Nodes: Points, Lines, Areas. Marks as Links: Containment, Connection [Munzner (ill. Maguire), 2014] 18
Expressiveness and Effectiveness Expressiveness Principle: all data from the dataset and nothing more should be shown - Do encode ordered data in an ordered fashion - Don't encode categorical data in a way that implies an ordering Effectiveness Principle: the most important attributes should be the most salient - Saliency: how noticeable something is - How do the channels we have discussed measure up? - How was this determined? 19
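One place the expressiveness principle shows up in practice is colormap choice. The sketch below pairs each attribute type with a built-in matplotlib colormap. The specific pairing and the helper name `colormap_for` are assumptions made for illustration, not recommendations from the slides.

```python
import matplotlib.pyplot as plt

# Assumed pairing of attribute type to a built-in matplotlib colormap
CMAP_FOR_TYPE = {
    "categorical": "tab10",   # qualitative: distinct hues, no implied order
    "sequential": "viridis",  # ordered, low to high
    "diverging": "RdBu",      # ordered around a meaningful midpoint
    "cyclic": "twilight",     # ordered, wraps around
}


def colormap_for(attribute_type):
    """Return a colormap that does not imply more (or less) order
    than the attribute type actually has. Hypothetical helper."""
    return plt.get_cmap(CMAP_FOR_TYPE[attribute_type])


region_cmap = colormap_for("categorical")   # e.g. for Region
income_cmap = colormap_for("sequential")    # e.g. for Income
```

Using `viridis` for a categorical attribute such as Region would suggest an ordering that the data does not have, which is the kind of mismatch the principle warns against.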
How do we test effectiveness? 20
Test % difference in length between elements [Figure: bars A and B on a 0 to 100 scale] [Heer & Bostock, 2010] 21
Test % difference in length between elements [Figure: bars A and B on a 0 to 100 scale] Answer: Left is ~5.6x longer than Right [Heer & Bostock, 2010] 24
Test % difference in length between elements [Figure: bars A and B on a 0 to 100 scale] [Modified from Heer & Bostock, 2010] 25
Test % difference in length between elements [Figure: bars A and B on a 0 to 100 scale] Answer: Right is 4x larger than Left [Modified from Heer & Bostock, 2010] 26
Test % difference in area between elements A B [Heer & Bostock, 2010] 27
Test % difference in area between elements Answer: A is ~2.25x larger (in area) than B A B [Heer & Bostock, 2010] 28
Test % difference in area between elements A B [Heer & Bostock, 2010] 29
Test % difference in area between elements Answer: A is ~6.1x larger (in area) than B A B [Heer & Bostock, 2010] 30
Test % difference in area between elements B A [Heer & Bostock, 2010] 31
Test % difference in area between elements Answer: B is ~2.5x larger (in area) than A B A [Heer & Bostock, 2010] 32
Results Summary: Cleveland & McGill's Results and Crowdsourced Results [Figure: log error, on an axis from 1.0 to 3.0, for Positions, Angles, Circular areas, and Rectangular areas (aligned or in a treemap)] [Munzner (ill. Maguire) based on Heer & Bostock, 2014] 33
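The "Log Error" axis can be made concrete with a small calculation. Cleveland & McGill score each judgment as log2(|judged percent - true percent| + 1/8), and Heer & Bostock reuse that measure. The sketch below applies it to the first length test, with a made-up participant guess; the function and variable names are hypothetical.

```python
import math


def log_error(judged_percent, true_percent):
    """Cleveland & McGill's accuracy measure: log2(|judged - true| + 1/8).
    The 1/8 keeps the logarithm defined when the judgment is exact."""
    return math.log2(abs(judged_percent - true_percent) + 1 / 8)


# In the first length test the longer bar is ~5.6x the shorter one,
# so the shorter bar is about 100 / 5.6 percent of the longer bar.
true_percent = 100 / 5.6

# Hypothetical participant guess, for illustration only
guess = 25.0
error = log_error(guess, true_percent)
```

A perfect judgment scores log2(1/8) = -3, and larger values mean less accurate judgments.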
Ranking Channels by Effectiveness Channels: Expressiveness Types and Effectiveness Ranks (each list runs from most to least effective). Magnitude Channels (Ordered Attributes): Position on common scale, Position on unaligned scale, Length (1D size), Tilt/angle, Area (2D size), Depth (3D position), Color luminance, Color saturation, Curvature, Volume (3D size). Identity Channels (Categorical Attributes): Spatial region, Color hue, Motion, Shape [Munzner (ill. Maguire), 2014] 34
Visualizing Data Tables Arrange Tables: Express Values; Separate, Order, Align Regions (Separate, Order, Align); 1 Key: List, 2 Keys: Matrix, 3 Keys: Volume, Many Keys: Recursive Subdivision; Axis Orientation: Rectilinear, Parallel, Radial; Layout Density: Dense, Space-Filling [Munzner (ill. Maguire), 2014] 35
Express Values: Scatterplots [Figure: "Fish Prices over the Years", x-axis Prices in 1970, y-axis Prices in 1980, both 0 to 450] Data: two quantitative values Task: find trends, clusters, outliers How: marks at spatial position in horizontal and vertical directions Correlation: dependence between two attributes - Positive and negative correlation - Indicated by lines Coordinate system (axes) and labels are important! 36
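A sketch of a scatterplot with a trend line and a correlation coefficient. The fish price data from the figure is not available in the text, so the values below are synthetic stand-ins generated with a fixed seed; only the axis names come from the slide.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (NOT the fish prices from the slide)
rng = np.random.default_rng(0)
price_1970 = rng.uniform(0, 450, size=30)
price_1980 = 0.9 * price_1970 + rng.normal(0, 30, size=30)

# Pearson correlation coefficient between the two attributes
r = np.corrcoef(price_1970, price_1980)[0, 1]

fig, ax = plt.subplots()
ax.scatter(price_1970, price_1980)   # point marks at (x, y) positions

# Least-squares line to indicate the direction of the correlation
slope, intercept = np.polyfit(price_1970, price_1980, 1)
xs = np.array([0, 450])
ax.plot(xs, slope * xs + intercept)

# Axes and labels matter: without them the positions are meaningless
ax.set_xlabel("Prices in 1970")
ax.set_ylabel("Prices in 1980")
ax.set_title(f"Synthetic prices (r = {r:.2f})")
```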
Coordinate Systems [Figure: plot of n and dist on linear axes] [Wickham, 2014] 37
Log-Log Plot [Figure: plot of n and dist with both axes logarithmic] [Wickham, 2014] 38
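To see why the choice of coordinate system matters, the sketch below plots the same data on linear and on log-log axes. The data is a synthetic power law chosen for illustration (an assumption, not Wickham's data).

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic power-law data (an assumption): y = 5 * x^-1.5
x = np.logspace(2, 4, 50)   # 100 ... 10000
y = 5.0 * x ** -1.5

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

ax_lin.plot(x, y)
ax_lin.set_title("Linear axes")

ax_log.plot(x, y)
ax_log.set_xscale("log")
ax_log.set_yscale("log")
ax_log.set_title("Log-log axes")

# On log-log axes a power law is a straight line whose slope
# equals the exponent
slope = np.polyfit(np.log10(x), np.log10(y), 1)[0]
```

On the linear axes most of the curve is squeezed against the axes, while the log-log version spreads it into a straight line.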
Bubble Plot [Gapminder, Wealth & Health of Nations] 39
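A bubble plot in the spirit of the Gapminder figure can be sketched with the first five rows of the country table shown earlier. The channel choices (income on a log-scaled x-axis, life expectancy on y, population as bubble area, region as color hue) and the scaling constant for bubble size are assumptions made for this example.

```python
import matplotlib.pyplot as plt

# (name, region, population, life expectancy, income):
# first five rows of the country table from the earlier slide
rows = [
    ("China", "East Asia & Pacific", 1335029250, 73.28, 7226.07),
    ("India", "South Asia", 1140340245, 64.01, 2731),
    ("United States", "America", 306509345, 79.43, 41256.08),
    ("Indonesia", "East Asia & Pacific", 228721000, 71.17, 3818.08),
    ("Brazil", "America", 193806549, 72.68, 9569.78),
]

# Identity channel: one color hue per region
regions = sorted({row[1] for row in rows})
colors = {region: f"C{i}" for i, region in enumerate(regions)}

fig, ax = plt.subplots()
for name, region, population, life, income in rows:
    # s is the marker AREA in points^2, so scaling it linearly makes
    # area (not radius) proportional to population
    ax.scatter(income, life, s=population / 2e6,
               color=colors[region], alpha=0.6)
    ax.annotate(name, (income, life))

ax.set_xscale("log")   # assumed: incomes span more than one order of magnitude
ax.set_xlabel("Income")
ax.set_ylabel("Life Expectancy")
```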
Separate, Order, and Align: Categorical Regions Categorical: =,!= Spatial position can be used for categorical attributes Use regions, distinct contiguous bounded areas, to encode categorical attributes Three operations on the regions: - Separate (use categorical attribute) - Align - Order (use some other ordered attribute) Alignment and order can use same or different attribute 40
List Alignment: Bar Charts Data: one quantitative attribute, one categorical attribute Task: lookup & compare values How: line marks, vertical position (quantitative), horizontal position (categorical) What about length? Ordering criteria: alphabetical or using quantitative attribute Scalability: distinguishability - bars at least one pixel wide - hundreds [Figure: two bar charts over Animal Type, y-axis 0 to 100] [Munzner (ill. Maguire), 2014] 41
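The two ordering criteria can be compared side by side. The animal counts below are hypothetical values for illustration; only the "Animal Type" axis name comes from the slide.

```python
import matplotlib.pyplot as plt

# Hypothetical counts per animal type, for illustration only
counts = {"cat": 60, "bird": 25, "dog": 90, "fish": 40}

# Ordering criteria: alphabetical, or by the quantitative attribute
alphabetical = sorted(counts.items())
by_value = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, items, title in zip(axes, (alphabetical, by_value),
                            ("Alphabetical", "By value")):
    labels, values = zip(*items)
    ax.bar(labels, values)   # categorical on x, quantitative on y
    ax.set_xlabel("Animal Type")
    ax.set_title(title)
```

Alphabetical order helps with looking up a known category, while ordering by value makes comparison and ranking easier.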
Stacked Bar Charts [Figure: Population per U.S. state (x-axis CA through WY, y-axis 0 to 35M), stacked by age group: 65 Years and Over, 45 to 64 Years, 25 to 44 Years, 18 to 24 Years, 14 to 17 Years, 5 to 13 Years, Under 5 Years] [Bostock, 2012] 42
Grouped Bar Chart [Figure: Population for CA, TX, NY, FL, IL, PA (y-axis 0 to 10M), grouped by age group: 65 Years and Over, 45 to 64 Years, 25 to 44 Years, 18 to 24 Years, 14 to 17 Years, 5 to 13 Years, Under 5 Years] [M. Bostock, http://bl.ocks.org/mbostock/3887051] 43
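Stacked and grouped bar charts differ only in how the bars for one key are arranged, which pandas exposes through the `stacked` argument of `df.plot`. The populations below are hypothetical round numbers in millions with simplified age groups, not the census figures behind the slides.

```python
import pandas as pd

# Hypothetical populations in millions (NOT the data from the slides)
df = pd.DataFrame(
    {
        "Under 18": [9.0, 6.5, 4.5],
        "18 to 64": [24.0, 15.5, 12.0],
        "65 and Over": [4.0, 2.5, 2.6],
    },
    index=["CA", "TX", "NY"],
)

# Stacked: totals per state are easy to compare, but only the bottom
# age group sits on a common baseline
ax_stacked = df.plot(kind="bar", stacked=True, title="Stacked")

# Grouped: every bar starts at zero, so individual age groups are
# easy to compare, but totals are not shown directly
ax_grouped = df.plot(kind="bar", title="Grouped")
```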