DSC 201: Data Analysis & Visualization

Similar documents
Data Visualization (CIS/DSC 468)

CIS 467/602-01: Data Visualization

Data Visualization (DSC 530/CIS )

Data Visualization (CIS/DSC 468)

Data Visualization (CIS 468)

DSC 201: Data Analysis & Visualization

Fall 2007, Final Exam, Data Structures and Algorithms

Figure 1 Map of US Coast Guard Districts... 2 Figure 2 CGD Zip File Size... 3 Figure 3 NOAA Zip File Size By State...

CS-5630 / CS-6630 Visualization for Data Science The Visualization Alphabet: Marks and Channels

Telecommunications and Internet Access By Schools & School Districts

DSC 201: Data Analysis & Visualization

DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT. [Docket No. FR-6090-N-01]

The Lincoln National Life Insurance Company Universal Life Portfolio

CostQuest Associates, Inc.

Ocean Express Procedure: Quote and Bind Renewal Cargo

Panelists. Patrick Michael. Darryl M. Bloodworth. Michael J. Zylstra. James C. Green

A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis

MERGING DATAFRAMES WITH PANDAS. Appending & concatenating Series

2018 NSP Student Leader Contact Form

B.2 Measures of Central Tendency and Dispersion

MAKING MONEY FROM YOUR UN-USED CALLS. Connecting People Already on the Phone with Political Polls and Research Surveys. Scott Richards CEO

State IT in Tough Times: Strategies and Trends for Cost Control and Efficiency

Accommodating Broadband Infrastructure on Highway Rights-of-Way. Broadband Technology Opportunities Program (BTOP)

DSC 201: Data Analysis & Visualization

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

Distracted Driving- A Review of Relevant Research and Latest Findings

Global Forum 2007 Venice

2018 Supply Cheat Sheet MA/PDP/MAPD

Silicosis Prevalence Among Medicare Beneficiaries,

NSA s Centers of Academic Excellence in Cyber Security

Team Members. When viewing this job aid electronically, click within the Contents to advance to desired page. Introduction... 2

Best Practices in Rapid Deployment of PI Infrastructure and Integration with OEM Supplied SCADA Systems

IT Modernization in State Government Drivers, Challenges and Successes. Bo Reese State Chief Information Officer, Oklahoma NASCIO President

Moonv6 Update NANOG 34

Post Graduation Survey Results 2015 College of Engineering Information Networking Institute INFORMATION NETWORKING Master of Science

CSE 781 Data Base Management Systems, Summer 09 ORACLE PROJECT

Presented on July 24, 2018

Name: Business Name: Business Address: Street Address. Business Address: City ST Zip Code. Home Address: Street Address

Department of Business and Information Technology College of Applied Science and Technology The University of Akron

Homework Assignment #5

Visual Encoding Design

Charter EZPort User Guide

Tina Ladabouche. GenCyber Program Manager

The Outlook for U.S. Manufacturing

Visualization Analysis & Design Full-Day Tutorial Session 1

Visual Encoding Design

DTFH61-13-C Addressing Challenges for Automation in Highway Construction

State HIE Strategic and Operational Plan Emerging Models. February 16, 2011

ACCESS PROCESS FOR CENTRAL OFFICE ACCESS

Sideseadmed (IRT0040) loeng 4/2012. Avo

Prizm. manufactured by. White s Electronics, Inc Pleasant Valley Road Sweet Home, OR USA. Visit our site on the World Wide Web

2015 DISTRACTED DRIVING ENFORCEMENT APRIL 10-15, 2015

Presentation Outline. Effective Survey Sampling of Rare Subgroups Probability-Based Sampling Using Split-Frames with Listed Households

AASHTO s National Transportation Product Evaluation Program

PulseNet Updates: Transitioning to WGS for Reference Testing and Surveillance

A Capabilities Presentation

Data Visualization (CIS/DSC 468)

What Did You Learn? Key Terms. Key Concepts. 68 Chapter P Prerequisites

Touch Input. CSE 510 Christian Holz Microsoft Research February 11, 2016

Contact Center Compliance Webinar Bringing you the ANSWERS you need about compliance in your call center.

Visualization Analysis & Design

Week 6: Networks, Stories, Vis in the Newsroom

Scalable Data Analysis (CIS )

Geographic Accuracy of Cell Phone RDD Sample Selected by Area Code versus Wire Center

Selling Compellent Hardware: Controllers, Drives, Switches and HBAs Chad Thibodeau

MOVR. Mobile Overview Report January March The first step in a great mobile experience

Amy Schick NHTSA, Occupant Protection Division April 7, 2011

Expanding Transmission Capacity: Options and Implications. What role does renewable energy play in driving transmission expansion?

Week 4: Facet. Tamara Munzner Department of Computer Science University of British Columbia

CMPE 180A Data Structures and Algorithms in C++ Spring 2018

Presentation to NANC. January 22, 2003

Medium voltage Marketing contacts

Steve Stark Sales Executive Newcastle

Marks. Marks can be classified according to the number of dimensions required for their representation: Zero: points. One: lines.

HYPERVARIATE DATA VISUALIZATION

Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Representation: Design Idioms 1

2013 Product Catalog. Quality, affordable tax preparation solutions for professionals Preparer s 1040 Bundle... $579

Panasonic Certification Training January 21-25, 2019

Strengthening connections today, while building for tomorrow. Wireless broadband, small cells and 5G

3 Visualizing quantitative Information

OPERATOR CERTIFICATION

Configuring Oracle GoldenGate OGG 11gR2 local integrated capture and using OGG for mapping and transformations

VitalSigns Company List

How to Make an Impressive Map of the United States with SAS/Graph for Beginners Sharon Avrunin-Becker, Westat, Rockville, MD

Today s Lecture. Factors & Sampling. Quick Review of Last Week s Computational Concepts. Numbers we Understand. 1. A little bit about Factors

STATE DATA BREACH NOTIFICATION LAWS OVERVIEW OF REQUIREMENTS FOR RESPONDING TO A DATA BREACH UPDATED JUNE 2017

cs6964 February TABULAR DATA Miriah Meyer University of Utah

CP SC 8810 Data Visualization. Joshua Levine

Multivariate Data & Tables and Graphs

Be sure to fax AGA a copy of every application you submitted direct along with the fax confirmation page to ensure you receive your commission!

Data Visualization (DSC 530/CIS )

National Continuity Programs

Unit 7 Day 5 Graph Theory. Section 5.1 and 5.2

WHAT S NEW IN CHECKPOINT

Making Science Graphs and Interpreting Data

The Altice Business. Carrier Advantage

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles

Multivariate Data & Tables and Graphs. Agenda. Data and its characteristics Tables and graphs Design principles

We will start at 2:05 pm! Thanks for coming early!

Transcription:

DSC 201: Data Analysis & Visualization Visualization Design Dr. David Koop

Definition Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively. T. Munzner 2

Attribute Types Attribute Types Categorical Ordered Ordinal Quantitative Ordering Direction Sequential Diverging Cyclic [Munzner (ill. Maguire), 2014] 3

Categorial, Ordinal, and Quantitative 1 = Quantitative 23 2 ordinal = Nominal 3 = Ordinal quantitative categorical 4

Categorial, Ordinal, and Quantitative 1 = Quantitative 24 2 ordinal = Nominal 3 = Ordinal quantitative categorical 5

Tableau High-level GUI that connects to data, helps organize it, and provides intuitive routines for visualizing it plus customization Lots of possibilities Great for exploration 6

matplotlib %matplotlib inline (show plots in the notebook!) Always create a figure first, then draw plots Lots of high-level plotting types (line plots, scatterplots, histograms) Lots of customizability 7

matplotlib shortcuts in pandas Connect directly to pandas data frames - df.plot http://pandas.pydata.org/pandas-docs/stable/generated/ pandas.dataframe.plot.html 8

What ingredients go into visualizations? 9

Visual Encoding How do we encode data visually? - Marks are the basic graphical elements in a visualization - Channels are ways to control the appearance of the marks Marks classified by dimensionality: Points Lines Areas Also can have surfaces, volumes Think of marks as a mathematical definition, or if familiar with tools like Adobe Illustrator or Inkscape, the path & point definitions 10

Channels: Visual Appearance How should we encode this data? Name Region Population Life Expectancy Income China East Asia & Pacific 1335029250 73.28 7226.07 India South Asia 1140340245 64.01 2731 United States America 306509345 79.43 41256.08 Indonesia East Asia & Pacific 228721000 71.17 3818.08 Brazil America 193806549 72.68 9569.78 Pakistan South Asia 176191165 66.84 2603 Bangladesh South Asia 156645463 66.56 1492 Nigeria Sub-Saharan Africa 141535316 48.17 2158.98 Japan East Asia & Pacific 127383472 82.98 29680.68 Mexico America 111209909 76.47 11250.37 Philippines East Asia & Pacific 94285619 72.1 3203.97 Vietnam East Asia & Pacific 86970762 74.7 2679.34 Germany Europe & Central Asia 82338100 80.08 31191.15 Ethiopia Sub-Saharan Africa 79996293 55.69 812.16 Turkey Europe & Central Asia 72626967 72.06 8040.78 11

Potential Solution [Gapminder, Wealth & Health of Nations] 12

Another Solution [Gapminder, Wealth & Health of Nations] 13

Visual Channels Position Color Horizontal Vertical Both Shape Tilt Size Length Area Volume [Munzner (ill. Maguire), 2014] 14

Channels Usually map an attribute to a single channel - Could use multiple channels but - Limited number of channels Restrictions on size and shape - Points are nothing but location so size and shape are ok - Lines have a length, cannot easily encode attribute as length - Maps with boundaries have area, changing size can be problematic 15

Cartograms [Election Results by Population, M. Newman, 2012] 16

Channel Types Identity => what or where, Magnitude => how much Magnitude Channels: Ordered Attributes Identity Channels: Categorical Attributes Position on common scale Position on unaligned scale Length (1D size) Tilt/angle Spatial region Color hue Motion Shape Area (2D size) Depth (3D position) Color luminance Color saturation Curvature Volume (3D size) [Munzner (ill. Maguire), 2014] 17

Mark Types Can have marks for items and links - Connection => pairwise relationship - Containment => hierarchical relationship Marks as Items/Nodes Points Lines Areas Marks as Links Containment Connection [Munzner (ill. Maguire), 2014] 18

Expressiveness and Effectiveness Expressiveness Principle: all data from the dataset and nothing more should be shown - Do encode ordered data in an ordered fashion - Don t encode categorical data in a way that implies an ordering Effectiveness Principle: the most important attributes should be the most salient - Saliency: how noticeable something is - How do the channels we have discussed measure up? - How was this determined? 19

How do we test effectiveness? 20

Test % difference in length between elements 100 0 A B [Heer & Bostock, 2010] 21

Test % difference in length between elements 100 0 A B [Heer & Bostock, 2010] 22

Test % difference in length between elements 100 0 A B [Heer & Bostock, 2010] 23

Test % difference in length between elements 100 Answer: Left is ~5.6x longer than Right 0 A B [Heer & Bostock, 2010] 24

Test % difference in length between elements 100 0 A B [Modified from Heer & Bostock, 2010] 25

Test % difference in length between elements 100 Answer: Right is 4x larger than Left 0 A B [Modified from Heer & Bostock, 2010] 26

Test % difference in area between elements A B [Heer & Bostock, 2010] 27

Test % difference in area between elements Answer: A is ~2.25x larger (in area) than B A B [Heer & Bostock, 2010] 28

Test % difference in area between elements A B [Heer & Bostock, 2010] 29

Test % difference in area between elements Answer: A is ~6.1x larger (in area) than B A B [Heer & Bostock, 2010] 30

Test % difference in area between elements B A [Heer & Bostock, 2010] 31

Test % difference in area between elements Answer: B is ~2.5 larger (in area) than A B A [Heer & Bostock, 2010] 32

Results Summary Cleveland & McGill s Results Positions 1.0 1.5 2.0 2.5 3.0 Log Error Crowdsourced Results Angles Circular areas Rectangular areas (aligned or in a treemap) 1.0 1.5 2.0 2.5 3.0 Log Error [Munzner (ill. Maguire) based on Heer & Bostock, 2014] 33

Ranking Channels by Effectiveness Channels: Expressiveness Types and Effectiveness Ranks Magnitude Channels: Ordered Attributes Position on common scale Position on unaligned scale Length (1D size) Tilt/angle Identity Channels: Categorical Attributes Spatial region Color hue Motion Shape Area (2D size) Depth (3D position) Color luminance Color saturation Curvature Volume (3D size) [Munzner (ill. Maguire), 2014] 34

Visualizing Data Tables Arrange Tables Express Values Separate, Order, Align Regions Separate Order Align 1 Key 2 Keys 3 Keys Many Keys List Matrix Volume Recursive Subdivision Axis Orientation Rectilinear Parallel Radial Layout Density Dense Space-Filling [Munzner (ill. Maguire), 2014] 35

Express Values: Scatterplots Prices in 1980 450 400 350 300 250 200 150 100 50 0 Fish Prices over the Years 0 50 100 150 200 250 300 350 400 450 Prices in 1970 Data: two quantitative values Task: find trends, clusters, outliers How: marks at spatial position in horizontal and vertical directions Correlation: dependence between two attributes - Positive and negative correlation - Indicated by lines Coordinate system (axes) and labels are important! 36

0.000 0.002 0.004 0.006 0 10000 20000 30000 40000 n dist Coordinate Systems 37 [Wickham, 2014]

0.001 0.0001 0.00001 100 1000 10000 n dist Log-Log Plot 38 [Wickham, 2014]

Bubble Plot [Gapminder, Wealth & Health of Nations] 39

Separate, Order, and Align: Categorical Regions Categorical: =,!= Spatial position can be used for categorical attributes Use regions, distinct contiguous bounded areas, to encode categorical attributes Three operations on the regions: - Separate (use categorical attribute) - Align - Order (use some other ordered attribute) Alignment and order can use same or different attribute 40

List Alignment: Bar Charts Data: one quantitative attribute, one categorical attribute Task: lookup & compare values How: line marks, vertical position (quantitative), horizontal position (categorical) What about length? Ordering criteria: alphabetical or using quantitative attribute Scalability: distinguishability - bars at least one pixel wide - hundreds 100 75 50 25 0 100 75 50 25 0 Animal Type Animal Type [Munzner (ill. Maguire), 2014] 41

Stacked Bar Charts 35M 30M 25M Population 65 Years and Over 45 to 64 Years 25 to 44 Years 18 to 24 Years 14 to 17 Years 5 to 13 Years Under 5 Years 20M 15M 10M 5.0M 0.0 CA TX NY FL IL PA OH MI GA NC NJ VA WA AZ MA IN TN MOMD WI MN CO AL SC LA KY OR OK CT IA MS AR KS UT NV NMWV NE ID ME NH HI RI MT DE SD AK ND VT DC WY [Bostock, 2012] 42

Grouped Bar Chart 10M 9.0M 8.0M 7.0M Population 65 Years and Over 45 to 64 Years 25 to 44 Years 18 to 24 Years 14 to 17 Years 5 to 13 Years Under 5 Years 6.0M 5.0M 4.0M 3.0M 2.0M 1.0M 0.0 CA TX NY FL IL PA [M. Bostock, http://bl.ocks.org/mbostock/3887051] 43