Graphical critique & theory. Hadley Wickham

Similar documents
A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Statistical transformations

Facets and Continuous graphs

Introduction to R and the tidyverse. Paolo Crosetto

Econ 2148, spring 2019 Data visualization

Lecture 4: Data Visualization I

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

The diamonds dataset Visualizing data in R with ggplot2

Introduction to Graphics with ggplot2

03 - Intro to graphics (with ggplot2)

Building visualisations Hadley Wickham

Getting started with ggplot2

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Plotting with Rcell (Version 1.2-5)

1 The ggplot2 workflow

An introduction to ggplot: An implementation of the grammar of graphics in R

Creating elegant graphics in R with ggplot2

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

An introduction to R Graphics 4. ggplot2

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1)

Introduction to Data Visualization

An Introduction to R Graphics

social data science Data Visualization Sebastian Barfort August 08, 2016 University of Copenhagen Department of Economics 1/86

Visualizing Data: Customization with ggplot2

We will start at 2:05 pm! Thanks for coming early!

Package ggsubplot. February 15, 2013

LondonR: Introduction to ggplot2. Nick Howlett Data Scientist

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Data Visualization. Module 7

Package lvplot. August 29, 2016

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created

Advanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel

Plotting with ggplot2: Part 2. Biostatistics

Stat 849: Plotting responses and covariates

data visualization Show the Data Snow Month skimming deep waters

ggplot2 for beginners Maria Novosolov 1 December, 2014

You submitted this quiz on Sat 17 May :19 AM CEST. You got a score of out of

Data Visualization in R

User manual forggsubplot

Importing and visualizing data in R. Day 3

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer

Stat 849: Plotting responses and covariates

Package cowplot. March 6, 2016

Data Visualization in R

Future in mind Self Assessment Tool User Guide v2

What Type Of Graph Is Best To Use To Show Data That Are Parts Of A Whole

Package ggseas. June 12, 2018

ggplot2 and maps Marcin Kierczak 11/10/2016

Data and Image Models

STAT 1291: Data Science

Last Time: Value of Visualization

Package ggrepel. September 30, 2017

Properties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013

Data visualization with ggplot2

Intro to R for Epidemiologists

ggplot2: elegant graphics for data analysis

Data and Image Models

PRESENTING DATA. Overview. Some basic things to remember

Data Handling: Import, Cleaning and Visualisation

DATA VISUALIZATION WITH GGPLOT2. Coordinates

Package ggextra. April 4, 2018

ggplot in 3 easy steps (maybe 2 easy steps)

Session 3 Nick Hathaway;

# Call plot plot(gg)

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

grammar statistical graphics Building a in Clojure Kevin Lynagh 2012 November 9 Øredev Keming Labs Malmö, Sweden

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Adding a corporate identity to reproducible research

Visualizing the World

Exploring Maps with ggplot2 1 Draft: Do not Cite

How to Plot With Ggiraph

Resources: Books. Data Visualization in R 4. ggplot2. What is ggplot2? Resources: Cheat sheets

Package cowplot. November 16, 2017

Desktop Studio: Charts. Version: 7.3

Working with Charts Stratum.Viewer 6

DATA VISUALIZATION WITH GGPLOT2. Grid Graphics

Desktop Studio: Charts

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

Introduction to ggvis. Aimee Gott R Consultant

Data and Image Models

R Workshop 1: Introduction to R

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Package ggmosaic. February 9, 2017

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Data Visualization for M&E. BRIDGE M&E Colloquium Jerusha Govender 8 August 2017

Package lemon. September 12, 2017

Name Date Types of Graphs and Creating Graphs Notes

Package ggspectra. May 7, 2018

An Introduction to R. Ed D. J. Berry 9th January 2017

Tufte s Design Principles

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW

Package geomnet. December 8, 2016

Visualization of large multivariate datasets with the tabplot package

Data visualization in Python

Package ggqc. R topics documented: January 30, Type Package Title Quality Control Charts for 'ggplot' Version Author Kenith Grey

Transcription:

Graphical critique & theory Hadley Wickham

Exploratory graphics Are for you (not others). Need to be able to create rapidly because your first attempt will never be the most revealing. Iteration is crucial for developing the best display of your data. Gives rise to two key questions:

What should I plot? How can I plot it?

Two general tools Plot critique toolkit: graphics are like pumpkin pie Theory behind ggplot2: A layered grammar of graphics plus lots of practice...

Graphics are like pumpkin pie The four C s of critiquing a graphic

Content

Construction

Context

Consumption

Content What data (variables) does the graph display? What non-data is present? What is pumpkin (essence of the graphic) vs what is spice (useful additional info)?

Your turn Identify the data and nondata on Napoleon's march and Building an electoral victory.

Results Minard s march: (top) latitude, longitude, number of troops, direction, branch, city name (bottom) latitude, temperature, date Building an electoral victory: state, number of electoral college votes, winner, margin of victory

Construction How many layers are on the plot? What data does each layer display? What sort of geometric object does it use? How are variables mapped to aesthetics?

Your turn Answer the following questions for Napoleon's march and Flight delays : How many layers are on the plot? What data does the layer display? How does it display it?

Results Napoleon s march: (top) (1) path plot with width mapped to number of troops, colour to direction, separate group for each branch (2) labels giving city names (bottom) (1) line plot with longitude on x-axis and temperature on y-axis (2) text labels giving dates Flight delays: (1) white circles showing 100% cancellation, (2) outline of states, (3) points with size proportional to percent cancellations at each airport.

Can the explain composition of a graphic in words, but how do we create it?

If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum. Euclid, ~300 BC

If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum. Euclid, ~300 BC m(σx) = Σ(mx)

The grammar of graphics An abstraction which makes it easier to create, understand and communicate graphics. Developed by Leland Wilkinson, particularly in The Grammar of Graphics 1999/2005 ggplot2 adapts for use within R and builds to create A Layered Grammar of Graphics (in press, Journal of Computational and Graphical Statistics).

What is a layer? Data Mappings from variables to aesthetics (aes) A geometric object (geom) A statistical transformation (stat) A position adjustment (position)

layer(geom, stat, position, data, mapping,...) layer( data = mpg, mapping = aes(x = displ, y = hwy), geom = "point", stat = "identity", position = "identity" ) layer( data = diamonds, mapping = aes(x = carat), geom = "bar", stat = "bin", position = "stack" )

# A lot of typing! layer( data = mpg, mapping = aes(x = displ, y = hwy), geom = "point", stat = "identity", position = "identity" ) # Every geom has an associated default statistic # (and vice versa), and position adjustment. geom_point(aes(displ, hwy), data = mpg) geom_histogram(aes(displ), data = mpg)

# To actually create the plot ggplot() + geom_point(aes(displ, hwy), data = mpg) ggplot() + geom_histogram(aes(carat), data = diamonds)

# Multiple layers ggplot() + geom_point(aes(displ, hwy), data = mpg) + geom_smooth(aes(displ, hwy), data = mpg) # Avoid redundancy: ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()

# Different layers can have different aesthetics ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = class)) + geom_smooth() ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = class)) + geom_smooth(aes(group = class), method = "lm", se = F)

Your turn Perform the same process for the flight delays data, and try and come up with the ggplot2 code that would create it.

usa <- map_data("state") ggplot(feb13, aes(long, lat)) + geom_point(aes(size = 1), colour = "white") + geom_polygon(aes(group = group), data = usa, colour = "grey70", fill = NA) + geom_point(aes(size = ncancelw / ntot), colour = alpha("black", 1/2)) last_plot() + scale_area("% cancelled", to = c(1, 8), breaks = seq(0, 1, by = 0.2), limits = c(0, 1)) scale_x_continuous("", limits = c(-125, -67)), scale_y_continuous("", limits = c(24, 50))

Back to Minard

troops <- read.csv("minard-troops.csv") cities <- read.csv("minard-cities.csv") ggplot(cities, aes(long, lat)) + geom_path(aes(size = survivors, colour = direction, group = interaction(group, direction)), data = troops) + geom_text(aes(label = city), hjust = 0, vjust = 1, size = 4) # Polish appearance last_plot() + scale_x_continuous("", limits = c(24, 39)) + scale_y_continuous("") + scale_colour_manual(values = c("grey50","red")) + scale_size(to = c(1, 10))

Other components Scales. Used to override default perceptual mappings. Mainly useful for polishing plot for communication. Coordinate system. Rarely useful, but when needed are critical. Facetting. Have seen in use already. Only other feature of importance is interaction with scales. Themes: control presentation of non-data elements.