Visualizing Data: Customization with ggplot2

Data Science 1, Stanford University, Department of Statistics

ggplot2: Customizing graphics in R
ggplot2, by RStudio's Hadley Wickham and Winston Chang, offers a highly versatile grammar of graphics. Basic statistical plots are typically no more difficult with ggplot2 than with base R graphics, especially if qplot ("quick plot") is used. Some graphics, like faceted graphics that make similar plots for each factor level, are considerably easier with ggplot2, while others (like the violin plot) simply don't exist in base R. Publication-quality graphics are usually easier with ggplot2. Today, most (if not all) active development of graphics in R builds on ggplot2. Examples of recent libraries:
  ggmap (integrates with Google Maps)
  ggridges (support for many density plots)
  gganimate (data animation)

gg basics
ggplot requires a data.frame.
aes (as in "aesthetics") is used to specify x and/or y.
Additional features are added (literally using +) to layer on additional function calls, including a variety of statistics.
    ggplot(mtcars, aes(x=factor(cyl))) + geom_bar()
[Figure: bar chart of count by factor(cyl), with bars at 4, 6, and 8]
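The layering idea can be checked directly. The sketch below assumes ggplot2 is installed and uses its exported helper layer_data() to pull out what geom_bar() computed. The object names (g, computed, bar_counts, base_counts) are illustrative only and do not come from the slides.

```r
library(ggplot2)

# Base object: data plus aesthetic mapping; no layer has been added yet.
g <- ggplot(mtcars, aes(x = factor(cyl)))

# Add a layer with `+`; geom_bar() counts the rows for each x value by default.
g <- g + geom_bar()

# layer_data() returns the data frame that the first layer actually draws.
computed <- layer_data(g, 1)
bar_counts <- computed$count

# The same counts from base R, for comparison.
base_counts <- as.vector(table(mtcars$cyl))
```

Comparing bar_counts with base_counts shows that the "statistic" attached to geom_bar() is a plain count per category.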

gg coding style
ggplots are often saved as objects to which additional features are added one or two at a time.
    g <- ggplot(mtcars, aes(x=factor(cyl))) + geom_bar()
    g <- g + theme_minimal()  # complete theme first: adding it later would discard the theme() tweak below
    g <- g + theme(text=element_text(size=24))
    g <- g + labs(x="cylinders", y="n", title="cars")
    g
[Figure: bar chart titled "Cars"; y-axis: N; x-axis ticks: 4, 6, 8]

This grammar of graphics enables endless variety.
[Figure: example graphics; source: Google Image 4/8/217]

Outline
The slides that follow contain ggplot2 examples of:
  Histograms
  Box Plots
  Violin Plots
  Scatter Plots
  Loess (Smoothed Trend Lines)
  Density Plots
  Faceting
To avoid clutter, recurring coding details (like font size) are omitted after their first appearance.

Histograms
    ggplot(airquality, aes(x=Ozone)) + geom_histogram(binwidth=)
[Figure: histogram titled "Daily Ozone Levels", New York, May-Sept 1973; labels: Parts per Billion, Ozone]

Histogram, density example
Setting the y aesthetic to ..density.. makes the sum of (width of bar * height of bar) add to 1. The unit becomes a proportion per unit of x, i.e. (% per ppb)/100 below.
    ggplot(airquality, aes(x=Ozone)) + geom_histogram(binwidth=, aes(y=..density..))
[Figure: density histogram titled "Daily Ozone Levels", New York, May-Sept 1973; y-axis: (% per ppb)/100; x-axis: Ozone]
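The claim that the bar areas add to 1 can be verified numerically. This is a minimal sketch, assuming ggplot2 3.3.0 or later (where after_stat(density) is the newer spelling of ..density..) and a bin width of 10 ppb, which is chosen only for illustration and is not the value used on the slide.

```r
library(ggplot2)

# Assumption: a 10 ppb bin width, picked only for this illustration.
bw <- 10

g <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw, na.rm = TRUE)

# One row per bar, with its edges (xmin, xmax) and height (density).
bars <- layer_data(g, 1)

# Area of each bar = width * height; the areas should sum to 1.
total_area <- sum((bars$xmax - bars$xmin) * bars$density)
```

The same check should hold for any other bin width, since the density scale is rescaled along with the bar widths.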

Information displayed in histogram
Density: the height of a bar gives the share of observations per unit of the horizontal scale. For example, the highest density is around 1-2 ppb as 2. = 2% of all (non-missing) observations have values in the range 1-2 ppb. Thus the histogram also displays relative frequency.
[Figure: the density histogram from the previous slide]
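The step from density to relative frequency is a single multiplication: the share of observations in a bin is the bar height times the bin width. A sketch in base R follows, assuming 10 ppb bins from 0 to 170 (an illustrative choice, not the slide's binning); the variable names are hypothetical.

```r
# Assumption: 10 ppb bins covering 0 to 170 ppb, chosen only for illustration.
breaks <- seq(0, 170, by = 10)

# hist() drops missing values itself; plot = FALSE returns only the numbers.
h <- hist(airquality$Ozone, breaks = breaks, plot = FALSE)

# Relative frequency of each bin = density (per ppb) * bin width (ppb).
rel_freq <- h$density * diff(h$breaks)

# The bin with the tallest bar, and the share of observations inside it.
top <- which.max(h$density)
top_range <- h$breaks[top + 0:1]
top_share <- rel_freq[top]
```

Here top_range gives the edges of the highest-density bin and top_share the fraction of non-missing observations that fall in it.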

Box plot
    ggplot(mtcars, aes(x=factor(cyl), y=mpg)) + geom_boxplot()
[Figure: box plots titled "Fuel Efficiency by Engine Type"; y-axis: Miles per Gallon; x-axis: Cylinders (4, 6, 8)]
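Since the slides omit recurring details, here is one way the labelled version of this plot might be written, with the title and axis names taken from the figure. The comparison against base R at the end is an added illustration, and the names box_stats, plot_medians, and base_medians are hypothetical.

```r
library(ggplot2)

g <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Fuel Efficiency by Engine Type",
       x = "Cylinders", y = "Miles per Gallon")

# layer_data() exposes the summary each box is drawn from;
# `middle` is the line inside the box, i.e. the group median.
box_stats <- layer_data(g, 1)
plot_medians <- box_stats$middle

# The same medians computed directly in base R.
base_medians <- as.vector(tapply(mtcars$mpg, mtcars$cyl, median))
```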

Violin plot of Bay Area weather
    ggplot(sfo2013, aes(x = factor(month), y = Max_TemperatureF)) + geom_violin()
[Figure: violin plots titled "Daily High at SFO in 2013"; source: library(weatherData); y-axis: Temperature; x-axis: Month]

Scatter plot
    ggplot(galtonfamilies, aes(x=midparentheight, y=childheight)) + geom_point()
[Figure: scatter plot titled "Galton's Famous Height Data", N = 934 children in 205 families, Galton (1886); y-axis: Child Height; x-axis: Parent Height (Mean)]

Adding a Smoothed Trend Line
Sometimes we want to add a trend line (details on how this works will come later in the course).
    ggplot(galtonfamilies, aes(x=midparentheight, y=childheight)) + geom_point() + geom_smooth()
[Figure: scatter plot with smoothed trend line, titled "Galton's Famous Height Data", N = 934 children in 205 families, Galton (1886); y-axis: Child Height; x-axis: Parent Height (Mean)]
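When no method is given, geom_smooth() chooses one based on the data size and uses loess for groups with fewer than 1,000 observations. The sketch below names the method explicitly and uses the built-in mtcars data as a stand-in for the Galton data (an assumption made so the example runs without extra packages); curve, fit, and base_curve are illustrative names.

```r
library(ggplot2)

g <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

# Layer 2 holds the fitted curve (y) and its confidence band (ymin, ymax),
# evaluated on a grid of x values.
curve <- layer_data(g, 2)

# A loess fit from base R, predicted at the same x positions, for comparison.
fit <- loess(mpg ~ wt, data = mtcars)
base_curve <- predict(fit, newdata = data.frame(wt = curve$x))
```

Setting se = FALSE drops the shaded band, and method = "lm" would swap the smoother for a straight least-squares line.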

Joint Density Plots
    ggplot(galtonfamilies, aes(x=midparentheight, y=childheight)) + geom_density2d()
[Figure: 2D density contours titled "Galton's Famous Height Data", N = 934 children in 205 families, Galton (1886); y-axis: Child Height; x-axis: Parent Height (Mean)]

Scatter plot
If there are many points, one needs to reduce the opacity with the alpha parameter to be able to see anything.
    ggplot(flights, aes(x=as.numeric(dep_time_sched), y=dep_delay)) + geom_point(alpha=.3, size=.)
[Figure: scatter plot titled "Delay at NYC Area Airports"; source: library(nycflights13); y-axis: Departure Delay; x-axis: Scheduled Departure (minutes after midnight)]
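To see why alpha helps, it can be useful to work out how opacity accumulates when points overlap: k stacked points with transparency alpha give a combined opacity of 1 - (1 - alpha)^k. The sketch below uses simulated data rather than the flights data, and the values alpha = 0.05 and size = 0.5 are assumptions for illustration, not the slide's settings.

```r
library(ggplot2)

set.seed(1)  # reproducible simulated data, not the flights data
n <- 20000
sim <- data.frame(
  x = runif(n, min = 0, max = 1440),  # e.g. minutes after midnight
  y = rexp(n, rate = 1 / 20)          # right-skewed, like delays
)

# Assumed settings: nearly transparent, small points.
a <- 0.05
g <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = a, size = 0.5)

pts <- layer_data(g, 1)

# Combined opacity where 1, 10, or 50 points sit on top of each other.
stacked_opacity <- 1 - (1 - a)^c(1, 10, 50)
```

Dense regions therefore appear dark while isolated points stay faint, which is what makes the overall shape of the data visible.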

Scatter plots with more than two variables
To show more than two variables, map the additional variables to the size, shape, and color of the points.

Example: Scatter Plot with Four Variables
A single graphic displaying fuel economy (mpg) by cylinders (cyl), weight (wt), and acceleration (qsec).
    ggplot(mtcars, aes(x=wt, y=mpg, size=qsec, col=factor(cyl))) + geom_point() + labs(x="weight", size="acceleration", col="# Cylinders")
[Figure: scatter plot of mpg against Weight; color legend: # Cylinders (4, 6, 8); size legend: Acceleration]

Conditional Density Plots
    ggplot(galtonfamilies, aes(x=childheight, col=gender)) + geom_density()
[Figure: density curves titled "Galton's Famous Height Data", N = 934 children in 205 families, Galton (1886); y-axis: density; x-axis: Child Height; color legend: gender (female, male)]

Small multiples
"Small multiples" refers to a series of charts using similar scales and axes, arranged in a lattice or grid.
"At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution" (Tufte, Envisioning Information, p. 67).
The pair plot is, loosely speaking, a small multiple design. More generally, we can produce small multiples in ggplot by faceting.

Faceting for Exploratory Analysis
Suppose Galton suspects that genetics aren't the only determinant of height and that childhood nutrition is very important too. Other things being equal, he imagines that it is harder to feed large families well, so he decides to facet a histogram of height by the number of children in the family...

Faceting Child Height by Family Size
    ggplot(galtonfamilies) + geom_bar(aes(childheight, fill=gender)) + facet_grid(children ~ .)
[Figure: bar charts titled "Child Height by Family's Number of Children", Data: Galton (1886); one panel per number of children; y-axis: count; x-axis: childheight; fill legend: gender (female, male)]
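The facet_grid() formula reads rows ~ columns, so children ~ . stacks one panel per family size in rows. A minimal sketch of the same pattern follows, using the built-in mtcars data as a stand-in (an assumption, so that it runs without the Galton data) and an arbitrary bin width of 2.5; all object names are illustrative.

```r
library(ggplot2)

# One histogram panel per cylinder count, stacked in rows (rows ~ columns).
g <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5) +
  facet_grid(cyl ~ .)

# facet_wrap() is the alternative when panels should wrap into a grid.
g_wrap <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5) +
  facet_wrap(~ cyl, ncol = 1)

# Each bar records the panel it belongs to in the PANEL column.
panels <- layer_data(g, 1)
n_panels <- length(unique(panels$PANEL))
```

Because every panel shares the same x-axis, differences between groups can be read by scanning straight down the column of panels.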

Summary
ggplot2 is a highly customizable grammar of graphics that enables one to make publication-quality graphics in keeping with the recommendations of visualization experts like Tufte. The examples covered:
  Histograms
  Box Plots
  Violin Plots
  Scatter Plots
  Loess (Smoothed Trend Lines)
  Density Plots
  Faceting