Introduction to Graphics with ggplot2

Similar documents
The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

Getting started with ggplot2

Econ 2148, spring 2019 Data visualization

Data Visualization in R

Visualizing the World

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1)

Plotting with Rcell (Version 1.2-5)

An introduction to ggplot: An implementation of the grammar of graphics in R

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Data Visualization in R

Lecture 4: Data Visualization I

Statistical transformations

An Introduction to R Graphics

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

ggplot2: elegant graphics for data analysis

Visualizing Data: Customization with ggplot2

Introduction to Data Visualization

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

Importing and visualizing data in R. Day 3

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

03 - Intro to graphics (with ggplot2)

The diamonds dataset Visualizing data in R with ggplot2

Stat 849: Plotting responses and covariates

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

Facets and Continuous graphs

Graphical critique & theory. Hadley Wickham

Data visualization with ggplot2

User manual forggsubplot

PRESENTING DATA. Overview. Some basic things to remember

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

Stat 849: Plotting responses and covariates

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created

1 The ggplot2 workflow

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

Using Built-in Plotting Functions

ggplot2 for beginners Maria Novosolov 1 December, 2014

Package lvplot. August 29, 2016

ggplot2 and maps Marcin Kierczak 11/10/2016

Data Handling: Import, Cleaning and Visualisation

Introduction to R and the tidyverse. Paolo Crosetto

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW

Creating elegant graphics in R with ggplot2

ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

Package ggsubplot. February 15, 2013

STAT 1291: Data Science Lecture 4 - Data Visualization: Composing/dissecting Data Graphics Sungkyu Jung

Notes for week 3. Ben Bolker September 26, Linear models: review

ggplot in 3 easy steps (maybe 2 easy steps)

Intro to R for Epidemiologists

An introduction to R Graphics 4. ggplot2

Advanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel

Data Visualization. Module 7

Package ggmosaic. February 9, 2017

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018

Rational Numbers: Graphing: The Coordinate Plane

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer

Data visualization in Python

You submitted this quiz on Sat 17 May :19 AM CEST. You got a score of out of

STAT 1291: Data Science

Properties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013

INTRODUCTION TO DATA. Welcome to the course!

Table of Contents. Preface... ix

Building visualisations Hadley Wickham

Visual Analytics. Visualizing multivariate data:

Transformations. Lesson Summary: Students will explore rotations, translations, and reflections in a plane.

Introduction to ggvis. Aimee Gott R Consultant

Data Visualization. Andrew Jaffe Instructor

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Package ggseas. June 12, 2018

Package ggextra. April 4, 2018

Regression III: Advanced Methods

Financial Econometrics Practical

social data science Data Visualization Sebastian Barfort August 08, 2016 University of Copenhagen Department of Economics 1/86

Install RStudio from - use the standard installation.

7 th Grade HONORS Year at a Glance

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky

Working with Geospatial Data in R. Introducing sp objects

LAST UPDATED: October 16, 2012 DISTRIBUTIONS PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Resources: Books. Data Visualization in R 4. ggplot2. What is ggplot2? Resources: Cheat sheets

Exploratory model analysis

LEARNING AREA 1 : NUMBERS AND OPERATIONS 1. W W12 5 Jan 13 Jan RATIONAL NUMBERS CONTENT LEARNING STANDARDS STANDARDS Recognise positive negative

DATA VISUALIZATION WITH GGPLOT2. Coordinates

Package ggiraphextra

R programming Philip J Cwynar University of Pittsburgh School of Information Sciences and Intelligent Systems Program

Making plots in R [things I wish someone told me when I started grad school]

NFC ACADEMY MATH 600 COURSE OVERVIEW

Package lemon. September 12, 2017

What Is A Relation? Example. is a relation from A to B.

MATH Grade 6. mathematics knowledge/skills as specified in the standards. support.

Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 10, 2018

Visualization as an Analysis Tool: Presentation Supplement

proficient in applying mathematics knowledge/skills as specified in the Utah Core State Standards. The student generally content, and engages in

Input/Output Machines

<style> pre { overflow-x: auto; } pre code { word-wrap: normal; white-space: pre; } </style>

Evgeny Maksakov Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages:

Section 2.1 Graphs. The Coordinate Plane

Making R Graphs, For People Who Don t Want To Learn R

Statistics Lecture 6. Looking at data one variable

Transcription:

Introduction to Graphics with ggplot2 Reaction 2017 Flavio Santi Sept. 6, 2017 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 1 / 28

Graphics with ggplot2 ggplot2 [... ] allows you to produce graphics using the same structured thinking that you use to design an analysis, reducing the distance between a plot in your head and one on the page. It is especially helpful for students who have not yet developed the structured approach to analysis used by experts. Hadley Wickham Wickham, H. (2009), ggplot2, Springer. Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 2 / 28

How ggplot2 works (1) Wilkinson (2005) created the grammar of graphics to describe the deep features that underlie all statistical graphics. The grammar of graphics is an answer to a question: what is a statistical graphic? The layered grammar of graphics (Wickham, 2009) builds on Wilkinson s grammar, focussing on the primacy of layers and adapting it for embedding within R. Hence, a graphic (may) consists of (see Wickham, 2009, p. 3): The data that you want to visualise and a set of aesthetic mappings describing how variables in the data are mapped to aesthetic attributes that you can perceive. Geometric objects, geoms for short, represent what you actually see on the plot: points, lines, polygons, etc. Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 3 / 28

How ggplot2 works (2) Statistical transformations, stats for short, summarise data in many useful ways. For example, binning and counting observations to create a histogram, or summarising a 2d relationship with a linear model. Stats are optional, but very useful. The scales map values in the data space to values in an aesthetic space, whether it be colour, or size, or shape. Scales draw a legend or axes, which provide an inverse mapping to make it possible to read the original data values from the graph. A coordinate system, coord for short, describes how data coordinates are mapped to the plane of the graphic. It also provides axes and gridlines to make it possible to read the graph. We normally use a Cartesian coordinate system, but a number of others are available, including polar coordinates and map projections. Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 4 / 28

How ggplot2 works (3) A faceting specification describes how to break up the data into subsets and how to display those subsets as small multiples. This is also known as conditioning or latticing/trellising. Remember: ggplot always requires data to be stored in data.frame objects Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 5 / 28

Some examples (1) Consider the following data frame: > str(borsadf) 'data.frame': 2610 obs. of 5 variables: $ date : Date, format: "2002-05-07" "2002-05-08"... $ SP500 : num 1049 1089 1073 1055 1075... $ rendsp500: num NA 0.0368-0.0147-0.0169 0.0184... $ ENI : num 16.5 16.4 16.1 15.9 16... $ rendeni : num NA -0.00357-0.01747-0.0153 0.00738... > borsadf[1:4,] date SP500 rendsp500 ENI rendeni 1 2002-05-07 1049.49 NA 16.4604 NA 2 2002-05-08 1088.85 0.03681776 16.4017-0.003572508 3 2002-05-09 1073.01-0.01465431 16.1177-0.017466941 4 2002-05-10 1054.99-0.01693650 15.8729-0.015304794 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 6 / 28

Some examples (2) Load the ggplot2 package: > library(ggplot2) The following command generate a scatterplot of daily returns of S&P500 and ENI: > ggplot(borsadf,aes(x=rendsp500,y=rendeni))+geom_point() or, in a script: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=rendeni))+ + geom_point() > print(gr1) Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 7 / 28

Some examples (3) 0.1 rendeni 0.0 0.1 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 8 / 28

Some examples (4) It is also possible to set the colour, and the transparency of all points: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=rendeni))+ + geom_point(colour="red",alpha=0.1) > print(gr1) And add a linear regression line with 99% confidence inteval: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=rendeni))+ + geom_point(colour="red",alpha=0.1)+ + stat_smooth(method="lm",formula=y~x,level=0.99, + size=0.5) > print(gr1) Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 9 / 28

Some examples (5) 0.1 rendeni 0.0 0.1 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 10 / 28

Some examples (6) 0.1 rendeni 0.0 0.1 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 11 / 28

Some examples (7) Assume that we want to distinguish between returns before and after the Lehman Brothers bankruptcy (15 September 2008). We define a new variable (as a factor) called LehBroB : > borsadf$lehbrob<- factor(borsadf$date<"2008-09-15", + c(true,false),c("before","after")) Now we plot the points as before, but we specify (in aes ) that points should be coloured according to the value of variable LehBroB : > gr1<- ggplot(borsadf,aes(x=rendsp500,y=rendeni))+ + geom_point(aes(colour=lehbrob),alpha=0.1) > print(gr1) Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 12 / 28

Some examples (8) 0.1 rendeni 0.0 LehBroB before after 0.1 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 13 / 28

Some examples (9) It is also possible to derive a 2-dimensional density function: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=rendeni))+ + geom_point(alpha=0.1)+ + geom_density2d(colour="red",binwidth=0.001,bins=7)+ + xlim(c(-0.03,0.03))+ + ylim(c(-0.05,0.05)) > print(gr1) Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 14 / 28

Some examples (10) 0.050 0.025 rendeni 0.000 0.025 0.050 0.02 0.00 0.02 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 15 / 28

Some examples (11) Now we are going to draw the time series of the price of the S&P500: > gr1<- ggplot(borsadf,aes(x=date,y=sp500))+geom_line() > print(gr1) 1600 1400 SP500 1200 1000 800 2002 2004 2006 2008 2010 2012 date Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 16 / 28

Some examples (12) You can colour the line according to the value of another variable: > gr1<- ggplot(borsadf,aes(x=date,y=sp500))+ + geom_line(aes(colour=lehbrob)) > print(gr1) 1600 SP500 1400 1200 1000 LehBroB before after 800 2002 2004 2006 2008 2010 2012 date Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 17 / 28

Some examples (13) Otherwise, you can change the type of the line: > gr1<- ggplot(borsadf,aes(x=date,y=sp500))+ + geom_line(aes(linetype=lehbrob)) > print(gr1) 1600 SP500 1400 1200 1000 LehBroB before after 800 2002 2004 2006 2008 2010 2012 date Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 18 / 28

Some examples (14) Or any other aestetics: > gr1<- ggplot(borsadf,aes(x=date,y=sp500))+ + geom_line(aes(alpha=lehbrob)) > print(gr1) 1600 SP500 1400 1200 1000 LehBroB before after 800 2002 2004 2006 2008 2010 2012 date Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 19 / 28

Some examples (15) The histogram of S&P500 returns can be obtained with: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=..density..))+ + geom_histogram(binwidth=0.01) > print(gr1) 40 density 30 20 10 0 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 20 / 28

Some examples (16) And it is possible to add an estimated density function: > gr1<- ggplot(borsadf,aes(x=rendsp500,y=..density..))+ + geom_histogram(binwidth=0.01)+geom_density() > print(gr1) 60 density 40 20 0 0.10 0.05 0.00 0.05 0.10 rendsp500 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 21 / 28

Some examples (17) If you need to plot more than one variable on the same plot, the data.frame object should have a long-table format. Assume that you want to compare the distributions of returns of ENI and S&P500 by means of a boxplot. Extract the variables you are interested in: > tempborsa<- borsadf[,c("date","rendeni","rendsp500")] > head(tempborsa) date rendeni rendsp500 1 2002-05-07 NA NA 2 2002-05-08-0.003572508 0.03681776 3 2002-05-09-0.017466941-0.01465431 4 2002-05-10-0.015304794-0.01693650 5 2002-05-13 0.007375290 0.01837999 6 2002-05-14 0.012176663 0.02092311 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 22 / 28

Some examples (18) Convert tempborsa into long-table format: > library(reshape2) > tempborsa<- melt(tempborsa,id.vars="date", + variable.name="asset",value.name="return") > str(tempborsa) 'data.frame': 5220 obs. of 3 variables: $ date : Date, format: "2002-05-07" "2002-05-08"... $ asset : Factor w/ 2 levels "rendeni","rendsp500": 1 1 1 1 1 $ return: num NA -0.00357-0.01747-0.0153 0.00738... > head(tempborsa,3) date asset return 1 2002-05-07 rendeni NA 2 2002-05-08 rendeni -0.003572508 3 2002-05-09 rendeni -0.017466941 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 23 / 28

Some examples (19) Plot the time series: > gr1<- ggplot(tempborsa,aes(x=date,y=return))+ + geom_line(aes(colour=asset)) > print(gr1) return 0.1 0.0 asset rendeni rendsp500 0.1 2002 2004 2006 2008 2010 2012 date Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 24 / 28

Some examples (20) Plot the densities: > gr1<- ggplot(tempborsa,aes(x=return,y=..density.., + fill=asset,colour=asset))+geom_density(alpha=0.3) > print(gr1) 60 density 40 20 asset rendeni rendsp500 0 0.1 0.0 0.1 return Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 25 / 28

Some examples (21) Plot the densities: > gr1<- ggplot(tempborsa,aes(x=asset,y=return,fill=asset))+ + geom_boxplot()+coord_flip() > print(gr1) asset rendsp500 rendeni asset rendeni rendsp500 0.1 0.0 0.1 return Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 26 / 28

Some examples (22) 49 48 47 Per capita GPD 46 Less than 30 000 30 000 50 000 More than 50 000 45 NA 44 43 4 Flavio Santi 8 12 Introduction to Graphics with ggplot2 16 Sept. 6, 2017 27 / 28

thank you! Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 28 / 28