The diamonds dataset Visualizing data in R with ggplot2


1 Lecture 2 STATS/CME 195 Matteo Sesia Stanford University Spring 2018

2 Contents The diamonds dataset Visualizing data in R with ggplot2

3 The diamonds dataset

4 The tibble package
The tibble package is part of the core tidyverse.
    library(tidyverse)
Tibbles are data frames, tweaked to make life a little easier. A tibble:
- never changes the type of the inputs (e.g. it does not convert strings to factors)
- never changes the names of variables
- only recycles inputs of length 1
- never creates row.names()
Subsetting is a little different in tibbles: use [[ ]] or $ to extract a column.
You can read more about these features with vignette("tibble").
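The tibble behaviours listed on this slide can be checked with a small sketch. The toy columns `name` and `value` are made up for illustration and are not part of the lecture.

```r
library(tibble)

# The same toy inputs as a base data frame and as a tibble.
df <- data.frame(name = c("a", "b"), value = c(1, 2))
tb <- tibble(name = c("a", "b"), value = c(1, 2))

# Strings stay character vectors in a tibble.
class(tb$name)

# Selecting one column with single brackets:
# the data frame drops to a plain vector, the tibble stays a tibble.
class(df[, "value"])
class(tb[, "value"])

# Use [[ ]] or $ to extract a column as a vector.
tb[["value"]]
tb$value

# Only inputs of length 1 are recycled.
# tibble(x = 1:4, y = 1:2) would stop with an error instead of recycling y.
tb2 <- tibble(x = 1:4, y = 0)
```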

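5 The diamonds dataset
Contains prices and other attributes of almost 54,000 diamonds. Included in the tidyverse.
    data(diamonds)
    diamonds
    ## # A tibble: 53,940 x 10
    ##    carat cut       color clarity depth table price     x     y     z
    ##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    ##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
    ##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
    ##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
    ##  4  0.29 Premium   I     VS2      62.4    58   334  4.20  4.23  2.63
    ##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
    ##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
    ##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
    ##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
    ##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
    ## 10  0.23 Very Good H     VS1      59.4    61   338  4.00  4.05  2.39
    ## # ... with 53,930 more rows
More information with ?diamonds. Spreadsheet view in RStudio with View(diamonds).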

6 Introduction to ggplot2

7 The ggplot2 package
The ggplot2 package is part of the core tidyverse.
    library(tidyverse)
It is the most elegant and versatile tool for graphically visualizing data in R, offering a coherent system (or grammar) for building graphs.
R also has some basic built-in graphics, but we do not use them in this course. Instead, ggplot2 offers:
- a higher level of abstraction
- plots broken into layers
- beautiful graphics
- excellent documentation
- a large user base
"Base graphics are good for drawing pictures; ggplot2 graphics are good for understanding the data." (Wickham, 2012)

8 Building blocks of a ggplot2 graph
A ggplot2 graph is built up from a few basic elements:

Element      Symbol   Description
Data                  The raw data that you want to plot.
Geometries   geom_    The geometric shapes that will represent the data.
Aesthetics   aes()    Aesthetics of the geometric and statistical objects, such as color, size, shape and position.
Scales       scale_   Maps between the data and the aesthetic dimensions, such as data range to plot width or factor values to colors.

The ggplot() function initializes the basic graph structure. On its own it does not draw anything: you specify the different parts of a plot and add them with the + operator.

9 Creating a ggplot
Create a scatterplot with weight (carat) on the x axis and price on the y axis.
    ggplot(diamonds, aes(x=carat, y=price)) + geom_point()

10 Plots as objects
Whenever ggplot() is called, an object is created. It can be stored in a variable and displayed later by printing it.
    p <- ggplot(diamonds, aes(x=carat, y=price)) + geom_point()
    p

11 Saving plots
Now that you have your beautiful plot, you may want to save it as an image.
ggsave() is a convenient function for saving a plot. By default, it saves the last plot that you displayed, using the size of the current graphics device. It also guesses the type of graphics device from the file extension.
    ggsave(filename, plot = last_plot(), device = NULL, path = NULL,
           scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
           dpi = 300, limitsize = TRUE, ...)
The device can be either a device function (e.g. png), or one of eps, ps, tex (pictex), pdf, jpeg, tiff, png, bmp, svg or wmf (Windows only).
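As a minimal sketch of the call above: the output path is a temporary file chosen only for illustration, and the plot uses the first 1,000 rows so that it renders quickly.

```r
library(ggplot2)

# A small scatterplot to save (subset of the data, for speed only).
p <- ggplot(diamonds[1:1000, ], aes(x = carat, y = price)) + geom_point()

# Hypothetical output location: a temporary file with a .png extension,
# so ggsave() selects the PNG device from the file name.
out_file <- tempfile(fileext = ".png")

# Passing the plot explicitly avoids relying on "the last plot displayed";
# giving width and height avoids relying on the current device size.
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 150)
```

Naming the plot and its size explicitly makes the saved file reproducible from a script, whatever happens to be open in the plot window.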

12 Aesthetic mappings

13 What are aesthetic mappings?
Aesthetic means something you can see. Examples include:
- position (i.e., on the x and y axes)
- color ("outside" color)
- fill ("inside" color)
- shape (of points)
- linetype
- size
Each type of geom accepts only a subset of all aesthetics.
You can convey information about your data by mapping the aesthetics in your plot to the variables in your dataset.

14 Adding a color aesthetic
We can color the points based on clarity by adding another aesthetic.
    ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()

15 Adding a color aesthetic (2)
How does the quality of the color affect the price?
    ggplot(diamonds, aes(x=carat, y=price, color=color)) + geom_point()

16 Adding a shape aesthetic
    ggplot(diamonds, aes(x=carat, y=price, color=clarity, shape=cut)) + geom_point()

17 Facets

18 Facets for a categorical variable
Another way of adding categorical variables is to split your plot into facets.
    ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point() + facet_wrap(~ cut)

19 Facets for combinations of variables
You can facet your plot on the combination of two variables.
    ggplot(diamonds, aes(x=carat, y=price)) + geom_point() + facet_wrap(cut ~ clarity, nrow=5)

20 Exercise
Fuel economy data for 38 popular models of car.
    mpg
    ## # A tibble: 3 x 11
    ##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
    ##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
    ## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
    ## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
    ## 3 audi         a4      2.0  2008     4 manual(m6) f        20    31 p     compact
What plots does the following code make? What does . do?
    ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(drv ~ .)
    ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + facet_grid(. ~ cyl)

21 Geometric objects

22 What are geometric objects?
Geometric objects are the marks we put on the plot. Examples:
- points (geom_point, for scatter plots, dot plots, ...)
- lines (geom_line, for time series; geom_smooth, for trend lines, ...)
- boxplot (geom_boxplot, for boxplots)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
For a list of available geometric objects:
    help.search("geom_", package = "ggplot2")

23 Adding a smoothing trend
    ggplot(diamonds, aes(x=carat, y=price)) + geom_point() + geom_smooth()
The shaded area represents uncertainty in this smoothing curve.

24 Adding a smoothing trend (2)
With a color aesthetic, ggplot will create one smoothing curve for each color.
    ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point() + geom_smooth()

25 Changing the smoothing method
    ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point() + geom_smooth(method="lm")
Type help(geom_smooth, ggplot2) for more options.

26 Specifying geometric object aesthetics
Aesthetics can also be specified for a single geometric object.
    ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color=clarity)) + geom_smooth()

27 Aesthetic mappings vs fixed aesthetics
A fixed aesthetic is set outside aes() and applies to every observation. Here the same color and transparency are set for all points.
    ggplot(diamonds, aes(x=carat, y=price)) + geom_point(color="darkred", alpha=0.2)
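The distinction is easiest to see when the two forms are put side by side. In this sketch the object names `p_fixed` and `p_mapped` are illustrative; the second plot shows what happens when a fixed value is put inside aes() by mistake.

```r
library(ggplot2)

# Fixed aesthetic: every point is drawn in dark red and no legend is created.
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "darkred", alpha = 0.2)

# Mapped aesthetic: inside aes(), "darkred" is treated as a one-level variable,
# so ggplot2 picks a color from the default color scale and adds a legend.
p_mapped <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = "darkred"), alpha = 0.2)

# layer_data() returns the values actually used to draw each point.
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_mapped)$colour)
```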

28 Histograms
    ggplot(diamonds, aes(x=price)) + geom_histogram()

29 Customizing histograms
You can specify the number of bins or the bin width.
    ggplot(diamonds, aes(x=price)) + geom_histogram(bins=10)
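The slide shows the bins argument; the bin width alternative can be sketched as follows. The width of 500 (in the units of price) is an arbitrary choice for illustration.

```r
library(ggplot2)

# Fix the number of bins: the bin width is derived from the range of the data.
p_bins <- ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 10)

# Fix the bin width instead: the number of bins is derived from the range.
p_width <- ggplot(diamonds, aes(x = price)) + geom_histogram(binwidth = 500)

# Inspect the computed bins: one row per bin, with its limits and count.
bins_by_width <- layer_data(p_width)
head(bins_by_width[, c("xmin", "xmax", "count")])
```

Choosing binwidth is often easier to interpret, because each bar then covers a known range of the variable.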

30 Bar charts
A discrete analogue of a histogram is the bar chart. The geom_bar counts the number of instances of each discrete class.
    ggplot(diamonds, aes(x=clarity)) + geom_bar()

31 Exercise
Make this plot.

32 Boxplots
Boxplots graphically depict groups of numerical data through their quartiles.
    ggplot(diamonds, aes(x=clarity, y=carat)) + geom_boxplot()

33 Position adjustments
Position adjustments are used to adjust the position of each geom. The following position adjustments are available:
- position_identity: default of most geoms
- position_jitter: adds a small amount of random variation
- position_dodge: default of geom_boxplot
- position_stack: default of geom_bar, geom_histogram
- position_fill: useful for geom_bar, geom_histogram
The position parameter can be set as follows:
    geom_point(..., position="jitter")

34 Position adjustments for scatterplots
Overplotting: many points overlap each other. Here the x variable is categorical, but rounding of continuous variables can also cause overplotting.
    p0 <- ggplot(diamonds, aes(x=cut, y=depth))
    p1 <- p0 + geom_point()
    p2 <- p0 + geom_point(position = "jitter")

35 Position adjustments for bar charts
The stacking is performed automatically by the position adjustment specified by the position argument.
    p0 <- ggplot(data = diamonds, aes(x=cut, fill=clarity))
    p1 <- p0 + geom_bar()
    # p2 <- EXERCISE

36 Scales

37 Aesthetic mapping vs variable scaling
aes() assigns an aesthetic to a variable; it doesn't determine how the mapping should be done. For example, aes(shape = x) or aes(color = z) do not specify what shapes or what colors should be used.
To choose colors/shapes/sizes etc. you need to modify the corresponding scale. ggplot2 includes scales for:
- position
- color and fill
- size
- shape
- line type
Scales can be modified with functions of the form:
    scale_<aesthetic>_<type>()
In RStudio, type scale_ followed by TAB to list all available scales.

38 Scales for axes
Square-root transformation on the y-axis:
    p1 <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
    p2 <- p1 + scale_y_sqrt()

39 Scales for shapes
    p1 <- ggplot(diamonds, aes(x=carat, y=price, shape=cut)) + geom_point()
    p2 <- p1 + scale_shape_manual(values = c(1:5))

40 Scales for discrete colors
To choose specific colors for discrete variables we use scale_color_manual().
    p1 <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()
    color.values <- c("red", "orange", "yellow", "green", "blue")
    p2 <- p1 + scale_color_manual(values=color.values)
You can also use default palettes with scale_color_brewer().
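A short sketch of the palette alternative mentioned on this slide. The choice of the "Set1" ColorBrewer palette is an assumption made for illustration; any palette with at least five colors would do.

```r
library(ggplot2)

p1 <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()

# Replace the default colors with a predefined ColorBrewer palette.
# "Set1" is a qualitative palette, chosen here only as an example.
p3 <- p1 + scale_color_brewer(palette = "Set1")

# One color is assigned to each of the five levels of cut.
unique(layer_data(p3)$colour)
```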

41 Scales for continuous colors
For continuous variables we use scale_color_gradient() and specify the end-points of the color spectrum.
    p1 <- ggplot(diamonds, aes(x=carat, y=price, color=price)) + geom_point()
    # p2 <- EXERCISE
You can also scale the values of the variable corresponding to color:
    scale_color_gradient(low = "blue", high = "red", trans = "log10")

42 Manual transformations
You can also apply your own transformations to the data, e.g. for position scaling. Square-root transformation on the y-axis, first through a scale and then by transforming the variable directly:
    p1 <- ggplot(diamonds, aes(x=carat, y=price)) + geom_point() + scale_y_sqrt()
    p2 <- ggplot(diamonds, aes(x=carat, y=sqrt(price))) + geom_point()
Note that the labels on the y-axis are different: p1 labels the axis in the original units of price, while p2 labels it in units of sqrt(price).

43 Modify axis, legend, and plot labels
Good labels are critical for making your plots accessible to a wider audience. Some of the most useful ggplot2 functions:
- labs(...)
- xlab(label)
- ylab(label)
- ggtitle(label, subtitle = NULL)
You can even display mathematical formulae, using the function expression() and the syntax of plotmath expressions. Alternatively, the R package latex2exp lets you use LaTeX to typeset math.
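The labelling functions listed above can be combined in a single labs() call, as in this sketch. The label strings are made up for illustration.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(
    title = "Diamond prices",          # same effect as ggtitle()
    subtitle = "Price against weight",
    x = "Weight (carat)",              # same effect as xlab()
    y = "Price (US dollars)",          # same effect as ylab()
    color = "Cut quality"              # legend title for the color aesthetic
  )
```

Naming an aesthetic inside labs(), as with color here, is how the legend title is changed.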

44 Axis labels with mathematical expressions
Square-root transformation on the y-axis:
    p1 <- ggplot(diamonds, aes(x=carat, y=sqrt(price))) + geom_point()
    p2 <- p1 + labs(x = "carat", y = expression(sqrt(price)))

45 Statistical transformations

46 What are statistical transformations?
Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot:
- Bar charts and histograms bin your data and then plot bin counts.
- Smoothers fit a model to your data and then plot predictions.
- Boxplots compute a robust summary of the distribution and then display it.
The algorithm used to calculate new values for a graph is called a stat.

47 geom vs. stat
Credit to http://r4ds.had.co.nz
You can generally use geoms and stats interchangeably. Every geom has a default stat, and every stat has a default geom.
    p1 <- ggplot(data = diamonds) + geom_bar(aes(x = cut))
    p2 <- ggplot(data = diamonds) + stat_count(aes(x = cut))

48 So why should you know about stat?
You may want to:
- Override the default stat. E.g., in a bar chart with frequencies already in the data set, use stat_identity instead of the default stat_count.
- Override the default mapping from transformed variables to aesthetics. E.g., a bar chart of proportions, rather than counts.
- Draw greater attention to the statistical transformation in your code. E.g., stat_summary summarises the y values for each unique x value.
ggplot2 provides over 20 stats for you to use. You can get help in the usual way:
    help.search("stat_", package = "ggplot2")
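The three cases above can be sketched as follows. The summary table `cut_counts` is built here only for illustration, and the code assumes ggplot2 3.3.0 or later for after_stat() and the fun argument; older versions wrote ..prop.. and fun.y instead.

```r
library(ggplot2)

# 1. Override the default stat: the frequencies are already in the data.
#    cut_counts is a hypothetical pre-computed table with columns cut and Freq.
cut_counts <- as.data.frame(table(cut = diamonds$cut))
p_identity <- ggplot(cut_counts, aes(x = cut, y = Freq)) +
  geom_bar(stat = "identity")

# 2. Override the default mapping: plot proportions instead of counts.
#    group = 1 makes the proportions relative to the whole data set.
p_prop <- ggplot(diamonds, aes(x = cut)) +
  geom_bar(aes(y = after_stat(prop), group = 1))

# 3. Make the statistical transformation explicit:
#    the median depth for each unique value of cut.
p_summary <- ggplot(diamonds, aes(x = cut, y = depth)) +
  stat_summary(fun = median, geom = "point")
```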

49 The layered grammar of graphics

50 A code template
At this point we have a foundation to make any type of plot.
    ggplot(data = <DATA>) +
      <GEOM_FUNCTION>(mapping = aes(<mappings>), stat = <STAT>, position = <POSITION>) +
      <COORDINATE_FUNCTION> +
      <FACET_FUNCTION>
This composes the grammar of graphics, a formal system for building plots.
In practice, we don't need to supply all of these: ggplot2 will provide useful defaults for everything except:
- the data
- the mappings
- the geom function
We did not discuss coordinate systems: they are a little more complicated. The default is Cartesian: x and y positions act independently to determine the location of each point.
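As an illustration, here is one way to fill in every slot of the template with the diamonds data. The particular choices (a dodged bar chart, flipped coordinates, facets by color) are assumptions made for the example, not a plot from the lecture.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +                   # <DATA>
  geom_bar(                                      # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),      # <mappings>
    stat = "count",                              # <STAT> (the default for geom_bar)
    position = "dodge"                           # <POSITION>
  ) +
  coord_flip() +                                 # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                            # <FACET_FUNCTION>
```

Only the data, the mappings and geom_bar() are strictly required here; dropping the other arguments and functions falls back to the defaults.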

51 Summary of the workflow
At this point we have developed a general recipe for making any plot.
1. Specify the dataset
2. Transform it into the information that you want to display (stat_)
3. Choose a geometric object to represent each observation in the transformed data (geom_)
4. Select a coordinate system to place the geoms into (default: Cartesian)
5. If needed, extend the plot by adding layers
6. If needed, create multiple plots with facets
Have we missed anything?
- Scales (discussed earlier): from data values to visual properties
- Aesthetics unrelated to the data (later)
- Annotations (omitted): layers that don't inherit global settings from the plot. Used to add fixed reference data to a plot.
- Programming with ggplot2 (omitted): automating the creation of plots

52 Conclusion

53 Learn more about ggplot2
- Use the help:
      ?function_name
      help(function_name, package_name)
- Online package reference can be easier to search.
- Lots of other online resources. A good starting point:
- Book: R for Data Science, by Garrett Grolemund and Hadley Wickham.

54 Other R packages for plotting
Even ggplot2 cannot literally make any plot. Some kinds of plot require specialized packages.
- plotly: interactive and 3D plots, good for online publications
- gridExtra: easily combine plots into grids
- ggnet2: visualizing networks
- heatmaply: interactive heatmaps
- ggmap: retrieve maps from popular online mapping services and plot them using the ggplot2 framework
More information is available online; last year's course material is a good starting point.

55 Common problems
As you start to run R code, you're likely to run into problems. Don't worry - it happens to everyone.
Common things to check:
- Pair every ( with ) and every opening " with a closing ".
- If there is a + at the end of a line, R expects you to complete the expression.
- The + has to go at the end of the line, not the beginning.
Do not be afraid to use the help:
    ?function_name
    help(function_name, package_name)
Carefully read any error messages. Another great tool is Google: try looking up the error message.
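The rule about the position of + can be illustrated with a short sketch. The incorrect form is left commented out so that the block runs without error.

```r
library(ggplot2)

# Correct: the first line ends with +, so R knows the expression continues.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# Incorrect (commented out): the first line is already a complete expression,
# so R evaluates it on its own, and the line starting with + then fails.
# ggplot(diamonds, aes(x = carat, y = price))
#   + geom_point()
```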

56 Next time Importing data from files


An introduction to R Graphics 4. ggplot2

An introduction to R Graphics 4. ggplot2 An introduction to R Graphics 4. ggplot2 Michael Friendly SCS Short Course March, 2017 http://www.datavis.ca/courses/rgraphics/ Resources: Books Hadley Wickham, ggplot2: Elegant graphics for data analysis,

More information

Publication-quality figures with Inkscape

Publication-quality figures with Inkscape Publication-quality figures with Inkscape In Lab 3 we briefly learnt about the different formats available to save the plots we create in R and how to modify them in PowerPoint and Adobe Illustrator. Today

More information

An Introduction to R Graphics

An Introduction to R Graphics An Introduction to R Graphics PnP Group Seminar 25 th April 2012 Why use R for graphics? Fast data exploration Easy automation and reproducibility Create publication quality figures Customisation of almost

More information

Data Handling: Import, Cleaning and Visualisation

Data Handling: Import, Cleaning and Visualisation Data Handling: Import, Cleaning and Visualisation 1 Data Display Lecture 11: Visualisation and Dynamic Documents Prof. Dr. Ulrich Matter (University of St. Gallen) 13/12/18 In the last part of a data pipeline

More information

Package ggsubplot. February 15, 2013

Package ggsubplot. February 15, 2013 Package ggsubplot February 15, 2013 Maintainer Garrett Grolemund License GPL Title Explore complex data by embedding subplots within plots. LazyData true Type Package Author Garrett

More information

<style> pre { overflow-x: auto; } pre code { word-wrap: normal; white-space: pre; } </style>

<style> pre { overflow-x: auto; } pre code { word-wrap: normal; white-space: pre; } </style> --- title: "Visualization for Data Management Modules Wheat CAP 2018" author: name: "Jean-Luc Jannink" affiliation: "USDA-ARS" date: "June 7, 2018" output: html_document: fig_height: 6 fig_width: 12 highlight:

More information

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Aug, 2017 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

ggplot in 3 easy steps (maybe 2 easy steps)

ggplot in 3 easy steps (maybe 2 easy steps) 1 ggplot in 3 easy steps (maybe 2 easy steps) 1.1 aesthetic: what you want to graph (e.g. x, y, z). 1.2 geom: how you want to graph it. 1.3 options: optional titles, themes, etc. 2 Background R has a number

More information

PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, May 6th, 2016

PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, May 6th, 2016 PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, sarah.bonnin@crg.eu May 6th, 216 Contents Introduction 2 Packages................................................... 2

More information

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 216 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

More information

Introduction to Plot.ly: Customizing a Stacked Bar Chart

Introduction to Plot.ly: Customizing a Stacked Bar Chart Introduction to Plot.ly: Customizing a Stacked Bar Chart Plot.ly is a free web data visualization tool that allows you to download and embed your charts on other websites. This tutorial will show you the

More information

1 Introduction to Using Excel Spreadsheets

1 Introduction to Using Excel Spreadsheets Survey of Math: Excel Spreadsheet Guide (for Excel 2007) Page 1 of 6 1 Introduction to Using Excel Spreadsheets This section of the guide is based on the file (a faux grade sheet created for messing with)

More information

Homework 1 Excel Basics

Homework 1 Excel Basics Homework 1 Excel Basics Excel is a software program that is used to organize information, perform calculations, and create visual displays of the information. When you start up Excel, you will see the

More information

Exploratory data analysis

Exploratory data analysis Lecture 4 STATS/CME 195 Matteo Sesia Stanford University Spring 2018 Contents Exploratory data analysis Exploratory data analysis What is exploratory data analysis (EDA) In this lecture we discuss how

More information

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9 Contents 1 Introduction to Using Excel Spreadsheets 2 1.1 A Serious Note About Data Security.................................... 2 1.2

More information

Descriptive Statistics, Standard Deviation and Standard Error

Descriptive Statistics, Standard Deviation and Standard Error AP Biology Calculations: Descriptive Statistics, Standard Deviation and Standard Error SBI4UP The Scientific Method & Experimental Design Scientific method is used to explore observations and answer questions.

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

0 Graphical Analysis Use of Excel

0 Graphical Analysis Use of Excel Lab 0 Graphical Analysis Use of Excel What You Need To Know: This lab is to familiarize you with the graphing ability of excels. You will be plotting data set, curve fitting and using error bars on the

More information

Data Visualization. Andrew Jaffe Instructor

Data Visualization. Andrew Jaffe Instructor Module 9 Data Visualization Andrew Jaffe Instructor Basic Plots We covered some basic plots previously, but we are going to expand the ability to customize these basic graphics first. 2/45 Read in Data

More information

Financial Econometrics Practical

Financial Econometrics Practical Financial Econometrics Practical Practical 3: Plotting in R NF Katzke Table of Contents 1 Introduction 1 1.0.1 Install ggplot2................................................. 2 1.1 Get data Tidy.....................................................

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

= 3 + (5*4) + (1/2)*(4/2)^2.

= 3 + (5*4) + (1/2)*(4/2)^2. Physics 100 Lab 1: Use of a Spreadsheet to Analyze Data by Kenneth Hahn and Michael Goggin In this lab you will learn how to enter data into a spreadsheet and to manipulate the data in meaningful ways.

More information

Package arphit. March 28, 2019

Package arphit. March 28, 2019 Type Package Title RBA-style R Plots Version 0.3.1 Author Angus Moore Package arphit March 28, 2019 Maintainer Angus Moore Easily create RBA-style graphs

More information

PRESENTING DATA. Overview. Some basic things to remember

PRESENTING DATA. Overview. Some basic things to remember PRESENTING DATA This handout is one of a series that accompanies An Adventure in Statistics: The Reality Enigma by me, Andy Field. These handouts are offered for free (although I hope you will buy the

More information

Session 5 Nick Hathaway;

Session 5 Nick Hathaway; Session 5 Nick Hathaway; nicholas.hathaway@umassmed.edu Contents Adding Text To Plots 1 Line graph................................................. 1 Bar graph..................................................

More information

Properties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013

Properties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013 Properties of Data Digging into Data: Jordan Boyd-Graber University of Maryland February 11, 2013 Digging into Data: Jordan Boyd-Graber (UMD) Properties of Data February 11, 2013 1 / 43 Roadmap Munging

More information

Graphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version):

Graphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version): Graphing on Excel Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version): The first step is to organize your data in columns. Suppose you obtain

More information

Introduction to Minitab 1

Introduction to Minitab 1 Introduction to Minitab 1 We begin by first starting Minitab. You may choose to either 1. click on the Minitab icon in the corner of your screen 2. go to the lower left and hit Start, then from All Programs,

More information

Working with Charts Stratum.Viewer 6

Working with Charts Stratum.Viewer 6 Working with Charts Stratum.Viewer 6 Getting Started Tasks Additional Information Access to Charts Introduction to Charts Overview of Chart Types Quick Start - Adding a Chart to a View Create a Chart with

More information

Introductory Tutorial: Part 1 Describing Data

Introductory Tutorial: Part 1 Describing Data Introductory Tutorial: Part 1 Describing Data Introduction Welcome to this R-Instat introductory tutorial. R-Instat is a free, menu driven statistics software powered by R. It is designed to exploit the

More information

DATA VISUALIZATION WITH GGPLOT2. Coordinates

DATA VISUALIZATION WITH GGPLOT2. Coordinates DATA VISUALIZATION WITH GGPLOT2 Coordinates Coordinates Layer Controls plot dimensions coord_ coord_cartesian() Zooming in scale_x_continuous(limits =...) xlim() coord_cartesian(xlim =...) Original Plot

More information

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created Graphics in R There are three plotting systems in R base Convenient, but hard to adjust after the plot is created lattice Good for creating conditioning plot ggplot2 Powerful and flexible, many tunable

More information

CSC 1315! Data Science

CSC 1315! Data Science CSC 1315! Data Science Data Visualization Based on: Python for Data Analysis: http://hamelg.blogspot.com/2015/ Learning IPython for Interactive Computation and Visualization by C. Rossant Plotting with

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer Hadley Wickham ggplot2 Elegant Graphics for Data Analysis July 26, 2016 Springer To my parents, Alison & Brian Wickham. Without them, and their unconditional love and support, none of this would have

More information

Introduction to R. Introduction to Econometrics W

Introduction to R. Introduction to Econometrics W Introduction to R Introduction to Econometrics W3412 Begin Download R from the Comprehensive R Archive Network (CRAN) by choosing a location close to you. Students are also recommended to download RStudio,

More information

Mixed models in R using the lme4 package Part 2: Lattice graphics

Mixed models in R using the lme4 package Part 2: Lattice graphics Mixed models in R using the lme4 package Part 2: Lattice graphics Douglas Bates University of Wisconsin - Madison and R Development Core Team University of Lausanne July 1,

More information

ENV Laboratory 2: Graphing

ENV Laboratory 2: Graphing Name: Date: Introduction It is often said that a picture is worth 1,000 words, or for scientists we might rephrase it to say that a graph is worth 1,000 words. Graphs are most often used to express data

More information

Introduction To Inkscape Creating Custom Graphics For Websites, Displays & Lessons

Introduction To Inkscape Creating Custom Graphics For Websites, Displays & Lessons Introduction To Inkscape Creating Custom Graphics For Websites, Displays & Lessons The Inkscape Program Inkscape is a free, but very powerful vector graphics program. Available for all computer formats

More information

LAB 2: DATA FILTERING AND NOISE REDUCTION

LAB 2: DATA FILTERING AND NOISE REDUCTION NAME: LAB TIME: LAB 2: DATA FILTERING AND NOISE REDUCTION In this exercise, you will use Microsoft Excel to generate several synthetic data sets based on a simplified model of daily high temperatures in

More information

JUST THE MATHS UNIT NUMBER STATISTICS 1 (The presentation of data) A.J.Hobson

JUST THE MATHS UNIT NUMBER STATISTICS 1 (The presentation of data) A.J.Hobson JUST THE MATHS UNIT NUMBER 18.1 STATISTICS 1 (The presentation of data) by A.J.Hobson 18.1.1 Introduction 18.1.2 The tabulation of data 18.1.3 The graphical representation of data 18.1.4 Exercises 18.1.5

More information

Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 10, 2018

Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 10, 2018 Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 1, 218 Vignette Outline This vignette provides examples of conducting exploratory data analysis (EDA) on NAEP

More information