Plotting with Rcell (Version 1.2-5)

Similar documents
Statistical transformations

Introduction to Graphics with ggplot2

An introduction to ggplot: An implementation of the grammar of graphics in R

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1)

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

Econ 2148, spring 2019 Data visualization

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

The diamonds dataset Visualizing data in R with ggplot2

Getting started with ggplot2

Lecture 4: Data Visualization I

ggplot2: elegant graphics for data analysis

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

Facets and Continuous graphs

User manual forggsubplot

Data Visualization in R

Introduction to R and the tidyverse. Paolo Crosetto

Data Visualization in R

Statistics Lecture 6. Looking at data one variable

Data Visualization. Module 7

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

Graphical critique & theory. Hadley Wickham

Introduction to Data Visualization

Package ggsubplot. February 15, 2013

1 The ggplot2 workflow

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Creating a Box-and-Whisker Graph in Excel: Step One: Step Two:

ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

ggplot2 for beginners Maria Novosolov 1 December, 2014

Visualizing Data: Customization with ggplot2

An Introduction to R Graphics

PRESENTING DATA. Overview. Some basic things to remember

Visualizing the World

Install RStudio from - use the standard installation.

Creating an Autocorrelation Plot in ggplot2

03 - Intro to graphics (with ggplot2)

You submitted this quiz on Sat 17 May :19 AM CEST. You got a score of out of

Advanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel

Statistical Graphs & Charts

Package lvplot. August 29, 2016

Creating elegant graphics in R with ggplot2

LondonR: Introduction to ggplot2. Nick Howlett Data Scientist

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created

Advanced Multivariate Continuous Displays and Diagnostics

Introduction to ggvis. Aimee Gott R Consultant

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Page 1. Graphical and Numerical Statistics

AN INTRODUCTION TO LATTICE GRAPHICS IN R

Package beanplot. R topics documented: February 19, Type Package

Individual Covariates

Data visualization with ggplot2

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

STAT 1291: Data Science Lecture 4 - Data Visualization: Composing/dissecting Data Graphics Sungkyu Jung

Mixed models in R using the lme4 package Part 2: Lattice graphics

An introduction to R Graphics 4. ggplot2

Package plotluck. November 13, 2016

Visualizing ASH. John Beresniewicz NoCOUG 2018

Computing Fundamentals Plotting

Notes for week 3. Ben Bolker September 26, Linear models: review

Intro to R for Epidemiologists

Plotting - Practice session

Data Handling: Import, Cleaning and Visualisation

Math 227 EXCEL / MEGASTAT Guide

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

CS1100: Computer Science and Its Applications. Creating Graphs and Charts in Excel

Stat 849: Plotting responses and covariates

Stat 849: Plotting responses and covariates

Using the DATAMINE Program

1 Introduction to Using Excel Spreadsheets

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel

DATA VISUALIZATION WITH GGPLOT2. Coordinates

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Introduction to Lattice Graphics. Richard Pugh 4 th December 2012

Name Date Types of Graphs and Creating Graphs Notes

S4C03, HW2: model formulas, contrasts, ggplot2, and basic GLMs

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 10, 2018

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

CREATING THE DISTRIBUTION ANALYSIS

Scientific Graphing in Excel 2007

Package flowchic. R topics documented: October 16, Encoding UTF-8 Type Package

Extending ODS Output by Incorporating

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

RSG online program documentation

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Data Visualization. Andrew Jaffe Instructor

Solution to Tumor growth in mice

MOVING FROM CELL TO CELL

Bar Charts and Frequency Distributions

Regression III: Advanced Methods

Applied Regression Modeling: A Business Approach

Az R adatelemzési nyelv

Making R Graphs, For People Who Don t Want To Learn R

Learning Objectives for Data Concept and Visualization

Package ggiraphextra

Transcription:

Plotting with Rcell (Version 1.2-) Alan Bush October 7, 13 1 Introduction Rcell uses the functions of the ggplots2 package to create the plots. This package created by Wickham implements the ideas of Wilkinson (0) of a grammar of graphics, i.e. a formal description of statistical graphics. I recommend to read the book on the ggplot2 package. If you are already familiar with this package you can skip this document and read the help on cplot. Rcell plotting functions return objects of class ggplot, and therefore can be combined with other functions of the ggplot2 package in a transparent manner. In this document I will present and explain different plots that can be made with the Rcell plotting functions. If you haven t done so, read the Getting Started with Rcell vignette before proceeding. > vignette('rcell') 2 A Grammar of Graphics There exist an enormous variety of statistical plots, as a few example we can mention scatter plots, bar plots, histograms, piecharts, box plots, etc. The grammar of graphics attempts to describe all these different plots in a simple, coherent language. It defines the following elements to describe a plot: data: the dataset to be plotted geom: the geometrical object used to represent the data (e.g. points, bars, lines, etc) aesthetic mapping: defines what variable of the dataset will be mapped to each aesthetic attribute of the geometrical object stat: statistical transformations to be performed before plotting (e.g. calculate mean, bin the data for a histogram, etc) position: minnor position adjustments of the geoms scale: defines the scale for the mapping between a variable and a aesthetic attribute (e.g. lineal, log) coordinate: the coordinate system to be used. This is normally cartesian coordinates, but can also be polar coordinates for example. faceting: describes how the plot should be divided into a series of subplots. ggplot2 implements a layered grammar of graphics, meaning that several layers containing the mentioned elements can be added to a plot. In the following sections I will try to make this clear by a series of explained examples. 1

3 Scatter Plots If you haven t done so, load the Rcell package and the example dataset with > library(rcell) > data(acl394filtered) You can create a scatter plot of YFP vs CFP total fluorescence (Figure 1 left) by typing > cplot(x, x=, y=f.tot.c, subset=t.frame==13) The first argument of the cplot function is the cell.data object, i.e. the data element of the plot. The second and third argument specify the aesthetic mapping; variable of the dataset should be mapped to the x position aesthetic, and f.tot.c to the y position aesthetic (the subset argument just indicates we don t want to plot the entire dataset, just some registers). As you can see not all elements of the grammar where specified. The ones that are not are set to default values. For example geom is set to "point", meaning that a point is going to be used to represent the data. The aesthetic attributes available depend on the selected geom. For example a point geom has x, y, size, shape, color, fill and alpha (transparency). To see all aesthetic attributes and default values of the available geom go to http://had.co.nz/ggplot2. There are many attributes that can be mapped to variables, therefore increasing the amount of information a plot contains. You can specify the aesthetic mapping of these attributes as we did before. > cplot(x, x=, y=f.tot.c, size=a.tot, color=factor(af.nm), + shape=factor(af.nm), alpha=0., subset=t.frame==13) We have mapped the size aesthetic to the a.tot variable, and the color and shape aesthetics to the AF.nM variable (Figure 1 right). This mapping redundancy can be useful to present the data to color blind persons or if the plot is going to be presented in a black and white device (e.g. paper). The use of the factor function for AF.nM forces the use of discrete scales. Note that the alpha (transparency) aesthetic has been mapped to a constat value and not a variable. Making objects semi-transparents can reduce over-plotting. 4 Time Courses and Over-plotting A time course plot is basically a scatter plot in which the x axis is time. To get a time course of YFP fluorescence (Figure 2 left) type in > cplot(x, x=t.frame, y=, subset=af.nm==) The same plot can be obtained using a formula notation of the form y~x as the second argument in the call. When using this notation you can use the generic function plot instead of cplot. The next two lines result in the same plot (Figure 2 left). > cplot(x, ~t.frame, subset= AF.nM==) > plot(x, ~t.frame, subset= AF.nM==) You can see that the t.frame variable takes discrete levels, thus causing overplotting; too many points fall in the same region and you can no longer estimate the density (Figure 2 left). The plot is saturated. There are several strategies to avoid overplotting, for example one could reduce the size of the geom and use alpha blending. Another strategy is to use the position adjustment. By default this is set to "identity", meaning that the points fall where the data says so. If you change this to "jitter", the points are jittered (a small white noise is added to the x position) to avoid overplotting (Figure 2 right). > cplot(x, ~t.frame, size=1, alpha=0.3, position="jitter", subset=af.nm==) 2

1e+06 0e+00 6e+06 9e+06 f.tot.c 1e+06 0e+00 6e+06 9e+06 f.tot.c a.tot 20 00 70 00 factor(af.nm) 1.2 2. Figure 1: Left: Scatter plot of total CFP (f.tot.c) vs YFP () fluorescence. Right: same plot with shape and color mapped to treatment (AF.nM), and size mapped to cell area (a.tot). Figure 2: Left: Time course of YFP fluorescence () for the nm dose. Right: same plot with smaller, semitransparent and jittered points to reduce over-plotting. 3

Statistical Transformations Another way to avoid overplotting is by using other statistical plots, for example a box and whisker plot (Figure 3 left). > cplot(x, ~t.frame, group=t.frame, geom="boxplot", subset=af.nm==) Setting the geom to "boxplot" creates the desired plot. If you think the steps required to produce this plot, you will realize that some preprocessing of the data has to occur before the plot is actually done. For instance the 2%, 0% and 7% quantiles have to be calculated for each group (defined by the group argument). This means that a statistical transformation (stat) of the data has to be done. The stat that does this calculation is called "boxplot". When we selected the "boxplot" geom, the stat was set by default to "boxplot". Every geom has a default stat. For example geom "point" has the default stat "identity" that does no transformation to the data, exactly what we need to do scatter plots. Another way to show this data is to plot the mean and a confidence interval for the mean, for each time. We can do this selecting stat "summary" and specifying which function is to be used to calculate the summary statistics by the fun.data argument. We have to use several geoms to show all this information. > cplot(x, ~t.frame, subset= AF.nM ==, stat="summary", fun.data="mean_cl_normal", + geom=c("point","errorbar","line")) Because plotting the mean is a fairly common thing to do, Rcell provides the convenience function cplotmean that sets stat to "summary" and fun.data to "mean_cl_normal". The same plot (Figure 3 right) can be done with > cplotmean(x, ~t.frame, subset=af.nm==) Note that the plotting range used for this plot is the data range. You probably want to zoom-in the y axis to show only the range of the means. To do this you can use the yzoom argument, for example yzoom=c(0,6e6). The ylim argument is also available, but it filters the data BEFORE the statistical transformation, thus the calculated mean will depended on the plotting range! A warning is issued if you use ylim in a call to cplotmean. 6 Adding Layers to a Plot Many times plots with several layers are required. For example we may want to add a layer with the mean and confidence interval for the mean, to a scatter plot. To create such a multilayer plot we first create a plot and, instead of displaying it, we assign it to a variable 1. > p <- cplot(x, ~t.frame, size=1, alpha=0.3, position="jitter", subset=af.nm==) We can then add a layer with the mean using the clayermean function (Figure 4 left). > p + clayermean(color="red") When used with no cell.data or mapping arguments, the added layer uses the same data and aesthetic mapping as the plot to which it is being added. You can specify the layer to use another dataset or aesthetic mapping by passing arguments to clayermean. The default geom for the layer can also be changed, for example to the "smooth" geom (Figure 4 right). > p + clayermean(geom="smooth") Only layers can be added to plots, you can t add plots to plots nor layers to layers. Create the layers with clayer and clayermean. 1 The plotting functions return a object of class ggplot 4

6e+06 9e+06 6e+06 4e+06 0e+00 0 t.frame 0 t.frame Figure 3: Left: box-whisker plot for the time course data of. intervals of the mean for the same data. Right: means and 9% confidence Figure 4: Left: means and confidence intervals layer added to the scatter plot. Right: Using the smooth geom to represent the mean and CI.

6e+06 1 8 1 e+06 4e+06 1e+06 0e+00 6e+06 22 29 e+06 4e+06 1e+06 0e+00 0 0 t.frame Figure : YFP vs time, facetted by pos (one position of each dose was selected). 7 Faceting Plots Faceting creates many subplots showing different subsets of the data. You can facet a plot by providing a facets argument. This argument should be a formula. If only the right term is provided (~myvar) then a one dimensional strip of subplots is created and then wrapped into 2D to save space. An example is shown in Figure. If you provide both terms to the facets formula, a grid of subplots is created. > cplot(x, ~t.frame, facets=~pos, geom="smooth", method="loess", + subset=pos%in%c(1,8,1,22,29), yzoom=c(0,6e6)) 8 Histograms Histograms are automatically created if you specify the x, but not the y aesthetic to cplot (Figure 6 left). This sets stat to "bin" and geom to "bar". In this example the binwidth is set to 4e, but a default of the range of the data divided by 30 is used if this argument is omitted. > cplot(x, ~, subset=t.frame==13, binwidth=4e) As with other plots, you can map other aesthetics to variables. In geom "bar" the color aesthetic refers to the border of the bar. The fill aesthetic specifies the filling color (Figure 6 right). 6

40 40 30 30 factor(af.nm) 1.2 2. 0 0 0.0e+00 3.0e+06 6.0e+06 9.0e+06 1.2e+07 0.0e+00 3.0e+06 6.0e+06 9.0e+06 1.2e+07 Figure 6: Left: histogram for total YFP fluorescence. Right: Same histogram, with fill aesthetic mapped to pheromone dose (AF.nM) > cplot(x, ~, subset=t.frame==13, fill=factor(af.nm), binwidth=4e) Note that we transformed the numerical variable AF.nM to a factor. This was done because the color scale applied to factors makes it easier to differentiate between levels of the variable. You can see that the title of the legend contains is "factor(af.nm)". To avoid the inclusion of factor in this legend you can use the as.factor argument (see next example). When specifying the filling color, the position adjustment is set to "stack", which means that bars of different colors are stacked one over the other (the outline of the histogram doesn t change). This behavior can be changed by using other position adjustments, for example "dodge" and "fill" are shown in Figure 7, left and right respectively. If you use position "identity" the histograms are plotted one over an other, so you will probably just see the last one to be plotted. You can avoid this by using geom "step" (Figure 8 left). If you do so you have to specify the stat to "bin" as this is not the default. > cplot(x, x=, subset=t.frame==7, color=af.nm, as.factor="af.nm", + geom="step", stat="bin", binwidth=4e) A smoothed version of this plot is obtained with a density plot, as shown in Figure 8 right. > cplot(x, x=, subset=t.frame==7, geom="density", color=factor(af.nm), + yzoom=c(0,1e-6), binwidth=4e) References Leland Wilkinson The Grammar of Graphics Statistics and Computation. Springer, 0 Hadley Wickham. ggplot2: Elegant graphics for Data Analysis Springer 09 7

1.00 1 0.7 AF.nM 1.2 2. 0.0 AF.nM 1.2 2. 0.2 0 0.00 0.0e+00 3.0e+06 6.0e+06 9.0e+06 1.2e+07 0.0e+00 3.0e+06 6.0e+06 9.0e+06 1.2e+07 Figure 7: Left: histogram for total YFP fluorescence, using position dodge. Right: Same histogram, with position fill 1e 06 8e 07 AF.nM 1.2 2. density 6e 07 4e 07 factor(af.nm) 1.2 2. 2e 07 0 0e+00 4e+06 6e+06 0e+00 0e+00 4e+06 6e+06 Figure 8: Left: histogram for total YFP fluorescence, using geom step and stat bin. histogram, with geom density Right: Same 8

Colman-Lerner, Gordon et al. (0). Regulated cell-to-cell variation in a cell-fate decision system. Nature, 437(709):699-706. Chernomoretz, Bush et al. (08). Using Cell-ID 1.4 with R for Microscope-Based Cytometry Curr Protoc Mol Biol., Chapter 14:Unit 14.18. 9