Making plots in R [things I wish someone told me when I started grad school]

Similar documents
Introduction to R and R-Studio Toy Program #1 R Essentials. This illustration Assumes that You Have Installed R and R-Studio

Basics of Plotting Data

IST 3108 Data Analysis and Graphics Using R Week 9

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function

Introduction to RStudio

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Using Large Data Sets Workbook Version A (MEI)

University of California, Los Angeles Department of Statistics

Combo Charts. Chapter 145. Introduction. Data Structure. Procedure Options

Error-Bar Charts from Summary Data

Stat 290: Lab 2. Introduction to R/S-Plus

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /6/ /13

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky

CREATING THE DISTRIBUTION ANALYSIS

IT 403 Practice Problems (1-2) Answers

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

LAB #1: DESCRIPTIVE STATISTICS WITH R

Regression III: Advanced Methods

Az R adatelemzési nyelv

An introduction to ggplot: An implementation of the grammar of graphics in R

R Lattice Graphics. Paul Murrell

Statistical Programming with R

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Ch3 E lement Elemen ar tary Descriptive Descriptiv Statistics

Basic Statistical Graphics in R. Stem and leaf plots 100,100,100,99,98,97,96,94,94,87,83,82,77,75,75,73,71,66,63,55,55,55,51,19

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

An Introduction to Minitab Statistics 529

Lecture 6: Chapter 6 Summary

More Summer Program t-shirts

Install RStudio from - use the standard installation.

Chuck Cartledge, PhD. 20 January 2018

Chapter 2: The Normal Distribution

CHAPTER 2 Modeling Distributions of Data

Thomas Vincent Head of Data Science, Getty Images

Bar Charts and Frequency Distributions

Chapter 2: The Normal Distributions

Data Visualization in R

Chapter 6: DESCRIPTIVE STATISTICS

9 POINTS TO A GOOD LINE GRAPH

Density Curves Sections

Package gibbs.met documentation

This document is designed to get you started with using R

Gene Expression an Overview of Problems & Solutions: 1&2. Utah State University Bioinformatics: Problems and Solutions Summer 2006

Corso di Identificazione dei Modelli e Analisi dei Dati

Module 10. Data Visualization. Andrew Jaffe Instructor

A brief introduction to R

Making Science Graphs and Interpreting Data

In Minitab interface has two windows named Session window and Worksheet window.

R for IR. Created by Narren Brown, Grinnell College, and Diane Saphire, Trinity University

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

R Bootcamp Part I (B)

Introduction to Graphics with ggplot2

Statistics 251: Statistical Methods

An Introduction to R 2.2 Statistical graphics

Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health

CHAPTER 2: Describing Location in a Distribution

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

Computing With R Handout 1

Notes for week 3. Ben Bolker September 26, Linear models: review

Business Statistics: R tutorials

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

1 Introduction to Using Excel Spreadsheets

Stat 302 Statistical Software and Its Applications SAS: Distributions

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Exercise 2.23 Villanova MAT 8406 September 7, 2015

Statistics Lecture 6. Looking at data one variable

R (and S, and S-Plus, another program based on S) is an interactive, interpretive, function language.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Data Visualization. Andrew Jaffe Instructor

Graphing with Microsoft Excel

Computing With R Handout 1

Package gibbs.met. February 19, 2015

An Introduction to R Graphics

Topic 5 - Joint distributions and the CLT

Chapter 2: Looking at Multivariate Data

User manual forggsubplot

Topic (3) SUMMARIZING DATA - TABLES AND GRAPHICS

iplots extreme Next-generation interactive graphics for analysis of large data Simon Urbanek AT&T Labs Statistics Research

Using Built-in Plotting Functions

Introduction to scientific programming in R

Tricking it Out: Tricks to personalize and customize your graphs.

3. Prepare all your graphs, illustrations and text by cutting them to size. For straight lines use a guillotine.

Assignments. Math 338 Lab 1: Introduction to R. Atoms, Vectors and Matrices

Nature Methods: doi: /nmeth Supplementary Figure 1

CHAPTER 2 DESCRIPTIVE STATISTICS

Visualizing the World

University of California, Los Angeles Department of Statistics

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

MATH11400 Statistics Homepage

3.3 The Five-Number Summary Boxplots

Minitab on the Math OWL Computers (Windows NT)

Exploring Our Family History Web Site Frequently Asked Questions

Chapter 3 - Displaying and Summarizing Quantitative Data

Topics for today Input / Output Using data frames Mathematics with vectors and matrices Summary statistics Basic graphics

STA490HY1Y. Initial Examination of Data

Monte Carlo sampling

Introduction to R. Hao Helen Zhang. Fall Department of Mathematics University of Arizona

Transcription:

Making plots in R [things I wish someone told me when I started grad school] Kirk Lohmueller Department of Ecology and Evolutionary Biology UCLA September 22, 2017

In honor of Talk Like a Pirate Day... What s a pirates favorite computer language?

In honor of Talk Like a Pirate Day... Rrrrrr!

In honor of Talk Like a Pirate Day... But, why?

In honor of Talk Like a Pirate Day... Because they get lost when they go to C

Learning objectives Know how to make simple plots: Scatterplot, histogram, density plot, bar plot Read large datasets into R Perform manipulations of large datasets Calculate some statistics in R Start to become familiar with simulation Begin to appreciate and enjoy Kirk s corny humor

R can simulate data from a probability distribution Let s look at the normal distribution: Matthew Freeman J Epidemiol Community Health 2006;60:6

Simulating data from a normal distribution #first, draw 1000 random values from a standard normal distribution (SD=1): s1<-rnorm(1000,mean=0,sd=1) #now do 1000 drawn from a normal distribution with SD=3. s3<-rnorm(1000,mean=0,sd=3) head(s1) [1] 0.26951848-2.43530911 1.15968499 0.09647798-0.74425935 0.40504897 head(s3) [1] 3.6718664 4.8193934-0.6078601 2.1520862 2.9089759-3.6002362

Basic histogram #plot histograms of both on same panel and save to a file: pdf(file="normal_hist.pdf", width=4,height=7); #open the file par(mfrow=c(2,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins hist(s1,col=2,xlab="value",main="sigma=1") #make first hist hist(s3,col=4,xlab="value",main="sigma=3") #make second hist dev.off() #shuts off current output device quartz 2

Basic histogram Sigma=1 Frequency 0 50 100 150 200 3 2 1 0 1 2 3 Value Sigma=3 Frequency 0 50 150 250 10 5 0 5 10 Value

Getting fancier... σ=1 Frequency 0 50 100 150 200 3 2 1 0 1 2 3 Value σ=3 Frequency 0 50 150 250 10 5 0 5 10 Value

How did I do that? #plot histograms of both on same panel and save to a file: pdf(file="normal_hist.fancy.pdf", width=4,height=7); #open the file par(mfrow=c(2,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins hist(s1,col=2,xlab="value",main=expression(paste(sigma,"=1"))) #make first hist hist(s3,col=4,xlab="value",main=expression(paste(sigma,"=3"))) #make second hist dev.off() #shuts off current output device pdf 2

Smooth density plot #make smooth density plot: pdf(file="normal_density.pdf", width=6,height=6); #open the file par(mfrow=c(1,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins plot(density(s1),col=2,lwd=4,xlab="value",xlim=c(-15,15),main="normal distribution") lines(density(s3),col=4,lwd=4) #add the SD=3 values legend(-15,0.4,c("sigma=1","sigma=3"),lwd=4,col=c(2,4),cex=1.5) #put a legend on #we can highlight the upper 10% of each distribution with a vertical line: abline(v=quantile(s1,0.9),lty=2,lwd=3,col=2) #puts a vertical line onto the plot for s1 abline(v=quantile(s3,0.9),lty=2,lwd=3,col=4) #puts a vertical line onto the plot for s3 dev.off() quartz 2

Smooth density plot Normal distribution Density 0.0 0.1 0.2 0.3 0.4 sigma=1 sigma=3 15 10 5 0 5 10 15 Value

More on quantile #quantile take a vector of stuff, and returns the value q such that p% of your distribution is less than q. #for example, find the 75th percentile of the standard normal distribution: quantile(s1,0.75) 75% 0.5899364 #quantile with just a vector gives some interesting stuff: quantile(s1) 0% 25% 50% 75% 100% -2.97189479-0.71435745-0.05515638 0.58993639 2.87407876

Boxplot #boxplot: pdf(file="normal_boxplot.pdf", width=6,height=6); #open the file par(mfrow=c(1,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins boxplot(cbind(s1,s3),names=c("sigma=1","sigma=3"),main="draws from a normal distribution",col=c(2,4)) dev.off() quartz 2

Boxplot Draws from a normal distribution 5 0 5 Sigma=1 Sigma=3

Histogram with both sets of data on same axes? Can we do it? YES WE CAN! #Let's make a histogram of these values, but putting both on the same axes. #But, we need to have the same bin widths for both datasets: bins<-seq(-10,10,by=1) hist(s1,breaks=bins)$breaks [1] -10-9 -8-7 -6-5 -4-3 -2-1 0 1 2 3 4 5 6 7 8 9 10 hist(s3,breaks=bins)$breaks [1] -10-9 -8-7 -6-5 -4-3 -2-1 0 1 2 3 4 5 6 7 8 9 10 #This looks good counts_s1<-hist(s1,breaks=bins)$counts counts_s3<-hist(s3,breaks=bins)$counts

Histogram with both sets of data on same axes? Can we do it? YES WE CAN! #now make the plot: pdf(file="normal_barplot.pdf", width=6,height=6); #open the file par(mfrow=c(1,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins barplot(rbind(counts_s1,counts_s3),col=c(2,4),beside=t,names.arg= seq(-10,9.5,by=1),xlab="value",ylab="count") legend(6,350,c(expression(paste(sigma,"=3")),expression(paste(sig ma,"=6"))),col=c(2,4),lwd=4) dev.off() pdf 2

Histogram with both sets of data on same axes? Can we do it? YES WE CAN! Count 0 50 100 150 200 250 300 350 σ=3 σ=6 10 8 6 4 2 0 2 4 6 8 Value

Finding extreme values Say we want to find the % of values in a vector that are X... #We can find the % of values in s1 that are 3: mean(s13) [1] 0.001 #Only 1 of the 1000 values in s1 is 3 mean(s33) [1] 0.168 #16.8% of values in s3 are 3

Scatterplot pitfalls #Simple scatterplot: pdf(file="/users/kirk/dropbox/kirk_stuff/kel_bootcamp/scatter_small.pdf", width=10,height=10); #open the file par(mfrow=c(1,1), mar=c(4, 4, 3, 2)) #sets plotting area and margins x<-rnorm(100) y<-x+rnorm(100,sd=0.2) plot(x,y,pch=19) dev.off() quartz 2

The most annoying thing in R... 2 1 0 1 2 2 1 0 1 2 3 x y Huh? What is plotted here? My tired eyes can t read this...

One way to fix it #now, try again, make the labels bigger: #Simple scatterplot: pdf(file="/users/kirk/dropbox/kirk_stuff/kel_bootcamp/scatter_large.pdf", width=10,height=10); #open the file par(mfrow=c(1,1), mar=c(5, 5, 3, 2)) #sets plotting area and margins x<-rnorm(100) y<-x+rnorm(100,sd=0.2) plot(x,y,pch=19,cex.lab=2,cex.axis=2) dev.off() quartz 2

One way to fix it 2 1 0 1 2 2 1 0 1 2 x y 2 1 0 1 2 2 1 0 1 2 3 x y