Importing and visualizing data in R. Day 3

Similar documents
An Introduction to R Graphics

Facets and Continuous graphs

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

Introduction to Graphics with ggplot2

ggplot in 3 easy steps (maybe 2 easy steps)

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1)

Getting started with ggplot2

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

03 - Intro to graphics (with ggplot2)

The diamonds dataset Visualizing data in R with ggplot2

Stat 849: Plotting responses and covariates

Data Import and Formatting

ggplot2 for beginners Maria Novosolov 1 December, 2014

Instructions and Result Summary

Data Visualization. Andrew Jaffe Instructor

LondonR: Introduction to ggplot2. Nick Howlett Data Scientist

Package ggextra. April 4, 2018

Data Science and Machine Learning Essentials

Intro to R for Epidemiologists

Stat 849: Plotting responses and covariates

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

data visualization Show the Data Snow Month skimming deep waters

Statistical Software Camp: Introduction to R

Introduction to R Reading, writing and exploring data

Introduction to Data Visualization

Econ 2148, spring 2019 Data visualization

An introduction to ggplot: An implementation of the grammar of graphics in R

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

Statistical transformations

Visualizing Data: Customization with ggplot2

Introduction to R and the tidyverse. Paolo Crosetto

Practical 2: Plotting

An Introduction to R 2.2 Statistical graphics

Data Science and Machine Learning Essentials

DATA VISUALIZATION WITH GGPLOT2. Coordinates

Introduction to Minitab 1

Data Visualization in R

7/18/16. Review. Review of Homework. Lecture 3: Programming Statistics in R. Questions from last lecture? Problems with Stata? Problems with Excel?

Exploratory Data Analysis September 6, 2005

Using Built-in Plotting Functions

A Quick and focused overview of R data types and ggplot2 syntax MAHENDRA MARIADASSOU, MARIA BERNARD, GERALDINE PASCAL, LAURENT CAUQUIL

File Input/Output in Python. October 9, 2017

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

You submitted this quiz on Sat 17 May :19 AM CEST. You got a score of out of

IST 3108 Data Analysis and Graphics Using R. Summarizing Data Data Import-Export

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

Creating elegant graphics in R with ggplot2

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here:

IMPORTING DATA IN R. Introduction read.csv

Plotting with ggplot2: Part 2. Biostatistics

Advanced tabular data processing with pandas. Day 2

Introduction to R. Andy Grogan-Kaylor October 22, Contents

R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages...

Data Visualization in R

plot(seq(0,10,1), seq(0,10,1), main = "the Title", xlim=c(1,20), ylim=c(1,20), col="darkblue");

Introduction to R Forecasting Techniques

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Table of Contents. Preface... ix

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

R Workshop 1: Introduction to R

Input/Output Data Frames

INTRODUCTION TO R. Basic Graphics

Lecture 4: Data Visualization I

Data visualization with ggplot2

Introduction to R and R-Studio In-Class Lab Activity The 1970 Draft Lottery

Basics of Plotting Data

Advanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel

Graphical critique & theory. Hadley Wickham

Step-by-step user instructions to the hamlet-package

Basic Statistical Graphics in R. Stem and leaf plots 100,100,100,99,98,97,96,94,94,87,83,82,77,75,75,73,71,66,63,55,55,55,51,19

Properties of Data. Digging into Data: Jordan Boyd-Graber. University of Maryland. February 11, 2013

Introduction to R (BaRC Hot Topics)

IMPORTING DATA INTO R. Introduction Flat Files

Stat405. More about data. Hadley Wickham. Tuesday, September 11, 12

Package Rwinsteps. February 19, 2015

LAST UPDATED: October 16, 2012 DISTRIBUTIONS PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder

Introduction to ggvis. Aimee Gott R Consultant

BGGN 213 Working with R packages Barry Grant

CSC 1315! Data Science

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Statistics 251: Statistical Methods

ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

S4C03, HW2: model formulas, contrasts, ggplot2, and basic GLMs

Data Analyst Nanodegree Syllabus

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Data Visualization. Module 7

Multi-Axis Tabular Loads in ANSYS Workbench

Data Science and Machine Learning Essentials

Stat 290: Lab 2. Introduction to R/S-Plus

Demo yeast mutant analysis

Introduction to R: Day 2 September 20, 2017

R syntax guide. Richard Gonzalez Psychology 613. August 27, 2015

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

User manual forggsubplot

Introduction to Nesstar

Transcription:

Importing and visualizing data in R Day 3

R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation (e.g., getting, setting data) Data output

Reading delimited files Most general: read.table( filename, header=f, sep="" ) You must specify filename, whether to expect a header (T/F), and what the separator is Tab: "\t" Comma: "," Space: " " Preset defaults (header=t, format-specific delims) For TSV: read.delim( filename,... ) For CSV: read.csv( filename,... ) Docs: https://stat.ethz.ch/r-manual/r-devel/library/utils/html/read.table.html

Rstudio can automate this part for you Load up Rstudio Select Tools à Import Dataset à From Local File. Point and click to select settings (column separator, row names, heading, etc). Click import, and Rstudio translates your settings into an R read command.

Writing out data frames Load in cereal table Syntax is different than pandas Let's make a new column, calories_per_cup ( = # calories per cup of cereal) df$new_column = value Equivalently: df['new_column'] = value To write out data frame as delimited file: write.table(df, filename, sep="\t", row.names=f) write.csv( )

Two major plotting options in R Base graphics (built-in to R) Prep your data ahead of time (e.g., summarize cereals by manufacturer) Data doesn't need to be in data.frame Run a command, make a plot Run another command, add something to that plot ggplot2 (http://docs.ggplot2.org/current/) Have all data points in a data.frame, one per line Implements the grammar of graphics separates data from plot with a series of abstractions Upshot: it's easy quickly change aspects of the plot (e.g., scatter to histogram) Great for exploratory plotting; final tweaks can be painful.

Base graphics scatter plot Let's make x ~ N(0,1) and y = 2x + e, e ~ N(0, 0.1) x=rnorm(100,0,1) e=rnorm(100,0,0.1) y=2*x+e Now, plot scatterplot of y vs x plot(x,y) Tweak settings: type="p", "l" main=" ", xlab=" ", ylab=" " cex=... (point magnification, normal=1) col=... (point color) pch=... (point type) lwd=... (line width)

Base graphics bar chart Make a bar plot For instance, we have raw data from a poll Questions: Are you a choosy mom or dad? Did you choose Jif? Have 100 responses, T/F to each question But, what we want is % chose jif is choosy With base plots, first make the summary: choosy_sums = table( choosy_data ) Then, barplot barplot(choosy_sums)

ggplot2 Data frame with unsummarized data, one point per row geom = how to project those data onto a plot aes(thetics) = how to map data variables to x, y, color, point size, fill Transformations to bin, smooth, or scale data for display http://varianceexplained.org/r/why-i-use-ggplot2/

Opinions vary on which is better http://varianceexplained.org/r/why-i-use-ggplot2/

ggplot2 syntax make a simple scatterplot library(ggplot2) First, import the library dfxy = data.frame(xvals=x, yvals=y) Make a new dataframe from our x,y g = ggplot(data = dfxy) g = g + geom_point( aes(x=xvals,y=yvals) ) g Make a new ggplot using these data Add a geom and map xvals to the x axis and yvals to the y axis Show the plot!

Slightly more interesting dataset library(ggplot2) First, import the library library(reshape2) This will load a dataframe called tips head(tips) Check it out g = ggplot(data = tips) g = g + geom_point( aes(x=total_bill,y=tip) ) g What if we wanted a histogram instead?

Slightly more interesting dataset library(ggplot2) First, import the library library(reshape2) This will load a dataframe called tips head(tips) Check it out gbase = ggplot(data = tips) g = gbase + geom_point( aes(x=total_bill,y=tip) ) g Can we color points by sex?

Mapping with aes library(ggplot2) library(reshape2) head(tips) First, import the library This will load a dataframe called tips Check it out gbase = ggplot(data = tips) g = gbase + geom_point( aes(x=total_bill,y=tip, colour=sex) ) g How about if we want a histogram instead?

Change to a histogram library(ggplot2) library(reshape2) head(tips) First, import the library This will load a dataframe called tips Check it out gbase = ggplot(data = tips) g2 = gbase + geom_histogram( aes(x=total_bill) ) g2 What happens here if you map sex to the aesthetic "colour"? Or "fill"?

Faceting for exploratory plotting library(ggplot2) library(reshape2) What if we wanted to know whether men or women are stingier tippers? head(tips) gbase = ggplot(data = tips) g = gbase + geom_point( aes(x=total_bill,y=tip) ) g3 = g + facet_grid( sex ~.) g3

Faceting for exploratory plotting library(ggplot2) library(reshape2) head(tips) What if we wanted to know whether men or women are stingier tippers? Does meal time matter? gbase = ggplot(data = tips) g = gbase + geom_point( aes(x=total_bill,y=tip) ) g3 = g + facet_grid( sex ~ time ) g3

Layering multiple geoms on one plot library(ggplot2) library(reshape2) head(tips) gbase = ggplot(data = tips) g = gbase + geom_point( aes(x=total_bill,y=tip) ) g4 = g + geom_smooth( aes(x=total_bill,y=tip), method='lm' ) g5 = g4 + facet_grid( sex ~ time ) g5