Intro to R for Epidemiologists

Similar documents
An Introduction to R Graphics

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

DATA VISUALIZATION WITH GGPLOT2. Coordinates

DATA VISUALIZATION WITH GGPLOT2. Grid Graphics

Facets and Continuous graphs

Install RStudio from - use the standard installation.

LondonR: Introduction to ggplot2. Nick Howlett Data Scientist

Package ggdark. R topics documented: January 11, Type Package Title Dark Mode for 'ggplot2' Themes Version Author Neal Grantham

Getting started with ggplot2

k Nearest Neighbors Super simple idea! Instance-based learning as opposed to model-based (no pre-processing)

Data Visualization in R

R Visualizing Data. Fall Fall 2016 CS130 - Intro to R 1

Package reghelper. April 8, 2017

Introduction to R for Epidemiologists

Visualizing Data: Customization with ggplot2

Introduction to ggvis. Aimee Gott R Consultant

STAT 1291: Data Science

Data Visualization in R

Chuck Cartledge, PhD. 20 January 2018

Spring 2017 CS130 - Intro to R 1 R VISUALIZING DATA. Spring 2017 CS130 - Intro to R 2

1 The ggplot2 workflow

Plotting with ggplot2: Part 2. Biostatistics

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Advanced Graphics in R

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

Graphing Bivariate Relationships

Creating elegant graphics in R with ggplot2

Introduction to R and Statistical Data Analysis

Machine Learning: Algorithms and Applications Mockup Examination

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Introduction to Artificial Intelligence

Lecture 4: Data Visualization I

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

The diamonds dataset Visualizing data in R with ggplot2

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

Statistical transformations

Creating publication-ready Word tables in R

data visualization Show the Data Snow Month skimming deep waters

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

arulescba: Classification for Factor and Transactional Data Sets Using Association Rules

Data visualization with ggplot2

Linear discriminant analysis and logistic

LaTeX packages for R and Advanced knitr

Stat 849: Plotting responses and covariates

DEPARTMENT OF BIOSTATISTICS UNIVERSITY OF COPENHAGEN. Graphics. Compact R for the DANTRIP team. Klaus K. Holst

Advanced Statistics 1. Lab 11 - Charts for three or more variables. Systems modelling and data analysis 2016/2017

grammar statistical graphics Building a in Clojure Kevin Lynagh 2012 November 9 Øredev Keming Labs Malmö, Sweden

KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn

Data analysis case study using R for readily available data set using any one machine learning Algorithm

03 - Intro to graphics (with ggplot2)

Stat 849: Plotting responses and covariates

Manuel Oviedo de la Fuente and Manuel Febrero Bande

Experimental Design + k- Nearest Neighbors

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences

Data Visualization. Module 7

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

The Basics of Plotting in R

BL5229: Data Analysis with Matlab Lab: Learning: Clustering

Importing and visualizing data in R. Day 3

K-fold cross validation in the Tidyverse Stephanie J. Spielman 11/7/2017

plotly The Johns Hopkins Data Science Lab February 28, 2017

Ben Baumer Instructor

Regression Models Course Project Vincent MARIN 28 juillet 2016

Making Data Speak USING R SOFTWARE. 1 November 2017

Introduction to R Software

Clojure & Incanter. Introduction to Datasets & Charts. Data Sorcery with. David Edgar Liebke

Boxplot

Package cowplot. March 6, 2016

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

Instance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.

The Beauty of ggplot2 Jihui Lee February 23, 2017

Introduction to ggplot2 Graphics

Lab #7 - More on Regression in R Econ 224 September 18th, 2018

Package dotwhisker. R topics documented: June 28, Type Package

Introduction to Graphics with ggplot2

Basic Concepts Weka Workbench and its terminology

Introduction to R: Day 2 September 20, 2017

Introduction to Statistical Graphics Procedures

ggplot2 for beginners Maria Novosolov 1 December, 2014

Using Built-in Plotting Functions

Visual Analytics. Visualizing multivariate data:

Input: Concepts, Instances, Attributes

MULTIVARIATE ANALYSIS USING R

Visualizing high-dimensional data:

Econ 2148, spring 2019 Data visualization

Decision Trees In Weka,Data Formats

LESSON 14: Box plots questions

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Data Mining: Exploring Data. Lecture Notes for Chapter 3

A Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton

Lab 10 Regression IV

A Data Explorer System and Rulesets of Table Functions

Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

USE IBM IN-DATABASE ANALYTICS WITH R

Package harrypotter. September 3, 2018

Combo Charts. Chapter 145. Introduction. Data Structure. Procedure Options

Mixed models in R using the lme4 package Part 2: Lattice graphics

1 Simple Linear Regression

Transcription:

Lab 9 (3/19/15) Intro to R for Epidemiologists Part 1. MPG vs. Weight in mtcars dataset The mtcars dataset in the datasets package contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-7 models) from the 197 Motor Trend magazine. Look at the help file for the mtcars dataset (?mtcars). The variables that will be used are as follows: mpg (miles/gallon), cyl (number of cylinders), and wt (weight lb/1000). Using either qplot() or ggplot(), create a scatterplot of miles per gallon on weight colored by the number of cylinders. Add separate fitted lines to the plot corresponding to the number of cylinders. Note: you will need to convert the variable cyl to a factor. #Load data data(mtcars) # Separate regressions of mpg on weight for each number of cylinders mtcars <- mutate(mtcars, cyl = factor(cyl)) ggplot(data = mtcars, aes(x = wt, y = mpg, colour = cyl)) + # Add scatterplot points geom_point() + # Add linear regression line geom_smooth(method = "lm") + xlab("weight") + ylab("miles per Gallon") + ggtitle("regression of MPG on Weight") 1

35 Regression of MPG on Weight 30 Miles per Gallon 25 20 15 cyl 6 8 10 2 3 5 Weight # qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"), # method = "lm", color = cyl, # main = "Regression of MPG on Weight", # xlab = "Weight", ylab = "Miles per Gallon") 2

Part 2. Scatterplot of Sepal.Length and Petal.Length With the iris dataset in R, create a scatterplot of petal length vs. sepal length separately for each species using either qplot() or ggplot(). Use different colors for each species and let the size of each point denote petal width. Set alpha=0.7 to reduce the effects of overplotting. Note: you may need to change the argument scales in the facet_wrap function to allow the x-axis to vary between plots (see?facet_wrap). data(iris) ggplot(iris, aes(sepal.length, Petal.Length, color =, size = Petal.Width)) + # Add points for scatterplot geom_point(alpha = I(0.7)) + ggtitle("petal Length vs. Sepal Length") + xlab("sepal Length") + ylab("petal Length") + # Use facet_wrap to create separate plots for species facet_wrap(~, scales = "free_x") Petal Length vs. Sepal Length setosa versicolor virginica Petal Length 6 2 setosa versicolor virginica Petal.Width 0.5 1.0 1.5 2.0 2.5.5 5.0 5.5 5.0 5.5 6.0 6.5 7.0 5 6 7 8 Sepal Length # Note: by setting the alpha of each point to 0.7, we reduce the effects of overplotting. # qplot(sepal.length, Petal.Length, data = iris, color =, # size = Petal.Width, alpha = I(0.7), xlab = "Sepal Length", # ylab = "Petal Length", main = "Petal Length vs. Sepal Length") + # facet_grid( ~, scales = "free_x") 3

Part 3. Confidence Intervals The dataset OR\_df.RData contains the odds ratios and corresponding confidence intervals from Lab 8. Use the OR\_df data frame to create the plot below displaying the odds ratios and corresponding confidence intervals. # Read in data load("or_df.rdata") #plot ggplot(or_df, aes(x = Variable, y = OR, color = Variable)) + #Add points for ORs geom_point(size = 3, shape = 20) + #Add error bars geom_errorbar(aes(ymin = LB, ymax = UB), width = 0.3) + #Add main and axes titles ggtitle("associations between covariates and diabetes") + ylab("odds ratio") + xlab("covariates") 1.10 Associations between covariates and diabetes Odds ratio 1.05 1.00 Variable age chol hdl height weight 0.95 age chol hdl height weight Covariates # qplot(variable, OR, data = OR_df, color = Variable, geom = c("point", # "errorbar"), ymin = LB, ymax = UB, width = 0.3, # ylab = "Odds ratio", xlab = "Covariates", # main = "Associations between covariates and diabetes")

Part. Boxplot of sepal width by species Use the iris dataset in R to create notched boxplots of sepal width by species. Make each box a different color. To color the interior of the boxplots, specify the fill argument instead of colour. Note: to create notched boxplots, you may want to consult the help page for geom_boxplot (?geom_boxplot). (fill) ggplot(iris, aes(, Sepal.Width, fill = )) + #Add notched boxplot geom_boxplot(notch = TRUE) + ggtitle("sepal Width by Iris ") + xlab("") + ylab("sepal Width").5 Sepal Width by Iris.0 Sepal Width 3.5 3.0 setosa versicolor virginica 2.5 2.0 setosa versicolor virginica # # qplot(, Sepal.Width, data = iris, geom = "boxplot", notch = T, # fill =, xlab = "", ylab = "Sepal Width", # main = "Sepal Width by Iris ") 5