ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

Introduction
- Know your data: data exploration is an important part of research.
- Data visualization is an excellent way to explore data.
- ggplot2 is an elegant R library that makes it easy to create compelling graphs.
- Plots can be built up iteratively and easily modified.

Learning objectives
- To create graphs used in manuscripts for epidemiology studies
- To review and incorporate previously learned aspects of formatting graphs
- To demonstrate novel data visualizations using Shiny

ggplot architecture review
- Aesthetics: specify the variables to display. What are x and y? Variables can also be mapped to color, shape, size, and transparency.
- Geoms: specify the type of plot. Do you want a scatter plot, lines, bars, densities, or another type of plot?
- Scales: transform variables (e.g., log, square root); also used to set the legend title, breaks, and labels.
- Facets: create separate panels for different levels of a factor.
- Themes: adjust appearance (background, fonts, etc.).
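
As a small illustration of how these five components combine in a single call, here is a sketch on a hypothetical simulated data frame (df and its columns x, y, group, and panel are made up for the example; they are not part of the anemia data):

```r
library(ggplot2)

# Hypothetical simulated data (not the anemia data), only to show the pieces
set.seed(1)
df <- data.frame(
  x     = rep(1:20, times = 2),
  y     = exp(rnorm(40, mean = 2)),          # positive, right-skewed values
  group = rep(c("A", "B"), each = 20),
  panel = rep(c("P1", "P2"), times = 20)
)

p <- ggplot(df, aes(x = x, y = y, color = group)) +  # aesthetics: columns -> x, y, color
  geom_point() +                                     # geom: draw points
  scale_y_log10(name = "y (log10 scale)") +          # scale: log-transform y, set axis title
  scale_color_discrete(name = "Group") +             # scale: set the legend title
  facet_wrap(~ panel) +                              # facets: one panel per level of panel
  theme_bw()                                         # theme: white background, grey grid
```

Printing p draws the figure; each `+` adds or overrides one component, which is what makes plots easy to build up step by step.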

Hemoglobin data
- Data from the National Health and Nutrition Examination Survey (NHANES), containing data on n = 3,990 participants.
- The file was created by merging demographic data with the complete blood count file and the nutritional biochemistry laboratory file.
- Contains measures of hemoglobin, iron status, and other anemia-related parameters.

Anemia data codebook
- age = age of participant (years)
- sex = sex of participant (Male vs Female)
- tsat = transferrin saturation (%)
- iron = total serum iron (ug/dl)
- hgb = hemoglobin concentration (g/dl)
- ferr = serum ferritin (ng/ml)
- folate = serum folate (ng/ml)
- race = participant race (Hispanic, White, Black, Other)
- rdw = red cell distribution width (%)
- wbc = white blood cell count (SI)
- anemia = indicator variable for anemia (according to the WHO definition)

Scatter plot review: hemoglobin by age, stratified by ethnicity and sex

    library(ggplot2)

    ggplot(data = anemia, aes(x = age, y = hgb, color = sex)) +
      geom_smooth() +
      # Point size is 1/iron, so lower serum iron gives larger points
      geom_jitter(aes(size = 1/iron), alpha = 0.1) +
      xlab("Age") + ylab("Hemoglobin (g/dl)") +
      scale_size(name = "Iron Deficiency") +
      scale_color_discrete(name = "Sex") +
      facet_wrap(~race) +
      theme_bw()

[Figure: scatter plot of hemoglobin by age, stratified by ethnicity and sex]

Box plots

    ggplot(data = anemia, aes(x = race, y = hgb)) +
      geom_boxplot()

Box plots with points

    ggplot(data = anemia, aes(x = race, y = hgb, color = sex)) +
      geom_boxplot() +
      # Faded points; they are jittered around each race, not dodged by sex
      geom_jitter(alpha = 0.1)
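
Because the boxes are split side by side by sex while geom_jitter() only scatters points around each race, the points for both sexes overlap in the middle. One way to line them up is position_jitterdodge(), which jitters and dodges in one step. This sketch uses a hypothetical simulated stand-in for the anemia data; dodge.width defaults to 0.75, and the value 0.9 here is an assumption meant to roughly match the default box width, so adjust it if points and boxes look misaligned:

```r
library(ggplot2)

# Hypothetical stand-in for the anemia data (simulated values, not NHANES)
set.seed(42)
toy <- data.frame(
  race = factor(rep(c("Hispanic", "White", "Black", "Other"), each = 50)),
  sex  = factor(rep(c("Male", "Female"), times = 100)),
  hgb  = rnorm(200, mean = 14, sd = 1.3)
)

p <- ggplot(toy, aes(x = race, y = hgb, color = sex)) +
  # Boxes for each sex are dodged side by side within each race;
  # outliers are hidden because every point is drawn below anyway
  geom_boxplot(outlier.shape = NA) +
  # Jitter the points AND dodge them by sex so they sit over their own box
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.9),
             alpha = 0.3)
```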

Box plots with coordinates flipped

    ggplot(data = anemia, aes(x = race, y = hgb, color = sex)) +
      geom_boxplot() +
      geom_jitter(alpha = 0.1) +
      coord_flip()

Violin plots
- Kernel density estimates placed on each side of a central axis and mirrored, so that each forms a symmetrical shape
- Make it easy to compare several distributions
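
To see what "mirrored kernel density" means, here is a base-R sketch that builds one violin outline by hand from simulated values (the data and the 0.45 half-width are arbitrary choices for the example; geom_violin() applies its own scaling):

```r
# Simulated hemoglobin-like values (hypothetical data)
set.seed(7)
hgb <- rnorm(300, mean = 14, sd = 1.2)

d <- density(hgb)                       # Gaussian kernel density estimate
half_width <- d$y / max(d$y) * 0.45     # widest point spans 0.45 units each side

# Outline centred at x = 1: up the right side, then back down the left side
outline <- data.frame(
  x = c(1 + half_width, rev(1 - half_width)),
  y = c(d$x, rev(d$x))
)

plot(outline$x, outline$y, type = "n", xlab = "", ylab = "hemoglobin (g/dl)")
polygon(outline$x, outline$y, col = "grey80")
```

The right half is the density curve turned on its side and the left half is its mirror image, which is why each violin is symmetric.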

Violin plots

    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin()

Violin plots with underlying data points

    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin() +
      geom_jitter(alpha = 0.1)

Violin plots stratified by 2 variables

    ggplot(data = anemia, aes(x = sex, y = hgb, color = race)) +
      geom_violin()

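Violin plots with boxplots (outliers hidden)

    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin() +
      # Narrow boxplot inside each violin; outlier.color = NA hides outlier points
      geom_boxplot(width = 0.1, fill = "black", outlier.color = NA) +
      # White dot at the median (ggplot2 versions before 3.3.0 call this argument fun.y)
      stat_summary(fun = median, geom = "point", fill = "white",
                   shape = 21, size = 2.5)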

Practice
- Use the anemia dataset to practice making scatter plots, box plots, and violin plots.
- Try faceting, flipping orientation, and changing colors and labels.

    str(anemia)
    ## Classes 'tbl_df', 'tbl' and 'data.frame': 3990 obs. of 13 variables:
    ##  $ age   : num
    ##  $ sex   : Factor w/ 2 levels "Male","Female":
    ##  $ tsat  : num
    ##  $ iron  : num
    ##  $ hgb   : num
    ##  $ ferr  : num
    ##  $ folate: num
    ##  $ vite  : num
    ##  $ vita  : num
    ##  $ race  : Factor w/ 4 levels "Hispanic","White",..:
    ##  $ rdw   : num
    ##  $ wbc   : num
    ##  $ anemia: num
    ##  - attr(*, "na.action")=Class 'omit' Named int [1:805]
    ##   ..- attr(*, "names")= chr [1:805] "26" "28" "32" "33" ...

Forest plots
First gather the data into the proper format, with the following variables:
- Estimate
- Lower CI
- Upper CI
- Grouping variable

Forest plots
For this example, we calculate mean hemoglobin and its lower and upper 95% confidence limits, first by sex and then by race, and stack the rows into one data frame whose grouping column is called "Type".

    library(dplyr)

    # Mean hemoglobin with a normal-approximation 95% CI
    # (mean +/- 1.96 * SD / sqrt(n)), by sex
    anemia1 <- anemia %>%
      group_by(Type = sex) %>%
      summarise(mean  = mean(hgb),
                n     = n(),
                lower = mean - 1.96 * sd(hgb) / sqrt(n),
                upper = mean + 1.96 * sd(hgb) / sqrt(n))

    # Same summary by race
    anemia2 <- anemia %>%
      group_by(Type = race) %>%
      summarise(mean  = mean(hgb),
                n     = n(),
                lower = mean - 1.96 * sd(hgb) / sqrt(n),
                upper = mean + 1.96 * sd(hgb) / sqrt(n))

    # Stack the two summaries into one data frame
    anemia3 <- rbind(anemia1, anemia2)
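
The same summary can be cross-checked without dplyr. This base-R sketch (the helper ci_by_group() and the simulated toy data are hypothetical) computes the mean, n, and the limits mean ± 1.96 × SD/√n for each group, returning the columns the forest plot needs:

```r
# Hypothetical simulated stand-in for the anemia data
set.seed(3)
toy <- data.frame(
  sex = factor(rep(c("Male", "Female"), each = 100), levels = c("Male", "Female")),
  hgb = c(rnorm(100, 15, 1.2), rnorm(100, 13.5, 1.1))
)

# Mean and normal-approximation 95% CI for each level of g
ci_by_group <- function(y, g) {
  m  <- tapply(y, g, mean)
  s  <- tapply(y, g, sd)
  n  <- tapply(y, g, length)
  se <- s / sqrt(n)                      # standard error of the mean
  data.frame(Type  = names(m),
             mean  = as.numeric(m),
             n     = as.integer(n),
             lower = as.numeric(m - 1.96 * se),
             upper = as.numeric(m + 1.96 * se))
}

forest_df <- ci_by_group(toy$hgb, toy$sex)
forest_df
```

Results for several grouping variables can then be stacked with rbind(), exactly as in the dplyr version.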

Forest plots

    ggplot(data = anemia3, aes(x = Type, y = mean, ymin = lower, ymax = upper)) +
      geom_pointrange()

Forest plots: flip the axes, add labels

    ggplot(data = anemia3, aes(x = Type, y = mean, ymin = lower, ymax = upper)) +
      geom_pointrange(shape = 20) +
      coord_flip() +
      xlab("Demographics") +
      ylab("Mean hemoglobin (95% CI)") +
      theme_bw()

Forest plots: calculating mean and CI within ggplot
- ggplot can calculate the mean and CI using stat_summary.
- Further data manipulation would be needed to stack multiple variables.

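Calculating mean and CI within ggplot

    # mean_cl_normal wraps Hmisc::smean.cl.normal (Hmisc must be installed)
    # and uses a t-based interval, so limits can differ slightly from 1.96 * SE
    ggplot(anemia, aes(x = race, y = hgb)) +
      stat_summary(fun.data = mean_cl_normal) +
      coord_flip() +
      theme_bw() +
      xlab("Demographics") +
      ylab("Mean hemoglobin (95% CI)")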

Forest plots: adding faceting
Here any.fit3 (not created in these slides) holds an estimate (a1) with lower and upper 95% CI limits for each predictor (v3), grouped by setting.

    ggplot(any.fit3, aes(x = v3, y = a1, ymin = lower, ymax = upper)) +
      geom_pointrange(shape = 20) +
      coord_flip() +
      xlab("Predictor variable") +
      ylab("Adjusted risk difference per 100 (95% CI)") +
      scale_y_continuous(breaks = seq(-20, 25, by = 5), limits = c(-21, 26)) +
      theme_bw() +
      geom_hline(yintercept = 0, lty = 2) +   # dashed reference line at 0
      facet_grid(setting ~ ., scales = "free", space = "free")
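
Because any.fit3 is not shown, here is a self-contained sketch with a hypothetical results table (the name est and all numbers are placeholders, not study results). As a design alternative to coord_flip(), it maps the estimate to x and the interval to xmin/xmax, which geom_pointrange() supports in ggplot2 3.3.0 and later; with the predictors on the y axis, space = "free_y" sizes each facet by its number of predictors:

```r
library(ggplot2)

# Hypothetical results table with placeholder numbers (not study results).
# Columns follow the slide: v3 = predictor, a1 = estimate,
# lower/upper = 95% CI limits, setting = facet grouping.
est <- data.frame(
  setting = c("Setting A", "Setting A", "Setting B", "Setting B", "Setting B"),
  v3      = c("Predictor 1", "Predictor 2", "Predictor 3", "Predictor 4", "Predictor 5"),
  a1      = c(-5, 3, 8, -2, 12),
  lower   = c(-12, -1, 2, -9, 4),
  upper   = c(2, 7, 14, 5, 20)
)

# Horizontal point ranges: estimate on x, CI on xmin/xmax, predictors on y
p <- ggplot(est, aes(x = a1, y = v3, xmin = lower, xmax = upper)) +
  geom_pointrange(shape = 20) +
  geom_vline(xintercept = 0, linetype = "dashed") +   # reference line at 0
  facet_grid(setting ~ ., scales = "free_y", space = "free_y") +
  xlab("Adjusted risk difference per 100 (95% CI)") +
  ylab("Predictor variable") +
  theme_bw()
```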

Practice
- Use the anemia dataset to practice making forest plots using other continuous variables.
- Use dplyr to create a new, categorized age variable (hint: convert it to a factor before graphing).
- Create a forest plot of mean hemoglobin by age category.
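
One possible sketch for the age-category step uses base R's cut() on simulated ages (the cut points and labels are illustrative assumptions, not part of the exercise). cut() returns a factor directly, so the categories keep their order when graphed; with dplyr the same cut() call can go inside mutate():

```r
# Simulated ages (hypothetical data)
set.seed(11)
toy <- data.frame(age = runif(200, min = 18, max = 85))

# right = FALSE gives intervals [a, b), so 29.9 falls in "<30" and 30 in "30-49"
toy$age_cat <- cut(toy$age,
                   breaks = c(-Inf, 30, 50, 70, Inf),
                   labels = c("<30", "30-49", "50-69", "70+"),
                   right  = FALSE)

table(toy$age_cat)
```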

Kaplan-Meier plots: WIHS data
- The Women's Interagency HIV Study (WIHS) is an ongoing observational cohort study with semiannual visits at 10 sites in the US.
- Data on 1,164 patients who were HIV-positive, free of clinical AIDS, and not on antiretroviral therapy (ART) at study baseline (Dec. 6, 1995)
- Contains information on age, race, CD4 count, drug use, ARV treatment, and time to AIDS/death

Kaplan-Meier plots
- MANY package options exist to plot survival functions. All use the survival package to calculate survival over time:
  - survfit (survival) + survplot (rms)
  - ggkm (sachsmc/ggkm) & ggplot2
  - ggkm (michaelway/ggkm)
- Allows for multiple treatments and subgroups
- Does not take into account competing risks
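
To make concrete what the survival package computes before any of these functions draw it, here is a minimal Kaplan-Meier estimator in base R on a tiny hypothetical dataset (km_estimate() is an illustrative helper, not a function from any package). It multiplies the conditional survival 1 - d_j/n_j across the event times:

```r
# Kaplan-Meier: S(t) = product over event times t_j <= t of (1 - d_j / n_j),
# d_j = events at t_j, n_j = number still at risk just before t_j
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  n_risk  <- sapply(event_times, function(t) sum(time >= t))
  n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(time    = event_times,
             n.risk  = n_risk,
             n.event = n_event,
             surv    = cumprod(1 - n_event / n_risk))
}

# Toy data (hypothetical): status 1 = event, 0 = censored
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 0, 1)
km <- km_estimate(time, status)
km
```

Censored subjects (such as the one at time 3) stay in the risk set at their own time and drop out afterwards; survfit(Surv(time, status) ~ 1) on the same vectors should give the same estimates at the event times.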

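Kaplan-Meier example 1: calculate KM within ggplot
Prep data:

    library(dplyr)

    # outcome = 1 if an art value is recorded, 0 otherwise
    wihs$outcome <- ifelse(is.na(wihs$art), 0, 1)
    # time = aids_death_art if recorded, otherwise dropout ...
    wihs$time <- ifelse(is.na(wihs$aids_death_art), wihs$dropout, wihs$aids_death_art)
    # ... and study_end if both are missing
    wihs <- wihs %>% mutate(time = ifelse(is.na(time), study_end, time))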
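
The nested ifelse() calls fall back from aids_death_art to dropout to study_end. A toy data frame with three hypothetical participants (all values made up) shows the result for one with an art value recorded, one with only a dropout time, and one with neither:

```r
# Hypothetical toy rows using the same column names as the wihs data
toy <- data.frame(
  art            = c(4.5, NA, NA),
  aids_death_art = c(4.5, NA, NA),
  dropout        = c(NA, 3.0, NA),
  study_end      = c(10, 10, 10)
)

toy$outcome <- ifelse(is.na(toy$art), 0, 1)
toy$time <- ifelse(is.na(toy$aids_death_art), toy$dropout, toy$aids_death_art)
toy$time <- ifelse(is.na(toy$time), toy$study_end, toy$time)

toy[, c("outcome", "time")]
```

In dplyr, coalesce(aids_death_art, dropout, study_end) should produce the same time column in a single step.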

KM plot within ggplot2

    devtools::install_github("sachsmc/ggkm")
    library(ggplot2)
    library(ggkm)

    ggplot(wihs, aes(time = time, status = outcome)) +
      geom_km()

KM by group (history of injection drug use, idu)

    ggplot(wihs, aes(time = time, status = outcome, color = factor(idu))) +
      geom_km()

Add confidence bands

    ggplot(wihs, aes(time = time, status = outcome, color = factor(idu))) +
      geom_km() +
      geom_kmband()

KM example #2
- Calculated using the survival package
- Plots the KM curve with numbers at risk
- Same package name (ggkm) as the previous example, so remove that one first:

    remove.packages("ggkm")
    devtools::install_github("michaelway/ggkm")
    library(ggkm)

KM example 2

    library(survival)

    # Kaplan-Meier fit stratified by IDU history
    fit <- survfit(Surv(time, outcome) ~ idu, data = wihs)
    ggkm(fit)

KM with numbers at risk

    # table = TRUE adds the numbers-at-risk table; marks = FALSE hides censoring marks
    ggkm(fit, table = TRUE, marks = FALSE,
         ystratalabs = c("No IDU", "History of IDU"))

Cumulative incidence plots
- Cumulative incidence = 1 - survival probability
- ipwrisk package (coming soon!):
  - calculates adjusted cumulative incidence curves using IPTW
  - addresses censoring (IPCW) and competing risks
  - produces tables and graphics
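
For reference, the "1 - survival probability" relation written out with the usual Kaplan-Meier notation (d_j events and n_j subjects at risk at event time t_j):

$$
\hat{F}(t) = 1 - \hat{S}(t) = 1 - \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)
$$

When competing events are present and simply treated as censored, this quantity generally overstates the cumulative incidence of the event of interest, which is why methods that handle competing risks are needed.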

Sankey diagram
- A visualization that shows the flow of patients between states (over time)
- States, or nodes, can be treatments, comorbidities, hospitalizations, etc.
- Paths connecting states are called links; the proportion of patients following a link corresponds to the thickness of the line
- Example: [figure not reproduced]

Basic Sankey diagrams in R

    library(networkD3)

    # Node names; networkD3 refers to nodes by 0-based row index
    nodes <- data.frame(name = c("Renal Failure", "Hemodialysis at 6m", "Transplant at 6m",
                                 "Death by 6m", "Hemodialysis at 12m", "Transplant at 12m",
                                 "Death by 12m"))
    # Each link: source node index, target node index, number of patients
    links <- data.frame(source = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3),
                        target = c(1, 2, 3, 4, 5, 6, 4, 5, 6, 6),
                        value  = c(70, 20, 10, 40, 20, 10, 15, 4, 1, 10))

    sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
                  Target = "target", Value = "value", NodeID = "name",
                  fontSize = 22, nodeWidth = 30, nodePadding = 5)
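
Typing 0-based indices by hand is error-prone. This base-R sketch (the flows data frame is a hypothetical helper input) builds the same links from node names with match(), subtracting 1 because networkD3 expects zero-indexed Source and Target columns, and then checks that each 6-month state passes on as many patients as it receives:

```r
nodes <- data.frame(name = c("Renal Failure", "Hemodialysis at 6m", "Transplant at 6m",
                             "Death by 6m", "Hemodialysis at 12m", "Transplant at 12m",
                             "Death by 12m"))

# Transitions written with names instead of indices (hypothetical helper input)
flows <- data.frame(
  from  = c(rep("Renal Failure", 3), rep("Hemodialysis at 6m", 3),
            rep("Transplant at 6m", 3), "Death by 6m"),
  to    = c("Hemodialysis at 6m", "Transplant at 6m", "Death by 6m",
            rep(c("Hemodialysis at 12m", "Transplant at 12m", "Death by 12m"), 2),
            "Death by 12m"),
  value = c(70, 20, 10, 40, 20, 10, 15, 4, 1, 10)
)

# match() gives 1-based row numbers; subtract 1 for networkD3's 0-based indices
links <- data.frame(source = match(flows$from, nodes$name) - 1,
                    target = match(flows$to, nodes$name) - 1,
                    value  = flows$value)

# Flow check: total into each node versus total out of it (NA = no links)
inflow  <- tapply(links$value, factor(links$target, levels = 0:6), sum)
outflow <- tapply(links$value, factor(links$source, levels = 0:6), sum)
rbind(inflow = inflow, outflow = outflow)
```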

Basic Sankey diagrams in R
[Figure: Sankey diagram with nodes Renal Failure; Hemodialysis at 6m, Transplant at 6m, Death by 6m; Hemodialysis at 12m, Transplant at 12m, Death by 12m]

Final tips
- Spend time planning your graph.
- Make sure the data are in the correct structure before you start graphing.
- Start with a simple graph and gradually build in complexity.

Further reading
- ggplot2
- Cookbook for R
- Quick-R

Wrap-up
Questions?
Acknowledgements: Alan Brookhart, Sara Levintow
Contact info:

More information

Brief Guide on Using SPSS 10.0

Brief Guide on Using SPSS 10.0 Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new

More information

Tabular & Graphical Presentation of data

Tabular & Graphical Presentation of data Tabular & Graphical Presentation of data bjectives: To know how to make frequency distributions and its importance To know different terminology in frequency distribution table To learn different graphs/diagrams

More information

Package ggextra. April 4, 2018

Package ggextra. April 4, 2018 Package ggextra April 4, 2018 Title Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements Version 0.8 Collection of functions and layers to enhance 'ggplot2'. The flagship function is 'ggmarginal()',

More information

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13. Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data

More information

Paper: PO19 ARROW Statistical Graphic System ABSTRACT INTRODUCTION pagesize=, layout=, textsize=, lines=, symbols=, outcolor=, outfile=,

Paper: PO19 ARROW Statistical Graphic System ABSTRACT INTRODUCTION pagesize=, layout=, textsize=, lines=, symbols=, outcolor=, outfile=, Paper: PO19 ARROW Statistical Graphic System Cheng Jun Tian, Johnson & Johnson PRD, Titusville, New Jersey, 08560 Qin Li, Johnson & Johnson PRD, Titusville, New Jersey, 08560 Jiangfan Li, Johnson & Johnson

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3B Building Models in Azure ML By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to engineer or construct

More information

Individual Covariates

Individual Covariates WILD 502 Lab 2 Ŝ from Known-fate Data with Individual Covariates Today s lab presents material that will allow you to handle additional complexity in analysis of survival data. The lab deals with estimation

More information

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018

Lecture 09. Graphics::ggplot I R Teaching Team. October 1, 2018 Lecture 09 Graphics::ggplot I 2018 R Teaching Team October 1, 2018 Acknowledgements 1. Mike Fliss & Sara Levintow! 2. stackoverflow (particularly user David for lecture styling - link) 3. R Markdown: The

More information

Survey Questions and Methodology

Survey Questions and Methodology Survey Questions and Methodology Winter Tracking Survey 2012 Final Topline 02/22/2012 Data for January 20 February 19, 2012 Princeton Survey Research Associates International for the Pew Research Center

More information

ggplot2 for beginners Maria Novosolov 1 December, 2014

ggplot2 for beginners Maria Novosolov 1 December, 2014 ggplot2 for beginners Maria Novosolov 1 December, 214 For this tutorial we will use the data of reproductive traits in lizards on different islands (found in the website) First thing is to set the working

More information

LAST UPDATED: October 16, 2012 DISTRIBUTIONS PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder

LAST UPDATED: October 16, 2012 DISTRIBUTIONS PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder LAST UPDATED: October 16, 2012 DISTRIBUTIONS Acknowledgements 2 Some of these slides have been sourced or modified from slides created by A. Field for Discovering Statistics using R. LAST UPDATED: October

More information

Data Dictionary for Quarterly Dialysis Facility Compare

Data Dictionary for Quarterly Dialysis Facility Compare Data Dictionary for Quarterly Dialysis Facility Compare Release Date: April 2015 This document provides the variable name, variable type, maximum length and a description for each column included in the

More information

Package PTE. October 10, 2017

Package PTE. October 10, 2017 Type Package Title Personalized Treatment Evaluator Version 1.6 Date 2017-10-9 Package PTE October 10, 2017 Author Adam Kapelner, Alina Levine & Justin Bleich Maintainer Adam Kapelner

More information

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University While your data tables or spreadsheets may look good to

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

AcaStat User Manual. Version 10 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.

AcaStat User Manual. Version 10 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved. AcaStat User Manual Version 10 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents NEW IN VERSION 10... 6 INTRODUCTION... 7 GETTING HELP...

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Data Visualization Principles for Scientific Communication

Data Visualization Principles for Scientific Communication Data Visualization Principles for Scientific Communication 8-888 Introduction to Linguistic Data Analysis Using R Jerzy Wieczorek 11//15 Follow along These slides and a summary checklist are at http://www.stat.cmu.edu/~jwieczor/

More information

Epi Info 2000 Basics. Data Entry and Documentation

Epi Info 2000 Basics. Data Entry and Documentation Data Entry and Documentation Data Backup and Security Introduction to the Analysis Program Exercises Epi Info 2000 Basics Epi Info 2000 (EI2K) is an epidemiologic data management and analysis program written

More information

Survey Questions and Methodology

Survey Questions and Methodology Survey Questions and Methodology Spring Tracking Survey 2012 Data for March 15 April 3, 2012 Princeton Survey Research Associates International for the Pew Research Center s Internet & American Life Project

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Solution to Tumor growth in mice

Solution to Tumor growth in mice Solution to Tumor growth in mice Exercise 1 1. Import the data to R Data is in the file tumorvols.csv which can be read with the read.csv2 function. For a succesful import you need to tell R where exactly

More information

Introduction to Data Visualization

Introduction to Data Visualization Introduction to Data Visualization Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA Paper HW04 There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA ABSTRACT Clinical Trials data comes in all shapes and sizes depending

More information

Getting started with ggplot2

Getting started with ggplot2 Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for

More information

Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design

Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Randomized the Clinical Trails About the Uncontrolled Trails The protocol Development The

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT

TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT PRIMER FOR ACS OUTCOMES RESEARCH COURSE: TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT STEP 1: Install STATA statistical software. STEP 2: Read through this primer and complete the

More information

Visualizing the World

Visualizing the World Visualizing the World An Introduction to Visualization 15.071x The Analytics Edge Why Visualization? The picture-examining eye is the best finder we have of the wholly unanticipated -John Tukey Visualizing

More information