ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

Introduction
- Know your data: data exploration is an important part of research.
- Data visualization is an excellent way to explore data.
- ggplot2 is an elegant R library that makes it easy to create compelling graphs.
- Plots can be built up iteratively and modified easily.

Learning objectives
- To create graphs used in manuscripts for epidemiology studies
- To review and incorporate previously learned aspects of formatting graphs
- To demonstrate novel data visualizations using Shiny

ggplot architecture review
- Aesthetics: specify the variables to display
  - What are x and y?
  - Variables can also be mapped to color, shape, size, and transparency
- Geoms: specify the type of plot: a scatter plot, lines, bars, densities, or another type?
- Scales: transform variables (e.g., log, square root); also used to set the legend title, breaks, and labels
- Facets: create separate panels for different levels of a factor
- Themes: adjust appearance, such as the background and fonts
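A minimal sketch mapping each component above to one line of ggplot2 code. It uses R's built-in `mtcars` data rather than the anemia data, so the variables (`wt`, `mpg`, `cyl`, `am`) and axis titles are stand-ins chosen only for illustration.

```r
library(ggplot2)

# Stand-in data: R's built-in mtcars (not the anemia data)
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) + # aesthetics: x, y, and color mappings
  geom_point() +                                                # geom: scatter plot
  scale_x_log10(name = "Weight (1000 lbs, log scale)") +        # scale: log-transform x, set axis title
  scale_color_discrete(name = "Cylinders") +                    # scale: legend title
  facet_wrap(~ am) +                                            # facets: one panel per transmission type
  theme_bw()                                                    # theme: white background

print(p)
```

Each `+` adds or modifies one piece, which is what makes it easy to build a plot iteratively: drop the `facet_wrap()` line and the same code draws a single panel.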

Hemoglobin data
- Data from the National Health and Nutrition Examination Survey (NHANES), covering n = 3,990 participants
- The file was created by merging the demographic data with the complete blood count file and the nutritional biochemistry lab file
- Contains measures of hemoglobin, iron status, and other anemia-related parameters

Anemia data codebook
- age = age of participant (years)
- sex = sex of participant (Male vs Female)
- tsat = transferrin saturation (%)
- iron = total serum iron (ug/dl)
- hgb = hemoglobin concentration (g/dl)
- ferr = serum ferritin (ng/ml)
- folate = serum folate (ng/ml)
- race = participant race (Hispanic, White, Black, Other)
- rdw = red cell distribution width (%)
- wbc = white blood cell count (SI)
- anemia = indicator variable for anemia (according to the WHO definition)
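The str() output in the Practice slide shows `sex` and `race` stored as factors, with levels in the order Male/Female and Hispanic/White/Black/Other. A small sketch of setting that order explicitly, assuming (hypothetically) that the raw values arrive as character strings; ggplot2 uses the factor level order for axes, legends, and facet panels.

```r
# Hypothetical raw values stored as character strings
raw <- data.frame(
  sex  = c("Female", "Male", "Female", "Male"),
  race = c("White", "Other", "Hispanic", "Black"),
  stringsAsFactors = FALSE
)

# Explicit levels fix the order ggplot2 uses for axes, legends, and facets;
# without them, factor() would sort the levels alphabetically
raw$sex  <- factor(raw$sex,  levels = c("Male", "Female"))
raw$race <- factor(raw$race, levels = c("Hispanic", "White", "Black", "Other"))

levels(raw$race)
```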

Scatter plot review: hemoglobin by age, stratified by ethnicity and sex
    ggplot(data = anemia, aes(x = age, y = hgb, color = sex)) +
      geom_smooth() +
      geom_jitter(aes(size = 1/iron), alpha = 0.1) +  # larger points = lower serum iron
      xlab("age") + ylab("hemoglobin (g/dl)") +
      scale_size(name = "Iron Deficiency") +
      scale_color_discrete(name = "Sex") +
      facet_wrap(~race) +
      theme_bw()
[Figure: scatter plot of hemoglobin by age, stratified by ethnicity and sex]

Box plots
    ggplot(data = anemia, aes(x = race, y = hgb)) +
      geom_boxplot()

Box plots with points
    ggplot(data = anemia, aes(x = race, y = hgb, color = sex)) +
      geom_boxplot() +
      geom_jitter(alpha = 0.1)

Box plots with coordinates flipped
    ggplot(data = anemia, aes(x = race, y = hgb, color = sex)) +
      geom_boxplot() +
      geom_jitter(alpha = 0.1) +
      coord_flip()
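As documented for ggplot2's geom_boxplot(), the box spans the 25th to 75th percentiles and the whiskers extend to the most extreme observation within 1.5 × IQR of the box; anything beyond is drawn as an outlier point. The sketch below reproduces those numbers in base R for a short hypothetical vector, assuming (to our understanding) that ggplot2 uses base R's default quantile() definition (type 7).

```r
# Hypothetical hemoglobin values (g/dl) for one group
x <- c(10, 11, 12, 12, 13, 13, 14, 20)

q   <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)  # lower hinge, median, upper hinge
iqr <- q[3] - q[1]

# Whiskers reach the most extreme points within 1.5 * IQR of the hinges
lower_fence   <- q[1] - 1.5 * iqr
upper_fence   <- q[3] + 1.5 * iqr
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Points beyond the fences are drawn individually as outliers
outliers <- x[x < lower_fence | x > upper_fence]
outliers   # 20
```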

Violin plots
- Kernel density estimates placed on each side of a central axis and mirrored, so each violin forms a symmetrical shape
- Make it easy to compare several distributions
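To make the definition concrete, a base R sketch on simulated (hypothetical) hemoglobin values: density() returns a Gaussian kernel density estimate on a grid of points, and mirroring that curve around a central axis gives a violin outline. The 0.45 half-width scaling is an arbitrary illustrative choice, not ggplot2's internal code.

```r
set.seed(42)
hgb <- rnorm(300, mean = 14, sd = 1.5)   # hypothetical hemoglobin values (g/dl)

d <- density(hgb)                        # Gaussian kernel density estimate on a 512-point grid

# The estimate integrates to about 1 (Riemann sum over the evenly spaced grid)
area <- sum(d$y) * diff(d$x[1:2])

# Mirror the curve: each grid value d$x gets a half-width proportional to d$y
half_width <- d$y / max(d$y) * 0.45
violin_outline <- data.frame(
  y     = c(d$x, rev(d$x)),               # up one side, back down the other
  width = c(-half_width, rev(half_width))
)
```

Plotting `width` against `y` as a closed polygon draws one violin; comparing several groups simply repeats this per group along the x axis.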

Violin plots
    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin()

Violin plots with underlying data points
    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin() +
      geom_jitter(alpha = 0.1)

Violin plots stratified by 2 variables
    ggplot(data = anemia, aes(x = sex, y = hgb, color = race)) +
      geom_violin()
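
Violin plots & boxplot with no outliers
    ggplot(data = anemia, aes(x = race, y = hgb, color = race)) +
      geom_violin() +
      geom_boxplot(width = 0.1, fill = "black", outlier.color = NA) +  # NA hides the outlier points
      # white dot at the median; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
      stat_summary(fun = median, geom = "point", fill = "white", shape = 21, size = 2.5)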

Practice
- Use the anemia dataset to practice making scatterplots, boxplots, and violin plots
- Try faceting, flipping orientation, and changing colors and labels
    str(anemia)
    ## Classes 'tbl_df', 'tbl' and 'data.frame': 3990 obs. of 13 variables:
    ##  $ age   : num
    ##  $ sex   : Factor w/ 2 levels "Male","Female":
    ##  $ tsat  : num
    ##  $ iron  : num
    ##  $ hgb   : num
    ##  $ ferr  : num
    ##  $ folate: num
    ##  $ vite  : num
    ##  $ vita  : num
    ##  $ race  : Factor w/ 4 levels "Hispanic","White",..:
    ##  $ rdw   : num
    ##  $ wbc   : num
    ##  $ anemia: num
    ##  - attr(*, "na.action")=Class 'omit' Named int [1:805]
    ##   ..- attr(*, "names")= chr [1:805] "26" "28" "32" "33" ...

Forest plots
First gather the data into the proper format, including the following variables:
- Estimate
- Lower CI
- Upper CI
- Grouping variable

Forest plots
For this example, we take the mean of hemoglobin and calculate the lower and upper limits of its 95% confidence interval (mean ± 1.96 × SD/√n) within each group. We then stack the rows for sex and race into one variable called "Type".
    library(dplyr)

    # funs() is deprecated; summarise() with named expressions builds the same columns
    anemia1 <- anemia %>%
      group_by(Type = sex) %>%
      summarise(mean  = mean(hgb),
                n     = n(),
                lower = mean - 1.96 * sd(hgb) / sqrt(n),
                upper = mean + 1.96 * sd(hgb) / sqrt(n))

    anemia2 <- anemia %>%
      group_by(Type = race) %>%
      summarise(mean  = mean(hgb),
                n     = n(),
                lower = mean - 1.96 * sd(hgb) / sqrt(n),
                upper = mean + 1.96 * sd(hgb) / sqrt(n))

    anemia3 <- bind_rows(anemia1, anemia2)
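The slide's pipeline needs the real anemia data. For a self-contained check of the same steps, here is a base R sketch on simulated data; the data frame and the helper names `mean_ci` and `summarise_by` are hypothetical. It computes mean ± 1.96 × SD/√n per group, then stacks the sex and race rows into one `Type` column.

```r
# Hypothetical stand-in for the anemia data (simulated, not NHANES values)
set.seed(1)
anemia <- data.frame(
  sex  = factor(sample(c("Male", "Female"), 200, replace = TRUE),
                levels = c("Male", "Female")),
  race = factor(sample(c("Hispanic", "White", "Black", "Other"), 200, replace = TRUE),
                levels = c("Hispanic", "White", "Black", "Other")),
  hgb  = rnorm(200, mean = 14, sd = 1.5)
)

# Normal-approximation 95% CI: mean +/- 1.96 * SD / sqrt(n)
mean_ci <- function(x) {
  m  <- mean(x)
  se <- sd(x) / sqrt(length(x))
  c(mean = m, n = length(x), lower = m - 1.96 * se, upper = m + 1.96 * se)
}

# One row per level of a grouping variable, with the level name stored in 'Type'
summarise_by <- function(data, group) {
  parts <- lapply(split(data$hgb, data[[group]]), mean_ci)
  out <- as.data.frame(do.call(rbind, parts))
  out$Type <- names(parts)
  out
}

# Stack the sex rows on top of the race rows
anemia3 <- rbind(summarise_by(anemia, "sex"), summarise_by(anemia, "race"))
anemia3$Type <- factor(anemia3$Type, levels = anemia3$Type)  # levels follow the stacked row order
anemia3
```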

Forest plots
    ggplot(data = anemia3, aes(x = Type, y = mean, ymin = lower, ymax = upper)) +
      geom_pointrange()

Forest plots: flip the axes, add labels
    ggplot(data = anemia3, aes(x = Type, y = mean, ymin = lower, ymax = upper)) +
      geom_pointrange(shape = 20) +
      coord_flip() +
      xlab("Demographics") +
      ylab("Mean Hemoglobin (95% CI)") +
      theme_bw()

Forest plots: calculating mean and CI within ggplot
- ggplot can calculate the mean and CI using stat_summary
- Further data manipulation would be needed to stack multiple variables

Calculating mean and CI within ggplot
    # mean_cl_normal() wraps a function from the Hmisc package, which must be installed
    ggplot(anemia, aes(x = race, y = hgb)) +
      stat_summary(fun.data = mean_cl_normal) +
      coord_flip() +
      theme_bw() +
      xlab("Demographics") +
      ylab("Mean Hemoglobin (95% CI)")
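One detail worth knowing when comparing this plot with the manually computed forest plot: to our understanding, mean_cl_normal() wraps Hmisc's smean.cl.normal(), which uses a t quantile with n - 1 degrees of freedom rather than 1.96, so its intervals are slightly wider, noticeably so in small groups. A base R sketch on a hypothetical five-value vector:

```r
x  <- c(12, 13, 14, 15, 16)   # hypothetical hemoglobin values (g/dl)
n  <- length(x)
se <- sd(x) / sqrt(n)

ci_z <- mean(x) + c(-1, 1) * qnorm(0.975) * se          # normal approximation (about 1.96)
ci_t <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se # t-based interval

# t.test() reports the same t-based interval
ci_ttest <- as.numeric(t.test(x)$conf.int)
```

With thousands of participants per group, as in the anemia data, the two intervals are practically identical.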

Forest plots: adding faceting
    # any.fit3: study estimates (a1) with CIs by predictor (v3) and setting; not provided with these slides
    ggplot(any.fit3, aes(x = v3, y = a1, ymin = lower, ymax = upper)) +
      geom_pointrange(shape = 20) +
      coord_flip() +
      xlab("Predictor Variable") +
      ylab("Adjusted Risk Difference per 100 (95% CI)") +
      scale_y_continuous(breaks = c(-20, -15, -10, -5, 0, 5, 10, 15, 20, 25), limits = c(-21, 26)) +
      theme_bw() +
      geom_hline(yintercept = 0, lty = 2) +
      facet_grid(setting ~ ., scales = "free", space = "free")

Practice
- Use the anemia dataset to practice making forest plots using other continuous variables
- Use dplyr to create a new, categorized age variable (hint: convert it to a factor before graphing)
- Create a forest plot of mean hemoglobin by age category
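One way to approach the age-category hint, sketched in base R: cut() returns a factor directly, so the categories already have a fixed order for graphing. The break points and labels below are hypothetical choices, not part of the original exercise.

```r
age <- c(18, 35, 50, 64, 80)   # hypothetical ages in years

# right = FALSE closes intervals on the left: [18, 40), [40, 65), [65, Inf)
age_cat <- cut(age, breaks = c(18, 40, 65, Inf), right = FALSE,
               labels = c("18-39", "40-64", "65+"))
age_cat
```

Inside a dplyr pipeline the same call fits in `mutate()`, e.g. `mutate(age_cat = cut(age, breaks = c(18, 40, 65, Inf), right = FALSE))`. Ages below the first break would become `NA`, so the breaks should cover the full age range in the data.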

Kaplan-Meier plots: WIHS data
- The Women's Interagency HIV Study (WIHS) is an ongoing observational cohort study with semiannual visits at 10 sites in the US
- Data on 1,164 patients who were HIV-positive, free of clinical AIDS, and not on antiretroviral therapy (ART) at study baseline (Dec. 6, 1995)
- Contains information on age, race, CD4 count, drug use, ARV treatment, and time to AIDS/death

Kaplan-Meier plots
- Many package options exist for plotting survival functions
- All use the survival package to calculate survival over time:
  - survfit (survival) + survplot (rms)
  - ggkm (sachsmc/ggkm) & ggplot2
  - ggkm (michaelway/ggkm)
- Allows for multiple treatments and subgroups
- Does not take into account competing risks

Kaplan-Meier example 1: calculate KM within ggplot
Prep data:
    library(dplyr)
    wihs$outcome <- ifelse(is.na(wihs$art), 0, 1)
    wihs$time <- ifelse(is.na(wihs$aids_death_art), wihs$dropout, wihs$aids_death_art)
    wihs <- wihs %>% mutate(time = ifelse(is.na(time), study_end, time))
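The prep steps resolve each participant's follow-up time in priority order: event time, then dropout time, then the end of the study. A base R sketch of the same rule on four hypothetical rows; the column names follow the slide, while the values and the reading of a non-missing `art` as "outcome occurred" are assumptions.

```r
# Hypothetical rows using the column names from the slide
wihs <- data.frame(
  art            = c(NA, 1, NA, 1),      # non-missing = outcome recorded (assumed coding)
  aids_death_art = c(NA, 2.5, NA, 4.0),  # event time, when one occurred
  dropout        = c(3.0, NA, NA, NA),   # time of loss to follow-up
  study_end      = c(10, 10, 10, 10)     # administrative end of follow-up
)

# outcome: 1 if art is recorded, 0 otherwise (same rule as the slide)
wihs$outcome <- as.integer(!is.na(wihs$art))

# time: first non-missing of event time, dropout time, end of study
wihs$time <- ifelse(!is.na(wihs$aids_death_art), wihs$aids_death_art,
                    ifelse(!is.na(wihs$dropout), wihs$dropout, wihs$study_end))
wihs[, c("outcome", "time")]
```

`dplyr::coalesce(aids_death_art, dropout, study_end)` expresses the same priority order in a single call.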

KM plot within ggplot2
    devtools::install_github("sachsmc/ggkm")
    library(ggplot2)
    library(ggkm)
    ggplot(wihs, aes(time = time, status = outcome)) +
      geom_km()

KM by group (history of injection drug use)
    ggplot(wihs, aes(time = time, status = outcome, color = factor(idu))) +
      geom_km()

Add confidence bands
    ggplot(wihs, aes(time = time, status = outcome, color = factor(idu))) +
      geom_km() +
      geom_kmband()

KM example #2
- Calculated using the survival package
- Plots the KM curve with numbers at risk
- Same package name as the previous example!
    remove.packages("ggkm")
    devtools::install_github("michaelway/ggkm")
    library(ggkm)

KM example 2
    library(survival)
    fit <- survfit(Surv(time, outcome) ~ idu, data = wihs)
    ggkm(fit)
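Under the hood, survfit() computes the Kaplan-Meier (product-limit) estimate: at each event time t_j, survival is multiplied by (1 - d_j/n_j), where d_j is the number of events and n_j the number still at risk. A sketch checking this by hand on five hypothetical observations:

```r
library(survival)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
d <- data.frame(time = c(1, 2, 3, 4, 5), status = c(1, 0, 1, 1, 0))

fit <- survfit(Surv(time, status) ~ 1, data = d)

# Product-limit estimate by hand
ev     <- d$time[d$status == 1]                                  # event times: 1, 3, 4
n_risk <- sapply(ev, function(tj) sum(d$time >= tj))             # at risk: 5, 3, 2
d_j    <- sapply(ev, function(tj) sum(d$time == tj & d$status == 1))
km     <- cumprod(1 - d_j / n_risk)                              # 0.8, 0.533, 0.267

summary(fit)$surv
```

The subject censored at time 2 never lowers the curve, but it does shrink the risk set for later event times.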

KM with numbers at risk
    ggkm(fit, table = TRUE, marks = FALSE, ystratalabs = c("No IDU", "History of IDU"))
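The numbers in a risk table are the counts still under observation at each reporting time. A sketch of extracting them with summary() from the survival package on hypothetical data; the reporting times 0, 2, and 4 are arbitrary.

```r
library(survival)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
d   <- data.frame(time = c(1, 2, 3, 4, 5), status = c(1, 0, 1, 1, 0))
fit <- survfit(Surv(time, status) ~ 1, data = d)

# Number with follow-up time >= t at each reporting time
at_risk <- summary(fit, times = c(0, 2, 4))$n.risk
at_risk   # 5 4 2
```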

Cumulative incidence plots
- Cumulative incidence = 1 - survival probability
- ipwrisk package (coming soon!):
  - calculates adjusted cumulative incidence curves using IPTW
  - addresses censoring (IPCW) and competing risks
  - produces tables and graphics
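Without competing risks, the cumulative incidence curve is simply 1 - S(t) from the Kaplan-Meier fit. A sketch on hypothetical data using survfit():

```r
library(survival)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored)
d   <- data.frame(time = c(1, 2, 3, 4, 5), status = c(1, 0, 1, 1, 0))
fit <- survfit(Surv(time, status) ~ 1, data = d)

s       <- summary(fit)
cum_inc <- 1 - s$surv   # about 0.2, 0.467, 0.733 at event times 1, 3, 4
```

When a competing event (such as death before the outcome of interest) is possible, 1 - KM overstates the risk; the survival package's survfit() accepts a factor status variable for that case and returns Aalen-Johansen cumulative incidence estimates.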

Sankey diagram
- A visualization that shows the flow of patients between states over time
- States, or nodes, can be treatments, comorbidities, hospitalizations, etc.
- Paths connecting states are called links; the thickness of each link corresponds to the proportion of patients following that path
[Figure: example Sankey diagram]

Basic sankey diagrams in R
    library(networkD3)
    library(reshape2)
    library(magrittr)

    nodes <- data.frame(name = c("Renal Failure", "Hemodialysis at 6m", "Transplant at 6m", "Death by 6m",
                                 "Hemodialysis at 12m", "Transplant at 12m", "Death by 12m"))
    # source and target are 0-based row positions in `nodes`
    links <- data.frame(source = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3),
                        target = c(1, 2, 3, 4, 5, 6, 4, 5, 6, 6),
                        value  = c(70, 20, 10, 40, 20, 10, 15, 4, 1, 10))
    sankeyNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target",
                  Value = "value", NodeID = "name", fontSize = 22, nodeWidth = 30, nodePadding = 5)
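Because networkD3 refers to nodes by 0-based position, hand-typing `source` and `target` indices is error-prone. A sketch that writes the slide's ten transitions with node names and converts them with match(); the `from` and `to` vectors are a hypothetical restatement of the same links and values.

```r
nodes <- data.frame(name = c("Renal Failure", "Hemodialysis at 6m", "Transplant at 6m",
                             "Death by 6m", "Hemodialysis at 12m", "Transplant at 12m",
                             "Death by 12m"))

# Transitions written with node names instead of positions (same flows as the slide)
from <- c(rep("Renal Failure", 3), rep("Hemodialysis at 6m", 3),
          rep("Transplant at 6m", 3), "Death by 6m")
to   <- c("Hemodialysis at 6m", "Transplant at 6m", "Death by 6m",
          rep(c("Hemodialysis at 12m", "Transplant at 12m", "Death by 12m"), 2),
          "Death by 12m")
value <- c(70, 20, 10, 40, 20, 10, 15, 4, 1, 10)

# networkD3 expects 0-based positions; match() is 1-based, so subtract 1
links <- data.frame(source = match(from, nodes$name) - 1,
                    target = match(to, nodes$name) - 1,
                    value  = value)
links
```

A misspelled node name shows up as `NA` in `links` rather than as a silently wrong arrow.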

Basic sankey diagrams in R
[Figure: Sankey diagram with nodes Renal Failure, Hemodialysis at 6m, Transplant at 6m, Death by 6m, Hemodialysis at 12m, Transplant at 12m, and Death by 12m]

Final tips
- Spend time planning your graph
- Make sure the data are in the correct structure before you start graphing
- Start with a simple graph, then gradually build in complexity

Further reading
- ggplot2
- Cookbook for R
- Quick-R

Wrap-up
- Questions?
- Acknowledgements: Alan Brookhart, Sara Levintow
- Contact info:
More informationStatistics Lecture 6. Looking at data one variable
Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)
More informationData Visualization Principles for Scientific Communication
Data Visualization Principles for Scientific Communication 8-888 Introduction to Linguistic Data Analysis Using R Jerzy Wieczorek 11//15 Follow along These slides and a summary checklist are at http://www.stat.cmu.edu/~jwieczor/
More informationEpi Info 2000 Basics. Data Entry and Documentation
Data Entry and Documentation Data Backup and Security Introduction to the Analysis Program Exercises Epi Info 2000 Basics Epi Info 2000 (EI2K) is an epidemiologic data management and analysis program written
More informationSurvey Questions and Methodology
Survey Questions and Methodology Spring Tracking Survey 2012 Data for March 15 April 3, 2012 Princeton Survey Research Associates International for the Pew Research Center s Internet & American Life Project
More information2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES
EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative
More informationSolution to Tumor growth in mice
Solution to Tumor growth in mice Exercise 1 1. Import the data to R Data is in the file tumorvols.csv which can be read with the read.csv2 function. For a succesful import you need to tell R where exactly
More informationIntroduction to Data Visualization
Introduction to Data Visualization Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en
More informationThere s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA
Paper HW04 There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA ABSTRACT Clinical Trials data comes in all shapes and sizes depending
More informationGetting started with ggplot2
Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for
More informationModule I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design
Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Randomized the Clinical Trails About the Uncontrolled Trails The protocol Development The
More informationBar Charts and Frequency Distributions
Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats
More informationTYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT
PRIMER FOR ACS OUTCOMES RESEARCH COURSE: TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT STEP 1: Install STATA statistical software. STEP 2: Read through this primer and complete the
More informationVisualizing the World
Visualizing the World An Introduction to Visualization 15.071x The Analytics Edge Why Visualization? The picture-examining eye is the best finder we have of the wholly unanticipated -John Tukey Visualizing
More information