DEPARTMENT OF BIOSTATISTICS UNIVERSITY OF COPENHAGEN. Graphics. Compact R for the DANTRIP team. Klaus K. Holst

Graphics Compact R for the DANTRIP team Klaus K. Holst 2012-05-16

The R Graphics system
R has a very flexible and powerful graphics system.
Basic plot routine: plot(x, y, ...)
Low-level routines: lines, points, text, title, legend, ...
Examples of high-level routines: qqnorm, hist, boxplot, ...
Many classes have an attached plot method, e.g.:
plot(density(iris$Sepal.Length))
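The split between high-level and low-level routines can be sketched as follows. The data are simulated purely for illustration and are not part of the original slides; one high-level call opens the plot, and the low-level calls then add to it.

```r
# Toy data (illustrative only, not from the slides)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.7)

# High-level call: opens a new plot with axes and labels
plot(x, y, pch = 16, col = "darkblue", xlab = "x", ylab = "y")

# Low-level calls: each one adds to the existing plot
fit <- lm(y ~ x)
lines(x, fitted(fit), col = "red", lwd = 2)
points(mean(x), mean(y), pch = 4, cex = 2)
text(mean(x), mean(y), "mean", pos = 4)
title("High-level plot with low-level additions")
legend("topleft", legend = c("data", "least squares fit"),
       col = c("darkblue", "red"), pch = c(16, NA), lty = c(NA, 1))
```

Calling a low-level routine before any high-level plot has been drawn gives an error, because there is no plot region to add to.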

Graphics methods
plot(density(iris$Sepal.Length))

Iris data
For illustration purposes we use Anderson's Iris flower data set, which consists of measurements of sepal length and width and petal length and width for 50 flowers from each of the species Iris setosa, versicolor and virginica.

Iris data
pairs(iris, col=as.numeric(iris$Species))

Graphics methods
(m1 <- methods("plot"))
 [1] plot.HoltWinters*    plot.TukeyHSD        plot.acf*
 [4] plot.correspondence* plot.data.frame*     plot.decompos
 [7] plot.default         plot.dendrogram*     plot.density
[10] plot.ecdf            plot.factor*         plot.formula*
[13] plot.function        plot.hclust*         plot.histogra
[16] plot.isoreg*         plot.lda*            plot.lm
[19] plot.mca*            plot.medpolish*      plot.mlm
[22] plot.ppr*            plot.prcomp*         plot.princomp
[25] plot.profile*        plot.profile.nls*    plot.ridgelm*
[28] plot.shingle*        plot.spec            plot.stepfun
[31] plot.stl*            plot.table*          plot.trellis*
[34] plot.ts              plot.tskernel*

   Non-visible functions are asterisked
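How a class gets its own plot method can be sketched with a small S3 example. The class name "walk" and the helper make_walk are hypothetical and exist only for this illustration; the point is that plot() is generic and dispatches on the class of its first argument.

```r
# Hypothetical class "walk", used only for illustration
make_walk <- function(n) {
  structure(list(pos = cumsum(rnorm(n))), class = "walk")
}

# plot() is generic: defining plot.walk makes plot(w) dispatch here
plot.walk <- function(x, ...) {
  plot(seq_along(x$pos), x$pos, type = "l",
       xlab = "step", ylab = "position", ...)
  abline(h = 0, lty = 2)   # reference line at the starting position
  invisible(x)
}

set.seed(1)
w <- make_walk(100)
plot(w, col = "darkblue")

# The density example works the same way: the object has class
# "density", so plot() dispatches to plot.density
class(density(iris$Sepal.Length))
```

Arguments passed through `...` (here `col`) reach the underlying plot call, which is the usual convention for plot methods.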

The R Graphics system
Traditional system
Lattice
Grid (library(grid); see R Graphics, by Paul Murrell, http://www.stat.auckland.ac.nz/~paul/RG2e/)
ggplot2 (Grammar of Graphics)
Limited support for interactive graphics

Lattice
Library of high-level plot functions including scatter plots, density plots, qq-plots, level plots, ...
Particularly well suited to visualizing longitudinal and grouped data.
k <- 10
n <- 20
id <- y <- x <- c()
set.seed(1)
for (i in seq(n)) {
  x0 <- seq(k)+runif(k,0,0.5)
  u <- rnorm(1,sd=1.5)
  x <- c(x,x0)
  id <- c(id,rep(i,k))
  y <- c(y,rnorm(k,u+x0,sd=0.8))
}
z <- as.factor(rep(c(0,1),each=k*n/2))
d <- data.frame(y=y,x=x,id=id,z=z)

Lattice
print(xyplot(y ~ x, group=id, type="l", data=d, col="darkblue"))

Lattice
print(xyplot(y ~ x | z, group=id, type="l", data=d))

Lattice
xyplot(y ~ x | id, type="l", data=d)

ggplot2
Implementation of the Grammar of Graphics
library(ggplot2)
p1 <- qplot(Sepal.Length,Sepal.Width,data=iris,colour=Species)
p1

ggplot2
p2 <- p1 + xlab(expression(eta)) + ylab(expression(phi(eta)))
p2 <- p2 + theme(axis.title.x=element_text(size=15,vjust=0.5),
                 axis.title.y=element_text(size=15,vjust=0.5,angle=90))
p2

ggplot2
p3 <- p2 + stat_smooth(span=0.75,se=TRUE,colour="darkblue")
p3+geom_point(shape=1,size=3)

3D graphics, surface plots
x <- seq(0,3,by=.05)
z <- outer(x,x,FUN=function(x,y) sin(x*y)+x^2)
contour(x,x,z)
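What outer() produces for the surface functions can be checked directly. The sketch below names the function f for convenience (the name is not in the slides) and confirms that entry [i, j] of the matrix is f evaluated at the i-th and j-th grid points.

```r
x <- seq(0, 3, by = 0.05)
f <- function(x, y) sin(x * y) + x^2

# outer() evaluates f on every pair (x[i], x[j]); FUN must be vectorised
z <- outer(x, x, FUN = f)
dim(z)   # one row and one column per grid point

# Entry [i, j] equals f(x[i], x[j])
all.equal(z[10, 20], f(x[10], x[20]))
```

Rows of z correspond to the first argument of contour(), image() and persp(), and columns to the second.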

3D graphics, contour plots
filled.contour(x,x,z)

3D graphics, images
image(z,col=heat.colors(50))

3D graphics, surface
persp(x,x,z,theta=-40,phi=30,col="lightgray")

3D graphics, scatter plots
library(scatterplot3d)
with(iris, scatterplot3d(Sepal.Length, Petal.Length, Petal.Width, color=as.numeric(Species)))

3D graphics, rgl
Interactive plots via rgl
library(rgl)
persp3d(x,x,z,col="orange")
with(iris, plot3d(cbind(Sepal.Length, Petal.Length, Petal.Width), col=as.numeric(Species), type="s", radius=0.07))

Interactive graphics
A few third-party libraries are available that implement some support for interacting with plots, such as brushing:
iplots
rggobi
playwith
rpanel
rgl, misc3d
All available on CRAN.

Identifying points, brushing
source("color.r")
plot(Sepal.Length ~ Petal.Length, data=iris)
idp <- with(iris, Id(Petal.Length,Sepal.Length))
Text("My label")

Mathematical annotation
x <- seq(0,2, by=0.001)
f1 <- sqrt(x)
f2 <- x^2
plot(x, f1, type="l", lty=1, col="red", ylab="f(x)")
lines(x, f2, lty=2, col="blue")
legend("bottomright",
       legend=c(expression(f[1](x)==sqrt(x)), expression(phi(x)==x^2)),
       col=c("red","blue"), lty=c(1,2))
title(expression(sum(delta[i],i==1,n)==integral(f(x)*dx,a,b)))


Large data
load("prt.rda")
str(prt)
head(prt)
prt <- prt[order(prt$id),]
prt$twinnum <- unlist(lapply(table(prt$id),seq))
prtwide <- reshape(prt,direction="wide", idvar=c("id","zyg","country"),timevar="twinnum")
head(prt)

'data.frame': 29222 obs. of 6 variables:
 $ country: Factor w/ 4 levels "Denmark","Finland",..: 1 1 1
 $ time   : num 97 80.9 68 61.5 78.8...
 $ status : int 1 1 1 1 1 1 1 1 1 2...
 $ zyg    : Factor w/ 2 levels "DZ","MZ": 1 1 1 1 1 1 2 2 1
 $ id     : int 1 1 3 3 5 5 9 9 12 12...
 $ cancer : num 0 0 0 0 0 0 0 0 0 1...
  country time status zyg id cancer
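Since prt.rda is not generally available, the long-to-wide step can be tried on a small stand-in. The data frame toy below is invented for illustration (its values are not taken from prt); the ordering, twin numbering and reshape() calls mirror the ones above.

```r
# Toy stand-in for prt (values invented for illustration)
toy <- data.frame(
  id      = c(3, 1, 1, 3, 5, 5),
  zyg     = factor(c("MZ", "DZ", "DZ", "MZ", "DZ", "DZ")),
  country = factor("Denmark"),
  time    = c(68, 97, 80.9, 61.5, 78.8, 70.2)
)

# Sort by pair id so that twins from the same pair are adjacent
toy <- toy[order(toy$id), ]

# Number the twins within each pair: 1, 2, 1, 2, ...
toy$twinnum <- unlist(lapply(table(toy$id), seq))

# One row per pair; time becomes time.1 and time.2
toywide <- reshape(toy, direction = "wide",
                   idvar = c("id", "zyg", "country"), timevar = "twinnum")
toywide
```

The variables listed in idvar are kept once per pair, while every other variable gets one column per value of twinnum, which is what makes plot(time.1 ~ time.2, ...) possible afterwards.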

Large data
plot(time.1~time.2,prtwide)

Large data
plot(time.1~time.2,prtwide,pch=16,cex=0.5)

Large data
library(hexbin)
hp <- with(prtwide, hexbin(time.1,time.2))
plot(hp)

Large data
plot(hp, style="centroids")

Large data
cols <- c(rgb(0.5,0,0,0.1),rgb(0,0,0.5,0.1))
plot(time.1~time.2,prtwide,pch=16,cex=1, col=cols[as.numeric(prtwide$zyg)])

Colors and transparency
The Col function from color.r can add transparency to a named color:
Col("blue",0.2)
To select a color with the mouse you can call
color(1)
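If color.r is not at hand, a similar effect can be had with base R. The sketch below uses adjustcolor() and rgb() from grDevices as a stand-in; this is an assumption about what Col does, not a description of its implementation.

```r
# Base R stand-in (an assumption, not the Col helper from color.r):
# scale the alpha channel of a named colour to 20 % opacity
blue20 <- adjustcolor("blue", alpha.f = 0.2)
blue20

# The same colour built directly from RGB components plus alpha
blue20_rgb <- rgb(0, 0, 1, alpha = 0.2)
blue20_rgb

# Overlapping transparent points show where the data are dense
set.seed(1)
plot(rnorm(2000), rnorm(2000), pch = 16, col = blue20,
     xlab = "x", ylab = "y")
```

Both calls return a hexadecimal colour string with an alpha component, which any col= argument accepts.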

Colors and transparency
x <- seq(0,0.75,by=0.01)
y1 <- log(x+1)
y2 <- x^2
xx <- c(x,rev(x))
yy <- c(y1,rev(y2))
plot(xx,yy, type="n")
polygon(xx,yy, col=Col("blue",0.2))
polygon(xx,rev(yy)+0.1, col=Col("red",0.2))


Practicals: transparent confidence bands
1. Fit a linear regression model to the iris data with Sepal.Length as the outcome and Sepal.Width as the covariate. Include the interaction with Species and assess the statistical significance of the interaction using a Likelihood Ratio Test or Wald test.
2. Make a scatter plot of the two continuous variables (colored by Species) and add the estimated regression lines with transparent 95% confidence limits (in matching colors).
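One possible approach to the practicals is sketched below. It is not the author's solution: the colour choices are arbitrary, and base R's adjustcolor() is used for transparency in place of the Col helper from color.r.

```r
# Exercise 1: models without and with the Species interaction
m0 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)
anova(m0, m1)   # F-test comparing the nested models

# Likelihood ratio test by hand (the interaction adds 2 parameters)
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
pchisq(lrt, df = 2, lower.tail = FALSE)

# Exercise 2: scatter plot, fitted lines and transparent 95% bands
cols <- c("red", "darkgreen", "blue")   # arbitrary colour choice
plot(Sepal.Length ~ Sepal.Width, data = iris, pch = 16,
     col = cols[as.numeric(Species)])
for (i in seq_along(levels(iris$Species))) {
  sp <- levels(iris$Species)[i]
  # Predict only over the observed range for this species
  w  <- range(iris$Sepal.Width[iris$Species == sp])
  nd <- data.frame(Sepal.Width = seq(w[1], w[2], length.out = 50),
                   Species = factor(sp, levels = levels(iris$Species)))
  ci <- predict(m1, newdata = nd, interval = "confidence", level = 0.95)
  # Band: lower limit left to right, then upper limit right to left
  polygon(c(nd$Sepal.Width, rev(nd$Sepal.Width)),
          c(ci[, "lwr"], rev(ci[, "upr"])),
          col = adjustcolor(cols[i], alpha.f = 0.2), border = NA)
  lines(nd$Sepal.Width, ci[, "fit"], col = cols[i], lwd = 2)
}
legend("topleft", legend = levels(iris$Species), col = cols, pch = 16)
```

The polygon trick is the same as in the transparency slides: the x values are traversed forwards and then backwards so that the lower and upper limits close into one shape.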