Mixed models in R using the lme4 package Part 2: Lattice graphics

Similar documents
Outline. Part 2: Lattice graphics. The formula/data method of specifying graphics. Exploring and presenting data. Presenting data.

Introduction to Lattice Graphics. Richard Pugh 4 th December 2012

Outline. Mixed models in R using the lme4 package Part 1: Introduction to R. Following the operations on the slides

Organizing data in R. Fitting Mixed-Effects Models Using the lme4 Package in R. R packages. Accessing documentation. The Dyestuff data set

Stat 849: Plotting responses and covariates

Scientific Graphing in Excel 2013

Stat 849: Plotting responses and covariates

1.3 Graphical Summaries of Data

Roger D. Peng, Associate Professor of Biostatistics Johns Hopkins Bloomberg School of Public Health

Scientific Graphing in Excel 2007

DSCI 325: Handout 18 Introduction to Graphics in R

An introduction to ggplot: An implementation of the grammar of graphics in R

4 Displaying Multiway Tables

AN INTRODUCTION TO LATTICE GRAPHICS IN R

Introduction to R. A Statistical Computing Environment. J.C. Wang. Department of Statistics Western Michigan University

Middle Years Data Analysis Display Methods

CREATING POWERFUL AND EFFECTIVE GRAPHICAL DISPLAYS: AN INTRODUCTION TO LATTICE GRAPHICS IN R

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Extending ODS Output by Incorporating

Fathom Dynamic Data TM Version 2 Specifications

Table of Contents (As covered from textbook)

Chapter 3 - Displaying and Summarizing Quantitative Data

How to Make Graphs in EXCEL

Tutorial: SeqAPass Boxplot Generator

Boxplot

Package xpose4specific

Chapter 2 Describing, Exploring, and Comparing Data

Basic Statistical Graphics in R. Stem and leaf plots 100,100,100,99,98,97,96,94,94,87,83,82,77,75,75,73,71,66,63,55,55,55,51,19

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

8 Organizing and Displaying

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Statistics 251: Statistical Methods

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

In Minitab interface has two windows named Session window and Worksheet window.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

Package beanplot. R topics documented: February 19, Type Package

Exploratory Data Analysis - Part 2 September 8, 2005

DAY 52 BOX-AND-WHISKER

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

wireframe: perspective plot of a surface evaluated on a regular grid cloud: perspective plot of a cloud of points (3D scatterplot)

Bar Graphs and Dot Plots

Package plotluck. November 13, 2016

An Introduction to Minitab Statistics 529

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Data Management Project Using Software to Carry Out Data Analysis Tasks

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Regression III: Advanced Methods

Homework 1 Excel Basics

Introduction to R. Nishant Gopalakrishnan, Martin Morgan January, Fred Hutchinson Cancer Research Center

From Getting Started with the Graph Template Language in SAS. Full book available for purchase here.

Frequency Distributions

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition

8. MINITAB COMMANDS WEEK-BY-WEEK

1 Introduction to Using Excel Spreadsheets

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Exploring and Understanding Data Using R.

Why Should We Care? More importantly, it is easy to lie or deceive people with bad plots

Chapter 2: Looking at Multivariate Data

Rockefeller College MPA Excel Workshop: Clinton Impeachment Data Example

Statistical transformations

3 Graphical Displays of Data

SAS Visual Analytics 8.2: Getting Started with Reports

UNIT 15 GRAPHICAL PRESENTATION OF DATA-I

Introduction to R: Day 2 September 20, 2017

Unit 1. Word Definition Picture. The number s distance from 0 on the number line. The symbol that means a number is greater than the second number.

CHAPTER 4: MICROSOFT OFFICE: EXCEL 2010

Excel Manual X Axis Labels Below Chart 2010

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Day 4 Percentiles and Box and Whisker.notebook. April 20, 2018

Package RcmdrPlugin.HH

Frequency Distributions and Graphs

Excel Spreadsheets and Graphs

ggplot2: elegant graphics for data analysis

Tips and Guidance for Analyzing Data. Executive Summary

Multistat2 1

Package r2d2. February 20, 2015

Revision Topic 11: Straight Line Graphs

CAMBRIDGE TECHNOLOGY IN MATHS Year 11 TI-89 User guide

NCSS Statistical Software

Chapter 6: DESCRIPTIVE STATISTICS

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

Chapter 8 An Introduction to the Lattice Package

Reference and Style Guide for Microsoft Excel

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

STATGRAPHICS Sigma Express for Microsoft Excel. Getting Started

Technology Assignment: Scatter Plots

Exploratory Data Analysis September 3, 2008

Math 227 EXCEL / MEGASTAT Guide

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

BioFuel Graphing instructions using Microsoft Excel 2003 (Microsoft Excel 2007 instructions start on page mei-7)

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

GLOSSARY OF TERMS. Commutative property. Numbers can be added or multiplied in either order. For example, = ; 3 x 8 = 8 x 3.

Plotting Graphs. Error Bars

Can You Make A Box And Whisker Plot In Excel 2007

Chapter 2 Organizing and Graphing Data. 2.1 Organizing and Graphing Qualitative Data

Grade 9 Math Terminology

Muskogee Public Schools Curriculum Map, Math, Grade 8

Transcription:

Mixed models in R using the lme4 package Part 2: Lattice graphics Douglas Bates University of Wisconsin - Madison and R Development Core Team <Douglas.Bates@R-project.org> University of Lausanne July 1, 2009

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Exploring and presenting data When possible, use graphical presentations of data. Time spend creating informative graphical displays is well invested. Ron Snee, a friend who spent his career as a statistical consultant for DuPont, once said, Whenever I am writing a report, the most important conclusion I want to communicate is always presented as a graphic and shown early in the report. On the other hand, if there is a conclusion I feel obligated to include but would prefer people not notice, I include it as a table. One of the strengths of R is its graphics capabilities. There are several styles of graphics in R. The style in Deepayan Sarkar s lattice package is well-suited to the type of data we will be discussing. Deepayan s book, Lattice: Multivariate Data Visualization with R (Springer, 2008) provides in-depth documentation and explanations of lattice graphics.

The formula/data method of specifying graphics The first two arguments to lattice graphics functions are usually formula and data. This specification is used in model-fitting functions (lm, aov, lmer,...) and in other functions such as xtabs. The formula incorporates a tilde, ( ), character. A one-sided formula specifies the value on the x-axis. A two-sided formula specifies the x and y axes. The second argument, data, is usually the name of a data frame. Many optional arguments are available. Ones that we will use frequently allow for labeling axes (xlab, ylab), and controlling the type of information, type. The lattice package is not attached by default. You must enter library(lattice) before you can use lattice functions.

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

A simple scatterplot in lattice > xyplot(optden ~ carb, Formaldehyde) 0.8 0.6 optden 0.4 0.2 0.2 0.4 0.6 0.8 carb

Scatterplots in lattice A scatter plot is the most versatile plot in applied statistics. It is simply a plot of a numeric response, y, versus a numeric covariate, x. The lattice function xyplot produces scatter plots. I typically specify type = c("g","p") requesting a background grid in addition to the plotted points. The type argument takes a selection from p points g background grid l lines b both points and lines r reference (or regression ) straight line smooth scatter-plot smoother lines In evaluating a scatterplot the aspect ratio (ratio of vertical size to horizontal size) can be important. In particular, differences in slopes are most apparent near 45 o.

General principles of lattice graphics The formula is of the form x or y x or y x f where x is the variable on the x axis (usually continuous), y is the variable on the y axis and f is a factor that determines the panels. Titles can be added with xlab, ylab, main and sub. Titles can be character strings or, more generally, expressions that allow for special characters, subscripts, superscripts, etc. See help(plotmath) for details. The groups argument, if used, specifies different point styles and different line styles for each level of the group. If lines are calculated, each group has separate lines. If groups is used, we usually also use auto.key to add a key relating the line or point styles to the groups. The layout specifies the number of columns and rows of panels.

An enhanced scatterplot of the Formaldehyde data 0.8 0.6 Optical density 0.4 0.2 0.2 0.4 0.6 0.8 Amount of carbohydrate (ml)

Saving plots I recommend using the facilities in the R application to save plots and transcripts. To save a plot, ensure that the graphics window is active and use the menu item File Save To Clipboard Windows Metafile. (On a Mac, save as PDF.) Then switch to a word processor and paste the figure. Adjust the aspect ratio of the graphics window to suit the pasted version of the plot before you copy the graphic. Those who want more control (and less cutting and pasting) could consider the Sweave system or the odfweave package.

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Histograms and density plots A histogram is a type of bar plot created from dividing numeric data into adjacent bins (typically having equal width). The purpose of a histogram is to show the distribution or density of the observations. It is almost never a good way of doing this. A densityplot is a better way of showing the density or, even better, comparing the densities of observations associated with different groups. Also, densityplots for different groups can be overlaid. If you have only a few observations you may want to use a comparative box-and-whisker plot (bwplot) or a comparative dotplot instead. Density plots based on a small number of observations tend to be rather lumpy. If the data are bounded, perhaps because the data must be positive, a density plot can blur the boundary. However, this may indicate that the data are more meaningfully represented on another scale.

Histogram of the InsectSprays data > histogram(~count, InsectSprays) 30 25 20 Percent of Total 15 10 5 0 0 5 10 15 20 25 count

Density plot of the InsectSprays data > densityplot(~count, InsectSprays) 6 5 4 Density 3 2 1 0 10 0 10 20 30 count

Density plot of the square root of the count > densityplot(~sqrt(count), InsectSprays, xlab = "Square root of 0.25 0.20 Density 0.15 0.10 5 0 0 2 4 6 Square root of count

Density plot of the square root with fancy label > densityplot(~sqrt(count), InsectSprays, xlab = expression(sqrt 0.25 0.20 Density 0.15 0.10 5 0 0 2 4 6 count

Comparative density plot of square root > densityplot(~sqrt(count), InsectSprays, groups = spray, + auto.key = list(columns = 6)) A B C D E F Density 0 2 4 6 count

Comparative density plot, separate panels > densityplot(~sqrt(count) spray, InsectSprays, layout = c(1, + 6)) Density F E D C B A 0 2 4 6 count

Comparative density plot, separate panels, strip at left > densityplot(~sqrt(count) spray, InsectSprays, layout = c(1, + 6), strip = FALSE, strip.left = TRUE) F E Density D C B A 0 2 4 6 count

Comparative density plot, separate panels, reordered > densityplot(~sqrt(count) reorder(spray, count), InsectSprays F B Density A D E C 0 2 4 6 count

Outline Presenting data Scatter plots Histograms and density plots Box-and-whisker plots and dotplots

Box-and-whisker plot and dotplot A box-and-whisker plot gives a rough summary (based on the five-number summary - min, 1st quartile, median, 3rd quartile, max) of the distribution. A dotplot consists of points on a number line. For a large number of data values we jitter the y values to avoid overplotting. By default a density plot also shows a dotplot. Box-and-whisker plots or dotplots are often used for comparison of groups. It is widely believed that a comparative boxplot should have the response on the vertical axis. Most of the time it is more effective to put the response on the horizontal axis. If the default ordering of the groups is arbitrary reorder them according to the level of the response (mean response, by default). Reordering makes it easier to see if the variability increases with the level of the response.

Vertical comparative box-and-whisker plot > bwplot(sqrt(count) ~ spray, InsectSprays) 5 4 3 count 2 1 0 A B C D E F

Horizontal comparative box-and-whisker plot > bwplot(spray ~ sqrt(count), InsectSprays) F E D C B A 0 1 2 3 4 5 count

Reordered horizontal comparative box-and-whisker plot > bwplot(reorder(spray, count) ~ sqrt(count), InsectSprays) F B A D E C 0 1 2 3 4 5 count

Compressed horizontal comparative box-and-whisker plot > bwplot(reorder(spray, count) ~ sqrt(count), InsectSprays, + aspect = 0.2) F B A D E C 0 1 2 3 4 5 count You can extract much more information from this, compressed plot than from the original vertical box-and-whisker plot. In Edward Tufte s phrase, this plot has a greater information/ink ratio.

Comparative dotplots When the number of observations per group is small, a box-and-whisker plot can obscure the structure of the data, rather than illuminating it. By default, the density plot provides a dotplot on the rug. A comparative dotplot displays all of the data. The principles described for a comparative boxplot (factor on vertical axis, reorder levels if no natural order, choose an appropriate scale) apply here too. By default, the character in the dotplot is filled. I often use optional arguments pch = 21 and jitter.y = TRUE to avoid overplotting. Setting type = c("p","a") provides a line joining the group averages. Interaction plots can be produced as a comparative dotplot with groups

Comparative dotplot of InsectSprays > dotplot(reorder(spray, count) ~ sqrt(count), InsectSprays, + type = c("p", "a"), pch = 21, jitter.y = TRUE) F B A D E C 0 1 2 3 4 5 count

Summary In order of importance the graphic displays I consider are scatter plots, density plots, box-and-whisker plots, dot plots and histograms. Pay careful attention to layout and axis labels. Include units in the axis labels, if known. For mixed models we always have at least one unordered categorical covariate and often have a numeric response. Comparative dot plots and box-and-whisker plots will be important data presentation techniques for us. Plots of a continuous response by levels of a categorical variable work best with the category on the vertical axis. Consider reordering the levels of the category if they do not have a natural order.