SAS Graph: Introduction to the World of Boxplots Brian Spruell, Constella Group LLC, Durham, NC


DP06

ABSTRACT

Boxplots provide a graphical representation of the distribution of a data set. Every elementary statistics course includes an introduction to the construction and use of boxplots. Although boxplots are quite useful, they tend to be tedious to create by hand. Thankfully, SAS/GRAPH can generate boxplots. This paper serves as an introduction to SAS/GRAPH boxplots. It shows several techniques the author has learned for making SAS generate a boxplot that conforms to user specifications. Topics covered include displaying several boxplots on one graph, median connectors, and horizontal boxplots. The paper also demonstrates modifying a boxplot with annotate.

INTRODUCTION

You can graphically display summary data with boxplots. The most common boxplots show the median, first quartile, third quartile, minimum, and maximum of the data. Boxplots give a good picture of the variation in a data set. You can use a boxplot to help detect whether data is skewed or to flag a potential outlier. Boxplots are tedious to produce by hand, but SAS/GRAPH can generate them for you. I will start with simple examples and progress to more complex ones.

STANDARD BOXPLOT

The following table displays all the points scored by the 2006 North Carolina State Wolfpack basketball team. The team played a total of thirty-four games. The table lists only the opponent, the number of points NCSU scored, and the outcome of the game from North Carolina State's perspective:

    Opponent                                NCSU points   Outcome
    Univ. of Guelph (Exh.)                  75
    Mount Olive (Exh.)                      97
    Stetson (Hisp. College Fund Class.)     91
    Citadel (Hisp. College Fund Class.)     91
    Delaware (Hisp. College Fund Class.)    73
    VMI                                     75
    Notre Dame (Wooden Classic)             61
    Iowa (ACC/Big 10 Challenge)             42
    App. State (Reynolds Coliseum)          92
    UNC-Asheville                           86
    Miami                                   81
    Alabama                                 68
    New Hampshire                           81
    George Washington                       79
    UNC-Greensboro                          83
    North Carolina                          69
    Boston College                          78
    Georgia Tech                            87
    Duke                                    68
    Wake Forest                             92
    Seton Hall                              65
    Clemson                                 94
    Virginia                                66
    Maryland                                62
    Miami                                   87
    Georgia Tech                            68
    Florida State                           86
    Virginia Tech                           70
    North Carolina                          95
    Boston College                          74
    Wake Forest                             63
    Wake Forest (ACC Tournament)            71
    Cal (NCAA Tournament)                   58
    Texas (NCAA Tournament)                 54

I developed the following piece of SAS code to display the basketball data with a boxplot:

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE= HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    filename gsasfile "D:\My Documents\SEGUI\OCT2006\SESUGI1.jpeg";
    symbol1 interpol=boxt00 color=black width=3 bwidth=2;
    title h=1.7 'Total Breakdown of Points vs Outcome';
    axis1 label=(h=1.2 ' ') minor=none offset=(25 pct);
    proc gplot data=ncsu06;
      plot score*outcome / haxis=axis1;
    run;
    quit;

The symbol1 interpol=boxt00 setting tells SAS that I want proc gplot to produce a boxplot. I added some axis options to make the graph more presentable.

BOXPLOT OPTIONS

Several SAS/GRAPH options exist which you can use to modify the appearance of your boxplot. The most noticeable modification involves changing the appearance of the line connectors, also known as whiskers. The default setting, interpol=box, draws each connector line from the box only as far as the most extreme data point within 1.5 times the interquartile range (IQR) of the box. The interquartile range is the difference between the 75th and 25th percentiles. Data points more than 1.5 times the IQR beyond the box are classified as potential outliers.

Modifying the line connectors involves changing the interpol value by adding a number to the end of BOX. SAS boxplot values range from 00 to 25. In the example I presented earlier I specified the option 00, which I added to the end of interpol=boxt. This option tells SAS to draw the connector lines from the box (25th and 75th percentiles) to the minimum and maximum values respectively. Using 25 as an option prevents the drawing of the line connectors; in that case only the box is displayed. 05 produces a boxplot with the connectors going from the 5th percentile at the low end to the 95th percentile at the high end.

Modifying the appearance of the line connectors may cause several data points to fall outside your range. Data points that fall outside your line connector range are marked by a plot symbol. No data falls outside the 00 option, since the line connectors go from the quartiles to the minimum and maximum values within a dataset. The plot symbol is designated within the symbol statement. In the examples to follow it is given a value of circle. You can control the size and color of plot symbols by specifying both an h and a cv after value=. The graphs below, along with the code that produced them, demonstrate modifications to a boxplot's line connectors. The differences from the first example are in the symbol1 statement.

Option 10:

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE= HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    symbol1 interpol=boxt10 color=black width=3 bwidth=2
            value=circle cv=red height=1;
    title h=1.7 'Total Breakdown of Points vs Outcome';
    axis1 label=(h=1.2 ' ') minor=none offset=(25 pct);
    proc gplot data=ncsu06;
      plot score*outcome / haxis=axis1;
    run;
    quit;

Option 25:

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE= HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    symbol1 interpol=boxt25 color=black width=3 bwidth=2
            value=circle cv=red height=1;
    title h=1.7 'Total Breakdown of Points vs Outcome';
    axis1 label=(h=1.2 ' ') minor=none offset=(25 pct);
    proc gplot data=ncsu06;
      plot score*outcome / haxis=axis1;
    run;
    quit;

You can change the color of the lines outlining the boxplots, as well as the color that fills them. The F option fills the box with the color specified in CV=. The outline of the box is controlled by the value of CO=. I used the T option in all examples presented so far. This option tells SAS to draw tops and bottoms on the line connectors. You can also add a line that connects the medians of neighboring boxplots with the J option. The graph below, along with the code that generates it, demonstrates several of the options just discussed:

Red boxplot with blue outlines (F option):

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE = HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    symbol1 interpol=boxft00 color=blue width=3 bwidth=2
            value=circle cv=red height=1;
    title h=1.7 'Total Breakdown of Points vs Outcome';
    axis1 label=(h=1.2 ' ') minor=none offset=(25 pct);
    proc gplot data=ncsu06;
      plot score*outcome / haxis=axis1;
    run;
    quit;

In this next set of code I make use of the J option, which produces a connector line from the median of one boxplot to the median of the next.

J option:

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE = HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    symbol1 interpol=boxjft00 color=blue width=3 bwidth=2
            value=circle cv=red height=1;
    title h=1.7 'Total Breakdown of Points vs Outcome';
    axis1 label=(h=1.2 ' ') minor=none offset=(25 pct);
    proc gplot data=ncsu06;
      plot score*outcome / haxis=axis1;
    run;
    quit;
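To see what the numeric suffix on interpol=box does to the whiskers, it can help to compute the relevant percentiles directly. The following is a minimal sketch that is not part of the original paper. It assumes the ncsu06 data set with the score and outcome variables used in the examples above; the output data set and variable names (box_stats, box_fences, lower_fence, upper_fence, and so on) are illustrative only. PROC MEANS may not use the same percentile definition as GPLOT, so small differences from the plotted values are possible.

```sas
/* Sketch only: ncsu06, score and outcome come from the paper;   */
/* every other name here is made up for illustration.            */
proc sort data=ncsu06 out=ncsu06_sorted;
  by outcome;
run;

/* One row of summary statistics per outcome group. */
proc means data=ncsu06_sorted noprint;
  by outcome;
  var score;
  output out=box_stats
         min=min_score p5=p5 q1=q1 median=median
         q3=q3 p95=p95 max=max_score;
run;

/* IQR and the 1.5*IQR limits used to flag potential outliers. */
data box_fences;
  set box_stats;
  iqr         = q3 - q1;
  lower_fence = q1 - 1.5*iqr;
  upper_fence = q3 + 1.5*iqr;
run;

proc print data=box_fences noobs;
  var outcome min_score p5 q1 median q3 p95 max_score
      iqr lower_fence upper_fence;
run;
```

Scores below lower_fence or above upper_fence are the ones the default interpol=box setting would mark as potential outliers. The p5 and p95 columns correspond to the whisker ends described for the 05 option, and min_score and max_score to those of the 00 option.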

PROFICIENCY TESTING BOXPLOTS

In 2004 I assisted several SAS programmers in developing proficiency testing code for laboratories involved in gene expression analysis. The proficiency testing report had a page which displayed lab variation with boxplots. I was put in charge of producing the SAS code which generated these boxplots.

The proficiency testing involved four rounds. The example I will use throughout the remainder of the paper contains data from the first two rounds for a particular lab. The first round of testing took place in April 2004 and contained data for thirteen laboratories. The second round of testing was undertaken in September 2004 and had data for sixteen labs. Several laboratories were added between the first two rounds. The lab we are going to look at has data for both round one and round two.

The boxplots I generated for the proficiency testing report displayed data for a specific lab against the other labs within a round, as well as the distribution of data among labs between rounds. This involved generating boxplots which would be displayed side by side on the same axis. This first example simply shows all labs' average signal present values within a round:

All Labs' Average Signal Present within a Round:

I used the following code to generate the above plot:

    GOPTIONS RESET = all GSFNAME = gsasfile GSFMODE = replace
             ROTATE=landscape FTEXT = swiss CBACK = white
             TARGETDEVICE = HPJG2 dev=jpeg xmax=11in ymax=8.5in
             xpixels=3300 ypixels=2550;
    symbol1 interpol=boxt00 c=blue l=2 w=2.5;
    title h=1.5 'Average Signal Present';
    axis1 order=(0 to 5 by 1) minor=none
          label=(a=360 h=2.5pct font='swiss' "Testing Round");
    axis2 order=(850 to 1250 by 50) minor=none
          label=(a=90 h=2.5pct font='swiss' "Observed Avg Signal Present");
    proc gplot data=inputds;
      plot value*time = group / haxis=axis1 vaxis=axis2 nolegend annotate=anno_all;
    run;
    quit;

By using the interpol option BOXT00 I forced the boxplot whiskers to stretch from the minimum average signal value to the maximum average signal value. The T option causes the tops to be drawn on the whiskers. I then added a line connecting the medians of the two boxplots:

The following symbol statement produced the above graph:

    symbol1 interpol=boxjt00 c=blue l=2 w=2.5;

I can change the color of the connector by changing the value of the c parameter. I can also adjust the line type by changing the value given to the l parameter. The w parameter changes the width of the connector line.

MEDIAN LINE WITH MEDIAN CONNECTORS

You will notice that adding a connector line suppresses the printing of the median line in both boxplots. I did not want the line to be suppressed. In fact, I wanted to draw the line in a different color (red) to emphasize the location of each round's median. I accomplished this by drawing two boxplots, one on top of the other. The first boxplot is produced without the connector line, in the color I want the median line to appear. On top of this first boxplot I draw a second one with a connector line. The desired result is shown below:

The above graph was generated by using two symbol statements with the gplot procedure:

    *** BOXPLOT IN RED TO DRAW MEDIAN ***;
    symbol1 interpol=boxt00 color=red width=7 bwidth=2;

    *** BOXPLOT IN BLACK / GRAY TO CONNECT ALL MEDIANS ***;
    symbol2 interpol=boxjt00 color=black /*grayaa*/ line=2 width=7 bwidth=2;

The first symbol statement draws the boxplot in red with the median line. The second symbol statement adds the J to draw the connector between the two median lines.
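The paper does not show how the same data reaches both symbol statements. One way to do it, sketched here as an assumption rather than as the author's actual code, is to stack two copies of the data and let a classification variable select the symbol statement. The overlay data set and the layer variable are hypothetical names, and the sketch assumes inputds holds only the all-labs values, with the value and time variables used in the earlier plot statement.

```sas
/* Hypothetical sketch: write every observation twice so the same */
/* boxplot is drawn once per SYMBOL definition.                    */
data overlay;
  set inputds;
  layer = 1; output;   /* first copy, drawn with SYMBOL1  */
  layer = 2; output;   /* second copy, drawn with SYMBOL2 */
run;

/* Red boxplot that keeps its median line. */
symbol1 interpol=boxt00  color=red   width=7 bwidth=2;
/* Black boxplot on top, with the J option joining the medians. */
symbol2 interpol=boxjt00 color=black line=2 width=7 bwidth=2;

proc gplot data=overlay;
  plot value*time = layer / nolegend;
run;
quit;
```

Because each symbol statement names an explicit color, the first value of layer should pick up symbol1 and the second should pick up symbol2.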

INDIVIDUAL LAB NEXT TO ENTIRE ROUND

Next I wanted to display individual lab data next to each boxplot. This was accomplished through the use of the following symbol statement:

    symbol1 interpol=hiloctj value=dot h=.5 c=black l=1;

The green plot displays data for an individual lab. I wrote several other symbol statements to generate a line connector between the two plots:

    *** LINE IN GREEN TO CONNECT MEDIAN FOR INDIVIDUAL LAB ***;
    symbol3 interpol=join value=none color=green line=1 width=7;

    *** LINE IN GREEN TO CONNECT DOTS AT EACH TIME POINT FOR INDIVIDUAL LAB ***;
    symbol4 interpol=join value=dot height=1 color=green line=1 width=7 repeat=4;

The interpol=hilo setting tells SAS to generate a vertical line which connects the y-axis values for each x-axis value. Adding C causes SAS to draw marks at the close value instead of the default mean value. As with boxplots, the T adds a top and bottom to each line, and the J option causes a connector line to link the two mean values between the two rounds.

ANNOTATE

I then decided to add the maximum value above each boxplot in blue. I determined the maximum value for each boxplot dataset and placed that value in a macro variable. Using annotate I was able to display the value above each plot.
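The step that stores each round's maximum in a macro variable is not shown in the paper. A minimal sketch of one way to do it with PROC SQL follows. It assumes the inputds data set has the value variable plotted above and a time variable coded 1 and 2 for the two rounds. The macro variable names max_val1 and max_val2 are the ones the annotate code refers to; the coding of time is an assumption.

```sas
/* Sketch only: assumes time = 1 and time = 2 identify the two rounds. */
proc sql noprint;
  select max(value) into :max_val1
    from inputds
    where time = 1;
  select max(value) into :max_val2
    from inputds
    where time = 2;
quit;

/* Write the values to the log as a quick check. */
%put NOTE: max_val1=&max_val1 max_val2=&max_val2;
```

Values stored this way can carry leading blanks, which is presumably why the annotate step wraps each macro variable in compress().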

    data max_anno;
      %label(1,1245,compress("&max_val1"),blue,90,0,1,'swiss',5);
      %label(2,1200,compress("&max_val2"),blue,90,0,1,'swiss',5);
    run;

That annotate dataset was then passed to proc gplot through the annotate= option so that SAS would use it:

    proc gplot data=inputds;
      format shift_time timefm. value;
      plot value*shift_time = group / haxis=axis1 vaxis=axis2 nolegend annotate=max_anno;
    run;
    quit;

The above code with the annotate dataset generates the following plot:

HORIZONTAL BOXPLOT

Next I will demonstrate how to generate horizontal boxplots. The client wanted to see the above data presented in boxplots which were horizontal in orientation. There was no option I could use to force SAS to produce the boxplot horizontally. I contacted SAS technical support and they suggested I make use of the plot2 statement in proc gplot. Using the plot2 statement I was able to generate axis labels for the other side of the graphical window. I then manipulated the orientation of my text so I could flip the graph using proc greplay. Before showing any manipulation of the axis values, I want to show the output I received when I used a plot2 statement:

Notice the extra axis on the right of the graph. When I rotate my graph with proc greplay I will want the values displayed on the right to become my x-axis. I no longer need the y-axis labels and values associated with my first plot statement. I will suppress the printing of values, labels and tick marks for my y-axis with the following axis statement:

    axis2 order=(850 to 1250 by 50) major=none minor=none value=none label=none;

The resulting graph will look something like this:

I now need to modify the orientation of the labels and tick mark values for both remaining axes. Currently they are correctly oriented, but when I rotate my graph they will be off. So I modify the two other axis statements to get the orientation I want:

    axis1 value=(a=90) order=(0 to 5 by 1) minor=none
          label=(a=90 h=2.5pct font='swiss' "Testing Round");
    axis3 value=(a=90) order=(850 to 1250 by 50) minor=none
          label=(a=90 h=2.5pct font='swiss' "Observed Avg Signal Present");

One last modification remains. Currently the title is in its proper location. However, once we rotate the graph it will go from being at the top of the graph to being off to one side. To correct this I suppress the printing of the title and add it as a label for the current y-axis. Instead of giving axis2's label a value of none, I give it the following value:

    axis2 order=(850 to 1250 by 50) major=none minor=none value=none
          label=(a=90 h=3pct font='swiss' "Average Signal Present");
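The paper does not list the PROC GPLOT step that uses the plot2 statement. The following is a hedged sketch of how the axis statements described above might be combined. It assumes the inputds data set with the value and time variables from the earlier examples, drops the classification variable and the font= settings for brevity, and assumes that plot2 picks up the next symbol definition after the one used by plot. The rotation with proc greplay is not shown.

```sas
/* Sketch only: axis settings follow the statements in the text, */
/* with font= omitted. inputds, value and time are assumed.      */
title;   /* clear the title; its text moves to the axis2 label */

axis1 value=(a=90) order=(0 to 5 by 1) minor=none
      label=(a=90 h=2.5pct "Testing Round");
axis2 order=(850 to 1250 by 50) major=none minor=none value=none
      label=(a=90 h=3pct "Average Signal Present");
axis3 value=(a=90) order=(850 to 1250 by 50) minor=none
      label=(a=90 h=2.5pct "Observed Avg Signal Present");

/* Identical definitions so the plot and plot2 boxes coincide. */
symbol1 interpol=boxt00 color=blue line=2 width=2.5;
symbol2 interpol=boxt00 color=blue line=2 width=2.5;

proc gplot data=inputds;
  /* Left axis: tick marks and values suppressed, label holds the title text. */
  plot  value*time / haxis=axis1 vaxis=axis2;
  /* Right axis: intended to become the x-axis once the graph is rotated. */
  plot2 value*time / vaxis=axis3;
run;
quit;
```

Giving axis2 and axis3 the same order= range keeps the left and right scales aligned, so the second set of boxes is drawn directly over the first.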

The resulting pre-rotated graph will look something like this:

I am almost at the point where I can rotate the graph to make the boxplots appear horizontally. I would like the axis on the right to be longer than it currently is. I can make minor modifications to the axis3 statement to lengthen the axis:

Rotating the above graph with proc greplay gives the following graphical output, which was our desired result:

CONCLUSION

Boxplots are an excellent way to visualize variation within data. SAS/GRAPH makes producing boxplots simple, and it offers enough flexibility to present the data in the desired format with few modifications. Hopefully the techniques presented in this paper will prove useful.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Brian Spruell
    Constella Group, LLC
    2605 Meridian Parkway, Suite 200
    Durham, NC 27713
    (919) 313-7673
    bspruell@constellagroup.com
    www.constellagroup.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.