Using MACRO and SAS/GRAPH to Efficiently Assess Distributions. Paul Walker, Capital One

INTRODUCTION

A common task in data analysis is assessing the distribution of variables by means of univariate statistics and graphs such as histograms, scatter plots, and box plots. This task is relatively straightforward when analyzing a small dataset with only a handful of variables. However, when the dataset is large and contains hundreds or even thousands of variables, the task can be daunting.

The method described in this paper was motivated by the need to efficiently examine the distributions of a large number of variables. The method is characterized by the following:

- It is an improvement over proc univariate insofar as the output is more targeted to what you want to see.
- It provides an automated way to create graphs for a large number of variables.
- It provides a way to include summary statistics on your graphs.
- It integrates multiple graphs into one .gif file.

SOLUTION TO THE PROBLEM

A method was developed to automatically create the style of graph shown in Figure 1 for each variable specified by the analyst. For each variable, the graph is contained in a .gif file that is usually named after the variable being graphed. The analyst specifies the variables in the form of a single space-delimited list.

Within a Windows operating system environment, one can quickly wade through large numbers of graphics files using the Thumbnails or Filmstrip viewing options in Windows Explorer. I find this much more convenient than putting the graphs into a Word or PDF document, which makes searching for an individual variable difficult. Moreover, having individual .gif files is useful if you want to organize the variables into different categories. In predictive model building, for example, you might make one folder for variables you want to discard and another for variables you want to include in the model.
Figure 1: Histogram / Scatter Plot Pair

There are essentially four steps in the macro that creates the graph shown in Figure 1:

1. Parsing the user-specified variable list into macro variables indexed by an integer.
2. Getting summary statistics from other procs into macro variables.
3. Creating multiple graphs, including putting summary statistics in the footnote section of each graph.
4. Combining the graphs into one .gif file using proc greplay.

I will describe the general idea of the code in each of these four steps. The reader should be familiar with SAS/GRAPH and the MACRO language.

STEP 1: PARSE THE VARIABLE LIST

The analyst must specify the list of variables that he/she wants to examine. For example, suppose the analyst has the following variable list:

    age height weight fastgluc postgluc

We need to assign these variable names to macro variables indexed by an integer. Thus, we would want the following macro variables and corresponding values:

    Macro Variable    Value
    var1              age
    var2              height
    var3              weight
    var4              fastgluc
    var5              postgluc
    nvars             5

The code in Step 1 of the Appendix performs this parsing. The variable list may contain any number of variable names. Now that we have the variable names in macro variables indexed by an integer, we can iterate through each variable name via a macro do loop, e.g. %do i=1 %to &nvars. When you want to refer to the ith variable, you would use the double ampersand technique &&var&i. For example, when i=3, &&var&i resolves to weight.

STEP 2: GET SUMMARY STATS

The purpose of the second step is to obtain those key summary statistics that you really want to see, and put them into macro variables. Once they are stored in macro variables, you can write them into the footnotes of the graphs produced in step 3. I will illustrate this trick using proc means. First, I use proc means to create a temporary output dataset named means.

    proc means data=&lib..&ds;
        var &&var&i;
        output out=work.means median(&&var&i)=median;
    run;

This temporary dataset has one observation. It will contain the automatic variables _TYPE_ and _FREQ_, and the user-created variable median. To put the value of the median into a macro variable with the same name, we use the call symput technique.

    data _null_;
        set means;
        call symput('median', trim(left(median)));
    run;

Hopefully the reader is able to distinguish the three uses of the word median in the above blocks of code. One is as a statistic keyword in proc means, another is as a regular variable name, and the third is as a macro variable name. In short, the preceding code puts the value of the median of the ith variable into the macro variable median, which can then be referenced via &median. In practice, you would usually want to report more than just the median.
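Steps 1 and 2 can be tried on their own before any graphics code is involved. The following is a minimal sketch rather than the Appendix code: it assumes a SAS 9 release in which the countw function and call symputx are available (neither is used in this paper), it borrows sashelp.class purely as a stand-in dataset, and the macro name %demo_stats is hypothetical.

```sas
%macro demo_stats(lib=, ds=, ivlist=);
    %local i nvars median;

    /* Step 1: count the names, then store each one in var1, var2, ... */
    %let nvars = %sysfunc(countw(&ivlist, %str( )));
    %do i = 1 %to &nvars;
        %local var&i;
        %let var&i = %scan(&ivlist, &i, %str( ));
    %end;

    /* Step 2: one proc means per variable, median copied to &median */
    %do i = 1 %to &nvars;
        proc means data=&lib..&ds noprint;
            var &&var&i;
            output out=work.means median(&&var&i)=median;
        run;

        data _null_;
            set work.means;
            /* symputx converts the number to text and strips blanks */
            call symputx('median', median);
        run;

        %put NOTE: median of &&var&i is &median;
    %end;
%mend demo_stats;

/* Assumed stand-in data: sashelp.class ships with SAS */
%demo_stats(lib=sashelp, ds=class, ivlist=age height weight)
```

Writing the value to the log with %put is a quick way to confirm that each indexed macro variable and each statistic resolves as expected before the values are placed in footnotes.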
STEP 3: CREATE GRAPHS

The reader should already be familiar with the syntax of proc gchart and proc gplot. What I will illustrate is how to write summary statistics into a graph's footnote. To do so, you need to use a footnote statement with the reference to the macro variable inside double quotation marks, because macro variables do not resolve inside single quotation marks. For example, a histogram could be produced as follows:

    proc gchart data=&lib..&ds;
        vbar &&var&i;
        footnote "the median is &median";
    run;

You can create multiple graphs in this fashion (for example, histograms and scatter plots for the ith variable), and then in step 4 we put them together using proc greplay.

STEP 4: USE PROC GREPLAY

Before getting to the actual greplay proc, there are some options and pieces of code you need to specify. Since we want to output each set of graphs in a single .gif file, you will need to specify the filename as well as the device driver. The following code achieves this:

    goptions device=gif gsfname=nesug;
    filename nesug "C:\Graphs\&&var&i...gif";

Here the file is named with the variable's name. Alternatively, you could precede the variable name with some descriptive phrase, such as graphlist_. The triple period in the filename statement is necessary to resolve the double ampersand reference &&var&i: the first two periods are consumed as delimiters during the two passes of macro resolution, and the third remains as the literal period before gif.

Recall that we only want one .gif file created for every variable. To achieve this, you must do two things. First, you must store each individual graph (histogram, scatter plot, etc.) that you produce for a given variable in a temporary SAS catalog. Second, you must ensure that these individual graphs are not written to .gif files, i.e. that we do not create any extraneous .gif files. The following code achieves both of these:

    goptions nodisplay;
    proc gchart data=&lib..&ds gout=work.gseg;
        vbar &&var&i / name="histo";
    run;

Essentially, this code turns off the display, which prevents .gif output files from being produced. Later on, immediately before we use proc greplay, we will turn the display back on.

The options gout= and name= create a catalog entry histo in the work.gseg temporary catalog. Using SAS Explorer, you can view the contents of the catalog (see Figure 2). I have also created the entry scatter in the catalog.
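The catalog can also be inspected in code rather than through SAS Explorer. The sketch below uses the contents statement of proc catalog to list the graphics entries; it assumes work.gseg already exists because at least one graph has been written to it with gout=.

```sas
/* List the GRSEG entries currently stored in work.gseg */
proc catalog catalog=work.gseg entrytype=grseg;
    contents;
run;
quit;
```

Running this between iterations is a simple way to confirm that only the expected entries (here histo and scatter) are present before proc greplay is called.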

Figure 2: Temporary Catalog with Graphics Entries

We will now use proc greplay to put the two catalog entries together in a .gif file. You must first turn the display back on, so that the .gif file will be produced.

    goptions display;

You must also specify the template catalog which contains the template you want to use, as well as the particular template from it. It is possible to create custom templates, but I usually use one of the default templates provided in the sashelp.templt catalog, which can be browsed via SAS Explorer, as in Figure 3.

Figure 3: Browsing the sashelp.templt Catalog

The following code puts the catalog entries named histo and scatter together using the V2 template.

    proc greplay nofs igout=work.gseg;
        tc sashelp.templt;
        template v2;
        treplay 1:histo 2:scatter;
    quit;

The igout= option specifies which catalog you are pulling entries from. The treplay statement is where you specify the catalog entries you want to put together. Since the template V2 has slots for two graphs, you must specify which goes in position 1 and which in position 2.

The code I have described above works fine if you only graph one variable. For multiple variables, you have to add one extra block of code, and here's why. If you try to create a catalog entry named histo when an entry with that name already exists, then SAS will automatically name your catalog entry histo1 and keep increasing the trailing integer for each additional entry you try to create named histo. Thus, the macro must include code to delete the catalog entries created by the previous variable in the iteration. The code should be placed before any of the graphics statements (proc gchart, proc gplot, etc.). The following code would delete catalog entries named histo and scatter from the temporary work.gseg catalog.

    proc catalog catalog=work.gseg;
        save histo.grseg scatter.grseg;
        delete histo.grseg scatter.grseg;
    quit;

The save statement makes sure that only the histo and scatter entries exist in the catalog, thus eliminating any extraneous catalog entries. Then the delete statement deletes them. You may see an error statement in the log on the first iteration, because the catalog entries histo and scatter do not yet exist.

PUTTING IT ALL TOGETHER

In the Appendix I have provided the complete code necessary to create the graph in Figure 1. This graph is based on the sasuser.diabetes dataset. The macro call used to create this (and other graphs not shown) was:

    %let mylist = age height weight fastgluc postgluc;
    %graphlist (
        lib=sasuser,
        ds=diabetes,
        ivlist=&mylist,
        dv=pulse,
        path=c:\graphs);

Notice that I used a macro variable named mylist to contain the variable list, which I then referenced in the macro call. If your list is very long, you might even want to include your list of variables in an external macro file, for example:

    %macro mylist;
        age height weight fastgluc postgluc
    %mend mylist;

You would then reference it as %mylist instead of &mylist. Either way, the above macro call will produce five different .gif files, which will be put into your C:\GRAPHS folder. A snapshot of the folder using thumbnails view in Windows Explorer appears in Figure 4. When analyzing hundreds of variables, you may find that having one .gif file for each variable makes browsing and categorizing the variables much easier than having a document of several hundred pages (as produced, for example, by proc univariate).

Figure 4: Windows Explorer View of the .gif Files

In practice, you will probably want to have output more tailored to your needs than a histogram / scatter plot pair with a few summary statistics included. For example, when building predictive models where the response variable is binary, you might use the technique described in this paper to create a different set of graphs with different summary statistics. For instance, Figure 5 shows four different plots in one .gif file. Each plot is a logit plot with a frequency plot overlay. The upper left graph is for the original variable, and the other three plots are common transformations that might be made when model building (square root, log, and reciprocal).

Figure 5: Logit Plots / Transformations Example

In conclusion, if you follow the four steps outlined in this paper, you should be able to produce customized individual .gif files containing graphs and summary statistics that will hopefully make whatever analysis you are doing more efficient.

EXTENSIONS

This paper has described how to put multiple graphs into a single .gif file. However, I have not addressed the question of how to make a single title or footnote for all the graphs in the .gif file. For example, in Figure 5 it would be nice to have a title across the top which says "logit plots for 4 transformations". This can be achieved via proc gslide and a custom template. This approach is described in reference [1].

Finally, it should be noted that another approach to putting multiple graphs onto the same page is to use a PDF device driver.
To do so, you would use the ods pdf option startpage=never as well as some graphics options such as vorigin=, horigin=, vsize=, and hsize= to position the graphs. This useful approach is described in reference [2], and is actually somewhat easier to program than using proc greplay as described in this paper.

REFERENCES

[1] Gayari, Michelle. Creating Graphs Using Templates. SUGI 22, paper 170.
[2] Delaney, Kevin P. Multiple Graphs on One Page, the easy way (PDF) and the hard way (RTF). SUGI 28, paper 94.
[3] SAS OnlineDoc, Version 8. SAS Macro Language Reference.
[4] SAS OnlineDoc, Version 8. SAS Procedures Guide.

[5] SAS OnlineDoc, Version 8. SAS/GRAPH Software: Reference.

CONTACT

The author can be reached at:

    Paul Walker
    15000 Capital One Drive, Building #2
    Richmond, VA 23238
    804-284-2311
    walker.627@osu.edu

DISCLAIMER

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

APPENDIX: FULL MACRO CODE WITH COMMENTS

    %macro graphlist (
        lib=    /* data library */,
        ds=     /* data set */,
        ivlist= /* list of independent variables to be graphed */,
        dv=     /* dependent variable for scatterplot */,
        path=   /* output path for .gif files */
        );

    /* STEP 1: Parse the independent variable list (&ivlist). */

    *------------ Define variables used in parsing --------------------*;
    %let null = ;
    %let blank = %quote( );
    %let fflag = 0;
    %let num = 0;

    *------------ Loop through the list of variables ------------------*;
    %do %until(&&fflag=1);
        %let num=%eval(&num+1);
        %let var&num = %scan(&ivlist,&num,&blank);
        %if &null=&&var&num %then %let fflag=1;
    %end;

    *------------ Number of variables in your list --------------------*;
    %let nvars=%eval(&num-1);

    /* STEP 2: Get summary statistics into macro variables. */

    *------------ Start looping through each variable ----------------*;
    %do i = 1 %to &nvars;

    *------------ Get summary stats from proc means ------------------*;
    proc means data = &lib..&ds noprint;
        var &&var&i &dv;
        output out=means
            mean(&&var&i) = mean
            median(&&var&i) = median
            n(&&var&i) = n
            nmiss(&&var&i) = nmiss;
    run;

    *------------ Put proc means output into macro vars -------------*;
    data _null_;
        set means;
        call symput('mean', trim(left(put(mean,best12. ))));
        call symput('median', trim(left(put(median,best12.))));
        call symput('n', trim(left(put(n,best12. ))));
        call symput('nmiss', trim(left(put(nmiss,best12. ))));
    run;

    *------------ Get summary stats from proc corr ----------------*;
    proc corr data = &lib..&ds noprint outp=pearson;
        var &&var&i &dv;
    run;

    *------------ Put proc corr output into macro vars -----------*;
    /* The last row of the outp= dataset holds the correlations with &dv */
    data _null_;
        set pearson end=last;
        if last then call symput('corr', trim(left(put(&&var&i,5.3))));
    run;

    /* STEP 3: Create graphs which include summary statistics. */

    *----------- Delete catalog entries --------------------------*;
    proc catalog catalog=work.gseg;
        save histo.grseg scatter.grseg;
        delete histo.grseg scatter.grseg;
    quit;

    *----------- Set general graphics options --------------------*;
    goptions device=gif gsfname=paul xpixels=800 ypixels=800 gunit=pct
        ftext=courier nodisplay;
    ods listing;
    filename paul "&path.\graphlist_&&var&i...gif";

    *----------- Create histogram --------------------------------*;
    symbol; axis; title; footnote;
    axis1 label=(angle=90 height=4 "frequency")
        value=(height=3);
    axis2 label=(height=4 "&&var&i")
        value=(angle=90 height=3);
    proc gchart data = &lib..&ds gout=work.gseg;
        vbar &&var&i / name="histo" levels=10 raxis=axis1 maxis=axis2;
        title height=5 "histogram of &&var&i";
        footnote justify=center height=3 " "
            justify=center height=4
            "mean=&mean, median=&median, nmiss=&nmiss, n=&n";
    run;
    quit;

    *----------- Create scatterplot ----------------------------*;
    symbol; axis; title; footnote;
    symbol1 v=triangle height=4 width=4;
    axis3 label=(angle=90 height=4 "&dv")
        value=(height=3);
    axis4 label=(height=4 "&&var&i")
        value=(height=3);
    proc gplot data = &lib..&ds gout=work.gseg;
        plot &dv * &&var&i / name="scatter" vaxis=axis3 haxis=axis4;
        title height=5 "scatterplot of &dv against &&var&i";
        footnote justify=center height=4 " "
            justify=center height=4
            "correlation between &dv and &&var&i is &corr.";
    run;
    quit;

    /* STEP 4: Put the graphs together using proc greplay. */

    *----------- Turn on the display --------------------------*;
    goptions display;

    *----------- Use proc greplay -----------------------------*;
    proc greplay nofs igout=work.gseg;
        tc sashelp.templt;
        template V2;
        treplay 1:histo 2:scatter;
    quit;

    /* End of the do loop over variables */
    %end;

    %mend graphlist;
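When the dataset has hundreds of variables, typing the list passed to ivlist= by hand becomes impractical. The following sketch, which is not part of the original macro, builds the list from the dictionary.columns view with proc sql. The macro variable name autolist is hypothetical, and the filter (all numeric variables of sasuser.diabetes except the dependent variable pulse) is an assumption about what the analyst wants to graph.

```sas
/* Build a space-delimited list of numeric variable names */
proc sql noprint;
    select name
        into :autolist separated by ' '
        from dictionary.columns
        where libname = 'SASUSER'          /* library names are stored in uppercase */
          and memname = 'DIABETES'         /* so are dataset names */
          and type = 'num'
          and upcase(name) ne 'PULSE';     /* exclude the dependent variable */
quit;

%put NOTE: variables to graph: &autolist;

/* Pass the generated list to the macro defined in this Appendix */
%graphlist (
    lib=sasuser,
    ds=diabetes,
    ivlist=&autolist,
    dv=pulse,
    path=c:\graphs);
```

Printing the list with %put before calling the macro makes it easy to check which variables will be graphed. A single macro variable has a maximum length, so for extremely wide datasets the list may need to be split across several calls.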