Tips for Producing Customized Graphs with SAS/GRAPH Software

Perry Watts, Fox Chase Cancer Center, Philadelphia, PA

Abstract *

SAS software is used to produce customized graphics displays by solving a set of related problems. First, the problem of selectively displaying labels at unevenly spaced intervals along a horizontal axis is solved by invoking the FORMAT procedure from PROC GPLOT. Next, a problem that gets in the way of generating the first graph is used for segmenting an axis to highlight the presence of outlying values in a displayed data set. A third, more involved problem deals with overlapping labels along a midpoint axis. This time the problem is solved by invoking PROC FORMAT from an ANNOTATE data set rather than from the graphics procedure itself. Text distortion resulting from mapping related graphs to oblong templates is the fourth problem addressed in the presentation. A simple, external solution sends graphics output to Computer Graphics Metafiles (CGMs) containing scaled TrueType fonts. Unfortunately, CGMs have their own set of limitations which must be addressed before the developer can take full advantage of the file format. The presentation concludes by showing how to get one symbol instead of the usual three to identify subgroups in a LEGEND statement. The solution involves a manipulation of the width parameter in the legend's SHAPE clause.

Axes Labels at Unevenly Spaced Intervals

Occasionally it is necessary to produce a graph with an axis containing major tick labels positioned at unevenly spaced intervals. Unfortunately, there is no feature in SAS/GRAPH that will automatically handle this situation. One can use the order clause in the AXIS statement to select a subset of intervals, but SAS will plot them as if they were evenly spaced. Using a unit interval, on the other hand, will only clutter the graph and confuse the viewer, since all major tick marks are labeled by default.
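The two unsatisfactory AXIS options just described can be sketched as follows. This is a minimal illustration, not the program behind the paper's figures: the data set visits, its variables week and value, and the week values themselves are all hypothetical.

```sas
/* Hypothetical data: measurements taken at unevenly spaced weeks */
data visits;
  input week value;
  datalines;
0 5.1
1 5.4
2 6.0
4 6.8
8 7.1
12 7.3
24 7.2
;
run;

/* Option 1: list only the measured weeks in the order clause.      */
/* As the text notes, the ticks are then drawn as if evenly spaced. */
axis1 order=(0 1 2 4 8 12 24) label=('Weeks from Baseline');

/* Option 2: unit steps. Spacing is accurate, but every major tick  */
/* is labeled, which clutters the axis.                             */
axis2 order=(0 to 24 by 1) label=('Weeks from Baseline');

symbol1 value=dot interpol=join;

proc gplot data=visits;
  plot value*week / haxis=axis1;   /* selected intervals, time distorted */
  plot value*week / haxis=axis2;   /* unit intervals, cluttered labels   */
run;
quit;
```

Running both PLOT statements side by side makes the trade-off visible: neither option alone gives accurate spacing together with a sparse set of labels.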
The serial development of the graph in Figures - shows how a simple application of PROC FORMAT solves the problem. In Figure, the measurements are placed at an accurate distance from each other, but the graph is cluttered, and it is difficult for the viewer to link the horizontal coordinates to the data display located relatively high up on the response axis. Figure simplifies the graph but distorts the time interval, whereas Figure solves the problem by invoking PROC FORMAT from GCHART to selectively label major tick marks. Figure could be further enhanced by removing unlabeled ticks with a graphics editor. Here is the code for PROC FORMAT:

proc format;
  value weekfmt = = = = = 8= 8 = = = + other= ;
run;

* This publication was supported by Grant 7- from the National Institute for Allergies and Infectious Diseases, NIAID.

Segmenting an Axis for Outliers

In the original abstract for this paper, SKIPMISS was listed as a means for segmenting an axis in order to emphasize the presence of outlying values in a data set. The current solution bypasses SKIPMISS by adding a small amount of code to an existing ANNOTATE data set used for labeling data points in the graph about ODC activity displayed in Figure. While the unevenly spaced intervals in Figure are defined as a problem, they become an asset in the ODC graph, because this time the data values should not be proportionately spaced. As in Figure, the order clause in the AXIS statement is used to generate the tick marks. There is also no need for PROC FORMAT in the ODC graph, but both axes are modified slightly to accommodate the hatch mark with its intervening space in the response axis. First the offset value for the horizontal axis is set to zero with offset=(,). Setting the horizontal offset to zero ensures that the value for the x-coordinate in the vertical axis is truly zero. Then the hatch mark, which is a rotated equal sign (=), is created.
The label command in ANNOTATE is used for drawing the hatch mark, and move and draw commands place a thick white line between the strokes of the equal sign, directly over the response axis. Using when='a' (after) in ANNOTATE also ensures that the hatch mark will be the last item plotted on the page, so it goes over, not under, an already formed axis.

Creating Visible Axes Labels

Ordinarily each bar in a histogram is automatically labeled with its midpoint value in the GCHART procedure (SAS/GRAPH Software Volume II 7).
This practice is satisfactory when a single bar summarizes data for a range of values. However, when the range is one, as it is in Figure, the numbers are bound to overlap regardless of the point value assigned to the font. This problem is magnified for multiple plot displays where numbers must be made even larger for terminal viewing. As in the segmented axis example, ANNOTATE is used for obtaining a customized axis. Notice that it provides the developer with complete control over what values are plotted along the axis. Zero to five, for example, is a longer interval than five to, and is the maximum value for all the runs in set #. Code for the ANNOTATE data set that is needed for drawing the midpoint axis (maxis) in the mutation graph, along with anfmt which provides spacing between the numbers, is shown below:

proc format;
  value anfmt ='' ='' ='' ='' ='' ='' ='' ='' ='' ='' other='';
data anno;
  length function color $8;
  length text $;
  length position $;
  retain color 'black' xsys '' ysys '' hsys '';
  do i= to ;
    chi=left(put(i,anfmt.));
    if(chi ne '') then do;
      function='label'; x=i; y=; size=&hh; position='e'; text=chi; output;
    end;
  end;
  stop;
run;

The format, anfmt, highlighted in anno above is used to display text, not to manage data, in the cell mutation graph. The data are still managed from the AXIS statement's order clause, which processes the full range of values in unit steps. Here is the code for axis which governs the data:

axis label=none order=( to by ) major=none minor=none value=none;

Unlike the Baseline graph displayed in Figure, PROC FORMAT cannot be used for both axis display and data management in the cell mutation graph. Values for intervening mutation numbers exist in Figure, whereas only labeled weeks from baseline contain data values in Figure.
In fact, Figure shows what happens when anno is removed from the cell mutation program, and axis is altered to display formatted data.

Correcting Text Distortion

The next example extends the cell mutation graph above by mapping three similar plots to oblong templates so that they can be viewed together on a single page. Again, PROC GCHART is used for creating histograms having unit ranges along a midpoint axis. The same process of invoking PROC FORMAT from an ANNOTATE data set is also used to generate axis numbers, but the axis is no longer linear in scale. The presence of a nonlinear scale is noted by the fact that many more than 9 bars appear between sites and in the Vkappa Shannon plot displayed in Figures 7 and 8. Every tenth bar is highlighted so that the viewer can get an accurate count of the number of sites in a given plot. As shown below, anfmt, which numbers the Vkappa axis, easily manages the nonlinear scale:

value anfmt ='' ='' ='' ='' ='' ='' ='' 7='7' 8='8' 9='9' other=' ';

The ANNOTATE data set, anno, on the other hand, is a bit more complicated than anno displayed earlier, because tick marks need to be simulated. PROC GCHART does not support tick marks along the midpoint axis, but the presence of a nonlinear scale in the Shannon plots requires them, so they are inserted with a vertical slash (|) mark:

data anno; /* For X axis values,ticks */
  length function color $8;
  length text $;
  length position $;
  retain color 'black' xsys '' ysys '' hsys;
  i=;
  do while (i le &maxx.);
    chi=left(put(i,anfmt.));
    if(chi ne ' ') then do; /*major ticks*/
      function='label';x=i;y=;size=.7; position='e';text='|';output;
      function='label';size=8;position='e'; text=chi;output;
    end;
    else do; /*minor ticks*/
      function='label';x=i;y=;size=.; position='e';text='|';output;
    end;
    i+;
  end;
  stop;
run;

Position E in both anno and anno centers the text a half cell below the y-coordinate. This way numbers and tick marks are separated from the horizontal axis itself.
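Because several numeric values did not survive in the listings above, the following self-contained sketch restates the tick-simulation idea with made-up numbers. Everything specific in it is an assumption rather than the author's actual setting: the format name tickfmt, the range 0 to 30, the text sizes, and the coordinate-system codes chosen for xsys, ysys and hsys.

```sas
/* Hypothetical format: print a number at every tenth site, blank otherwise */
proc format;
  value tickfmt 0='0' 10='10' 20='20' 30='30'
                other=' ';
run;

data anno_sketch;
  length function color $8 text $4 position $1 xsys ysys hsys $1;
  retain color 'black'
         xsys '2'    /* assumed: x is given in data values           */
         ysys '1'    /* assumed: y is a percentage of the data area  */
         hsys '4';   /* assumed: sizes are given in character cells  */
  do i = 0 to 30;
    chi = left(put(i, tickfmt.));
    function = 'label';
    x = i;
    y = 0;
    position = 'E';              /* centered, half a cell below (x,y) */
    if chi ne ' ' then do;       /* major tick followed by its number */
      size = 0.7; text = '|'; output;
      size = 1.0; text = chi; output;
    end;
    else do;                     /* unlabeled minor tick */
      size = 0.4; text = '|'; output;
    end;
  end;
  drop i chi;
run;

/* Inspect the generated annotate observations */
proc print data=anno_sketch(obs=12);
run;
```

The data step only builds the annotate observations; whether the ticks land exactly on the midpoints depends on the coordinate systems used by the chart, so the codes above would need to be checked against the real GCHART program.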
Figure 7 shows the graph that is printed directly from SAS in a Microsoft Windows environment. Axes numbers become distorted when a single graph is compressed along one axis but stretched out on another.
On the other hand, the title, labels, and footnotes are not so distorted in Figure 7, because they are displayed from a larger fourth template which is less oblong in shape. Figure 8 shows a corrected version of the Shannon graphs. Fortunately, CGMs support scalable fonts which automatically correct the distortion. Otherwise, a developer would be faced with the impossible task of placing each number on all sets of axes from the vantage point of the larger fourth template. Correcting the distortion with a CGM only involves changing a SAS program's default font from SWISS to HWCGM, and updating the GOPTIONS statement with: device=cgmmwc and gsfname=cgmname.

CGM Limitations

Unfortunately, there are a number of features that limit the effectiveness of CGMs for displaying SAS/GRAPH output. First, the file format is not available across all platforms, and Microsoft Word doesn't support it in the Macintosh environment. Secondly, the line width clause is not supported outside of SAS. A close examination of Figures - in this article, for example, shows that all axes lines have very narrow widths. Error bars in some of the graphs are thicker, because they are generated in ANNOTATE, which relies on size rather than width for line width. Possibly the failure to uniformly translate line width to the external environment can be attributed to the fact that units are not specified as a parameter for the width clause (SAS/GRAPH Software Usage 9). Sometimes axes labels will be truncated when a graph is written to a CGM file. A clumsy fix for this problem involves inserting a blank footnote into a graph in the same manner suggested for moving an axis frame away from a graph border (SAS/GRAPH Software Usage 9). On some platforms, however, null or blank footnotes will be ignored by the SAS compiler, and the truncation problem won't be fixed. All one has to do in this instance is to add a footnote in the graph's background color containing the characters BAD FIX.
Then axes labels will be fully displayed. The truncation problem described here emphasizes the lack of control a SAS developer has over the space surrounding the procedure output area. CGMs also ignore the implicit carriage return that SAS software inserts for multiple line axes labels. For example, when the target device for the SAS program which generates the graph in Figure 9 is set to winprtc, the following simple code will produce multiple line labels for the horizontal axis:

value= (tick= h= j=c ' ' j=c 'Pathologist'
        tick= h=. j=c 'Mean' j=c ' ' )

The word Pathologist is printed out under a blank line whereas Mean appears over the numbers and. The SAS compiler automatically inserts a carriage return every time it encounters a justification (j) clause. Output is then formatted properly when it goes directly to the screen. However, if graphics output is redirected to a CGM, carriage returns must be added to obtain multiple line axes labels. The altered code below is anything but straightforward:

/* First create a carriage return character as a macro variable */
data _null_;
  cr=byte();
  call symput('cr',put(cr,$.));
  stop;
run;
/* Next, insert cr into the code and color it white so that it is not displayed as a small box in the output. Add three justification clauses (emboldened j= characters below) to center justify initial characters, and lastly make sure that the color of the text to be seen is black.*/
value= (tick= h= j=c c=white "&cr" c=black j=c 'Pathologist'
        tick= h=. j=c c=black "Mean" j=r c=white "&cr" c=black j=c ' ' )

In addition to generating a carriage return, the byte function shown above is also used for creating the math symbol, ±, in Figure 8. A comparison with Figure 7 shows that this is one time when Microsoft Word is superior to SAS in its display of special symbols.

Displaying One Symbol in a Legend

A discussion of symbols brings us to the last coding example in this presentation.
If values are not specified for the shape clause in a LEGEND statement, width and height will be set to and (cells) respectively. Invariably the value of will produce the three symbol legend that is shown in Figure 9. Setting width to a small value such as. generates the single symbol display shown in Figure. The term width in the LEGEND statement does not refer to the width of the symbol per se. Because the symbol is proportionate, height takes care of symbol width as well. Instead, width refers to the amount of space allotted for each legend value in the output ( SAS/GRAPH Software Volume ). If width is a small value, only one symbol will be printed.
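The width manipulation described above can be tried with a sketch like the one below. It is illustrative only: the data set scores, its variables, and the width value 0.5 are assumptions, and the width that actually yields a single symbol may differ by device and font.

```sas
/* Hypothetical data: two groups measured at three points */
data scores;
  input group $ stat pct;
  datalines;
A 1 40
A 2 55
A 3 60
B 1 35
B 2 50
B 3 65
;
run;

symbol1 value=dot    interpol=join color=black;
symbol2 value=circle interpol=join color=black;

/* No SHAPE clause: default width and height are used */
legend1 label=none;

/* Narrow width (assumed value); height left at one cell */
legend2 label=none shape=symbol(0.5, 1);

proc gplot data=scores;
  plot pct*stat=group / legend=legend1;   /* default legend  */
  plot pct*stat=group / legend=legend2;   /* narrowed legend */
run;
quit;
```

Comparing the two plots shows the effect of width on the space allotted to each legend value while the symbol height stays the same.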
Summary and Conclusions

While five different graphics problems have been described in this paper, the smaller number of techniques that are used to solve them must be judiciously applied. For example, the same application of PROC FORMAT works well for spacing intervals and preventing overwrites, but FORMAT alone cannot be used to display data above a blank axis label. ANNOTATE is needed in such situations to preserve the integrity of the data. Again, even though an uneven order clause is essential for solving a segmented axis problem, it gets in the way of accurately displaying data at unevenly spaced time intervals. CGMs also don't provide universal solutions to text formatting problems. Nevertheless, the provision of a scaled font improves the quality of multiple graphics displays, and CGMs are very easy to insert into Microsoft Word documents and PowerPoint presentations.

References

Microsoft Corporation. User's Guide Microsoft Word: The World's Most Popular Word Processor, Version.. Microsoft Corporation, 99.

Microsoft Typography Features of TrueType TrueType fonts. December 99. <http://www.microsoft.com/truetype/what/ttfonts.htm> ( March 997).

SAS Procedures Guide, Version, Third Edition. Cary NC: SAS Institute Inc., 99.

SAS/GRAPH Software, Usage, Version, First Edition. Cary NC: SAS Institute Inc., 99.

SAS/GRAPH Software, Volumes and, Reference, Version, First Edition. Cary NC: SAS Institute Inc., 99.

Figure. A display of unit intervals. [Plot of Value against Weeks from Baseline]
Figure. A display of selected intervals, but time is distorted. [Plot of Value against Weeks from Baseline]

Figure. A display of unevenly spaced time intervals. [Plot of Value against Weeks from Baseline]
Figure. A segmented response axis emphasizes the presence of outlying values in a data set. [Plot titled "ODC Activity vs Differentiation Grade". Vertical axis: ODC Activity. Horizontal axis: Differentiation Grade, with categories Well, Mod, Mod-Poor, and Poor]

Figure. Numerical values for the number of cell mutations are placed sufficiently far apart from each other to prevent overlapping. [Chart titled "Number of Cells by Mutation for Selected Runs from Set # (Grouped by Burst Size)", Run= Grp=All. Vertical axis: #Cells. Horizontal axis: #Mutations]
Figure. Incorrect results are displayed when main axis values alone are formatted in PROC GCHART. [Chart titled "Number of Cells by Mutation for Selected Runs from Set # (Grouped by Burst Size)", Run= Grp=All. Vertical axis: #Cells. Horizontal axis: #Mutations]
Figure 7. Shannon Plots produced directly from SAS with targetdevice=winprtc.

[Panels: VAlpha, VBeta, VKappa. Vertical axis: Shannon (H ± s) Measure of Diversity. Horizontal axis: Site (Kabat-Wu Numbering). Footnote: 8 Amino Acids (-) Excluded]

Figure 8. Text distortion is eliminated by sending output to a CGM file.
Figure 9. Multiple line axes labels as well as multiple legend symbols are displayed. [Plot titled "% PreOp ChemoRT vs None in Pancreatic Cancer By Pathologist". Legend: NO ChemoRT, ChemoRT. Horizontal axis values: Pathologist, Mean, Median, STD, MIN, MAX, Range. Axis label: Cancer Cells]

Figure. The graph is improved by having only one symbol in the legend. [Plot titled "% PreOp ChemoRT vs None in Pancreatic Cancer By Pathologist". Legend: NO ChemoRT, ChemoRT. Horizontal axis values: Pathologist, Mean, Median, STD, MIN, MAX, Range. Axis label: Cancer Cells]