SparkLines Using SAS and JMP

Kate Davis, International Center for Finance at Yale, New Haven, CT

ABSTRACT

Sparklines are intense, word-sized graphics for use inline in text or on a dashboard that condense tables of numbers into elegant quantitative visualizations. Dr. Edward Tufte introduced sparklines to the information visualization community in his book "Beautiful Evidence". This paper outlines how to construct sparklines painlessly in both SAS and JMP, including a complete introduction to the "bank to 45°" rule and other construction rules. Examples in both JMP and SAS, including SAS macros, are presented.

INTRODUCTION

Sparklines are more than simply small graphics of a single variable. They are meant to be used inline in documents to convey quantitative information, not just in place of the traditional tabular numeric form, but to provide additional analytic knowledge of the underlying process. Because sparklines are analytic exhibits, brute-force machine generation of the graphics without critical review is not recommended. The examples here are restricted to time-series-compatible data at a uniform interval with no missing measurements.

Sparkline construction follows three guiding principles: sparklines should be the same point size as the accompanying text and wide enough to give a good aspect ratio; unintentional optical clutter should be removed; and the resolution should be of cartographic or typographic quality, usually 1200 dpi.

The process of generating sparklines for review is nearly painless in both SAS and JMP once the proper aspect ratio, colors, borders and resolution are defined for standard graphics procedures.

ASPECT RATIO

The aspect ratio, or shape parameter, of a graphic is the ratio of its height to its width. The aspect ratio for any quantitative graph should be chosen deliberately so that the data are presented as objectively as possible. As Cleveland [1993, p.
336] demonstrated with the Wolfer sunspot data set, the aspect ratio can not only change the initial interpretation of the data but also allow detection of trends that are not obvious at an arbitrary aspect ratio.

Figure 1. Plot of Sunspot Data, default JMP aspect ratio

The plot in Figure 1 seems to demonstrate high volatility in sunspot numbers over two centuries.

MATHEMATICS OF THE SHAPE PARAMETER

In a two-variable graph, each pair of consecutive data points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ determines one line segment of length $s_i$. The line segment $s_i$ is the hypotenuse of the right triangle formed by these points. The base is 1 unit by our assumptions and the height is $y_i - y_{i-1}$, or dif(y) as a standard SAS/JMP formula. The number of interest is $\theta_i$, which in Figure 2 represents the angle opposite the dif leg.
Figure 2. Close-up of one data triangle

$\tan(\theta_i) = \mathrm{dif}(y_i)/1 = \mathrm{dif}(y_i)$, so $\theta_i = \arctan(\mathrm{dif}(y_i))$.

Clearly, the selection of the physical units for each logical x unit and the logical height will affect the actual value of $\theta_i$, so any aspect ratio $\gamma$ should in some way produce optimal individual $\theta_i$.

BANKING TO 45°

Cleveland introduced Banking to 45° as a way to choose an aspect ratio $\gamma$ that gives an average $\theta_i$ of 45°. By using this principle, the overall average right triangle will be isosceles. There is no analytic solution for the summation of arctangents of absolute values, but many numerically attainable approximations have been offered for time series data. The most easily implemented is the median-absolute-slope criterion [Cleveland, 1988], which seems to work well in practice for sparklines. The compromise is to choose $\gamma$ so that the median absolute slope of the segments $s_i$ is 1. The approximation is:

$\gamma^* = \dfrac{\mathrm{range}\{y_i\}}{n \cdot \mathrm{median}\{|\mathrm{dif}(y_i)|\}}$

This is easily implemented for a fixed point size sparkline. If the sparkline height is 12pt, then the width of the sparkline is 12pt/$\gamma^*$. In the sunspot example, $\gamma^* = 154.4/(176 \times 13.5) = 0.065$, which yields an optimal width of 185 points, and the sunspot graph is now an appropriate size for text.

CLUTTER FREE

The second principle is a clutter-free graphic. This simply means removing all extraneous lines and text that are not an integral part of the information, and choosing colors that allow the graphic to be fully integrated into the document. The first step to a clutter-free graphic is to remove all background colors, borders, and extra plot points and text.

In the resulting sparkline, all borders and plot points have been removed, the background is transparent, and the line color is dark gray instead of black.

Another option is to use bars instead of plot lines. These bar sparklines are often called sparkbars.
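The Banking to 45° approximation can be motivated with a short derivation. What follows is only a sketch under the paper's assumptions (uniform spacing in x, no missing values). The symbols $h$ (plot height), $w$ (plot width), $R$ (the range of $y$) and $p_i$ (the physical slope of segment $i$ on the page) are introduced here for illustration and are not part of the original notation. The horizontal run of one segment is taken as roughly $w/n$.

```latex
% Assumed symbols (not in the original): h = plot height, w = plot width,
% R = range{y_i}, n = number of observations, p_i = physical slope of segment i.
\begin{align*}
\gamma &= \frac{h}{w}, \qquad
p_i = \frac{h\,|\mathrm{dif}(y_i)| / R}{w / n}
    = \gamma \, \frac{n\,|\mathrm{dif}(y_i)|}{R} \\
\operatorname{median}\{p_i\} = 1
  &\;\Longrightarrow\;
  \gamma^{*} = \frac{R}{n \cdot \operatorname{median}\{|\mathrm{dif}(y_i)|\}} \\
\text{Sunspots:}\quad
\gamma^{*} &= \frac{154.4}{176 \times 13.5} = \frac{154.4}{2376} \approx 0.065,
\qquad
w = \frac{12\,\text{pt}}{\gamma^{*}} \approx 185\,\text{pt}
\end{align*}
```

A physical slope of 1 corresponds to a 45° segment on the page, which is why setting the median slope to 1 serves as a practical stand-in for an average angle of 45°.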
ADD SOME COLOR

Once the sparkline has been reduced to a simple graphic, colors can be reintroduced to emphasize certain statistics. Tufte has suggested that the starting and ending points of a sparkline should be represented by points colored green, and the high and low values by red points.

%sparkline(dsn=sunspot,yvar=sunspots,xvar=year);
%sparkline(dsn=sunspot,yvar=sunspots,xvar=year,anno=dots);
%sparkbar(dsn=sunspot,yvar=sunspots,xvar=year);

HIGH RESOLUTION

The goal of creating sparklines is to include the visualization in the context of a wider analysis. The inline sparklines presented here were generated using JMP for Macintosh and simply copied and pasted into Microsoft Word, relying on the operating system's default handling of graphics between the two applications, so the sparkline resolution matches the overall resolution of the document.

These graphics can also be generated using the SAS macros or JMP script snippets by setting the appropriate graphics options for the output method. The SAS/Graph procedures produce excellent graphics that can be used in web pages, desktop publishing documents and standard word processing documents.

CONCLUSIONS

With the appropriate preparation and attention to detail, both SAS/Graph and JMP provide an excellent platform to create and disseminate visual information in the form of sparklines.

REFERENCES AND LINKS

Tufte, Edward (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.
Tufte, Edward (2006). Beautiful Evidence. Cheshire, Connecticut: Graphics Press. (http://www.edwardtufte.com)
Cleveland, William S. (1988). The Shape Parameter of a Two-Variable Graph. Journal of the American Statistical Association, Vol. 83, No. 402 (Jun., 1988), pp. 289-300.
Cleveland, William S. (1993). A Model for Studying Display Methods of Statistical Graphics. Journal of Computational and Graphical Statistics, Vol. 2, No. 4 (Dec., 1993), pp. 323-343.
Cleveland, William S. (1994).
Visualizing Information. Summit, New Jersey: Hobart Press. (http://cm.bell-labs.com/cm/ms/departments/sia/wsc/)
Robbins, Naomi B. (2005). Creating More Effective Graphs. Hoboken: Wiley Interscience. (http://www.nbr-graphs.com)

ACKNOWLEDGMENTS

SAS and JMP are registered trademarks of SAS Institute Inc. of Cary, North Carolina.

Thank you to Drs. Edward Tufte, William Cleveland and Naomi B. Robbins for their continued efforts to rid the published world of ChartJunk.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Kate Davis
International Center for Finance
Yale School Of Management
46 Hillhouse Avenue
New Haven, CT 06511
Email: Kate@Belisle.org
Web: icf.som.yale.edu
CODE AND SCRIPTS

JMP SCRIPT SNIPPETS

Default Plot

    // Sunspot series at JMP's default frame size.
    // The Dispatch( {}, ..., ScaleBox, ... ) wrappers are an assumed reconstruction.
    Overlay Plot(
        X( :Year ),
        Y( :Sunspot Number ),
        Sort X( 0 ),
        Y Axis[1] << {{Scale( Linear ), Format( "Best" ), Min( -5 ), Max( 160 ), Inc( 50 ), Show Minor Ticks( 0 )}},
        Separate Axes( 1 ),
        X Axis << {{Scale( Linear ), Format( "Best" ), Min( 1745 ), Max( 1930 ), Inc( 50 )}},
        Connect Points( 1 ),
        SendToReport(
            Dispatch( {}, "106", ScaleBox, {Scale( Linear ), Format( "Best" ), Min( -5 ), Max( 160 ), Inc( 50 ), Show Minor Ticks( 0 )} ),
            Dispatch( {}, "101", ScaleBox, {Scale( Linear ), Format( "Best" ), Min( 1745 ), Max( 1930 ), Inc( 50 )} )
        )
    );

Banking to 45° Difference Formula

    // Column formula: absolute first difference of the series
    Dif(Sunspot Number) = Abs( Dif( :Sunspot Number, 1 ) )

Create Summary Table

    // N, range, and median absolute difference: the inputs to the aspect ratio
    Data Table( "Subset of Sunspots Data" ) << Summary(
        Group,
        N( :Sunspot Number ),
        Range( :Sunspot Number ),
        Median( :Name( "Dif(Sunspot Number)" ) )
    );

Plot with correct Aspect Ratio

    // Same plot, with the frame set to 185 x 12 points (width from the banking calculation)
    Overlay Plot(
        X( :Year ),
        Y( :Sunspot Number ),
        Sort X( 0 ),
        Y Axis[1] << {{Scale( Linear ), Format( "Best" ), Min( -5 ), Max( 160 ), Inc( 50 ), Show Minor Ticks( 0 )}},
        Separate Axes( 1 ),
        X Axis << {{Scale( Linear ), Format( "Best" ), Min( 1745 ), Max( 1930 ), Inc( 50 )}},
        Connect Points( 1 ),
        SendToReport(
            Dispatch( {}, "Overlay Plot", FrameBox, {Frame Size( 185, 12 )} )
        )
    );
Clutter Free

    // Points hidden, line recolored gray, frame kept at 185 x 12 points.
    // The Dispatch( {}, ... ) wrappers are an assumed reconstruction.
    Overlay Plot(
        X( :Year ),
        Y( :Sunspot Number ),
        Sort X( 0 ),
        Y Axis[1] << {{Scale( Linear ), Format( "Best" ), Min( -5 ), Max( 160 ), Inc( 150 ), Show Minor Ticks( 0 )}},
        Separate Axes( 1 ),
        Connect Thru Missing( 1 ),
        X Axis << {{Scale( Linear ), Format( "Best" ), Min( 1745 ), Max( 1930 ), Inc( 50 ), Show Minor Ticks( 0 )}},
        Connect Points( 1 ),
        Show Points( 0 ),
        :Sunspot Number( Connect Color( 1 ) ),
        SendToReport(
            Dispatch( {}, "106", ScaleBox, {Scale( Linear ), Format( "Best" ), Min( -5 ), Max( 160 ), Inc( 150 ), Show Minor Ticks( 0 )} ),
            Dispatch( {}, "101", ScaleBox, {Scale( Linear ), Format( "Best" ), Min( 1745 ), Max( 1930 ), Inc( 50 ), Show Minor Ticks( 0 )} ),
            Dispatch( {}, "Overlay Plot", FrameBox, {Frame Size( 185, 12 ), DispatchSeg( LineSeg( 1 ), {Line Color( "Gray" )} )} )
        )
    );
SAS MACROS

    %macro gamma(dsn=_last_, y=y, x=x, dsnout=dsnout);
      ** Creates a data file with the summary vars needed to calculate aspect ratio;
      proc sort data=&dsn. out=raw;
        by &x.;
      run;

      data aspect;
        set raw;
        by &x.;
        vdot = dif(&y.);        /* first difference of the series */
        absvdot = abs(vdot);    /* absolute slope for unit x spacing */
      run;

      proc summary data=aspect nway noprint;
        var absvdot &y.;
        output out=bar min= max= median= / autoname;
      run;

      data &dsnout.;
        set bar;
        vrange = &y._max - &y._min;
        gammastar = vrange / (_freq_ * absvdot_median);
      run;
    %mend gamma;

    /* Keyword parameters so that calls such as %sparkline(dsn=..., yvar=..., xvar=...) resolve */
    %macro sparkline(dsn=, yvar=, xvar=, height=12, Anno=NONE);
      %gamma(dsn=&dsn., y=&yvar., x=&xvar., dsnout=stats);

      data _null_;
        set stats;
        call symputx('gamma', gammastar);
        call symputx('miny', &yvar._min);
        call symputx('maxy', &yvar._max);
        width = round(&height. / gammastar);
        call symputx('width', width);
      run;

      /* Empty annotate data set so that ANNOTATE= always has a target */
      data anno;
        length function color $8;
        stop;
      run;

      %if %upcase(&Anno.) = DOTS %then %do;
        data anno; *Create dots;
          set raw end=last;
          by &xvar.;
          retain function "SYMBOL" text "DOT" when "A" size &height.
                 xsys ysys '2' hsys '3';
          /* green dots at the first and last observations */
          if _n_ = 1 or last then do;
            x = &xvar.; y = &yvar.; color = "green"; output;
          end;
          /* red dots at the high and low values */
          if &yvar. = &maxy. or &yvar. = &miny. then do;
            x = &xvar.; y = &yvar.; color = "red"; output;
          end;
        run;
      %end;

      goptions reset=all noborder vsize=&height.pt hsize=&width.pt;
      axis1 length=95 pct;
      symbol1 interpol=join value=none width=0.5 color=gray;

      proc gplot data=raw;
        plot &yvar.*&xvar. / noaxis noframe overlay annotate=anno haxis=axis1;
      run;
      quit;
    %mend sparkline;

    %macro sparkbar(dsn=, yvar=, xvar=, height=12);
      %gamma(dsn=&dsn., y=&yvar., x=&xvar., dsnout=stats);

      data _null_;
        set stats;
        call symputx('gamma', gammastar);
        width = round(&height. / gammastar);
        call symputx('width', width);
      run;

      goptions reset=all noborder vsize=&height.pt hsize=&width.pt;
      pattern value=solid color=gray;

      proc gchart data=&dsn.;
        vbar &xvar. / sumvar=&yvar. discrete noaxis noframe;
      run;
      quit;
    %mend sparkbar;