Picturing Statistics
Diana Suhr, University of Northern Colorado

Abstract
Statistical results can be easier to understand if you visualize them. This Hands-On Workshop will give you an opportunity to program SAS statistical procedures (e.g., PROC FREQ, PROC MEANS, PROC CORR) and learn to illustrate the results with PROC GCHART and PROC GPLOT. Comparisons will be shown between PROC GCHART, PROC CHART, PROC PLOT, and PROC GPLOT.

Introduction
Understanding statistics can be difficult. Visual representations (graphs, plots) sometimes make statistical results easier to understand. This hands-on workshop will give you an opportunity to learn ways to picture statistics. Syntax for PROC CHART, PROC GCHART, PROC PLOT, and PROC GPLOT is shown. Examples illustrate frequencies, correlations, and means.

PROC CHART
PROC CHART produces vertical and horizontal bar charts (histograms), block charts, pie charts, and star charts (SAS Procedures, 1990). PROC CHART produces charts for both numeric and character variables. PROC CHART automatically selects intervals. However, interval midpoints can be explicitly defined.

PROC CHART syntax is

   PROC CHART <options>;
      BY variable(s);
      VBAR variable(s) </ options>;
      HBAR variable(s) </ options>;
      BLOCK variable(s) </ options>;
      PIE variable(s) </ options>;
      STAR variable(s) </ options>;

PROC CHART options are DATA=, FORMCHAR=, and LPI=.

See the SAS Procedures Guide and SAS OnlineDoc for explanations of statement-specific options. Standard and statement-specific options include

   ASCENDING  AXIS=  CFREQ  CPERCENT  DESCENDING  DISCRETE  FREQ
   FREQ=variable  GROUP=variable  G100  LEVELS=number-of-midpoints
   MEAN  MIDPOINTS=midpoint-list  MISSING  NOHEADER  NOSPACE  NOSTATS
   NOSYMBOL  NOLEGEND  NOZEROS  PERCENT  REF=value  SUBGROUP=variable
   SUM  SUMVAR=variable  SYMBOL=character-list
   TYPE=CFREQ | CPERCENT | FREQ | MEAN | PERCENT | SUM

The types of charts are
   VBAR for vertical bar chart
   HBAR for horizontal bar chart
   BLOCK for block chart
   PIE for pie chart
   STAR for star chart.
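The VBAR and HBAR statements can be tried on a small data set before moving to real data. The following is a minimal sketch, not part of the workshop materials: the data set name demo, the variable score, and the data values are assumptions made up for illustration.

```sas
/* Hypothetical data: highest degree (deg) and a numeric score. */
data demo;
   input deg $ score;
   datalines;
BA 72
BA 85
MA 90
MA 78
MA 88
PhD 95
PhD 91
BA 67
;
run;

proc chart data=demo;
   /* Default statistic is the frequency of each value of deg. */
   vbar deg;
   /* TYPE=MEAN with SUMVAR= charts the mean score for each degree. */
   hbar deg / type=mean sumvar=score;
run;
```

Because deg is a character variable, each distinct value becomes its own bar, so the DISCRETE option is not needed here.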
The appearance of the chart can be changed by specifying
   the type of chart
   summary measures
   grouping variables
   line-size, page-size, and form-character options.

Types of statistics that can be presented are
   TYPE=FREQ for frequency counts
   TYPE=PCT for percentages
   TYPE=CFREQ for cumulative frequencies
   TYPE=CPCT for cumulative percentages
   TYPE=SUM for totals
   TYPE=MEAN for averages.

Options to control grouping are
   DISCRETE treats each value of a numeric variable as a separate category
   GROUP= groups observations by the values of a variable
   SUBGROUP= determines subgroups within each bar
   MIDPOINTS= uses the specified interval midpoints
   SUMVAR= names the variable summarized for means, sums, or frequencies.

PROC GCHART
The GCHART procedure produces vertical and horizontal bar charts (also called histograms), block charts, pie and donut charts, and star charts. You can use these charts to represent pictorially a given variable value, the relationship between two or more variables, or the value of a statistic calculated for one or more variables (SAS Online Doc, 1999).

Syntax for PROC GCHART is

   PROC GCHART <options>;
      BLOCK chart-variable(s) </ options>;
      HBAR | HBAR3D | VBAR | VBAR3D chart-variable(s) </ options>;
      PIE | PIE3D | DONUT chart-variable(s) </ options>;
      STAR chart-variable(s) </ options>;

PROC GCHART options are DATA=, ANNOTATE=, GOUT=, and IMAGEMAP=.

PROC PLOT
PROC PLOT plots the values of two variables for each observation. To produce a plot, specify which variables to plot.

PROC PLOT syntax is

   PROC PLOT <options>;
      BY variable(s);
      PLOT vertical-variable*horizontal-variable </ options>;

PROC PLOT options are DATA=, UNIFORM, NOMISS, NOLEGEND, VTOH=, FORMCHAR=, HPERCENT=, and VPERCENT=.

PLOT statement options are

   HAXIS=  VAXIS=  HZERO  VZERO  HREVERSE  VREVERSE  HEXPAND  VEXPAND
   HSPACE=  VSPACE=  HREF=  VREF=  HREFCHAR=  VREFCHAR=  BOX  HPOS=  VPOS=
   OVERLAY  CONTOUR  S<level>=  SLIST=

PROC GPLOT
The GPLOT procedure produces two-dimensional graphs that plot one variable against another within a set of coordinate axes.
The coordinates of each point on the plot correspond to two variable values in an observation of the input data set. PROC GPLOT produces plots for character variables, as well as numeric variables. Graphs are automatically scaled to the values of the data, although scaling can be controlled with options or with AXIS statements. The GPLOT procedure can produce
several kinds of graphs: overlay plots; plots against one or two vertical axes; bubble plots, in which circles of varying proportions representing the values of a third variable are plotted on the vertical and horizontal axes; plots with a legend; scatter graphs; needle plots; and plots with simple or spline-interpolated lines (SAS Online Doc, 1999).

The syntax for PROC GPLOT is

   PROC GPLOT <options>;
      BUBBLE plot-request(s) </ options>;
      BUBBLE2 plot-request(s) </ options>;
      PLOT plot-request(s) </ options>;
      PLOT2 plot-request(s) </ options>;

PROC GPLOT options are DATA=, ANNOTATE=, GOUT=, UNIFORM, and IMAGEMAP=.

Graphics Options
A few graphics options can be set for PROC GCHART and PROC GPLOT so that your results match the graphics shown in the examples (SAS/GRAPH Software, 1990).

   goptions gunit=pct cback=white htitle=6 htext=3 ftext=swissb ctext=blue;

   gunit= sets the unit of measurement for character height to percentage of display height
   cback= sets the background color to white
   htitle= sets the first title text height to 6 (in units of display height percent)
   htext= sets graph text height to 3 (in units of display height percent)
   ftext= sets graph text font to swissb
   ctext= sets graph text color to blue

Options for PROC GCHART and PROC GPLOT may be reset to default values by using

   goptions reset=all;

Examples

Frequencies
Frequencies count the values of a variable. PROC FREQ produces a frequency table. A visual representation produced with PROC CHART or PROC GCHART can illustrate frequencies.

   proc freq data = rawsub;
      tables deg;

[Output: PROC FREQ frequency table for deg]

   proc chart data = rawsub;
      vbar deg;

[Figure: vbar deg]

   proc chart data = rawsub;
      hbar deg;

[Figure: hbar deg]

HBAR produces a histogram with frequencies.
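The frequency example can also be reproduced without the workshop data. This is a minimal sketch under stated assumptions: the data set demo_deg and its values are hypothetical, the title text is invented, and the GCHART step assumes SAS/GRAPH is available at your site.

```sas
/* Hypothetical data: one degree value per observation. */
data demo_deg;
   input deg $ @@;
   datalines;
BA BA MA MA MA PhD BA MA
;
run;

/* Frequency table of deg. */
proc freq data=demo_deg;
   tables deg;
run;

/* The same frequencies as a graphics bar chart,
   using the graphics options described in the text. */
goptions gunit=pct cback=white htitle=6 htext=3 ftext=swissb ctext=blue;
title 'Degree frequencies';   /* hypothetical title */
proc gchart data=demo_deg;
   hbar deg;
run;
quit;

/* Restore the default graphics options and clear the title. */
goptions reset=all;
title;
```

Comparing the PROC FREQ table with the statistics printed beside the HBAR bars is a quick check that the chart and the table describe the same counts.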
[Figures: charts produced by the following statements]

   hbar3d deg;
   vbar deg / subgroup=gend;
   block deg;
   vbar deg / group=gend;

   proc freq;
      tables gender deg;

[Output: PROC FREQ tables]

   title height=5 pct 'Research Study#95';
   title2 font=simplex 'group by gender';
   pattern1 value=right color=blue;
   pattern2 value=x3 color=red;

   vbar deg / subgroup=gend;
   vbar deg / group=gend;
   vbar3d deg / subgroup=gend;
   vbar3d deg / group=gend;

Correlations
Correlation measures the strength of the linear relationship between two variables. If one variable can be expressed exactly as a linear function of another variable, then the correlation is 1 (directly related) or -1 (inversely related). A correlation of 0 indicates no linear relationship. PROC CORR computes correlation coefficients.

   proc corr;
      var s5 s7;

[Output: PROC CORR correlation of s5 and s7]

   proc plot;
      plot s5*s7;

[Figure: plot s5*s7]

   proc corr;
      var yrstch yrscoach;
[Output: PROC CORR correlation of yrstch and yrscoach]

   proc plot;
      plot yrstch*yrscoach;

[Figure: plot yrstch*yrscoach]

Options haxis and vaxis can be used to change the horizontal axis and vertical axis. Try the following code and see what happens. Did the axis change? Did the axis default to a maximum value?

   proc plot;
      plot yrstch*yrscoach / haxis=0 to 50 by 2 vaxis=0 10 20 30 40 50;

[Figure: plot yrstch*yrscoach / haxis=0 to 50 by 2 vaxis=0 10 20 30 40 50]

Graphing mean values
Means (averages) may be calculated and graphed for groups with the following code.

   proc sort data = rawsub;
      by grp;
   proc means noprint;
      by grp;
      var s2 s3 s7 s9;
      output out=mnfl mean=m2 m3 m7 m9;
   proc gplot data = mnfl;
      plot m2*grp;

[Figure: plot m2*grp]

   symbol1 interpol=join;
   proc gplot data=mnfl;
      plot m2*grp;

[Figure: plot m2*grp with joined points]

      plot m2*grp m3*grp;

The code above produces two plots. If you want one plot, an overlay plot, use the code below.

      plot m2*grp m3*grp / overlay;

   pattern1 color=red value=solid;
   pattern2 color=blue value=solid;
      plot m2*grp m3*grp / overlay areas=2;
      plot m2*grp m3*grp m7*grp m9*grp / overlay;

Analysis of Variance
PROC GLM can be used to test for significant differences between the means of two or more groups. The overlay plot above illustrates means of the four groups on four items.

   proc glm;
      class grp;
      model s2 s3 s7 s9 = grp;
      means grp;

What now?
You've run plots or graphs and want to get the pictures into a document or a presentation. Adobe Photoshop or Microsoft Paint will assist you. Copy from SAS, paste into Photoshop or Paint, and resize your picture. Then copy and paste into Word or PowerPoint.

Conclusion
Try picturing statistics with PROC PLOT, PROC GPLOT, PROC CHART, or PROC GCHART. The results will provide you with an easier way to explain your statistical results.

References
SAS Applications Guide, 1980 Edition. Cary, N.C.: SAS Institute.
SAS/GRAPH Software, Version 6, First Edition. Cary, N.C.: SAS Institute, 1990.
SAS Language, Version 6. Cary, N.C.: SAS Institute, 1990.
SAS Language and Procedures, Version 6, First Edition. Cary, N.C.: SAS Institute, 1989.
SAS OnlineDoc, Version 8, SAS/STAT User's Guide, Chapter 63. Cary, N.C.: SAS Institute, 1999.
SAS Procedures, Version 6, Third Edition. Cary, N.C.: SAS Institute, 1990.

About the author
Diana Suhr is a Statistical Analyst in the Office of Institutional Research at the University of Northern Colorado. She earned a Ph.D. in Educational Psychology at UNC in 1999. The first programming language she learned was Fortran in 1970. She has been a SAS programmer since 1984.

Contact
Diana Suhr, Statistical Analyst
Institutional Research
University of Northern Colorado
Greeley, CO 80639
970-351-2193, diana.suhr@unco.edu

SAS and all other SAS Institute product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.