Planting Your Rows: Using SAS Formats to Make the Generation of Zero-Filled Rows in Tables Less Thorny
Kathy Hardis Fraeman, United BioSource Corporation, Bethesda, MD

ABSTRACT

Tables or summary reports often need to be produced with SAS where all possible values of one or more variables must be included as rows in a table. However, the actual data to be summarized might not contain all of a variable's possible values, even though the table needs a corresponding zero-filled row for each of those values. These zero-filled rows for non-existent variable values will be missing from the table unless additional programming is done.

One programming method to make sure all rows are included is to hard code all possible values of a variable, although this method can be tedious if a large number of variables and/or values are involved. A more dynamic method is to attach a SAS format to each table variable, where the format contains all of the variable's possible values. SAS can dynamically determine the name of the format attached to a variable using %SYSFUNC with SCL functions, or using a dictionary table with PROC SQL. SAS can then generate a data set of all possible values for the variable by using the CNTLOUT=<dataset> option of PROC FORMAT. The output data set generated by PROC FORMAT can be used dynamically to ensure that all possible values of a variable, even values that don't actually exist in the data, are included as rows in a table.

INTRODUCTION

Tables or summary reports may need to be produced using SAS where all possible values of one or more variables must be included as rows in the tables. However, the actual input data to be summarized in such a table might not include all possible combinations of the data values of all relevant variables.
If the table needs rows for all possible combinations of these data values, zero-filled table rows for the non-existent combinations will be missing from the table unless additional programming is done. One programming method to make sure all rows are included is to hard code all possible values and combinations of values of a variable or variables. This method has the disadvantages of potentially being tedious if a large number of variables and/or values are involved, and of not being dynamic if the possible values of a variable change over time.

A more dynamic method of determining all possible values of a variable is to attach a SAS format to each variable, where the format contains all of the variable's possible values. This paper will show how to determine the format attached to a variable, how to determine the data values defined in that format, and then how to use this information to create a table that includes rows for all possible data values as defined by the format.

SAMPLE DATA

The sample data in the SAS data set IN.SALES used in this paper are given below:

   Obs    employee    year        num        dollar
     1    Hall        FY 2008      10    $10,000.00
     2    Hall        FY 2010      15    $15,500.00
     3    Oates       FY 2008       8       $500.00
     4    Brooks      FY 2008      15    $11,111.00
     5    Brooks      FY 2010      20    $12,345.67
     6    Abbot       FY 2008      50    $75,757.00
     7    Abbot       FY 2010      75    $99,999.99
     8    Costello    FY 2008      33    $33,333.00
     9    Costello    FY 2010      44    $44,444.44
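Readers who want to run the examples can recreate the sample data with a DATA step such as the following. This is a minimal sketch, not the author's code. It assumes that the librefs IN and LIBRARY are already assigned and that the PROC FORMAT step shown below has already been run, so that EMPLFMT. and YEARFMT. exist in LIBRARY.FORMATS (a catalog SAS searches by default). The numeric codes come from those format definitions, and DOLLAR12.2 is an assumed display format for DOLLAR because the paper does not state which format is attached to it.

```sas
/* Hypothetical sketch: recreate the sample data set IN.SALES.        */
/* Assumes librefs IN and LIBRARY exist and that the formats          */
/* EMPLFMT. and YEARFMT. have already been stored in LIBRARY.FORMATS. */
data in.sales;
   input employee year num dollar;
   /* EMPLOYEE and YEAR are stored as numeric codes; the attached */
   /* formats display them as names and fiscal years              */
   format employee emplfmt. year yearfmt. dollar dollar12.2;
   datalines;
1 2008 10 10000
1 2010 15 15500
2 2008  8 500
3 2008 15 11111
3 2010 20 12345.67
5 2008 50 75757
5 2010 75 99999.99
6 2008 33 33333
6 2010 44 44444.44
;
run;
```

Employee code 4 (Dunn) and year 2009 are deliberately absent from the data, which is what produces the missing rows discussed below.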
Both EMPLOYEE and YEAR are numeric variables with attached formats. The formats used for those variables are defined as:

   proc format library = library;
      value emplfmt 1 = "Hall"
                    2 = "Oates"
                    3 = "Brooks"
                    4 = "Dunn"
                    5 = "Abbot"
                    6 = "Costello"
                    ;
      value yearfmt 2008 = "FY 2008"
                    2009 = "FY 2009"
                    2010 = "FY 2010"
                    ;
   run;

The format EMPLFMT is attached to the variable EMPLOYEE, and the format YEARFMT is attached to the variable YEAR. These formats include all possible values for the variables and can be updated when new possible data values are added.

REPORT WITH MISSING ROWS

A report is needed from the sample data that gives the number of sales and the dollar amount of sales for each fiscal year, and by employee within year. The PROC REPORT code to produce such a table is:

   title "Report with Missing Rows";
   proc report data=in.sales nowindows headline headskip;
      column year employee num dollar;
      define year     / group   'Year'     order=data;
      define employee / display 'Employee' order=data;
      define num      / display 'Number of sales';
      define dollar   / display 'Amount of sales';
      break after year / skip;
   run;

The report with the original data looks like:
                     Report with Missing Rows

                                 Number       Amount of
   Year       Employee         of sales           sales
   ----------------------------------------------------
   FY 2008    Hall                   10      $10,000.00
              Oates                   8         $500.00
              Brooks                 15      $11,111.00
              Abbot                  50      $75,757.00
              Costello               33      $33,333.00

   FY 2010    Hall                   15      $15,500.00
              Brooks                 20      $12,345.67
              Abbot                  75      $99,999.99
              Costello               44      $44,444.44

The above table is missing rows for the combinations of the variables YEAR and EMPLOYEE where no data were available. No data were available at all for the year 2009, and not all employees had sales data for both 2008 and 2010. If the table needs rows for all possible combinations of year and employee, even where no data exist for those combinations, the values defined in the variables' attached formats can be used to determine all possible combinations of data values.

DETERMINING A VARIABLE'S FORMAT

SAS can dynamically determine the name of the format attached to a variable with either of two techniques: %SYSFUNC with SCL functions, or a dictionary table using PROC SQL. Each method is discussed separately below.

METHOD 1 -- %SYSFUNC WITH SAS SCREEN CONTROL LANGUAGE

%SYSFUNC was originally developed in SAS version 6.12 to allow SCL (SAS Component Language, formerly Screen Control Language) functions to be used in the SAS macro programming environment. Among its many capabilities, %SYSFUNC can determine the existence of a SAS data set and characterize the attributes of the data set's variables, such as a variable's format. The code that uses %SYSFUNC with the SCL functions OPEN, VARNUM, VARFMT, and CLOSE to determine a variable's format and put the name of the format in a macro variable is:

   /* Open IN.SALES, find EMPLOYEE, read its format, close the data set */
   %let dsid   = %sysfunc(open(in.sales, i));
   %let varnum = %sysfunc(varnum(&dsid, EMPLOYEE));
   %let format = %sysfunc(varfmt(&dsid, &varnum));
   %let rc     = %sysfunc(close(&dsid));
   %put EMPLOYEE VARIABLE FORMAT = &format;
IN.SALES is the name of the data set, and EMPLOYEE is the name of the variable in the data set IN.SALES. The name of the format attached to EMPLOYEE is EMPLFMT., and the value of the macro variable &FORMAT will be displayed in the SAS log as:

   999  %put EMPLOYEE VARIABLE FORMAT = &format;
   EMPLOYEE VARIABLE FORMAT = EMPLFMT.

Note that the period (.) at the end of the format name is included in the value of the macro variable.

METHOD 2 -- PROC SQL DICTIONARY TABLES

Structured Query Language (SQL) is a standard and widely used language and has been implemented in SAS as PROC SQL. Dictionary tables provide metadata about SAS data sets and variables, and they can be queried at runtime by using PROC SQL. The PROC SQL code to determine a variable's format using a dictionary table is given below:

   proc sql;
      create table formats as
      select format
      from dictionary.columns
      where upcase(libname) = 'IN'
        and upcase(memname) = 'SALES'
        and upcase(name)    = 'EMPLOYEE'
      ;
   quit;

   proc print data=formats;
   run;

The PROC PRINT of the data set FORMATS will look like:

   Obs    format
     1    EMPLFMT.

Again, note that the period at the end of the format name is included in the value of the variable.

DETERMINING ALL VALUES IN A FORMAT USING THE CNTLOUT OPTION OF PROC FORMAT

A SAS format library is stored in SAS as a catalog, and the values of the user-defined formats in a format library can be put into a SAS data set using the CNTLOUT= option of PROC FORMAT. The data set created with the CNTLOUT= option contains many variables describing the formats in the library, but the three variables relevant to this analysis are:

   FMTNAME    name of the format
   START      starting value of the format range
   LABEL      descriptive label associated with the value of START
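Because the CNTLOUT= data set holds the definitions of every format in the catalog, it can also be limited to a single format when it is created, using the SELECT statement of PROC FORMAT. The following is a minimal sketch, assuming the LIBRARY libref and the YEARFMT format defined earlier in this paper; the output data set name ONEFMT is hypothetical.

```sas
/* Write only the YEARFMT entry of LIBRARY.FORMATS to the data set ONEFMT. */
/* The SELECT statement limits the catalog entries processed by CNTLOUT=; */
/* numeric formats are named without a leading $ and without a period.   */
proc format library=library cntlout=onefmt (keep=fmtname start label);
   select yearfmt;
run;

proc print data=onefmt;
run;
```

With the formats defined in this paper, ONEFMT should contain only the three YEARFMT rows for 2008, 2009, and 2010. Selecting by entry name avoids comparing FMTNAME afterward, but it needs the format name without its trailing period, so a name returned by VARFMT or DICTIONARY.COLUMNS would first have to be stripped, for example with %SCAN(&format, 1, .).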
The SAS code to use the CNTLOUT= option with PROC FORMAT to put the format information in a SAS data set named FORMATLIB is given below:

   proc format library=library
               cntlout=formatlib (keep = fmtname start label);
   run;

   proc print data=formatlib;
   run;

The PROC PRINT of the data set FORMATLIB will look like:

   Obs    FMTNAME    START    LABEL
     1    EMPLFMT    1        Hall
     2    EMPLFMT    2        Oates
     3    EMPLFMT    3        Brooks
     4    EMPLFMT    4        Dunn
     5    EMPLFMT    5        Abbot
     6    EMPLFMT    6        Costello
     7    YEARFMT    2008     FY 2008
     8    YEARFMT    2009     FY 2009
     9    YEARFMT    2010     FY 2010

In the CNTLOUT= data set created by PROC FORMAT, note that the period is not included in the value of the variable FMTNAME, although it is included in the format names returned by both the %SYSFUNC and the SQL dictionary table methods shown above. Format names from the CNTLOUT= data set therefore need a period appended to the end before they can be compared with the format names returned by the other two methods. The period can be appended in a DATA step using the CATS function, as shown below:

   fmtname = cats(fmtname, '.');

USING ALL VALUES OF AN ATTACHED FORMAT TO FILL IN MISSING TABLE ROWS

The two SAS programming techniques described above can be combined to determine all possible values of a single variable or, as in this paper, all possible combinations of two variables.

METHOD 1 -- %SYSFUNC WITH SAS SCREEN CONTROL LANGUAGE AND CNTLOUT

%SYSFUNC and the CNTLOUT= option of PROC FORMAT can be combined in the following macro to determine the name of the format associated with a variable &VAR and put all of the format's defined values in an output data set &OUTVALS:
   /************************************************/
   /* OPTION 1:
   /* Get values of formats using %SYSFUNC and SCL
   /* Note that input data set name IN.SALES is
   /* coded in the macro
   /*************************************************/
   %macro getfmt1(var=, outvals=);
      %let dsid   = %sysfunc(open(in.sales, i));
      %let varnum = %sysfunc(varnum(&dsid, &var));
      %let varfmt = %sysfunc(varfmt(&dsid, &varnum));
      %let rc     = %sysfunc(close(&dsid));
      %put &varfmt;

      proc format library = library
                  cntlout = &outvals (keep  = fmtname start label
                                      where = (cats(fmtname,'.') = "&varfmt"));
      run;

      title "Data set &outvals from macro GETFMT1 -- SYSFUNC and SCL";
      proc print data = &outvals;
      run;
   %mend getfmt1;

   %getfmt1(var=employee, outvals=empvals);
   %getfmt1(var=year, outvals=yearvals);

Note that the input data set IN.SALES is hard coded in the example of the macro given above. The two output data sets created by the above macro look like:

   Data set empvals from macro GETFMT1 -- SYSFUNC and SCL

   Obs    FMTNAME    START    LABEL
     1    EMPLFMT    1        Hall
     2    EMPLFMT    2        Oates
     3    EMPLFMT    3        Brooks
     4    EMPLFMT    4        Dunn
     5    EMPLFMT    5        Abbot
     6    EMPLFMT    6        Costello
   Data set yearvals from macro GETFMT1 -- SYSFUNC and SCL

   Obs    FMTNAME    START    LABEL
     1    YEARFMT    2008     FY 2008
     2    YEARFMT    2009     FY 2009
     3    YEARFMT    2010     FY 2010

METHOD 2 -- PROC SQL DICTIONARY TABLES AND CNTLOUT

PROC SQL dictionary tables and the CNTLOUT= option of PROC FORMAT can also be combined in the following macro to determine the name of the format associated with a variable &VAR and put all of the format's defined values in an output data set &OUTVALS:

   /******************************************************************/
   /* OPTION 2:
   /* Get values of formats using SQL Dictionary Table
   /* Note that input data set name IN.SALES is coded in the macro
   /*****************************************************************/
   %macro getfmt2(var=, outvals=);
      proc sql;
         create table &var.fmt as
         select format
         from dictionary.columns
         where upcase(libname) = 'IN'
           and upcase(memname) = 'SALES'
           and upcase(name)    = upcase("&var")
         ;
      quit;

      /*****************************************************/
      /* Turn the name of the format into a macro variable.
      /* RUN ends the DATA step so that the macro variable
      /* VARFMT exists before it is written to the log.
      /*****************************************************/
      data _null_;
         set &var.fmt;
         call symputx("varfmt", format, 'L');
      run;
      %put &varfmt;
      proc format library = library
                  cntlout = &outvals (keep  = fmtname start label
                                      where = (cats(fmtname,'.') = "&varfmt"));
      run;

      title "Data set &outvals from macro GETFMT2 -- SQL Dictionary Table";
      proc print data = &outvals;
      run;
   %mend getfmt2;

   %getfmt2(var=employee, outvals=empvals);
   %getfmt2(var=year, outvals=yearvals);

The output data from the macro GETFMT2 is given below, and it looks exactly the same as the output from the macro GETFMT1.

   Data set empvals from macro GETFMT2 -- SQL Dictionary Table

   Obs    FMTNAME    START    LABEL
     1    EMPLFMT    1        Hall
     2    EMPLFMT    2        Oates
     3    EMPLFMT    3        Brooks
     4    EMPLFMT    4        Dunn
     5    EMPLFMT    5        Abbot
     6    EMPLFMT    6        Costello

   Data set yearvals from macro GETFMT2 -- SQL Dictionary Table

   Obs    FMTNAME    START    LABEL
     1    YEARFMT    2008     FY 2008
     2    YEARFMT    2009     FY 2009
     3    YEARFMT    2010     FY 2010

CREATE A DATA SET WITH ALL POSSIBLE VALUES OF BOTH VARIABLES, BASED ON ATTACHED FORMATS

The following SAS code shows how to create a SAS data set with all possible values of the variables YEAR and EMPLOYEE, using the values defined in the variables' attached formats.
   /********************************************************************/
   /* Create SAS data sets of all possible values of YEAR and EMPLOYEE
   /********************************************************************/
   data yearvals_mod (keep = year a);
      set yearvals;
      /*---------------------------------------------*/
      /* Convert character variable START to numeric
      /*---------------------------------------------*/
      year = input(trim(left(start)), 8.);
      /*-------------------------------*/
      /* Dummy variable for SQL join
      /*-------------------------------*/
      a = 1;
   run;

   data empvals_mod (keep = employee b);
      set empvals;
      /*---------------------------------------------*/
      /* Convert character variable START to numeric
      /*---------------------------------------------*/
      employee = input(trim(left(start)), 8.);
      /*-------------------------------*/
      /* Dummy variable for SQL join
      /*-------------------------------*/
      b = 1;
   run;

   /****************************************************************************/
   /* Create a SAS data set of all possible combinations of YEAR and EMPLOYEE
   /* using all possible values of YEAR and EMPLOYEE
   /****************************************************************************/
   proc sql;
      create table allrows as
      select year, employee
      from yearvals_mod y, empvals_mod e
      where y.a = e.b;
   quit;

   proc sort data=allrows;
      by year employee;
   run;

The data set ALLROWS will look like this:
   All rows needed for table

   Obs    year    employee
     1    2008        1
     2    2008        2
     3    2008        3
     4    2008        4
     5    2008        5
     6    2008        6
     7    2009        1
     8    2009        2
     9    2009        3
    10    2009        4
    11    2009        5
    12    2009        6
    13    2010        1
    14    2010        2
    15    2010        3
    16    2010        4
    17    2010        5
    18    2010        6

Formats will be attached to these variables when ALLROWS is merged with the actual data.

CREATE A DATA SET WITH ZERO-FILLED VALUES WHEN DATA ARE MISSING

The data set created above can be merged with IN.SALES, the input data to the table program, to create zero-filled rows as follows:

   proc sort data=in.sales out=sales;
      by year employee;
   run;

   data sales_all;
      merge allrows (in=a) sales (in=s);
      by year employee;
      /*------------------------------*/
      /* Zero-fill the missing rows
      /*------------------------------*/
      if a and ^s then do;
         num    = 0;
         dollar = 0;
      end;
   run;
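The cross join and the zero-filling merge can also be written as a single PROC SQL step. The following is a sketch of that alternative, not the author's method. It assumes the data sets YEARVALS_MOD and EMPVALS_MOD created above; the output data set name SALES_ALL2 and the DOLLAR12.2 display format are hypothetical. A CROSS JOIN replaces the dummy-variable join, a LEFT JOIN keeps every YEAR and EMPLOYEE combination, and COALESCE supplies zeros where IN.SALES has no matching row.

```sas
/* Hypothetical alternative: build all YEAR x EMPLOYEE combinations */
/* and zero-fill NUM and DOLLAR in one PROC SQL step.               */
proc sql;
   create table sales_all2 as
   select c.year     format=yearfmt.,
          c.employee format=emplfmt.,
          coalesce(s.num, 0)    as num,
          coalesce(s.dollar, 0) as dollar format=dollar12.2
   from (select y.year, e.employee
         from yearvals_mod as y cross join empvals_mod as e) as c
        left join in.sales as s
          on c.year = s.year and c.employee = s.employee
   order by 1, 2;   /* sort by YEAR, then EMPLOYEE */
quit;
```

Because the formats are assigned explicitly in the SELECT clause, this version does not rely on picking up attributes from IN.SALES. SALES_ALL2 should contain the same 18 rows as SALES_ALL and can be passed to the same PROC REPORT step.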
When the data set SALES_ALL is used as input to the PROC REPORT program, the table will have zero-filled rows for values of the variables that don't occur in the data:

                     Report without Missing Rows

                                 Number       Amount of
   Year       Employee         of sales           sales
   ----------------------------------------------------
   FY 2008    Hall                   10      $10,000.00
              Oates                   8         $500.00
              Brooks                 15      $11,111.00
              Dunn                    0           $0.00
              Abbot                  50      $75,757.00
              Costello               33      $33,333.00

   FY 2009    Hall                    0           $0.00
              Oates                   0           $0.00
              Brooks                  0           $0.00
              Dunn                    0           $0.00
              Abbot                   0           $0.00
              Costello                0           $0.00

   FY 2010    Hall                   15      $15,500.00
              Oates                   0           $0.00
              Brooks                 20      $12,345.67
              Dunn                    0           $0.00
              Abbot                  75      $99,999.99
              Costello               44      $44,444.44

CONCLUSION

The SAS programming techniques using SAS formats presented in this paper demonstrate just a few of the many wonderful ways that SAS formats can be used to improve the SAS programming process.

ACKNOWLEDGMENTS

SAS is a registered trademark of SAS Institute Inc., Cary, North Carolina.

CONTACT INFORMATION

Please contact the author with any comments, questions, or gardening tips:

   Kathy H. Fraeman
   United BioSource Corporation
   7101 Wisconsin Avenue, Suite 600
   Bethesda, MD 20832
   (240) 235-2525 voice
   (301) 654-9864 fax
   kathy.fraeman@unitedbiosource.com