Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

ABSTRACT

Today there is more pressure on programmers to deliver summary outputs faster without sacrificing quality. Built on just a few programming strategies, the %MAKE_IT_COUNT macro is simple, easy to understand, and easily adapted to changing reporting needs. This paper shares an example macro and explores the use of MULTILABEL and PRELOADED formats, PROC SUMMARY options, and dynamic ARRAYs.

INTRODUCTION

Adaptable programs are a must-have in every programmer's toolbox for meeting tight timelines. To build them, a programmer needs to think beyond a single summary table mock-up and add dynamic DATA steps and procedure features. The challenge is writing a macro that is adaptable without being cryptic, so that the program stays efficient to maintain when it is passed to a fellow programmer. The %MAKE_IT_COUNT macro accomplishes this goal by making use of MULTILABEL and PRELOADED formats, PROC SUMMARY options, and dynamic arrays while preserving readability.

Figure 1 Sample Dataset

To demonstrate the %MAKE_IT_COUNT macro (see the Appendix for the complete program), a sample dataset, a sample table mock-up, and SAS 9.3 were used. The sample subject-level dataset (ADSL) contains 100 subjects from three sites (SITEN = 001, 002, 003) in three regions (SITE = US, Europe, Other), each receiving one of three treatments (TRT01A). Sex and age variables are also included, and an age grouping variable (AGEGR1) has been derived.

Figure 2 Sample Table Mock-up
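The Appendix reads MY_LIB.ADSL, which readers will not have. The DATA step below is a minimal sketch that generates a hypothetical stand-in with the variables named in the text. The dataset name ADSL_DEMO, the site-to-region mapping, the subject ID pattern, and every value are assumptions made purely for testing; only the variable names come from the paper. RAND("UNIFORM") with CEIL is used instead of newer integer generators so that the step should also run under SAS 9.3.

```sas
/* Hypothetical stand-in for MY_LIB.ADSL: 100 randomly generated subjects.  */
/* Variable names follow the paper; every value is invented.                */
DATA ADSL_DEMO;
  LENGTH USUBJID $12 SITEN $3 SITE $6 TRT01A $11;
  CALL STREAMINIT(2024);                          /* reproducible draws      */
  DO SUBJ = 1 TO 100;
    SITENUM = CEIL(3 * RAND("UNIFORM"));          /* 1, 2 or 3               */
    SITEN   = PUT(SITENUM, Z3.);                  /* "001", "002", "003"     */
    SITE    = SCAN("US EUROPE OTHER", SITENUM, " ");  /* assumed site-to-region map */
    TRT01AN = CEIL(3 * RAND("UNIFORM"));
    TRT01A  = SCAN("TREATMENT A|TREATMENT B|PLACEBO", TRT01AN, "|");
    SEXN    = CEIL(2 * RAND("UNIFORM"));          /* 1 = MALE, 2 = FEMALE    */
    AGE     = 18 + FLOOR(68 * RAND("UNIFORM"));   /* 18 to 85                */
    AGEGR1N = 1 + (AGE > 65);                     /* 1 = <= 65, 2 = > 65     */
    USUBJID = CATX("-", "DEMO", SITEN, PUT(SUBJ, Z3.));
    OUTPUT;
  END;
  DROP SUBJ SITENUM;
RUN;
```

Pointing the SET statement in the Appendix's DATA ADSL step at ADSL_DEMO instead of MY_LIB.ADSL is one way to run the full program end to end. The counts will naturally differ from the paper's figures.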
MACRO PREREQUISITES

There are a few prerequisites for using the %MAKE_IT_COUNT macro. First, a sorting order needs to be built into the demographic formats. Second, to keep modifications easy, the table mock-up columns should be defined with a MULTILABEL format. Lastly, a macro variable must be created for each column's denominator.

DEMOGRAPHIC FORMATS WITH SORTING ORDER

The macro assumes that the demographic variables are formatted and that the sorting order is part of the format. For example, in the region format ($region), the values US, Europe, and Other are mapped to 01 US, 02 Europe, and 03 Other so that the regions are displayed in the correct order. This approach kills two birds with one stone: the display text can be adjusted in one place (for example, changing case or abbreviations), and extra sorting variables are no longer needed. Because the macro later removes the first three characters of each label (SUBSTR(&VAR.,4) in the Appendix), every label must start with a two-digit sort key followed by a single space.

Next, each section, or report block, of the table mock-up is also formatted (see Figure 2). In Figure 3, the format $blk is defined for the report blocks Region, Gender, and Age Grouping. This format supplies the heading text for each report block in the final output.

MULTILABEL FORMATS

The Total column is defined using a MULTILABEL format. In a MULTILABEL format, a single value can be mapped to more than one formatted value (overlapping ranges are allowed). Here, each treatment group (TRT01AN = 1, 2, 3) keeps its own label and is also mapped to a fourth value, 04. With this approach, there is no need to derive a separate Total treatment group before running PROC SUMMARY. The MULTILABEL format is also what makes the macro easy to update for future versions of the table mock-up, as discussed in greater detail later in the paper.

Figure 3 Formats

DENOMINATORS AS MACRO VARIABLES

The macro assumes that the denominator for each column is stored in a macro variable. This program uses a simple PROC SQL step to create four macro variables (BIGN01-BIGN04) for the four columns in the table mock-up (Treatment A, Treatment B, Placebo, and Total).
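The sort-key convention and the denominator macro variables can be tried on a small scale. This sketch assumes a hypothetical five-subject dataset named DEMO (not the paper's ADSL). It shows that sorting on the formatted value follows the two-digit key rather than the alphabet (EUROPE would otherwise sort first), that SUBSTR(...,4) recovers the display text, and how the PROC SQL denominators are built, with the Total column simply omitting the WHERE clause.

```sas
/* Two-digit sort key + one space + display text */
PROC FORMAT;
  VALUE $REGION "US" = "01 US" "EUROPE" = "02 EUROPE" "OTHER" = "03 OTHER";
RUN;

/* Hypothetical subjects, for illustration only */
DATA DEMO;
  INPUT USUBJID $ SITE $ TRT01AN;
  DATALINES;
S01 US 1
S02 EUROPE 1
S03 OTHER 2
S04 US 3
S05 US 2
;
RUN;

/* KEYED carries the sort order; DISPLAY is the text the macro prints */
DATA LABELS;
  SET DEMO;
  LENGTH KEYED $9 DISPLAY $6;
  KEYED   = PUT(SITE, $REGION.);   /* for example "01 US"                 */
  DISPLAY = SUBSTR(KEYED, 4);      /* for example "US", as in the macro  */
RUN;

/* One row per region, in sort-key order: US, EUROPE, OTHER */
PROC SORT DATA = LABELS NODUPKEY;
  BY KEYED;
RUN;

/* One denominator per column; the Total has no WHERE clause */
PROC SQL NOPRINT;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN01 FROM DEMO WHERE TRT01AN = 1;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN02 FROM DEMO WHERE TRT01AN = 2;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN03 FROM DEMO WHERE TRT01AN = 3;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN04 FROM DEMO;
QUIT;
%PUT BIGN01-BIGN04: &BIGN01 &BIGN02 &BIGN03 &BIGN04;
```

The values stored by INTO may carry leading blanks, which is why the Appendix converts them with INPUT(..., 8.) before dividing.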
Figure 4 Denominators

THE MACRO %MAKE_IT_COUNT

The macro consists of a PROC SUMMARY, a PROC TRANSPOSE, and a DATA step that together produce a report-ready output, plus a short DATA _NULL_ step that stores the number of columns in the macro variable &MAXNO. Each report block is run through the macro separately, which keeps the program flexible because future versions of the table mock-up can add or delete sections. The macro definition has three keyword parameters: DS, BLK, and VAR. &DS names the dataset where &VAR is stored, &BLK refers to the report block as defined in PROC FORMAT, and &VAR is the variable summarized as counts and percentages in that report block.
Below is a step-by-step demonstration using the macro call %MAKE_IT_COUNT(BLK = BLK01, VAR = SITE). In this example all three report blocks come from a single dataset, ADSL, so DS is given the default value ADSL in the %MACRO statement (see Appendix).

PROC SUMMARY OPTIONS

In this macro, PROC SUMMARY produces the frequency (count) of each category. The macro uses two CLASS statement options, PRELOADFMT and MLF. PRELOADFMT preloads all the values of each class variable's format; it takes effect only together with COMPLETETYPES, EXCLUSIVE, or ORDER=DATA, and the macro specifies COMPLETETYPES on the PROC SUMMARY statement. MLF tells PROC SUMMARY to use every label of a multilabel format, so an observation with TRT01AN = 1 is counted in both column 01 and column 04. With MLF, the formatted values also become the actual values of the class variables in the output dataset.

Figure 5 PROC SUMMARY

Although this example has counts in every level, using a preloaded format produces zero counts for any level of a categorical variable that is not present in the data. This eliminates the need for a separate report shell to ensure a proper display. Figure 5 shows the syntax and the resulting dataset when the macro is called for the first block of the table, Region. The variable SITE now holds both the sort order and the region names, and TRT01AN has four levels: the three treatments and the total (TRT01AN = "04").

PROC TRANSPOSE

Because &VAR is listed before TRT01AN in the CLASS statement, the PROC SUMMARY output is already ordered by &VAR, so no PROC SORT is needed before PROC TRANSPOSE. The TRANSPOSE syntax is straightforward: the BY variable is &VAR, the ID variable is the treatment variable TRT01AN, and the VAR statement names _FREQ_, the frequency variable that PROC SUMMARY creates by default. Each TRT01AN value, with the prefix CNT, now becomes a column name (CNT01-CNT04) holding the counts from _FREQ_.

Figure 6 PROC TRANSPOSE

THE DATA STEP

First, the DATA step creates the variable BLK from the &BLK macro variable in the macro call. BLK can be used in PROC REPORT as a GROUP variable to keep the report blocks in order and to support additional formatting such as skipped lines or pagination.
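The PROC SUMMARY and PROC TRANSPOSE steps described above can be reproduced outside the macro. In this minimal sketch the six observations are hypothetical, and no subject comes from OTHER, so that row should appear only because of PRELOADFMT combined with COMPLETETYPES. RBLK01 stands in for R&BLK. when &BLK. is BLK01. The formats follow the Appendix.

```sas
PROC FORMAT;
  VALUE $REGION (MULTILABEL) "US" = "01 US" "EUROPE" = "02 EUROPE" "OTHER" = "03 OTHER";
  /* Each arm keeps its own label and also rolls up into 04 (Total) */
  VALUE GRP (MULTILABEL) 1 = "01" 2 = "02" 3 = "03" 1,2,3 = "04";
RUN;

/* Hypothetical data: no subject from OTHER */
DATA DEMO;
  INPUT SITE $ TRT01AN @@;
  FORMAT SITE $REGION. TRT01AN GRP.;
  DATALINES;
US 1 US 2 EUROPE 1 US 3 EUROPE 3 US 1
;
RUN;

/* Every preloaded SITE level crossed with every TRT01AN label */
PROC SUMMARY DATA = DEMO NWAY COMPLETETYPES;
  CLASS SITE TRT01AN / PRELOADFMT MLF;
  OUTPUT OUT = FREQS;
RUN;

/* FREQS is already ordered by SITE, so no PROC SORT is needed */
PROC TRANSPOSE DATA = FREQS OUT = RBLK01 PREFIX = CNT;
  BY SITE;
  ID TRT01AN;
  VAR _FREQ_;
RUN;

PROC PRINT DATA = RBLK01 NOOBS;
  VAR SITE CNT01-CNT04;
RUN;
```

If the options behave as described, RBLK01 has three rows (01 US, 02 EUROPE, 03 OTHER), CNT04 equals CNT01 + CNT02 + CNT03 on each row, and the OTHER row holds zeros.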
PROC REPORT syntax is not included in the %MAKE_IT_COUNT macro and is beyond the scope of this paper.

Figure 7 The DATA Step

Next, on the first observation, a heading row is output to hold the report block text, which was defined in the $blk format at the beginning of the program. In this example &BLK resolves to BLK01, which is formatted as Region. Then arrays are defined for the counts (CNT01-CNT04) and for the derived variables PCNT01-PCNT04 that will hold the combined counts and percentages. The ARRAY statements use the macro variable &MAXNO, set earlier in the macro by the DATA _NULL_ step, so that the total number of variables remains
dynamic. This, coupled with the common variable prefixes (CNT and PCNT), lets the number of columns change with future versions of the table mock-up.

Finally, the percentages are calculated and concatenated with the counts. The denominators are stored in the macro variables BIGN01-BIGN04, which are defined before the macro is called. Inside the DO loop over the arrays, each denominator is retrieved at run time with SYMGET, using a name built from "BIGN" and the column number (PUT(I,Z2.)), and converted to a number with INPUT. The picture format COUNT gives the counts a uniform width, and the percentages are written with the 5.1 format. (The Appendix also defines a PCNT picture format.)

UPDATING THE MACRO

This macro was designed so that columns can be added or denominators changed with minimal turnaround time. Below is a step-by-step guide for making these types of updates. First, a subtotal for Treatment A and Treatment B will be added. Then, for the report block Age Grouping, the denominator will be changed to US patients only.

Figure 8 Adding a New Column

ADDING A NEW COLUMN

The new table mock-up requests an additional subtotal column for Treatment A and Treatment B. To make this update, first add the subtotal column to the GRP format by mapping TRT01AN = 1 and TRT01AN = 2 to the new value 03, and renumber the subsequent columns (Placebo becomes 04 and Total becomes 05). Second, modify the PROC SQL step to define the denominator for the new subtotal column and move the subsequent columns' denominators to the new order. That's all! Because &MAXNO is recalculated from the PROC SUMMARY output, the arrays pick up the fifth column automatically, and after these two quick updates the datasets produced by the macro include the new subtotal column.

Figure 9 Changing a Denominator

CHANGING A DENOMINATOR

Here, another new table mock-up is requested, and the denominator for the Age Grouping report block is changed to US patients only. To change the denominator for a particular report block, the new denominator must first be defined, so additional SQL statements are added to create it. Next, the percentage derivation is updated: IF-ELSE statements that test the value of &BLK select which denominator to use. Again, after two quick modifications, the new output is ready for reporting!
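The percentage loop and the block-specific denominator can be sketched in isolation. Everything here is hypothetical: the %LET values, the USN01-USN04 names for the US-only denominators, and the counts in RBLK03, which only mimic the transposed output for the Age Grouping block. Choosing a denominator prefix with a single IF-ELSE and building the macro variable name inside the loop is one possible way to apply the IF-ELSE approach, not necessarily the author's code.

```sas
/* Hypothetical inputs: column count, current block, two denominator sets */
%LET MAXNO  = 4;
%LET BLK    = BLK03;
%LET BIGN01 = 40;  %LET BIGN02 = 35;  %LET BIGN03 = 25;  %LET BIGN04 = 100;
%LET USN01  = 20;  %LET USN02  = 15;  %LET USN03  = 10;  %LET USN04  = 45;

/* Hypothetical transposed counts for the Age Grouping block (like R&BLK.) */
DATA R&BLK.;
  INPUT AGEGR1N $ 1-14 CNT01 CNT02 CNT03 CNT04;
  DATALINES;
01 <= 65 YEARS 10  5  0 15
02 > 65 YEARS   5  6  4 15
;
RUN;

DATA &BLK. (KEEP = ROW PCNT:);
  LENGTH ROW $200 DPREFIX $4;
  SET R&BLK.;
  ROW = SUBSTR(AGEGR1N, 4);                /* strip the sort key          */
  ARRAY C{*} CNT01-CNT0&MAXNO.;
  ARRAY P{*} $ 50 PCNT01-PCNT0&MAXNO.;
  /* IF-ELSE on the report block selects the denominator set */
  IF "&BLK." = "BLK03" THEN DPREFIX = "USN";
  ELSE DPREFIX = "BIGN";
  DO I = 1 TO DIM(C);
    /* Look up USN01, USN02, ... (or BIGN01, ...) at run time */
    DENOM = INPUT(SYMGET(CATS(DPREFIX, PUT(I, Z2.))), 8.);
    IF C{I} > 0 THEN
      P{I} = PUT(C{I}, 3.) || " (" || PUT(ROUND(C{I} / DENOM * 100, .1), 5.1) || "%)";
    ELSE P{I} = PUT(C{I}, 3.);
  END;
RUN;
```

With &BLK. set to BLK01 or BLK02, the same step falls back to BIGN01-BIGN04, so the other report blocks are unaffected.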
GOING BEYOND THE MACRO

This macro is designed for counts and percentages, but the same strategies can be used to create versions for other purposes.

DESCRIPTIVE STATISTICS

PROC SUMMARY does more than produce frequencies: it computes the same statistics as PROC MEANS. With a few modifications, the macro can be adapted to produce descriptive statistics and beyond.

LIMITATIONS

This macro is limited to nine columns because of the ARRAY definitions: the variable lists CNT01-CNT0&MAXNO. and PCNT01-PCNT0&MAXNO. assume a single-digit &MAXNO, and the DATA _NULL_ step reads only the last digit of each column value. If more than nine columns are needed, the ARRAY statements (and the &MAXNO derivation) can be adjusted for when &MAXNO is greater than nine.

CONCLUSION

In this paper, the %MAKE_IT_COUNT macro has been dissected and demonstrated with demographic data. At each step, procedure options, MULTILABEL and PRELOADED formats, and dynamic ARRAYs eliminated the need for duplicated data, additional procedures, and extra DATA steps. Further, two modification scenarios, adding columns and changing denominators, showed the macro's flexibility when table mock-ups change.
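Both extensions mentioned under Descriptive Statistics and Limitations can be sketched briefly. Part 1 assumes a hypothetical AGES dataset and reuses the multilabel GRP format so that PROC SUMMARY returns a Total (04) row alongside the three treatments. Part 2 assumes twelve hypothetical columns. It reads the whole column value with INPUT(TRT01AN, 8.) instead of its last digit and builds the array's upper bound with %SYSFUNC(PUTN(&MAXNO., Z2.)), so &MAXNO = 12 gives CNT01-CNT12 instead of the unintended CNT01-CNT012. Dataset and variable names such as AGES, AGESTATS, FREQS12, and WIDE are illustrative only.

```sas
/* Part 1: descriptive statistics with a multilabel Total column */
PROC FORMAT;
  VALUE GRP (MULTILABEL) 1 = "01" 2 = "02" 3 = "03" 1,2,3 = "04";
RUN;

DATA AGES;                                   /* hypothetical ages */
  INPUT TRT01AN AGE @@;
  FORMAT TRT01AN GRP.;
  DATALINES;
1 34 1 50 2 61 2 70 3 45 3 66
;
RUN;

PROC SUMMARY DATA = AGES NWAY COMPLETETYPES;
  CLASS TRT01AN / PRELOADFMT MLF;
  VAR AGE;
  OUTPUT OUT = AGESTATS N = AGE_N MEAN = AGE_MEAN STD = AGE_SD
         MEDIAN = AGE_MEDIAN MIN = AGE_MIN MAX = AGE_MAX;
RUN;

/* Part 2: more than nine columns. FREQS12 mimics a PROC SUMMARY output */
/* whose TRT01AN values run from "01" to "12" (hypothetical).           */
DATA FREQS12;
  LENGTH TRT01AN $2;
  DO COL = 1 TO 12;
    TRT01AN = PUT(COL, Z2.);
    OUTPUT;
  END;
  DROP COL;
RUN;

/* Read the whole column value, not just its last digit */
DATA _NULL_;
  SET FREQS12 END = LR;
  RETAIN MAXNO 0;
  MAXNO = MAX(MAXNO, INPUT(TRT01AN, 8.));
  IF LR THEN CALL SYMPUTX("MAXNO", MAXNO);
RUN;

/* Zero-pad the upper bound: &MAXNO = 12 gives CNT01-CNT12 */
DATA WIDE;
  ARRAY C{*} CNT01-CNT%SYSFUNC(PUTN(&MAXNO., Z2.));
  NCOLS = DIM(C);
RUN;
```

The zero-padded bound also works when &MAXNO is nine or less (4 becomes CNT01-CNT04), so the same ARRAY statement could serve both cases.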
APPENDIX

**********************************************************************;
* %MAKE_IT_COUNT EXAMPLE                                             *;
**********************************************************************;
LIBNAME MY_LIB "D:\USERS\BGILBERT\DESKTOP";

PROC FORMAT;
  PICTURE COUNT LOW-HIGH = " 009";
  PICTURE PCNT  LOW-HIGH = " 009.9%)" (PREFIX = "(");
  VALUE $BLK "BLK01" = "REGION"
             "BLK02" = "GENDER"
             "BLK03" = "AGE GROUPING";
  /* Each treatment keeps its own column; 1,2,3 also roll up into 04 (Total) */
  VALUE GRP (MULTILABEL) 1     = "01"
                         2     = "02"
                         3     = "03"
                         1,2,3 = "04";
  VALUE $REGION (MULTILABEL) "US"     = "01 US"
                             "EUROPE" = "02 EUROPE"
                             "OTHER"  = "03 OTHER";
  VALUE GENDER (MULTILABEL) 1 = "01 MALE"
                            2 = "02 FEMALE";
  VALUE AGEGRP (MULTILABEL) 1 = "01 <= 65 YEARS"
                            2 = "02 > 65 YEARS";
RUN;

DATA ADSL;
  SET MY_LIB.ADSL;
  FORMAT TRT01AN GRP. SITE $REGION. SEXN GENDER. AGEGR1N AGEGRP.;
RUN;

/* Column denominators BIGN01-BIGN04 */
PROC SQL NOPRINT;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN01 FROM ADSL WHERE TRT01AN = 1;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN02 FROM ADSL WHERE TRT01AN = 2;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN03 FROM ADSL WHERE TRT01AN = 3;
  SELECT COUNT(DISTINCT USUBJID) INTO :BIGN04 FROM ADSL;
QUIT;

%MACRO MAKE_IT_COUNT(DS=ADSL, BLK=, VAR=);
  /* Counts for every formatted level of &VAR. in every TRT01AN column */
  PROC SUMMARY DATA = &DS. NWAY COMPLETETYPES;
    CLASS &VAR. TRT01AN / PRELOADFMT MLF;
    OUTPUT OUT = FREQS;
  RUN;

  /* One row per level of &VAR., one CNTxx column per TRT01AN value */
  PROC TRANSPOSE DATA = FREQS OUT = R&BLK. PREFIX = CNT;
    BY &VAR.;
    ID TRT01AN;
    VAR _FREQ_;
  RUN;

  /* Highest column number; RETAIN keeps the running maximum */
  %GLOBAL MAXNO;
  DATA _NULL_;
    SET FREQS END = LR;
    RETAIN MAXNO 0;
    NO = INPUT(SUBSTR(REVERSE(TRT01AN),1,1),8.);
    MAXNO = MAX(MAXNO,NO);
    IF LR THEN CALL SYMPUT("MAXNO",COMPRESS(PUT(MAXNO,8.)));
  RUN;
  %PUT &MAXNO;

  DATA &BLK. (KEEP = BLK ROW PCNT:);
    LENGTH BLK $6 ROW $200;
    SET R&BLK.;
    BLK = "&BLK.";
    /* Heading row for the report block */
    IF _N_ = 1 THEN DO;
      ROW = PUT("&BLK.",$BLK.);
      OUTPUT;
    END;
    /* Drop the two-digit sort key and the space that follows it */
    ROW = " " || SUBSTR(&VAR.,4);
    ARRAY C{*} CNT01-CNT0&MAXNO.;
    ARRAY P{*} $ 50 PCNT01-PCNT0&MAXNO.;
    DO I = 1 TO HBOUND(C);
      IF C{I} > 0 THEN P{I} = PUT(C{I}, COUNT.) || " (" ||
        PUT(ROUND((C{I}/INPUT(SYMGET(COMPRESS("BIGN" || PUT(I,Z2.))),8.)*100),.1), 5.1) || "%)";
      ELSE IF C{I} = 0 THEN P{I} = PUT(C{I}, COUNT.);
    END;
    OUTPUT;
  RUN;
%MEND;

%MAKE_IT_COUNT(BLK=BLK01, VAR=SITE);
%MAKE_IT_COUNT(BLK=BLK02, VAR=SEXN);
%MAKE_IT_COUNT(BLK=BLK03, VAR=AGEGR1N);

DATA FINAL;
  SET BLK01 BLK02 BLK03;
RUN;
ACKNOWLEDGMENTS

First, I would like to thank my Lord, Jesus Christ. It is through Him that I find my strength, patience, and resolve. Next, I would like to thank my family: my encouraging husband, Justin, and my kids (Hope, Faith, Justin, Danny, Charity, and Paul), who are my never-ending source of happiness.

RECOMMENDED READING

Base SAS Language Reference
Base SAS Procedures Guide

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Britney Gilbert
Juniper Tree Consulting, LLC
Britney.Gilbert@JuniperTreeConsulting.com
www.junipertreeconsulting.com
@JuniperTree19

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.