# A Breeze through SAS options to Enter a Zero-filled row

Kajal Tahiliani, ICON Clinical Research, Warrington, PA

ABSTRACT:

Programmers often need to summarize data into tables that follow a template. Study data, however, do not always contain every data combination that the template requires. In such cases we need to add rows that contain zeroes for the combinations that do not exist in the data and insert them into our results. This paper discusses the different methods for adding rows containing zeroes.

INTRODUCTION:

Look at the template for a Demographics table (Table 1) below. Looks familiar? As programmers we often program tables like this. Here, we summarize the number of subjects within four race categories by treatment group.

Table 1: Demographics Table

Often, the data do not have all the combinations required by the template, as shown in the example data (Table 2) below. Here, one treatment group has subjects in all race categories, while another has subjects in only one race category.

Table 2: Example Data

In such cases we need rows containing zeroes to complete the display. Adding a zero row always seems like an unnecessary chore. This paper discusses the different methods in SAS for displaying all combinations when they are not present in the source data.

METHODS:

(1) DUMMY DATASET METHOD

In the Dummy Dataset Method, the results from Proc Freq or Proc Means are merged with a dummy dataset. A dummy dataset is a dataset that contains all the data combinations required in the table. The merge yields a dataset with every data combination in the table template.

Figure 1: Schematic Diagram showing the Dummy Dataset Method.

Advantage: Straightforward method.

Disadvantages: An extra dummy dataset must be created. The dummy dataset is not data-driven. Creating the dummy dataset can be complex when many combinations of categories are required.
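The Dummy Dataset Method described above might be sketched as follows. This is a minimal illustration, not the author's program: the dataset names (`dm`, `counts`, `dummy`, `final`), the treatment codes `A` and `B`, and the race values are all assumptions made up for the example.

```sas
/* Hypothetical input data: names and values are illustrative assumptions */
data dm;
    length trt $1 race $5;
    input trt $ race $;
    datalines;
A White
A Black
A Other
B White
;
run;

/* Observed counts: only combinations present in the data are output */
proc freq data = dm noprint;
    tables trt * race / out = counts (drop = percent);
run;

/* Dummy dataset: every treatment-by-race combination the template requires */
data dummy;
    length trt $1 race $5;
    do trt = 'A', 'B';
        do race = 'White', 'Black', 'Asian', 'Other';
            output;
        end;
    end;
run;

proc sort data = counts; by trt race; run;
proc sort data = dummy;  by trt race; run;

/* Keep every dummy row; set COUNT to 0 where nothing was observed */
data final;
    merge dummy (in = indummy) counts;
    by trt race;
    if indummy;
    if missing(count) then count = 0;
run;
```

In this sketch the hard-coded DO loops are what make the method not data-driven: any new category must be added to the dummy dataset by hand.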

(2) PROC FREQ USING SPARSE OPTION:

Proc Freq with the SPARSE option outputs one record for each possible combination of the values of the variables in the TABLES statement. It also automatically puts 0 in the count column for the combinations added by SPARSE.

Syntax:

```sas
proc freq data = dataset noprint;
    tables trt * race / sparse out = dataset1 list missing;
run;
```

Note: The Asian category is not present in the final dataset because it is not present in the actual data.

Figure 2: Schematic Diagram showing the Proc Freq using SPARSE option Method.

Advantages: Eliminates the extra steps needed to make a dummy dataset and the corresponding merge step. Makes the program more efficient than the Dummy Dataset Method. Data-driven.

Disadvantages: Each required value must occur at least once in the variable for SPARSE to create a full summary. Cannot be used reliably on interim data, since the data at the next cut-off might be different.

(3) PROC MEANS USING COMPLETETYPES OPTION:

Proc Means with the COMPLETETYPES option is similar to the SPARSE option used in Proc Freq. It outputs one record for each possible combination of the values of the variables in the CLASS statement and automatically puts 0 in the N column of the resulting output.

Syntax:

```sas
proc means data = dataset completetypes;
    class trt race;
    output out = aemean n = N;
run;
```

Note: The Asian category is not present in the final dataset because it is not present in the actual data.

Figure 3: Schematic Diagram showing the Proc Means using COMPLETETYPES option Method.

Advantages: Eliminates the extra steps needed to make a dummy dataset and the corresponding merge step. Makes the program more efficient than the Dummy Dataset Method. Data-driven.

Disadvantages: Each required value must occur at least once in the variable for COMPLETETYPES to create a full summary. Cannot be used reliably on interim data, since the next data cut might be different.

(4) PROC MEANS WITH COMPLETETYPES and PRELOADFMT OPTION:

Proc Means with PRELOADFMT in combination with the COMPLETETYPES option creates output with all the possible combinations defined by a format, even if a combination does not exist in the input dataset. This method needs a format, COMPLETETYPES in the PROC MEANS statement, and PRELOADFMT in the CLASS statement, as shown below.

```sas
proc format;
    /* The labels for values 1, 2, and 4 did not survive in the source text */
    value racef
        1 = ''
        2 = ''
        3 = 'Asian'
        4 = '';
run;
```

Syntax:

```sas
proc means data = dm completetypes;
    class trt racen / preloadfmt;
    format racen racef.;
    output out = meanplf n = N;
run;
```

Note: The Asian category has been added to the final dataset due to PRELOADFMT.

Figure 4: Schematic Diagram showing the Proc Means using COMPLETETYPES and PRELOADFMT option Method.

Advantages: There is no requirement to have at least one occurrence of a value in the data. Can be used reliably on all data.

Disadvantage: Formats must be created before using the PRELOADFMT option if they do not already exist.

CONCLUSION:

If your data are not going to change and every required category occurs at least once, then the SPARSE or COMPLETETYPES option is adequate. In most situations, however, the actual data are dynamic, so Proc Means with the COMPLETETYPES and PRELOADFMT options provides the most complete data-driven solution.

REFERENCES:

Peter R. Welbrock.

ACKNOWLEDGEMENT:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

I would like to sincerely thank Mr. Tony Pisegna for his constant support and encouragement.

CONTACT INFORMATION:

The author can be contacted at:

Kajal Tahiliani
ICON Clinical Research
2800 Kelly Rd, Suite 200
Warrington, PA
Tel :

