Planting Your Rows: Using SAS Formats to Make the Generation of Zero-Filled Rows in Tables Less Thorny

Kathy Hardis Fraeman, United BioSource Corporation, Bethesda, MD

ABSTRACT

Tables or summary reports produced with SAS often need to include all possible values of one or more variables as rows. However, the actual data to be summarized might not contain all of a variable's possible values, even though the table needs a corresponding zero-filled row for each value. These zero-filled table rows for non-existent variable values will be missing from the table unless additional programming is done. One programming method to make sure all rows are included is to hard code all possible values of a variable, although this method can be tedious if a large number of variables and/or values are involved. A more dynamic method of determining all possible values of a variable is to attach a SAS format to each table variable, where the format contains all of the variable's possible values. SAS can dynamically determine the name of a format attached to a variable using %SYSFUNC with SCL or a dictionary table using PROC SQL. SAS can then generate a data set of all possible values for the variable by using the CNTLOUT= option of PROC FORMAT. The output data set generated from PROC FORMAT can be used dynamically to ensure that all possible values of a variable, even values that don't actually exist in the data, are included as rows in a table.

INTRODUCTION

Tables or summary reports may need to be produced using SAS where all possible values of one or more variables must be included as rows in the tables. However, the actual input data to be summarized in such a table might not include all possible combinations of data values of all relevant variables. If the table needs rows for all possible combinations of these data values, zero-filled table rows for the non-existent combinations will be missing from the table unless additional programming is done.

One programming method to make sure all rows are included is to hard code all possible values and combinations of values of a variable or variables. This method has the disadvantage of being tedious if a large number of variables and/or values are involved, and it is not dynamic if the possible values of a variable change over time.

A more dynamic method of determining all possible values of a variable is to attach a SAS format to each variable, where the format contains all of the variable's possible values. This paper shows how to determine the format attached to a variable, how to determine the data values defined in the format, and then how to use this information to create a table that includes rows for all possible data values as defined by the format.

SAMPLE DATA

The sample data in the SAS data set IN.SALES used in this paper are given below:

   Obs   employee   year      num       dollar
     1   Hall       FY 2008    10   $10,000.00
     2   Hall       FY 2010    15   $15,500.00
     3   Oates      FY 2008     8      $500.00
     4   Brooks     FY 2008    15   $11,111.00
     5   Brooks     FY 2010    20   $12,345.67
     6   Abbot      FY 2008    50   $75,757.00
     7   Abbot      FY 2010    75   $99,999.99
     8   Costello   FY 2008    33   $33,333.00
     9   Costello   FY 2010    44   $44,444.44
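To experiment with the techniques in this paper without access to IN.SALES, a stand-in copy of the sample data can be built with a DATA step. The following is only a sketch: the data set name WORK.SALES_DEMO is hypothetical, the numeric codes for employee and year are taken from the format definitions given in the next section, and the DOLLAR12.2 format on dollar is an assumption based on how the listing above is displayed.

```sas
/* Hypothetical stand-in for IN.SALES.                           */
/* The data set name and the DOLLAR12.2 format are assumptions.  */
data work.sales_demo;
   input employee year num dollar;
   format dollar dollar12.2;
   datalines;
1 2008 10 10000.00
1 2010 15 15500.00
2 2008  8   500.00
3 2008 15 11111.00
3 2010 20 12345.67
5 2008 50 75757.00
5 2010 75 99999.99
6 2008 33 33333.00
6 2010 44 44444.44
;
run;
```

The stand-in stores the underlying numeric codes; it displays the labels shown in the listing only after the formats EMPLFMT. and YEARFMT. have been defined and attached to employee and year with a FORMAT statement.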

Both employee and year are numeric variables with attached formats. The formats used for those variables are given as:

   proc format library = library;
      value emplfmt 1 = "Hall"
                    2 = "Oates"
                    3 = "Brooks"
                    4 = "Dunn"
                    5 = "Abbot"
                    6 = "Costello"
                    ;
      value yearfmt 2008 = "FY 2008"
                    2009 = "FY 2009"
                    2010 = "FY 2010"
                    ;
   run;

The format EMPLFMT is attached to the variable EMPLOYEE, and the format YEARFMT is attached to the variable YEAR. These formats include all possible values for the variables and can be updated when new possible data values are added.

REPORT WITH MISSING ROWS

A report is needed using the sample data that gives the number of sales and the dollar amount of sales for each fiscal year and by employee within year. The PROC REPORT code to produce such a table is:

   title "Report with Missing Rows";
   proc report data=in.sales nowindows headline headskip;
      column year employee num dollar;
      define year     / group   'Year'     order=data;
      define employee / display 'Employee' order=data;
      define num      / display 'Number of sales';
      define dollar   / display 'Amount of sales';
      break after year / skip;
   run;

The report with the original data looks like:

                Report with Missing Rows

                         Number     Amount of
   Year      Employee  of sales         sales
   ------------------------------------------
   FY 2008   Hall            10    $10,000.00
             Oates            8       $500.00
             Brooks          15    $11,111.00
             Abbot           50    $75,757.00
             Costello        33    $33,333.00

   FY 2010   Hall            15    $15,500.00
             Brooks          20    $12,345.67
             Abbot           75    $99,999.99
             Costello        44    $44,444.44

The above table has missing rows for combinations of the variables year and employee where no data were available. No data were available at all for the year 2009, and not all employees had sales data for both 2008 and 2010. If the table needs to have rows for all possible combinations of year and employee, even if no data exist for those combinations, the values of the variables' attached formats can be used to determine all possible combinations of data values.

DETERMINING A VARIABLE'S FORMAT

SAS can dynamically determine the name of a format attached to a variable with either of two techniques: %SYSFUNC with SCL, or a dictionary table using PROC SQL. Each method is discussed separately below.

METHOD 1 -- %SYSFUNC WITH SAS SCREEN CONTROL LANGUAGE

%SYSFUNC was originally developed in SAS version 6.12 to allow the incorporation of SCL (SAS Component Language, formerly Screen Control Language) functions into the SAS macro programming environment. Among its many capabilities, %SYSFUNC can determine the existence of a SAS data set and characterize the attributes of the data set's variables, such as a variable's format.

The SCL code to determine a variable's format and put the name of the format in a macro variable is:

   %let dsid   = %sysfunc(open(in.sales, i));
   %let varnum = %sysfunc(varnum(&dsid, EMPLOYEE));
   %let format = %sysfunc(varfmt(&dsid, &varnum));
   %let rc     = %sysfunc(close(&dsid));
   %put EMPLOYEE VARIABLE FORMAT = &format;

IN.SALES is the name of the data set, and EMPLOYEE is the name of the variable in the data set IN.SALES. The name of the format attached to EMPLOYEE is EMPLFMT., and the value of the macro variable &format will be displayed in the SAS log as:

   999  %put EMPLOYEE VARIABLE FORMAT = &format;
   EMPLOYEE VARIABLE FORMAT = EMPLFMT.

Note that the "." in the format name is included in the value of the macro variable.

METHOD 2 -- PROC SQL DICTIONARY TABLES

Structured Query Language (SQL) is a standard and widely used language that has been implemented in SAS as PROC SQL. Dictionary tables provide metadata about SAS data sets and variables, and they can be queried at run time by using PROC SQL. The PROC SQL code to determine a variable's format using a dictionary table is given below:

   proc sql;
      create table formats as
      select format
      from dictionary.columns
      where upcase(libname) = 'IN' and
            upcase(memname) = 'SALES' and
            upcase(name)    = 'EMPLOYEE'
      ;
   quit;

   proc print data=formats;
   run;

The PROC PRINT of the data set FORMATS will look like:

   Obs   format
     1   EMPLFMT.

Again, note that the "." in the format name is included in the value of the variable.

DETERMINING ALL VALUES IN A FORMAT USING THE CNTLOUT OPTION OF PROC FORMAT

A SAS format library is stored in SAS as a catalog, and the values of a user-defined SAS format library can be put into a SAS data set using the CNTLOUT= option of PROC FORMAT. The data set created with the CNTLOUT= option contains multiple variables relevant to the format library, but the three variables in the CNTLOUT= data set relevant to this analysis are:

   FMTNAME   name of the format
   START     starting value of the format
   LABEL     descriptive label associated with the value of START
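The CNTLOUT= data set holds more than these three variables. As a minimal sketch, assuming the formats are stored in the catalog referenced by the libref LIBRARY as in this paper, the full list of variables can be inspected with PROC CONTENTS. The data set name WORK.FMT_ALL is a hypothetical name used only for this illustration.

```sas
/* Sketch: write every format in the LIBRARY catalog to a data set, */
/* keeping all variables (WORK.FMT_ALL is a hypothetical name).     */
proc format library = library cntlout = work.fmt_all;
run;

/* List the variables available in the CNTLOUT= data set */
proc contents data = work.fmt_all varnum;
run;
```

Among the other variables, END holds the ending value of a range and TYPE distinguishes numeric from character formats. Neither is needed here, because each entry in EMPLFMT and YEARFMT maps a single numeric value to a label.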

The SAS code to use the CNTLOUT= option with PROC FORMAT to put the format information in a SAS data set named FORMATLIB is given below:

   proc format library=library
               cntlout = formatlib (keep = fmtname start label);
   run;

   proc print data=formatlib;
   run;

The PROC PRINT of the data set FORMATLIB will look like:

   Obs   FMTNAME   START   LABEL
     1   EMPLFMT   1       Hall
     2   EMPLFMT   2       Oates
     3   EMPLFMT   3       Brooks
     4   EMPLFMT   4       Dunn
     5   EMPLFMT   5       Abbot
     6   EMPLFMT   6       Costello
     7   YEARFMT   2008    FY 2008
     8   YEARFMT   2009    FY 2009
     9   YEARFMT   2010    FY 2010

In the CNTLOUT= data set created by PROC FORMAT, note that the "." is not included in the value of the variable FMTNAME, although the "." is included in the format names returned by both the %SYSFUNC and SQL dictionary table methods shown above. Format names from the CNTLOUT= option of PROC FORMAT need to have a "." appended before they can be compared with the format names returned by those methods. A "." can be appended using the CATS function as shown below.

   fmtname = cats(fmtname, '.');

USING ALL VALUES OF AN ATTACHED FORMAT TO FILL IN MISSING TABLE ROWS

The two SAS programming techniques described above can be combined to determine all possible values of a single variable or, as in this paper, all possible combinations of the values of two variables.

METHOD 1 -- %SYSFUNC WITH SAS SCREEN CONTROL LANGUAGE AND CNTLOUT

%SYSFUNC and the CNTLOUT= option of PROC FORMAT can be combined in the following macro to determine the name of a format associated with a variable &VAR and put all of the format's defined values in an output data set &OUTVALS:

   /*************************************************/
   /* OPTION 1:                                     */
   /* Get values of formats using %SYSFUNC and SCL  */
   /* Note that input data set name IN.SALES is     */
   /* coded in the macro                            */
   /*************************************************/
   %macro getfmt1(var=, outvals=);

      %let dsid   = %sysfunc(open(in.sales,i));
      %let varnum = %sysfunc(varnum(&dsid, &var));
      %let varfmt = %sysfunc(varfmt(&dsid, &varnum));
      %let rc     = %sysfunc(close(&dsid));
      %put &varfmt;

      proc format library = library
                  cntlout = &outvals (keep  = fmtname start label
                                      where = (cats(fmtname,'.') = "&varfmt"));
      run;

      title "Data set &outvals from macro GETFMT1 -- SYSFUNC and SCL";
      proc print data = &outvals;
      run;

   %mend getfmt1;

   %getfmt1(var=employee, outvals=empvals);
   %getfmt1(var=year, outvals=yearvals);

Note that the input data set IN.SALES is hardcoded in the example of the macro given above. The two output data sets created by the above macro look like:

   Data set empvals from macro GETFMT1 -- SYSFUNC and SCL

   Obs   FMTNAME   START   LABEL
     1   EMPLFMT   1       Hall
     2   EMPLFMT   2       Oates
     3   EMPLFMT   3       Brooks
     4   EMPLFMT   4       Dunn
     5   EMPLFMT   5       Abbot
     6   EMPLFMT   6       Costello

   Data set yearvals from macro GETFMT1 -- SYSFUNC and SCL

   Obs   FMTNAME   START   LABEL
     1   YEARFMT   2008    FY 2008
     2   YEARFMT   2009    FY 2009
     3   YEARFMT   2010    FY 2010

METHOD 2 -- PROC SQL DICTIONARY TABLES AND CNTLOUT

PROC SQL dictionary tables and the CNTLOUT= option of PROC FORMAT can also be combined in the following macro to determine the name of a format associated with a variable &VAR and put all of the format's defined values in an output data set &OUTVALS:

   /****************************************************************/
   /* OPTION 2:                                                    */
   /* Get values of formats using SQL Dictionary Table             */
   /* Note that input data set name IN.SALES is coded in the macro */
   /****************************************************************/
   %macro getfmt2(var=, outvals=);

      proc sql;
         create table &var.fmt as
         select format
         from dictionary.columns
         where upcase(libname) = 'IN' and
               upcase(memname) = 'SALES' and
               upcase(name)    = upcase("&var")
         ;
      quit;

      /*****************************************************/
      /* Turn the name of the format into a macro variable */
      /*****************************************************/
      data _null_;
         set &var.fmt;
         call symputx("varfmt", format, 'L');
      run;

      %put &varfmt;

      proc format library = library
                  cntlout = &outvals (keep  = fmtname start label
                                      where = (cats(fmtname,'.') = "&varfmt"));
      run;

      title "Data set &outvals from macro GETFMT2 -- SQL Dictionary Table";
      proc print data = &outvals;
      run;

   %mend getfmt2;

   %getfmt2(var=employee, outvals=empvals);
   %getfmt2(var=year, outvals=yearvals);

The RUN statement after the DATA _NULL_ step matters: without it, the %PUT statement would execute before the step has run and created &varfmt. The output from the macro GETFMT2 is given below and looks exactly the same as the output from the macro GETFMT1.

   Data set empvals from macro GETFMT2 -- SQL Dictionary Table

   Obs   FMTNAME   START   LABEL
     1   EMPLFMT   1       Hall
     2   EMPLFMT   2       Oates
     3   EMPLFMT   3       Brooks
     4   EMPLFMT   4       Dunn
     5   EMPLFMT   5       Abbot
     6   EMPLFMT   6       Costello

   Data set yearvals from macro GETFMT2 -- SQL Dictionary Table

   Obs   FMTNAME   START   LABEL
     1   YEARFMT   2008    FY 2008
     2   YEARFMT   2009    FY 2009
     3   YEARFMT   2010    FY 2010

CREATE A DATA SET WITH ALL POSSIBLE VALUES OF BOTH VARIABLES, BASED ON ATTACHED FORMATS

The following SAS code shows how to create a SAS data set with all possible values of the variables YEAR and EMPLOYEE, using the format values from the variables' attached formats.

   /********************************************************************/
   /* Create SAS data sets of all possible values of YEAR and EMPLOYEE */
   /********************************************************************/
   data yearvals_mod (keep = year a);
      set yearvals;

      /*---------------------------------------------*/
      /* Convert character variable START to numeric */
      /*---------------------------------------------*/
      year = input(trim(left(start)),8.);

      /*-----------------------------*/
      /* Dummy variable for SQL join */
      /*-----------------------------*/
      a = 1;
   run;

   data empvals_mod (keep = employee b);
      set empvals;

      /*---------------------------------------------*/
      /* Convert character variable START to numeric */
      /*---------------------------------------------*/
      employee = input(trim(left(start)),8.);

      /*-----------------------------*/
      /* Dummy variable for SQL join */
      /*-----------------------------*/
      b = 1;
   run;

   /***************************************************************************/
   /* Create a SAS data set of all possible combinations of YEAR and EMPLOYEE */
   /* using all possible values of YEAR and EMPLOYEE                          */
   /***************************************************************************/
   proc sql;
      create table allrows as
      select year, employee
      from yearvals_mod y, empvals_mod e
      where y.a = e.b;
   quit;

   proc sort data=allrows;
      by year employee;
   run;

Because the dummy variables a and b are both always 1, the join pairs every row of YEARVALS_MOD with every row of EMPVALS_MOD. The data set ALLROWS will look like this:

   All rows needed for table

   Obs   year   employee
     1   2008          1
     2   2008          2
     3   2008          3
     4   2008          4
     5   2008          5
     6   2008          6
     7   2009          1
     8   2009          2
     9   2009          3
    10   2009          4
    11   2009          5
    12   2009          6
    13   2010          1
    14   2010          2
    15   2010          3
    16   2010          4
    17   2010          5
    18   2010          6

Formats will be attached to the variables after the merge with the actual data.

CREATE A DATA SET WITH ZERO-FILLED VALUES WHEN DATA ARE MISSING

The data set created above can be merged with IN.SALES, the input data for the table program, to create zero-filled rows as follows:

   proc sort data=in.sales out=sales;
      by year employee;
   run;

   data sales_all;
      merge allrows (in=a)
            sales   (in=s);
      by year employee;

      /*----------------------------*/
      /* Zero-fill the missing rows */
      /*----------------------------*/
      if a and ^s then do;
         num = 0;
         dollar = 0;
      end;
   run;
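The report step for the zero-filled data is not listed separately. A minimal sketch, assuming it is the earlier PROC REPORT step with only the input data set and the title changed, and assuming the formats on year, employee, and dollar carry over from SALES through the merge:

```sas
/* Sketch: the earlier PROC REPORT step, reading SALES_ALL instead of IN.SALES. */
/* Assumes the formats in the LIBRARY catalog are available to the session.     */
title "Report without Missing Rows";
proc report data=sales_all nowindows headline headskip;
   column year employee num dollar;
   define year     / group   'Year'     order=data;
   define employee / display 'Employee' order=data;
   define num      / display 'Number of sales';
   define dollar   / display 'Amount of sales';
   break after year / skip;
run;
```

Because SALES_ALL is sorted by the numeric values of year and employee, ORDER=DATA lists the employees in the order of their codes in EMPLFMT.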

When the data set SALES_ALL is used as input to the PROC REPORT program, the table will have zero-filled rows for values of the variables that don't occur in the data.

               Report without Missing Rows

                         Number     Amount of
   Year      Employee  of sales         sales
   ------------------------------------------
   FY 2008   Hall            10    $10,000.00
             Oates            8       $500.00
             Brooks          15    $11,111.00
             Dunn             0         $0.00
             Abbot           50    $75,757.00
             Costello        33    $33,333.00

   FY 2009   Hall             0         $0.00
             Oates            0         $0.00
             Brooks           0         $0.00
             Dunn             0         $0.00
             Abbot            0         $0.00
             Costello         0         $0.00

   FY 2010   Hall            15    $15,500.00
             Oates            0         $0.00
             Brooks          20    $12,345.67
             Dunn             0         $0.00
             Abbot           75    $99,999.99
             Costello        44    $44,444.44

CONCLUSION

The SAS programming techniques using SAS formats included in this paper demonstrate just a few of the many wonderful ways that SAS formats can be used to improve the SAS programming process.

ACKNOWLEDGMENTS

SAS is a Registered Trademark of the SAS Institute, Inc. of Cary, North Carolina.

CONTACT INFORMATION

Please contact the author with any comments, questions, or gardening tips:

   Kathy H. Fraeman
   United BioSource Corporation
   7101 Wisconsin Avenue, Suite 600
   Bethesda, MD 20832
   (240) 235-2525 voice
   (301) 654-9864 fax
   kathy.fraeman@unitedbiosource.com