Developing Data-Driven SAS Programs Using Proc Contents

Robert W. Graebner, Quintiles, Inc., Kansas City, MO

ABSTRACT

It is often desirable to write SAS programs that adapt to different data set structures without being modified. Such programs are referred to as data-driven programs because they assess the structure of the data set they are working with and automatically adapt to that structure. In SAS, the macro language can be used in conjunction with PROC CONTENTS to produce such programs. In this paper, examples are provided to illustrate how this technique can be used to reduce programming and maintenance effort in a variety of situations. This paper is intended for experienced SAS programmers who have a basic understanding of the SAS macro language.

INTRODUCTION

The SAS macro language provides powerful capabilities for writing flexible programs that can behave differently depending on the parameters passed to them. A common use of macros is to reduce repetition in programs. An example from the pharmaceutical industry is the need to produce a summary listing for each subject with the subject ID included in the report title. To accomplish this, you could write PROC REPORT code, make a copy for each subject, and then change the ID in each TITLE statement. A more efficient way would be to put the PROC REPORT code in a macro, pass the subject ID as a parameter, and then use macro variable substitution to place the ID in the title.

This method is simple when there are only a few differences between each report, but what do you do when you need reports for many different data sets with different structures? PROC CONTENTS provides a simple solution with its capability of storing data set structure information in a data set. This information can then be stored in macro variables and used to build SAS programming statements tailored to the data set you are working with. For example, in PROC REPORT, the variable names could be used to construct the COLUMN statement.
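The subject-ID macro described in the introduction can be sketched as follows. This is a minimal illustration, not code from the paper: the data set WORK.VISITS, its variables, and the macro name SUBJRPT are all hypothetical.

```sas
/* Hypothetical visit data; the names are assumptions for this sketch. */
data work.visits;
  input patno visit sbp;
  datalines;
101 1 120
101 2 118
102 1 135
;
run;

%macro subjrpt(subjid);
  /* Double quotes let &subjid resolve inside the title text. */
  title "Summary Listing for Subject &subjid";
  proc report data=work.visits nowindows;
    where patno = &subjid;
    column patno visit sbp;
    define patno / order;
    define visit / display;
    define sbp   / display;
  run;
  title;  /* clear the title so it does not carry into later steps */
%mend subjrpt;

%subjrpt(101)
%subjrpt(102)
```

Each call produces one listing with the subject ID in its title. The COLUMN and DEFINE statements are still typed by hand here, which is the part the PROC CONTENTS technique goes on to automate.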
In addition to making your program more generic, you also eliminate many errors. Because the variable names are obtained from PROC CONTENTS, you are assured that all variables will be included and that they will all be spelled correctly. Information on type, length, label and format can be used in a similar fashion to produce a DEFINE statement for each variable. This method can be used to generate SAS programming statements for any SAS procedure that utilizes data set structure information.

There are two basic ways in which this process can be used in your programs. The first is to use macro variable substitution in the source code run by your current session. This has the advantage that your program can be generic and self-contained. The second method is to use a DATA _NULL_ step with a series of PUT statements that use macro variable substitution to create a text file containing SAS source code statements. An advantage of this method is that you can modify the program before you run it. This is helpful when you are not able to handle all coding needs in your macro. It also allows you to give the source code to clients without giving away your macro technology.

METHOD

PROC CONTENTS has several features that make it useful in developing data-driven applications. It can determine the structure of any data set in a SAS library by using the DATA=LIBNAME.MEMBER option, or it can process all data sets in a library at once by specifying DATA=LIBNAME._ALL_. By using the OUT=LIBNAME.MEMBER and NOPRINT options you can send the output to a SAS data set and suppress printed output. The resulting data set contains a series of variables describing the data set structures, with an observation for each variable in each data set. The most useful variables are listed below.

    PROC CONTENTS Variable   Description
    ----------------------   -------------------------------------------
    LIBNAME                  SAS Library Name
    MEMNAME                  SAS Library Member Name
    NAME                     Variable Name
    TYPE                     Variable Type (1 = numeric, 2 = character)
    LENGTH                   Variable Length
    LABEL                    Variable Label
    FORMAT                   Variable Format
    FORMATL                  Format Length
    FORMATD                  Format Decimals
    INFORMAT                 Variable Informat
    INFORML                  Informat Length
    INFORMD                  Informat Decimals

Creating a structure data set is simple; an example is given below.

    proc contents data=&saslib..&ds out=struct position noprint;
    run;

The macro variables that indicate the SAS library and data set name to be used are passed as parameters to the macro that contains the call to PROC CONTENTS. The PROC CONTENTS output is stored in a temporary data set called struct. The POSITION option specifies that the observations will be ordered by the location of variables in the data set rather than alphabetically by variable name. The NOPRINT option suppresses printed output of the PROC CONTENTS results.

The next step is to place the desired information into SAS macro variables. Because these variables are often used in iterative processes, it is desirable to have them in an array. While the SAS macro language does not support arrays, you can simulate arrays (sometimes called pseudo arrays) by using multi-pass macro variable resolution. The following source code creates a pseudo array containing the variable names and types from the data set struct. The CALL SYMPUT routine is used to store the data set variables NAME and TYPE in macro variables that have the observation number added to the end of the macro variable name (e.g. var1, var2, etc.) to facilitate referencing them in a %DO loop. The last observation number is stored as well to serve as the upper limit for the %DO loop.

    data _null_;
      set struct end=last;
      /* left(_n_) converts the observation number to left-aligned text */
      call symput('var'  || left(_n_), name);
      call symput('type' || left(_n_), type);
      if last then call symput('numrec', _N_);
    run;

As mentioned earlier, one use of this information is to use macro variable substitution to form the necessary SAS source code when the macro is resolved. The following example loops through all variables in the data set and calls PROC FREQ for each one.

    %do i = 1 %to &numrec;
      proc freq data=&saslib..&ds;
        tables &&var&i;
      run;
    %end;

The macro variable reference &&var&i will be resolved in two passes. When I = 1, the first pass will resolve to &var1 and the second pass will resolve to the string stored in &var1, which will be the name of the first variable in the data set.

Another option is to use a DATA _NULL_ step and PUT statements to generate SAS source code. The example below uses this method to generate PROC REPORT code. The source code is put in the text file referenced in the FILE statements. The MOD option is used so that each successive DATA step will append to the file rather than overwrite it.

    /* First DATA step: PROC REPORT statement and COLUMN statement */
    data _null_;
      set struct end=last;
      file sascode mod;
      if _N_ = 1 then do;
        put / "proc report data=&saslib..&ds missing nowindows headline headskip split='\';"
            / '   column ' @;
      end;
      /* start a new line once the column list gets long */
      linelen + length(name);
      if linelen >= 70 then do;
        put / +9 @;
        linelen = 10;
      end;
      put name @;
      if last then put ';';
    run;

    /* Second DATA step: one DEFINE statement per variable, then the title */
    data _null_;
      set struct end=last;
      file sascode mod;
      length clabel $ 120 coltype $ 7;
      vnwidth = length(trim(name));
      if name in('patno','visit') then coltype = 'order';
      else coltype = 'display';
      select;
        when(length <= 4)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=" || put(max(vnwidth, 4), 2.) || " left;";
        when(4 < length <= 20)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=" || put(max(vnwidth, length), 2.) || " left;";
        when(length > 20)
          clabel = "define " || trim(name) || " / " || trim(coltype) ||
                   " width=20" || " left flow;";
      end;
      put @3 clabel;
      if last then do;
        put "  title1 'QC Listing Report for &ds';" / '';
      end;
    run;
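Once a file has been generated this way, it still has to be run. The paper does not show that step; one common approach is %INCLUDE, sketched below. The generated statements here are a stand-in (a PROC PRINT of SASHELP.CLASS, assumed to be available), and the TEMP fileref is only for the demonstration.

```sas
/* TEMP asks SAS for a scratch file; a real run would name a permanent path. */
filename sascode temp;

/* Stand-in generator: writes two ordinary SAS statements to the file. */
data _null_;
  file sascode;
  put "proc print data=sashelp.class (obs=3);";
  put "run;";
run;

/* SOURCE2 echoes the included statements to the log as they execute. */
%include sascode / source2;

filename sascode clear;
```

With a permanent file, the generated program can be opened and edited between the DATA _NULL_ step and the %INCLUDE, which is the advantage the paper cites for this method.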

To start the program generation, a PROC REPORT statement and the associated options are written. Because this only needs to be done once, before the variable-specific statements are written, an IF statement is used so that this line is only written when _N_ equals one. The macro variables &SASLIB and &DS contain the SAS library name and the data set name. Because these macro variables need to be resolved as the source code is generated, double quotes are used to surround the string that contains them. If you need to include a macro variable reference in the source code you generate, use single quotes to enclose the string. The remaining statements in this DATA step are used to create a COLUMN statement that contains the names of all the variables in the data set to be reported from.

The second DATA step is used to create DEFINE statements for each column. This section illustrates how conditional processing can be used to generate source code that is dependent on each variable's attributes. When the length of a variable is less than or equal to four, the column width is set to the maximum of the length of the variable name and four. This guarantees that you will not have any columns narrower than four spaces. When the length is greater than four, but less than or equal to 20, the column width is equal to the maximum of the length of the name and the length of the variable. When the length is greater than 20, the column width is set to 20 and the FLOW option is used for column wrapping. After the last variable is reached, a TITLE statement is generated.

This source code is part of a macro that receives the data set name in the parameter DS. If you need to generate source code for multiple data sets, you can write another macro that creates a pseudo array containing the names of the required data sets and then calls the source code generating macro for each data set. An example of such a macro is given below.
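As a brief aside on the quoting rule described above, the following sketch writes the same TITLE statement twice, once from a double-quoted string and once from a single-quoted string. The value demog is a hypothetical data set name, and the PUT statements write to the log because no FILE statement is given.

```sas
%let ds = demog;   /* hypothetical data set name */

data _null_;
  /* Double quotes: &ds is resolved while the generator runs. */
  put "title1 'QC Listing Report for &ds';";
  /* Single quotes: the reference is written out unresolved. */
  put 'title1 "QC Listing Report for &ds";';
run;
```

The first line lands in the log with demog substituted, while the second still contains the text &ds, so that reference would only be resolved later, when the generated program itself is run.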

    %macro qcrptgen (saslib, codefile);
      options nolabel nofmterr;
      filename sascode "&codefile";

      /**** CREATE PROGRAM HEADER ****/
      data _null_;
        file sascode mod;
        gendate = put( today(), date9.);
        put "/***********************";
        put " QC Listing for: &saslib";
        put " Program name :: &codefile";
        put " Authors name :: ";
        put " Date started :: " gendate /;
        put " Source code generated by the";
        put " QCList Macro.";
        put "******************************/";
      run;

      /**** CREATE A DATASET CONTAINING THE NAMES OF ALL
            DATASETS IN THE LIBRARY SASLIB ****/
      proc contents data=&saslib.._all_
                    out=libmem (keep=memname varnum)
                    position noprint;
      run;

      proc sort data=libmem nodupkey;
        by memname;
      run;

      /**** CREATE A PSEUDO-ARRAY (D1..Dn) OF MACRO VARIABLES
            CONTAINING EACH DATASET NAME ****/
      data _null_;
        set libmem end=last;
        call symput('d' || left(_n_), trim(memname));
        if last then call symput('numrec', _N_);
      run;

      /**** CALL THE SOURCE CODE GENERATOR FOR EACH DATASET ****/
      %do i = 1 %to &numrec;
        %prgen(&&d&i);
      %end;
    %mend qcrptgen;

This macro has two parameters: SASLIB, which contains the name of the SAS library to use, and CODEFILE, which contains the full path and filename of the source code file you want to create. The first DATA step generates a program header. The next step uses PROC CONTENTS to create a data set that contains the data structure information for all data sets in the specified SAS library. The purpose of this data set is to provide a list of data sets to generate PROC REPORT source code for. The structure data set created by PROC CONTENTS will have one observation for each variable in each data set. PROC SORT with the NODUPKEY option, using MEMNAME (data set name) as the BY variable, is used to create a data set with one observation per data set in the specified library.
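The member-list portion of this macro can be tried on its own before any code generation is attached to it. The sketch below is an assumption-laden illustration rather than the paper's code: the macro name LISTMEM and the two demo tables are hypothetical, and CALL SYMPUTX is swapped in for CALL SYMPUT because it strips blanks and can target the local symbol table explicitly.

```sas
/* Two small tables so the sketch has something to find (hypothetical names). */
data work.demo_a; x = 1;   run;
data work.demo_b; y = 'a'; run;

%macro listmem(saslib);
  %local i numrec;
  %let numrec = 0;   /* the loop is skipped if the library has no members */

  /* One row per variable per member ... */
  proc contents data=&saslib.._all_ out=libmem (keep=memname) noprint;
  run;

  /* ... reduced to one row per member. */
  proc sort data=libmem nodupkey;
    by memname;
  run;

  /* Pseudo array D1..Dn; 'L' keeps the variables local to this macro. */
  data _null_;
    set libmem end=last;
    call symputx(cats('d', _n_), memname, 'L');
    if last then call symputx('numrec', _n_, 'L');
  run;

  /* &&d&i resolves to &d1, &d2, ... and then to each member name. */
  %do i = 1 %to &numrec;
    %put NOTE: member &i of &saslib is &&d&i;
  %end;
%mend listmem;

%listmem(work)
```

Replacing the %PUT with a call to a generator macro gives the same shape as the loop in QCRPTGEN. Any other tables already present in WORK will be listed as well.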

The next DATA step uses the CALL SYMPUT routine to store the data set names in a pseudo array of macro variables. This allows a %DO loop to be used to call the source code generating macro for each of the data sets.

CONCLUSION

The methods presented in this paper illustrate how PROC CONTENTS and the SAS macro language can be used to increase programming efficiency by creating data-driven programs or by generating SAS source code.

ACKNOWLEDGMENTS

SAS is a registered trademark or trademark of SAS Institute, Inc. in the USA and other countries. ® indicates USA registration.

CONTACTING THE AUTHOR

Robert Graebner
Quintiles, Inc.
P.O. Box 9708
Kansas City, MO 64134-0708
Email: bob.graebner@quintiles.com, graetech@grapevine.net
Web Site: Quintiles.com, www.grapevine.net/~graetech

M W S U G Data Management Jazz Up Your SAS Skills in