Let SAS Write and Execute Your Data-Driven SAS Code

Similar documents
Get into the Groove with %SYSFUNC: Generalizing SAS Macros with Conditionally Executed Code

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

STEP 1 - /*******************************/ /* Manipulate the data files */ /*******************************/ <<SAS DATA statements>>

Common Sense Tips and Clever Tricks for Programming with Extremely Large SAS Data Sets

Paper PO06. Building Dynamic Informats and Formats

Data Manipulation with SQL Mara Werner, HHS/OIG, Chicago, IL

It s Proc Tabulate Jim, but not as we know it!

Multiple Facts about Multilabel Formats

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

The Power of Combining Data with the PROC SQL

Gary L. Katsanis, Blue Cross and Blue Shield of the Rochester Area, Rochester, NY

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

PharmaSUG Paper SP04

PROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

Programming Gems that are worth learning SQL for! Pamela L. Reading, Rho, Inc., Chapel Hill, NC

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Paper # Jazz it up a Little with Formats. Brian Bee, The Knowledge Warehouse Ltd

Paper S Data Presentation 101: An Analyst s Perspective

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

The FORMAT procedure - more than just a VALUE statement Lawrence Heaton-Wright, Quintiles, Bracknell, UK

Real Time Clinical Trial Oversight with SAS

Not Just Merge - Complex Derivation Made Easy by Hash Object

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Summary Table for Displaying Results of a Logistic Regression Analysis

A SAS Macro for Generating Informative Cumulative/Point-wise Bar Charts

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

A Macro that can Search and Replace String in your SAS Programs

A SAS Solution to Create a Weekly Format Susan Bakken, Aimia, Plymouth, MN

Reading in Data Directly from Microsoft Word Questionnaire Forms

Big Data? Faster Cube Builds? PROC OLAP Can Do It

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC

Creating a Patient Profile using CDISC SDTM Marc Desgrousilliers, Clinovo, Sunnyvale, CA Romain Miralles, Clinovo, Sunnyvale, CA

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

ERROR: ERROR: ERROR:

How to Go From SAS Data Sets to DATA NULL or WordPerfect Tables Anne Horney, Cooperative Studies Program Coordinating Center, Perry Point, Maryland

Greenspace: A Macro to Improve a SAS Data Set Footprint

Abstract. The next section will demonstrate the Oracle SQL script and the SAS program, and provide more detailed discussion of this automated process.

100 THE NUANCES OF COMBINING MULTIPLE HOSPITAL DATA

Paper A Simplified and Efficient Way to Map Variable Attributes of a Clinical Data Warehouse

ABSTRACT INTRODUCTION PROBLEM: TOO MUCH INFORMATION? math nrt scr. ID School Grade Gender Ethnicity read nrt scr

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

The Path To Treatment Pathways Tracee Vinson-Sorrentino, IMS Health, Plymouth Meeting, PA

Merge Processing and Alternate Table Lookup Techniques Prepared by

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

Quicker Than Merge? Kirby Cossey, Texas State Auditor s Office, Austin, Texas

Cody s Collection of Popular SAS Programming Tasks and How to Tackle Them

Untangling and Reformatting NT PerfMon Data to Load a UNIX SAS Database With a Software-Intelligent Data-Adaptive Application

The Ins and Outs of %IF

Document and Enhance Your SAS Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata. Louise S. Hadden. Abt Associates Inc.

Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables Andrea Shane MDRC, Oakland, CA

Keh-Dong Shiang, Department of Biostatistics & Department of Diabetes, City of Hope National Medical Center, Duarte, CA

Reducing SAS Dataset Merges with Data Driven Formats

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

PROC REPORT Basics: Getting Started with the Primary Statements

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

%ANYTL: A Versatile Table/Listing Macro

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

Visit Connect Full User Guide R3.15

The Dataset Diet How to transform short and fat into long and thin

PharmaSUG Paper CC11

Checking for Duplicates Wendi L. Wright

The Ugliest Data I ve Ever Met

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

Clinical Data Visualization using TIBCO Spotfire and SAS

zum.com Service Introduction

Baseline Candidate. Sex and Current Gender Input and Display. Quick Implementation Guide. Edition 1 20 th April 2010

A Lazy Programmer s Macro for Descriptive Statistics Tables

Automating the Production of Formatted Item Frequencies using Survey Metadata

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

Administering OLAP with SAS/Warehouse Administrator(TM)

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

IF there is a Better Way than IF-THEN

PROC SQL vs. DATA Step Processing. T Winand, Customer Success Technical Team

When ANY Function Will Just NOT Do

Quality Control of Clinical Data Listings with Proc Compare

Formats, Informats and How to Program with Them Ian Whitlock, Westat, Rockville, MD

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

PharmaSUG Paper TT11

Once the data warehouse is assembled, its customers will likely

footnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";

ERROR: The following columns were not found in the contributing table: vacation_allowed

Out of Control! A SAS Macro to Recalculate QC Statistics

Utilizing the VNAME SAS function in restructuring data files

Mira Shapiro, Analytic Designers LLC, Bethesda, MD

A General SAS Macro to Implement Optimal N:1 Propensity Score Matching Within a Maximum Radius

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD

Advanced Visualization using TIBCO Spotfire and SAS

T Sql In One Hour A Day Sams Teach Yourself

SAS Drug Development Program Portability

PharmaSUG Paper AD06

Developing Data-Driven SAS Programs Using Proc Contents

Obtaining the Patient Most Recent Time-stamped Measurements

Table Lookups: From IF-THEN to Key-Indexing

PROC SQL VS. DATA STEP PROCESSING JEFF SIMPSON SAS CUSTOMER LOYALTY

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

Transcription:

Let SAS Write and Execute Your Data-Driven SAS Code Kathy Hardis Fraeman, United BioSource Corporation, Bethesda, MD ABSTRACT Some SAS programming statements are based on specific values of data, such as the conditions given in IF THEN ; ELSE IF THEN ; blocks. If the values of the data are static and the number of data values is small, a SAS programmer can simply hard code the possible data values directly in the program. However, in instances where hundreds of data values need to appear in IF THEN ; ELSE IF THEN ; blocks, or if specific data values can change with different executions of the same program, then a SAS program can generate data-driven SAS code and immediately execute that code as part of the program, as long as the data values are in a SAS data set. This technique involves using DATA _NULL_ to generate the SAS code and then a %INCLUDE statement to immediately run the code SAS generated with DATA _NULL_. The programming example of this technique shows how SAS can automatically generate the programming code needed to turn a character variable with hundreds of different text values into a formatted numeric variable, where the labels of the attached format are the same as the original text. All of the programming in this example is done without ever having to write out any of the character variable s text values, or even having to know the number of unique text values. This general programming technique can be used in many other applications where SAS programming is dependent on actual values of data. INTRODUCTION A lot of SAS programming is data-driven. Data-driven means that actual values of data need to be included in the SAS code for the programming to work. A simple example of data-driven programming is the conversion of a text value of gender into a formatted numeric variable, as shown in the following example: proc format; value genfmt 1 = "Female" 0 = "Male"; data reformat_patient_vars; set original_patient_vars; format gender_cat genfmt.; label gender_cat = "Gender Category"; if gender = "Male" then gender_cat = 0; else if gender = "Female" then gender_cat = 1; The programmer needs to know that the actual data values for the variable GENDER in the data are Male and Female for the program to work, and these actual data values need to be hard coded in the program. In many instances, the number of actual data values of a variable is not very large, and these values are static, meaning they don t change with different versions of the data. Data driven programming is easy in these many instances. DATA DRIVEN PROGRAMMING BECOMES DIFFICULT However, a SAS programmer can run into problems if the variable need for data-driven programming has hundreds of values that need to be hard coded in the SAS program. These problems will be compounded if the hundreds of data values can change each time the program is run, due to factors such as different input data and/or changes in programming rules. Writing the SAS programming that includes the hundreds of constantly changing data values is practically impossible! 1

I was faced with this situation when using Electronic Medical Records (EMR) data to try to identify complex chemotherapy regimens given to cancer patients. We had each patient s individual drug administration data for the hundreds of possible individual chemotherapy drugs and the dates these drugs were administered. However, because chemotherapy regimens consist of combinations of different drugs administered concurrently and in patterns, we needed to define rules for determining what individual drugs were included in the multi-drug chemotherapy regimens. These hundreds of individual chemotherapy drugs, when combined in different combinations, could produce even more hundreds of types of combination regimens. Further complicating the programming was the need to use different programming rules to define what constituted a regimen. These different drug combinations changed every time the regimen programming rules changed. Each patient in the study needed to have a single defined initial chemotherapy regimen. All of my standard report macros require a formatted numeric variable as input, and I obviously couldn t have hundreds of lines of if text_regimen = " Cyclophosphamide/Doxorubicin" then reg_cat = 1; else if text_regimen =" Fludarabine/Mitoxantrone/Rituximab" then reg_cat = 2; else if... else if text_regimen =" Doxorubicin/Vincristine" then reg_cat = 237; especially if these hundreds of different text values of chemotherapy regimen could change, either based on our input data or programming rules used to define a regimen. IF THESE DATA VALUES ARE IN A SAS DATA SET, HAVE SAS WRITE THE CODE Fortunately these hundreds of unique derived regimen names are found in a text variable REGIMEN_TEXT_NAME in a SAS data set. This SAS data set was created by taking all of the unique values of TEXT_REGIMEN in the patient-level data and then renaming the variable to REGIMEN_TEXT_NAME. This SAS data set also has a variable with the sequential number REG_CAT_NO that was to be the value of the variable REG_CAT as in the example above, reflecting the sort order (not necessarily alphabetic) that the regimen names are to appear in the report. Because these data values are in a SAS data set, they don t need to be written out by the programmer in hundreds of IF THEN ; ELSE IF THEN ; statements. SAS can write out these hundreds statements and then execute them! If the actual values of the regimen name change for different runs of the program, SAS will always use the current values of the regimen names found in the patient-level data. Finally, as an added bonus, SAS can even automatically generate the format for the numeric categorical variable. And all of this can be done without ever needing to know or hard code any of the regimen names. WRITING SAS CODE WITH SAS The name of the SAS data set with the regimen names is REGIMENS, and the variables in the data set are REGIMEN_TEXT_NAME (text name of regimen) and REG_CAT_NO sequential regimen number, starting with 1. The SAS code to create a SAS program with the hundreds of IF THEN ; ELSE IF THEN ; statements is: /*********************************************************************************/ /* Create SAS statements to be %included in the program that create the new /* categorical variable to correspond with the chemotherapy regimen name variable /*********************************************************************************/ set REGIMENS; file "PATHNAME\regimens.sas"; if REG_CAT_NO = 1 then put @5 "if REGIMEN_TEXT = '" REGIMEN_TEXT_NAME else put @5 "else if REGIMEN_TEXT = '" REGIMEN_TEXT_NAME 2

The output data set named REGIMENS.SAS generated by the above code looks like if regimen_text = 'Cyclophosphamide/Doxorubicin' then reg_cat = 1 ; else if regimen_text = 'Fludarabine/Mitoxantrone/Rituximab' then reg_cat = 2 ; else if... else if regimen_text = Doxorubicin/Vincristine' then reg_cat = 237 ; RUNNING THE DATA-DRIVEN SAS CODE WITH SAS To then run the above code generated by SAS with SAS, the following code can be used: /**********************************************************/ /* Dynamically create a SAS formats for the regimen name /**********************************************************/ data regimens_fmt; set regimens; fmtname = "regfmt"; start = reg_cat; label = regimen_text; proc format cntlin = regimens_fmt; /*******************************************************************************/ /* Create the patient data set with the reformatted variables /***************************************************************/ data reformat_patient_vars; set original_patient_vars; format gender_cat genfmt.; label gender_cat = "Gender Category"; if gender = "Male" then gender_cat = 0; else if gender = "Female" then gender_cat = 1; format reg_cat regfmt.; label reg_cat = "Regimen Category"; %inc "PATHNAME\regimens.sas" / source2; The source2 option in the %include statement causes the included SAS statements to appear in the SAS log. 3

SINGLE QUOTES, DOUBLE QUOTES AND MACRO CODE If using this technique with any form of SAS macro code, then the use of single quotes and double quotes matters. If a macro variable needs to be evaluated when creating the output file, use double quotes as follows: %let pathname = c:\myfile; set regimens; file "&pathname\regimens.sas"; if REG_CAT_NO = 1 then put @5 "if REGIMEN_TEXT = '" REGIMEN_TEXT_NAME else put @5 "else if REGIMEN_TEXT = '" REGIMEN_TEXT_NAME If this technique is being used to create macro code and the macro code just needs to be output as text, use single quotes as follows: %let pathname = c:\myfile; set macro_call_data; file "&pathname\runmacro.sas"; put @5 '%mymacro(grpvar = &grpvar, colnum = &colnum, '; put @15 'rownum = ' combo_num ','; put @15 'rowvar = ' flag_var_name ','; put @15 'label = ' flag_var_label ');'; %inc "&pathname\runmacro.sas" / source2; CONCLUSION This general programming technique can be used in many other applications where SAS programming is dependent on actual values of data. This general technique involves using DATA _NULL_ to generate the SAS code into an output SAS program, and then a %INCLUDE statement to immediately run the code SAS generated with the DATA _NULL_ statement. This technique can be used with many applications, and the level of complexity of the dynamically generated SAS code can be much more complex than the IF THEN ; ELSE IF THEN ; block of SAS code 4

ACKNOWLEDGMENTS SAS and other SAS Institute, Inc. products or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies. CONTACT INFORMATION Please contact the author with any comments or questions: Kathy H. Fraeman United BioSource Corporation 7101 Wisconsin Avenue, Suite 600 Bethesda, MD 20832 (240) 235-2525 voice (301) 654-9864 fax kathy.fraeman@unitedbiosource.com 5