Demystifying Inherited Programs

Similar documents
An Introduction to SAS Macros

Base and Advance SAS

An Introduction to Macros Deb Cassidy

AN INTRODUCTION TO MACRO VARIABLES AND MACRO PROGRAMS Mike Zdeb, School of Public Health

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

PLA YING WITH MACROS: TAKE THE WORK OUT OF LEARNING TO DO MACROS. Arthur L. Carpenter

Making an RTF file Out of a Text File, With SAS Paper CC13

Program Validation: Logging the Log

A Macro To Generate a Study Report Hany Aboutaleb, Biogen Idec, Cambridge, MA

3. Almost always use system options options compress =yes nocenter; /* mostly use */ options ps=9999 ls=200;

Chapter 2: Getting Data Into SAS

Debugging. Where to start? John Ladds, SAS Technology Center, Statistics Canada.

PDF Multi-Level Bookmarks via SAS

Make Your Life a Little Easier: A Collection of SAS Macro Utilities. Pete Lund, Northwest Crime and Social Research, Olympia, WA

To understand the concept of candidate and primary keys and their application in table creation.

Macros are a block of code that can be executed/called on demand

An exercise in separating client-specific parameters from your program

Write SAS Code to Generate Another SAS Program A Dynamic Way to Get Your Data into SAS

Using an ICPSR set-up file to create a SAS dataset

Sandra Hendren Health Data Institute

e. That's accomplished with:

Intermediate SAS: Working with Data

Seminar Series: CTSI Presents

Guidelines for Coding of SAS Programs Thomas J. Winn, Jr. Texas State Auditor s Office

Using the CLP Procedure to solve the agent-district assignment problem

SAS CURRICULUM. BASE SAS Introduction

Abstract. Background. Summary of method. Using SAS to determine file and space usage in UNIX. Title: Mike Montgomery [MIS Manager, MTN (South Africa)]

22S:172. Duplicates. may need to check for either duplicate ID codes or duplicate observations duplicate observations should just be eliminated

SAS 101. Based on Learning SAS by Example: A Programmer s Guide Chapter 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

Run your reports through that last loop to standardize the presentation attributes

Advanced Macro Topics Steven First, Systems Seminar Consultants, Madison, WI

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

Leave Your Bad Code Behind: 50 Ways to Make Your SAS Code Execute More Efficiently.

A Macro that can Search and Replace String in your SAS Programs

Unlock SAS Code Automation with the Power of Macros

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

SUGI 29 Data Warehousing, Management and Quality

ECLT 5810 SAS Programming - Introduction

Paper PO06. Building Dynamic Informats and Formats

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

1 Files to download. 3 Macro to list the highest and lowest N data values. 2 Reading in the example data file

How to use UNIX commands in SAS code to read SAS logs

Don't quote me. Almost having fun with quotes and macros. By Mathieu Gaouette

Going Under the Hood: How Does the Macro Processor Really Work?

Importing CSV Data to All Character Variables Arthur L. Carpenter California Occidental Consultants, Anchorage, AK

2. Don t forget semicolons and RUN statements The two most common programming errors.

MARK CARPENTER, Ph.D.

Normalization in Databases

A Macro to Keep Titles and Footnotes in One Place

My SAS Grid Scheduler

DSCI 325: Handout 15 Introduction to SAS Macro Programming Spring 2017

Poster Frequencies of a Multiple Mention Question

&&&, ;;, and Other Hieroglyphics Advanced Macro Topics Chris Yindra, C. Y. Training Associates

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

A Cross-national Comparison Using Stacked Data

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Part 1. Getting Started. Chapter 1 Creating a Simple Report 3. Chapter 2 PROC REPORT: An Introduction 13. Chapter 3 Creating Breaks 57

Your Own SAS Macros Are as Powerful as You Are Ingenious

CS430 Final March 14, 2005

HAVE YOU EVER WISHED THAT YOU DO NOT NEED TO TYPE OR CHANGE REPORT NUMBERS AND TITLES IN YOUR SAS PROGRAMS?

Moving a Large SAS System Into an MVS Production Environment The Trials and Tribulations

Internet, Intranets, and The Web

Electricity Forecasting Full Circle

PharmaSUG Paper TT11

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

DATA Step Debugger APPENDIX 3

SAS Tricks and Techniques From the SNUG Committee. Bhupendra Pant University of Western Sydney College

ABSTRACT INTRODUCTION THE GENERAL FORM AND SIMPLE CODE

A Tutorial on the SAS Macro Language

Template Versatility Using SAS Macro Language to Generate Dynamic RTF Reports Patrick Leon, MPH

capabilities and their overheads are therefore different.

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

Reading in Data Directly from Microsoft Word Questionnaire Forms

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

An Introduction to SAS Macros Steven First, Systems Seminar Consultants, Madison, WI

Utilizing SAS for Cross- Report Verification in a Clinical Trials Setting

Creating and Executing Stored Compiled DATA Step Programs

The INPUT Statement: Where

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

A Time Saver for All: A SAS Toolbox Philip Jou, Baylor University, Waco, TX

Posters. Paper

Symbol Table Generator (New and Improved) Jim Johnson, JKL Consulting, North Wales, PA

The %let is a Macro command, which sets a macro variable to the value specified.

Paper SE04 Dynamic SAS Programming Techniques, or How NOT to Create Job Security Steven Beakley and Suzanne McCoy

Introduction OR CARDS. INPUT DATA step OUTPUT DATA 8-1

Contents. Overview How SAS processes programs Compilation phase Execution phase Debugging a DATA step Testing your programs

Implementing external file processing with no record delimiter via a metadata-driven approach

Dictionary.coumns is your friend while appending or moving data

22S:166. Checking Values of Numeric Variables

Name: Sample Answers Q.2(c) 7 7

Powerful SAS R Techniques from the Windows API Samuel T. Croker

Tired of CALL EXECUTE? Try DOSUBL

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

SAS/ACCESS 9.2. DATA Step Interface to CA-IDMS Reference. SAS Documentation

Chapter 1 The DATA Step

How to Think Through the SAS DATA Step

Data Step Hieroglyphics Harry Droogendyk, Stratia Consulting Inc., Lynden, Ontario

ES I INFORMATICS PRACTICES Set I Class XII Sec- A,B,C Date: April 2017 Time: 1 hr 10 min M.M.: 30

Transcription:

Demystifying Inherited Programs Have you ever inherited a program that leaves you scratching your head trying to figure it out? If you have not, chances are that some time in the future you will. While initially the program may work and not require any changes, this may not always be the case. At some point the program may require maintenance due to new requirements, system upgrades, data issues, etc. At that time it will be necessary to figure out what is really happening so you can modify the program. You have inherited the program provided in Figure 1. After an initial review of the program you have determined there is no documentation and while the code executes, it is not very readable. Figure 1: options nosource ls=80; %macro macro1; data z (keep=deptnum full); length full $16; set y; full=fname!! lname; format deptnum $deptfmt.; label full='employee name'; label deptnum='department name'; label empid='employee id'; %Mend; filename deptnam 'c:\temp\deptname.txt' ; filename fmtpgm 'c:\temp\deptfmt.sas'; Data x; infile deptnam end=eof pad; input @1 deptno $3. @5 deptname $15.; file fmtpgm; if _n_ = 1 then put "proc format; value $deptfmt"; put "'" deptno "' = '" deptname "'"; if eof then put "; "; %include fmtpgm; data y; input deptnum $ empid $ fname $ lname $; datalines; 101 1001 steve last 102 1002 fred derf 103 1003 mike ekim 104 1004 mary gold ;

%macro1; proc print label; title 'original report'; You decide to execute the program to see if the resulting log (Figure 2) and output (Figure 3) will help in deciphering the process. Figure 2: 1 options nosource ls=80; NOTE: The infile DEPTNAM is: FILENAME=c:\temp\deptname.txt, RECFM=V,LRECL=256 NOTE: The file FMTPGM is: FILENAME=c:\temp\deptfmt.sas, RECFM=V,LRECL=256 NOTE: 4 records were read from the infile DEPTNAM. The minimum record length was 11. The maximum record length was 14. NOTE: 6 records were written to the file FMTPGM. The minimum record length was 6. The maximum record length was 27. NOTE: The data set WORK.X has 4 observations and 2 variables. NOTE: The DATA statement used 0.28 seconds. NOTE: %INCLUDE (level 1) file FMTPGM is file c:\temp\deptfmt.sas. NOTE: Format $DEPTFMT has been output. NOTE: The PROCEDURE FORMAT used 0.33 seconds. NOTE: %INCLUDE (level 1) ending. NOTE: The data set WORK.Y has 4 observations and 4 variables. NOTE: The DATA statement used 0.17 seconds. NOTE: The data set WORK.Z has 4 observations and 2 variables. NOTE: The DATA statement used 0.17 seconds. NOTE: The PROCEDURE PRINT used 0.16 seconds. Figure 3: original report Employee Department OBS name name 1 steve last Purchasing 2 fred derf Payroll 3 mike ekim Personnel 4 mary gold Shipping

The original SAS log is very hard to decipher. We are left with the following questions: 1. How are the data sets X, Y, and Z created? 2. What data is within each of these data sets? 3. What code is being executed from the included file FMTPGM? At this point there are several steps we can take to make the code more readable and to begin to demystify the program: 1. Add RUN statements to explicitly see the end of SAS steps. 2. Indent statements within each of the SAS steps. 3. Turn on all source OPTIONS such as SOURCE, SOURCE2, MPRINT, and SYMBOLGEN to show more information in the SAS log which will be useful for tracing logic. 4. Draw a rough data flow, or program flowchart, to help track where data comes from, and where it goes. 5. Examine each step from the top down, noting data sources and transformations. 6. Ask questions. Even if the original programmer is no longer around someone may know about the code. We need to see what is really happening during the program execution. To accomplish this we turn on the options SOURCE, SOURCE2, MPRINT, and SYMBOLGEN and run the program again. The resulting log is provided in Figure 4. Figure 4: 3 options source source2 mprint symbolgen ls=80; 4 %macro macro1; 5 data z (keep=deptnum full); 6 length full $16; 7 set y; 8 full=fname!! lname; 9 format deptnum $deptfmt.; 10 label full='employee name'; 11 label deptnum='department name'; 12 label empid='employee id'; 13 %Mend; 14 filename deptnam 'c:\temp\deptname.txt' ; 15 filename fmtpgm 'c:\temp\deptfmt.sas'; 16 Data x; 17 infile deptnam end=eof pad; 18 input @1 deptno $3. @5 deptname $15.; 19 file fmtpgm; 20 if _n_ = 1 then 21 put "proc format; value $deptfmt"; 22 put "'" deptno "' = '" deptname "'"; 23 if eof then put "; ";

NOTE: The infile DEPTNAM is: FILENAME=c:\temp\deptname.txt, RECFM=V,LRECL=256 NOTE: The file FMTPGM is: FILENAME=c:\temp\deptfmt.sas, RECFM=V,LRECL=256 NOTE: 4 records were read from the infile DEPTNAM. The minimum record length was 11. The maximum record length was 14. NOTE: 6 records were written to the file FMTPGM. The minimum record length was 6. The maximum record length was 27. NOTE: The data set WORK.X has 4 observations and 2 variables. NOTE: The DATA statement used 0.27 seconds. 24 %include fmtpgm; NOTE: %INCLUDE (level 1) file FMTPGM is file c:\temp\deptfmt.sas. 25 +proc format; 26 +value $deptfmt 27 +'101 ' = 'Purchasing ' 28 +'102 ' = 'Payroll ' 29 +'103 ' = 'Personnel ' 30 +'104 ' = 'Shipping ' 31 +; NOTE: Format $DEPTFMT has been output. 32 + NOTE: The PROCEDURE FORMAT used 0.27 seconds. NOTE: %INCLUDE (level 1) ending. 33 data y; 34 input deptnum $ empid $ fname $ lname $; 35 datalines; NOTE: The data set WORK.Y has 4 observations and 4 variables. NOTE: The DATA statement used 0.17 seconds. 36 ; 37 38 %macro1; MPRINT(MACRO1): DATA Z (KEEP=DEPTNUM FULL); MPRINT(MACRO1): LENGTH FULL $16; MPRINT(MACRO1): SET Y; MPRINT(MACRO1): FULL=FNAME!! LNAME; MPRINT(MACRO1): FORMAT DEPTNUM $DEPTFMT.; MPRINT(MACRO1): LABEL FULL= 'Employee name'; MPRINT(MACRO1): LABEL DEPTNUM= 'Department name'; MPRINT(MACRO1): LABEL EMPID= 'employee id'; NOTE: The data set WORK.Z has 4 observations and 2 variables. NOTE: The DATA statement used 0.17 seconds.

47 proc print label; 48 title 'original report'; 49 NOTE: The PROCEDURE PRINT used 0.11 seconds. Now comes our challenge. We have been asked to make the following changes: Add a new department 105 (Receiving) Add a new employee Penn Teller Show the employee id on the report We determine that to add a department we must add a record to the file deptname.txt. We also determine that to add an employee we must add a record to the instream data used to build the data set Y. But, how do we add employee id to the report? By looking through the more complete log created with the options, several things become apparent. We see the data is manipulated in a macro and three data sets, X, Y, and Z, are created. We determine that the data from data set Z is output in the report. We also see the data set only keeps the department and name variables. To include the employee id on the report we must keep it on the data set. As we noticed, the data sets do not have meaningful names. This makes it more difficult to trace the data through the process. The program also does not contain any comments or documentation explaining the program. We can take the following steps to further clean up the program: 1. Add comments to provide an overview of the program and identify data inputs and outputs. 2. Add comments describing each step. 3. Keep a log of modifications. 4. Use meaningful data set and variable names. 5. Explicitly name all data sets when used. 6. Use clean coding styles, such as indentation, for readability. We have made the necessary changes, given the data sets meaningful names, cleaned up the code, and included comments. The revised program is provided in Figure 5. Figure 5: /******************************************************************/ /* PROJECT: Report Department Employees */ /* PROGRAM: c:\programs\dept_emp_list.sas */ /* DESCRIPTION: */ /* Create list of all employees within each department. */ /* INPUT: */ /* FILEREF DSN TYPE */

/* -------- --------------------------------------- ------- */ /* deptnam c:\temp\deptname.txt TEXT */ /* WRITTEN BY: SYSTEMS SEMINAR CONSULTANTS */ /* MADISON, WI */ /* (608) 278-9964 */ /******************************************************************/ /* CHANGES: */ /* DATE BY CHANGE */ /* 06/15/2012 Teresa Added receiving department and employee */ /* Penn Teller, and included employee id on */ /* the final report. */ /******************************************************************/ options source source2 mprint symbolgen linesize=80; /*******************************************************/ /* start macro1, this reads dataset empnam and creates */ /* the fullnam dataset for the report. */ /*******************************************************/ %macro macro1; data fullnam (keep=deptnum empid full); length full $16; set empnam; /* create full name from first and last */ /* trimming extra blanks after first name */ full = trim(fname)!! ' '!! lname; /* assign the format we created */ format deptnum $deptfmt.; /* assign labels to name, deptnumber, and empid */ label full='employee name'; label deptnum='department name'; label empid='employee id'; %Mend macro1; /* assign file with dept numbers/names for format */ filename deptnam 'c:\temp\deptname2.txt' ; /* assign file to write out proc format code */ filename fmtpgm 'c:\temp\deptfmt.sas'; /********************************************************/

/* create the code for the department name format */ /* this is the file with the department number and name */ /* we use the end of file option (end=) to allow */ /* writing last line of code in proc format, and the */ /* pad to prevent problems reading input lines that are */ /* too short. */ /********************************************************/ Data _null_; infile deptnam end=eof pad lrecl=50; /* read in a record */ input @1 deptno $3. @5 deptname $15. ; /* send output from put statements to file */ file fmtpgm; /* if we are processing the first record, write */ /* out the SAS code for the proc format first */ if _n_ = 1 then put "proc format; value $deptfmt"; /* for each record on the department name file, */ /* write out a record for the values in the */ /* format. This builds a line looking like: */ /* '1001' = 'Purchasing' */ xline = "'"!! deptno!! "' = '"!! deptname!! "'"; put xline; /* if we have processed the last record then write */ /* out the ending statements for the proc format */ if eof then put "; "; /* include and run the proc format step in file fmtpgm */ %include fmtpgm; /* step to read employee data and create dataset empnam */ data empnam; input deptnum $ empid $ fname $ lname $; datalines; 101 1001 steve last 102 1002 fred derf 103 1003 mike ekim 104 1004 mary gold

105 1005 penn teller ; /* macro reads empnam and creates fullnam */ /* and assigns the format we created */ %macro1; /* this step prints our report */ proc print data=fullnam label; title 'revised report'; The new results are provided in Figure 6. Figure 6: revised report Employee Department employee OBS name name id 1 steve last Purchasing 1001 2 fred derf Payroll 1002 3 mike ekim Personnel 1003 4 mary gold Shipping 1004 5 penn teller Receiving 1005 Besides allowing us to complete the requested changes we have hopefully helped anyone else who will read this program in the future. An important thing to remember is to assume every program you write or change will be used in the future. Whether it is you or someone else, clean code and documentation are invaluable to the next person who needs to work with the program.