Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC

Similar documents
Are Your SAS Programs Running You?

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

All About SAS Dates. Marje Fecht Senior Partner, Prowerk Consulting. Copyright 2017 Prowerk Consulting

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

SAS Scalable Performance Data Server 4.3

ERROR: The following columns were not found in the contributing table: vacation_allowed

Getting the Right DATES

SAS System Powers Web Measurement Solution at U S WEST

I KNOW HOW TO PROGRAM IN SAS HOW DO I NAVIGATE SAS ENTERPRISE GUIDE?

What s New in SAS Studio?

Useful Tips from my Favorite SAS Resources

Base and Advance SAS

Extending the Scope of Custom Transformations

Posters. Workarounds for SASWare Ballot Items Jack Hamilton, First Health, West Sacramento, California USA. Paper

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Best Practice for Creation and Maintenance of a SAS Infrastructure

Useful Tips When Deploying SAS Code in a Production Environment

Paper SAS Managing Large Data with SAS Dynamic Cluster Table Transactions Guy Simpson, SAS Institute Inc., Cary, NC

Batch vs. Interactive: Why You Need Both Janet E. Stuelpner. ASG. Inc Cary. North Carolina

Make it a Date! Setting up a Master Date View in SAS

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components

PROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING

Interleaving a Dataset with Itself: How and Why

PREREQUISITES FOR EXAMPLES

Hello World! Getting Started with the SAS DS2 Language

Guide Users along Information Pathways and Surf through the Data

Using Metadata Queries To Build Row-Level Audit Reports in SAS Visual Analytics

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

STATION

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

SAS ENTERPRISE GUIDE USER INTERFACE

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

SAS Macro Programming for Beginners

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

Paper CT-16 Manage Hierarchical or Associated Data with the RETAIN Statement Alan R. Mann, Independent Consultant, Harpers Ferry, WV

Making the most of SAS Jobs in LSAF

Simple Rules to Remember When Working with Indexes

Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep

A Simple Time Series Macro Scott Hanson, SVP Risk Management, Bank of America, Calabasas, CA

Parallelizing Windows Operating System Services Job Flows

Demystifying PROC SQL Join Algorithms

Merging Data Eight Different Ways

Innovative Performance Improvements Through Automated Flowcharts In SAS

SAS Visual Analytics Environment Stood Up? Check! Data Automatically Loaded and Refreshed? Not Quite

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board

Bruce Gilsen, Federal Reserve Board

Different Methods for Accessing Non-SAS Data to Build and Incrementally Update That Data Warehouse

Using the SQL Editor. Overview CHAPTER 11

How to Create Data-Driven Lists

A Practical Introduction to SAS Data Integration Studio

Improving Your Relationship with SAS Enterprise Guide Jennifer Bjurstrom, SAS Institute Inc.

Program Validation: Logging the Log

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Programming Gems that are worth learning SQL for! Pamela L. Reading, Rho, Inc., Chapel Hill, NC

SAS Business Rules Manager 2.1

SCSUG-2017 SAS Grid Job Search Performance Piyush Singh, Ghiyasuddin Mohammed Faraz Khan, Prasoon Sangwan TATA Consultancy Services Ltd.

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

SAS/Warehouse Administrator Usage and Enhancements Terry Lewis, SAS Institute Inc., Cary, NC

ABSTRACT Have you been programming in SAS for a while and just aren t sure how Enterprise Guide can help you? It isn t just a pretty face!

Macros I Use Every Day (And You Can, Too!)

A Quick and Gentle Introduction to PROC SQL

Hidden in plain sight: my top ten underpublicized enhancements in SAS Versions 9.2 and 9.3

PharmaSUG Paper PO12

Introduction. Understanding SAS/ACCESS Descriptor Files. CHAPTER 3 Defining SAS/ACCESS Descriptor Files

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

Integration Best Practices: Net Change Design Patterns

Programming Beyond the Basics. Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell

The 'SKIP' Statement

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

SAS Business Rules Manager 1.2

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

SASe vs OB2 as a Relational DBMS for End Users: Three Corporations with Three Different Solutions Stephen C. Scott, Scott Consulting Services, Inc.

SAS Job Monitor 2.2. About SAS Job Monitor. Overview. SAS Job Monitor for SAS Data Integration Studio

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

How to Implement the One-Time Methodology Mark Tabladillo, Ph.D., MarkTab Consulting, Atlanta, GA Associate Faculty, University of Phoenix

Exploring DATA Step Merge and PROC SQL Join Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California

Table Lookups: From IF-THEN to Key-Indexing

50 WAYS TO MERGE YOUR DATA INSTALLMENT 1 Kristie Schuster, LabOne, Inc., Lenexa, Kansas Lori Sipe, LabOne, Inc., Lenexa, Kansas

An SQL Tutorial Some Random Tips

Can you decipher the code? If you can, maybe you can break it. Jay Iyengar, Data Systems Consultants LLC, Oak Brook, IL

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

SAS Performance Tuning Strategies and Techniques

capabilities and their overheads are therefore different.

APPENDIX 4 Migrating from QMF to SAS/ ASSIST Software. Each of these steps can be executed independently.

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Use mail merge to create and print letters and other documents

Make Your Life a Little Easier: A Collection of SAS Macro Utilities. Pete Lund, Northwest Crime and Social Research, Olympia, WA

Using Different Methods for Accessing Non-SAS Data to Build and Incrementally Update That Data Warehouse

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

Fly over, drill down, and explore

Tips and Tricks for Organizing and Administering Metadata

Transcription:

Paper CS-044 Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC ABSTRACT Most programs are written on a tight schedule, using the most accessible knowledge of the programmer. Often the programmer mistakenly assumes that the program will never be used again; five years later the spaghetti code is still in use. While the tasks are accomplished and the results are accurate, the program may not be as efficient as possible and subsequent submissions may require tedious and time-consuming input and modifications. This presentation looks at typical SAS Code, and then suggests changes to improve the efficiency and maintenance of the programs. If you are a programmer who has inherited code that was written many SAS versions ago, you will benefit from examples of updating your code to the current decade. This tutorial focuses on maintenance - free, efficient coding techniques so that you can spend your work time being more productive! Topics include: macro coding techniques efficient programming tips code reduction tricks maintenance-free programming suggestions. INTRODUCTION This paper looks at example programs written in haste, and suggests changes to improve the efficiency and ongoing usage of the programs. The authors acknowledge that programmers need to strike a balance to: minimize intervention with production jobs minimize runtime focus programming time on reusable code (not one-time queries). REMOVE INEFFICIENT AND WORDY CODING Many SAS programs contain wordy coding that can be reduced as you learn more about the SAS language. The wordiness often results in inefficiency as data are processed multiple times, unnecessarily. This paper provides better solutions for coding examples that: use multiple steps when only one step is needed use multiple programs when only one is needed take forever to run. TYPICAL EXAMPLE Suppose your task is to generate a report that requires you to subset a SAS data set and then sort the resulting data. A common solution to this task is to use a DATA step to subset the data followed by a PROC SORT step to reorder the subset. data compare; set report; if ProductCode in ('XER', 'REF', 'CRS') and Date gt '01MAY2007'd; keep ProductCode Usage Date Location; proc sort data=compare; by ProductCode Date; 1

PROBLEMS All observations in the REPORT data set are read but only specific ProductCode and Date values are required The data are read twice (by SET statement and the PROC) which could be very resource intensive if you work with lots of data. IMPROVED EXAMPLE An alternate solution is to subset the data as you reorder it in the PROC SORT step. In this solution, you use a WHERE statement in the PROC SORT step to subset the data, and a KEEP= data set option to choose the columns of interest. The OUT= option enables you to preserve the original table, and create a new table with the reduced data of interest. proc sort data=report (keep=productcode Usage Date Location) out=compare ; where ProductCode in ('XER','REF','CRS') and Date gt '01MAY2007'd ; by ProductCode Date; NEXT TYPICAL EXAMPLE WORKING WITH MULTIPLE DATA SOURCES NO COMMON KEYS Your data sources often do not have common join fields. If you are a DATA step purist then your program would require multiple steps to join the data and get the results. Consider the following invoicing example where the data required to generate invoices is in three SAS data sources that you need to join to get produce invoices : sug.orders PartNumber Quantity CustomerID sug.prices PartNumber UnitPrice sug.custome Company Address City State CustomerID proc sort data=sug.orders out=orders; by PartNumber; data step1; merge Orders(in=inOrders) sug.prices; by PartNumber; if inorders; proc sort data=step1; by CustomerID; proc sort data=sug.customer out=customer; by CustomerID; data step2; merge step1(in=instep1) Customer; by CustomerID; if instep1; InvoiceAmt=Quantity*UnitPrice; proc sort data=step2; by Company; proc print data=step2; var CustomerID Company Address City State PartNumber Quantity InvoiceAmt; PROBLEMS Not only is the coding wordy to write but the data are read and sorted multiple times. In the real world, this could be resource-intensive. 2

IMPROVED EXAMPLE Perhaps you have not considered the advantages of SQL?! There are times that SQL can provide an easier solution, especially since common join variables are not required. This example joins 3 tables in one step, without a sort requirement. Obviously, with large data files, indexes can improve the performance of the join. proc sql; select a.customerid, Company, Address, State, b.partnumber, Quantity, Quantity*UnitPrice as InvoiceAmt from sug.customers as a, sug.prices as b, sug.orders as c where a.customerid=c.customerid and b.partnumber=c.partnumber order by Company ; quit; REDUCE PROGRAMMER INTERVENTION When programs are written too specifically to the task, user-intervention is required on subsequent runs to: change dates and timeframes change titles and inclusion criteria. Sometimes these changes require scanning 1000 s of lines of code. We know we have inherited that task on numerous occasions. Macro variables, SAS functions, and other coding tricks can be used to generalize a program making it more datadriven. For example, macro variables enable you to define values for parameters used throughout a program functions provide system date and time information for use in labeling, subsetting, etc. Macros provide decision tools that can determine which programming segments to execute based on specified criteria. TYPICAL EXAMPLE Suppose your task is to run a daily report that compares month-to-date revenue for the current month to revenue for the previous month. In the solution below, the values for the current month and year, and the values for month and year for the previous month are hard coded in the program. Also, three DATA steps are used to create the desired data set for the report. The first DATA step selects the data for the current month, the second DATA step selects the data for the previous month, and the third DATA step concatenates the two data sets created by the first two DATA steps. data currentmonth; if month(date) = 5 and year(date) = 2007; MonthYear = " 5/2007"; data lastmonth; if month(date) = 4 and year(date) = 2007; MonthYear = " 4/2007"; data comparemonths; set lastmonth currentmonth; proc means data=comparemonths; class MonthYear; var Revenue; title "MTD Revenue vs Last Month"; 3

PROBLEMS The solution above requires you to change the code each month and is inefficient because three DATA steps are used when only one DATA step is needed. This means that the data are read multiple times, which is resource intensive if you are processing a lot of data. An alternate solution is to use DATA step functions to eliminate the need to alter the code each month, and use the OUTPUT statement to build the final data set in one DATA step. BETTER EXAMPLE data comparemonths; if month(date)=month(today()) and year(date)=year(today()) then do; MonthYear= put(today(), mmyys7.); output; else do; lastmonth=intnx('month',today(), -1); if month(date)=month(lastmonth)and year(date)=year(lastmonth) then do; MonthYear=put(lastmonth,mmyyS7.); output; proc means data=comparemonths; class MonthYear; var Revenue; title "MTD Revenue vs Last Month"; TIPS The S in the mmyys7. format requests a SLASH to separate the month and year. If you are not familiar with this specification, have a look at SAS online documention! Many date formats enable specification of delimiters. The PUT function enables easy conversion from the numeric date to the desired character representation. EVEN BETTER EXAMPLE If the job ran across midnight, the today()function would return multiple dates as the program progressed. To avoid issues with accuracy and to reduce function calls, create a macro variable at the top of the program to determine today s date (ie: when the program BEGAN execution). Then, one time define the constants that will be used while processing the data. %let DateToday = %sysfunc(today()); ** Date at time program starts; data comparemonths; *** create static values first time through the DATA step; retain lastmonth mon_today yr_today MonthYear_today mon_lastmonth yr_lastmonth MonthYear_lastmonth ; if _n_ = 1 then do; *** current month; mon_today = month(&datetoday); yr_today = year(&datetoday); MonthYear_today = put(&datetoday, mmyys7.); *** last month; lastmonth=intnx('month',&datetoday, -1); mon_lastmonth = month(lastmonth); yr_lastmonth = year(lastmonth); MonthYear_lastmonth =put(lastmonth,mmyys7.); *** Process Current Month Revenue; if month(date)= mon_today and year(date)= yr_today then do; MonthYear= MonthYear_today; 4

output; *** Process Last Month Revenue; else do; if month(date)= mon_lastmonth and year(date)= yr_lastmonth then do; MonthYear = MonthYear_lastmonth; output; proc means data=comparemonths; class MonthYear; var Revenue; title "MTD Revenue vs Last Month"; CONSIDER TWO FINAL IMPROVEMENTS 1. Since there may be an enormous amount of data, a WHERE clause on the SET statement would reduce the amount of data read. 2. It is likely that the 7 variables created at the top of the DATA step would be needed / used elsewhere in the program. All of these values could be created at the top of the program with %SYSFUNC and stored in macro variables. THINK PRODUCTION If you work in an environment where you d like to write the code once and then apply it to a broad spectrum of business problems, you will love this! Suppose you use the same logic and reporting formats on a regular basis for each new product / campaign / region. The programs are the same except for changes to: Dates Inclusion / exclusion criteria Title and footnote text Search logic Output locations Etc. Your current approach is to hunt (everywhere!) for the most recent version(s) you can find of the query / reporting program and then after you FINALLY find the file, you copy the code and change the name. Next you have to: Step through the code to locate all of the required changes Start typing to input the new information Run the code and OOPS fix the semicolons and quotes you inadvertently overtyped Try again APPROACH Store a single dated copy of each module (segment of source code) Create a driver program that provides macro variable input for the current scenario and then calls the appropriate modules. 5

EXAMPLE When you are ready to start reporting for a new initiative, just provide the parameter values as input to your standard programs. *** Driver Program specify appropriate values for parameters; %let prestart = 01OCT2006; *** Pre campaign period for comparison; %let prestop = 31DEC2006; *** End of Pre campaign period; %let poststart = 01JAN2007; *** start campaign tracking; %let poststop = 30JUN2007; *** expire date - stop tracking; %let title = "January 2007 Introduction of Low Interest Rates"; %let products = ('VRS', 'GQL', 'BGO', 'DLA'); %let codes = ('015', '119', '214'); *** run standard reporting programs; %include 'CampaignReportingExtract_2007_01_10.sas'; %include 'CampaignReportingOutput_2007_01_13.sas'; IMPROVED EXAMPLE The above approach works great, until your source code changes. Then, you need to find all drivers that are calling the source, and you need to change the date stamp in the %include. Not fun! To avoid changing the version in 100 s of drivers, use a placeholder reference in your drivers so that the most current source is always called. Create a program that references the latest source version: Contents of CampaignReportingExtract_CurrentPgm.sas *** call latest version of source program; %include 'CampaignReportingExtract_2007_01_10.sas'; Revised contents of driver program *** Driver Program specify appropriate values for parameters; %let prestart = 01OCT2006; *** Pre campaign period for comparison; %let prestop = 31DEC2006; *** End of Pre campaign period; %let poststart = 01JAN2007; *** start campaign tracking; %let poststop = 30JUN2007; *** expire date - stop tracking; %let title = "January 2007 Introduction of Low Interest Rates"; %let products = ('VRS', 'GQL', 'BGO', 'DLA'); %let codes = ('015', '119', '214'); *** run standard reporting programs; %include 'CampaignReportingExtract_CurrentPgm.sas'; %include 'CampaignReportingOutput_CurrentPgm.sas'; TESTING TIPS When you improve a program using some of the techniques suggested above, it is prudent to compare the results to insure they are consistent and correct. Scanning output for similarity is often the approach taken, but it is easy to overlook discrepancies. Testing the equality of intermediate and final tables from the two solutions is a more accurate approach. PROC COMPARE enables comparison of entire tables, columns, or subsets of data. While many options are available, a very simple comparison of two tables is accomplished using: proc compare data=sales compare=sales2; 6

CONCLUSION As you learn more about SAS, you will find that there are numerous features that enable you to accomplish tasks easily and efficiently. The simple techniques provided in this paper should help you identify and correct a few inefficiencies and wordiness in your coding. As shown in the first two examples above, a common but inefficient practice used by many SAS users is to process data more times than is necessary. Additional examples showing code reduction techniques are provided in the one hour presentation. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Marje Fecht Email: marje.fecht@prowerk.com Larry Stewart Email: larry.stewart@sas.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7