Wonderful Unix SAS Scripts: SAS Quickies for Dummies Raoul Bernal, Amgen Inc., Thousand Oaks, CA

Similar documents
SAS Display Manager Windows. For Windows

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Quality Control of Clinical Data Listings with Proc Compare

SAS Training Spring 2006

Paper A Simplified and Efficient Way to Map Variable Attributes of a Clinical Data Warehouse

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

SAS Studio: A New Way to Program in SAS

SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015 Module 2

Customizing Your SAS Session

Run your reports through that last loop to standardize the presentation attributes

A Linux Shell Script to Automatically Compare Results of Statistical Analyses

Epidemiology Principles of Biostatistics Chapter 3. Introduction to SAS. John Koval

SCSUG-2017 SAS Grid Job Search Performance Piyush Singh, Ghiyasuddin Mohammed Faraz Khan, Prasoon Sangwan TATA Consultancy Services Ltd.

How Managers and Executives Can Leverage SAS Enterprise Guide

CS 3410 Intro to Unix, shell commands, etc... (slides from Hussam Abu-Libdeh and David Slater)

CSCI 211 UNIX Lab. Shell Programming. Dr. Jiang Li. Jiang Li, Ph.D. Department of Computer Science

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

Essential Unix and Linux! Perl for Bioinformatics, ! F. Pineda

One Project, Two Teams: The Unblind Leading the Blind

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

Going Under the Hood: How Does the Macro Processor Really Work?

Topic 2: More Shell Skills

EXAMPLE 2: INTRODUCTION TO SAS AND SOME NOTES ON HOUSEKEEPING PART II - MATCHING DATA FROM RESPONDENTS AT 2 WAVES INTO WIDE FORMAT

A Practical Introduction to SAS Data Integration Studio

Why SAS Programmers Should Learn Python Too

1. What statistic did the wc -l command show? (do man wc to get the answer) A. The number of bytes B. The number of lines C. The number of words

Then, we modify the program list.1 to list two files.

Perl and R Scripting for Biologists

SAS Macros of Performing Look-Ahead and Look-Back Reads

Essentials for Scientific Computing: Bash Shell Scripting Day 3

Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI

PharmaSUG Paper PO10

UNIX Essentials Featuring Solaris 10 Op System

Topic 2: More Shell Skills. Sub-Topic 1: Quoting. Sub-Topic 2: Shell Variables. Difference Between Single & Double Quotes

Statistics, Data Analysis & Econometrics

Can you decipher the code? If you can, maybe you can break it. Jay Iyengar, Data Systems Consultants LLC, Oak Brook, IL

A Cross-national Comparison Using Stacked Data

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

3. Almost always use system options options compress =yes nocenter; /* mostly use */ options ps=9999 ls=200;

Copy That! Using SAS to Create Directories and Duplicate Files

Sub-Topic 1: Quoting. Topic 2: More Shell Skills. Sub-Topic 2: Shell Variables. Referring to Shell Variables: More

Chapter-3. Introduction to Unix: Fundamental Commands

Essential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

The Submission Data File System Automating the Creation of CDISC SDTM and ADaM Datasets

Getting Up to Speed with PROC REPORT Kimberly LeBouton, K.J.L. Computing, Rossmoor, CA

Text Generational Data Sets (Text GDS)

PharmaSUG Paper AD06

Efficiently Join a SAS Data Set with External Database Tables

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

CS 4218 Software Testing and Debugging Ack: Tan Shin Hwei for project description formulation

Bash command shell language interpreter

Using SAS Macros to Extract P-values from PROC FREQ

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp.

Using a Fillable PDF together with SAS for Questionnaire Data Donald Evans, US Department of the Treasury

A Guided Tour Through the SAS Windowing Environment Casey Cantrell, Clarion Consulting, Los Angeles, CA

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

Paper ###-YYYY. SAS Enterprise Guide: A Revolutionary Tool! Jennifer First, Systems Seminar Consultants, Madison, WI

An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine Russ Lavery, Ardmore PA, Independent Contractor Ian Whitlock, Kennett Square PA

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

Essential Skills for Bioinformatics: Unix/Linux

Statements with the Same Function in Multiple Procedures

A Few Quick and Efficient Ways to Compare Data

SESUG 2014 IT-82 SAS-Enterprise Guide for Institutional Research and Other Data Scientists Claudia W. McCann, East Carolina University.

Intermediate SAS: Working with Data

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

CS 307: UNIX PROGRAMMING ENVIRONMENT FIND COMMAND

Shell scripting and system variables. HORT Lecture 5 Instructor: Kranthi Varala

ABSTRACT DATA CLARIFCIATION FORM TRACKING ORACLE TABLE INTRODUCTION REVIEW QUALITY CHECKS

Set 1 MCQ Which command is used to sort the lines of data in a file in reverse order A) sort B) sh C) st D) sort -r

SAS/Warehouse Administrator Usage and Enhancements Terry Lewis, SAS Institute Inc., Cary, NC

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

Introduction to the UNIX command line

Keeping Track of Database Changes During Database Lock

Stat Wk 3. Stat 342 Notes. Week 3, Page 1 / 71

Permission Program. Support for Version 6 Only. Allowing SAS/SHARE Client Access to SAS Libraries or Files CHAPTER 40

Patricia Guldin, Merck & Co., Inc., Kenilworth, NJ USA

Bash Programming. Student Workbook

Routing Output. Producing Output with SAS Software CHAPTER 6

Working with Big Data in SAS

Introduction to Linux. Woo-Yeong Jeong Computer Systems Laboratory Sungkyunkwan University

Introduction to Linux

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

While You Were Sleeping, SAS Was Hard At Work Andrea Wainwright-Zimmerman, Capital One Financial, Inc., Richmond, VA

What Do You Mean My CSV Doesn t Match My SAS Dataset?

CS214 Advanced UNIX Lecture 4

PROGRAMMING PROJECT ONE DEVELOPING A SHELL

Pros and Cons of Interactive SAS Mode vs. Batch Mode Irina Walsh, ClinOps, LLC, San Francisco, CA

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

M2PGER FORTRAN programming. General introduction. Virginie DURAND and Jean VIRIEUX 10/13/2013 M2PGER - ALGORITHME SCIENTIFIQUE

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

Creating a Shell or Command Interperter Program CSCI411 Lab

Advanced Unix Programming Module 03 Raju Alluri spurthi.com

Useful Tips When Deploying SAS Code in a Production Environment

The DATA Statement: Efficiency Techniques

PAPER CC02 Link the Unlinked : Documenting Dependencies Among SAS Programs in a Project Abstract within across related Context and Clarification

Routing the SAS Log and SAS Procedure Output

Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables Andrea Shane MDRC, Oakland, CA

PharmaSUG Paper IB11

Transcription:

PharmaSUG2010 - Paper CC25 Wonderful Unix SAS Scripts: SAS Quickies for Dummies Raoul Bernal, Amgen Inc., Thousand Oaks, CA ABSTRACT Repeated simple SAS jobs can be executed online and in real-time to avoid having to directly execute SAS command files, producing immediate results directly into any Unix display window. The trick is to simulate SAS using Unix scripts, which can be executed just like any Unix command with minimum of one or two parameters. One advantage of making such SAS Unix scripts available is the quick access to SAS without having to learn actual SAS syntax. This reduces the number of exploratory data requests from non-regular SAS users, thereby empowering them to get quickie exploratory results without having to ask a SAS practitioner. Just like any Unix command, the results can be directed to an output file (other than the default output screen) for reporting. INTRODUCTION Imagine getting ad hoc or last minute request from your supervisor to produce a quick report to generate database records with specific criteria, or to generate a count of subjects meeting certain criteria, etc. Imagine if all you have to do to fulfill this type of request is to type any of the following: pp dataset condition pf dataset "catvar1*catvar2" condition pu dataset contvar1 condition pc dataset One way to minimize ad hoc requests is to empower non-sas-trained colleagues and co-workers to use tailored Unix shell scripts to generate their own reports for their simple queries. The simpler the query, the more applicable the Unix SAS shell script could be to address the query and provide the needed statistics. This paper will demonstrate the power and ease behind Unix shell scripts for SAS and provide examples of applications using the simple scripts pp, pf, pu, and pc for SAS procedures PROC PRINT, PROC FREQ and PROC UNIVARIATE and PROC CONTENTS, respectively. By the end of the presentation, one should be able to create simple Unix shell scripts for SAS and realize their relative ease and time-saving features. CREATING A SIMPLE UNIX SCRIPT A simple Unix script is composed of a simple command executable in the Unix operating system, with optional parameters. An example script would be one that lists all files in a current directory, by time: ls last *.* Instead of typing this command, a simple executable script that contains this line, say, fl (for file list) can be saved in utilities folder so that typing fl essentially produces the same results as ls -last *.*.. CREATING SIMPLE SAS UNIX SCRIPTS Creating a SAS Unix script would be more complicated than the simplest Unix command because we need to specify: a) the SAS procedure, b) the dataset to apply the procedure to, and optionally, c) any condition and option to apply to the dataset. Moreover, we need to create a temporary behind-the-scenes SAS job to execute in the background after executing the SAS Unix script in the command line. The objective is to create a SAS Unix script, which, when executed with required dataset, condition and optional parameters, would be equivalent to executing a SAS job using the same SAS procedure. Examples of SAS Unix scripts will be discussed in detail. A. SAS UNIX SCRIPT pp Command syntax: pp dataset condition 1

This script pp (short for proc print), when executed with parameters dataset and condition, would be equivalent to the execution of the following PROC PRINT code in SAS: proc print data=dataset width=min; where (condition); run; As an example, printing all the patient records in cohort A1 in the demo.sas7bdat dataset would simply require the following command: pp demo cohort= A1 which yields the following result: Figure 1. Results generated after executing pp example. The straightforward Unix script for pp would be: #!/bin/csh -f # pp - prints out all or part of a UNIX dataset using PROC PRINT # The first argument is the dataset set dataset="$1" # The second argument is for setting condition to subset the data set condition="$2" # Find the dataset file set pwd=`pwd` if (-f $dataset.sas7bdat == 1) then set sasdir = "$pwd" echo "Dataset: $dataset.sas7bdat not found. Exiting... " # Begin constructing the SAS program in the temporary directory /tmp echo "%*include 'init.sas' ;" >! /tmp/pp$$.sas echo "libname sasdata '$sasdir' ;" # Continue constructing the program echo "options nodate nocenter nofmterr pagesize=74 linesize=132 ;" echo "proc printto file='/tmp/pp$$.lis' ;" echo "title Printing dataset: %nrbquote($dataset) Date: &sysdate. Time: &systime.;" echo "data $dataset; set sasdata.$dataset ;" if ("$2"!= '') then echo " where $condition;" echo "proc print data=$dataset;" echo "Running proc print on dataset: $dataset ($sasdir)" # Run the program in the temporary directory./tmp cd /tmp sas -913 pp$$.sas >> /tmp/pp$$.lis grep -i "sas stopped" /tmp/pp$$.log 2

test -f /tmp/pp$$.lis; set stat=$status if ($stat == 0) then set lines = `head -10 /tmp/pp$$.lis wc -l` if ($lines <= 2) then echo No output.lst file was created. more /tmp/pp$$.log more /tmp/pp$$.lis echo No output.lst file was created. more /tmp/pp$$.log /bin/rm -rf /tmp/pp$$.* B. SAS UNIX SCRIPT pf Command syntax: pf dataset tables condition This script pf (short for proc freq), which, when executed with parameters dataset, tables and condition, would be equivalent to the execution of the following PROC FREQ code in SAS: proc freq data=dataset; tables tables / missing missprint; where (condition); run; As an example, producing the frequency table race*sex from the demo.sas7bdat dataset for age group age > 50 would simply require the following command: pf demo race*sex age>50 which yields the following result: Figure 2. Results generated after executing pf example. The straightforward Unix script for pf would be: #!/bin/csh -f # pf performs PROC FREQ on SAS dataset 3

# The first argument is the dataset # The second argument is the tables statement # The third argument is the condition statement set dataset = "$1" set tables = "$2" set condition = "$3" # Find the dataset file set pwd=`pwd` if (-f $dataset.sas7bdat == 1) then set sasdir = "$pwd" echo "Dataset: $dataset.sas7bdat not found. Exiting... " # Begin constructing the SAS program in the temporary directory /tmp echo "%*include 'init.sas';" >! /tmp/pf$$.sas echo "libname sasdata '$sasdir';" # Continue constructing the program echo "options nocenter nofmterr pagesize=74 linesize=150;" echo "proc printto file='/tmp/pf$$.lis';" echo "proc freq data=sasdata.$dataset;" echo " tables $tables;" if ("$3"!= '') then echo " where $condition;" echo "Running proc freq on dataset: $dataset ($sasdir)" # Run the program in temporary directory cd /tmp sas -913 pf$$.sas >> pf$$.lis grep -i "sas stopped" /tmp/pf$$.log test -f /tmp/pf$$.lis; set stat=$status if ($stat == 0) then set lines = `head -10 /tmp/pf$$.lis wc -l` if ($lines <= 2) then echo empty.lis file created. more /tmp/pf$$.log more /tmp/pf$$.lis echo no.lis file created. more /tmp/pf$$.log /bin/rm -rf /tmp/pf$$.* C. SAS UNIX SCRIPT pu Command syntax: pu dataset variable condition byvar This script pu (short for proc univariate), which, when executed with parameters dataset, variable, condition and byvar, would be equivalent to the execution of the following PROC UNIVARIATE code in SAS: proc univariate data=dataset; var variable; where (condition); by byvar; run; As an example, producing the univariate statistics for variable age stratified by sex from the demo.sas7bdat dataset for condition (cohort = A1 ) would simply require the following command: pu demo age cohort= A1 sex which yields the following result: 4

Figure 3. Partial results generated after executing pu example. The straightforward Unix script for pu is as follows: #!/bin/csh -f # pu - performs PROC UNIVARIATE on specific continuous variable in SAS dataset # The first argument is the dataset # The second argument is the variable of interest # The third argument is the condition statement # The fourth argument is the BY variable to stratify analysis set dataset = "$1" set variable = "$2" set condition = "$3" set byvar = "$4" # Determine if edit is requested to edit/modify SAS job if ("$3" == 'edit' "$4" == 'edit' "$5" == 'edit') then set edit = yes set edit = no # Find the dataset file set pwd=`pwd` if (-f $dataset.sas7bdat == 1) then set sasdir = "$pwd" echo "Dataset: $dataset.sas7bdat not found. Exiting... " # Begin constructing the sas program in the temporary directory /tmp echo "%*include 'init.sas';" >! /tmp/pu$$.sas echo "libname sasdata '$sasdir';" >> /tmp/pu$$.sas 5

# Continue constructing the program echo "options nocenter nofmterr pagesize=74 linesize=90;" >> /tmp/pu$$.sas echo "proc printto file='/tmp/pu$$.lis';" >> /tmp/pu$$.sas >> /tmp/pu$$.sas echo "data $dataset;" >> /tmp/pu$$.sas echo " set sasdata.$dataset;" >> /tmp/pu$$.sas >> /tmp/pu$$.sas >> /tmp/pu$$.sas if ("$4"!= '' & "$4"!= 'edit') then echo "proc sort data= $dataset;" >> /tmp/pu$$.sas echo " by $byvar;" >> /tmp/pu$$.sas >> /tmp/pu$$.sas >> /tmp/pu$$.sas echo "proc univariate data=$dataset;" >> /tmp/pu$$.sas echo " var $variable;" >> /tmp/pu$$.sas if ("$3"!= '' & "$3"!= 'edit') then echo " where $condition;" >> /tmp/pu$$.sas if ("$4"!= '' & "$4"!= 'edit') then echo " by $byvar;" >> /tmp/pu$$.sas >> /tmp/pu$$.sas >> /tmp/pu$$.sas # If edit set to yes, then invoke the text editor if ("$edit" == 'yes') then vi /tmp/pu$$.sas echo "Running (edited) proc univariate on dataset: $dataset ($sasdir)" echo "Running proc univariate on dataset: $dataset ($sasdir)" # Run the program in the scratch directory cd /tmp sas -913 pu$$.sas >> pu$$.lis grep -i "sas stopped" /tmp/pu$$.log test -f /tmp/pu$$.lis; set stat=$status if ($stat == 0) then set lines = `head -10 /tmp/pu$$.lis wc -l` if ($lines <= 2) then echo No output.lst file was created. more /tmp/pu$$.log more /tmp/pu$$.lis echo No output.lst file was created. more /tmp/pu$$.log /bin/rm -rf /tmp/pu$$.* D. SAS UNIX SCRIPT pc Command syntax: pc dataset This script pc (short for proc contents), which, when executed with parameter dataset, would be equivalent to the execution of the following PROC CONTENTS code in SAS: proc contents data=dataset; run; As an example, producing the contents of the demo.sas7bdat dataset would simply require the following command: pc demo which yields the following result: 6

Figure 4. Results generated after executing pc demo. MODIFYING SAS UNIX SCRIPTS FOR ADDED ENHANCEMENTS Once a desired Unix script command is finalized, oftentimes the actual SAS command generated needs to be saved to be formalized as a library job. A possible enhancement would be the option to create and edit the SAS job prior to the script execution. This is accomplished by simply adding a parameter or option to the script, in the form of a unique parameter value that can only be interpreted for creating and editing such a SAS job. In our example, we use the keyword edit. When the keyword edit is parsed as an option, the temporary SAS job is opened in Unix window (using default editor) for editing and saving. A. SAS UNIX SCRIPT pu WITH edit PARAMETER Command syntax: pu dataset variable condition edit For the same SAS example for script pu above, the following job is created and opened in the default Unix editor when using edit option in the command line: pu demo age cohort= A1 sex edit 7

Figure 5. Sample code generated when using edit option with script pu. Once can save this temporary job in /tmp/ directory into an actual SAS job, to be formalized as a library job later. This file can be saved before the script actually executes the job and dumps the results into the default output, which was shown earlier. (Note that in the actual script, SAS statements to sort the dataset by the byvar variable have been added to avoid potential errors due to unsorted data.) B. SAS UNIX SCRIPT pc WITH edit PARAMETER Command syntax: pu dataset variable condition edit For the same SAS example for the script pc above, the following job is created and opened in the default Unix editor when using edit option in the command line: pc demo edit Figure 6. Code generated when using edit with script pc. CONCLUSION Creating Unix scripts to automate SAS execution is practical for teams with members that need quick access to SAS results without having to learn the necessary programming behind SAS. This saves valuable time and effort for the SAS programmer, who can then concentrate on planned deliverables instead of getting bogged down by ad hoc requests. The scripts also aid the programmer or study teams members to make quick exploratory data checks. For data checks and ad hoc requests that are often repeated as tedious tasks, the edit feature allows the programmer to save the job into a more formal SAS production job. 8

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Raoul A. Bernal Amgen Inc. One Amgen Center Drive Thousand Oaks, CA 91320 Work Phone: 805 447 4772 E-mail: rbernal@amgen.com Web: www.amgen.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 9