Write SAS Code to Generate Another SAS Program A Dynamic Way to Get Your Data into SAS

Similar documents
WRITE SAS CODE TO GENERATE ANOTHER SAS PROGRAM

Use That SAP to Write Your Code Sandra Minjoe, Genentech, Inc., South San Francisco, CA

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

Base and Advance SAS

The Programmer's Solution to the Import/Export Wizard

Using Dynamic Data Exchange

A Macro that can Search and Replace String in your SAS Programs

Extending the Scope of Custom Transformations

Paper A Simplified and Efficient Way to Map Variable Attributes of a Clinical Data Warehouse

PharmaSUG Paper PO10

Chapter 2: Getting Data Into SAS

Other Data Sources SAS can read data from a variety of sources:

The INPUT Statement: Where

Accessing Data and Creating Data Structures. SAS Global Certification Webinar Series

Make Your Life a Little Easier: A Collection of SAS Macro Utilities. Pete Lund, Northwest Crime and Social Research, Olympia, WA

Chapter 7 File Access. Chapter Table of Contents

SAS 101. Based on Learning SAS by Example: A Programmer s Guide Chapter 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Dictionary.coumns is your friend while appending or moving data

Using SAS to Control the Post Processing of Microsoft Documents Nat Wooding, J. Sargeant Reynolds Community College, Richmond, VA

DSCI 325: Handout 2 Getting Data into SAS Spring 2017

Reading in Data Directly from Microsoft Word Questionnaire Forms

The Ugliest Data I ve Ever Met

Cheat sheet: Data Processing Optimization - for Pharma Analysts & Statisticians

SUGI 29 Data Warehousing, Management and Quality

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

This Text file. Importing Non-Standard Text Files using and / Operators.

Using DDE with Microsoft Excel and SAS to Collect Data from Hundreds of Users

SAS CURRICULUM. BASE SAS Introduction

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

The INPUT Statement: Where It

A Practical Introduction to SAS Data Integration Studio

Making an RTF file Out of a Text File, With SAS Paper CC13

How to use UNIX commands in SAS code to read SAS logs

Using Unnamed and Named Pipes

PharmaSUG Paper TT11

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

SAS Data Integration Studio Take Control with Conditional & Looping Transformations

Using SAS Enterprise Guide to Coax Your Excel Data In To SAS

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

How Managers and Executives Can Leverage SAS Enterprise Guide

Moving Data and Results Between SAS and Excel. Harry Droogendyk Stratia Consulting Inc.

Importing CSV Data to All Character Variables Arthur L. Carpenter California Occidental Consultants, Anchorage, AK

PDF Multi-Level Bookmarks via SAS

Reading data in SAS and Descriptive Statistics

Demystifying Inherited Programs

Find2000: A Search Tool to Find Date-Related Strings in SAS

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

An Easy Way to Split a SAS Data Set into Unique and Non-Unique Row Subsets Thomas E. Billings, MUFG Union Bank, N.A., San Francisco, California

Your Own SAS Macros Are as Powerful as You Are Ingenious

16. Reading raw data in fixed fields. GIORGIO RUSSOLILLO - Cours de prépara)on à la cer)fica)on SAS «Base Programming» 364

The Impossible An Organized Statistical Programmer Brian Spruell and Kevin Mcgowan, SRA Inc., Durham, NC

Lazy Programmers Write Self-Modifying Code OR Dealing with XML File Ordinals

A Macro to Keep Titles and Footnotes in One Place

Using SAS to Control and Automate a Multi SAS Program Process Patrick Halpin, dunnhumby USA, Cincinnati, OH

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

A Guided Tour Through the SAS Windowing Environment Casey Cantrell, Clarion Consulting, Los Angeles, CA

17. Reading free-format data. GIORGIO RUSSOLILLO - Cours de prépara)on à la cer)fica)on SAS «Base Programming» 386

Leave Your Bad Code Behind: 50 Ways to Make Your SAS Code Execute More Efficiently.

Using a Fillable PDF together with SAS for Questionnaire Data Donald Evans, US Department of the Treasury

SAS Tricks and Techniques From the SNUG Committee. Bhupendra Pant University of Western Sydney College

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

Once the data warehouse is assembled, its customers will likely

INTRODUCTION THE FILENAME STATEMENT CAPTURING THE PROGRAM CODE

Acquisition and Management of Market Data for SAS Risk Dimensions

MISSOVER, TRUNCOVER, and PAD, OH MY!! or Making Sense of the INFILE and INPUT Statements. Randall Cates, MPH, Technical Training Specialist

When Powerful SAS Meets PowerShell TM

The Demystification of a Great Deal of Files

Using a HASH Table to Reference Variables in an Array by Name. John Henry King, Hopper, Arkansas

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board

SAS PROGRAM EFFICIENCY FOR BEGINNERS. Bruce Gilsen, Federal Reserve Board

Bruce Gilsen, Federal Reserve Board

Writing Programs in SAS Data I/O in SAS

How to Implement the One-Time Methodology Mark Tabladillo, Ph.D., Atlanta, GA

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

One SAS To Rule Them All

Macros are a block of code that can be executed/called on demand

Introduction to SAS Mike Zdeb ( , #1

Getting Your Data into SAS The Basics. Math 3210 Dr. Zeng Department of Mathematics California State University, Bakersfield

Innovative Performance Improvements Through Automated Flowcharts In SAS

Implementing external file processing with no record delimiter via a metadata-driven approach

EXPORTING SAS OUTPUT ONTO THE WORLD WIDE WEB

Cleaning up your SAS log: Note Messages

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

Chasing the log file while running the SAS program

How to Implement the One-Time Methodology Mark Tabladillo, Ph.D., MarkTab Consulting, Atlanta, GA Associate Faculty, University of Phoenix

Introduction OR CARDS. INPUT DATA step OUTPUT DATA 8-1

ZIPpy Safe Harbor De-Identification Macros

Department of Computer Science University of Cyprus. EPL342 Databases. Lab 2

Biostatistics 600 SAS Lab Supplement 1 Fall 2012

capabilities and their overheads are therefore different.

CS143: Relational Model

The Ins and Outs of %IF

Paper William E Benjamin Jr, Owl Computer Consultancy, LLC

Chapter 1 The DATA Step

SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015 Module 2

e. That's accomplished with:

Geocoding Crashes in Limbo Carol Martell and Daniel Levitt Highway Safety Research Center, Chapel Hill, NC

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

Transcription:

Paper 175-29 Write SAS Code to Generate Another SAS Program A Dynamic Way to Get Your Data into SAS Linda Gau, Pro Unlimited @ Genentech, Inc., South San Francisco, CA ABSTRACT In this paper we introduce a dynamic way to create SAS data sets from raw data files (flat files) by using SAS/Base only. You only need to write one SAS program, and it will handle all kinds of raw data fixed length, delimited, freeformatted or even hierarchical files. You may also apply the same algorithms with little modification to output raw data files from SAS data sets. INTRODUCTION It is often be the case that many raw data files with varying structure need to be read into SAS data sets for analysis using SAS software. Writing a SAS program to read each file results in many similar programs with minor variation due to different starting points, variable names, data types, etc. If your clients change a file structure, for example, from a fixed length file to a delimited file, or vice versa, you may need to re-write the SAS programs to read raw data. While these programs are technically trivial, writing and maintaining them can be an onerous, time-consuming task. Wouldn t it be nice to write one program which will generate tons of SAS programs and handle any data files? Yes, you can do this by creating a data layout table to store the specification of the data files and using DATA _NULL_ to create data dependent SAS programs. Alternatively, by using a macro, you have better control of your source code, and do not need to output SAS programs to the host system. READING RAW DATA Manual Programming to Read Raw Data You can use column or formatted input to read data in fixed columns or standard character and numeric data. You may also use formatted input to read nonstandard character and numeric data, or date values, and convert them to SAS date values by using Informats: $CHARw., COMMAw., DATEw., MMDDYYw., etc. You can use list input to read data that are not in a fixed location. The following is an example that we might deal with in our daily job: filename in 'xxxxxxxxxx'; data library.file1; infile in lrecl=529 recfm=f; input @ 1 ssn $9. 1

@ 10 dob mmddyy6. @ 16 age 3. @ 19 sex $1. @ 20 eezip $9. @ 29 m_status $1. @ 30 eestatus $3. @ 33 earntype $1. @ 34 wageband $1. @ 35 jobcode $4. @ 39 salary zd11.2 @ 50 contserv mmddyy6. @ 56 pidays 5.2 @ 61 yearserv 2. @ 63 pensionu $3. @ 66 loca $4. @ 70 pub18 $7. @ 77 ee_site $8.... The Dynamic Solution Step 1: Create Data Layout Table Let s dissect the code in the previous example. Basically, it can be broken down into four elements record length, starting position for the variable, variable name, and information regarding data type and length of variable or informats. You may store the information in a data file, let s name it Data Layout Table. You may store the data layout table in a MS Excel spreadsheet (Fig. 1), then write a SAS program to read the data layout table into the SAS system, or enter the information directly into a SAS data set (Fig. 2). Fig. 1 The following program reads a data layout table as a tab delimited file into a SAS data set. Please note that there are various ways to input data, so if you prefer another solution, that s OK. 2

data tblay; length comp ftype $2 tbla_nam $8 tbla_len 4. tbla_dty $10 tbla_fmt $12; infile "&tabldir/tb_lay.txt" dlm='09'x missover dsd end=eof ; input comp $ ftype $ @;... input tbla_pos $ tbla_nam $ tbla_len tbla_dty $ tbla_fmt $ tbla_fle $;... Fig. 2 Step 2: Writing Dynamic Code to Read Raw Data -- Using DATA _NULL_ A lot of programmers use DATA _NULL_ for writing custom reports or raw data files. However, the same concept of writing a raw data file using FILE and PUT statements can be applied to write a SAS program. Instead of naming raw data files with an extension of.txt or.dat, you name it with an extension of.sas. data _null_ ; set tblay end=eof ; file "&userdir/&compid&filetype&sysnum.lay.sas"; The SET statement simply tells SAS to read the data layout table, which stores the file structure specifications. The next step is to write the code that you normally would for reading raw data, but using the a PUT statement and replacing the information for record length, starting position, variable names and informats with the fields in the data layout table. if _n_ =1 then do; put "data XXXXX; " / 'infile "&datadir/&compid&filetype" missover ls=' tbla_fle '; / 'input' ; data onecomp ; infile "&datadir/&compid&filetype" missover ls = 268 ; input 3

Step 3: Recode Data Type and Informats Data can be in different forms and the flat files may be generated by other software, e.g., Oracle, DB2, C, etc., so different keywords might be used to represent character or numeric data. The following are a few examples: Character: CHAR, VARCHAR2, etc. Numeric: NUMBER, INTEGER, SHORT, LONG, etc. Thus, we need to do data cleaning on the data types first, and create informats to make our dynamic coding easier. 1. Data cleaning for character data: if upcase(tbla_dty) =: 'VARCHAR then tbla_dty = 'CHAR '; 2. Data cleaning for numeric data: if upcase(left(tbla_dty)) in ( 'NUMBER ', 'INTEGER ') then tbla_dty = 'NUMERIC '; 3. Data cleaning for date variables: if upcase(tbla_dty) =: 'DATE' then do; dtcnt = dtcnt +1; dts(dtcnt) = tbla_nam ; if upcase(left(tbla_fmt)) = 'CCYYMMDD ' then tbla_fmt = 'yymmdd8. '; else if upcase(left(tbla_fmt)) = 'MM/DD/CCYY ' then tbla_fmt = 'mmddyy10. '; else if upcase(left(tbla_fmt)) = 'MMDDYY10 ' then tbla_fmt='mmddyy10. '; else if upcase(left(tbla_fmt)) = 'MMDDYY10. ' then tbla_fmt='mmddyy10. '; else if upcase(left(tbla_fmt)) = 'MMDDYY8 ' then tbla_fmt='mmddyy8. '; else if upcase(left(tbla_fmt)) = 'MMDDYY8. ' then tbla_fmt='mmddyy8. '; else if upcase(tbla_fmt) = " " then tbla_fmt='mmddyy8. ' ; In the meantime, you might also need to recode informats if necessary: if upcase(tbla_dty) =: 'CHAR' then do; if tbla_fmt = " " then tbla_fmt = '$' compress(tbla_len) '.' ; else tbla_fmt = tbla_fmt; if upcase(tbla_dty) =: 'NUMERIC' then do; if tbla_fmt = " " then tbla_fmt = compress(floor(tbla_len) ) '.' ; else tbla_fmt = tbla_fmt; All the tedious work is done. Now comes the fun part, dynamic coding, and you will see how great and powerful a solution it is. Usually, you will write the starting point, SAS variable name and informats for each variable that you are going to read into a SAS data set. Unfortunately, if a file with hundreds of variables needs to be read as a SAS data set, you need to write hundreds of lines by the following programming format. 4

Since we stored the data specification in the data layout table, we will replace the hundreds of lines of code with the following four lines of code: if tbla_pos ne. then do; put @10 '@' tbla_pos +1 tbla_nam +1 tbla_fmt ; At the end of the program for reading a raw data file, you will format some non-standard data, e.g. date variables, so it will make sense to the general users. One tip here is that you can store the date variables into an array, so you can dynamically format those date variables into the formats you prefer. if eof then do; put ';' / 'format' ; do i=1 to dtcnt ; put dts(i) end ; put "mmddyy10." / '; run; ' ; ; format EMPDOB EMPJOBDA mmddyy10.; run; Step 4: Run Program that Actually Read Raw data Finally, you may execute the program for reading raw data into a SAS data set by using %include statement. READ DELIMITED FILES %INCLUDE "&USERDIR/&COMPID&FILETYPE&SYSNUM.LAY.SAS"; To further extend the program s functionality, consider delimited files. With a little bit of modification to the previous program, you can also read delimited files by using DELIMITER=, or DLM=, option in the INFILE statement. if _n_ = 1 then do; put "data XXXXX; " /; put "length comp $2 " @;... put 'infile "&datadir/&compid&filetype" dlm = "09"X dsd missover; ' /; put 'input' /; As we described before in the section on reading fixed length files, we may also generate data dependent SAS programs using the information stored in the data layout table. 5

PUT @10 TBLA_NAM +1 TBLA_FMT ; WRITING RAW DATA In addition to reading raw data into SAS data sets, you may modify this program to write flat files. Since it uses the same method as we just described but using FILE and PUT statements, we won t illustrate it here again. Raw Data File INFILE INPUT SAS Data Set FILE PUT Raw Data File Here is the code: data _null_ ; file "&userdir/&compid&filetype&sysnum.asc.sas"; set tbnorm end=eof ;... if _n_ =1 then do; put "data _null_ ; " / 'options missing = " " ;' / "set outdir.&compid&filetype&dt ;" / 'file "&textdir/&compid&filetype&dt&sysnum..txt" ls=' tbnm_fle ';' / 'put' ; end ;... put @10 '@' tbnm_pos +1 tbnm_nam +1 tbnm_fmt ;... if eof then do; put ';' / ' run; '; %include "&userdir/&compid&filetype&sysnum.asc.sas"; ALTERNATIVE CODING MACRO SOLUTION There are various ways to accomplish this solution. If you are a macro guru, you might like to write dynamic code using SAS macro language. Here is a solution provided by Brit Harvey. We are just providing the idea, the rest of the 6

programming is up to you. %macro ReadFile ( MetaFile = /* INPUT METADATA FILE */, LineSize = /* LINESIZE FROM CLIENT */, DataFile = /* INPUT DATA FILE */, Out = /* OUTPUT DATASET */ ); %local ReadCode ; /* THIS IS A MACRO VARIABLE THAT WILL CONTAIN CODE TO READ DATA FILE. */ /* WRITE THE DATA, INFILE AND INPUT STATEMENTS */ %let ReadCode = data &out %str(;) infile = &DataFile missover ls=&linesize %str(;) input ; %do until end of &MetaFile; Read a line of &MetaFile. Do conversion of date, char, and numeric informats. Generate the input statement line, (call it &inline). %let ReadCode = &ReadCode &inline ; %end ; Generate the formats (call them &formats). %let ReadCode = %str(;) &ReadCode &formats %str(;) run %str(;) ; /* OPTIONALLY PUT THE GENERATED CODE TO THE LOG FOR DEBUGGING */ %put Review generated code ; %put &ReadCode ; /* EXECUTE THE GENERATED CODE TO READ THE FILE */ &ReadCode %mend ReadFile ; CONCLUSIONS I hope this paper will provide you with a great solution and make your work much easier if you face the same situation as I did. In the future, it might also trigger you to write a paper to simplify some routine work and make the programmer s life easier. ACKNOWLEDGMENTS I would like to express my appreciation to Brit Harvey for his helpful suggestions in the section of Alternative Coding Macro Solution, and Alan Nakamoto for editing my paper. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Linda Gau Pro Unlimited @ Genentech, Inc. 1 DNA Way, Mail stop # 59 South San Francisco, CA 94080 (650) 225-8556 Email: gau.linda@gene.com Email: csgau@yahoo.com 7

Please note that all the SAS programs in this paper were written for the UNIX platform, if you want to run these examples in other platforms, you may need to modify some SAS statements. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 8