SAS 101 Based on Learning SAS by Example: A Programmer's Guide, Chapters 21, 22, & 23 By Tasha Chapman, Oregon Health Authority
Topics covered
- All the leftovers!
- Infile options
  - Missover
  - LRECL=/Pad/Truncover
  - FirstObs=/OBS=
- Advanced formats and informats
  - User created informats
  - Reading both character and numeric data
  - Formats within formats
  - Saving formats
  - Using formats as look-up tables
  - CNTLIN and CNTLOUT datasets
- Transposing data
  - PROC Transpose
- Other topics
  - Saving and storing macros
    - %Include
    - Autocall library
    - Stored compiled macros
Infile options
PROC Import with a twist (redux)
1. Run PROC Import
2. Copy the SAS log to the Program Editor
   - PROC Import will create a DATA step with INFILE and INPUT statements in the log
   - The INFILE statement in that DATA step is where INFILE statement options go
3. Delete any non-SAS code
4. Modify informats, formats, and lengths (as needed)
5. Run the new code
From Week 3 Chapters 5 & 6
Common INFILE options
Option      Purpose
dsd         Stands for delimiter-sensitive data. Changes the default delimiter from blank to comma. If there are two delimiters in a row, assumes a missing value between them. Quotes are stripped from character values.
dlm=        Stands for delimiter. Specifies alternate delimiter(s).
missover    At the end of an input line of raw data, sets remaining values to missing if there are more variables than data values.
lrecl=      Stands for logical record length. Specifies the record length of the raw data file (necessary if records are longer than the default, which was 256 bytes before SAS 9.4).
pad         Pads the input records with blanks out to the end of the logical record length.
truncover   Essentially has the effect of both PAD and MISSOVER options combined.
firstobs=   Specifies which record number in the raw data file is the first observation of data. Useful if the raw data includes headers.
obs=        Specifies the record number of the last record to read. Useful if you only want to read a select number of observations.
MISSOVER Missing data results in shorter than expected line
MISSOVER MISSOVER option fills in the blanks at the end of the line with missing values.
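A minimal sketch of MISSOVER in action. The dataset name, variable names, and sample values are made up for illustration. The second and third data lines are shorter than the INPUT statement expects.

```sas
/* Hypothetical data: some lines have fewer values than variables */
data scores;
    infile datalines missover;   /* do not jump to the next line for missing values */
    input name $ test1 test2 test3;
    datalines;
Ann 90 85 88
Bob 78 82
Cara 95
;
run;

proc print data=scores;
run;
```

With MISSOVER, Bob's test3 and Cara's test2 and test3 are set to missing. Without it, list input would move to the next line to look for the remaining values.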
FIRSTOBS= First row contains header information
FIRSTOBS= FIRSTOBS=2 option starts reading the raw data file on the second row.
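A short sketch of FIRSTOBS= combined with DSD. The dataset and variable names are assumptions; the first data line plays the role of the header row.

```sas
/* Hypothetical comma-delimited data with a header in row 1 */
data cities;
    infile datalines dsd firstobs=2;   /* start reading at the second record */
    input city :$12. hightemp lowtemp;
    datalines;
City,HighTemp,LowTemp
Eugene,68,46
Portland,62,44
Salem,65,45
;
run;
```

Adding obs=3 to the INFILE statement would stop reading after the third record, so only Eugene and Portland would be read.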
Advanced formats and informats
PROC Format (redux)
- The VALUE statement begins a new format
  - Can create more than one format per PROC Format
- $gender is the name of the new format
  - The format name begins with a $ to indicate that the format is to be applied to character data
- Each range pairs an input value with an output value
From Week 3 Chapters 5 & 6
What are informats? (redux)
- Informats are instructions that tell SAS how to read a data value
- Can be as simple as w.d
  - 3.1 tells SAS to read 123 as 12.3
  - $3. tells SAS to read 123 as 123 and store it as character data
- Excellent for reading dates, dollars, and percents
  - MMDDYY8. tells SAS to read 12/26/07 and store it as 17526 (a SAS date that can be used for calculations, etc.)
From Week 3 Chapters 5 & 6
Creating informats
- The INVALUE statement creates informats (here, invalue score)
- Each range pairs an input value with an output value
- A dollar sign ($) at the start of the name indicates that the informat creates character variables (i.e., the output value will be character)
- Absence of the dollar sign indicates that the informat creates numeric variables (as in this example)
Creating informats Survey scale entered as character values (SA = Strongly Agree, A = Agree, etc.) Want to convert to numeric Likert-type scale
Creating informats Use PROC Format to create the informat score Apply the informat while reading in the raw data
Creating informats
- UPCASE option: converts all input strings to uppercase before they are compared to ranges
- JUST option: left-justifies all input strings before they are compared to ranges
- Useful options, as raw data may be messy (mixed case, leading blanks, etc.)
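A sketch of the score informat with the UPCASE and JUST options. The slides only name SA and A, so the remaining codes (N, D, SD), the 5-to-1 numbering, and the sample data are assumptions.

```sas
/* Numeric informat: character survey codes in, Likert-type numbers out */
proc format;
    invalue score (upcase just)
        'SA'  = 5      /* Strongly Agree */
        'A'   = 4      /* Agree */
        'N'   = 3      /* assumed code: Neutral */
        'D'   = 2      /* assumed code: Disagree */
        'SD'  = 1      /* assumed code: Strongly Disagree */
        other = .;     /* anything else becomes missing */
run;

/* Apply the informat while reading the raw data */
data survey;
    input id q1 : score. q2 : score.;
    datalines;
1 sa A
2 d SD
3 n x
;
run;
```

Because of UPCASE, the lowercase entries sa, d, and n match the uppercase ranges. The unrecognized value x falls into OTHER and is read as missing.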
Creating informats Dataset of patient temperature readings Normal temperature coded as N Actual temperature entered if not normal Both character and numeric data in same field
Creating informats Use PROC Format to create the informat tempfmt Numeric temperatures within valid range will be read as written. N will be converted to 98.6. Any other values (including numeric temps outside valid range) will be converted to missing.
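A sketch of how tempfmt might be defined. The valid range of 96 to 106 and the sample readings are assumptions; the slide does not state the actual limits.

```sas
/* One informat that handles both numbers and the character code N */
proc format;
    invalue tempfmt (upcase)
        96 - 106 = _same_    /* assumed valid range: keep the number as written */
        'N'      = 98.6      /* normal temperature */
        other    = .;        /* everything else becomes missing */
run;

data temperatures;
    input patient $ temp : tempfmt5.;
    datalines;
P01 N
P02 101.2
P03 n
P04 abc
P05 120
;
run;
```

The keyword _SAME_ tells SAS to store the value it read without changing it. P04 and P05 end up with missing temperatures because neither value matches the first two ranges.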
Formats within formats Formats and informats can be nested within each other Useful for applying multiple types of formats (e.g. picture and value formats) to the same variable depending on the data value
Formats within formats Phone directory dataset Some provided full phone numbers Want to show as (999) 999-9999 Some provided extensions Want to show as x9999 Some have no phone number Want to show as Unlisted
Formats within formats (503) 373-1793 x1793 Applies format based on data value
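A sketch of one way to nest formats for the phone directory, assuming the phone number is stored as a numeric variable. The format names, picture templates, and sample data are assumptions and should be checked against real output before use. A nested format is written in square brackets where a label would normally go.

```sas
proc format;
    /* Picture for 10-digit numbers; the leading zero selector leaves room for the prefix */
    picture fullpic other = '0000) 000-0000' (prefix='(');
    /* Picture for 4-digit extensions */
    picture extpic  other = '09999' (prefix='x');
    /* Value format that picks a nested format based on the data value */
    value phonefmt
        .                       = 'Unlisted'
        1 - 9999                = [extpic5.]
        1000000000 - 9999999999 = [fullpic14.];
run;

data directory;
    input name $ phone;
    format phone phonefmt14.;
    datalines;
Ann 5033731793
Bob 1793
Cara .
;
run;

proc print data=directory;
run;
```

The intent is that Ann's number displays in the (999) 999-9999 style, Bob's as an extension, and Cara's as Unlisted.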
Saving and storing formats PROC Format saves user-created formats to a catalog Usually these catalogs are in the WORK library, and are deleted at the end of each SAS Session However, formats can be easily saved and stored to other permanent libraries
Saving and storing formats Save a format to a permanent library using the library= option. This will create a formats catalog (called FORMATS by default) in the mylib library.
Saving and storing formats To use the saved formats in another program, use the fmtsearch= option to add that catalog to the list of available formats
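A sketch of both steps. The folder path is a placeholder and must be replaced with a real location; the $gender values are made up for illustration.

```sas
/* Program 1: save the format permanently */
libname mylib 'C:\path\to\mylib';      /* hypothetical folder */

proc format library=mylib;             /* writes to the catalog MYLIB.FORMATS */
    value $gender
        'M' = 'Male'
        'F' = 'Female';
run;

/* Program 2: make the saved formats available */
libname mylib 'C:\path\to\mylib';      /* same hypothetical folder */
options fmtsearch=(mylib);             /* add MYLIB.FORMATS to the search list */
```

SAS still searches WORK.FORMATS and LIBRARY.FORMATS; the catalogs named in fmtsearch= are searched in addition to those.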
CNTLIN/CNTLOUT Can create a format from a dataset using the CNTLIN= option in PROC Format Can create a dataset from a format using the CNTLOUT= option in PROC Format
CNTLIN/CNTLOUT Have a dataset of ICD9 codes and descriptions Want to convert this to a SAS format
CNTLIN/CNTLOUT
The input dataset has to have specific variables:
- FMTNAME: name of the format
- START: the single value to be formatted (or the start value if the beginning of a range of values)
- END (optional): the end value of a range of values to be formatted
- LABEL: the formatted value
- TYPE (optional): type of format, C for character, N for numeric
CNTLIN/CNTLOUT Original dataset Ready to be made into a format
CNTLIN/CNTLOUT Use PROC Format to convert the dataset to a format
CNTLIN/CNTLOUT Use PROC Format to convert a format to a dataset
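A sketch of the round trip from dataset to format and back. The dataset names, the format name icd9fmt, and the three sample codes are assumptions chosen for illustration.

```sas
/* Original dataset: ICD-9 codes and descriptions (sample rows only) */
data icd9;
    infile datalines dsd;
    length code $5 description $40;
    input code description;
    datalines;
250,Diabetes mellitus
401,Essential hypertension
486,"Pneumonia, organism unspecified"
;
run;

/* Rename to the variable names that CNTLIN= expects and add FMTNAME and TYPE */
data icd9_cntlin;
    set icd9 (rename=(code=start description=label));
    retain fmtname 'icd9fmt' type 'C';
run;

/* Dataset to format */
proc format cntlin=icd9_cntlin;
run;

/* Format to dataset */
proc format cntlout=icd9_out;
    select $icd9fmt;
run;
```

After the first PROC Format step, the format can be used like any other character format, for example put(code, $icd9fmt.) as a look-up.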
Transposing data
Transposing data
- Transposing is converting variables to observations and vice versa
- Multiple ways of restructuring and transposing data
  - PROC Transpose
  - DATA step
  - Arrays and DO loops
Transposing data basic example Output (transposed) dataset Seven variables Two observations Input (original) dataset Two variables Seven observations
Transposing data basic example data= Input (original) dataset out= Output (transposed) dataset
Transposing data basic example Name of transposed variables stored in _NAME_ column
Transposing data basic example New variables generically named COL1, COL2, etc.
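A sketch of the basic case, using seven days of high and low temperatures. The dataset names are assumptions.

```sas
/* Input (original) dataset: two variables, seven observations */
data temps;
    input hightemp lowtemp;
    datalines;
68 46
65 41
66 45
63 44
60 45
63 43
65 44
;
run;

/* data= names the input dataset, out= names the transposed dataset */
proc transpose data=temps out=temps_wide;
run;

proc print data=temps_wide;
run;
```

The output has one observation per original variable. The original variable names land in _NAME_, and the values land in COL1 through COL7.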
VAR statement The var statement specifies which variables should be transposed If omitted, by default PROC Transpose will only transpose numeric variables
ID statement The ID statement specifies the variable whose values should be used to name the new columns. If a value is not a valid variable name (e.g., it starts with a number), SAS will convert it to a valid name (e.g., by adding a leading underscore).
ID statement The variable names can be modified with the prefix=, delimiter=, or suffix= options
Transposing data BY groups Two temperatures (HighTemp and LowTemp) Three cities (Eugene, Portland, and Salem)
BY statement Can specify more than one BY group variable Data must be sorted by BY group variable(s)
BY statement
Input (original) dataset:
DayOfWeek  City      HighTemp  LowTemp
1          Eugene    68        46
2          Eugene    65        41
3          Eugene    66        45
4          Eugene    63        44
5          Eugene    60        45
6          Eugene    63        43
7          Eugene    65        44
1          Portland  62        44
2          Portland  63        43
3          Portland  61        42
4          Portland  62        39
5          Portland  60        44
6          Portland  62        45
7          Portland  66        45
1          Salem     65        45
2          Salem     66        42
3          Salem     62        43
4          Salem     60        41
5          Salem     58        41
6          Salem     62        45
7          Salem     68        46
Output (transposed) dataset:
City      _NAME_    Day1  Day2  Day3  Day4  Day5  Day6  Day7
Eugene    HighTemp  68    65    66    63    60    63    65
Eugene    LowTemp   46    41    45    44    45    43    44
Portland  HighTemp  62    63    61    62    60    62    66
Portland  LowTemp   44    43    42    39    44    45    45
Salem     HighTemp  65    66    62    60    58    62    68
Salem     LowTemp   45    42    43    41    41    45    46
NAME= option Use the name= option to name the variable containing the name of the transposed variable (_NAME_ column)
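A sketch that combines the BY, ID, and VAR statements with the prefix= and name= options. Only the first two days per city are included to keep it short, and the dataset names and the new column name Measure are assumptions.

```sas
data citytemps;
    input dayofweek city $ hightemp lowtemp;
    datalines;
1 Eugene 68 46
2 Eugene 65 41
1 Portland 62 44
2 Portland 63 43
1 Salem 65 45
2 Salem 66 42
;
run;

/* Data must be sorted by the BY variable(s) first */
proc sort data=citytemps;
    by city;
run;

proc transpose data=citytemps
               out=citytemps_wide
               prefix=Day          /* new columns become Day1, Day2, ... */
               name=Measure;       /* replaces the default _NAME_ column name */
    by city;                       /* one set of output rows per city */
    id dayofweek;                  /* values of DayOfWeek name the new columns */
    var hightemp lowtemp;          /* variables to transpose */
run;
```

Each city produces two output rows, one for HighTemp and one for LowTemp, with the temperatures spread across the Day columns.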
Saving and storing macros
Saving and storing macros
- Need to store and share macro code
- Multiple ways to save and store macros for future use
  - %Include
  - Autocall facility
  - Stored compiled macro facility
Saving and storing macros
- Which method to choose depends on your needs and operating environment
- SAS recommends:
  - Don't store macros while they are still in development
  - If you are running production-level jobs using name-style macros, consider stored compiled macros
  - If you are letting a group of users share macros, consider the autocall facility
LIBNAME trick (redux) Save your commonly used and/or passworded LIBNAME statements in a text file (using Notepad) Use a %include statement to reference the text file at the beginning of every SAS program SAS will include the code in the text file as if it were part of your program. From Week 2 Chapters 3 & 4
%Include Save your macro definitions in a text file Use %include to reference the file at the start of every program
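A sketch of the %include approach. The file path, the macro name hello, and its parameter are hypothetical; the path must point to a real text file that contains the macro definition.

```sas
/* Contents of the hypothetical file C:\path\to\mymacros.sas:
       %macro hello(name);
           %put Hello, &name;
       %mend hello;
*/

/* At the start of each program: pull in the saved macro definitions */
%include 'C:\path\to\mymacros.sas';

/* The macro is now compiled and can be called as usual */
%hello(world)
```

Adding the / source2 option to the %include statement writes the included code to the log, which can help when checking what was brought in.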
%Include
Advantages:
- Easy and straightforward approach
- Excellent first step towards starting a macro library
Disadvantages:
- The macro definition is compiled every time the %include is executed (inefficient)
  - If efficiency is an issue, each file should contain only one macro (which would result in multiple files to include)
- Requires you to know where the physical text files are stored
Autocall facility
- An autocall library is a directory containing individual files
  - Similar in concept to %include, but files stored as SAS files
  - Each file contains one macro definition
  - The name of the file must be the same as the macro name
- An autocall library can also be a SAS catalog
Autocall facility Save the SAS code for your macro using the macro name as the program file name To avoid confusion, this folder should have nothing but autocall macros
Autocall facility To use the macro later Reference the folder storing the autocall macros with a FILEREF (created with a filename statement) Not a libref!
Autocall facility
- mautosource option: turns on the autocall macro facility
- mautolocdisplay option (optional): displays the location of the source code in the log when the macro is called
- sasautos= option: tells SAS where the autocall macros are stored
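A sketch of setting up and using an autocall library. The folder path, the fileref mymacs, and the macro hello are hypothetical; the folder is assumed to contain a file named hello.sas that holds the definition of %hello.

```sas
/* Point a fileref (not a libref) at the folder of autocall macros */
filename mymacs 'C:\path\to\autocall';   /* hypothetical folder */

/* Turn on the autocall facility and tell SAS where to look */
options mautosource
        mautolocdisplay
        sasautos=(mymacs sasautos);      /* keep the SAS-supplied autocall library too */

/* First call: SAS finds hello.sas in the folder, compiles it, and runs it */
%hello(world)
```

Listing the sasautos fileref after mymacs keeps the autocall macros that ship with SAS available alongside the user-defined ones.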
Autocall facility
Advantages:
- Macros are stored as SAS code, so you can use the Enhanced Editor to modify them
- User-defined macros stored in a standard location
- No need to remember multiple file names when calling macros
- Macro code only compiled the first time it is used in a session (efficient)
- Easy to share
Autocall facility Disadvantages: Because macro code only compiled once per session, this can be difficult during editing phase
Stored compiled macros Macros are always compiled before they are executed Compiled macros are stored in a catalog called SASMACR In a typical session, this catalog is stored in the WORK library However, this catalog can be stored in a more permanent library for future use
Stored compiled macros Create a library to store the SASMACR catalog
Stored compiled macros mstored option turns the stored compiled macro facility on sasmstore= option identifies the library where the SASMACR catalog will be stored
Stored compiled macros
Run the macro you want to store
- store option: tells SAS to store this macro
- source option (optional): stores the source code with the compiled code
- des= option (optional): assigns a descriptive title for the macro entry in the SAS catalog
Stored compiled macros To use the macro later mstored option turns the stored compiled macro facility on sasmstore= option identifies the library where the SASMACR catalog is stored
Stored compiled macros SASMACR catalog available to view in the Explorer Window Description stored as a file property (Right-click Properties)
Stored compiled macros If the source option was used during macro storage, the source code can be retrieved using %copy (Code will be printed to log)
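A sketch of the full stored compiled macro workflow. The library path, the libref macstore, and the macro hello with its description are hypothetical.

```sas
/* Create a library to hold the SASMACR catalog */
libname macstore 'C:\path\to\macros';    /* hypothetical folder */

/* Turn on the facility and name the library */
options mstored sasmstore=macstore;

/* Compile and store the macro, keeping the source and a description */
%macro hello(name) / store source des='Writes a greeting to the log';
    %put Hello, &name;
%mend hello;

/* In a later session: repeat the LIBNAME and OPTIONS statements, then call it */
%hello(world)

/* Retrieve the stored source code (printed to the log) */
%copy hello / source;
```

The %copy statement only works for macros that were stored with the source option.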
Stored compiled macros
Advantages:
- Macro programs only compiled once
- Compile and store is faster
- Can store more than one macro per catalog
- Keeping track of macros is easy
- Source code does not have to be stored with SASMACR catalog
  - But for maintenance purposes, it is recommended
Stored compiled macros
Disadvantages:
- Cannot recreate source statements from a compiled macro
- Cannot be moved directly to other operating systems
  - Must be saved and recompiled under new OS at any new location
- May need to be recompiled for new releases of SAS
Saving and storing macros
If macros are stored in multiple locations, SAS will search for macro definitions in this order:
1. WORK.SASMACR catalog
2. Stored compiled macros
3. Autocall macros
Additional Reading
- Missover, Truncover, and Pad, Oh My!! or Making Sense of the Infile and Input Statements
- Yes We Can Save SAS Formats
- Learn the Basics of Proc Transpose
- Turning the Data Around: Proc Transpose and Alternative Approaches
- Use of a Macro to Revise Data
- Creating a Stored Macro Facility in Ten Minutes
- Ways to Store Macro Source Codes and How to Retrieve Them
- Building and Using Macro Libraries
You Did It! That's all, folks!