SAS 101. Based on Learning SAS by Example: A Programmer's Guide, Chapters 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

Topics covered: All the leftovers! INFILE options (MISSOVER; LRECL=, PAD, and TRUNCOVER; FIRSTOBS= and OBS=).

Topics covered: Advanced formats and informats (user-created informats, reading both character and numeric data, formats within formats, saving formats, using formats as look-up tables, CNTLIN and CNTLOUT datasets).

Topics covered: Transposing data (PROC TRANSPOSE). Other topics: saving and storing macros (%INCLUDE, the autocall library, stored compiled macros).

Infile options

PROC IMPORT with a twist (redux), from Week 3 (Chapters 5 & 6): Run PROC IMPORT. Copy the SAS log to the Program Editor; PROC IMPORT writes a DATA step with INFILE and INPUT statements to the log. Delete any non-SAS code. Modify the informats, formats, and lengths (as needed), along with the INFILE statement options. Run the new code.
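A minimal sketch of the first step, assuming a hypothetical comma-separated file at C:\SASdata\patients.csv whose first row holds the column names. For delimited files like this one, the generated DATA step is what appears in the log for you to copy and edit.

```sas
/* Hypothetical file path and output dataset name; substitute your own */
proc import datafile="C:\SASdata\patients.csv"
            out=work.patients
            dbms=csv
            replace;          /* overwrite WORK.PATIENTS if it already exists */
  getnames=yes;               /* take variable names from the first row */
run;
```

After this runs, the INFILE and INPUT statements in the log are the starting point for the edited DATA step.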

Common INFILE options (option: purpose)
dsd: Stands for "delimiter-sensitive data." Changes the default delimiter from a blank to a comma, treats two consecutive delimiters as a missing value between them, and strips quotation marks from character values.
dlm=: Stands for "delimiter." Specifies an alternate delimiter (or delimiters).
missover: At the end of an input line of raw data, sets the remaining variables to missing if there are more variables than data values.
lrecl=: Stands for "logical record length." Specifies the record length of the raw data file; necessary if records are longer than the default (256 bytes in older releases; SAS 9.4 raised the default to 32,767).
pad: Pads the input records with blanks out to the end of the logical record length.
truncover: Essentially has the effect of the PAD and MISSOVER options combined.
firstobs=: Specifies the record number of the first record to read. Useful if the raw data includes headers.
obs=: Specifies the record number of the last record to read. Useful if you want to read only a limited number of records.
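Several of these options can be combined on one INFILE statement. This is a small sketch using hypothetical pipe-delimited in-stream records (the names and values are made up for illustration):

```sas
data work.contacts;
  /* dsd: consecutive delimiters mean a missing value, quotes are stripped */
  /* dlm='|': use a pipe instead of the comma that DSD would otherwise use  */
  /* truncover: a short line sets the remaining variables to missing        */
  infile datalines dsd dlm='|' truncover;
  length Name $ 20 City $ 15;
  input Name $ City $ Age;
datalines;
"Smith, Jane"|Portland|34
Lee||41
Kim|Salem
;
run;
```

Lee's City is missing because of the two pipes in a row, and Kim's Age is missing because that line ends early.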

MISSOVER: Missing data at the end of a record results in a line that is shorter than expected.

MISSOVER: The MISSOVER option fills in the blanks at the end of the line with missing values instead of moving on to the next line to look for them.
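A sketch of the situation on these slides, using hypothetical test scores where some lines have fewer values than the INPUT statement expects:

```sas
data work.scores;
  /* Without MISSOVER, SAS would continue reading on the next line */
  /* (the default FLOWOVER behavior) to fill Score2 and Score3.     */
  infile datalines missover;
  input ID Score1 Score2 Score3;
datalines;
1 90 85 77
2 88 92
3 75
;
run;
```

With MISSOVER, ID 2 gets a missing Score3 and ID 3 gets missing Score2 and Score3, and all three lines become observations.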

FIRSTOBS=

FIRSTOBS=: The first row contains header information.

FIRSTOBS=: The FIRSTOBS=2 option starts reading the raw data file at the second row.
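A minimal sketch pairing FIRSTOBS= with OBS=, using the day-1 temperatures from the BY-group example plus a header row. Keep in mind that OBS= gives the number of the last record to read, not a count of records.

```sas
data work.day1;
  /* Record 1 is a header row, so start at record 2;        */
  /* OBS=3 stops after record 3, so the Salem line is skipped */
  infile datalines firstobs=2 obs=3;
  input City $ HighTemp LowTemp;
datalines;
City HighTemp LowTemp
Eugene 68 46
Portland 62 44
Salem 65 45
;
run;
```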

Advanced formats and informats

PROC FORMAT (redux), from Week 3 (Chapters 5 & 6): The VALUE statement begins a new format and maps each input value to an output value. You can create more than one format per PROC FORMAT step. $gender is the name of the new format; a format name that begins with $ indicates that the format is applied to character data.
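A sketch of what the $gender format might look like, assuming the data store the codes M and F. The codes, labels, and the OTHER catch-all are assumptions, since the slide's code is not in this transcription.

```sas
proc format;
  value $gender 'M' = 'Male'
                'F' = 'Female'
                other = 'Unknown';   /* any code not listed above */
run;

data work.people;
  input ID Gender $;
  format Gender $gender.;   /* changes how values display; stored values stay M/F */
datalines;
1 M
2 F
3 X
;
run;
```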

What are informats? (redux), from Week 3 (Chapters 5 & 6): Informats are instructions that tell SAS how to read a data value. They can be as simple as w.d: 3.1 tells SAS to read 123 as 12.3, and $3. tells SAS to read 123 as the character value 123. Informats are excellent for reading dates, dollar amounts, and percentages: MMDDYY8. tells SAS to read 12/26/07 and store it as 17526, a SAS date value (the number of days since January 1, 1960) that can be used in calculations.
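The two numeric examples from this slide can be checked with formatted input. The variable names are hypothetical.

```sas
data work.informat_demo;
  /* 3.1 reads columns 1-3 with one implied decimal place: 123 -> 12.3      */
  /* +1 skips the blank, then MMDDYY8. reads 12/26/07 as the SAS date 17526 */
  input Amount 3.1 +1 VisitDate mmddyy8.;
  format VisitDate mmddyy10.;   /* display the stored day count as a date */
datalines;
123 12/26/07
;
run;
```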

Creating informats: invalue score maps each input value to an output value. The INVALUE statement creates informats. A dollar sign ($) before the name means the informat creates character variables (i.e., the output value is character); without a dollar sign, the informat creates numeric variables (as in this example).

Creating informats: A survey scale was entered as character values (SA = Strongly Agree, A = Agree, etc.), and we want to convert it to a numeric Likert-type scale.

Creating informats: Use PROC FORMAT to create the informat score, then apply the informat while reading in the raw data.

Creating informats: The UPCASE option converts all input strings to uppercase before they are compared to ranges. The JUST option left-justifies all input strings before they are compared to ranges. Both are useful because raw data may be messy (mixed case, leading blanks, etc.).
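A sketch of the score informat, assuming a five-point scale. The slides name only SA and A, so the N, D, and SD codes and all numeric values below are assumptions.

```sas
proc format;
  invalue score (upcase just)   /* uppercase and left-justify input before matching */
    'SA' = 5
    'A'  = 4
    'N'  = 3                    /* assumed code */
    'D'  = 2                    /* assumed code */
    'SD' = 1                    /* assumed code */
    other = .;                  /* anything unrecognized becomes missing */
run;

data work.survey;
  input ID (Q1-Q3) (:score.);   /* read each answer through the score informat */
datalines;
1 SA a N
2 d sd A
;
run;
```

Because of UPCASE, the lowercase answers a, d, and sd still map to 4, 2, and 1.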

Creating informats: A dataset of patient temperature readings. A normal temperature is coded as N, and the actual temperature is entered if it is not normal, so the same field contains both character and numeric data.

Creating informats: Use PROC FORMAT to create the informat tempfmt. Numeric temperatures within the valid range are read as written, N is converted to 98.6, and any other value (including numeric temperatures outside the valid range) is converted to missing.
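A sketch of tempfmt. The slides do not give the valid range, so 95 to 110 is a hypothetical choice, and the UPCASE option is an addition so that a lowercase n is also accepted.

```sas
proc format;
  invalue tempfmt (upcase)
    95 - 110 = _same_   /* hypothetical valid range: read the number as written */
    'N'      = 98.6     /* N means a normal temperature */
    other    = .;       /* anything else, including out-of-range numbers, is missing */
run;

data work.temps;
  input PatientID Temp :tempfmt5.;   /* width 5 leaves room for values like 101.5 */
datalines;
1 101.5
2 N
3 120
4 n
;
run;
```

Patients 2 and 4 should read as 98.6, and patient 3 should be missing because 120 is outside the assumed range.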

Formats within formats: Formats and informats can be nested within each other. This is useful for applying different types of formats (e.g., picture and value formats) to the same variable depending on the data value.

Formats within formats: A phone directory dataset. Some people provided full phone numbers, which we want to show as (999) 999-9999; some provided extensions, which we want to show as x9999; and some have no phone number, which we want to show as Unlisted.

Formats within formats: The format is chosen based on the data value, so the same column can display (503) 373-1793 for a full number and x1793 for an extension.
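The phone-directory code itself is not in this transcription. This is a simpler sketch of the same nesting idea: a value format that hands some ranges to a built-in format written in square brackets. The format name, variable, and data are hypothetical.

```sas
proc format;
  value ratefmt
    .     = 'Not reported'
    0 - 1 = [percent8.1]   /* nested: in-range values use the built-in PERCENT format */
    other = 'Out of range';
run;

data work.rates;
  input Site $ Rate;
  format Rate ratefmt.;
datalines;
A 0.256
B .
C 1.7
;
run;
```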

Saving and storing formats: PROC FORMAT saves user-created formats to a catalog. Usually this catalog is in the WORK library and is deleted at the end of each SAS session. However, formats can easily be saved to a permanent library.

Saving and storing formats: Save a format to a permanent library using the LIBRARY= option. This creates a formats catalog (named FORMATS by default) in the library referenced by mylib.

Saving and storing formats: To use the saved formats in another program, use the FMTSEARCH= system option to add that catalog to the list of catalogs SAS searches for formats.
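A minimal sketch of both halves, assuming a hypothetical folder C:\SASdata\formats for the mylib library:

```sas
/* Program 1: save the format permanently (hypothetical path) */
libname mylib 'C:\SASdata\formats';

proc format library=mylib;   /* writes to the catalog MYLIB.FORMATS */
  value $gender 'M' = 'Male'
                'F' = 'Female';
run;

/* Program 2, in a later session: make the stored formats available */
libname mylib 'C:\SASdata\formats';
options fmtsearch=(mylib);   /* WORK and LIBRARY are still searched first */
```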

CNTLIN/CNTLOUT: You can create a format from a dataset using the CNTLIN= option in PROC FORMAT, and create a dataset from a format using the CNTLOUT= option.

CNTLIN/CNTLOUT: We have a dataset of ICD-9 codes and descriptions and want to convert it to a SAS format.

CNTLIN/CNTLOUT: The input dataset must contain specific variables:
FMTNAME: the name of the format
START: the single value to be formatted (or the start value of a range of values)
END (optional): the end value of a range of values to be formatted
LABEL: the formatted value
TYPE (optional): the type of format, C for character or N for numeric

CNTLIN/CNTLOUT: The original dataset, and the same data ready to be made into a format.

CNTLIN/CNTLOUT: Use PROC FORMAT to convert the dataset to a format.

CNTLIN/CNTLOUT: Use PROC FORMAT to convert a format to a dataset.
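A sketch of the round trip, using two ICD-9-CM codes as sample input. The dataset names, the original variable names (Code, Description), and the format name ICD9F are hypothetical.

```sas
/* Sample code-description pairs; only '|' separates fields */
data work.icd9;
  infile datalines dlm='|';
  length Code $ 6 Description $ 40;
  input Code $ Description $;
datalines;
401.9|Unspecified essential hypertension
486|Pneumonia, organism unspecified
;
run;

/* Rename to the names PROC FORMAT expects and add FMTNAME and TYPE */
data work.icd9_cntlin;
  set work.icd9 (rename=(Code=Start Description=Label));
  retain FmtName 'ICD9F' Type 'C';   /* TYPE C: a character format, used as $ICD9F. */
run;

/* Dataset -> format */
proc format cntlin=work.icd9_cntlin;
run;

/* Format -> dataset */
proc format cntlout=work.icd9_cntlout;
  select $icd9f;   /* write out only this format's definition */
run;
```

Once created, the format can be used like any other, for example put(Code, $icd9f.) to look up a description.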

Transposing data

Transposing data: Transposing converts variables to observations and vice versa. There are multiple ways of restructuring and transposing data: PROC TRANSPOSE, or a DATA step with arrays and DO loops.

Transposing data, basic example: The input (original) dataset has two variables and seven observations; the output (transposed) dataset has seven variables and two observations.

Transposing data, basic example: DATA= names the input (original) dataset, and OUT= names the output (transposed) dataset.

Transposing data, basic example: The names of the transposed variables are stored in the _NAME_ column.

Transposing data, basic example: The new variables are generically named COL1, COL2, etc.
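A sketch of a basic transpose with the same shape, using Eugene's daily highs from the BY-group table in this section (two numeric variables, seven observations):

```sas
data work.eugene;
  input DayOfWeek HighTemp;
datalines;
1 68
2 65
3 66
4 63
5 60
6 63
7 65
;
run;

/* No VAR statement, so every numeric variable is transposed */
proc transpose data=work.eugene out=work.eugene_t;
run;
```

WORK.EUGENE_T has one row per original variable (_NAME_ is DayOfWeek or HighTemp) and the columns COL1 through COL7.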

VAR statement: The VAR statement specifies which variables should be transposed. If it is omitted, PROC TRANSPOSE by default transposes only the numeric variables.

ID statement: The ID statement specifies which variable(s) should be used to name the new columns. If a value is not a valid variable name (e.g., it starts with a number), SAS converts it to a valid name (e.g., by adding a leading underscore).

ID statement: The new variable names can be modified with the PREFIX=, DELIMITER=, or SUFFIX= options.
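A sketch combining VAR, ID, and PREFIX= on Eugene's readings from the BY-group table; the output dataset name is hypothetical.

```sas
data work.eugene;
  input DayOfWeek HighTemp LowTemp;
datalines;
1 68 46
2 65 41
3 66 45
4 63 44
5 60 45
6 63 43
7 65 44
;
run;

proc transpose data=work.eugene out=work.eugene_high prefix=Day;
  var HighTemp;    /* transpose only HighTemp; LowTemp is left out */
  id DayOfWeek;    /* name the new columns from DayOfWeek: Day1-Day7 */
run;
```

Without PREFIX=, the numeric ID values 1 through 7 would be turned into the valid names _1 through _7.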

Transposing data by BY groups: two temperatures (HighTemp and LowTemp) for three cities (Eugene, Portland, and Salem).

BY statement: You can specify more than one BY variable. The data must be sorted by the BY variable(s).

BY statement: input dataset (sorted by City)

DayOfWeek  City      HighTemp  LowTemp
1          Eugene    68        46
2          Eugene    65        41
3          Eugene    66        45
4          Eugene    63        44
5          Eugene    60        45
6          Eugene    63        43
7          Eugene    65        44
1          Portland  62        44
2          Portland  63        43
3          Portland  61        42
4          Portland  62        39
5          Portland  60        44
6          Portland  62        45
7          Portland  66        45
1          Salem     65        45
2          Salem     66        42
3          Salem     62        43
4          Salem     60        41
5          Salem     58        41
6          Salem     62        45
7          Salem     68        46

Output dataset (transposed by City)

City      _NAME_    Day1  Day2  Day3  Day4  Day5  Day6  Day7
Eugene    HighTemp  68    65    66    63    60    63    65
Eugene    LowTemp   46    41    45    44    45    43    44
Portland  HighTemp  62    63    61    62    60    62    66
Portland  LowTemp   44    43    42    39    44    45    45
Salem     HighTemp  65    66    62    60    58    62    68
Salem     LowTemp   45    42    43    41    41    45    46

NAME= option: Use the NAME= option to rename the variable that holds the names of the transposed variables (the _NAME_ column).
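A sketch that reproduces the BY-group transpose from the input table above, adding NAME= so that the _NAME_ column is called Measure. The dataset names and the name Measure are hypothetical.

```sas
data work.temps;
  input DayOfWeek City $ HighTemp LowTemp;
datalines;
1 Eugene 68 46
2 Eugene 65 41
3 Eugene 66 45
4 Eugene 63 44
5 Eugene 60 45
6 Eugene 63 43
7 Eugene 65 44
1 Portland 62 44
2 Portland 63 43
3 Portland 61 42
4 Portland 62 39
5 Portland 60 44
6 Portland 62 45
7 Portland 66 45
1 Salem 65 45
2 Salem 66 42
3 Salem 62 43
4 Salem 60 41
5 Salem 58 41
6 Salem 62 45
7 Salem 68 46
;
run;

/* BY processing requires data sorted by the BY variable */
proc sort data=work.temps;
  by City DayOfWeek;
run;

proc transpose data=work.temps out=work.temps_wide prefix=Day name=Measure;
  by City;                /* a separate set of output rows for each city */
  var HighTemp LowTemp;   /* the two measurements to transpose */
  id DayOfWeek;           /* new columns Day1-Day7 */
run;
```

The result matches the output table above, except that the _NAME_ column is named Measure.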

Saving and storing macros

Saving and storing macros: You need to store and share macro code. There are multiple ways to save and store macros for future use: %INCLUDE, the autocall facility, and the stored compiled macro facility.

Saving and storing macros: Which method to choose depends on your needs and operating environment. SAS recommends: don't store macros while they are still in development; if you are running production-level jobs using name-style macros, consider stored compiled macros; if you are letting a group of users share macros, consider the autocall facility.

LIBNAME trick (redux), from Week 2 (Chapters 3 & 4): Save your commonly used and/or password-protected LIBNAME statements in a text file (using Notepad). Use a %INCLUDE statement to reference the text file at the beginning of every SAS program; SAS includes the code in the text file as if it were part of your program.

%INCLUDE: Save your macro definitions in a text file, and use %INCLUDE to reference the file at the start of every program.

%INCLUDE advantages: It is an easy and straightforward approach, and an excellent first step toward starting a macro library. Disadvantages: The macro definition is compiled every time the %INCLUDE is executed (inefficient). If efficiency is an issue, each file should contain only one macro, which results in multiple files to include. You also need to know where the physical text files are stored.
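A minimal sketch, assuming a hypothetical file C:\SASmacros\mymacros.sas that holds a hypothetical macro named greet:

```sas
/* --- Contents of C:\SASmacros\mymacros.sas (hypothetical) --- */
%macro greet(name);
  %put Hello, &name.!;
%mend greet;

/* --- At the top of each program that needs the macro --- */
%include 'C:\SASmacros\mymacros.sas';   /* compiles every macro defined in the file */
%greet(Oregon)
```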

Autocall facility: An autocall library is a directory containing individual files. It is similar in concept to %INCLUDE, but the files are stored as SAS program files. Each file contains one macro definition, and the name of the file must be the same as the macro name. An autocall library can also be a SAS catalog.

Autocall facility: Save the SAS code for your macro using the macro name as the program file name. To avoid confusion, this folder should contain nothing but autocall macros.

Autocall facility: To use the macro later, reference the folder storing the autocall macros with a fileref created with a FILENAME statement (not a libref!).

Autocall facility: The MAUTOSOURCE option turns on the autocall facility. The MAUTOLOCDISPLAY option (optional) displays the location of the source code in the log when the macro is called. The SASAUTOS= option tells SAS where the autocall macros are stored.

Autocall facility advantages: Macros are stored as SAS code, so you can use the Enhanced Editor to modify them. User-defined macros are stored in a standard location, and there is no need to remember multiple file names when calling macros. The macro code is compiled only the first time it is used in a session (efficient). Autocall macros are easy to share.

Autocall facility disadvantages: Because the macro code is compiled only once per session, the facility can be awkward during the editing phase: changes to the file are not picked up until the macro is recompiled.
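A sketch of the setup, assuming a hypothetical folder C:\SASmacros\autocall that contains a file greet.sas holding only the %macro greet ... %mend greet; definition. Listing the SAS-supplied SASAUTOS fileref after your own keeps SAS's autocall macros available.

```sas
/* Hypothetical folder; greet.sas inside it defines the greet macro */
filename mymacs 'C:\SASmacros\autocall';   /* a fileref, not a libref */

options mautosource                  /* turn on the autocall facility              */
        mautolocdisplay              /* log where each macro's source was found    */
        sasautos=(mymacs sasautos);  /* search our folder, then SAS-supplied macros */

%greet(Oregon)   /* first call: SAS finds greet.sas, compiles it, then runs it */
```

On UNIX hosts, autocall file names must be in lowercase (greet.sas, not Greet.sas).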

Stored compiled macros: Macros are always compiled before they are executed. Compiled macros are stored in a catalog called SASMACR. In a typical session this catalog is stored in the WORK library, but it can be stored in a permanent library for future use.

Stored compiled macros: Create a library to store the SASMACR catalog.

Stored compiled macros: The MSTORED option turns the stored compiled macro facility on, and the SASMSTORE= option identifies the library where the SASMACR catalog will be stored.

Stored compiled macros: Submit the definition of the macro you want to store. The STORE option tells SAS to store this macro; the SOURCE option (optional) stores the source code with the compiled code; the DES= option (optional) assigns a descriptive title to the macro's entry in the SAS catalog.

Stored compiled macros: To use the macro later, the MSTORED option turns the facility on and the SASMSTORE= option identifies the library where the SASMACR catalog is stored.

Stored compiled macros: The SASMACR catalog is available to view in the Explorer window, and the description is stored as an entry property (right-click, then Properties).

Stored compiled macros: If the SOURCE option was used when the macro was stored, the source code can be retrieved with %COPY (the code is printed to the log).
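A sketch of storing, reusing, and retrieving a macro, assuming a hypothetical folder C:\SASmacros\compiled and a hypothetical macro named greet:

```sas
/* Session 1: compile and store the macro (hypothetical path) */
libname macstore 'C:\SASmacros\compiled';
options mstored sasmstore=macstore;   /* compiled macros go to MACSTORE.SASMACR */

%macro greet(name) / store source des='Prints a greeting to the log';
  %put Hello, &name.!;
%mend greet;

/* Session 2: use the stored macro without recompiling it */
libname macstore 'C:\SASmacros\compiled';
options mstored sasmstore=macstore;
%greet(Oregon)

/* Write the saved source code to the log (works only because SOURCE was used) */
%copy greet / library=macstore source;
```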

Stored compiled macros advantages: Macro programs are compiled only once, and compiling and storing is faster. You can store more than one macro per catalog, and keeping track of macros is easy. The source code does not have to be stored with the SASMACR catalog, but for maintenance purposes storing it is recommended.

Stored compiled macros disadvantages: You cannot recreate source statements from a compiled macro (unless the SOURCE option was used). Stored compiled macros cannot be moved directly to other operating systems; they must be saved and recompiled under the new operating system at any new location. They may also need to be recompiled for new releases of SAS.

Saving and storing macros: If macros are stored in multiple locations, SAS searches for macro definitions in this order: the WORK.SASMACR catalog, then stored compiled macros, then autocall macros.

Additional Reading
Missover, Truncover, and Pad, Oh My!! or Making Sense of the Infile and Input Statements
Yes We Can Save SAS Formats
Learn the Basics of Proc Transpose
Turning the Data Around: Proc Transpose and Alternative Approaches
Use of a Macro to Revise Data
Creating a Stored Macro Facility in Ten Minutes
Ways to Store Macro Source Codes and How to Retrieve Them
Building and Using Macro Libraries

You did it! That's all, folks!