SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015 Module 2


Department of Mathematics and Statistics
Phone: 4-3620
Office: Parker 364-A
E-mail: carpedm@auburn.edu
Web: http://www.auburn.edu/~carpedm/stat6110

TOPICS
- Introduction to the SAS windowing environment (Log, Editor, and Output windows)
- Introduction to SAS Help screens (online and within the SAS system)
- Introduction to the SAS DATA step
- SAS LIBNAMEs
- SAS INFILE

Rules for SAS Statements
- SAS statements end with a semicolon.
- You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
- You can begin SAS statements in any column of a line and write several statements on the same line.
- You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
- Words in SAS statements are separated by blanks or by special characters (such as the equal sign and the minus sign in the calculation of the Loss variable in the WEIGHT_CLUB example).
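The rules above can be seen in a short sketch. The data set name RULES_DEMO and its variables are hypothetical. Two statements share the first line, the TOTAL assignment continues onto a second line, and X and y refer to the same variables as x and Y because SAS variable names are not case sensitive.

```sas
/* Hypothetical data set RULES_DEMO, used only to illustrate the rules. */
data rules_demo; x = 1; Y = 2;   /* several statements on one line      */
   TOTAL = X
           + y;                  /* one statement continued on a second
                                    line; X is x and y is Y             */
run;

proc print data=rules_demo;
run;
```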

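Comment Statements
Comment statements document the purpose of individual programming statements or of the overall program. They are helpful reminders to the programmer and help users implement the program. A /*message*/ comment can appear anywhere in a program, even in the middle of a statement. A *message; comment is itself a statement: it must begin where a new statement could begin, and it ends at the first semicolon.
Syntax: *message; or /*message*/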

Comment Statements (cont.)
Example:
/* the following lines produce summary statistics */
or
*the following lines produce summary statistics;

Comment Statements (cont.)
Example:
/* Author: John Smith
   Assignment: Homework 1
   Due Date: 9/21/04 */

Comment Statements (cont.)
Example:
/**************************
 * Author: John Smith     *
 * Assignment: Homework 1 *
 * Due Date: 9/21/04      *
 *************************/

Comment Statements (cont.)
NOTE: All programs turned in for homework assignments must start with a preamble:
/**************************
 * Author: John Smith     *
 * Assignment: Homework 1 *
 * Due Date: 9/21/04      *
 *************************/

Comment Statements (cont.)
Example:
* Author: John Smith;
* Assignment: Homework 1;
* Due Date: 9/21/04;
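Both comment styles can sit side by side in one program. The sketch below opens with the homework preamble and uses a hypothetical data set, COMMENT_DEMO. SAS ignores the comments, so the program runs exactly as it would without them.

```sas
/**************************
 * Author: John Smith     *
 * Assignment: Homework 1 *
 * Due Date: 9/21/04      *
 *************************/

* the following lines create a small hypothetical data set;
data comment_demo;
   input Id Score;   /* a block comment can sit beside a statement */
   datalines;
1 90
2 85
;

/* the following lines produce summary statistics */
proc means data=comment_demo;
   var Score;
run;
```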

INTRODUCTION TO THE SAS DATA STEP
1. Click Help, then SAS Help and Documentation.
2. Click the Contents tab.
3. Click SAS Products, then Base SAS.
4. Click Step-by-Step Programming with Base SAS Software.

SAS BASE PROGRAMMING
The DATA step is one of the basic building blocks of SAS programming. It creates the data sets that are used in a SAS program's analysis and reporting procedures. Understanding the basic structure, functioning, and components of the DATA step is fundamental to learning how to create your own SAS data sets.

SAS DATA SETS AND DATA STEPS
In this section, you will learn the following:
- what a SAS data set is and why it is needed
- how the DATA step works
- what information you have to supply to SAS so that it can construct a SAS data set for you

ANATOMY OF A DATA STEP
Creating a SAS data set from scratch using the DATALINES statement:
DATA weight_club;
   INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
   Loss=StartWeight-EndWeight;
   DATALINES;
1023 David Shaw         red 189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red 210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red 127 118
;
Because Id and Name are read with column input, the data lines must keep Id in columns 1-4 and Name in columns 6-24.

ANATOMY OF A DATA STEP: WHAT EACH PART DOES
1. The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB.
2. The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them (Id, Name, Team, StartWeight, and EndWeight).
3. The third statement is an assignment statement. It calculates the weight each person lost and assigns the result to a new variable, Loss.
4. The DATALINES statement indicates that data lines follow.
5. The data lines follow the DATALINES statement. This approach to processing raw data is useful when you have only a few lines of data. (Later sections show ways to access larger amounts of data that are stored in files.)
6. The DATALINES statement marks the beginning of the input data. The single semicolon marks the end of the input data and the DATA step.

NAMING CONVENTIONS
Rules for Most SAS Names
SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
- A SAS name can contain from 1 to 32 characters.
- The first character must be a letter or an underscore (_).
- Subsequent characters must be letters, numbers, or underscores.
- Blanks cannot appear in SAS names.
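To check a candidate name against these rules, you can call the NVALID function with the 'V7' option, which applies the standard SAS variable-naming rules. It returns 1 for a valid name and 0 otherwise. The candidate names and the data set NAME_CHECK below are made up for illustration.

```sas
/* Hypothetical data set NAME_CHECK holding one test result per name. */
data name_check;
   ok1  = nvalid('weight_club', 'V7');   /* letters and an underscore   */
   ok2  = nvalid('_temp2', 'V7');        /* may start with underscore   */
   bad1 = nvalid('2ndWeight', 'V7');     /* starts with a digit         */
   bad2 = nvalid('start weight', 'V7');  /* contains a blank            */
   put ok1= ok2= bad1= bad2=;
run;
```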

NAMING CONVENTIONS
Special Rules for Variable Names
For variable names only, SAS remembers the combination of uppercase and lowercase letters that you use when you create the variable. Internally, the case of letters does not matter: "CAT," "cat," and "Cat" all represent the same variable. For presentation, however, SAS uses the case from the variable's first appearance when it prints the variable name.

STAT 6110 SOME SAS BASE PROCEDURES
OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
PROC PRINT DATA=weight_club;
   title 'Health Club Data';
run;

STAT 6110 SOME SAS BASE PROCEDURES
options linesize=80 pagesize=60 pageno=1 nodate;
PROC PRINT DATA=weight_club;
   TITLE 'Health Club Data';
RUN;

STAT 6110 SOME SAS BASE PROCEDURES
OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
PROC TABULATE DATA=weight_club;
   CLASS team;
   VAR StartWeight EndWeight Loss;
   TABLE team, mean*(startweight EndWeight Loss);
   TITLE1 'Mean Starting Weight, Ending Weight,';
   TITLE2 'and Weight Loss';
RUN;
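As a cross-check on the table, PROC MEANS can compute the same team means and store them in a data set. This sketch uses a shortened copy of the data under the hypothetical name CLUB_MEANS_DEMO, with the names left out so that plain list input can read every field. For these five records, the mean Loss works out to 17 for red ((24 + 18 + 9)/3) and 19 for yellow ((21 + 17)/2).

```sas
/* Hypothetical copy of WEIGHT_CLUB without names (list input only). */
data club_means_demo;
   input Id Team $ StartWeight EndWeight;
   Loss = StartWeight - EndWeight;
   datalines;
1023 red 189 165
1049 yellow 145 124
1219 red 210 192
1246 yellow 194 177
1078 red 127 118
;

/* NWAY keeps only the one-row-per-team summaries. */
proc means data=club_means_demo noprint nway;
   class Team;
   var Loss;
   output out=team_means mean=MeanLoss;
run;

proc print data=team_means;
   var Team _FREQ_ MeanLoss;
run;
```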

SAS module
1. Create a directory on your hard drive called c:\sasfiles.
2. Save the SAS programs to your local directory:
   a. module1_example1.sas
   b. module1_example2.sas
   c. module1_example3.sas
   d. module1_example4.sas
   e. module1_example5.sas
3. Save the text files module1_text1.txt and module1_text2.txt, and the Excel file classroll_example.xls, to the c:\sasfiles directory.
4. Open SAS and go to the Editor window.
5. Follow the professor's instructions on how to open and run these programs.
6. Replicate these steps at home and make sure you can open and run SAS programs before the next class meeting.

SAS Data Set from an Existing SAS Data Set
DATA weight_club;
   INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
   DATALINES;
1023 David Shaw         red 189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red 210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red 127 118
;
DATA weight2;        *DATA statement tells SAS to begin building a SAS data set named weight2;
   SET weight_club;  *SET statement tells SAS which existing data set to read from;
RUN;                 *RUN statement tells SAS that you are at the end of this DATA step;
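A DATA step that reads an existing data set with SET can also add new variables. In this sketch, the names CLUB_SMALL and CLUB_SMALL2 are hypothetical and the data are two shortened records. Loss is computed during the copy, so it exists in CLUB_SMALL2 but not in CLUB_SMALL.

```sas
/* Hypothetical, shortened copy of the club data (no names). */
data club_small;
   input Id Team $ StartWeight EndWeight;
   datalines;
1023 red 189 165
1049 yellow 145 124
;

/* SET reads CLUB_SMALL one observation at a time; the assignment
   statement adds Loss to the new data set only. */
data club_small2;
   set club_small;
   Loss = StartWeight - EndWeight;
run;
```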

Temporary SAS datasets and the WORK Directory
Both SAS data sets, weight_club and weight2, are temporary SAS data sets. Temporary SAS data sets can be referenced and used only during the SAS session in which they were created; SAS deletes them when the session ends. Temporary SAS data sets are stored in the temporary SAS library that SAS calls WORK.
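A one-level name such as weight2 is shorthand for the two-level name WORK.weight2. The sketch below uses hypothetical data set names. It writes one data set with each form of name, then uses the EXIST function (1 when the data set exists) to confirm that both are in the WORK library.

```sas
/* One-level name: stored as WORK.CLUB_DEMO. */
data club_demo;
   input Id StartWeight EndWeight;
   datalines;
1023 189 165
1049 145 124
;

/* Two-level name pointing at the same WORK library. */
data work.club_demo2;
   set club_demo;          /* same as SET work.club_demo; */
run;

data _null_;
   in_work = exist('work.club_demo');
   put in_work=;
run;
```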

Temporary SAS datasets and the WORK Directory
[Screenshot: SAS session for STAT 5110/6110: SAS Programming and Applications, module 2]

Temporary SAS datasets and the WORK Directory
[Screenshot: module 2, STAT 6110, with the callout "Double-click"]

Temporary SAS datasets and the WORK Directory
[Screenshot: list of SAS data sets]

Permanent SAS datasets and user-defined SAS Libraries
The LIBNAME statement is used to define a permanent SAS library with a name of the user's choosing. The SAS library is mapped to a specific folder on the user's hard drive.

Permanent SAS datasets and user-defined SAS Libraries
Syntax:
LIBNAME libref 'SAS-data-library';

LIBNAME: the LIBNAME statement tells SAS that you are going to create or reference a SAS library mapped to a specific location on the hard drive.

libref: a library name chosen by the user, written in place of the word libref. A libref can be at most 8 characters long and must follow the SAS naming rules.

'SAS-data-library': in quotes, the user tells SAS where the files will be kept. This is a specific folder that must already exist on the user's hard drive.
Example:
LIBNAME stat6110 'c:\sasfiles';


Permanent SAS datasets and user-defined SAS Libraries
Programming statements:
LIBNAME stat6110 'c:\sasfiles';
DATA stat6110.weight_club;
   SET weight2;
RUN;

SAS log:
85
86   LIBNAME stat6110 'c:\sasfiles';
NOTE: Libref STAT6110 was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\sasfiles
87
88   DATA stat6110.weight_club;
89   SET weight2;
90   RUN;

In these statements, LIBNAME stat6110 'c:\sasfiles'; creates a library called stat6110 and maps it to c:\sasfiles.

The DATA step (DATA stat6110.weight_club; SET weight2; RUN;) creates a permanent SAS data set called weight_club. SAS refers to it through the stat6110 library as stat6110.weight_club, but the actual file (weight_club.sas7bdat) is stored on the hard drive in c:\sasfiles.
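Before you write to a permanent library, it can help to confirm that the libref was assigned. This sketch assumes that c:\sasfiles already exists, as in the setup steps. The LIBREF function returns 0 when the libref has been assigned and a nonzero code otherwise. The two-record data set CLUB_PERM is hypothetical.

```sas
/* Assumes the folder c:\sasfiles already exists. */
libname stat6110 'c:\sasfiles';

data _null_;
   rc = libref('stat6110');
   if rc = 0 then put 'NOTE: STAT6110 is assigned.';
   else put 'WARNING: STAT6110 is not ready. Does c:\sasfiles exist? ' rc=;
run;

/* Hypothetical permanent data set, stored as a file in c:\sasfiles. */
data stat6110.club_perm;
   input Id StartWeight EndWeight;
   datalines;
1023 189 165
1049 145 124
;
```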

Permanent SAS datasets and user-defined SAS Libraries
[Screenshot: the new SAS library mapped to c:\sasfiles, containing the permanent SAS data set]

FREE FORMAT DATA CREATION
If the raw data are in rectangular form, where columns represent variables and rows represent observations, and the values are separated by spaces, then the SAS data set can be created (using DATALINES, INFILE, etc.) with list input, without specifying column positions.

FREE FORMAT AND COMMA-DELIMITED FILES
DATA1 and DATA2 are identical:
DATA DATA1;
   INPUT ID Age savings;
   DATALINES;
1 25 4000
2 33 1000
3 32 8000
4 26 1500
;
DATA DATA2;
   INFILE datalines delimiter=',';
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;
The INFILE statement is used when we need to tell SAS about special features of the data or about special locations (external files).
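PROC COMPARE offers one way to confirm that DATA1 and DATA2 really are identical. The sketch repeats the two DATA steps and then compares them. PROC COMPARE stores a return code in the automatic macro variable SYSINFO, where 0 means that no differences were found. The name COMPARE_RC is an arbitrary choice; it saves the code before another step can overwrite it.

```sas
data data1;
   input ID Age savings;
   datalines;
1 25 4000
2 33 1000
3 32 8000
4 26 1500
;

data data2;
   infile datalines delimiter=',';
   input ID Age savings;
   datalines;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;

proc compare base=data1 compare=data2;
run;
%let compare_rc = &sysinfo;   /* 0 = no differences found */
%put NOTE: PROC COMPARE return code = &compare_rc;
```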

DSD versus delimiter=','
DATA2 and DATA2b are identical:
DATA DATA2;
   INFILE datalines delimiter=',';
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;
Both DSD and delimiter=',' set the comma as the delimiter for this data set:
DATA DATA2b;
   INFILE datalines DSD;
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;
The DSD option sets the comma as the default delimiter.

DSD versus delimiter=','
DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values. The DATA2b step above reads its comma-separated lines this way.
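The two DSD behaviors described above are easiest to see in data that actually contain them. In this sketch, which uses a hypothetical data set DSD_DEMO, the first record has a quoted value with a comma inside it and the second has two consecutive commas. The colon modifier (:$20.) lets Name hold up to 20 characters while still reading only up to the next delimiter.

```sas
/* Hypothetical data: a quoted value containing a comma, then an
   empty field between two consecutive commas. */
data dsd_demo;
   infile datalines dsd;
   input ID Name :$20. savings;
   datalines;
1,"Smith, John",4000
2,,1000
;

proc print data=dsd_demo;   /* Name is blank (missing) for ID 2 */
run;
```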

FREE FORMAT DATA CREATION
Other Delimiters
DATA DATA3;
   INFILE datalines delimiter='8';
   INPUT first$ last$;
   DATALINES;
John8Smith
Bill8Johnson
Alice8Bening
;
DATA DATA4;
   INFILE datalines delimiter='*';
   INPUT ID Age savings;
   DATALINES;
1*25*4000
2*33*1000
3*32*8000
4*26*1500
;
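The DELIMITER= (or DLM=) option also accepts a list of characters, and each character in the list acts as a delimiter on its own. The sketch below uses hypothetical records that mix * and , on the same line, under the hypothetical data set name DATA4B.

```sas
/* Each character in the DLM= list is a separate delimiter. */
data data4b;
   infile datalines dlm='*,';
   input ID Age savings;
   datalines;
1*25,4000
2,33*1000
;
```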

STAT 6110 READING CHARACTER VARIABLES
DATA DATA5;
   INPUT ID Age Gender$ Savings;
   DATALINES;
1 25 Male 4000
2 33 Female 1000
3 32 Male 8000
4 26 Male 1500
;
The dollar sign ($) tells SAS that the variable to be read is a character variable.
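With list input, a character variable gets a default length of 8, so longer values are truncated. This sketch reads the same hypothetical records twice, into the hypothetical data sets TEAM_SHORT and TEAM_LONG: once with the default length and once with a LENGTH statement placed before INPUT.

```sas
/* Default length 8: 'Independents' is stored as 'Independ'. */
data team_short;
   input ID Team $;
   datalines;
1 Independents
2 Red
;

/* LENGTH must come before INPUT to take effect. */
data team_long;
   length Team $ 12;
   input ID Team $;
   datalines;
1 Independents
2 Red
;
```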

THE MISSOVER OPTION
This example demonstrates how to prevent missing values from causing problems when you read the data with list input. The first data line in this example contains only three of the five temperature values. Use the MISSOVER option so that the values that are not found are set to missing. weather1 and weather2 are identical:
DATA weather1;
   INFILE datalines missover;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;
DATA weather2;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3 . .
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

THE MISSOVER OPTION (cont.)
MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing. In weather2, the missing values are written explicitly as periods instead, so every line already has five values and no MISSOVER option is needed.
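For contrast, the sketch below reads the weather1 data without MISSOVER, which leaves the default FLOWOVER behavior in place. The data set name WEATHER3 is hypothetical. Because the first line has only three values, INPUT takes temp4 and temp5 from the second line and discards the rest of that line. As a result, only two observations are created, and SAS writes a note to the log saying that it went to a new line.

```sas
/* No MISSOVER: INPUT flows over to the next line for temp4 and temp5. */
data weather3;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

proc print data=weather3;
run;
```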

Using the INFILE statement (Reading External Text Files)
To find more information on INFILE: while in the Editor window of a SAS session, go to Help and click the Index tab. Type the word infile in the keyword box, then double-click INFILE in the results.

Using the INFILE statement (Reading External Text Files)
Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records. Usually, you use an INFILE statement to read data from an external file. When data is read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step.

Using the INFILE statement (Reading External Text Files)
Reading Multiple Input Files
You can read from multiple input files in a single iteration of the DATA step by using multiple INFILE statements.
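Here is a minimal sketch of reading two files in one iteration. It assumes two small hypothetical files, which it first writes to the WORK folder so the example runs on any machine. Record n of names.txt and record n of weights.txt describe the same person. Each iteration reads one record from each file, and the step ends when an INPUT statement tries to read past the end of either file. The filerefs NAMES and WEIGHTS and the data set COMBINED are hypothetical.

```sas
/* Hypothetical raw files, created in the WORK folder for the demo. */
filename names   "%sysfunc(pathname(work))/names.txt";
filename weights "%sysfunc(pathname(work))/weights.txt";

data _null_;
   file names;
   put '1023 David';
   put '1049 Amelia';
run;

data _null_;
   file weights;
   put '189 165';
   put '145 124';
run;

/* Two INFILE statements: the second switches the input source
   before the second INPUT statement in the same iteration. */
data combined;
   infile names;
   input Id Name $;
   infile weights;
   input StartWeight EndWeight;
run;
```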

Using the INFILE statement (Reading External Text Files)
C:\module2_text1.txt:
France,575,Express,10
Spain,510,World,12
Brazil,540,World,6
India,489,Express,.

C:\module2_text2.txt:
Japan,720,Express,10
Greece,698,Express,20
New Z.,1489, Southsea,6
Venez.,425,World,8
Italy,468,Express,9
USSR,924,World,6
Switz.,734,World,20
Austral.,1079,Southsea,10
Ireland,558,Express,9

Using the INFILE statement (Reading External Text Files)
SAS program that reads C:\module2_text1.txt and C:\module2_text2.txt and combines them:
DATA DATA1;
   INFILE 'c:\module2_text1.txt' DSD;
   INPUT country$ cost vendor$ number;
RUN;
DATA DATA2;
   INFILE 'c:\module2_text2.txt' DSD;
   INPUT country$ cost vendor$ number;
RUN;
DATA DATA3;   *stacks the observations of DATA1 followed by those of DATA2;
   SET DATA1 DATA2;
RUN;

THE DOUBLE TRAILING @ (@@)
Sometimes you may need to create multiple observations from a single record of raw data. One way to tell SAS how to read such a record is to use the other line-hold specifier (besides the single trailing @): the double trailing at-sign (@@ or "double trailing @"). The double trailing @ not only prevents SAS from reading a new record into the input buffer when a new INPUT statement is encountered, but also prevents the record from being released when the program returns to the top of the DATA step.
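Here is a short illustration using hypothetical pairs of x and y values packed several to a line, read into a hypothetical data set PAIRS. With @@, each pair becomes its own observation, even though the first data line holds three pairs.

```sas
/* @@ holds each line across DATA step iterations, so one line
   can produce several observations. */
data pairs;
   input x y @@;
   datalines;
1 2 3 4 5 6
7 8
;

proc print data=pairs;   /* four observations: (1,2) (3,4) (5,6) (7,8) */
run;
```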

END OF