SAS PROGRAMMING AND APPLICATIONS (STAT 5110/6110): FALL 2015
Department of Mathematics and Statistics
Phone: 4-3620
Office: Parker 364-A
E-mail: carpedm@auburn.edu
Web: http://www.auburn.edu/~carpedm/stat6110

TOPICS
- Introduction to the SAS Windows environment (log, editor, and output screens)
- Introduction to the SAS Help screens (on-line and within the SAS system)
- Introduction to the SAS DATASTEP
- SAS LIBNAMES
- SAS INFILE

Rules for SAS Statements
- SAS statements end with a semicolon.
- You can enter SAS statements in lowercase, uppercase, or a mixture of the two.
- You can begin SAS statements in any column of a line and write several statements on the same line.
- You can begin a statement on one line and continue it on another line, but you cannot split a word between two lines.
- Words in SAS statements are separated by blanks or by special characters (such as the equal sign and the minus sign in the calculation of the Loss variable in the WEIGHT_CLUB example).
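
As a small illustration of these rules, the sketch below mixes letter case, puts two statements on one line, and continues one statement on a second line. The data set name layout_demo and its variables are made up for this example and are not part of the course files.

```sas
data layout_demo;                    /* lowercase keyword */
   INPUT x y;                        /* uppercase keyword */
   Total = x + y; Diff = x - y;      /* two statements on one line */
   Ratio = x /
           y;                        /* one statement continued on a second line */
   DATALINES;
4 2
9 3
;

Proc Print Data=layout_demo;         /* mixed case */
Run;
```

Each statement still ends with its own semicolon, which is what lets SAS tell the statements apart regardless of how they are laid out.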

Comment Statements
- Document the purpose of the programming statements or of the overall program.
- Can appear anywhere in the program.
- Are helpful reminders to the programmer and assist the user in implementing the program.

Syntax:

*message;
or
/*message*/

Comment Statements (cont)
Example:

/* the following lines produce summary statistics */
or
*the following lines produce summary statistics;

Comment Statements (cont)
Example:

/* Author: John Smith
   Assignment: Homework 1
   Due Date: 9/21/04 */

Comment Statements (cont)
Example:

/*************************
* Author: John Smith     *
* Assignment: Homework 1 *
* Due Date: 9/21/04      *
*************************/

Comment Statements (cont)
NOTE: All programs turned in for homework assignments must start with a preamble:

/*************************
* Author: John Smith     *
* Assignment: Homework 1 *
* Due Date: 9/21/04      *
*************************/

Comment Statements (cont)
Example:

* Author: John Smith;
* Assignment: Homework 1;
* Due Date: 9/21/04;

INTRODUCTION TO THE SAS DATASTEP
1. Click on Help, then SAS Help and Documentation.
2. Click the Contents tab.
3. Click SAS Products, then Base SAS.
4. Click Step-by-Step Programming with Base SAS Software.

SAS BASE PROGRAMMING
The DATA step is one of the basic building blocks of SAS programming. It creates the data sets that are used in a SAS program's analysis and reporting procedures. Understanding the basic structure, functioning, and components of the DATA step is fundamental to learning how to create your own SAS data sets.

SAS DATA SETS AND DATASTEPs
In this section, you will learn the following:
- what a SAS data set is and why it is needed
- how the DATA step works
- what information you have to supply to SAS so that it can construct a SAS data set for you

ANATOMY OF A DATASTEP
Creating a SAS data set from scratch using the DATALINES statement:

DATA weight_club;
INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
Loss=StartWeight-EndWeight;
DATALINES;
1023 David Shaw          red 189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          red 210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     red 127 118
;

1. The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB.
2. The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them (Id, Name, Team, StartWeight, and EndWeight). Id and Name are read by column position, so each data line must keep Id in columns 1-4 and Name in columns 6-24.
3. The third statement is an assignment statement. It calculates the weight each person lost and assigns the result to a new variable, Loss.
4. The DATALINES statement indicates that data lines follow.
5. The data lines follow the DATALINES statement. This approach to processing raw data is useful when you have only a few lines of data. (Later sections show ways to access larger amounts of data that are stored in files.)
6. The DATALINES statement marks the beginning of the input data. The single semicolon marks the end of the input data and of the DATA step.

NAMING CONVENTIONS
Rules for Most SAS Names
SAS names are used for SAS data set names, variable names, and other items. The following rules apply:
- A SAS name can contain from one to 32 characters.
- The first character must be a letter or an underscore (_).
- Subsequent characters must be letters, numbers, or underscores.
- Blanks cannot appear in SAS names.
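
The naming rules can be checked with a short sketch like the one below. All names in it are invented for illustration. The invalid names are kept inside a comment so that the program still runs.

```sas
/* Valid names: start with a letter or underscore,
   then letters, digits, or underscores. */
DATA _club_2015;
   start_weight = 189;
   EndWeight2   = 165;
   _loss        = start_weight - EndWeight2;
RUN;

/* Invalid names (commented out on purpose):
   2015club      starts with a digit
   start weight  contains a blank
   end-weight    contains a character that is not a letter, digit, or underscore
*/
```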

NAMING CONVENTIONS
Special Rules for Variable Names
For variable names only, SAS remembers (labels) the combination of uppercase and lowercase letters that you use when you create the variable name. Internally, the case of letters does not matter: "CAT," "cat," and "Cat" all represent the same variable. For presentation purposes, however, SAS remembers (labels) the initial case of each letter and uses it to represent the variable name when printing it.

SOME SAS BASE PROCEDURES

OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
PROC PRINT DATA=weight_club;
   title 'Health Club Data';
run;

SOME SAS BASE PROCEDURES

options linesize=80 pagesize=60 pageno=1 nodate;
PROC PRINT DATA=weight_club;
   TITLE 'Health Club Data';
RUN;

SOME SAS BASE PROCEDURES

OPTIONS linesize=80 pagesize=60 pageno=1 nodate;
PROC TABULATE DATA=weight_club;
   CLASS team;
   VAR StartWeight EndWeight Loss;
   TABLE team, mean*(StartWeight EndWeight Loss);
   TITLE1 'Mean Starting Weight, Ending Weight,';
   TITLE2 'and Weight Loss';
RUN;

SAS module
1. Create a directory on your hard drive called c:\sasfiles
2. Save the SAS programs to your local directory:
   a. module1_example1.sas
   b. module1_example2.sas
   c. module1_example3.sas
   d. module1_example4.sas
   e. module1_example5.sas
3. Save the text files, module1_text1.txt and module1_text2.txt, and the Excel file classroll_example.xls to the c:\sasfiles directory.
4. Open SAS and go to the editor.
5. Follow the Professor's instructions on how to open and run these programs.
6. Replicate these steps at home and make sure you can open and run SAS programs before the next class meeting.

SAS DataSet from Existing SAS DataSet

DATA weight_club;
INPUT Id 1-4 Name $ 6-24 Team $ StartWeight EndWeight;
DATALINES;
1023 David Shaw          red 189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          red 210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     red 127 118
;

*DATA statement tells SAS to begin building a SAS data set named weight2;
DATA weight2;
*SET statement tells SAS from which existing dataset to begin;
SET weight_club;
*RUN statement tells SAS that you are at the end of this DATASTEP;
RUN;

Temporary SAS datasets and the WORK Directory
- Both SAS datasets, weight_club and weight2, are temporary SAS datasets.
- Temporary SAS datasets can be referenced and used only within the SAS session in which they were created.
- Temporary SAS datasets are stored in the temporary SAS library that SAS calls WORK.
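
A quick way to see that these datasets live in WORK is sketched below. It assumes weight_club and weight2 were created earlier in the same session and that the default setup is in use, in which a one-level name such as weight2 refers to WORK.weight2.

```sas
/* These two steps print the same temporary dataset. */
PROC PRINT DATA=weight2;
RUN;

PROC PRINT DATA=work.weight2;
RUN;

/* List the datasets currently stored in the WORK library. */
PROC CONTENTS DATA=work._ALL_ NODS;
RUN;
```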

Temporary SAS datasets and the WORK Directory
[Screenshots of the SAS window; callouts: "Double-click" and "List of SAS Datasets"]

Permanent SAS datasets and user defined SAS Libraries
- The LIBNAME statement is used to define a permanent SAS library with a name of the user's choosing.
- The SAS library is mapped to a specific folder located on the user's hard drive.

Permanent SAS datasets and user defined SAS Libraries
Syntax:

LIBNAME libref 'SAS-data-library';

- LIBNAME: tells SAS you are going to create or reference a SAS library mapped to a specific location on the hard drive.
- libref: the user-defined library name. In place of libref, the user supplies a name of their choosing.
- 'SAS-data-library': in quotes, the user tells SAS where the files will be kept. This is a specific folder that must already exist on the user's hard drive.

Example:

LIBNAME stat6110 'c:\sasfiles';

Here c:\sasfiles must already exist on the hard drive.

Permanent SAS datasets and user defined SAS Libraries
Programming Statements:

LIBNAME stat6110 'c:\sasfiles';
DATA stat6110.weight_club;
   SET weight2;
RUN;

SAS log file:

85
86   LIBNAME stat6110 'c:\sasfiles';
NOTE: Libref STAT6110 was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\sasfiles
87
88   DATA stat6110.weight_club;
89   SET weight2;
90   RUN;

Permanent SAS datasets and user defined SAS Libraries
Programming Statements:

LIBNAME stat6110 'c:\sasfiles';
DATA stat6110.weight_club;
   SET weight2;
RUN;

- The LIBNAME statement creates a library called stat6110, which is mapped to c:\sasfiles.
- The DATA step creates a permanent SAS dataset called weight_club. Within SAS it is referenced through the stat6110 library, but the actual file is located on the hard drive in c:\sasfiles.

Permanent SAS datasets and user defined SAS Libraries
[Screenshot of the SAS window; callouts: "New SAS Library mapped to c:\sasfiles" and "Permanent SAS dataset"]
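
To see why the dataset is called permanent, the sketch below shows what a later SAS session might look like. It assumes the earlier steps have already written weight_club to c:\sasfiles. The title text is made up for illustration.

```sas
/* A new session has no stat6110 libref yet, so assign it again. */
LIBNAME stat6110 'c:\sasfiles';

/* The dataset is read straight from the folder; no DATA step is needed. */
PROC PRINT DATA=stat6110.weight_club;
   TITLE 'Permanent copy of the Health Club Data';
RUN;

/* List every dataset stored in the library. */
PROC CONTENTS DATA=stat6110._ALL_ NODS;
RUN;
```

Only the libref has to be re-created in each session. The dataset file itself stays in the folder until it is deleted or overwritten.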

FREE FORMAT DATA CREATION
If the raw data are in rectangular format, where columns represent variables, rows represent observations, and the values are separated by spaces, then the SAS dataset can be created (using DATALINES, INFILE, etc.) without specifying column positions.

FREE FORMAT AND COMMA DELIMITED FILES
DATA1 and DATA2 are identical.

DATA DATA1;
   INPUT ID Age savings;
   DATALINES;
1 25 4000
2 33 1000
3 32 8000
4 26 1500
;

DATA DATA2;
   INFILE datalines delimiter=',';
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;

The INFILE statement is used when we need to tell SAS about special features of the data or special locations (external files).

DSD versus delimiter=','
DATA2 and DATA2b are identical.

DATA DATA2;
   INFILE datalines delimiter=',';
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;

DATA DATA2b;
   INFILE datalines DSD;
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;

The DSD option and delimiter=',' both set the comma as the delimiter for this dataset. With DSD, the comma is the default delimiter.

DSD versus delimiter=','
DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.

DATA DATA2b;
   INFILE datalines DSD;
   INPUT ID Age savings;
   DATALINES;
1,25,4000
2,33,1000
3,32,8000
4,26,1500
;
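
The difference between DSD and a plain delimiter=',' only shows up when values are quoted or missing. The sketch below uses invented data to illustrate this. The dataset name dsd_demo, the City variable, and the :$20. informat (used so that City is not cut off at the default length of 8 characters) are assumptions added for this example.

```sas
DATA dsd_demo;
   INFILE datalines DSD;
   INPUT ID City :$20. Savings;
   DATALINES;
1,"Auburn, AL",4000
2,,1000
3,"Opelika",8000
;

PROC PRINT DATA=dsd_demo;
RUN;
```

With DSD, the comma inside "Auburn, AL" is kept as part of the value and the quotation marks are removed, while the two consecutive commas in the second record leave City missing for ID 2.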

FREE FORMAT DATA CREATION
Other Delimiters

DATA DATA3;
   INFILE datalines delimiter='8';
   INPUT first$ last$;
   DATALINES;
John8Smith
Bill8Johnson
Alice8Bening
;

DATA DATA4;
   INFILE datalines delimiter='*';
   INPUT ID Age savings;
   DATALINES;
1*25*4000
2*33*1000
3*32*8000
4*26*1500
;

The INFILE statement is used when we need to tell SAS about special features of the data or special locations (external files).

READING CHARACTER VARIABLES

DATA DATA5;
   INPUT ID Age Gender$ Savings;
   DATALINES;
1 25 Male 4000
2 33 Female 1000
3 32 Male 8000
4 26 Male 1500
;

The dollar sign, $, tells SAS that the variable to be read is a character variable.

MISSOVER OPTION
This example demonstrates how to prevent missing values from causing problems when you read the data with list input. Some data lines in this example contain fewer than five temperature values. Use the MISSOVER option so that these values are set to missing.

weather1 and weather2 are identical.

DATA weather1;
   INFILE datalines missover;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

DATA weather2;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3 . .
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

MISSOVER OPTION
MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.

DATA weather1;
   INFILE datalines missover;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

DATA weather2;
   INPUT temp1-temp5;
   DATALINES;
97.9 98.1 98.3 . .
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;
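
For contrast, the sketch below reads the same short first line without MISSOVER and without the period placeholders. The dataset name weather_flow is made up for this example.

```sas
DATA weather_flow;
   INPUT temp1-temp5;   /* no MISSOVER: SAS moves to the next line for missing fields */
   DATALINES;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

PROC PRINT DATA=weather_flow;
RUN;
```

Under the default behavior, SAS finishes the first observation by taking 98.6 and 99.2 from the second line and then discards the rest of that line. The result is two observations instead of three, and the log notes that SAS went to a new line when the INPUT statement reached past the end of a line.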

Using the INFILE statement (Reading External Text Files)
To find more information on INFILE: while in the text editor in a SAS session, go to Help, then click on the Index tab. Type the word infile in the keyword box, then double-click the word INFILE in the results section.

Using the INFILE statement (Reading External Text Files)
Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records. Usually, you use an INFILE statement to read data from an external file. When data is read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step.

Using the INFILE statement (Reading External Text Files)
Reading Multiple Input Files
You can read from multiple input files in a single iteration of the DATA step by using multiple INFILE statements.

Using the INFILE statement (Reading External Text Files)

C:\module2_text1.txt

France,575,Express,10
Spain,510,World,12
Brazil,540,World,6
India,489,Express,.

C:\module2_text2.txt

Japan,720,Express,10
Greece,698,Express,20
New Z.,1489, Southsea,6
Venez.,425,World,8
Italy,468,Express,9
USSR,924,World,6
Switz.,734,World,20
Austral.,1079,Southsea,10
Ireland,558,Express,9

Using the INFILE statement (Reading External Text Files)
SAS program that reads in C:\module2_text1.txt and C:\module2_text2.txt, then stacks the two datasets with SET:

DATA DATA1;
   INFILE 'c:\module2_text1.txt' DSD;
   INPUT country$ cost vendor$ number;
RUN;

DATA DATA2;
   INFILE 'c:\module2_text2.txt' DSD;
   INPUT country$ cost vendor$ number;
RUN;

DATA DATA3;
   SET DATA1 DATA2;
RUN;
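
The program above reads each file in its own DATA step. For comparison, here is a sketch of the "multiple INFILE statements in a single iteration" approach, using the same two text files. The dataset name side_by_side and the numbered variable names are invented for this example.

```sas
DATA side_by_side;
   /* Each INFILE makes its file the current input source for the INPUT that follows. */
   INFILE 'c:\module2_text1.txt' DSD;
   INPUT country1 $ cost1 vendor1 $ number1;

   INFILE 'c:\module2_text2.txt' DSD;
   INPUT country2 $ cost2 vendor2 $ number2;
RUN;

PROC PRINT DATA=side_by_side;
RUN;
```

Each iteration reads one record from each file, so the records end up side by side in one observation rather than stacked. The step ends as soon as an INPUT statement tries to read past the end of either file, so with the 4-line and 9-line files listed earlier it would produce 4 observations.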

The Double Trailing At-Sign (@@ or "double trailing @")
Sometimes you may need to create multiple observations from a single record of raw data. One way to tell SAS how to read such a record is to use the other line-hold specifier, the double trailing at-sign (@@ or "double trailing @"). The double trailing @ not only prevents SAS from reading a new record into the input buffer when a new INPUT statement is encountered, but it also prevents the record from being released when the program returns to the top of the DATA step.
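
A minimal sketch of the double trailing @, reusing the ID, Age, and Savings values from the free-format examples but packing several observations onto one data line. The dataset name savings_long is made up for this example.

```sas
DATA savings_long;
   INPUT ID Age Savings @@;   /* @@ holds the line across DATA step iterations */
   DATALINES;
1 25 4000 2 33 1000 3 32 8000
4 26 1500
;

PROC PRINT DATA=savings_long;
RUN;
```

Each pass through the DATA step reads the next three values from the held line, so the two data lines produce four observations. SAS moves to the second line only after the first one is used up.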