Epidemiology 9509
Principles of Biostatistics
Chapter 3
John Koval
Department of Epidemiology and Biostatistics
University of Western Ontario

What we will do today
We will learn to use SAS to
1. read raw data
2. create word descriptions of the data
3. perform some simple statistics

Many windows (6)
1. toolbar
2. explorer
3. results
4. editor
5. log
6. output
Initially 4 are visible: toolbar, explorer, editor, log

Programming process
1. use editor to prepare program
2. submit
3. look at log and output
4. decide to use editor to modify program
5. etc.

Running SAS
Figure 3.1: Process for running SAS
[Flowchart; box labels: Editor, SAS program, SAS, errors, Log window, output, Output window, more?]

Notation
  KEYWORD user info;
example
  DATA name;
1. KEYWORD: SAS command
2. user info: specific to your analysis
3. ; the blessed SAS semicolon
Keywords are not case-sensitive, so this could also be written
  data name;

General pattern
1. preamble
2. data step (DATA)
3. procedure (PROC)

Preamble
1. TITLE
2. OPTIONS
3. FILENAME

TITLE statement
prints at beginning of each page of output
  TITLE "World's greatest analysis";
  TITLE1 "World's greatest analysis";
  TITLE2 "for my thesis of course";
TITLE and TITLE1 both set the first title line; TITLE2 sets the second.

OPTIONS
for example, to change output
  OPTIONS LINESIZE=80 PAGESIZE=60;
shortens output to fit on letter-size pages
1. LINESIZE=80: no more than 80 characters per line
2. PAGESIZE=60: no more than 60 lines per page
short form is
  OPTIONS ls=80 ps=60;

FILENAME statement
indicates location of file needed by SAS
  FILENAME one 'one.dat';
looks for file one.dat in C:\Documents and Settings\yourname
or
  FILENAME one 'U:\one.dat';
looks for one.dat on U: drive
one is the SAS name for the file (the fileref)
one.dat is the Windows name for the file

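A minimal sketch of how the fileref might then be used to read raw data. The INFILE statement is not covered in this section, so treat it as an assumption about how the file would be read; the layout of one.dat (four space-separated numbers per line) is also assumed, borrowed from the in-line data used elsewhere in this chapter.

```sas
* Assumes U:\one.dat exists and holds four space-separated numbers per line;
FILENAME one 'U:\one.dat';

DATA first;
  INFILE one;            /* read from the file that the fileref ONE points to */
  INPUT id age sex hr;   /* assumed layout: id, age, sex, heart rate */
RUN;
```

If the file is not where FILENAME says it is, the problem is reported in the log window, not in the output window.
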
Other SAS statements
1. Comments
2. RUN
3. QUIT
4. ENDSAS

Comments
in code only; they do not print out (do not go to the OUTPUT window)
1. on a line by itself
  * look at this brilliant SAS code;
2. at end of a line of the program
  PROC FREQ /* twoway tables */;

RUN, QUIT and ENDSAS
1. RUN
     RUN;
   1.1 to run the SAS program that precedes it
   1.2 at end of SAS program to complete run of all commands
   1.3 can highlight code and SUBMIT
2. QUIT
     QUIT;
   2.1 at end of SAS program to run all commands
   2.2 stop runaway program
3. ENDSAS
     ENDSAS;
   AVOID: kills the SAS session

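A small sketch of where RUN typically goes: one after each step, so that each step is completed before the next begins. The dataset name demo and its two records are hypothetical, chosen only to keep the example short.

```sas
DATA demo;               /* hypothetical dataset name */
  INPUT id age;
  DATALINES;
1 22
2 25
;
RUN;                     /* completes the DATA step */

PROC PRINT DATA=demo;
RUN;                     /* completes and runs PROC PRINT */
```

Without the final RUN, the last procedure may sit waiting in the editor session until more code is submitted.
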
The Data step
1. DATA
2. DATALINES
3. INPUT
4. LABEL

DATA statement
  DATA first;
creates temporary SAS dataset first.sas7bdat in WORK folder

DATALINES statement
brings data into SAS program
  DATALINES;
  1 22 1 60
  2 25 1 80
  ...
  6 23 2 70
  ;
semicolon (;) at end of data

INPUT statement
short names of variables
  INPUT id age sex hr;

LABEL statement
extended names of variables
  LABEL id = 'Unique identification number'
        hr = 'heart rate' ;
semicolon (;) at end of all labels in LABEL statement

SAS Procedures
doing the statistics
1. PROC PRINT
2. PROC MEANS
3. PROC FREQ
4. PROC FORMAT

PROC PRINT
gives a list of values of indicated variables
shortening output to only first M cases (here M = 4)
  PROC PRINT DATA=first (OBS=4);
  VAR age sex;

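As a sketch, here is PROC PRINT run on the six records from this chapter's sample program. The LABEL and NOOBS options are additions not discussed above: LABEL prints variable labels as column headings, and NOOBS suppresses the observation-number column.

```sas
DATA first;
  INPUT id age sex hr;
  LABEL age = 'Age of Subject'
        hr  = 'Heart Rate';
  DATALINES;
1 22 1 60
2 25 1 80
3 24 1 75
4 27 2 55
5 26 2 65
6 23 2 70
;

/* list only the first 4 cases, with labels as column headings */
PROC PRINT DATA=first (OBS=4) LABEL NOOBS;
  VAR id age hr;
RUN;
```
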
PROC MEANS
mean, variance, standard deviation
minimum, maximum
  PROC MEANS;
  VAR age hr;

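By default PROC MEANS reports N, mean, standard deviation, minimum and maximum; the variance has to be requested by keyword. The sketch below asks for all of the statistics listed above on the six sample records. MAXDEC=2 is an optional extra, assumed here only to shorten the printed values.

```sas
DATA first;
  INPUT id age sex hr;
  DATALINES;
1 22 1 60
2 25 1 80
3 24 1 75
4 27 2 55
5 26 2 65
6 23 2 70
;

/* statistic keywords go on the PROC MEANS statement itself */
PROC MEANS DATA=first N MEAN VAR STD MIN MAX MAXDEC=2;
  VAR age hr;
RUN;
```
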
PROC FREQ
frequencies, relative frequencies
cumulative frequencies, cumulative relative frequencies
  PROC FREQ;
  TABLE sex;

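A sketch of a one-way and a two-way table in the same PROC FREQ. The variable older is hypothetical: it is derived in the DATA step purely so that there is a second categorical variable to cross with sex, and the cut-off of 25 is an arbitrary choice for illustration.

```sas
DATA first;
  INPUT id age sex hr;
  older = (age >= 25);   /* hypothetical indicator: 1 if age is 25 or more, else 0 */
  DATALINES;
1 22 1 60
2 25 1 80
3 24 1 75
4 27 2 55
5 26 2 65
6 23 2 70
;

PROC FREQ DATA=first;
  TABLES sex             /* one-way table */
         sex*older;      /* two-way table: rows are sex, columns are older */
RUN;
```
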
Formats
giving meaningful labels to numerical values of discrete data
1. PROC FORMAT
2. FORMAT statement

PROC FORMAT
defines formats for rest of SAS session
appears before use in FORMAT statement
can appear before DATA step
  PROC FORMAT;
  VALUE sex 1 = 'Male' 2 = 'Female';
  VALUE yesno 1 = 'Yes' 2 = 'No';

FORMAT statement
can appear in
1. PROC
2. DATA step
  FORMAT sex sex. q1 yesno.;

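A sketch of the difference between the two placements, using the six sample records. Here the FORMAT statement sits inside the PROC, so the format applies to that procedure only; putting the same statement in the DATA step would instead attach the format to the dataset for every later procedure.

```sas
PROC FORMAT;
  VALUE sex 1 = 'Male' 2 = 'Female';
RUN;

DATA first;
  INPUT id age sex hr;   /* no FORMAT here: sex is stored unformatted */
  DATALINES;
1 22 1 60
2 25 1 80
3 24 1 75
4 27 2 55
5 26 2 65
6 23 2 70
;

PROC FREQ DATA=first;
  TABLES sex;
  FORMAT sex sex.;       /* applies to this PROC only */
RUN;

PROC PRINT DATA=first;   /* sex prints as 1 and 2 here: no format attached */
RUN;
```
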
A SAS program
  title 'First SAS program';
  options pagesize=60 linesize=80;
  proc format;
    value gender 1='Male' 2='Female';
  data first;
    input id age sex hr;
    label age='Age of Subject'
          sex='Sex of Subject'
          hr='Heart Rate';
    format sex gender.;

A SAS program (continued)
  datalines;
  1 22 1 60
  2 25 1 80
  3 24 1 75
  4 27 2 55
  5 26 2 65
  6 23 2 70
  ;
  proc means;
    var age;
  proc freq;
    table sex;
  run;

Sample output file

                         First SAS program                          1
                                   17:12 Wednesday, September 7, 2011

                        The MEANS Procedure

              Analysis Variable : age Age of Subject

    N            Mean         Std Dev         Minimum         Maximum
    -----------------------------------------------------------------
    6      24.5000000       1.8708287      22.0000000      27.0000000
    -----------------------------------------------------------------

                         SAS sample program                         2

                        The FREQ Procedure

                          Sex of Subject

                                      Cumulative    Cumulative
       sex    Frequency     Percent     Frequency      Percent
    ----------------------------------------------------------
    Male             3       50.00             3        50.00
    Female           3       50.00             6       100.00

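As a check on the PROC MEANS output, the mean and standard deviation of age can be worked by hand from the six ages in the sample program (22, 25, 24, 27, 26, 23). This assumes the usual sample standard deviation with divisor n - 1, which is the default in PROC MEANS.

```latex
\bar{x} = \frac{22 + 25 + 24 + 27 + 26 + 23}{6} = \frac{147}{6} = 24.5

\sum_{i=1}^{6} (x_i - \bar{x})^2
  = (-2.5)^2 + (0.5)^2 + (-0.5)^2 + (2.5)^2 + (1.5)^2 + (-1.5)^2 = 17.5

s = \sqrt{\frac{17.5}{6 - 1}} = \sqrt{3.5} \approx 1.8708
```

Both values agree with the Mean and Std Dev columns of the output.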
Sample log file
NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.2 (TS2M3)
NOTE: SAS initialization used:
      real time           25.68 seconds
      cpu time            6.73 seconds

1    title First SAS program ;
2    options pagesize=60 linesize=80;
3    proc format;
4    value gender 1= Male 2= Female ;
NOTE: Format GENDER has been output.
NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.38 seconds
      cpu time            0.14 seconds

Sample log file - continued
5    data first;
6    input id age sex hr;
7    label age= Age of Subject
8    sex= Sex of Subject
9    hr= Heart Rate ;
10   format sex gender.;
11   datalines;
NOTE: The data set WORK.FIRST has 6 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           1.91 seconds
      cpu time            0.36 seconds

18   ;
19   proc means;
20   var age;

Sample log file - III
NOTE: There were 6 observations read from the data set WORK.FIRST.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           1.50 seconds
      cpu time            0.34 seconds

21   proc freq;
22   table sex;
23   run;
NOTE: There were 6 observations read from the data set WORK.FIRST.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.82 seconds
      cpu time            0.15 seconds

REMEMBER
Save
1. your SAS program often
2. your output file when you have a successful run
Save on the U: drive