SAS Training Spring 2006
Coxe/Maner/Aiken

Introduction to SAS: This is what SAS looks like when you first open it. There is a Log window on top; it tells you what SAS is doing and whether SAS encountered any errors or problems with your syntax. (In this class, if you use SAS, you will be asked to print out the Log as well as the Output for your homework.) The lower window is called the Editor; this is where you will edit your syntax. There is also a button on the window bar at the bottom for a window called Output, which will contain (surprise!) the output of any analyses that you run. Right now it doesn't contain anything because we haven't run any analyses.
Opening, editing, and saving syntax: As in SPSS, you can open saved syntax files (which have the extension .sas) or type syntax directly into the Editor window. To open a saved .sas file, select File then Open, find the file on your computer, select it, and click Open. Similarly, to save a syntax file you have written or edited, select File then Save As.
Running syntax: The syntax file looks like this when it has been opened. To run the syntax, highlight it and press the Submit button (the little running man icon) on the right side of the toolbar, or select Run then Submit from the pull-down menus. If nothing is highlighted, SAS submits everything in the Editor.
After running the syntax: Now you can see the output (SAS has brought the Output window to the front). This window contains all of your output except graphs, which appear in a new window called Graph1 that you can select from the window bar at the bottom of SAS (see arrow). You can scroll through the output using the scroll bar at the right or the Page Up and Page Down keys on the keyboard. You can cycle through the graphs (there are 2 of them in this example) the same way.
Printing SAS output and graphs: When you print from the Output window, all pages of the output are printed at once. They are often formatted oddly and use up a lot of pages. It is usually easier to select all of the output, copy it, and paste it into a word processing program (like Microsoft Word), where you can adjust the margins and page orientation so the output fits better on the page (and save a few trees). Set the pasted text in a fixed-width font such as Courier New so the columns stay aligned. When you print from the Graph window, only the graph currently visible is printed. This is very important! If your analysis produces several graphs, you will have to scroll through them, displaying and printing each one in turn. Here is what it looks like when you first select the Graph window. This is the first graph, a histogram. From here, you can select File then Print to print this graph. Remember that even though there are several graphs in this analysis, SAS will only print the one you are looking at.
To print the second graph, use the Page Down key or the scroll bar to display the second graph, which is a scatterplot. Select File then Print to print the second graph.
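As an alternative to copying and pasting, SAS's Output Delivery System (ODS) can write procedure output directly to a Rich Text Format (RTF) file that Microsoft Word opens, where you can then adjust margins and orientation. The sketch below is a minimal example under stated assumptions: the path C:\temp\hw1.rtf is hypothetical (change it to a folder you can write to), and it analyzes SASHELP.CLASS, a sample dataset that ships with SAS, so it runs without an external data file.

```sas
/* Hypothetical output path: change it to a folder you can write to */
ODS RTF FILE='C:\temp\hw1.rtf';

PROC MEANS DATA=sashelp.class;
  VAR height weight;
RUN;

PROC FREQ DATA=sashelp.class;
  TABLES sex;
RUN;

/* Close the RTF destination so SAS finishes writing the file */
ODS RTF CLOSE;
```

Everything produced between the ODS RTF and ODS RTF CLOSE statements goes into the file. The Log is not included, so you would still print it separately for homework.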
General Syntax and Other Rules

SAS is not case-sensitive: it does not differentiate between uppercase and lowercase letters in statements, variable names, or dataset names. (Text inside quotation marks, such as character data values, is case-sensitive.)

Always include a RUN statement after each block of code. This lets you run particular blocks of code within your program instead of running the whole program each time. Remember, SAS does not execute a step until it reaches a RUN statement (or the beginning of the next DATA or PROC step), so the last step you submit will not run at all without a RUN.

Always end each statement with a semicolon. When debugging your program, always check first to make sure that you did not omit a semicolon. Four times out of five, that will be the problem.

For each procedure (PROC), include the name of the dataset you want the procedure run on (DATA=yourdata). This is important when you have created multiple datasets, because if you leave DATA= out, SAS uses the most recently created dataset, which may not be the one you intended.

Use comments frequently. They will help you greatly when you return to a program after spending time away from it. There are 2 ways to include a comment in your program. The first is simply to place an asterisk (*) at the beginning of the line. When it sees an asterisk, SAS ignores everything until it reaches a semicolon. The other way is to place a forward slash and asterisk (/*) at the beginning of the comment. When SAS sees this, it ignores everything (including semicolons) until it reaches an asterisk and slash (*/). This second method of commenting is useful if you want to make SAS ignore a large block of code that includes semicolons. See examples of both methods in the program at the end of the handout.

Use titles; they will help you organize your output. Enclose title text in quotation marks (this handout uses single quotes). You may use several titles and layer them (TITLE, TITLE2, TITLE3, etc.). Each subsequent title appears under the one before it. To replace a title, simply write a new title statement with the same number and the new text; it automatically overwrites the old one (and also clears any higher-numbered titles). To get rid of a title line, simply write the title statement with no text (e.g., TITLE2;).
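The rules above can be seen together in one short fragment. This is a sketch rather than part of the course program: it uses the SASHELP.CLASS sample dataset that comes with SAS (an assumption made so the code runs without an external file), and the title text 'Homework 1' is hypothetical. Every statement ends in a semicolon, every step ends in RUN, each PROC names its dataset, and both comment styles and layered titles appear.

```sas
* Asterisk comment: SAS ignores everything up to the next semicolon;

/* Block comment: SAS ignores everything here, even a semicolon ;
   until it reaches the closing star-slash */

TITLE 'Homework 1';
TITLE2 'Descriptive statistics';

/* Name the dataset explicitly with DATA= */
PROC MEANS DATA=sashelp.class;
  VAR height weight;
RUN;   /* RUN ends the step so SAS executes it */

/* A new TITLE2 replaces the old one (TITLE stays) */
TITLE2 'Frequencies of sex';
PROC FREQ DATA=sashelp.class;
  TABLES sex;
RUN;

/* A TITLE2 statement with no text clears TITLE2 and any higher titles */
TITLE2;
```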
Data-related Syntax

The datastep is the portion of your program in which you read in data from an external file. It consists of a DATA statement, which names your working dataset within SAS; an INFILE statement, which tells SAS where your external file is and what it is named; and an INPUT statement, which tells SAS what variables to read in and which columns each variable occupies. In SAS versions before Version 7, variable names and dataset names could not exceed 8 characters; in Version 7 and later (including SAS 9), the limit is 32 characters.

You can read in alphanumeric data (data that contain letters as well as numbers) by designating a variable as a character (string) variable. You do this by including a dollar sign ($) after the variable name and before the column designation (e.g., SEX $ 1-4).

There are many ways to manage and manipulate your data in SAS. You can sort your data (using PROC SORT), merge different datasets (using the MERGE statement), create sub-datasets (using the DATA and SET statements), and print your data in the output (using PROC PRINT). Examples and explanations are included within the sample program.
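A small datastep that reads column-formatted data, including a character variable marked with $, might look like the sketch below. To keep it self-contained, it reads the data from DATALINES typed into the program instead of from an external file with INFILE; the dataset name demo, the variables, and the values are all invented for illustration.

```sas
/* Hypothetical data: id in columns 1-3, sex (text) in columns 5-10,
   score in columns 12-13 */
DATA demo;
  INPUT id 1-3 sex $ 5-10 score 12-13;
  DATALINES;
001 male   85
002 female 92
003 female 78
;
RUN;

/* Check that the data were read in correctly */
PROC PRINT DATA=demo;
RUN;
```

With column input, the data lines must start in column 1 and each value must sit in the columns named on the INPUT statement, which is why "male" is padded with extra spaces.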
Sample Program

TITLE 'FALL 93 GRADES, 230 -- SAS PSY531(ex1.sas)';
TITLE2 'Regression Class Example 1';
OPTIONS NOCENTER LINESIZE=80 PAGESIZE=44;

** Here comes the datastep. Notice that I am using comments to annotate this program;
DATA grades;
  INFILE 'a:\grade230.txt';
  INPUT id 2-3 sex 8 T1 31-32 PS1 34 PS2 36 PS3 38 PS4 40 PS5 42 PS6 44 PS7 46
        T2 48-49 PS8 51 PS9 53 T3 55-57 PS10 59 PS11 61 PS12 63 PS13 65 T4 66-69;
  LABEL t1='test 1' t2='test 2' t3='test 3' t4='test 4';
RUN;
** Notice that I just put a RUN after the datastep. Include a RUN after each block of code;

/* If the variable sex contained letters (male, fem instead of 1, 2), we would
   need to designate it as a character (string) variable. See the following code:
   INPUT id 2-3 sex $ 8-12;
   Notice also that I have used the second method for commenting here. SAS ignores
   everything it sees until it reaches the star slash */

* Always use PROC PRINT to look at your data, making sure it has been read in properly;
PROC PRINT DATA=grades;
RUN;

* Here we create value labels with PROC FORMAT. A format is stored separately from any dataset, so PROC FORMAT takes no DATA= option;
PROC FORMAT;
  VALUE gender 1='Male' 2='Female';
RUN;

* Here we use PROC FREQ to get frequencies for the different values of sex;
* Note that PROC FREQ uses the TABLES statement to designate the variables you want frequencies for (without it, SAS gives frequencies for every variable);
* Note also that we use the FORMAT statement to include value labels in the output;
PROC FREQ DATA=grades;
  TABLES sex;
  FORMAT sex gender.;
RUN;

/* PROC FREQ can also give you cross tabulations. Imagine that you had a variable
   called COND (condition), and you wanted to know how many males and females were
   in each condition. You could use the following code:
   PROC FREQ DATA=grades; TABLES cond*sex; RUN;
*/

* PROC UNIVARIATE supplies univariate statistics. Use the VAR statement to tell SAS which variables you want statistics for. Most procedures use the VAR statement;
PROC UNIVARIATE DATA=grades;
  VAR t3 t4;
RUN;

* Using PROC FREQ to get frequency distributions for tests 3 and 4;
PROC FREQ DATA=grades;
  TABLES t3 t4;
RUN;

* Using PROC GCHART and PROC GPLOT to make charts and scatterplots;
* Note the use of TITLE3. SAS will place this under your first two titles;
* GCHART and GPLOT keep running after RUN until they reach a QUIT statement;
PROC GCHART DATA=grades;
  VBAR t3;
  TITLE3 'Histogram of test 3';
RUN;
QUIT;

PROC GPLOT DATA=grades;
  PLOT t4*t3;
  TITLE3 'Scatterplot of test 4 against test 3';
RUN;
QUIT;

* Turn off TITLE3;
TITLE3;

* PROC CORR generates a correlation matrix of the variables you specify;
* It also supplies means and standard deviations for these variables;
PROC CORR DATA=grades;
  VAR t1 t2 t3 t4;
RUN;

* Note the NOMISS option in the next correlation procedure. It requests casewise (listwise) deletion of missing data;
PROC CORR DATA=grades NOMISS;
  VAR t1 t2 t3 t4;
RUN;

* Using the DATA and SET statements to add or recode variables, and to create sub-datasets;
* First create a new dataset called GRADES2, and recode females from 2 to 0;
DATA grades2;
  SET grades;
  IF sex=2 THEN sex=0;
RUN;

* Create 2 sub-datasets, one for males and one for females;
DATA female;
  SET grades2;
  IF sex=0;
RUN;

DATA male;
  SET grades2;
  IF sex=1;
RUN;
* Create a new dataset, GRADES3, that contains only the sex and test scores of each person, using the KEEP= option;
* Note that KEEP= goes in parentheses right after the dataset name in the SET statement, with no semicolon in between;
DATA grades3;
  SET grades2 (KEEP = sex t1 t2 t3 t4);
RUN;

* PROC SORT is very useful. It allows you to run separate analyses for subsets of your data without creating separate datasets. You must first sort by the variable whose values define the subsets. In the following example I sort by sex, then request means (and SDs) for the 4 test grades for males and females separately. This approach works with many procedures (e.g., running separate regression analyses for males and females). The procedure must use the dataset that was sorted, so PROC MEANS below uses GRADES2;
PROC SORT DATA=grades2;
  BY sex;
RUN;

PROC MEANS DATA=grades2;
  VAR t1 t2 t3 t4;
  BY sex;
RUN;

* Alternatively, I could have requested means for the sex-specific datasets I created earlier;
PROC MEANS DATA=male;
  VAR t1 t2 t3 t4;
RUN;

PROC MEANS DATA=female;
  VAR t1 t2 t3 t4;
RUN;

* Merging 2 datasets. Imagine that when you entered your data, you had 2 research assistants entering 2 different questionnaires (from the same set of subjects) into separate data files. You could read them in separately and then merge them into a single dataset in SAS using the MERGE statement. You must have a common case identification variable in both datasets, and you will use that variable to identify subjects. You must sort both datasets by the case id variable. You then create a new dataset that combines the originals. Be careful: if a variable has the same name in both datasets, the values from the dataset listed later in the MERGE statement overwrite those from the earlier one. The following is an example;
DATA ques1;
  INFILE 'a:\questionnaire1.txt';
  INPUT id 1-3 q1 4 q2 5 q3 6 q4 7 q5 8;
RUN;

DATA ques2;
  INFILE 'a:\questionnaire2.txt';
  INPUT id 1-3 q6 4 q7 5 q8 6 q9 7 q10 8;
RUN;

PROC SORT DATA=ques1;
  BY id;
RUN;

PROC SORT DATA=ques2;
  BY id;
RUN;

DATA combined;
  MERGE ques1 ques2;
  BY id;
RUN;

* Writing out an ASCII dataset from SAS with the PUT statement. If you want to write out a dataset (or part of one), you can use the following code. It writes out a file called NEWDATA with all 10 questionnaire items. The output file will have the extension .dat. Once you have run the code, check the LOG to see where SAS put the file (usually in the SAS folder in PROGRAM FILES on your C: drive). DATA _NULL_ writes the file without creating another SAS dataset;
DATA _NULL_;
  SET combined;
  FILE newdata;
  PUT @1 id 1-3 q1 4 q2 5 q3 6 q4 7 q5 8 q6 9 q7 10 q8 11 q9 12 q10 13;
RUN;
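The merge warning in the sample program (same-named variables overwrite each other) can be demonstrated with two tiny datasets. In this sketch, the dataset names part1 and part2, the variable rater (a hypothetical code for which research assistant entered the questionnaire), and all values are invented. Because rater appears in both datasets, the merged dataset OVERWRITE keeps only the values from part2, the dataset listed last; renaming each copy with the RENAME= dataset option (which, like KEEP=, goes in parentheses right after the dataset name) keeps both.

```sas
/* Hypothetical data: each file also records which assistant (rater) entered it */
DATA part1;
  INPUT id rater q1;
  DATALINES;
1 1 4
2 1 5
;
RUN;

DATA part2;
  INPUT id rater q6;
  DATALINES;
1 2 3
2 2 2
;
RUN;

/* Both datasets must be sorted by the case id before a BY merge */
PROC SORT DATA=part1; BY id; RUN;
PROC SORT DATA=part2; BY id; RUN;

/* rater exists in both: values from part2 (listed last) overwrite part1 */
DATA overwrite;
  MERGE part1 part2;
  BY id;
RUN;

/* RENAME= keeps both copies under different names */
DATA both;
  MERGE part1 (RENAME=(rater=rater1)) part2 (RENAME=(rater=rater2));
  BY id;
RUN;

PROC PRINT DATA=overwrite; RUN;
PROC PRINT DATA=both; RUN;
```

Printing both datasets side by side makes the problem easy to spot: in OVERWRITE, rater is 2 on every row, while BOTH keeps rater1 and rater2.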