Introduction to SAS Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC cmurray-krezan@salud.unm.edu 20 August 2018
What is SAS?
- Statistical Analysis System: development began at NC State in the 1960s for agricultural data analysis, and SAS Institute was founded in 1976.
- A consortium of eight universities with major research funding from the USDA realized the importance of such software. They obtained a grant from NIH to further develop the software, and SAS was born.
- Widely used in many disciplines, including statistics, health sciences, business, and economics.
SAS vs. Other Software
- Command-driven vs. menu-driven.
- Flexibility comes from using the SAS language to write programs.
- Other software you may use: SPSS, Stata, Minitab, MATLAB.
Components of SAS Programs
- DATA steps. Here you can:
  - Read in data
  - Manipulate data
- PROC steps. Here you can:
  - Analyze the data
  - Create tables of output
The SAS Environment
Five windows:
1. Editor: where you write your program (commands).
2. Log: a log of the success (or failure) of the submitted commands.
3. Output: display of your statistical results.
4. Explorer: a directory for your libraries.
5. Results: a listing of all submitted PROC steps.
Where Your Data Will Live
- Library: this is created to refer to permanent data sets (such as your Excel file, or other permanent data set).
- You specify the directory, and then SAS knows where to get the data or where to put permanent data sets.
- Use the libname statement to name your library and specify the directory.
Types of Data Sets
- Temporary data sets
  - Stored in the Work library.
  - Created while running your program.
  - Cease to exist when you close SAS.
- Permanent data sets
  - Stored in a library that you define.
  - Continue to exist after SAS is closed.
- A data set that you are reading into SAS
  - Can be pretty much any file type.
- A data set that you export out of SAS
  - Can be exported into pretty much any file type.
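The LIBNAME Statement
Example syntax:
    libname sasdata 'C:\cristina\Pharm547';
Notes:
- Your library name (called a libref in SAS syntax) can be at most 8 characters long and must begin with a letter or underscore.
- The directory path must be enclosed in quotation marks.
- All SAS statements must end with ;.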
Ways to Read Your Data into SAS
Import Wizard from the drop-down menu:
1. Go to File > Import Data.
2. Select your data file type.
3. Select your data set.
4. Give your temporary data set a name.
SAS can generate the code used to perform the import. Just select a directory where the code should be saved. PROC IMPORT is the procedure used in the code. I recommend doing this.
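For orientation, the code the wizard saves generally resembles the sketch below. This is a minimal illustration for a CSV file, not the wizard's exact output; the file path and the data set name mydata are assumptions to be replaced with your own.

```sas
/* Assumed file path and data set name; replace with your own. */
proc import datafile="C:\cristina\Pharm547\dataset.csv"
    out=work.mydata      /* temporary data set in the Work library */
    dbms=csv             /* type of file being imported */
    replace;             /* overwrite work.mydata if it already exists */
    getnames=yes;        /* first row of the file holds the variable names */
run;
```

Saving the generated code means the import can be rerun later without clicking through the wizard again.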
Ways to Read Your Data into SAS (continued)
For a very small data set, or test data, you can input the data in the DATA step using the datalines statement (a.k.a. cards).
Example:
    data mydata;
        input patid $ age gender $;
        datalines;
    A1001 27 F
    A1002 32 M
    A1003 29 M
    A1004 29 F
    ;
    run;
Ways to Read Your Data into SAS (continued)
External data sets
- In practice, you will most likely be reading Excel or Access files into SAS.
- Use the Import Wizard, PROC IMPORT, or the infile and input statements (infile and input read text files such as CSV).
Example:
    data mydata;
        infile 'C:\cristina\Pharm547\dataset.csv' dlm=',';
        input patid $ age gender $;
    run;
Ways to Read Your Data into SAS (continued)
Large data sets obtained from national databases/registries very often come with programs you can use to read the data into SAS.
BRFSS: you can use the following to create a permanent SAS data set:
- SASOUT11_LLCP.SAS (this program converts the data from ASCII to SAS7BDAT)
- LLCP2011.ASC (this is the actual data in ASCII format)
- Formas11.sas (this formats the data and can put labels over the variable names)
From a Temporary Data Set to a Permanent Data Set
All of the previous examples, except BRFSS, created temporary data sets (which will not exist after closing SAS).
Create a permanent data set for BP_Example, which will be stored in the directory you assigned to your library (in this case, sasdata):
    data sasdata.bp_example;
        set bp_example;
    run;
Vice Versa: From a Permanent Data Set to a Temporary Data Set
Create a temporary (or working) data set for the BRFSS data, which will now exist in the sasdata library as well as in the Work library.
    data brfss;
        set sasdata.brfss2001;
    run;
Accessing Your Data in SAS
- To access temporary data sets, use the DATA step, but omit the library name in front.
- SAS stores temporary data sets in the library Work.
- You can refer to the data set as Work.dataset, but SAS assumes the "Work." part by default unless you specify otherwise, so you don't have to add it.
Accessing Your Data in SAS (continued)
Example:
    data brfss2;
        set brfss;
    run;
SAS is thinking of it like:
    data work.brfss2;
        set work.brfss;
    run;
The DATA Step
DATA steps that start from an existing SAS data set use the following syntax:
    data <new dataset name>;
        set <dataset name>;
    run;
NOTE: Every statement ends with a ;. Every step ends with run;.
Things You Can Do with the DATA Step
- Create new variables.
- Change the variable type, e.g., from numeric to character or vice versa.
- Drop, keep, or rename variables.
- Output to a new temporary or permanent data set.
- Format the data.
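The operations listed above can be sketched in a single DATA step. This is a minimal illustration built on the small datalines test data; the new variable names (age_months, age_char, age_num, sex) are made up for the example.

```sas
/* Small test data set, same layout as the datalines example. */
data mydata;
    input patid $ age gender $;
    datalines;
A1001 27 F
A1002 32 M
A1003 29 M
A1004 29 F
;
run;

data mydata2;                          /* output to a new temporary data set */
    set mydata;
    age_months = age * 12;             /* create a new variable */
    age_char = put(age, 3.);           /* numeric to character */
    age_num = input(age_char, 3.);     /* character back to numeric */
    rename gender = sex;               /* rename a variable */
    drop age_num;                      /* drop a variable */
    format age_months comma8.;         /* attach a display format */
run;
```

Writing data sasdata.mydata2 instead of data mydata2 would store the result as a permanent data set, provided the sasdata library has been assigned with a libname statement.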
The PROC Step
DATA steps are used to read and modify data, whereas PROC steps are used to analyze data.
Most PROC steps use the following syntax:
    proc <procname> data=<dataset>;
    run;
Commonly Used PROC Steps
- CONTENTS: lists the contents of your data set, such as all the variables, whether they are character or numeric, their assigned formats, etc.
- SORT: sorts your data by the variable(s) that you specify.
- SUMMARY: provides basic summary statistics for your data, such as n, means, standard deviations, etc.
Commonly Used PROC Steps (continued)
- FREQ: creates counts of categorical variables with specific features and contingency tables (2x2 or greater). Also calculates associated statistics (e.g., chi-square).
- MEANS: calculates means, CIs, etc. of continuous variables and associated statistics.
- TTEST: conducts a two-sample t-test comparing the mean of a continuous variable between two groups.
- REG: performs simple or multiple linear regression.
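As a rough illustration of these procedures, the sketch below runs each one on a tiny data set. The data set bp, the variable sbp, and all values are invented for the example and are far too few for real inference.

```sas
/* Invented example data: systolic blood pressure (sbp) by patient. */
data bp;
    input patid $ age gender $ sbp;
    datalines;
A1001 27 F 118
A1002 32 M 126
A1003 29 M 131
A1004 29 F 112
A1005 41 F 124
A1006 38 M 135
;
run;

proc contents data=bp;               /* variable names, types, formats */
run;

proc sort data=bp out=bp_sorted;     /* sorted copy, original left as is */
    by age;
run;

proc freq data=bp;                   /* counts for a categorical variable */
    tables gender;
run;

proc means data=bp n mean std clm;   /* n, mean, SD, 95% CI of the mean */
    var age sbp;
run;

proc ttest data=bp;                  /* compare mean sbp between genders */
    class gender;
    var sbp;
run;

proc reg data=bp;                    /* simple linear regression */
    model sbp = age;
run;
quit;                                /* PROC REG stays active until quit */
```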
Many PROCs Use the Following Statements
- by: performs commands by certain groups, such as calculating the mean age by gender (the data must first be sorted by those variables).
- class: lets SAS know to treat a variable in the class statement as a categorical variable.
- var: tells SAS on which variables to perform the requested calculations.
- output: can output the working data set that SAS creates in the background, which may contain calculations of interest.
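A brief sketch of these four statements, using PROC MEANS as the host procedure. The data set bp and the output names age_stats, mean_age, and sd_age are assumptions made for illustration.

```sas
/* Invented example data. */
data bp;
    input patid $ age gender $;
    datalines;
A1001 27 F
A1002 32 M
A1003 29 M
A1004 29 F
;
run;

/* by: requires the data to be sorted by the grouping variable first. */
proc sort data=bp;
    by gender;
run;

proc means data=bp mean;
    by gender;          /* separate analysis for each gender */
    var age;            /* variable to summarize */
run;

/* class + output: no sorting needed; results saved to a data set. */
proc means data=bp noprint;
    class gender;       /* treat gender as a grouping variable */
    var age;
    output out=age_stats mean=mean_age std=sd_age;
run;

proc print data=age_stats;   /* inspect the saved statistics */
run;
```

With a class statement, the output data set also contains an overall row (where _TYPE_ is 0) in addition to one row per group.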
More about the PROC Steps
- There are many specific statements for each PROC step. They are not all the same, nor are they always consistent.
- Don't forget that each statement must end with a semicolon.
Outputting Permanent Data Sets
You may want to create a new permanent data set from your original. For example, you may want a subset of variables from the BRFSS data set for your project.
You can use the DATA step:
    data sasdata.mydata;
        set mydata;
    run;
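To actually keep only a subset of variables, one option is the keep= data set option on the set statement. The sketch below assumes the sasdata library points to an existing directory; the test data and the choice of variables to keep are for illustration only.

```sas
/* Assumed directory; it must already exist on your machine. */
libname sasdata 'C:\cristina\Pharm547';

/* Small test data set standing in for a larger source data set. */
data mydata;
    input patid $ age gender $;
    datalines;
A1001 27 F
A1002 32 M
;
run;

/* Permanent data set containing only the variables of interest. */
data sasdata.mydata;
    set mydata (keep=patid age);
run;
```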
Outputting Permanent Data Sets (continued)
- Use PROC EXPORT (similar to PROC IMPORT).
- Use the Export Wizard in the drop-down menu under File.
NOTE: The DATA step only outputs a SAS data set (in the way I've shown you). PROC EXPORT or the Export Wizard can output to just about any file type.
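A minimal PROC EXPORT sketch for writing a CSV file follows. The output file name mydata_export.csv and the data set work.mydata are assumptions; substitute your own.

```sas
/* Small test data set to export. */
data mydata;
    input patid $ age gender $;
    datalines;
A1001 27 F
A1002 32 M
;
run;

/* Assumed output path and file name; the directory must already exist. */
proc export data=work.mydata
    outfile="C:\cristina\Pharm547\mydata_export.csv"
    dbms=csv             /* type of file to create */
    replace;             /* overwrite the file if it already exists */
run;
```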
A Few More Things about SAS before You Jump in SAS Help is your friend!! Access by either clicking on the book with the question mark or on the Help link and selecting SAS Help and Documentation.
A Few More Things about SAS before You Jump in (continued)
- The documentation contains almost everything (and often more) that you may want to know, such as all the statements and syntax particular to a given PROC.
- It also provides detailed discussions about the statistical procedures it uses and how they are implemented.
- It offers a plethora of information, though it may be a bit terse for some.
Good SAS Resources
- UCLA's Statistical Computing website: https://stats.idre.ucla.edu/
- Delwiche & Slaughter, The Little SAS Book: A Primer, 5th Ed. (2012).
- Cody & Smith, Applied Statistics and the SAS Programming Language, 5th Ed. (2005).
- The internet!
Now you are ready to program!