STAT 3304/5304 Introduction to Statistical Computing. Introduction to SAS

What is SAS?
SAS (originally an acronym for Statistical Analysis System, though it no longer stands for anything) is a program designed to analyze large sets of numerical and character data.
It is pronounced "sass", not spelled out as three letters.
It was developed in the early 1970s at North Carolina State University.
In 1976, SAS Institute Inc., a privately held corporation, was formed.
SAS grew in popularity and capability and was used in academic groups.

What is SAS?
SAS can be used without knowing much about programming, but it is also a very sophisticated language with which much more can be done.
SAS was first developed as a programming language for statisticians and data analysts.
It was originally intended for the management and analysis of agricultural field experiments.

What is SAS?
SAS has grown into the world's largest privately held software company.
SAS is now located in Cary, North Carolina.
It is a worldwide company with business in Asia Pacific, Latin America, Europe, the Middle East, and Africa.
SAS also has a high employee retention rate of 96%.
It is also a family-oriented company and is friendly to working women.

What is SAS?
SAS is now one of the most widely used statistical software packages.
Continual product line expansion and diversification of clientele have resulted in SAS products being used at over 40,000 customer sites in 50 countries.
There are 3.5 million users of SAS products.
Part of the reason for the continual growth is that SAS Institute works with end users to improve its products.
It offers solutions for data warehousing, data mining, data visualization, and applications development.

What is SAS?
The SAS System is an applications system that can be used as:
    a statistical package
    a database management system
    a high-level programming language
An applications system is software that gives you the tools you need to make the data useful and meaningful.
In order to be useful, an applications system should give you total control of your data, facilitate applications that run in more than one computing environment, and accommodate varying skill levels of potential users.

What is SAS?
SAS runs on a variety of platforms and is portable across computing environments.
A computing environment is determined by the HARDWARE and the host OPERATING SYSTEM running on it.
SAS can be used on IBM mainframes, UNIX-based machines, and personal computers running Windows.
Portability means that SAS applications:
    Function the same
    Look the same
    Produce the same results
You can develop SAS applications in one environment and run them in other environments without rewriting the programs.

Modes for Running SAS
SAS can be run in a variety of styles, or modes, depending on the operating system it is running on. The modes most often used include:
Batch Mode: the user writes whole SAS programs, saves them into a file, then runs SAS from a command line prompt.
Interactive Line Mode: the user enters commands line by line in response to prompts issued by the SAS System.

Modes for Running SAS
Interactive Window Mode (SAS Display Manager System): the user interacts with SAS through windows using pull-down menus, dialog boxes, and icons. This is the version used on Windows and Macintosh.
SAS Enterprise Guide: SAS Enterprise Guide software runs only under Windows. It can write SAS code for you through its extensive menu system.

How does SAS work?
With any body of data, you must perform four basic tasks to make it useful and meaningful.
ACCESS: First, you access the data through the SAS System.
MANAGE: Update, rearrange, combine, edit, or subset data before analyzing.
ANALYZE: Ranges from simple descriptive statistics to more advanced or specialized analyses for econometrics and forecasting, statistical design, computer performance evaluation, and operations research.
PRESENT: Presentation capabilities range from simple lists and tables to multidimensional plots to elaborate full-color graphics, both on paper and on your display.

How does SAS work?
A SAS program is a sequence of statements executed in order.
A statement gives information or instructions to SAS and must be appropriately placed in the program.
SAS is very lenient about the format of its input: statements can be broken up across lines, multiple statements can appear on a single line, and blank spaces and lines can be added to make the program more readable.
The most effective strategy for learning SAS is to concentrate on the details of the DATA step, and to learn the details of each procedure as you need them.
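
The layout rules above can be illustrated with a minimal sketch. The data set name scores and its variables are made up for this example and do not come from the course materials. The DATA step is written one statement per line, while the PROC step is written on a single line, and SAS reads both the same way.

```sas
* Layout 1 - one statement per line, indented for readability;
data scores;
    input id score;
    cards;
1 85
2 92
;
run;

* Layout 2 - several statements on a single line;
proc print data=scores; var id score; run;
```

Only the semicolons mark where one statement ends and the next begins, so the choice of layout is purely a matter of readability.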

SAS Windows
There are five basic SAS windows: the Results and Explorer windows, and three programming windows: Editor, Log, and Output. There are also many other SAS windows that you may use for tasks such as getting help, changing SAS system options, and customizing your SAS session.
Results: The Results window is like a table of contents for your Output window; the results tree lists each part of your results in outline form.
Explorer: The Explorer window gives you easy access to your SAS files and libraries.

SAS Windows
Editor: In the Editor window you can use the text editor to type in, edit, and submit SAS programs, as well as edit other text files such as raw data files.
Log: The Log window contains notes about your SAS session. After you submit a SAS program, any notes, errors, or warnings associated with your program, as well as the program statements themselves, appear in the Log window.
Output: If your program generates any printable results, they appear in the Output window.

SAS Windows
In Windows operating environments, the default editor is the Enhanced Editor. The Enhanced Editor is syntax sensitive and color-codes your programs, making them easier to read and mistakes easier to find:
Green: Comments
Dark Blue: Keywords in major SAS commands
Blue: Keywords that have special meaning as SAS commands
Yellow Highlight: Data
Red: Statements that SAS does not understand
The Enhanced Editor also allows you to collapse and expand the various steps in your program.
For other operating environments, the default editor is the Program Editor, whose features vary with the version of SAS and the operating environment.

General Syntax and Rules
SAS statements may be in upper or lower case and may begin in any column.
SAS statements always end with a semicolon (;).
SAS statements may also extend across lines, and more than one SAS statement may appear on a single line.
SAS variable names must be 32 characters or fewer, constructed of letters, digits, and the underscore character.
The first character must be an English letter (A, B, C, ..., Z) or an underscore (_). Subsequent characters can be letters, numeric digits (0, 1, ..., 9), or underscores.
Characters such as dashes and spaces are not allowed.

General Syntax and Rules
It's a good idea not to start variable names with an underscore, because special system variables are named that way.
Data set names follow similar rules to variable names, but they have a different name space.
There are virtually no reserved keywords in SAS; it's very good at figuring things out by context.
SAS is not case sensitive, except inside quoted strings.
Missing values are handled consistently in SAS, and are represented by a period (.).
Each statement in SAS must end in a semicolon (;).
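
As a small illustration of the naming, case, and missing-value rules, here is a sketch with made-up names (patients, patient_id, visit2_score, and age_missing are assumptions for this example only):

```sas
data patients;
    /* valid names use letters, digits, and underscores */
    input patient_id age visit2_score;
    /* names are not case sensitive, so AGE and age are the same variable */
    if AGE = . then age_missing = 1;
    else age_missing = 0;
    cards;
1 25 7.5
2 . 8.0
3 55 .
;
run;

proc print data=patients;
run;
```

The periods in the data lines are read as missing values, so the second patient has a missing age and the third has a missing visit2_score.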

General Syntax and Rules
To make your programs more understandable, you can insert comments into them.
Comments are usually used to annotate the program, making it easier for someone to read your program and understand what you have done and why.
It doesn't matter what you put in your comments; SAS will not look at it.
There are two styles of comments you can use: one starts with an asterisk (*) and ends with a semicolon (;). The other starts with a slash asterisk (/*) and ends with an asterisk slash (*/).
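
The two comment styles can be sketched as follows. The data set demo and its variables are placeholders invented for this example.

```sas
* Statement-style comment that starts with an asterisk and ends at the next semicolon;

/* Block-style comment. It can span
   several lines and may contain semicolons; like this one. */

data demo;
    x = 1;      /* a block comment can also trail a statement */
    y = x + 1;
run;
```

Because the statement-style comment ends at the first semicolon, the block style is usually the safer choice when the comment text itself needs to contain a semicolon.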

Getting Help
The bulk of SAS documentation is available online, at http://support.sas.com/documentation/onlinedoc/
A catalog of printed documentation available from SAS can be found at http://support.sas.com/publishing/
Online help: Type help in the SAS display manager input window.
Sample programs are distributed with SAS on all platforms.
SAS Institute Home Page: http://www.sas.com
SAS Institute Technical Support: http://support.sas.com/resources/

Getting Help
Searchable index to SAS-L, the SAS mailing list: http://www.listserv.uga.edu/archives/sas-l.html
Michael Friendly's Guide to SAS Resources on the Internet: http://www.math.yorku.ca/scs/statresource.html#sas
Brian Yandell's Introduction to SAS: http://www.stat.wisc.edu/~yandell/software/sas/intro.html

Two Parts of a SAS Program
There are two main components to most SAS programs:
DATA steps: create SAS data sets; read in, manipulate, and edit data.
PROC steps: process SAS data sets (creating reports, graphs, editing data, sorting data, etc.) and can also create data sets.
A typical program starts with a DATA step to create a SAS data set and then passes the data to a PROC step for processing.
For example: raw data and/or a pre-existing SAS data set are read into a SAS DATA step and turned into a SAS data set, which is altered or analyzed by a PROC step, and then the results are displayed in a report.
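
A minimal sketch of this DATA-step-then-PROC-step pattern, using made-up data (the data set heights and its variables are assumptions for illustration):

```sas
/* DATA step: read raw data and create a SAS data set */
data heights;
    input name $ height_cm;
    height_in = height_cm / 2.54;   /* derived variable */
    cards;
Ann 165
Bob 180
Cy 172
;
run;

/* PROC step: process the data set and report the results */
proc means data=heights;
    var height_cm height_in;
run;
```

The DATA step ends at its RUN statement, and the PROC step then reads the data set that the DATA step created.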

DATA steps: Getting data into SAS
There are three ways of getting data into a SAS data set.
1. Include the data in the SAS command stream. The data are like a card deck placed into the stream of SAS commands. Use an INPUT statement to list the variables and a CARDS statement right before the data to be read in.
Example:
    DATA CARDSIN;
    INPUT IDNUM SEX AGE;
    CARDS;
    1 1 25
    2 2 33
    4 1 55
    ;
    RUN;

DATA steps: Getting data into SAS
2. Read the data in from a disk file. Use the INFILE statement to name the disk file that holds the data, then use the INPUT statement to list the variables.
Example:
    DATA DISKIN;
    INFILE 'RAWDATA.DAT';
    INPUT IDNUM SEX AGE;
    RUN;
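
To try INFILE without having a raw data file on disk, one option is to let SAS write a temporary file first. The sketch below is an illustration under that assumption, not part of the original slides; the fileref raw is a made-up name, and the three records match the earlier CARDS example.

```sas
filename raw temp;      /* temporary file managed by SAS */

data _null_;            /* writes the raw file without creating a data set */
    file raw;
    put '1 1 25';
    put '2 2 33';
    put '4 1 55';
run;

data diskin;
    infile raw;         /* a fileref is used here in place of a quoted path */
    input idnum sex age;
run;

proc print data=diskin;
run;
```

In real use the INFILE statement would normally name an existing file in quotes, as in the example above.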

DATA steps: Getting data into SAS
3. Create a new data set from an existing SAS data set. Here, the SET statement is used to name the existing SAS data set.
Example (creates two new SAS data sets from an existing SAS data set):
    DATA FATHERS MOTHERS;
    SET DISKIN;
    IF SEX=1 THEN OUTPUT FATHERS;
    ELSE OUTPUT MOTHERS;
    RUN;
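
A self-contained version of the SET example can be sketched by first building DISKIN from in-stream data, reusing the three records from the CARDS example. The PROC PRINT steps are added here only to inspect the results.

```sas
data diskin;
    input idnum sex age;
    cards;
1 1 25
2 2 33
4 1 55
;
run;

/* split the existing data set into two new ones */
data fathers mothers;
    set diskin;
    if sex = 1 then output fathers;
    else output mothers;
run;

proc print data=fathers;
run;

proc print data=mothers;
run;
```

With these records, the observations with IDNUM 1 and 4 go to FATHERS and the observation with IDNUM 2 goes to MOTHERS.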

PROC steps: Data Management
PROC SORT: Sorts a data set by one or more variables. For example, PROC SORT; BY ID; RUN; sorts the most recently created data set by the values of the variable ID.
PROC CONTENTS: Displays the contents of the data set.
PROC DATASETS: Manages SAS data set libraries.
PROC RANK: Rank orders one or more variables.
PROC STANDARDIZE: Rescales variables to a specified mean and/or standard deviation.

PROC steps: Data Management
PROC SCORE: Generates linear scores for certain procedures like factor analysis and discriminant analysis.
PROC TRANSPOSE: Transposes a data set.
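
A few of these data management procedures can be sketched on a small made-up data set (visits, wide, and the variable names are assumptions for illustration):

```sas
data visits;
    input id visit score;
    cards;
2 1 70
1 2 88
1 1 80
2 2 75
;
run;

/* sort by id, then by visit within id */
proc sort data=visits;
    by id visit;
run;

/* list the variables and attributes of the data set */
proc contents data=visits;
run;

/* reshape to one row per id, with one score column per visit */
proc transpose data=visits out=wide prefix=visit;
    by id;
    id visit;
    var score;
run;

proc print data=wide;
run;
```

PROC TRANSPOSE with a BY statement expects the data to be sorted by the BY variable, which is why the PROC SORT step comes first.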

PROC steps: Descriptive Statistics
PROC FREQ: Simple frequencies and contingency tables for categorical variables.
PROC MEANS: Number of observations, mean, standard deviation, and minimum and maximum values for continuous variables.
PROC UNIVARIATE: More detailed descriptive statistics for continuous variables.
PROC TABULATE: Produces tables of frequencies and/or descriptive statistics.

PROC steps: Descriptive Statistics
PROC SUMMARY: Descriptive statistics broken down by groups; particularly useful for generating a data set of descriptive statistics for input into other procedures.
PROC CORR: Parametric and nonparametric correlations.
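
The following sketch runs three of the descriptive procedures on made-up survey data (the data set names survey and group_stats and all variable names are assumptions for illustration):

```sas
data survey;
    input sex $ group $ score;
    cards;
F A 12
F B 15
M A 9
M B 14
F A 11
M B 16
;
run;

/* contingency table of two categorical variables */
proc freq data=survey;
    tables sex*group;
run;

/* basic statistics for a continuous variable, by group */
proc means data=survey n mean std min max;
    class group;
    var score;
run;

/* store the group statistics in a data set for later steps */
proc summary data=survey nway;
    class group;
    var score;
    output out=group_stats mean=mean_score std=sd_score;
run;

proc print data=group_stats;
run;
```

PROC SUMMARY prints nothing by default, so its results are written to the data set named in the OUTPUT statement and then displayed with PROC PRINT.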

PROC steps: Regression
PROC REG: General purpose linear regression and multivariate regression.
PROC GLM: General linear models, including regression, analysis of variance/covariance, and multivariate analysis of variance/covariance.
PROC RSQUARE: All possible subsets regression.
PROC RSREG: Quadratic response surface regression.
PROC LOGISTIC: Logistic regression.
PROC PROBIT: Probit regression.
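
As a minimal sketch of a regression step, the example below fits a simple linear regression with PROC REG. The data set fit and its values are made up for illustration.

```sas
data fit;
    input x y;
    cards;
1 2.1
2 3.9
3 6.2
4 7.8
5 10.1
;
run;

/* regress y on x */
proc reg data=fit;
    model y = x;
run;
quit;
```

PROC REG stays active after RUN so that further statements can be submitted, which is why the step is closed with QUIT.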

PROC steps: ANOVA, Graphics
Analysis of Variance
PROC ANOVA: Analysis of variance for orthogonal data.
PROC GLM: General linear models, including regression, analysis of variance, and multivariate analysis of variance.
PROC NESTED: Nested analysis of variance.
PROC VARCOMP: Variance components.
Low Resolution Graphics
PROC CHART: Pie, bar, and star charts.
PROC PLOT: Two-dimensional plots.
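
A one-way analysis of variance on balanced data, followed by a low-resolution bar chart of the group means, might look like the sketch below. The data set plots and its values are invented for this example.

```sas
data plots;
    input treatment $ yield;
    cards;
A 20
A 22
A 19
B 25
B 27
B 26
C 30
C 29
C 31
;
run;

/* one-way ANOVA with equal group sizes */
proc anova data=plots;
    class treatment;
    model yield = treatment;
run;
quit;

/* text-based bar chart of the mean yield per treatment */
proc chart data=plots;
    vbar treatment / sumvar=yield type=mean;
run;
```

With unequal group sizes, PROC GLM would be the more general choice, in line with the descriptions above.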

PROC steps: Multivariate Analysis
Discriminant Analysis
PROC DISCRIM: General purpose parametric and nonparametric discriminant analysis.
PROC CANDISC: Canonical discriminant analysis.
Principal Components and Factor Analysis
PROC PRINCOMP: Principal components.
PROC FACTOR: Factor analysis.

PROC steps: Multivariate Analysis
Cluster Analysis
PROC CLUSTER: Clustering observations.
PROC FASTCLUS: Disjoint clustering for large data sets.
PROC VARCLUS: Clustering variables.
Survival Analysis
PROC LIFETEST: Nonparametric survival analysis and life tables.
PROC LIFEREG: Parametric survival analysis.