Introduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.

Similar documents
Introduction to STATA

INTRODUCTION to. Program in Statistics and Methodology (PRISM) Daniel Blake & Benjamin Jones January 15, 2010

SAS Training Spring 2006

Intermediate SPSS. If you have an SPSS dataset (*.sav), you can open it in the following way:

Choosing the Right Procedure

Choosing the Right Procedure

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

Advanced Regression Analysis Autumn Stata 6.0 For Dummies

STA 570 Spring Lecture 5 Tuesday, Feb 1

Introduction to Stata: An In-class Tutorial

ST Lab 1 - The basics of SAS

Introduction to Statistical Analyses in SAS

Applied Regression Modeling: A Business Approach

STATA 13 INTRODUCTION

An Introductory Guide to Stata

After opening Stata for the first time: set scheme s1mono, permanently

Applied Regression Modeling: A Business Approach

Intermediate SAS: Statistics

Applied Regression Modeling: A Business Approach

Contents of SAS Programming Techniques

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

An Introduction to Stata Part I: Data Management

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

Dr. Barbara Morgan Quantitative Methods

SAS Programs SAS Lecture 4 Procedures. Aidan McDermott, April 18, Outline. Internal SAS formats. SAS Formats

Intermediate Stata. Jeremy Craig Green. 1 March /29/2011 1

2016 SPSS Workshop UBC Research Commons

An Introduction to Stata Part II: Data Analysis

A quick introduction to STATA:

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

Reading data in SAS and Descriptive Statistics

User Services Spring 2008 OBJECTIVES Introduction Getting Help Instructors

A Short Guide to Stata 10 for Windows

Intro to E-Views. E-views is a statistical package useful for cross sectional, time series and panel data statistical analysis.

Introduction to STATA

A Quick Guide to Stata 8 for Windows

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

STM103 Spring 2008 INTRODUCTION TO STATA 8.0

Revision of Stata basics in STATA 11:

INTRODUCTION TO THE SAS ANNOTATE FACILITY

Using MACRO and SAS/GRAPH to Efficiently Assess Distributions. Paul Walker, Capital One

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility

CLAREMONT MCKENNA COLLEGE. Fletcher Jones Student Peer to Peer Technology Training Program. Basic Statistics using Stata

An introduction to SPSS

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

Please login. Take a seat Login with your HawkID Locate SAS 9.3. Raise your hand if you need assistance. Start / All Programs / SAS / SAS 9.

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2018/10/30) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

Bivariate (Simple) Regression Analysis

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version)

Using Large Data Sets Workbook Version A (MEI)

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

Math 227 EXCEL / MEGASTAT Guide

Tips to Customize SAS/GRAPH... for Reluctant Beginners et al. Claudine Lougee, Dualenic, LLC, Glen Allen, VA

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

ECONOMICS 452* -- Stata 12 Tutorial 1. Stata 12 Tutorial 1. TOPIC: Getting Started with Stata: An Introduction or Review

Introduction to Stata. Written by Yi-Chi Chen

Paper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).

EXAMPLE 2: INTRODUCTION TO SAS AND SOME NOTES ON HOUSEKEEPING PART II - MATCHING DATA FROM RESPONDENTS AT 2 WAVES INTO WIDE FORMAT

Chapter 4: Analyzing Bivariate Data with Fathom

Multiple Regression White paper

StatLab Workshops 2008

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

An Example of Using inter5.exe to Obtain the Graph of an Interaction

Stata: A Brief Introduction Biostatistics

Introduction to SAS: General

Lab 4 Importing Data and Basic Graphs

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

INTRODUCTION TO SPSS OUTLINE 6/17/2013. Assoc. Prof. Dr. Md. Mujibur Rahman Room No. BN Phone:

It s Not All Relative: SAS/Graph Annotate Coordinate Systems

IPUMS Training and Development: Requesting Data

Math 121 Project 4: Graphs

Enterprise Miner Version 4.0. Changes and Enhancements

Getting started with Stata 2017: Cheat-sheet

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

Cell means coding and effect coding

A Short Guide to Stata 14

Data-Analysis Exercise Fitting and Extending the Discrete-Time Survival Analysis Model (ALDA, Chapters 11 & 12, pp )

1 Downloading files and accessing SAS. 2 Sorting, scatterplots, correlation and regression

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

GETTING DATA INTO THE PROGRAM

Introduction: EViews. Dr. Peerapat Wongchaiwat

Online Supplementary Appendix for. Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions:

Forfattere Intro to SPSS 19.0 Description

STAT:5400 Computing in Statistics

WORKSHOP: Using the Health Survey for England, 2014

A quick introduction to STATA:

Preparing for Data Analysis

Box-Cox Transformation for Simple Linear Regression

Intermediate SAS: Working with Data

The Power and Sample Size Application

San Francisco State University

DATA. Business Statistics

April 4, SAS General Introduction

Transcription:

Center for Teaching, Research and Learning Research Support Group American University, Washington, D.C. Hurst Hall 203 rsg@american.edu (202) 885-3862 Introduction to SAS Workshop Objective This workshop is designed to give a basic understanding of how SAS procedure works and how to use simple statistical methods to analyze data. Learning Outcomes 1. Understanding the basics 2. Managing data 3. Computing descriptive statistics 4. Creating basic graphs 5. Running linear regressions Scenario: We are interested in studying the relationship between the Gross Domestic Product (GDP) per capita of a country and how corrupt the country is. To start investigating this, we will examine the relationship between nations scores on the Corruption Index and their GDP per capita. I. Understanding the basics In this section, we introduce a few basic but very helpful commands. Brief Intro to the SAS Interface: Editor Window The Editor window is the place where the user programs the SAS procedures. It is equivalent to the do file in Stata or the syntax file in SPSS. The user can run a specific parts of the code by highlighting them and pressing F5 or clicking on the RUN button. Explorer Window

The explorer window allows the user to browse through the active libraries and the content of the system. Output Window The output window contains the output from the various procedures SAS Procedures A SAS procedure is a collection of lines of code that carry out the statistical analysis. The procedure is always preceded by the keyword proc and followed by the name of the procedure. Once all statements have been included in the body of the procedure, the procedure is finalized with an end statement. SAS Data Steps This is the part of the code where the data is created/imported. Importing Data There are several ways in which the user can import a data set. In this example we illustrate two ways. To import a data set, the user can use the built-in import wizard by clicking on the following: File > Import Data > (select Comma Separated Values) > Next > Browse (to the location of the file) > Next > Type in the name of your SAS data set Alternatively the user can use the proc import procedure in SAS. This can be done by typing the following code in the Editor window: libname SASINTRO "C:\data\"; PROC IMPORT OUT= SASINTRO.nations DATAFILE="J:\CLASSES\Workshops\cs.dta" DBMS=dta REPLACE; RUN; The SASINTRO library name is map to C:\ drive. This can be map to any desired folder. The out parameter specifies the library where the data set should be saved. The DBMS parameter specifies the type of the dataset. In this case, we are importing a STATA dataset (.dta). Program-File We recommend that you create a program file before you start working in SAS. A Program-file allows you to record and save all of your command lines so that you can repeat those steps in the future. File, New Program Save Program File to G:\ or any other drive

II. Descriptive statistics In this section, we will demonstrate how to obtain information on descriptive statistics for one or more variables. A full description of all variables in the data can be obtained with the print/contents procedure: /*proc "print" will print out to the output window*/ proc print data=sasintro.nations; /*proc "contents" will allows you to inspect the data in more details (type, format, etc...)*/ proc contents data=sasintro.nations; To generate a table of basic descriptive statistics, such as the number of observations, mean, standard deviation, minimum, and maximum use the commands: *all variables; proc means data=sasintro.nations; var ; *only two variables; proc means data=sasintro.nations; var corrup gdppc; SAS displays each variable name, along with its descriptive statistics in table form. To insert these results into another document, highlight the table, right-click, and select any of copy options. The results can now be pasted into another document. Recode and Generate a new variable Most of the times we are interested how the percentage change in one variable affected another variable. In those cases, we usually take the log of that variable. We will generate the log of GDP per capita (gdppc). This can be done in Data Step. data SASINTRO.nations; set SASINTRO.nations; log_gdppc = log(gdppc); Making a frequency table of a continuous variable is not very useful. If we want a good frequency table, we should recode. Look at the description for the variable (corrup) you want to recode. This can be done using Freq procedure.

proc freq data=sasintro.nations; tables corrup; We will recode this variable into three categories. Additionally, we will generate a new variable for the recoded data and provide labels for the new values in the Data Step. data SASINTRO.nations; set SASINTRO.nations; format corrup_categ; if (corrup>=0) and (corrup<=3.3) then corrup_categ = 1; else if (corrup>3.3) and (corrup<=6.6) then corrup_categ = 2; else if (corrup>6.6) and (corrup<=10) then corrup_categ = 3; label corrup_categ = 'Corruption Index in 3 categories'; proc format; value corrup_categ 1 = 'High' 2 = 'Medium' 3 = 'Low' ; proc freq data=sasintro.nations; tables corrup_categ; We can also assign a label to the variable itself to help us remember what it is. Format procedure is also helpful for formatting the values of the variables. Finally, it is common to transform categorical variables into dummy variables in order to better capture the differences associated with being in one of those categories. To do this type: DATA SASINTRO.nations; set SASINTRO.nations; ARRAY dummys {*} 3. corrup_categ1 - corrup_categ3; DO i=1 TO 3; dummys(i) = 0; END; if corrup_categ ne. then dummys(corrup_categ) = 1; RUN; PROC FREQ DATA=SASINTRO.nations; TABLES corrup_categ*corrup_categ1*corrup_categ2*corrup_categ3/ list ; RUN;

III. Creating Basic Graphs In this section, we demonstrate how to generate a histogram and a scatter plot. A histogram is used to show the frequency distribution of a variable. Let s generate a histogram using the original variable and the log-transformation we just created (gdppc, log_gdppc) proc univariate data=sasintro.nations; var gdppc log_gdppc; histogram; Title 'GDP per Capita and Log-transformation'; Suppose you are curious about the relationship between daily Log of GDP per capita and the Corruption(corrupt). Let s create a scatter plot to help visualize the relationship between these two variables. We will also include a trend line and title the graph to help us understand what we are seeing. proc gplot data=sasintro.nations; /* Define the axis characteristics */ axis1 label=("corruption"); axis2 label=(angle=90 "Log of GDP per Capita") minor=(n=4); /* Define the symbol characteristics for the scatter plot groups */ symbol1 interpol=none value=dot color=depk; /* Define the symbol characteristics for the regression line */ symbol2 interpol=rl value=none color=black; plot log_gdppc*corrup/haxis=axis1 vaxis=axis2; plot2 log_gdppc*corrup/noaxis; title1 'GDP per Capita and Corruption Relationship'; quit; As we can see, there is a positive relationship between these two variables.

IV. Bivariate Analysis T-test A t-test is the most basic way to compare two groups together, based on their scores on a single variable. While we divided the Corruption Index into three groups, you can only compare two of those groups at a time using a t-test. Let s compare the countries with the most and least corrupted (group 1 and 3) proc ttest data=sasintro.nations; class corrup_categ; var log_gdppc; where corrup_categ~=2; In the Output, look for the t, which gives the value of the t-test, degrees of freedom, and the p-value or Pr> t, which tells you the probability of getting that result. Remember that, in general, you are looking for a probability value less than.05. Correlation The correlation coefficient is a measure of association between two variables. The sign indicates inverse versus directly proportional relationships. 0 would be a nonrelationship, while a value of 1 would be a perfect positive relationship, and -1 would be a perfect inverse relationship. Let s look at the correlation between Log of GDP per capita, Corruption Index proc corr data=sasintro.nations; var log_gdppc corrup_categ; Your output will be a 2x2 table, but only one cell is relevant. Use either cell that shows the relation between your two variables (there are two such identical cells). V. Running Linear Regressions In this section, we show you how to run a simple regression in order to understand the relationship between log of gdp per capita and corruption. proc reg data=sasintro.nations; model log_gdppc = corrup; SAS displays the regression results in table form. The important information provided is the Adj R-squared, coef. and t/p> t. Let s run this regression again with robust standard errors by using procedure robustreg proc robustreg data=sasintro.nations; model log_gdppc = corrup;