ST 512 - Lab 1 - The basics of SAS

What is SAS?

SAS is a programming language whose underlying software is written in C. For the most part SAS works through procedures, called "procs". For instance, to do a correlation analysis there is proc corr. Today we will start with the basics: the SAS interface, reading in data, and running a few procedures.

You can download SAS for your personal computer; see http://sas.ncsu.edu/ (it requires more than Windows Home, though).

The SAS Interface

There are four main windows in the SAS environment:

The Program Editor Window
This is where you spend most of your time working in SAS, writing your program in the editing window. Note: SAS keywords and variable names are not case sensitive.

The Log Window
Once you execute your program, SAS will report back to you in this window. Error messages, notes about your dataset, and warning messages will appear here. Don't underestimate the importance of this window; remember to look here each and every time you execute a SAS program.

The Output Window
The output requested in your SAS program will appear in this window. Remember that output may be generated even when errors are present in your program!

The Results Window
The output is listed by section here. Click on an item and you are taken to that place in the output. The Explorer subtab allows you to keep track of your libraries and their contents.

Note: every statement in SAS ends with a semicolon!

First Lines and Reading in Data

Now we are ready to read in some data. There are a few options to read in data:

1. Copy and paste the data into the Program Editor Window with the correct code before and after.
2. Use the SAS import wizard.
3. Use SAS commands that call a file.

Let's get started!

1. Copy and Paste method: Go to the wolfware page, Lab section. Here you will find a file called soilwater.dat. Open the link and copy and paste the data into the program editor after your options command. To create data in SAS we use the data command (called a data step).

    data name;
      input variable1 variable2...;
      datalines;
    # #...
    # #...
    ...
    # #...
    ;

Note: if one of our variables were non-numeric (e.g. had values A, B, etc.), we would need to put a $ after the variable name in the input statement (input variable1 $ variable2... declares variable1 to be a character variable).

Create the data step to read in the soil water data. Don't paste in the column names! Once you have the code, highlight the part you want to run (include the options command the first time you run something). Now you can simply click the running man button at the top of the page (or use the SAS menus).

To check that the data were read in correctly, as should always be done after submitting code, first check the log for errors. Then we can print the data out to see it. To do this we use a procedure called proc print.

    proc print data=name;
    run;

If you ever get confused about a procedure's syntax, you can search the web for "sas proc <procedure name> help". The first link should take you to SAS's very nice online documentation system. (Try it.) Highlight this section of code and run it to see your output!
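As a concrete illustration of the data step template and the $ rule above, here is a minimal sketch. The data set name example, the variable site, and all of the values are invented for illustration; they are not the contents of soilwater.dat.

```sas
/* Hypothetical example: the data set name, the variable "site",
   and every value below are made up for illustration only.
   They are NOT the soilwater.dat data. */
data example;
    input site $ depth soil;   /* $ declares site as a character variable */
    datalines;
A 10 0.25
B 20 0.31
C 30 0.28
;
run;

/* Print the data set to confirm it was read in as intended */
proc print data=example;
run;
```

After submitting this, the log should report three observations and three variables; if the $ were left off, SAS would try to read A, B, and C as numbers and note invalid data in the log.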

2. Import Wizard Method: SAS has an import wizard that can read in many standard types of data files. First, go to the class website and download the mother.xls file (make sure you know where it is saving to!). Now, go to File > Import Data. The import wizard will pop up. You can choose a standard source from the drop-down menu. Mother.xls is an Excel 95 file (depending on the version, SAS may not be able to read .xlsx files). Hit Next and browse to the location of the file. Hit Next, then type in the name of the dataset you want to create (e.g. mother). Hit Next; SAS will ask if you want to save the commands for importing the file. Hit Browse and find the folder you would like to save the file to, type in the file name, and hit Save. Finally, hit Finish. Check your log to see if there are any errors. Print the data out to check that it was read in correctly.

3. SAS commands: You can also read data in using a few commands in SAS. Find the file that contains the commands the wizard saved for importing the file and open it. Copy and paste the code into the program editor. In the future you can use these commands to read in the data rather than using the import wizard. You may need to change the file path, however.

You can set the default file path in SAS as follows: go to Tools > Options > Change Current Folder. From here you can select the default folder for SAS to look in. Choose the folder with the file mother.xls. Once you do this, you can remove any directory names, e.g.

    DATAFILE="J:\ST 512\Labs\Mother.xls"

can be replaced by

    DATAFILE="Mother.xls"

There are other ways to import data in SAS, such as infile. If interested, search the SAS help pages.
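For reference, the two code-based routes mentioned above might look roughly like the following sketch. It is not the exact code the wizard saves: the DBMS= value the wizard writes can differ by SAS version, and DBMS=XLS assumes the SAS/ACCESS Interface to PC Files is installed. The infile step assumes soilwater.dat sits in the current folder, has a single header row, and holds two space-separated numeric columns in the order depth, soil.

```sas
/* Sketch only: wizard-generated code may differ (e.g. in DBMS=).
   Assumes Mother.xls is in the current SAS folder. */
proc import datafile="Mother.xls"
    out=mother
    dbms=xls
    replace;
    getnames=yes;   /* treat the first row as column names */
run;

/* infile alternative for a plain text file.
   Assumptions: one header row (skipped with firstobs=2) and
   two numeric columns in the order depth, soil. */
data soilwater;
    infile "soilwater.dat" firstobs=2;
    input depth soil;
run;
```

Either way, check the log and run proc print on the result before moving on.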

The Corr and Reg Procedures

Description of the Soil Water data set: The data set contains the measured soil water content (in cm^3/cm^3) of 16 soil samples at four depths (in cm).

Description of the Mother data set: Weight gain of the mother during pregnancy is known to be a critical factor in determining the birth weight of the infant. Some data collected in a study of the relationship between average weight gain and mother's age are given in the file mother.xls.

Some questions we may want to answer from these types of data sets:

1. Is there an association between the two variables?
2. If so, does that association appear to be linear?
3. Can we conduct a statistical test to determine whether this relationship is statistically significant?
4. Can we fit a linear regression line to this data?
5. How can we use that line to predict for future observations?

Let's go through the soil water data together; then you can attempt the mother data set on your own.

1. To answer the first two questions, we can invoke the corr procedure.

    proc corr data=soilwater;
      var depth soil;
    run;

Run this code and inspect the output. To get the tests we will cover in class (and some nice plots), add in the following:

    ods graphics on;
    proc corr data=soilwater plots=matrix fisher(biasadj=no);
      var depth soil;
    run;
    ods graphics off;

Run this code and inspect the plots. Use the output to answer the first 3 questions above. There are many other options for tests that can be performed using proc corr. Check out http://support.sas.com/documentation/cdl/en/procstat/63104/html/default/viewer.htm#procstat_corr_sect004.htm for more information.
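To make the proc corr output easier to interpret, the quantities behind it can be written out as follows. This is a summary of the standard textbook formulas, not a transcription of SAS's documentation; the symbols x (depth), y (soil water content), n (number of observations), and rho (population correlation) are notation introduced here.

```latex
% Sample (Pearson) correlation between depth x and soil water content y
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}

% Fisher z-transformation. Without the bias adjustment (biasadj=no),
% z is treated as approximately normal with mean \tanh^{-1}(\rho)
% and variance 1/(n-3).
z = \tanh^{-1}(r) = \tfrac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right),
\qquad \operatorname{Var}(z) \approx \frac{1}{n-3}

% Approximate 95% confidence interval for \rho, back-transformed from z
\tanh\!\left(z - \frac{1.96}{\sqrt{n-3}}\right) \;\le\; \rho \;\le\;
\tanh\!\left(z + \frac{1.96}{\sqrt{n-3}}\right)
```

A value of r near 0 suggests little linear association, and an interval for rho that excludes 0 corresponds to a significant test at the 5% level.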

Let us look into fitting a regression line with this data. Which variable would we consider our response (dependent variable), and which our predictor (independent variable)? We can use proc reg to fit a regression line (we could also use proc glm or proc mixed, which will be discussed later in the course). The basic code to invoke the reg procedure is:

    proc reg data=soilwater;
      model soil=depth;
    run;

Run this code and inspect the output. What hypotheses are being tested by each p-value you see?

To see a scatterplot with a regression line, residual diagnostics, predicted values, and confidence intervals for our parameter estimates, we can run the following:

    ods graphics on;
    proc reg data=soilwater;
      model soil=depth / r clb;
    run;
    ods graphics off;

Inspect the output and plots. Does the line appear to fit the data on the scatterplot well? Do the residuals appear to have constant variance? What do our confidence intervals tell us about our parameter estimates?

We can see that by adding /r to the model statement we get the predicted value for every value of the independent variable that was included in our data set. How can we get other predicted values (e.g. for a depth of 12.5 cm or 49 cm)? We can use the fitted equation to compute the predicted values by hand, or have SAS compute them. To do this, we can trick SAS into giving us a predicted value by appending observations with missing responses onto our data set. SAS sees a "." as a missing value, so we can run the following code:

    data newdepths;
      input depth soil;
      datalines;
    12.5 .
    49 .
    ;
    run;

    proc datasets;
      append base=soilwater data=newdepths;
    run;
    quit;

Run this code, check the log to make sure everything worked, and use proc print to print out the new soilwater data set. Now if we run the same proc reg code with /r we will get predicted values at depths of 12.5 and 49. (Note: we can also get C.I.s and P.I.s for these values, which we will talk about at a later time.)

Try to answer the following questions about the mother data set on your own:

1. What is the sample correlation between weight gain and age?
2. Is the sample correlation significantly different from zero?
3. Which variable would we consider the response and which the independent variable? Why?
4. Fit a regression line to the data. Is the slope significantly different from 0?
5. Do the data appear to satisfy the assumption of constant variance?
6. Predict the value of weight gain for someone who is 20 years old and someone who is 35 years old.
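Returning to the soil water predictions at depths 12.5 and 49: once the two rows with missing soil values have been appended, a run along the following lines should list them with predicted values. This is a sketch rather than the lab's official code; the cli and clm options are an addition not used earlier in this handout, included only to preview the prediction and confidence limits mentioned in the note above.

```sas
/* Confirm the two new rows (soil = .) were appended */
proc print data=soilwater;
run;

ods graphics on;
proc reg data=soilwater;
    /* r   : residual analysis, including predicted values
       clb : confidence limits for the parameter estimates
       cli : prediction limits for an individual observation (assumed addition)
       clm : confidence limits for the mean response (assumed addition) */
    model soil = depth / r clb cli clm;
run;
quit;
ods graphics off;
```

Because soil is missing for the appended rows, they are left out when the line is fitted but still receive predicted values, so the slope and intercept should match the earlier fit.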