
SAS Lecture 4: Procedures
Aidan McDermott, April 18, 2006

SAS Programs
A SAS program is written in an imperative language and consists of statements. Each statement ends in a semicolon. Programs consist of (at least) four types of statement, grouped together in blocks:
- global statements
- data step statements
- procedure statements
- macro statements

Outline
- SAS formats: proc format
- Procedures for descriptive data analysis: the freq, means, and univariate procedures
- Procedures for statistical analysis: the ttest and reg procedures

What will the output from this program look like? How many variables will be in the dataset example, and what will be the length and type of each variable? What will the variable package look like?

SAS Formats
It is sometimes useful to store data in one way and display it in another. For example, dates can be stored as integers (SAS counts days from January 1, 1960) but displayed in a human-readable format. A SAS format changes the way the data stored in a variable is displayed; the stored value itself is unchanged. There are two types of format:
- Internal formats (SAS already knows about these)
- User-defined formats (you define these yourself)

Internal SAS formats
A format statement tells SAS to use a format with one or more variables.
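A minimal sketch of an internal format, assuming a hypothetical dataset visits: the date9. informat stores each date as a day count, the format statement attaches the internal date9. format to visit_date only, and the unformatted copy days exposes the stored number.

```sas
/* Hypothetical data: dates are read with the date9. informat and stored
   as numbers (days since January 1, 1960) */
data visits;
  input id visit_date :date9.;
  days = visit_date;            /* copy of the stored number, left unformatted */
  format visit_date date9.;     /* the internal format changes the display only */
  datalines;
1 18APR2006
2 01JAN1960
;
run;

proc print data=visits;
run;
```

In the printed listing, visit_date should read 18APR2006 and 01JAN1960, while days shows the stored values 16909 and 0.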

Permanent formats
A format statement added to a data step permanently connects the name of a format to a variable: the format name is stored in the dataset header, so it is used whenever the dataset is displayed. A format statement used inside a procedure step applies only for that step. A format statement begins with the format keyword and ends with a semicolon (it is a SAS statement, after all).

User-defined formats
You can define your own format using the format procedure. Like all procedures, it begins with the keyword proc and ends with a run statement.

proc format syntax:
    proc format <options>;
      value formatname range1 = 'formatted value1'
                       ...
                       rangen = 'formatted valuen';
    run;

The value statement is used to define a format. In the example, proc format defines the format yesno, and the format statement applies the format to the variable death.

To use a user-defined format:
- define the format using proc format;
- tell SAS to use the format with a specific variable by using the format statement, as before.

User-defined formats: example
    proc format;
      value gen   1 = 'male'
                  2 = 'female';
      value age   10-29 = '10-29'
                  30-39 = '30-39'
                  40-49 = '40-49'
                  50-75 = '50-75';
      value $dpt  'A' = 'Dept A.'
                  'B' = 'Dept B.';
    run;

This defines three formats: gen, age and $dpt. Format $dpt is a character format, suitable for character variables.
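A short sketch of the two steps (define with proc format, then attach with a format statement in a data step), reusing the example formats. The dataset staff and its values are hypothetical.

```sas
/* The three formats from the example */
proc format;
  value gen   1 = 'male'
              2 = 'female';
  value age   10-29 = '10-29'
              30-39 = '30-39'
              40-49 = '40-49'
              50-75 = '50-75';
  value $dpt  'A' = 'Dept A.'
              'B' = 'Dept B.';
run;

/* Hypothetical dataset; because the format statement is in a data step,
   the format names are stored in the dataset header */
data staff;
  input sex age dept $;
  format sex gen. age age. dept $dpt.;
  datalines;
1 34 A
2 52 B
1 27 A
;
run;

/* Prints male/female, the age bands and the department labels */
proc print data=staff;
run;
```

A variable and a format may share a name (age and age. here), since SAS keeps format names separate from variable names.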

Format ranges
You can specify a range of values to be formatted in a given way:
    proc format;
      value age 10-29 = '10-29'
                30-39 = '30-39'
                40-49 = '40-49'
                50-75 = '50-75';
    run;
The ranges are inclusive. In this way you can use formats as look-up tables to categorize a variable.

Specifying format ranges:
low      lowest value (for numeric formats, excludes missing values)
high     highest value
other    all other values not listed (including missing values)

value1 - value2    means [value1, value2]
value1 -< value2   means [value1, value2)
value1 <- value2   means (value1, value2]

The put function allows you to capture the formatted value in another variable.

Format names:
- must be 8 or fewer characters long (in SAS 8 and earlier; SAS 9 allows longer names)
- cannot end with a number
- character formats begin with a $
- cannot use a SAS internal format name
- refer to a format in a format statement by using the name followed by a period

Descriptive statistics
Exploratory data analysis is very important from many perspectives. In SAS there are three procedures used routinely:

Procedure    Used for
freq         tables for categorical data
univariate   descriptive statistics for numeric data
means        descriptive statistics for numeric data

proc freq
- produces frequency counts and crosstabulation tables
- computes tests and measures of association

Syntax:
    proc freq <options>;
      tables requests / <options>;
    run;
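A sketch of the range keywords and the put function, assuming a hypothetical format agegrp and dataset ages. The -< operator excludes the upper endpoint, so 30 falls in the second band rather than the first, and the missing value is caught by other because low excludes missing values.

```sas
/* Hypothetical format using low, high, other and the -< operator */
proc format;
  value agegrp low -< 30   = 'under 30'
               30  -< 50   = '30 to 49'
               50  -  high = '50 and over'
               other       = 'missing';
run;

data ages;
  input age;
  agegroup = put(age, agegrp.);  /* capture the formatted value in a new character variable */
  datalines;
29.5
30
49.9
50
.
;
run;

proc print data=ages;
run;
```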

    proc freq data=mydata;
      tables gender race chd;
      tables gender * chd / chisq relrisk;
    run;
data=mydata is an option; chisq and relrisk are requests for statistics.

Example
The dataset sample has a variable gender. We would like to know what proportion of the sample data are male and what proportion are female.
    proc freq data=sample;
      tables gender;
    run;

Example data: NMES
The National Medical Expenditure Survey (1987). Examine smoking and gender.
    libname mylib 'c:\sas2006\lecture4';
    proc format;
      value smoke 0 = 'never'
                  1 = 'current'
                  2 = 'former';
      value gen   0 = 'female'
                  1 = 'male';
    run;
This makes two formats, smoke and gen, for the smoking and gender variables.

    proc freq data=mylib.nmes;
      tables male*smoke / chisq;
      format male gen. smoke smoke.;
    run;
mylib is a libref (pointing to a folder) and nmes is the dataset.
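A minimal sketch answering the male/female proportion question on a small hypothetical stand-in for sample (3 males, 2 females). The out= option on the tables statement saves the counts and percentages to a dataset, here called gcounts, in addition to printing the table.

```sas
/* Hypothetical stand-in for the dataset sample */
data sample;
  input gender $;
  datalines;
M
F
M
M
F
;
run;

/* One-way frequency table; out= saves COUNT and PERCENT for each level */
proc freq data=sample;
  tables gender / out=gcounts;
run;

proc print data=gcounts;
run;
```

For these five records the table shows 60% male and 40% female.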

proc univariate
- produces simple descriptive statistics
- use the plot option on the proc statement to get:
  - a stem-and-leaf plot
  - a box plot
  - a normal probability plot (QQ plot)
  - side-by-side box plots for by-variable groups

Syntax:
    proc univariate <options>;
      var varlist / <options>;
    run;

Example:
    proc univariate data=mylib.nmes plot;
      title 'Univariate Output for Age';
      var lastage;
    run;
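A sketch of proc univariate with an output statement, which saves chosen statistics to a dataset. It uses the eight weights listed in the BMI example; the names weights, wstats, wmean, wmedian and wstd are illustrative.

```sas
/* The eight weights (kg) from the BMI example; @@ reads several values per line */
data weights;
  input weight @@;
  datalines;
101.7 97.1 114.2 101.9 93.1 108.1 85.0 89.1
;
run;

/* plot adds the stem-and-leaf, box and normal probability plots;
   the output statement stores selected statistics in wstats */
proc univariate data=weights plot;
  var weight;
  output out=wstats mean=wmean median=wmedian std=wstd;
run;

proc print data=wstats;
run;
```

For these weights the mean is 98.775 kg and the median is 99.4 kg (the average of the two middle values, 97.1 and 101.7).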

proc means
- similar to univariate
- no plots
- nicer output, particularly for more than one variable

Syntax:
    proc means <options>;
      class varlist;
      var varlist / <options>;
      by varlist;
      output out=outdata <options>;
    run;

proc means options:
- data=dataset
- statistic keywords: the default is n mean std min max; others include nmiss range median clm
- noprint: suppress printing of the output

Statements:
- class: statistics produced for each combination of class variables
- by: statistics produced for each combination of by variables (the data must be sorted by them)
- output: produce an output dataset that contains the statistics

Example:
    proc means data=mylib.nmes noprint n mean std stderr range nmiss;
      class male;
      var lastage;
      output out=results n=nage mean=mage std=sage;
      format male gen.;
    run;

    proc print data=results;
    run;

In a number of previous Phase I and II studies of male, non-insulin-dependent diabetic (NIDD) patients conducted by a drug company, the mean body mass index (BMI) was found to be 28.4. An investigator has 17 male NIDD patients enrolled in a new study and wants to know whether the BMI from this sample is consistent with previous findings.

Patient number   1      2      3      4      5      6      7      8
Height (cm)      178    170    191    179    182    177    184    182
Weight (kg)      101.7  97.1   114.2  101.9  93.1   108.1  85.0   89.1
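One way to approach this question, sketched under the assumption that only the eight patients shown are analyzed (the other nine of the 17 are not listed, so any results are illustrative only): compute BMI as weight in kilograms divided by the square of height in metres, summarize it with proc means, then test H0: mean BMI = 28.4 using the h0= option of proc ttest. The dataset name bmi is hypothetical.

```sas
/* Only the eight patients shown in the table; the other nine are not listed */
data bmi;
  input patient height weight;
  bmi = weight / (height / 100)**2;   /* BMI = kg / m^2; height is given in cm */
  datalines;
1 178 101.7
2 170  97.1
3 191 114.2
4 179 101.9
5 182  93.1
6 177 108.1
7 184  85.0
8 182  89.1
;
run;

/* Mean BMI with a 95% confidence interval for the mean */
proc means data=bmi n mean std clm;
  var bmi;
run;

/* One-sample t-test of H0: mean BMI = 28.4 */
proc ttest data=bmi h0=28.4;
  var bmi;
run;
```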

proc ttest
Syntax:
    proc ttest <options>;
      var varlist;
      paired pairlist;
      by varlist;
      class variable;
    run;

Two-sample paired t-test
A new compound, ABC-123, is being developed for long-term treatment of patients with chronic asthma. Asthmatic patients were enrolled in a double-blind study and randomized to receive daily oral doses of ABC-123 or a placebo for 6 weeks. The primary measurement of interest is the resting FEV1 (forced expiratory volume during the first second of expiration), which is measured before and at the end of the 6-week treatment period. Does administration of ABC-123 appear to have any effect on FEV1?

Patient number          1      2      3      4      5
ABC-123                 Yes    Yes    Yes    Yes    Yes
Baseline FEV1 (liters)  1.35   3.22   2.79   2.45   1.84
Week 6 FEV1 (liters)    N/A    3.55   3.15   2.30   2.37
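A sketch of the paired statement on the five ABC-123 patients shown. It only tests the within-patient change (week 6 minus baseline) for this treated group; the placebo patients are not listed here, so it does not reproduce the full randomized comparison, which would compare changes between the two groups. Patient 1, with no week 6 value, drops out of the paired analysis. The dataset name fev is hypothetical.

```sas
/* FEV1 (liters) for the five ABC-123 patients shown; N/A is entered as missing */
data fev;
  input patient baseline week6;
  datalines;
1 1.35 .
2 3.22 3.55
3 2.79 3.15
4 2.45 2.30
5 1.84 2.37
;
run;

/* Paired t-test of the within-patient difference week6 - baseline;
   observations missing either value are excluded */
proc ttest data=fev;
  paired week6*baseline;
run;
```

The four complete pairs have differences of 0.33, 0.36, -0.15 and 0.53 liters, so the mean change is 0.2675 liters.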

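Modeling with SAS
- examine relationships between variables
- estimate parameters and their standard errors
- calculate predicted values
- evaluate the fit or lack of fit of a model
- test hypotheses

The linear model, with outcome y and design (independent) variables x_1, ..., x_k:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) $$
For example:
$$ \text{Weight} = \beta_0 + \beta_1\,\text{Height} + \beta_2\,\text{Age} + \varepsilon $$
Note: the outcome variable must be continuous and normally distributed given the independent variables.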
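For reference, proc reg estimates the β's by least squares: it chooses the values that minimize the residual sum of squares. Writing the n observations in matrix form, with X the n × (k+1) design matrix whose first column is all ones, the estimate (when X^T X is invertible) and the fitted values are:

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik} \right)^2
            = (X^\top X)^{-1} X^\top y,
\qquad
\hat{y} = X \hat{\beta}, \quad e = y - \hat{y}
```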

The linear model with proc reg
- estimates parameters by least squares
- produces diagnostics to test model fit (e.g. scatter plots)
- tests hypotheses

    proc reg data=mydata;
      model weight = height age;
    run;

proc reg syntax:
    proc reg <options>;
      model response = effects </options>;
      plot yvariable*xvariable = symbol;
      by varlist;
      output <out=SAS-data-set> <output statistic list>;
    run;

proc reg statement options:
data = SAS-data-set      input data set
outest = SAS-data-set    creates a data set with the parameter estimates
simple                   prints simple statistics

The model statement
    model response = <effects> </options>;
- is required
- variables must be numeric
- has many options
- you can specify more than one model statement
    model weight = height age;
    model weight = height age / p clm cli;

The plot statement
    plot yvariable*xvariable <=symbol> </options>;
- produces scatter plots, with yvariable on the vertical axis and xvariable on the horizontal axis
- you can specify several plots
- an optional symbol marks the points
- yvariable and xvariable can be variables specified in model statements or statistics available in the output statement
    plot weight * age / pred;
    plot r. * p. / vref = 0;

Some statistics available for plotting:
P.      predicted values
R.      residuals
L95.    lower 95% CI bound for an individual prediction
U95.    upper 95% CI bound for an individual prediction
L95M.   lower 95% CI bound for the mean of the dependent variable
U95M.   upper 95% CI bound for the mean of the dependent variable

    plot weight * age / pred;
    plot r. * p. / vref = 0;
    plot (weight p. l95. u95.) * age / overlay;
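A runnable sketch of the outest= and simple options and the p, clm and cli model options, using the eight height/weight pairs from the BMI example. Age is not listed for those patients, so this model uses height only; the dataset names patients and est are hypothetical.

```sas
/* Height (cm) and weight (kg) for the eight patients in the BMI example */
data patients;
  input height weight;
  datalines;
178 101.7
170  97.1
191 114.2
179 101.9
182  93.1
177 108.1
184  85.0
182  89.1
;
run;

/* outest= saves the parameter estimates; simple prints summary statistics;
   p, clm and cli print predicted values with 95% limits for the mean
   and for an individual prediction */
proc reg data=patients outest=est simple;
  model weight = height / p clm cli;
run;
quit;

proc print data=est;
run;
```

For these data the fitted slope is about 0.247 kg per cm of height and the intercept about 54.24 kg.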

The output statement
    output <out=SAS-data-set> keyword=names;
- creates a SAS data set
- all original variables are included
- keyword=names specifies the statistics to include
    output out=pvals p=pred r=resid;

NMES variables of interest:
totalexp   total medical expenditure ($)
chd5       indicator of CHD
lastage    age at last interview
male       sex of participant

proc reg example; here:
1. model: estimate parameters, etc.
2. plot: make three plots
3. output: make an output dataset, regout

The run statement
Many people assume that the run statement ends a procedure such as proc reg. This is because when SAS encounters a run statement, it executes any outstanding instructions in the program buffer. But it may or may not end the procedure: proc reg is an interactive procedure and stays active after a run statement, so further statements can still be submitted until a quit statement (or the start of another DATA or PROC step) ends it.
    proc reg data=lecture4.nmes;
      model totalexp = chd5 lastage male;
      model totalexp = chd5 lastage;
      plot r.*chd5;
    quit;   /* ends the procedure */
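A sketch of the output statement together with interactive use of proc reg, on the same eight height/weight pairs (the dataset name patients is hypothetical). The first run executes the model and output statements without ending the procedure, a second model is then submitted to the still-active proc reg, and quit ends it.

```sas
data patients;
  input height weight;
  datalines;
178 101.7
170  97.1
191 114.2
179 101.9
182  93.1
177 108.1
184  85.0
182  89.1
;
run;

proc reg data=patients;
  model weight = height;
  output out=regout p=pred r=resid;  /* original variables plus predicted values and residuals */
run;                                 /* executes the statements so far; proc reg stays active */
  model weight = height / noint;     /* a further model submitted interactively */
run;
quit;                                /* ends proc reg */

proc print data=regout;
run;
```

Because the first model includes an intercept, the residuals in regout sum to zero (up to rounding), and pred + resid reproduces weight for each patient.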