SAS seminar. The little SAS book Chapters 3 & 4. April 15, 2003, Åsa Klint. By LD Delwiche and SJ Slaughter.


Data step
- read and modify data
- create a new dataset
- performs actions on rows

Proc step
- use an existing dataset
- produce an output/results
- performs actions on columns

3.1 Creating and Redefining variables

Creating and redefining variables is straightforward in a SAS data step:

    variable = expression;

Examples:

    Newvar = constant;
    Newvar = oldvar * constant;

Use a DROP or KEEP statement to decrease the number of variables in your dataset.

Ordinary operators
+    addition
-    subtraction
*    multiplication
/    division
**   exponentiation

Standard mathematical rules of precedence apply. Use parentheses to override that order.

3.2 / 3.3 Using SAS functions

Functions perform calculations or transformations and simplify your work, as the programming has already been taken care of by SAS. Write "help functions" in the command line in SAS for information on available functions.

General form

    Variable = function-name(argument, argument, ...);

Note: an argument can be a literal value, a variable name or an expression.

Example

    Newvar = SUBSTR(arg, position, n)

    pnr = '19970530-0765'            (13 positions)
    birthdate = substr(pnr, 3, 6)    (from position 3 I get six positions)

which results in

    birthdate = '970530'

Note: this is still a CHAR variable.
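The assignment and SUBSTR rules above can be put together in one short data step. This is only a sketch: the dataset names persons and persons2 and the variables weight_kg, weight_g and birth_sasdate are hypothetical, and the INPUT function with the YYMMDD8. informat is an addition not covered on the slides, shown as one possible way to get from the character value to a numeric SAS date.

```sas
/* Hypothetical one-row dataset; names are illustrative only */
data persons;
    length pnr $13;
    pnr = '19970530-0765';
    weight_kg = 70;
run;

data persons2;
    set persons;
    weight_g  = weight_kg * 1000;      /* Newvar = oldvar * constant     */
    birthdate = substr(pnr, 3, 6);     /* still a character variable     */

    /* Assumption: converting the first 8 characters with INPUT and the
       YYMMDD8. informat gives a numeric SAS date value */
    birth_sasdate = input(substr(pnr, 1, 8), yymmdd8.);
    format birth_sasdate date9.;

    drop weight_kg;                    /* DROP is applied when the row is written */
run;

proc print data=persons2;
run;
```

Because DROP only takes effect when the observation is written out, weight_kg can still be used in the calculation inside the same data step.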

3.7 Working with SAS Dates

SAS stores a date as the number of days since January 1, 1960.

Date              SAS date value
Jan 1, 1959       -365
Jan 1, 1960       0
Jan 1, 1961       366
April 15, 2003    15810

This number can then be transformed into something we can interpret by using formats/functions.

3.4 Using IF-THEN statements

Often we want to create variables on the form

    IF condition THEN action;
    if job='banker' then highses=1;

    IF condition AND condition THEN action;
    if job='banker' and age>65 then ret_banker=1;

Comparisons and Logical operators

symbol        mnemonic   meaning
=             EQ         equals
¬=, ^=, ~=    NE         not equal
>             GT         greater than
<             LT         less than
>=            GE         greater than or equal
<=            LE         less than or equal
&             AND        all comparisons must be true
|, ¦, !       OR         at least one comparison must be true

DO-END statement

A DO-END statement is useful if you want to make several changes or create new variables for a subgroup or under certain conditions. Note that the DO group continues until you close it (using END;).

    If sex='female' then do;
        if ICD=175 then ovarian=1;
        else ovarian=0;
        age_ovarian=diadat-birthdat;
    end;
    Else if sex EQ 'male' then do;
        ......
    end;

3.5 Grouping/categorizing observations with IF-THEN-ELSE statements

Say we want to compare bankers to the other two occupations:

    If job='banker' then highses=1;
    else if job='conductor' or job='driver' then highses=0;
    else highses=. ;

When a condition is true, SAS assigns the stated value to highses and skips the remaining ELSE branches. The last ELSE is a trash-bin: anything that is not covered by the previous conditions is set to missing.

Note that a less-than expression, for example X<350, is also true for X=. (missing), because SAS treats a missing numeric value as smaller than any number.

Make sure the conditions cover all possibilities; otherwise you might get missing values or hidden errors. Looking at the log reveals the number of missing values created, but there is no indication for observations that were not covered by your programming!
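The two pitfalls just described (uncovered categories and missing values slipping through a less-than test) can be illustrated with a small sketch. The dataset names jobs and jobs2, the variable x, the flag lowx and the data values are all hypothetical, made up for illustration.

```sas
/* Hypothetical test data; 'nurse' is deliberately not covered by the IF chain */
data jobs;
    length job $ 12;          /* long enough for 'conductor' */
    input job $ x;
    datalines;
banker 400
driver 120
nurse 300
conductor .
;
run;

data jobs2;
    set jobs;

    /* Grouping with a final catch-all ELSE */
    if job = 'banker' then highses = 1;
    else if job = 'conductor' or job = 'driver' then highses = 0;
    else highses = .;

    /* Test for missing first, because . < 350 is true in SAS */
    if x = . then lowx = .;
    else if x < 350 then lowx = 1;
    else lowx = 0;
run;

proc print data=jobs2;
run;
```

Putting the missing-value test first is one way to keep missing x out of the low group; without it, the conductor row would be classified as lowx=1.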

IF-THEN-ELSE statements are used for grouping/categorizing variables. Example:

    if 0<=bmi<20 then bmicat=1;
    else if 20<=bmi<25 then bmicat=2;
    else if 25<=bmi then bmicat=3;
    else if bmi=. then bmicat=.;

3.6 Subsetting your data

There are two ways to subset a dataset if you only want to use some of the observations in the dataset.

Basic form (tells SAS which observations to include):

    IF expression;

Alternative (tells SAS which observations to exclude):

    IF expression then delete;

Chapter 4: sorting, printing and summarizing your data

SAS procedures are used for actions on columns.
- Proc Sort
- Proc Print
- Proc Format
- Proc Means
- Proc Freq

4.1 Using SAS procedures

The statements TITLE, FOOTNOTE and LABEL

Titles and footnotes are useful to keep order among your outputs. They stay in effect until you cancel them (by creating a blank title/footnote, alternatively goptions reset=title) or end your SAS session.

    Ex  FOOTNOTE 'SAS seminar April 15, 2003';

By using the LABEL statement you can create a longer and more detailed description (up to 256 characters) for each variable.

    Ex  LABEL HIENG = 'Energy-intake per day. High=1, Low=0';

Subsetting in procedures with the WHERE statement

The WHERE statement can be used to select a subgroup of your dataset for a procedure, in the same way as an IF statement in a data step. The same comparison and logical operators as previously shown can be used here. Other useful phrases:
- IS NOT MISSING
- BETWEEN AND
- CONTAINS
- IN (list)

    WHERE weight IS NOT MISSING;

4.3 Sorting your data with Proc Sort

Some procedures require the data to be sorted before they can work. A BY statement is used in Proc Sort to specify which variable(s) you want SAS to sort your data by. By default, the dataset you sort is replaced by the new, sorted dataset. However, you can create a new dataset with the OUT= option.
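As a rough sketch of how OUT=, WHERE, TITLE and LABEL fit together: the dataset cancer_sorted, the variable weight and the data values below are hypothetical, while cancer, pnr and diadat are borrowed from the seminar's other examples.

```sas
/* Hypothetical data; diadat holds SAS date values (days since Jan 1, 1960) */
data cancer;
    input pnr $ diadat weight;
    datalines;
A 15000 70
B 14000 .
A 14500 82
;
run;

/* OUT= writes the sorted rows to a new dataset; cancer itself is left as it was */
proc sort data=cancer out=cancer_sorted;
    by pnr diadat;
run;

title 'Diagnoses with a recorded weight';

/* The LABEL option on PROC PRINT makes it show labels instead of variable names */
proc print data=cancer_sorted label;
    where weight is not missing;          /* subset inside the procedure */
    label diadat = 'Date of diagnosis';
    format diadat date9.;
run;

title;   /* a blank TITLE statement cancels the title */
```

The WHERE statement only affects what this PROC PRINT call reads; cancer_sorted still contains the row with missing weight.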

Useful options in Proc Sort

NODUPKEY eliminates any duplicate observations with the same values for the BY variables. Practical in situations with multiple observations for each individual where you only want, for example, the first diagnosis in your dataset.

    Ex  Proc sort data=cancer NODUPKEY;
          by pnr diadat;

Note that duplicates are judged on all BY variables together, so this example keeps one observation per combination of pnr and diadat.

By default SAS sorts in ascending order (from lowest to highest, or from A to Z). Add the keyword DESCENDING before the relevant variable to change this.

4.4 Printing your data with Proc Print

Proc Print is used to look at the data in your output window. In its simplest form SAS prints all observations and all variables in your dataset. However, you can restrict and group the output with the statements
- VAR variable-list: specifies which variables to print and in what order
- BY variable-list: groups the data into sections. Requires the data to be sorted by the BY variables

Changing the appearance of printed values with Formats

SAS decides which format it thinks is most appropriate for your data in terms of number of decimals, how much space for each value etc. There are many formats for numeric, character and date values. Note the period in the format name, as this distinguishes a format from a variable name.

General forms of a SAS format
Character    $formatw.    Ex $6.
Numeric      formatw.d    Ex BEST6.3
Date         formatw.     Ex DATE7.

Creating your own formats using Proc Format

Data are often coded with numbers in order to save disk space. In Proc Format you create formats that fit your purposes, and you associate them with your data by using a FORMAT statement in the procedure.

    Ex  Proc format;
          value sex 1='male'
                    2='female';

General form (for assigning a format to a variable)

    format varname formatname. ;

Note:
- character values must be enclosed in quotes
- several values can be assigned the same value label. Separate them with a comma (,) or use the hyphen (-) for a continuous range
- the keyword OTHER can be used to assign a value label to any values not listed in the value statement
- the keywords HIGH and LOW can be used in ranges to indicate the highest and the lowest nonmissing value for the variable

4.9 Summarizing your data using Proc Means

Proc Means provides simple statistics for numeric variables, so that you get a feel for your data and might discover errors that have occurred while reading in the data.

Available options (among others)
- N (number of non-missing observations)
- NMISS (number of missing observations)
- RANGE
- SUM
- MEDIAN
- MEAN
- STDDEV
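The value-label rules and the statistic keywords listed above might be combined as in the following sketch. The dataset study, the variables sex and bmi, the format names sexfmt and bmifmt and the cut-points are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data with one missing bmi value */
data study;
    input sex bmi;
    datalines;
1 22.5
2 27.1
1 .
2 19.8
;
run;

proc format;
    value sexfmt 1 = 'male'
                 2 = 'female'
                 other = 'unknown';          /* anything not listed above       */
    value bmifmt low -< 20 = 'under 20'      /* LOW: lowest nonmissing value    */
                 20 -< 25  = '20 to under 25'
                 25 - high = '25 or more';   /* HIGH: highest value             */
run;

/* Associate the formats with the variables for this procedure only */
proc print data=study;
    format sex sexfmt. bmi bmifmt.;
run;

/* Request specific statistics by naming them on the PROC MEANS statement */
proc means data=study n nmiss mean stddev median range sum;
    var bmi;
run;
```

The formats only change how values are displayed in PROC PRINT; PROC MEANS still computes its statistics on the stored numeric bmi values.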

If you use Proc Means with no other statements, you'll get statistics for all observations and all numeric variables in the dataset. To restrict this, use statements such as
- VAR variable-list; specifies which numeric variables to use
- BY variable-list; performs separate analyses for each level of the variables in the list (note: sorting required)
- CLASS variable-list; performs separate analyses for each level of the variables in the list. More compact output than with the BY statement (note: no sorting required)

Instead of having the statistics displayed in the output window, we can create a new dataset with the selected statistics.
- The NOPRINT option is used to stop SAS from writing in the output window
- OUTPUT OUT=newdataset creates a new dataset
- The new dataset will contain the variables listed in the output-statistic list, the variables in the BY/CLASS statement (if any) and two new variables (_TYPE_ and _FREQ_)

4.11 Counting your data with Proc Freq

Proc Freq is used to produce frequency tables of categorical data.

General form

    Proc Freq data=dataset;
      tables variable-combinations;

You can create from one-way to n-way tables. Options include
* List
* Missing
* Norow
* Nocol
* Nopercent
* Out=dataset

Next seminar: April 22

THANK YOU FOR LISTENING!

Åsa Klint
Asa.Klint@mep.ki.se