Chapter 6: Modifying and Combining Data Sets

Similar documents
Chapter 2: Getting Data Into SAS

Chapter 3: Working With Your Data

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

The Proc Transpose Cookbook

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

Base and Advance SAS

An Introduction to SAS University Edition

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Checking for Duplicates Wendi L. Wright

Transform data - Compute Variables

SAS - By Group Processing umanitoba.ca/centres/mchp

A Side of Hash for You To Dig Into

SAS 101. Based on Learning SAS by Example: A Programmer s Guide Chapter 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

Effectively Utilizing Loops and Arrays in the DATA Step

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

Data Set Options. Specify a data set option in parentheses after a SAS data set name. To specify several data set options, separate them with spaces.

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

An Animated Guide: Proc Transpose

SAS Online Training: Course contents: Agenda:

3. Almost always use system options options compress =yes nocenter; /* mostly use */ options ps=9999 ls=200;

2. Don t forget semicolons and RUN statements The two most common programming errors.

Avoiding Macros in SAS. Fareeza Khurshed Yiye Zeng

12. Combining SAS datasets. GIORGIO RUSSOLILLO - Cours de prépara)on à la cer)fica)on SAS «Base Programming» 269

SAS CURRICULUM. BASE SAS Introduction

3. Data Tables & Data Management

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

Intermediate SAS: Working with Data

Mastering the Basics: Preventing Problems by Understanding How SAS Works. Imelda C. Go, South Carolina Department of Education, Columbia, SC

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

Don t Forget About SMALL Data

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

Producing Summary Tables in SAS Enterprise Guide

SAS Linear Model Demo. Overview

Stat Wk 3. Stat 342 Notes. Week 3, Page 1 / 71

Processing SAS Data Sets

SAS Macros of Performing Look-Ahead and Look-Back Reads

Contents of SAS Programming Techniques

Introduction to PROC SQL

What you learned so far. Loops & Arrays efficiency for statements while statements. Assignment Plan. Required Reading. Objective 2/3/2018

General Tips for Working with Large SAS datasets and Oracle tables

Query Processing & Optimization

Beginner Beware: Hidden Hazards in SAS Coding

CREATING A SUMMARY TABLE OF NORMALIZED (Z) SCORES

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

Using PROC SQL to Generate Shift Tables More Efficiently

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

From An Introduction to SAS University Edition. Full book available for purchase here.

5b. Descriptive Statistics - Part II

APCS Semester #1 Final Exam Practice Problems

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC

Contents. About This Book...1

Paper S Data Presentation 101: An Analyst s Perspective

S-M-U (Set, Merge, and Update) Revisited

Earthquake data in geonet.org.nz

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Introduction to SAS and Stata: Data Construction. Hsueh-Sheng Wu CFDR Workshop Series February 2, 2015

Merge Processing and Alternate Table Lookup Techniques Prepared by

Get SAS sy with PROC SQL Amie Bissonett, Pharmanet/i3, Minneapolis, MN

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

Using Proc Freq for Manageable Data Summarization

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

STAT 540: R: Sections Arithmetic in R. Will perform these on vectors, matrices, arrays as well as on ordinary numbers

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #16 Loops: Matrix Using Nested for Loop

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables

T.I.P.S. (Techniques and Information for Programming in SAS )

Leave Your Bad Code Behind: 50 Ways to Make Your SAS Code Execute More Efficiently.

Introduction to SAS Mike Zdeb ( , #1

DSCI 325: Handout 9 Sorting and Options for Printing Data in SAS Spring 2017

Lecture 1 Getting Started with SAS

Uncommon Techniques for Common Variables

SAS Enterprise Guide and Add-In for Microsoft Office Hands-on Workshop

David Beam, Systems Seminar Consultants, Inc., Madison, WI

Creating and Executing Stored Compiled DATA Step Programs

Techdata Solution. SAS Analytics (Clinical/Finance/Banking)

Introduction to SAS. Hsueh-Sheng Wu. Center for Family and Demographic Research. November 1, 2010

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

CS1114: Matlab Introduction

PROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need

The inner workings of the datastep. By Mathieu Gaouette Videotron

Not Just Merge - Complex Derivation Made Easy by Hash Object

GETTING DATA INTO THE PROGRAM

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

FSVIEW Procedure Windows

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Random Sampling For the Non-statistician Diane E. Brown AdminaStar Solutions, Associated Insurance Companies Inc.

Basic Concepts #6: Introduction to Report Writing

Using Templates Created by the SAS/STAT Procedures

Getting Started with Arrays and DO LOOPS. Created 01/2004, Updated 03/2008, Afton Royal Training & Consulting

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

. UNDERSTANDING THE TRANSPOSE PROCEDURE

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

Long Questions. 7. How does union help in storing the values? How it differs from structure?

Reading data in SAS and Descriptive Statistics

Combining Contiguous Events and Calculating Duration in Kaplan-Meier Analysis Using a Single Data Step

Title. Description. Quick start. stata.com. misstable Tabulate missing values

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

FSEDIT Procedure Windows

Transcription:

Chapter 6: Modifying and Combining Data Sets The SET statement is a powerful statement in the DATA step. Its main use is to read in a previously created SAS data set which can be modified and saved as a new data set. DATA newdatasetname; SET olddatasetname;...;...;...; run; University of South Carolina Page 1

Could stack multiple data sets together by putting several old data sets in the SET statement. If one data set contains variable(s) not included in the other data set(s), the observations from the other sets will have missing values for those variables in the combined data set. If old data sets are sorted by a specific variable, simply stacking them may not preserve the sorting. To preserve the sorting, we can interleave the data sets, with a BY statement (must sort all data sets first). University of South Carolina Page 2

Merging Data Sets When observations in two or more data sets are connected by having at least one common variable, it is possible to merge the data sets together. Example: DATA combineddataname; MERGE dataset1 dataset2; BY common variable; Note: If the two data sets have an identically named variable (other than the BY variable) then the merged data will contain only the values from the second data set. Both data sets need to be sorted by the BY variable before they can be merged. Can also merge each observation in a smaller data set with several observations from a larger data set (one-to-many match-merge). BY statement necessary in one-to-many merge, not necessary in one-to-one merge if data sets have same number of observations and are in same order. University of South Carolina Page 3

Merging Summary Statistics and Data Often we want to merge summary statistics (either statistics for entire data set or often for groups within the data set) with the observations themselves. First calculate summary statistics using PROC MEANS (after sorting if necessary) Output the summary statistics to another data set with an OUTPUT statement. Give the statistics meaningful names in this output data set. Use a MERGE statement to combine the original data with the OUTPUT data from PROC MEANS. Once summary stats are merged with original data, can calculate: 1. centered data observations 2. standardized data observations 3. data expressed as a percentage of group sums This is done by transforming data through functions involving the summary statistics. University of South Carolina Page 4

Merging the Grand Total with the Original Data When PROC MEANS is used without a BY statement, you can get grand total, grand mean, etc., rather than groupwise statistics. Merging is more difficult because the original data and summary data do not have a common variable. Need to trick SAS: DATA newdataset; IF N =1 THEN SET summarydataset; SET olddataset; University of South Carolina Page 5

Variables read from the summary data set with the first SET statement are retained with all observations. General trick for merging one (or a few) observations with many, where no common variable exists. UPDATE statement similar to MERGE, but typically used when data set changes over time new variables added, values of variables change for old observations, etc. (See pg. 192-193.) University of South Carolina Page 6

Creating several data sets with OUTPUT statement A single DATA step can create several SAS data sets. DATA line must give multiple data set names: DATA set1 set2 set3; OUTPUT statement often used with IF-THEN statements or within a DO loop. Example: IF... THEN OUTPUT set1; ELSE OUTPUT set2; University of South Carolina Page 7

OUTPUT statement can also be used to create several observations from one. Transforms wide data sets into long data sets. Often used with repeated-measures data (several values observed for each individual) OUTPUT also useful for generating function values. Used in a DO loop, OUTPUT will tell SAS to create an observation at each iteration of the DO loop. University of South Carolina Page 8

Data Set Options System options specified in Options statement (affect SAS operation, often formatting) Statement options affect the running of a step. Example: NOPRINT option in PROC MEANS NOWINDOWS option in PROC REPORT DATA =... option in any procedure University of South Carolina Page 9

Data set options: affect reading/writing of data set. Can use in DATA steps (with statements like DATA, SET, MERGE, UPDATE) or in PROC steps (with DATA =... option) KEEP = DROP = ; (specifies variables to keep in data set) ; (specifies variables to drop in data set) RENAME = (oldname = newname); (renames certain variables) FIRSTOBS = OBS = (tells SAS where to start reading data) (tells SAS where to stop reading data) IN option is typically used to track which data set an observation in a combined data set came from. variables in the IN option only exist during that data step, but can be used to create other variables. University of South Carolina Page 10

Selecting Observations with the WHERE= Option The WHERE= option selects a subset of observations similarly to the WHERE statement; it can be used in a DATA step or a PROC step. The basic syntax is: (WHERE = (condition)) such that the logical condition in the inner parentheses is used to select a subset of the observations. If used in a SET, MERGE, or UPDATE statement, the WHERE= option is applied to the data set that is being read. If used in a DATA statement, the WHERE= option is applied to the data set that is being written. University of South Carolina Page 11

Using PROC TRANSPOSE to Flip Observations and Variables PROC TRANSPOSE converts variables (data set columns) into observations (data set rows) or observations into variables. PROC TRANSPOSE DATA =... OUT =...; names the new transposed data set BY...; identifies variables you don t want transposed ID...; values of this variable will become variable names VAR...; the values of these variables will be transposed placed as rows for each level of the BY variable Must first sort by the BY variable. Note: If ID statement is missing, the newly created variables will have names col1, col2, etc. PROC TRANSPOSE is handy for converting wide data files into long data files (or vice versa), especially with longitudinal data. University of South Carolina Page 12

Automatic Variables in SAS During the DATA step, SAS creates temporary automatic variables. These are not typically saved as part of the data set, but they can be used in the DATA step. N keeps track of number of times SAS has looped through the DATA step (i.e., the number of observations that have been read). May be different from obs # if data has been subsetted Automatic variable ERROR is binary: 1 if observation has an error, 0 if no error. FIRST.groupvariable = 1 for first observation with a new value for groupvariable, = 0 otherwise LAST.groupvariable = 1 for last observation with a new value for groupvariable, = 0 otherwise Can be useful for picking out the highest or lowest values for each level of groupvariable. University of South Carolina Page 13