Create a SAS Program to create the following files from the PREC2 sas data set created in LAB2.

Topics: Data step, Subsetting, Concatenation and Merging

Reference: Little SAS Book - Chapter 5, Section 3.6 and 2.2; online documentation

Exercise I

LAB EXERCISE

The following is a lab exercise to give you experience combining SAS data sets. The data files (nmes, employee1-employee4, data1-data3) are located on the website on the LAB page under class3: http://www.biostat.jhsph.edu/bstcourse/bio632/default.htm. Download the files from LAB on the website to your folder.

If you are taking the class for credit (either pass/fail or graded), please read the italicized instructions at the end of each section. Please save the logs, output sections and the answers to the questions specified into one Word document and e-mail it to the class e-mail, sas@jhsph.edu. Please do not send all of the logs and output windows. Please label each section clearly and put your name and LAB3 in the subject line of the e-mail. Use TITLE statements.

Start the SAS Program

Create a SAS program to create the following files from the PREC2 SAS data set created in LAB2.

1. Create a temporary file that contains only records with known values of systolic and diastolic pressure (msbp and mdbp).
2. Create another file that contains only males whose age in 1998 was less than 75 years. Do not include the variables wgt and hgt in this data set.
3. Create two files in the same DATA step that contain males and females separately.

Save the SAS log from these 3 DATA steps and send it in the exercise e-mail. Label this section Lab3 Exercise 1.
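The three subsetting techniques in Exercise I can be sketched on a small made-up data set. This is only an illustration, not the lab solution: the stand-in table and the variable names sex and age98 (and the coding 1 = male, 2 = female) are assumptions, since the handout does not list the PREC2 variables other than msbp, mdbp, wgt and hgt.

```sas
/* Hypothetical stand-in for PREC2; sex and age98 are assumed names. */
data prec2;
  input id sex age98 msbp mdbp wgt hgt;
  datalines;
1 1 60 120 80 170 70
2 2 80 .   85 150 64
3 1 78 140 .  180 72
4 2 55 118 76 140 65
;
run;

/* Subsetting IF: keep only records where both pressures are non-missing. */
data known_bp;
  set prec2;
  if not missing(msbp) and not missing(mdbp);
run;

/* WHERE plus the DROP= data set option: males under 75, without wgt and hgt.
   The coding sex = 1 for male is an assumption. */
data males_under75 (drop=wgt hgt);
  set prec2;
  where sex = 1 and age98 < 75;
run;

/* Two output data sets from one DATA step, routed with OUTPUT. */
data males females;
  set prec2;
  if sex = 1 then output males;
  else if sex = 2 then output females;
run;
```

The subsetting IF and the WHERE statement are interchangeable in simple cases like these; WHERE filters observations before they enter the DATA step, while IF filters them after they are read.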

Exercise II

A. Concatenation and Merging

1. DATA1 and DATA2 are two SAS data sets, described below, that contain disease and follow-up information on a group of patients. The maximum number of disease codes (ICD-9 codes) is 6. We want to create a new file, DATA1_2, by combining these two files. Both of the files contain the variables described below. Type the following program into the ENHANCED EDITOR window and submit it to create one file with the data derived from these two files. Check the SAS log and answer the questions.

   Libname mylib 'insert your folder name';
   Data data1_2;
   Set ;
   Run;

   How many observations are in DATA1? How many observations are in DATA1_2? How many variables?

   Variable  Description                       Type
   ID        Patient ID                        Numeric
   DX1       Diagnosis 1                       Character
   DX2       Diagnosis 2                       Character
   DX3       Diagnosis 3                       Character
   DX4       Diagnosis 4                       Character
   DX5       Diagnosis 5                       Character
   DX6       Diagnosis 6                       Character
   Sex       0 = female, 1 = male              Numeric
   Yearc     Year of last contact              Numeric
   Yob       Year of birth                     Numeric
   Cvd       Cardiovascular disease            Numeric
             0 = no, 1 = yes
   Smoker    0 = no, 1 = yes                   Numeric
   Chol      Cholesterol (mg/dl)               Numeric
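As a rough illustration of what concatenation with SET does, the sketch below stacks two tiny made-up data sets. The data set names a, b and a_b and their values are hypothetical and are not the lab files.

```sas
/* Two small hypothetical data sets with the same variables. */
data a;
  input id dx1 $ sex;
  datalines;
1 4019 0
2 25000 1
;
run;

data b;
  input id dx1 $ sex;
  datalines;
3 4280 1
;
run;

/* Concatenation: all observations of a, followed by all observations of b. */
data a_b;
  set a b;
run;

/* List the variables and the observation count of the combined data set. */
proc contents data=a_b;
run;
```

When both inputs carry the same variables, the combined data set has the same variables and as many observations as the two inputs together, which is what the questions in step 1 ask you to confirm in the log.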

2. We have additional patient information to add to the Data1_2 file created in 1. DATA3 contains the additional information described below for the patients in the Data1_2 file. Because we are adding additional data to existing records, use a MERGE step to combine these files.

   This is a description of the data in DATA3:

   Variable   Type     Description
   ID         numeric  id
   SBP        numeric  systolic blood pressure (mmHg)
   DBP        numeric  diastolic blood pressure (mmHg)
   NO_CIG     numeric  number of cigarettes per day
                       0=none, 1=1-10, 2=11-19, 3=20-39, 4=40 or more
   BMI 18-21  numeric  body mass index (kg/m^2)

   Remember that we need to sort both files by ID before merging (using PROC SORT).

   Proc Sort data= ; by id;
   Proc Sort data= ; by id;
   Data mylib.alldata;
   merge ;
   Proc print data=mylib.alldata;
   Run;

   Check the SAS log for errors. Although you may not have any errors, there is a major problem with the merge program. The program did not match-merge the data because the BY statement was missing. Instead, the files were matched sequentially, observation by observation, and data from different patients were combined into one record. How many observations are in the ALLDATA file? Compare the values for ICD-9 codes for the first five records of the ALLDATA file to the first five records of Data1_2. Notice the problems with the matching.

   Now return to the program editor window. Create a new SAS data set (ALLDATA) by match-merging the data in Data1_2 with the data in DATA3 using a key variable (id). To do this, add a BY statement to the DATA step and rerun. How many observations are in the ALLDATA file? Compare the first five records to the records in Data1_2.

3. We are going to use the data set option (in= ) to determine which records did not match. Return to the program in the Enhanced Editor and add the following instructions to the DATA step. Remember that the in variable for each file will equal one for each record that comes from that file.

   Data mylib.alldata;
   merge (in=count) (in=count2);
   by id;
   If count=0 then put id= count=;
   If count2=0 then put id= count2=;
   Proc print data=mylib.alldata;
   Title 'With By statement';
   Run;

   Review the log window. How many records from the DATA1_2 file did not have a match in DATA3? How many records from the DATA3 file did not have a match in Data1_2?

4. Suppose you want to include only those records that matched in the ALLDATA file. You can use the count and count2 variables in the DATA step to exclude the non-matches using IF-THEN clauses. Add the appropriate statement(s) to the program and run. Check the SAS log for errors.

   SAVE the SAS log from this final DATA step and the answer to the following question in the exercise e-mail. Label this section Lab3 Exercise II Part A.

   How many observations are in the ALLDATA file?

   NOTE: The SAS system has an option to prevent accidental merging without a BY statement. Look at the MERGENOBY= system option in HELP for further details.

B. Concatenation and Merging

The following files contain employee information. Use the SET and MERGE statements to combine the following files.

1. Create a combined SAS data set named employee1_2 (temporary or permanent, you choose) by concatenating the employee1 and employee2 files (SAS data sets). The data sets contain different individuals with the following variables:

   Variable  Description
   SSN       social security number (XXXXXXXXX)
   Name      employee name, in the form: lastname, first name
   Hire      hire date (date variable)
   Salary    annual salary
   Phone     office telephone number, in the form: XXX-XXXX

   Add a LABEL statement to the DATA step to label the name, hire, and phone variables with the descriptions given above. Add a PROC CONTENTS step to list the contents of employee1_2. Review the LOG and OUTPUT windows. How many records are in the employee1_2 SAS data set?

2. Employee3 contains additional employees that we need to add to the file created in 1. Combine this file with the employee1_2 SAS data set created in step 1 and name the new SAS data set employee123. DO NOT INCLUDE the variable name in the employee123 file (use the DROP= or KEEP= data set option). The employee3 file includes the following variables:

   Variable  Description
   SSN       social security number (XXXXXXXXX)
   Name      employee name, in the form: lastname, first name
   Gender    gender: F=female, M=male
   Hire      hire date (date variable)
   Salary    annual salary

   Notice that employee3 does not contain the phone variable, but does include the gender variable. Use PROC PRINT to print the new data set employee123. It should contain all of the records in employee1, employee2 and employee3.

   SAVE the OUTPUT window (from #2 only) containing the listing of employee123. Make sure that you put the name EMPLOYEE 123 file as the title at the top of the listing. Include the listing and the answers to the following 3 questions in your exercise e-mail. Label this section Lab3 Exercise II Part B.1.

   1. How many observations are in employee123?
   2. What is the value for gender for SSN=244967839?
   3. What is the office telephone number for SSN=933476520?

3. Add the following data from the employee4 file to the records from the employee123 file created in 2. Employee4 contains additional information on the same employees as the employee123 file. SSN is the key variable to use to match the records.

   Variable  Description
   SSN       social security number
   Left      date left the company (date variable); blank if still an employee
   Phone     home phone number, in the form (XXX-XXXX)

   First, run PROC CONTENTS on the employee4 file. Notice the label for the phone variable: it is the home phone number. The variable phone in the employee123 file is the office telephone number. We want to merge the employee4 SAS data set with the employee123 SAS data set created in step 2, BUT we want to keep both the home and office phone numbers. Remember that SAS will retain only one of the variables because they have the same name (Hint: use a data set option on the MERGE statement). Match-merge using SSN as the key variable and create a new SAS data set employee_total. Print the file using PROC PRINT.

   SAVE the LOG and OUTPUT windows (from #3) containing results from the program creating employee_total and the answers to the following questions. Please label this part of the report as Lab 3 Exercise II Part B.2 and include it in your exercise e-mail.

   1. How many records are in the employee123 and employee4 files?
   2. How many records and variables are in the employee_total file?
   3. List the SSNs of the records that do not match. Use the IN= data set option to identify the records that do not match and list them in the LOG window.
   4. How many variables does the file employee_total have?

4. Modify the DATA step that creates employee_total to use the IN= data set option to include only those observations that exist in both files (employee123 and employee4). There will be 14 observations in employee_total.

   Save the SAS LOG (from #4) creating the new employee_total. Label this section as Lab 3 Exercise II Part B.3 and include it in your exercise e-mail.
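The match-merge ideas used throughout Exercise II (sorting by the key, a BY statement, the IN= option, and renaming a variable that exists in both inputs) can be sketched together on made-up data. The data sets staff and extra, the variable home_phone, and the IN= variable names are hypothetical; this is an illustration under those assumptions rather than the lab answer.

```sas
/* Hypothetical data; both data sets have a variable named phone. */
data staff;
  input ssn phone $;
  datalines;
111 555-0101
222 555-0102
333 555-0103
;
run;

data extra;
  input ssn phone $;
  datalines;
222 555-0202
333 555-0203
444 555-0204
;
run;

/* A match-merge requires both inputs to be sorted by the key variable. */
proc sort data=staff; by ssn; run;
proc sort data=extra; by ssn; run;

data matched;
  /* RENAME= keeps both phone numbers instead of letting one overwrite the other. */
  merge staff (in=in_staff)
        extra (in=in_extra rename=(phone=home_phone));
  by ssn;
  /* Write the key of every non-matching record to the log. */
  if not (in_staff and in_extra) then put 'No match: ' ssn= in_staff= in_extra=;
  /* Subsetting IF: keep only keys found in both inputs. */
  if in_staff and in_extra;
run;

proc print data=matched;
  title 'Matched records only';
run;
```

The IN= variables exist only while the DATA step runs and are not written to the output data set, so they do not add to its variable count.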