INTRODUCTION to SAS STATISTICAL PACKAGE LAB 3


Topics:
    Data step
    Subsetting
    Concatenation and Merging

Reference:
    Little SAS Book - Chapter 5, Section 3.6 and 2.2
    Online documentation

Exercise I

LAB EXERCISE

The following is a lab exercise to give you experience combining SAS data sets. The data files (nmes, employee1-employee4, data1-data3, wide, long2, lab3longtowide.sas, and lab3widetolong.sas) are located on the website on the LAB page under class3: http://www.biostat.jhsph.edu/bstcourse/bio632/default.htm

Download the self-extracting file class3.exe from the website. Extract its contents to the d:\temp\sasclass folder. Create the folder if it does not exist.

Start the SAS program.

If you are taking the class for credit (either pass/fail or graded), please read the italicized instructions at the end of each section. You will need to print out sections of the SAS log and output windows, and answers to some of the questions at the end of the lab. Please do not print all of the logs and output windows. Please label each section clearly and put your name at the top of the pages. Use a TITLE statement.

The data is stored in the SAS file nmes. All missing data values are coded as -9. The variables included in the data file are:

Variable Name   Description
Age             Age of subject
Gender          1 = Male, 0 = Female
Race            1 = African American, 0 = Other
Smoke           1 = Current, 2 = Former, 3 = Never, -9 = unknown
LC              1 = Lung Cancer or Laryngeal Cancer or COPD, 0 otherwise
CHD             1 = Coronary Heart Disease, 0 otherwise
BMI             Body mass index with two decimal places, -9 = unknown
Expend          Subject's total self-reported medical expenditures
Marital         1 = Married, 2 = Widowed or Divorced or Separated, 3 = Never Married
Educ            1 = 1 year of college or more, 2 = Completed High School, 3 = Less than High School, -9 = unknown

A. Write a Data step that will do the following:

1. Using IF/THEN statements, recode the missing data (coded as -9) to . for the variables smoke, bmi and educ.
2. Create a new variable called LogBMI using the SAS function LOG.
3. Create a categorical Age variable that breaks age into the following categories: 40-55, 56-65, > 65. (Note: no subjects in the dataset are less than 40 years old.)
4. Check that your code works correctly by printing out the resulting data set for the first 30 observations. Use a data set option in a PROC step.

HAND IN: Print out the output from #4 ONLY to hand in. Label this section Lab3 Exercise I QA.4
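The recoding pattern asked for in part A can be sketched on a small made-up data set. Everything here is an illustrative assumption rather than the lab's answer: the data set names toy and toy_recode, the three data rows, the variable name agecat, and its 1/2/3 coding are all hypothetical. Only the variable names age, smoke, bmi and educ and the -9 missing code come from the description above.

```sas
/* Hypothetical toy data; this is not the nmes file */
data toy;
    input age smoke bmi educ;
    datalines;
45 1 24.50 2
60 -9 -9 1
72 3 31.20 -9
;
run;

data toy_recode;
    set toy;

    /* Recode the -9 missing-value code to SAS missing (.) */
    if smoke = -9 then smoke = .;
    if bmi   = -9 then bmi   = .;
    if educ  = -9 then educ  = .;

    /* LOG is the natural logarithm; skip missing BMI so LogBMI stays missing */
    if bmi ne . then LogBMI = log(bmi);

    /* Hypothetical coding: 1 = 40-55, 2 = 56-65, 3 = over 65 */
    if age = . then agecat = .;
    else if age <= 55 then agecat = 1;
    else if age <= 65 then agecat = 2;
    else agecat = 3;
run;

/* OBS= is a data set option: it limits how many observations PROC PRINT reads */
title 'Toy recode check';
proc print data=toy_recode (obs=30);
run;
```

The recoding comes before the LOG call on purpose. If the order were reversed, LOG would be applied to -9, which SAS flags as an invalid argument in the log.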

B. Create the following subsets:

1. Create a file nmes1 that contains only males who are <=65 years old.
2. Create two files in the same data step that contain males and females separately.

HAND IN: Print out the SAS log from part B 1 and 2. Label this section Lab3 Exercise I QB.

Exercise II

A. Concatenation and Merging

1. DATA1 and DATA2 are SAS data sets described below that contain disease and follow-up information on a group of patients. The maximum number of disease codes (ICD-9 codes) is 6. We want to create a new file, DATA1_2, by combining these two files. Both of the files contain the variables described below. Type the following program into the ENHANCED EDITOR window and submit it to create one file with the data derived from these two files. Check the SAS log and answer the questions.

    Libname mylib 'd:\temp\sasclass';
    Data data1_2;
        Set ;
    Run;

How many observations are in DATA1? How many observations in Data1_2? How many variables?

Variable   Description                               Type
ID         Patient ID                                Numeric
DX1        Diagnosis 1                               Character
DX2        Diagnosis 2                               Character
DX3        Diagnosis 3                               Character
DX4        Diagnosis 4                               Character
DX5        Diagnosis 5                               Character
DX6        Diagnosis 6                               Character
Sex        0 = female, 1 = male                      Numeric
Yearc      Year of last contact                      Numeric
Yob        Year of Birth                             Numeric
Cvd        Cardiovascular Disease 0 = no, 1 = yes    Numeric
Smoker     0 = no, 1 = yes                           Numeric
Chol       Cholesterol mg/dl                         Numeric

2. We have additional patient information to add to the Data1_2 file created in 1. DATA3 contains additional information, described below, for the patients in the Data1_2 file. Create a new SAS data set (ALLDATA) by match-merging the data in Data1_2 with the data in DATA3 using a key variable (id). This is a description of the data in DATA3:

Variable    Type      Description
ID          numeric   id
SBP         numeric   systolic blood pressure mmHg
DBP         numeric   diastolic blood pressure mmHg
NO_CIG      numeric   number of cigarettes per day: 0=none, 1=1-10, 2=11-19, 3=20-39, 4=40 or more
BMI 18-21   numeric   body mass index kg/m^2

Remember we need to sort both files by ID before merging (using PROC SORT).

    Proc Sort data= ; by id;
    Proc Sort data= ; by id;
    Data mylib.alldata;
        merge ;
    Proc print data=mylib.alldata;
    Run;

Check the SAS log for errors. Although you may not have any errors, there is a major problem with the merge program. The program did not match-merge the data because the BY statement was missing. Instead, the files were matched sequentially and data from different patients were combined into one record. How many observations are in the ALLDATA file? Compare the values for ICD-9 codes for the first five records of the ALLDATA file to the first five records of Data1_2. Notice the problems with the matching.
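The difference between a sequential merge and a match-merge is easier to see on two tiny data sets. The sketch below uses made-up data: the data set names a, b, seq and matched and all values are hypothetical and are not the lab's DATA1_2 or DATA3 files.

```sas
/* Hypothetical toy data: ids 1-3 in a, ids 2-4 in b */
data a;
    input id dx $;
    datalines;
1 A
2 B
3 C
;
run;

data b;
    input id sbp;
    datalines;
2 120
3 130
4 140
;
run;

proc sort data=a; by id; run;
proc sort data=b; by id; run;

/* No BY statement: observations are paired by position.           */
/* The first output row takes dx from id 1 in a, but id and sbp    */
/* from the first row of b, so two different subjects are mixed.   */
data seq;
    merge a b;
run;

/* With BY: observations are paired by the value of id */
data matched;
    merge a b;
    by id;
run;

title 'Without BY';
proc print data=seq; run;

title 'With BY';
proc print data=matched; run;
```

Neither step produces an error in the log, which is why the missing BY statement is easy to overlook. Comparing the two listings side by side shows the mismatch directly.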

Now return to the program editor window, add the BY statement to the DATA step and rerun. How many observations are in the ALLDATA file? Compare the first five records to the records in Data1_2.

3. We are going to use the data set option (in= ) to determine which records did not match. Return to the program in the Enhanced Editor and add the following instructions to the DATA step. Remember, the IN= variable for a file equals 1 for each record to which that file contributes.

    Data mylib.alldata;
        merge (in=count) (in=count2);
        by id;
        If count=0 then put id= count=;
        If count2=0 then put id= count2=;
    Proc print data=mylib.alldata;
    Title 'With By statement';
    Run;

Review the log window. How many records from the DATA1_2 file did not have a match in DATA3? How many records from the DATA3 file did not have a match in Data1_2?

4. Suppose you want the ALLDATA file to include only those records that matched. You can use the count and count2 variables in the DATA step to exclude the non-matches using IF-THEN clauses. Add the appropriate statement(s) to the program and run. Check the SAS log for errors.

HAND IN: Print out the SAS log from this final DATA step and the answer to the following question. Label this section Lab3 Exercise II Part A Q4.

How many observations are in the ALLDATA file?

NOTE: The SAS system has an option to prevent accidental merging without a BY statement. Look at the MERGENOBY= system option in HELP for further details.
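As a rough illustration of how the IN= variables behave, the following sketch merges two made-up data sets and keeps only the matching ids. The data set names a, b and both, the IN= variable names in_a and in_b, and all data values are hypothetical. The lab's own program uses count and count2.

```sas
/* Hypothetical toy data, already in id order */
data a;
    input id dx $;
    datalines;
1 A
2 B
3 C
;
run;

data b;
    input id sbp;
    datalines;
2 120
3 130
4 140
;
run;

data both;
    merge a (in=in_a) b (in=in_b);
    by id;

    /* IN= variables are temporary: 1 if the data set contributed */
    /* to the current observation, 0 otherwise                    */
    if in_a = 0 then put 'Not in a: ' id=;
    if in_b = 0 then put 'Not in b: ' id=;

    /* Exclude any observation that is missing from either input */
    if in_a = 0 or in_b = 0 then delete;
run;

title 'Matches only';
proc print data=both; run;
```

The PUT statements come before the DELETE because DELETE stops processing the current observation. Since IN= variables are not written to the output data set, they can be used freely for this kind of bookkeeping.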

B. Concatenation and Merging

The following files contain employee information. Use the SET and MERGE statements to combine the following files.

1. Create a combined SAS data set named employee1_2 (temporary or permanent, you choose) by concatenating the employee1 and employee2 files (SAS data sets). The data sets contain the following variables:

Variable   Description
SSN        Social Security number (XXXXXXXXX)
Name       employee name: lastname, first name
Hire       hire date (date variable)
Salary     annual salary
Phone      office telephone number, in the form XXX-XXXX

Add a LABEL statement to the DATA step to label the name, hire, and phone variables with the descriptions given above. Add a PROC CONTENTS step to list the contents of employee1_2. Review the LOG and OUTPUT windows. How many records are in the employee1_2 SAS data set?

2. Employee3 contains additional employees that we need to add to the file created in 1. Combine this file with the employee1_2 SAS data set created in 1 and name the new SAS data set employee123. DO NOT INCLUDE the variable Name in the employee123 file (use the DROP or KEEP data set option). The employee3 file includes the following variables:

Variable   Description
SSN        Social Security number (XXXXXXXXX)
Name       employee name, in the form lastname, first name
Gender     gender: F=female, M=male
Hire       hire date (date variable)
Salary     annual salary

Notice that employee3 does not contain the phone variable, but does include the gender variable.

HAND IN: Print the OUTPUT window (from #2 only) containing the listing of employee123. Make sure that you put the name "EMPLOYEE 123 file" as the title at the top of the listing. Include the answers to the following 3 questions in your report. Label this section Lab3 Exercise II Part B Q2.

    1. How many observations?
    2. What is the value for gender for SSN=244967839?
    3. What is the office telephone number for SSN=933476520?

3. Add the following data from the employee4 file to the records from the employee123 file created in 2. Employee4 contains additional information on the employees in the employee123 file. SSN is the key variable.

Variable   Description
SSN        Social Security number (XXXXXXXXX)
Left       date left the company (date variable); blank if still an employee
Phone      home phone number, in the form (XXX-XXXX)

First, run PROC CONTENTS on the employee4 file. Notice the label for the phone variable: it is the home phone number. The variable phone on the employee123 file is the office telephone number. We want to merge the employee4 SAS data set with the employee123 SAS data set created in 2, BUT we want to keep both the home and office phone numbers. Remember that SAS will retain only one of the variables because they have the same name. (Hint: use a Data Set Option on the MERGE statement.) Match-merge using SSN as the key variable and create a new SAS data set employee_total. Print out the file using PROC PRINT.

HAND IN: Print the LOG and OUTPUT windows (from #3) containing results from the program creating employee_total and the answers to the following questions. Please label this part of the report as Lab 3 Exercise II Part B Q3.

    1. How many records are in the employee123 and employee4 files?
    2. How many records and variables are in the employee_total file?
    3. List the SSNs of the records that do not match. Use the IN data set option to identify the records that do not match and list them in the LOG window.
    4. How many variables does the file employee_total have?

4. Modify the DATA step that creates employee_total to use the IN data set option to include only those observations that exist in both files. There will be 14 observations in employee_total.

HAND IN: Print the SAS LOG (from #4) creating the new employee_total. Label this section as Lab 3 Exercise II Part B Q4.
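When two data sets share a variable name, data set options on the MERGE statement can rename or drop variables as each data set is read. The sketch below shows the general idea on made-up data. The data set names staff, extra and combined, the new variable name home_phone, and all data values are hypothetical and are not taken from the lab's employee files.

```sas
/* Hypothetical toy data, already in ssn order */
data staff;
    input ssn name :$20. phone :$8.;
    label phone = 'office telephone number';
    datalines;
111111111 Smith 555-0101
222222222 Jones 555-0102
;
run;

data extra;
    input ssn phone :$8.;
    label phone = 'home phone number';
    datalines;
111111111 555-0201
333333333 555-0203
;
run;

data combined;
    /* DROP= removes name as staff is read;                      */
    /* RENAME= renames phone as extra is read, so the two phone  */
    /* variables no longer collide in the output data set        */
    merge staff (drop=name)
          extra (rename=(phone=home_phone));
    by ssn;
run;

proc contents data=combined; run;

title 'Both phone numbers kept';
proc print data=combined label; run;
```

Without the RENAME= option, both inputs would write to a single phone variable and the value from the data set listed last would overwrite the other. The label travels with the renamed variable, so PROC CONTENTS still shows which phone number is which.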