Chaining Logic in One Data Step Libing Shi, Ginny Rego Blue Cross Blue Shield of Massachusetts, Boston, MA

Similar documents
Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

An Introduction to Visit Window Challenges and Solutions

SAS Programming and HEDIS: An Introductory Overview. Christopher Plummer, Keystone Health Plan Central, Camp Hill, PA

THE IMPACT OF DATA VISUALIZATION IN A STUDY OF CHRONIC DISEASE

Checking for Duplicates Wendi L. Wright

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

Earthquake data in geonet.org.nz

Quality health plans & benefits Healthier living Financial well-being Intelligent solutions. Member Secure Website for Aetna International

Your mymeritain Personalized Member Website

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

Automating the Production of Formatted Item Frequencies using Survey Metadata

Authorization Submission and Inquiry Guide

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

Combining Contiguous Events and Calculating Duration in Kaplan-Meier Analysis Using a Single Data Step

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

PharmaSUG Paper CC11

Going Under the Hood: How Does the Macro Processor Really Work?

1. Study Registration. 2. Confirm Registration

ehepqual- HCV Quality of Care Performance Measure Program

Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI

THE OUTPUT NONMEM DATASET - WITH ADDL RECORDS

PROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING

Creating Macro Calls using Proc Freq

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

LICKING MEMORIAL HOSPITAL PATIENT PORTAL LOGON GUIDE

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

LICKING MEMORIAL HOSPITAL PATIENT PORTAL SELF-ENROLLMENT GUIDE

Employer Reporting. User Guide

The Path To Treatment Pathways Tracee Vinson-Sorrentino, IMS Health, Plymouth Meeting, PA

Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep

Maine ASO Provider Portal Atrezzo End User Guide

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

Referrals Guide. October Referrals Guide

Facilitate Statistical Analysis with Automatic Collapsing of Small Size Strata

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

Ver. 3 (4/13) (WSP) Physician User Manual

Referral Submission and Inquiry Guide

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS Code Joshua M. Horstman, Nested Loop Consulting, Indianapolis, IN

Please Don't Lag Behind LAG!

ABSTRACT INTRODUCTION SIMPLE COMPOSITE VARIABLE REVIEW SESUG Paper IT-06

PharmaSUG Paper PO12

ICD_CLASS SAS Software User s Guide. Version FY Prepared for: U.S. Centers for Disease Control and Prevention

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA

EmployerAccess. Your Online Group Billing account gives you the convenience and control to: Get started with 3 easy steps!

Children s Health System iconnect100: Introduction to iconnect Acute Care User Manual

Step through Your DATA Step: Introducing the DATA Step Debugger in SAS Enterprise Guide

ICD_CLASS SAS Software User s Guide. Version FY U.S. Centers for Disease Control and Prevention. Prepared for:

Specialty Services Requests. Referrals

COLLECTION & HOW THE INFORMATION WILL BE USED

An Efficient Tool for Clinical Data Check

IRF-PAI PROVIDER REPORTS

Loan Closing Advisor SM. User Guide. December 2017

CPRD Aurum Frequently asked questions (FAQs)

Overview of AccuCare Feature Changes

The system will prompt for Login ID and Password (NB login credentials will be supplied to all staff after Commissioning).

PharmaSUG Paper SP04

HPHConnect for Employers User s Guide

APPLIED RESEARCH WORKS, INC.- COZEVA. Specialist Practice Support Guide

Dental Connect Payers

Basic Configuration Training Guide Part 1

The Dataset Attribute Family of Classes Mark Tabladillo, Ph.D., Atlanta, GA

Patient Portal User Guide The Patient s Guide to Using the Portal

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

A Three-piece Suite to Address the Worth and Girth of Expanding a Data Set. Phil d Almada, Duke Clinical Research Institute, Durham, North Carolina

DATA Step Debugger APPENDIX 3

PharmaSUG2014 Paper DS09

Cancer Waiting Times. Getting Started with Beta Testing. Beta Testing period: 01 February May Copyright 2018 NHS Digital

Ver. 4 (4/13) (WSP) Physician Office Staff User Manual

User Guide for TASKE Contact Web Interface

Let SAS Help You Easily Find and Access Your Folders and Files

PowerSchool Student and Parent Portal User Guide.

UC DAVIS PHYSICIANCONNECT Handbook

Covisint DocSite Enterprise

Posters. Workarounds for SASWare Ballot Items Jack Hamilton, First Health, West Sacramento, California USA. Paper

Scottish Care Information. SCI Gateway v10.3. Sending Referrals & Receiving Discharges User Guide

Indenting with Style

Clinical Optimization

Program Validation: Logging the Log

Paper Leads and Lags: Static and Dynamic Queues in the SAS DATA STEP, 2 nd ed. Mark Keintz, Wharton Research Data Services

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Washington State Office of Superintendent of Public Instruction. Direct Certification System User s Manual

Availity Remittance Viewer

Welcome to The New Loomis Company Member Website, your complete online health plan Information Center!

Desktop Charge Capture

Common Sense Validation Using SAS

Meritain Connect User Manual. for Employees. 1 Meritain Connect User Guide for Employees

CERNER SINGLE DOCUMENT CAPTURE

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide

Understanding and Applying the Logic of the DOW-Loop

OLAP Drill-through Table Considerations

Medical Clearance; Compliance & Eligibility Sports Medicine Form

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

AcuExchange Provider Manual

Clinical Optimization

Transcription:

Chaining Logic in One Data Step Libing Shi, Ginny Rego Blue Cross Blue Shield of Massachusetts, Boston, MA ABSTRACT Event dates stored in multiple rows pose many challenges that have typically been resolved by lengthy SAS code. This is particularly true if the earliest start date, the final end date and the total number of event days must all be determined. This paper examines how to use a RETAIN statement in conjunction with FIRST.var and LAST.var processing to find and keep the earliest start date and final end date for an event. At the same time, DO loops and IF/THEN statements are incorporated to track the number of days the event occurred as well as any gap days, i.e. days between the start and end date on which the event did not occur. All this information can be obtained through one easy to use and understand data step. INTRODUCTION In the health care field new data is collected and added to existing data on a daily basis. As healthcare data analysts we often need to look through many rows of data to find the beginning, end and duration of an event. The challenge is compounded since rows may represent different time periods. In the case of enrollment data, each month of enrollment is represented by one row. Locating a row that shows a member s enrollment began in January and ended in December is a good start, but what was the actual duration of enrollment 45 days or 365 days between the start and end dates? Hospital admissions represent a similar situation. Each new admission is represented by a separate row of data. Multiple rows must be evaluated if we want to know the total number of days spent as an inpatient in a specified time period. For pharmacy data we may be asked to calculated medication regime adherence with multiple rows of prescription information. By collapsing overlapping event dates and chaining together abutting event dates that are stored in different rows we are able to pinpoint the earliest start and latest end dates and summarize the total number of enrollment months, inpatient days or days a patient had a medication. Although the examples outlined in this paper are health care related, we believe this data step is versatile enough to be used in many different applications. THE DATA The sample data below replicates member enrollment data. Each row is one enrollment segment. Enrollment segments usually begin on the first day of the month and end on the last day of the month. However, irregular rows that may have dates that overlap the previous enrollment segment or have gaps between the end of the previous segment and the beginning of the following segment can occur when certain changes to a member s information takes place. In the sample, we have limited the data to four key variables. The variable mem_num is a unique combination of 16 numerals for each member. The variable model indicates the type of coverage a member has. Model may remain constant or change from month to month. The variable cov_eff_dt is the date coverage became effective for each enrollment period. The variable cov_term_dt is the date coverage terminated for each enrollment period. mem_num model cov_eff_dt cov_term_dt 4440888130000001 HMO 06/01/2007 06/10/2007 4440888130000001 HMO 06/11/2007 06/30/2007 4440888130000001 PPO 06/18/2007 06/30/2007 4440888130000001 PPO 07/01/2007 07/31/2007 4440888130000001 PPO 08/01/2007 08/31/2007 4440888130000001 HMO 09/01/2007 09/30/2007 4440888130000001 HMO 10/01/2007 10/19/2007 4440888130000001 HMO 10/01/2007 10/31/2007 4440888130000001 HMO 11/01/2007 11/30/2007 4440888130000001 HMO 12/01/2007 12/02/2007 4440888130000001 HMO 12/03/2007 12/31/2007 4440888130000001 HMO 01/01/2008 01/12/2008 4440888130000001 HMO 01/15/2008 01/31/2008-1 -

PREPARE DATA BY SORTING It is critical that the data be sorted by the grouping variables (the variables that will be used in the BY statement in the data step) and the event dates before the chaining data step is implemented. For all grouping variables the dates must be in chronological order or the output data set will be incorrect. Here are two examples of how different sort orders will affect the output data set: Example A: Sorting by mem_num (as the grouping variable) and cov_eff_dt and cov_term_dt PROC SORT DATA = NESUG_test_enroll OUT = temp (KEEP = mem_num cov_eff_dt cov_term_dt); BY mem_num cov_eff_dt cov_term_dt; mem_num cov_eff_dt cov_term_dt 4440888130000001 06/01/2007 06/10/2007 4440888130000001 06/11/2007 06/30/2007 4440888130000001 06/18/2007 06/30/2007 4440888130000001 07/01/2007 07/31/2007 4440888130000001 08/01/2007 08/31/2007 4440888130000001 09/01/2007 09/30/2007 4440888130000001 10/01/2007 10/19/2007 4440888130000001 10/01/2007 10/31/2007 4440888130000001 11/01/2007 11/30/2007 4440888130000001 12/01/2007 12/02/2007 4440888130000001 12/03/2007 12/31/2007 4440888130000001 01/01/2008 01/12/2008 4440888130000001 01/15/2008 01/31/2008 Example B: Sorting by mem_num and model (as the grouping variables) and cov_eff_dt and cov_term_dt PROC SORT DATA = NESUG_test_enroll OUT = temp (KEEP = mem_num model cov_eff_dt cov_term_dt); BY mem_num model cov_eff_dt cov_term_dt; RUN; mem_num model cov_eff_dt cov_term_dt 4440888130000001 HMO 06/01/2007 06/10/2007 4440888130000001 HMO 06/11/2007 06/30/2007 4440888130000001 HMO 06/18/2007 06/30/2007 4440888130000001 HMO 10/01/2007 10/19/2007 4440888130000001 HMO 10/01/2007 10/31/2007 4440888130000001 HMO 11/01/2007 11/30/2007 4440888130000001 HMO 12/01/2007 12/02/2007 4440888130000001 HMO 12/03/2007 12/31/2007 4440888130000001 HMO 01/01/2008 01/12/2008 4440888130000001 HMO 01/15/2008 01/31/2008 4440888130000001 PPO 07/01/2007 07/31/2007 4440888130000001 PPO 08/01/2007 08/31/2007 4440888130000001 PPO 09/01/2007 09/30/2007 For this example, if we sort by both mem_num and model the cov_eff_dt and cov_term_dt dates will no longer be in chronological order. The final output will calculate the total enrolled days, gap days and breaks respectively for the rows where model equals HMO. However, the rows where model equals PPO will not be counted correctly. Make sure that only keep the variables needed for the chain in the output dataset after sorting. Every variable in the sorted output data set will affect the final results. Depending on the project and the desired output, knowing how long someone was enrolled by some other variable in the data set such as model may be important. In Appendix A we have provided and example of how to calculate enrollment by model and mem_num. - 2 -

Once the data is correctly sorted, here are a few possible scenarios that may be encountered: 1. The enrollment segment in one row overlap the segment in another row. 2. The dates of one enrollment row abut the dates in the next enrollment row. 3. Enrollment dates in one row are within the dates of another row. 4. There are gap days between enrollment segments. 5. There is only one enrollment segment for a particular member. CHAIN THE DATA Now that the data are sorted we are ready to begin evaluating the relationship of the dates in each row and get down to answering the question of when coverage began and ended and how many days each member was enrolled between the earliest cov_eff_dt and latest cov_term_dt. First we name the output data set that will contain all the chained enrollment information for each member. We also DROP variables from the output data set that will not contain summary level information. DATA Mbr_Enroll (DROP = cov_eff_dt cov_term_dt newstart); SET temp; Next we use a BY statement to initialize FIRST.var and LAST.var processing. The order of variable(s) in this BY statement must be consistent with the by grouping variable(s) in the previous PROC SORT statement. Otherwise your final output dataset may contain misleading results. BY mem_num; A RETAIN statement is used to hold the value of variables over different iterations of the data step. We will also need to create new variables and retain their values in order to make comparisons over multiple rows. See the bullets below for an explanation of the function of the newly created variables. RETAIN startelig newstart endelig gap break tot_mem_days; startelig: the earliest cov_eff_dt found for each member newstart: records the cov_eff_dt when a gap exists between enrollment segments endelig: the last cov_term_dt found for each member gap: contains the total number of days for which the member did not have coverage between the cov_term_dt and cov_end_dt. break: contains the number of times gap days appeared between enrollment segments tot_mem_days: cumulating the total days of enrollment for a group FIRST.mem_num identifies the first row for the member. Set the initial value of gap and break equal to zero to indicate no gaps or breaks have occurred yet. Set startelig equal to the cov_eff_dt; sorting the data in the prior to the data step assures us that the first row for a member contains the earliest enrollment start date. In this step newstart is also set to the cov_eff_dt, although it may be changed in later statements within the data step. For now endelig is set to the cov_term_dt as it is the last enrollment termination date we have up to this point. IF FIRST.mem_num THEN DO; tot_mem_days=0; gap=0; break=0; startelig=cov_eff_dt; newstart=cov_eff_dt; endelig=cov_term_dt; This statement resolves scenarios where one enrollment segment overlaps the previous but has a greater cov_term_dt than the previous enrollment segment. We will keep all the start dates we set in the previous step but we will keep the new cov_term_dt. THEN ELSE IF cov_eff_dt>=newstart and cov_eff_dt<= endelig and cov_term_dt>endelig endelig=cov_term_dt; - 3 -

When the two enrollment segments abut each other then we keep all the start dates set-up in the first IF statement and reset the endelig to the cov_term_dt. ELSE IF cov_eff_dt = endelig+1 THEN endelig=cov_term_dt; NOTE: If the cov_eff_dt and cov_term_dt in one row are inside the cov_eff_dt and cov_term_dt of another row, then we need to eliminate the row. From a programming standpoint we do not need to do anything to accomplish this task. This scenario will not meet any of the criteria set forth in the statements in this data step and is effectively deleted. When there are gap days between enrollment segments, we need to: 1. Reset all of our newly created variables except startelig 2. Add days between cov_eff_dt and endelig to any previously calculated gap days 3. Add one to the count of break 4. Update newstart and endelig to current cov_eff_dt and cov_term_dt. NOTE: Always calculate gap and breaks before resetting newstart and endelig. ELSE IF cov_eff_dt>endelig+1 THEN DO; gap = gap+ (cov_eff_dt-endelig-1); break+1; newstart=cov_eff_dt; endelig=cov_term_dt; Each row will be compared to the retained enrollment information until LAST.mem_num is reached. LAST.mem_num is the last enrollment row for that member. Once the last record is reached, we will calculate the total number of enrollment days minus any gap days. if last.mem_num then tot_mem_days=endelig-startelig -gap +1; Only the last row for each member will be outputted to the data set. We will also format our newly created date fields which would be SAS dates in our new data set if we did not format them. if last.mem_num; format startelig endelig mmddyy8.; run; Depending on what information we ve been asked to report, we may want to keep enrollment segments for which there are gaps between in separate rows for each member or we may want to summarize all the enrollment information into one row for each member. The SAS codes in Appendix A will provide solutions for different real world requests. CONCLUSION The data set created as a result of this data step will contain the summarized enrollment information for each member. This data can be used in many different ways, including reporting the average length of enrollment of members in this population or for selecting members for inclusion in a study population. Usually the variables that need to be chained are date/time variables. However, it is possible to use the same logic to chain character variables together. For instance, medical procedures can be chained by the order of treatment for a certain disease. The logic behind this small piece of code is straightforward and can easily be applied to many different scenarios. It will run smoothly in Window and UNIX platform. Just be sure that you sort the data properly, before you start to chain things together. - 4 -

ACKNOWLEDGEMENTS Special thanks to Jean Murphy and Michele DiPaulo for their invaluable feedback on the paper content and John Dawson for his consultation and help developing this code. CONTACT INFORMATION We welcome any comments or questions. Please contact the authors at: Libing Shi e-mail: Libing.Shi@bcbsma.com Virginia Rego e-mail: Virginia.Rego@bcbsma.com Blue Cross Blue Shield of Massachusetts 401 Park Drive Boston, MA 02215 SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. - 5 -

APPENDIX %macro chain (indata=, /*name of input SAS dataset*/ start_dt=, /*start date of the event*/ end_dt=, /*end date of the event*/ outdata=, /*name of output SAS dataset*/ bygrpvar=, /*variable names for grouping the chain*/ test=) /*a test pointer to decide the output data contents*/ ; /*Sort the data before chainning*/ Proc sort data = &indata. out = temp (keep = &bygrpvar. &start_dt &end_dt); by &bygrpvar. &start_dt &end_dt; run; %let lastgrpvar=%scan(&bygrpvar.,-1); /*the last variable of the grouping*/ %IF &test= summary %THEN %DO; DATA &outdata._&test. (drop = &start_dt &end_dt newstart); set temp; by &bygrpvar.; retain startelig newstart endelig gap break tot_mem_days; IF FIRST.&lastgrpvar. THEN DO; tot_mem_days=0; gap=0; break=0; startelig=&start_dt; newstart=&start_dt; ELSE IF &start_dt>=newstart and &start_dt<= endelig and &end_dt>endelig THEN ELSE IF &start_dt = endelig+1 THEN ELSE IF &start_dt>endelig+1 THEN DO; gap = gap+ (&start_dt-endelig-1); break+1; newstart=&start_dt; if last.&lastgrpvar. then tot_mem_days=endelig-startelig -gap +1; if last.&lastgrpvar.; format startelig endelig mmddyy8.; % %ELSE %IF &test=detail %THEN %DO; DATA &outdata._&test. (drop = &start_dt &end_dt); set temp; by &bygrpvar.; retain startelig endelig gap break tot_mem_days; IF FIRST.&lastgrpvar. THEN DO; tot_mem_days=0; - 6 -

gap=0; break=0; startelig=&start_dt; DO; ELSE IF &start_dt>=startelig and &start_dt<= endelig and &end_dt>endelig THEN tot_mem_days=endelig-startelig +1; ELSE IF &start_dt = endelig+1 THEN DO; tot_mem_days=endelig-startelig +1; ELSE IF &start_dt>endelig+1 THEN DO; gap = &start_dt-endelig-1; break=1; startelig=&start_dt; tot_mem_days=endelig-startelig +1; format startelig endelig mmddyy8.; proc sort; by &bygrpvar. startelig descending endelig; proc sort nodupkey; by &bygrpvar. startelig; % run; %mend; %chain (indata=nesug_test_enroll, start_dt=cov_eff_dt, end_dt=cov_term_dt, outdata=enroll, bygrpvar=mem_num, test=summary); %chain (indata=nesug_test_enroll, start_dt=cov_eff_dt, end_dt=cov_term_dt, outdata=enroll2, bygrpvar=mem_num model, test=summary); %chain (indata=nesug_test_enroll, start_dt=cov_eff_dt, end_dt=cov_term_dt, outdata=enroll2, bygrpvar=mem_num model, test=detail ); - 7 -