Combining Contiguous Events and Calculating Duration in Kaplan-Meier Analysis Using a Single Data Step


Hui Song, PRA International, Horsham, PA
George Laskaris, PRA International, Horsham, PA

ABSTRACT

Many studies require that contiguous events (e.g., events one or two days apart) be combined before survival analysis. Because individual events can overlap in many different ways, it is challenging to (1) combine the contiguous events, (2) calculate their duration (and other time-to-event parameters), and (3) output only the combined events to a dataset. In this work, we present a clean, single-data-step approach that addresses all three challenges efficiently. A sample scenario illustrates the process.

INTRODUCTION

Kaplan-Meier (KM) analysis, or survival analysis, is a set of statistical methods that estimate lifetime, or the length of time between two events, for a given event-of-interest (EOI). Survival data are often analyzed in terms of time-to-event parameters, such as time-to-first-event (or onset), time-to-resolution, and duration. If there is at least one event as specified in a given EOI, the time-to-first-event is the number of days to the earliest on-study event. If there are no events for the given EOI, the time-to-first-event is the time to the censor date defined in the statistical analysis plan (SAP). Duration is the difference between the (imputed) start date and the (imputed) end date. If an event does not have an imputed end (or start) date, it should be censored per the SAP. For time-to-resolution, only those events that cross the last dose date (i.e., start before the last dose date and end after it) are of concern.

The calculation of time-to-event parameters is straightforward until contiguous events must be combined for KM analysis. Contiguous events are events that are a short time apart (e.g., one or two days), as defined in the SAP. Many studies require that contiguous events be combined before KM analysis.

Given this requirement, the derivation of time-to-event parameters becomes extremely challenging for the following reasons. First, a given EOI may have hundreds of adverse events (AEs), and the individual events may overlap in many different ways. One needs to distinguish contiguous events from non-contiguous ones so that the combination is done correctly. Second, calculating the time-to-event parameters (such as duration) on the fly is difficult given all the different combinations of events. Third, the combined events must be output efficiently on the fly. In this abstract, we present a clean, efficient, single-data-step approach that combines the contiguous events and calculates the time-to-event parameters on the fly. In particular, we show how to design and implement the proposed scheme with robustness and efficiency in mind.

DESCRIPTION OF METHOD

Several approaches can be used to address the issues mentioned above. The most straightforward is to pass through all the events (for a given EOI) in multiple rounds. In each round, adjacent events that are contiguous are combined into one. The process continues until no further combination is possible. Preliminary tests showed that, to combine all the events correctly for our study, the algorithm might need up to 30 rounds to converge. In this sense, the algorithm's running time is 30*n (where n is the total number of events). This simple solution becomes inefficient when the number of AEs grows to thousands or more.

A second approach is to combine the events in groups of two in the first round. In the next round, we combine pairs of the combined events from the previous round, and so on until the log(n)-th round, in which the algorithm converges. A simple illustration is shown below.

    Round 1: (1 2) (3 4) (5 6) (7 8) ... (n-3 n-2) (n-1 n)
    Round 2: (1 2 3 4) (5 6 7 8) ... (n-3 n-2 n-1 n)

The running time will be on the order of n. However, this approach is hard to implement because of how data are processed in the DATA step. (Due to space limitations, more detailed discussion and results are not included in this abstract.)

In this abstract, we present a method that has a linear running time of n, using SAS features such as the RETAIN statement. In addition, it combines the contiguous events and outputs the combined events on the fly. The two critical techniques are the RETAIN statement and a look-ahead mechanism. The RETAIN statement maintains a set of flags across observations (during a DATA step), telling SAS whether to check for combination and whether to output an observation at the end of the DATA step iteration. The look-ahead mechanism reads values from the next event (its AE start and end dates, etc.), so that the method can set the flags or update the parameters of the combined event (such as its start and end dates) when necessary to support the on-the-fly processing.

Here is a brief description of the look-ahead mechanism. The one we adapted (SASCOMMUNITY.ORG, 2011) can be described as follows.

     1  data ae1;
     2    set ae;
     3    by subjid;
     4    retain ;
     5    set ae (firstobs=2 keep=aestdti aeendti
     6            rename=(aestdti=next_aestdti aeendti=next_aeendti) )
     7        ae (obs=1 drop=_all_);
     8    next_aestdti=ifn(last.subjid, (.), next_aestdti);
     9    next_aeendti=ifn(last.subjid, (.), next_aeendti);
    10  run;

The DATA step above has two SET statements (Line 2 and Line 5), both reading from the same data set but from different observations. In each iteration of the DATA step, the first SET statement (Line 2) reads one observation. The second SET statement (Line 5) has two input data sets. The first one (Line 5) reads the observation that follows the one read on Line 2, by specifying the data set option firstobs=2. In addition, aestdti and aeendti are renamed to next_aestdti and next_aeendti, respectively. The next time the first SET statement is executed, it reads the next observation, and so on. In the last iteration, when the first SET statement reads the very last observation (Line 2), no observation is left for the data set on Line 5. Note that a DATA step stops as soon as any SET statement tries to read past the end of its input. Without further provision, SAS would therefore quit the DATA step without outputting this very last observation (Line 2). The second data set, on Line 7, supplies one extra observation so that the very last observation is output as well.

The look-ahead mechanism makes it possible to decide ahead of time whether the current event needs to be combined with the next one. However, the variety of ways in which events overlap makes it hard to decide when a combination is complete (so that we can output it). If a single overlapping scenario is overlooked, the combination may be completely wrong. In addition, since we intend to do everything on the fly in one single DATA step, the program will be hard to debug unless the algorithm is designed robustly, with tracking flags for verifying correctness. For these reasons, we follow a rigorous algorithm design process to implement our method. The process has four steps: problem statement, algorithm outline, algorithm design, and algorithm implementation. In the following, we describe each step in more detail, followed by a sample scenario.
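To see why the extra data set on Line 7 matters, the following minimal sketch runs the look-ahead step with and without it. The three-record data set ae and the output names no_pad and with_pad are made up for illustration; only the two-SET pattern is taken from the code above.

```sas
/* Hypothetical three-record input, already sorted by subjid and aestdti */
data ae;
  input subjid $ aestdti :date9. aeendti :date9.;
  format aestdti aeendti date9.;
  datalines;
S01 06FEB2007 12FEB2007
S01 12FEB2007 02APR2007
S02 01MAR2007 05MAR2007
;
run;

/* Without the extra data set: the second SET runs out of records one
   iteration early, so the step stops before the last record is output */
data no_pad;
  set ae;
  by subjid;
  set ae (firstobs=2 keep=aestdti aeendti
          rename=(aestdti=next_aestdti aeendti=next_aeendti));
run;

/* With the extra data set: ae (obs=1 drop=_all_) supplies one more
   (variable-free) observation, so every record of ae is output */
data with_pad;
  set ae;
  by subjid;
  set ae (firstobs=2 keep=aestdti aeendti
          rename=(aestdti=next_aestdti aeendti=next_aeendti))
      ae (obs=1 drop=_all_);
  /* the next record belongs to another subject: blank the look-ahead values */
  next_aestdti = ifn(last.subjid, ., next_aestdti);
  next_aeendti = ifn(last.subjid, ., next_aeendti);
  format next_aestdti next_aeendti date9.;
run;
```

With this input, no_pad should contain two observations and with_pad three, which is the behavior the extra data set is meant to guarantee.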

1. PROBLEM STATEMENT

As mentioned above, the problem is to (1) combine the contiguous events, (2) calculate their duration (and other time-to-event parameters), and (3) output only the combined events to a dataset, all on the fly. The following two examples illustrate a subset of the event overlapping scenarios and the expected results.

Table 1 shows two events after combination. Event 1 consists of A + B + C, where the start date of B is less than two days from the end date of A (and similarly for C). Event 2 consists of D, which starts 2 or more days after the end date of C and is therefore a separate event.

Table 1. Example Scenario 1

    Original Event    Combined Event
    Event A           Event 1
    Event B           Event 1
    Event C           Event 1
    Event D           Event 2

    [Timeline columns for Study Days 1 to 9 showing the extent of events A, B, C, and D]

In Table 2, A and B should be combined into one event, Event 1, since they overlap. In addition, Event 1 is an unresolved event, with duration calculated from the minimum of (AE start date for event A, AE start date for event B) to the maximum of (censored AE end date for event A, AE end date for event B).

Table 2. Example Scenario 2

    Original Event    Combined Event
    Event A           Event 1
    Event B           Event 1

    [Timeline columns for Study Days 1 to 9 showing the extent of events A and B]

2. ALGORITHM OUTLINE

Fig. 1 shows the notations used in this abstract. Given the problem statement above, we sketch the algorithm as follows. It consists of five steps. The first three steps are data preparation, which should be done in a separate DATA step to adjust aestdti and aeendti per the SAP. The sorted data set (referred to as ae_sorted below) is then fed to the algorithm described in the next subsection (algorithm design). The design, implementation, and testing of the last two steps are the focus of the rest of the abstract.
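A minimal sketch of the data-preparation step might look as follows, assuming the imputation rules of Steps 1 to 3 in the outline that follows. The variables cutoffdt (data cutoff date) and ontrtfl (1 if the subject is still on treatment) and the sample records are hypothetical names and values introduced only for this illustration; the actual rules must come from the SAP.

```sas
/* Hypothetical raw AE records; a period denotes a missing date.
   cutoffdt and ontrtfl are assumed names used only in this sketch. */
data ae;
  input subjid $ fdosdt :date9. ldosdt :date9. aestdti :date9.
        aeendti :date9. cutoffdt :date9. ontrtfl skdcdn;
  format fdosdt ldosdt aestdti aeendti cutoffdt date9.;
  datalines;
S01 30JAN2007 22MAY2007 12FEB2007 02APR2007 30JUN2007 0 1
S01 30JAN2007 22MAY2007 . 12FEB2007 30JUN2007 0 1
S02 15FEB2007 20JUN2007 01MAR2007 . 30JUN2007 1 1
;
run;

data ae_prep;
  set ae;
  where skdcdn = 1;                            /* records flagged for the EOI */
  /* Step 1: a missing start date becomes the first dose date */
  if missing(aestdti) then aestdti = fdosdt;
  /* Step 2: censor a missing or post-cutoff end date */
  if missing(aeendti) or aeendti > cutoffdt then do;
    if ontrtfl = 1 then aeendti = cutoffdt;    /* still on treatment */
    else aeendti = ldosdt + 30;                /* 30 days after last dose */
  end;
run;

/* Step 3: sort so that each subject's events are in start-date order */
proc sort data=ae_prep out=ae_sorted;
  by subjid aestdti;
run;
```

Keeping this preparation in its own step means the combining step can assume that every record has non-missing dates and that records arrive in start-date order within each subject.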

The Algorithm Outline

For all records flagged for an EOI category (e.g., ae.skdcdn=1), do the following:

1) If aestdti is missing, set aestdti to the first dose date.
2) If aeendti is missing or is after the data cutoff, censor aeendti to the data cutoff date if the subject is still on treatment; otherwise set it to 30 days after the last study drug administration date.
3) Sort the records by subjid, aestdti.
4) Increment the event line number by 1, set the event start date to the aestdti of the record, set the event end date to the aeendti, and set the event duration (kmdy) to the event end date minus the event start date plus 1.
5) If the next aestdti is less than 2 days after the event end date and the aeendti is greater than the event end date, set the event end date to aeendti and the event duration (kmdy) to aeendti minus the event start date plus 1; if the next aestdti is greater than the event end date by 2 or more, return to Step 4.

3. ALGORITHM DESIGN

Given the outline of the algorithm above, we describe the last two steps in SAS pseudo code below. Fig. 1 presents the notations used in the discussion.

Fig. 1. Notations

    skdcdn    = skin disorder code (1: yes 0: no)
    ae_sorted = the sorted AE dataset
    fdosdt    = first dose date
    ldosdt    = last dose date
    aestdti   = imputed ae start date
    aeendti   = imputed ae end date
    kmstdti   = start date of the combined ae
    kmendti   = end date of the combined ae
    kmdy      = event duration

In each DATA step iteration, we RETAIN four flags.

a) censfln: event status, i.e., whether the combined event is resolved or unresolved (0: resolved 1: unresolved). If it contains any unresolved events, the combined event is treated as unresolved. Otherwise, it is resolved.
b) fstdt: the start date of the first event among all events of a combined event. In other words, it is the minimum aestdti of all events within a combined one.
c) lstdt: the maximum end date (aeendti) of all the events of a combined event.
d) contfln: an important flag that tells SAS whether to try to combine the current event with the previous one. By default, it is set to 1. It is set to 0 if the combination stops at the current observation. This flag is also used to decide which observations are output. In this abstract, all observations with contfln=0 are output (as a combined event). Its use becomes clearer in the algorithm pseudo code below.

We introduce two auxiliary variables, next_aestdti and next_aeendti, to make it easier to compare the start and end dates of two adjacent events in the data set.

The core algorithm is presented in three separate figures (Fig. 2, 3, and 4) because of the page size limitation. The algorithm consists of four major components: the RETAIN statement, Case 1, Case 2, and the combined event output. The algorithm is written in SAS pseudo code and is largely self-explanatory. Here we summarize each component and discuss potential issues that need careful consideration.

Fig. 2 shows the first two components, which are simple. The RETAIN statement retains the four flags described above as the DATA step goes through its iterations. CASE 1 handles the situation where a subject has only one event (or record). In that case, contfln is set to zero since the next event should not be combined with the current one. The censfln is set to zero since this is a resolved event.

CASE 2, the most complex part, is presented in Fig. 3 and Fig. 4. It is divided into three conditions. Note that all observations (events) need to be checked for Condition 1. Conditions 2 and 3 are mutually exclusive. In other words, an observation fits either Condition 2 or Condition 3, but not both.

Fig. 2. Algorithm Pseudo Code (part 1 of 3)

    DATA STEP BEGIN;
      SET ae_sorted;                              *the sorted AE dataset;
      RETAIN censfln fstdt lstdt contfln;

      CASE 1: the subject has only one record
      if first.subjid and last.subjid then do;    *no event combination is needed;
        contfln=0; censfln=0;                     *set flags;
      end;                                        *end of Case 1;

Fig. 3. Algorithm Pseudo Code (part 2 of 3)

    DATA STEP (continued);

      CASE 2: the subject has more than one record

      Condition 1: first record of a subject, reset flags
      if first.subjid then do;
        contfln=1;                                *set continued flag to 1;
        fstdt=aestdti; lstdt=aeendti;             *initialization;
        censfln=0;                                *set resolved flag to 0 (resolved);
      end;

      Condition 2: check whether should be combined
      if contfln=1 and not last.subjid then do;
        a. should combine with next observation
        if (next_aestdti-kmendti<2 or next_aestdti<=lstdt+1) then do;
          update fstdt/lstdt;
          update kmstdti/kmendti;
          if aeendti<=next_aeendti then kmendti=next_aeendti;
          if next_aeendti>lstdt then lstdt=next_aeendti;
        end;
        b. should not combine with next observation
        else do;
          kmstdti=fstdt;
          if kmendti<lstdt then kmendti=lstdt;
          kmdy=kmendti-kmstdti+1;
          contfln=0; censfln=0;                   *set flags: aestdti/aeendti --> fstdt/lstdt;
        end;
      end;                                        *end of Condition 2;

Fig. 4. Algorithm Pseudo Code (part 3 of 3)

    DATA STEP (continued);

      CASE 2: (continued)

      Condition 3: event combination may not be needed
      if contfln=0 or last.subjid then do;
        a. if contfln=1 and last.subjid then do;
          fstdt/lstdt --> kmstdti/kmendti;
          kmdy=kmendti-kmstdti+1;
          contfln=0; censfln=0;                   *set flags;
          aestdti/aeendti --> fstdt/lstdt;
        end;
        b. if (not last.subjid and contfln=0) and
              (next_aestdti-kmendti<2 or next_aestdti<=lstdt+1) then do;   *should be combined;
          contfln=1;                              *set continued flag to 1;
          fstdt/lstdt --> aestdti/aeendti;
          censfln=0;                              *set resolved flag to 0 (resolved);
          kmstdti/kmendti --> fstdt/lstdt;
          if aeendti<=next_aeendti then kmendti=next_aeendti;
          if next_aeendti>lstdt then lstdt=next_aeendti;
        end;
        c. else do;
          kmstdti=fstdt;
          if kmendti<lstdt then kmendti=lstdt;
          kmdy=kmendti-kmstdti+1;
          contfln=0; censfln=0;                   *set flags: aestdti/aeendti --> fstdt/lstdt;
        end;
      end;                                        *end of Condition 3;
                                                  *end of Case 2;

      OUTPUT THE COMBINED EVENTS
      if contfln=0;
    DATA STEP END;

The three conditions for CASE 2 are listed below:

    Condition 1: first record of a subject, reset flags
    Condition 2: check whether should be combined
    Condition 3: event combination may not be needed

Condition 1 handles the situation where the observation is the first event for a new subject. Since we have a new subject, all the flags need to be reset appropriately. contfln is set to 1: by default, we assume combination is needed. fstdt and lstdt keep the minimum aestdti and maximum aeendti seen so far. They are used to check whether event combination is necessary during the iteration process, and they are initialized to the aestdti and aeendti of the first event of a new subject. Finally, we set censfln to 0, assuming the event is resolved. Note that the first observation of a new subject must still be checked against Condition 2 or Condition 3.

Condition 2 describes the situation where the current observation should be combined with the previous event. We also check whether it should be combined with the next event. This check results in two branches in the program (Branch a and Branch b in Fig. 3). In Branch a, we need to update fstdt, lstdt, kmstdti, and kmendti. The latter two (kmstdti and kmendti) are what we keep in the final output as the start and end dates of the combined event. Thus, they must hold the earliest start date and the latest end date of the contiguous events. The former two must be updated as well, so that the combination is done correctly if combining is still needed for the next event (the one we look ahead to from the current observation). In Branch b, we prepare the current observation for final output, since this is the last event of a series of contiguous events. The kmstdti and kmendti are set accordingly (kmstdti=fstdt; if kmendti<lstdt then kmendti=lstdt). The duration, kmdy, is calculated from kmstdti and kmendti. Then we set contfln to 0 so that the observation will be output at the end of the DATA step. Finally, we reset the rest of the flags, as in CASE 1, except contfln. contfln is not reset to 1 because it serves two purposes. First, it signals whether we should combine with the previous event (note that contfln is retained). Second, it controls output: if it is 0, the observation is output; otherwise, it is not output, since more combination may be needed. Bear this in mind as we proceed to Condition 3, where the contfln flag must be checked and handled carefully.

Now let us look at Condition 3, which has three branches. Branch a is the case where the observation is the last one for a given subject. Thus, no further combination is possible. It is processed as in Branch b of Condition 2. Branch b is the case where the event is the first event of a possible new series of contiguous events. The current value of contfln is zero because the previous event was the last observation of a contiguous event, and contfln was set to zero for output. That is also why, in this branch, we need to reset contfln to one again. In some sense, this branch is equivalent to Condition 1 and Branch a of Condition 2. Finally, Branch c handles the remaining situations, where the event should be set for output, as before.

The last component of the algorithm is the output. If the contfln flag is set to zero, the observation is output to the final combined event dataset. It is this flag (contfln) that makes it possible to combine events and output the combined events on the fly. It is also this flag that needs careful handling, as we have seen in CASE 2.

4. ALGORITHM IMPLEMENTATION

Our implementation was done in SAS 9.1.3. Nevertheless, the algorithm applies to any SAS version. Given the pseudo code above, the implementation of the algorithm in SAS is straightforward and is not discussed further.
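For readers who want something runnable, the following is a minimal sketch of a single-DATA-step combination. It is a simplified variant, not the exact flag logic of Fig. 2 to Fig. 4: it uses one hypothetical retained flag, closed, in place of contfln, fstdt, and lstdt, looks ahead only at the next start date, relies on the input being sorted by subjid and aestdti, and leaves out censfln. The input records are those of the sample scenario in Table 3.

```sas
/* Sample AE records (Table 3), already sorted by subjid and aestdti */
data ae_sorted;
  input subjid :$11. aestdti :date9. aeendti :date9.;
  format aestdti aeendti date9.;
  datalines;
ABC-XYZ-001 06FEB2007 12FEB2007
ABC-XYZ-001 12FEB2007 02APR2007
ABC-XYZ-001 19MAR2007 16APR2007
ABC-XYZ-001 02APR2007 02APR2007
ABC-XYZ-001 02APR2007 16APR2007
ABC-XYZ-001 09APR2007 16APR2007
ABC-XYZ-001 29APR2007 07MAY2007
;
run;

data combined (keep=subjid kmstdti kmendti kmdy);
  set ae_sorted;
  by subjid;
  /* look-ahead: start date of the next record */
  set ae_sorted (firstobs=2 keep=aestdti rename=(aestdti=next_aestdti))
      ae_sorted (obs=1 drop=_all_);
  /* closed (hypothetical flag): 1 when the previous record ended an event */
  retain kmstdti kmendti closed;
  format kmstdti kmendti date9.;

  if first.subjid or closed then do;       /* open a new combined event */
    kmstdti = aestdti;
    kmendti = aeendti;
  end;
  else kmendti = max(kmendti, aeendti);    /* extend the open event */

  /* close the event at the subject's last record, or when the next
     start is 2 or more days after the current combined end date */
  if last.subjid then closed = 1;
  else closed = (next_aestdti - kmendti >= 2);

  if closed then do;
    kmdy = kmendti - kmstdti + 1;          /* duration in days */
    output;                                /* only combined events are written */
  end;
run;
```

With the sample records, this sketch writes two observations: one running from 06FEB2007 to 16APR2007 with kmdy=70 and one from 29APR2007 to 07MAY2007 with kmdy=9. Because the input is sorted by start date, the first record of a series already carries the earliest start date, which is why only the end date needs to be extended.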

THE SAMPLE SCENARIO AND RESULTS

Table 3 shows the AE events for a given EOI (skin disorder) for a subject. We use this sample scenario to illustrate our algorithm.

Table 3. Sample AE Event Data for Skin Disorder

    Event  SUBJID       FDOSDT     LDOSDT     AESTDTI    AEENDTI    SKDCDN
    1      ABC-XYZ-001  30-Jan-07  22-May-07  6-Feb-07   12-Feb-07  1
    2      ABC-XYZ-001  30-Jan-07  22-May-07  12-Feb-07  2-Apr-07   1
    3      ABC-XYZ-001  30-Jan-07  22-May-07  19-Mar-07  16-Apr-07  1
    4      ABC-XYZ-001  30-Jan-07  22-May-07  2-Apr-07   2-Apr-07   1
    5      ABC-XYZ-001  30-Jan-07  22-May-07  2-Apr-07   16-Apr-07  1
    6      ABC-XYZ-001  30-Jan-07  22-May-07  9-Apr-07   16-Apr-07  1
    7      ABC-XYZ-001  30-Jan-07  22-May-07  29-Apr-07  7-May-07   1

Fig. 5 is an illustration of the events to be combined. As can be seen, the seven events should be combined into two events.

Fig. 5. AE Events in Timeline

    [Timeline of Events 1 to 7 plotted against dates from 2/1 to 5/7]

Table 4 shows the results, with all flag information kept for illustration (note that subjid is not displayed).

Table 4. Contiguous Event Combination Results

    Event  KMSTDTI    KMENDTI    KMDY  CENSFLN  CONTFLN  NEXT_AESTDTI  NEXT_AEENDTI
    1      2/6/2007   2/12/2007  7     0        1        12-Feb-07     2-Apr-07
    2      2/12/2007  4/2/2007   50    0        1        19-Mar-07     16-Apr-07
    3      3/19/2007  4/16/2007  29    0        1        2-Apr-07      2-Apr-07
    4      4/2/2007   4/2/2007   1     0        1        2-Apr-07      16-Apr-07
    5      4/2/2007   4/16/2007  15    0        1        9-Apr-07      16-Apr-07
    6      2/6/2007   4/16/2007  70    0        0        29-Apr-07     7-May-07
    7      4/29/2007  5/7/2007   9     0        0        .             .

According to our algorithm, only the last two rows (Events 6 and 7, where contfln=0) will be output.
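As a hedged illustration of how the two output rows might then feed the KM analysis, the sketch below rebuilds them from the dates in Table 4 and passes the duration to PROC LIFETEST. The data set name km_input is hypothetical, the choice of PROC LIFETEST (which requires SAS/STAT) is an assumption rather than something stated in this abstract, and treating censfln=1 as the censoring value follows the flag definition given earlier (0: resolved 1: unresolved).

```sas
/* The two combined events that are output in Table 4 */
data km_input;
  input kmstdti :mmddyy10. kmendti :mmddyy10. censfln;
  kmdy = kmendti - kmstdti + 1;    /* 70 and 9 days, as in Table 4 */
  format kmstdti kmendti date9.;
  datalines;
2/6/2007 4/16/2007 0
4/29/2007 5/7/2007 0
;
run;

/* Duration is the time variable; records with censfln=1 are treated as censored */
proc lifetest data=km_input;
  time kmdy*censfln(1);
run;
```

The duration arithmetic is the same "end date minus start date plus 1" rule used in the algorithm outline, so the values recomputed here agree with the KMDY column of Table 4.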

CONCLUSIONS

In this abstract, we presented a one-data-step process that merges contiguous events, calculates duration on the fly, and outputs only the combined events. We showed a four-step approach to designing and implementing the algorithm in a robust way. In the algorithm, we used one time-to-event parameter, duration, to illustrate our idea. Other time-to-event parameters can also be included in the calculation when necessary (such as event-free days that should be subtracted from the duration). We also used a sample scenario to illustrate our algorithm. The algorithm proved efficient and robust in a study that we completed successfully. Note that there are many ways to combine contiguous events. This abstract shows just one of them.

REFERENCES

SASCOMMUNITY.ORG, Look-Ahead and Look-Back, http://www.sascommunity.org/wiki/look-Ahead_and_Look-Back (accessed August 21, 2012)

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

    Hui Song
    PRA International Inc.
    630 Dresher Road
    Horsham, PA 19044
    Work Phone: 215-444-8583
    Email: SongHui@PRAintl.com

    George Laskaris
    PRA International Inc.
    630 Dresher Road
    Horsham, PA 19044
    Work Phone: 215-444-8575
    Email: LaskarisGeorge@PRAIntl.com