Keeping Track of Database Changes During Database Lock

Paper CC10

Sanjiv Ramalingam, Biogen Inc., Cambridge, USA

ABSTRACT

A higher frequency of data transfers, combined with a greater likelihood of changes to the database at the time of database lock, necessitates tracking changes in the database. To ensure that only resolved data issues are updated, the datasets must be compared between transfers. This paper presents a macro that compares the datasets between transfers and presents a report in the form of an Excel spreadsheet. The first tab lists any datasets that are present in one transfer but not the other. Further processing is done only for datasets common to both transfers. A second tab presents the datasets whose number of records differs between the transfers. Typically, close to database lock, differences in the number of records should be minimal, if any. Subsequent tabs, named after the domain, are created if the values of any variables in that domain changed between transfers. Variables included in one transfer but not the other are reported in the same domain tab. The macro takes advantage of CALL EXECUTE and ODS tagsets, and only the directory paths of the transfers are required to run it.

INTRODUCTION

More data queries are raised and addressed at the time of database lock than at any other time in a clinical study. Most reporting at this stage of the clinical trial is for the clinical study report, so it is critical to track database changes. To help programmers liaise with their data management counterparts and ensure that only agreed-upon changes are reflected in the database, it is helpful to have a tool that compares the current data transfer with the previous transfer. With heavily outsourced models, such data comparisons are highly recommended to ensure quality. Both raw and SDTM datasets should be compared: there have been cases where no issues were reported with the raw data, yet after unblinding, differences between transfers were found in SDTM variables that were unrelated to unblinding.

A macro was created specifically to aid this process. The macro creates an Excel spreadsheet with different tabs. A transfer may include or drop domains relative to the previous transfer. To capture this scenario, a tab called Missing Domain is created, as illustrated below.

[Figure: the Missing Domain tab]

Further processing is done only for datasets common to both transfers. Another tab, called N_mismatches, is created for domains where the number of records differs between the transfers, as illustrated below.

[Figure: the N_mismatches tab]

Subsequent tabs, named after the domain, are created only if the values of any variables in that domain changed between transfers, as illustrated below for the AE tab. PROC COMPARE was used to obtain the comparison report. The OUTDIF option writes an observation containing the differences for each pair of matching observations, and the affected variables are highlighted with XX in the report. In the highlighted record below, the XX appears below the date 2015-06-22, indicating that the date changed from 2015-06-17 to 2015-06-22.

[Figure: the AE tab, with callouts labeling the current record, the previous record, and the difference row]

MACRO PARAMETERS

Only four parameters are involved, three of which specify locations:

CRTDIR= <directory location of current data transfer>
PREVDIR= <directory location of previous transfer>
OUTDIR= <directory location where report is to be saved>

The fourth parameter lists the variables to be dropped before PROC COMPARE. It is not needed for SDTM comparisons, but it is needed when comparing raw data, because raw data contains variables that are neither collected on the eCRF nor mapped to SDTM.

DROPVAR= <variables to be dropped for comparison>

METHODOLOGY

There are two main parts to the macro. The first part creates the initial tabs of the report, including the one that shows differences in the number of records per domain between the transfers. In this part, a flag is set for each domain whose number of records differs between the transfers. This flag determines which domains are compared in the second part: the comparison is done only for domains that have an equal number of records in both transfers. Although the entire macro is not presented in this paper, all core components and the code needed to build one are shared.

PART ONE

A list of all datasets in both the current and previous transfer directories is first obtained using PROC CONTENTS.
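The code in this paper reads the transfers through librefs such as crtdir and prevdir, but the LIBNAME statements that create them are not shown. A minimal sketch of one possible setup follows. It assumes that the librefs share the names of the CRTDIR and PREVDIR parameters; the directory paths are hypothetical.

```sas
/* Hypothetical locations; replace with the real transfer directories */
%let crtdir  = /studies/abc123/transfer_current;
%let prevdir = /studies/abc123/transfer_previous;

/* Assumption: librefs are named after the macro parameters.          */
/* ACCESS=READONLY keeps the comparison from altering either transfer */
libname crtdir  "&crtdir"  access=readonly;
libname prevdir "&prevdir" access=readonly;
```

With the librefs in place, a reference such as crtdir._all_ addresses every dataset in the current transfer.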
For every domain, a macro call is made to obtain the difference in the number of records between the transfers. The macro call is automated using CALL EXECUTE. The key to this automation is making SAS write SAS code: a text string containing the macro call is created first, and the call is then invoked using CALL EXECUTE [1]. There are alternative ways of automating macro calls [2]. This section of the code is shown below.

    /* Assumes librefs CRTDIR and PREVDIR point to the current and
       previous transfer directories (LIBNAME statements not shown) */

    /* get list of all datasets in current directory */
    proc contents data=crtdir._all_ out=lst1(keep=memname) noprint;
    run;
    proc sort data=lst1 nodupkey;
      by memname;
    run;

    /* get list of all datasets in previous directory */
    proc contents data=prevdir._all_ out=lst2(keep=memname) noprint;
    run;
    proc sort data=lst2 nodupkey;
      by memname;
    run;

    /* determine datasets common to both transfers (lst) and datasets
       present in only one of the transfers (inlsta, inlstb). Only the
       datasets in lst are used for determining mismatches in records
       or values */
    data lst inlsta inlstb;
      merge lst1(in=a) lst2(in=b);
      by memname;
      if a and b then output lst;
      if a and not b then output inlsta;
      if b and not a then output inlstb;
    run;

    /* create dataset to highlight datasets in one transfer but not the other */
    data misdom;
      length src $100;
      set inlsta(in=a) inlstb(in=b);
      if a then src='in current directory but not previous directory';
      if b then src='in previous directory but not current directory';
    run;

    /* create excel report for missing domains */
    ods tagsets.excelxp file="&outdir/report_&sysdate..xml";
    ods tagsets.excelxp options(sheet_name="missing domain");
    title 'Domains missing between Transfers';
    proc print data=misdom label;
    run;

    /* macro for determining difference in records per domain between transfers */
    %macro allcmp1(inds=,nds=);
      /* sort individual domains using all variables available in the domain */
      proc sort data=prevdir.&inds out=src1;
        by _all_;
      run;
      proc sort data=crtdir.&inds out=src2;
        by _all_;
      run;

      /* set both datasets together and id records from each domain */
      data src;
        set src1(in=a) src2(in=b);
        if a then src="src1";
        if b then src="src2";
      run;

      /* determine counts for each dataset, transpose and then determine
         the difference in number of records */
      proc freq data=src noprint;
        table src / out=cnt(keep=src count);
      run;
      proc transpose data=cnt out=nds;
        var count;
        id src;
      run;
      data nds;
        length domain $100;
        set nds;
        rename src1=old_n src2=new_n;
        domain="&inds";
        difference=src2-src1;
        /* flag domains whose record counts differ; the uppercase "Y"
           matches the test in the final data step */
        if difference ne 0 then flag="Y";
      run;
      proc sql noprint;
        create table &nds as
          select domain, old_n, new_n, difference, flag from nds;
      quit;

      /* delete temporary datasets */
      proc datasets library=work noprint;
        delete cnt src1 src2;
      quit;
    %mend allcmp1;

    /* create macro call by concatenating text required and assigning temporary datasets */
    data x1;
      length allcmp1 $400;
      set lst end=eof;
      nds=cats('ds',_n_);
      allcmp1=cats('%allcmp1(inds=',memname,',nds=ds',_n_,')');
      call execute(allcmp1);
    run;

    /* combine all datasets and split them based on difference =0 (cmp) or ne 0 (nocmp) */
    proc sql noprint;
      select nds into :nds_ separated by ' ' from x1;
    quit;

    /* domains that had the same number of records are saved to cmp */
    /* PROC COMPARE will only be done for these domains */
    data cmp nocmp;
      set &nds_;
      if flag eq "Y" then output nocmp;
      else output cmp;
    run;

PART TWO

The second part performs the comparison (PROC COMPARE) between datasets that have the same number of observations in both transfers. One of the key issues when developing this section of the macro was the sort order used for data comparison. To solve this issue, the datasets were sorted by all variables using the _ALL_ keyword, which worked for all domains except laboratory data. In the author's experience, a separate sort order was needed to make a useful comparison for laboratory data, namely <USUBJID LBCAT LBTESTCD LBTEST LBDTC VISITNUM VISIT LBORRES>. For every domain, a macro call is made to obtain the data comparison. The macro call is automated using CALL EXECUTE and works in the same way as described in Part One. The logic that parses the variables to be dropped for a domain (macro parameter DROPVAR) and checks their validity is not covered in this paper. Logic to highlight variables present in one transfer but not the other, per domain, is also included. The macro variable DROPVAR used in the code below is different from the one explained in the macro parameters section. The section of code related to Part Two is shown below.
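Before the macro itself, it may help to see what the PROC COMPARE options produce on a small case. The sketch below is not part of the author's macro: the datasets PREV and CURR and their values are made up, loosely echoing the date change in the AE illustration.

```sas
/* Two made-up versions of a tiny AE-like dataset (illustration only) */
data prev;
  length usubjid $8 aestdtc $10;
  usubjid='001'; aeseq=1; aestdtc='2015-06-17'; output;
  usubjid='002'; aeseq=1; aestdtc='2015-06-20'; output;
run;

data curr;
  length usubjid $8 aestdtc $10;
  usubjid='001'; aeseq=1; aestdtc='2015-06-22'; output;
  usubjid='002'; aeseq=1; aestdtc='2015-06-20'; output;
run;

/* OUTNOEQUAL keeps only observations that differ;               */
/* OUTBASE, OUTCOMP and OUTDIF write the base, compare and       */
/* difference rows for each of those observations to OUT=        */
proc compare base=prev compare=curr
             out=diff outnoequal outbase outcomp outdif noprint;
run;

proc print data=diff;
run;
```

Only the first observation differs, so DIFF should hold three rows for it, identified by the automatic variable _TYPE_ (BASE, COMPARE, DIF). For character variables, the DIF row is expected to mark each changed character position with an X, which is how the XX under the changed date arises.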

    %macro allcmp(inds=);
      %if %upcase(&inds)=LB %then %do;
        /* laboratory data needs its own sort order */
        proc sort data=prevdir.lb out=src1;
          by usubjid lbcat lbtestcd lbtest lbdtc visitnum visit lborres;
        run;
        proc sort data=crtdir.lb out=src2;
          by usubjid lbcat lbtestcd lbtest lbdtc visitnum visit lborres;
        run;
      %end;
      %else %do;
        /* all other domains are sorted by every variable */
        proc sort data=prevdir.&inds out=src1;
          by _all_;
        run;
        proc sort data=crtdir.&inds out=src2;
          by _all_;
        run;
      %end;

      /* DROPVAR is expected to exist as a macro variable (possibly empty) */
      proc compare base=src1
        %if %length(&dropvar) %then %do;
          (drop=&dropvar) compare=src2(drop=&dropvar)
        %end;
        %else %do;
          compare=src2
        %end;
        outnoequal outbase outcomp outdif noprint out=compare;
      run;

      proc datasets library=work noprint;
        delete src1 src2;
      quit;

      /* Determine commonality of variables */
      proc contents data=crtdir.&inds out=content_1(keep=name) noprint;
      run;
      proc contents data=prevdir.&inds out=content_2(keep=name) noprint;
      run;
      data content_1;
        set content_1;
        name=upcase(name);
      run;
      data content_2;
        set content_2;
        name=upcase(name);
      run;
      proc sort data=content_1;
        by name;
      run;
      proc sort data=content_2;
        by name;
      run;
      data crtdir prevdir;
        merge content_1(in=a) content_2(in=b);
        by name;
        if a and not b then output crtdir;
        if b and not a then output prevdir;
      run;
      data crtdir;
        set crtdir;
        src="current TRANSFER";
      run;
      data prevdir;
        set prevdir;
        src="old TRANSFER";
      run;

      /* Determining common variables and create output for variables not common */
      ods tagsets.excelxp options(sheet_name="&inds");
      title; /* clear any title left over from an earlier step */
      proc print data=compare;
      run;
      title 'Variables in Current transfer but NOT in old transfer ';
      proc print data=crtdir;
      run;
      title 'Variables in old transfer but NOT in current transfer ';
      proc print data=prevdir;
      run;

      proc datasets library=work noprint;
        delete crtdir prevdir;
      quit;
    %mend allcmp;

    /* invoke macro for data comparison */
    data x;
      length allcmp $200;
      set cmp end=eof;
      allcmp=cats('%allcmp(inds=',domain,')');
      call execute(allcmp);
    run;

CONCLUSION

Even though the title suggests that changes be tracked around database lock, this process can be implemented at earlier stages to ensure that the current transfer is not the same as the previous transfer before further analysis.

REFERENCES

[1] Wang, Hui. Creating data driven SAS code with CALL EXECUTE. PharmaSUG 2015, Orlando, FL.
[2] Ramalingam, Sanjiv. Automating the pooling of variables across multiple datasets using PROC SQL and SAS macro. PharmaSUG 2009, Portland, OR.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Sanjiv Ramalingam
Biogen Inc.
300 Binney Street
Cambridge, MA - USA
Email: Sanjiv.ramalingam@biogen.com

Brand and product names are trademarks of their respective companies.