PharmaSUG 2018 - Paper AD-16

An Efficient Tool for Clinical Data Check

Chao Su, Merck & Co., Inc., Rahway, NJ
Shunbing Zhao, Merck & Co., Inc., Rahway, NJ
Cynthia He, Merck & Co., Inc., Rahway, NJ

ABSTRACT

High-quality data in clinical trials is essential for compliance with Good Clinical Practice (GCP) and regulatory requirements. If data issues are observed during statistical analysis, the interpretation of study results is impacted. As such, statistical programmers may be asked to perform data validity checks and detect potential data issues across multiple datasets during the analysis and reporting (A&R) processes. The focus of this paper is NOT on finding data issues prospectively, but on how to accurately and efficiently track findings during the course of statistical analysis and programming processes. In this paper, some routinely performed data checks are discussed briefly. The results of the data checks are generated and summarized through a macro consisting of two sub-macros. The macro combines the results into one Excel file and compares the current check results with the ones in the previous version automatically. The changes between two versions are marked in different colors so that the reviewers can identify pending issues quickly. This tool provides real time documentation that can track data issues related to the A&R process accurately and efficiently.

INTRODUCTION

In the pharmaceutical industry, statistical analysis and reports are based on clinical data which must be of high quality. During the trial, the data management (DM) team applies many edit checks to clean the data. In practice, most of the checks are univariable and within the same domain. As a result, discrepancies across multiple variables and mismatches among different datasets might exist after edit checks are complete with DM tools. Additional data issues may be identified using SAS programs for analyses and reporting.
Besides identifying data issues, reporting and tracking them is also critical. Both the clinical team and the statisticians need to review the data issue report and add comments and feedback before the report with issue findings from SAS programs is sent to DM. DM will correct or query the issues according to the report. Excel is commonly used among professionals and has the flexibility required to integrate different comments into the tracking report. Therefore, it is used here as a tool to bring all the addressed data issues and feedback together. All the data issues captured by SAS programs from different sources are integrated into a single Excel workbook with multiple worksheets.

Although Excel is flexible and easy to use, producing a tracking report with a user-friendly interface takes a great deal of manual effort on adjustments such as column formatting. That level of effort is incompatible with frequent reports, so a macro that automatically outputs the tracking spreadsheet with a customized interface is needed. Historical information, such as comments from reviewers and the date on which the issue was found, is kept in the report. Updates to issue contents are highlighted with different colors so that reviewers can easily track activities among different roles and determine issue status.

DATA CHECK PROCEDURES

The purpose of the data check here is to record and track data issues found during statistical analysis and reporting. It supplements, rather than replaces, the data edit checks done by DM.

The first step of the data check is to create the data check specification. The programmers consult with statisticians and the clinical team to draft the data check specification for a given project. DM reviews the specification to avoid duplicating work already done by the DM edit check tool. After the specification is ready, the study programmers follow it to identify issues. Different SAS datasets are generated for the various issue categories. These SAS datasets are the input source for the developed macro, which creates a single Excel file with one worksheet for each issue category. The macro also provides another worksheet with a status summary of the data issues.

If a previous version of the issue spreadsheet exists, the macro can compare the new spreadsheet with the old one. The comments from the previous version of the spreadsheet are carried over to the new report if the issue still exists. Different cell colors distinguish whether the issue was reported previously or is a new one. This helps reviewers track issue status easily and efficiently.

DATA CHECK ITEMS

It is important to check the data in advance so that there is sufficient time to query and correct data issues before the database is locked. Some examples of items to check are listed below.

1. Discrepancies of randomized subjects:
   (1) Subject with randomization number but without reference date
   (2) Subject with randomization number but without randomization date
   (3) Subject randomized but with a record of screen failure in the Disposition dataset
2. Discrepancies of death subjects:
   (1) Subject with a death record in the Death dataset but not in the Disposition dataset
   (2) Subject with a death record in the Disposition dataset but not in the Death dataset
   (3) Subject with different death dates between the Death dataset and the Disposition dataset
3. Discrepancies of administered drug:
   (1) Administered drug dose amount mismatched with the dispensed dose amount
   (2) Administered drug dose start date later than stop date
   (3) Administered drug doses with overlapping dates
4. Discrepancies of administered drug and disposition datasets:
   (1) Administered drug dose date later than treatment or study discontinuation date
   (2) Gap between last dose date and treatment discontinuation date
   (3) Treatment discontinuation date later than study discontinuation date
   (4) Subject with administered drug but not randomized
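As an illustration of how one of the items above might be programmed, the following is a minimal sketch of check 2 (discrepancies of death subjects). The dataset names DEATH and DISP, the variables USUBJID, DTHDT, DSDECOD and DSDT, and the toy records are all assumptions made for this example; they are not taken from the authors' programs.

```sas
/* Toy inputs. All names and values here are hypothetical. */
data death;
  input usubjid $ dthdt :date9.;
  format dthdt date9.;
  datalines;
S001 01JAN2018
S002 15FEB2018
S003 20MAR2018
;
run;

data disp;
  input usubjid $ dsdecod :$12. dsdt :date9.;
  format dsdt date9.;
  datalines;
S001 DEATH 01JAN2018
S003 DEATH 22MAR2018
S004 DEATH 05APR2018
S005 COMPLETED 30APR2018
;
run;

/* Full join keeps subjects found on either side; the WHERE clause  */
/* keeps only the three discrepancy types listed under check 2.     */
proc sql;
  create table death_disp_issue as
  select coalesce(d.usubjid, s.usubjid) as usubjid length=20,
         d.dthdt format=date9.,
         s.dsdt as disp_dthdt format=date9.,
         case
           when s.usubjid is null then 'In Death but not in Disposition'
           when d.usubjid is null then 'In Disposition but not in Death'
           else 'Death dates differ'
         end as issue length=40
  from death as d
       full join
       (select usubjid, dsdt from disp where dsdecod = 'DEATH') as s
       on d.usubjid = s.usubjid
  where d.usubjid is null or s.usubjid is null or d.dthdt ne s.dsdt
  order by 1;
quit;
```

With these toy records, S002, S003 and S004 are flagged (one for each discrepancy type), while S001 and S005 are not. A dataset like death_disp_issue is the kind of per-category input the reporting macro expects.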

INDIVIDUAL WORKSHEET BUILDING

After the data check specification is ready, a SAS program is developed to generate SAS datasets according to the different data issue categories. All the SAS datasets are exported into one Excel file with one worksheet for each data issue category. The key part of the SAS code is shown below:

    *=-------------- Generate Excel file with multiple worksheets -------*;
    %macro ExcelGen(outfile=%str(date_issue_check), outsheet=, indata=, tab=, desc=);

      *--- Check whether dataset is empty ----;
      %let Exist = No;
      %let NumObs = 0;
      %if %sysfunc(exist(&indata)) %then %let Exist = Yes;
      %if &Exist = Yes %then %do;
        %let DSNId = %sysfunc(open(&indata));
        %let DSObs = %sysfunc(attrn(&dsnid.,nobs));
        %let rc = %sysfunc(close(&dsnid.));
        %let NumObs = &DSObs.;
      %end;

      %if &numobs > 0 %then %do;
        /* Add an empty column for DM comments */
        proc sql;
          create table &indata._ as
          select *, "" as DM_Comments length=200
          from &indata.;
        quit;

        proc export data=&indata._
          outfile="&path.\data Issues.xls"
          dbms=excelcs REPLACE;
          sheet="&outsheet"; *--- The name of checked issue type ---;
        run;
      %end;
      %else %do;
        %put There is no record at dataset &indata. ;
      %end;

    %mend ExcelGen;

According to the data check specification, this macro is called repeatedly to generate an individual worksheet each time a dataset is created and checked. All the worksheets are saved into one Excel file for further use.
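The repeated calls described above could be driven from a small control dataset rather than written out by hand. The following is only a sketch under stated assumptions: the control dataset CHECK_LIST, the issue dataset names, the sheet names, and the folder assigned to the macro variable path are all hypothetical, and the ExcelGen macro is assumed to be compiled already.

```sas
/* Hypothetical output folder; ExcelGen refers to it as &path. */
%let path = C:\temp;

/* Hypothetical control list: one row per issue dataset and worksheet name. */
data check_list;
  length indata $41 outsheet $31;
  input indata $ outsheet $;
  datalines;
work.rand_no_date RAND_NO_DATE
work.death_disp_issue DEATH_DISP
work.ex_date_order EX_DATES
;
run;

/* Generate one ExcelGen call per row. %nrstr delays macro execution  */
/* until the generated code runs after this DATA step has finished.   */
data _null_;
  set check_list;
  call execute(cats('%nrstr(%ExcelGen)(outsheet=', outsheet,
                    ', indata=', indata, ');'));
run;
```

Because ExcelGen checks for a missing or empty input dataset, a row whose dataset does not exist only writes a message to the log and produces no worksheet.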

WORKSHEET FORMAT AND SUMMARY BUILDING

The Excel file created by the macro ExcelGen contains all detailed information about the data issues, but has not yet been formatted. A sub-macro is developed to format the cells of each worksheet, for example by defining column widths, setting filters, and setting the page orientation. To give reviewers an overview of all data issues, a new worksheet, General, is created to summarize the data issues from all worksheets, as shown in Fig. 1. The column Domain gives the raw dataset names, the column Source is a hyperlink to the individual worksheet, and the column Question / Remark is the description of the data issue. A column Comments is added to hold comments from reviewers. Additional notes can be added to the General worksheet, as shown at the top of Fig. 1.

Fig. 1 Summary table of previous data issue tracking

COMPARISON OF TRACKING SHEETS

For a long-running trial, a one-time data issue report is not enough; a cumulative data issue sheet is required to track the data issues and their status. A sub-macro edit0check0compare_excel (key part shown below) is developed to create the cumulative data issue spreadsheet. It compares the current spreadsheet, shown in Fig. 2, with the previous version (Fig. 1) and marks the issue status with different colors. The summary table of the output is shown in Fig. 3.

In this sub-macro, yellow marks new data issues, red marks data issues that were reported before and still exist, and green marks data issues that were reported before but are now resolved. Comparing Fig. 1 and Fig. 2, the data issue at source AE_EX is a new one, so it is marked in yellow. The issue at source DMA_DISP existed in Fig. 1 but not in Fig. 2, so it is marked in green to show that it is resolved in the new data source. The issues existing in both Fig. 1 and Fig. 2 are marked in red. Fig. 4 is a screenshot of an individual worksheet in the cumulative report.
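The color rule just described can be illustrated on toy data. This is a simplified sketch, not the authors' sub-macro: the variables SOURCE, REMARK, COMMENTS and COLOR, and the remark texts, are hypothetical stand-ins for the columns of the General worksheet.

```sas
/* Previous summary (hypothetical rows), including a reviewer comment. */
data old_general;
  length source $12 remark $60 comments $40;
  infile datalines dsd dlm='|' truncover;
  input source remark comments;
  datalines;
DM|Randomized subject without reference date|Query sent
DMA_DISP|Randomized subject with screen failure record|
;
run;

/* Current summary (hypothetical rows). */
data new_general;
  length source $12 remark $60;
  infile datalines dsd dlm='|' truncover;
  input source remark;
  datalines;
AE_EX|AE start date before first dose date
DM|Randomized subject without reference date
;
run;

proc sort data=old_general; by source remark; run;
proc sort data=new_general; by source remark; run;

/* Match old and new issues; the IN= flags decide status and color. */
data status;
  merge old_general(in=a) new_general(in=b);
  by source remark;
  length mssg $4 color $6;
  if a and b then do; mssg='both'; color='red';    end;
  else if a  then do; mssg='old';  color='green';  end;
  else            do; mssg='new';  color='yellow'; end;
run;

proc print data=status noobs; run;
```

In this toy case AE_EX comes out as new (yellow), DM as present in both versions (red) with its earlier comment carried over, and DMA_DISP as resolved (green), which mirrors the example in the text.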

Fig. 2 Summary table of current data issue tracking

Fig. 3 Summary table of cumulative data issue tracking

Fig. 4 Data issue at worksheet DM

    *-------------------------- Key code of cumulative summary table --------------------------;
    data general;
      merge old_general(in=a rename=(col3=ocol3 col4=ocol4 col8=ocol8))
            new_general(in=b);
      by _col6 _col7;
      /* Flag where each issue appears: previous report, current report, or both */
      if a and b then mssg='both';
      if a and not b then mssg='old';
      if b and not a then mssg='new';
      col7=compress(col7,,'kw');
      /* Carry over values entered in the previous report */
      if ocol8 ne '' then col8 = ocol8;
      if ocol3 ne '' then col3 = ocol3;
      if ocol4 ne '' then col4 = ocol4;
      drop ocol3 ocol4 ocol8;
    run;

    proc report data=general nowd spanrows
      style(header)={font_weight=bold font_size=10pt just=center
                     protectspecialchars=off borderstyle=solid bordercolor=black}
      style(column)={borderstyle=solid bordercolor=black};
      column col1-col12 mssg;
      define col1  / order order=data 'Log #' style(column)={vjust=c just=c};
      define col2  / display 'Domain' style(column)={vjust=c just=c};
      define col3  / display 'Data Transfer Date';
      define col4  / display 'Date Reported';
      define col5  / display 'Reported by';
      define col6  / display style(column)={url=$urlfmt.} 'Source';
      define col7  / display 'Question / Remark';
      define col8  / display 'Status';
      define col9  / display 'Resolution';
      define col10 / display 'Resolved By';
      define col11 / display 'Date Resolved';
      define col12 / display 'Comments';
      define mssg  / noprint;

      /* Show worksheet links in blue, underlined */
      compute col6;
        if upcase(strip(col6)) in (&tablist) then do;
          call define(_col_,'style','style={color=blue textdecoration=underline}');
        end;
      endcomp;

      compute col1;
        if upcase(strip(col1)) eq 'LOG #' then do;
          call define(_row_,'style','style={background=grey font_weight=bold}');
        end;
        if strip(col1) eq '0' then do;
          call define(_col_,'style','style={color=white}');
        end;
      endcomp;

      /* Color the Question / Remark cell by issue status */
      compute mssg;
        if findw(upcase(mssg), "NEW") gt 0 then do;
          call define('col7','style','style={background=yellow}');
        end;
        else if findw(upcase(mssg), "BOTH") gt 0 then do;
          call define('col7','style','style={background=red}');
        end;
        else if findw(upcase(mssg), "OLD") gt 0 then do;
          call define('col7','style','style={background=green}');
        end;
      endcomp;
    run;

CONCLUSION

High-quality clinical data are essential for statistical analysis and reporting. The macro code discussed in this paper provides an efficient way to track and document data issues in addition to the formal data review completed by Data Management. The automatic output saves programmers significant time in generating issue reports. The macro carries over comments from reviewers and marks issue statuses with different colors, which gives reviewers a useful tool to track and monitor data issues. The method helps the whole study team identify and resolve potential data issues in a timely and efficient manner, which supports a high-quality statistical analysis.

REFERENCES

Jeff Xia, Lugang Larry Xie, Shunbing Zhao. A Vivid and Efficient Way to Highlight Changes in SAS Dataset Comparison. PharmaSUG 2017, Paper AD04.

Niraj J. Pandya, Vinodh Paida. Data Edit-checks Integration using ODS Tagset. PharmaSUG 2011, Paper DM03.

ACKNOWLEDGEMENT

The authors would like to thank Changhong Shi for her great support and valuable input on this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Name: Chao Su
Enterprise: Merck
Address: 126 E. Lincoln Avenue
City, State ZIP: Rahway, NJ 07065-4607
Work Phone: 732-594-6459
E-mail: chao.su@merck.com
Web: www.merck.com

Name: Shunbing Zhao
Enterprise: Merck
Address: 126 E. Lincoln Avenue
City, State ZIP: Rahway, NJ 07065-4607
Work Phone: 732-594-3976
E-mail: shunbing.zhao@merck.com
Web: www.merck.com

Name: Cynthia He
Enterprise: Merck
Address: 126 E. Lincoln Avenue
City, State ZIP: Rahway, NJ 07065-4607
Work Phone: 732-594-3876
E-mail: cynthia.he@merck.com
Web: www.merck.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

MS, Windows and PowerShell are trademarks of the Microsoft group of companies.