3N Validation to Validate PROC COMPARE Output

Similar documents
Virtual Accessing of a SAS Data Set Using OPEN, FETCH, and CLOSE Functions with %SYSFUNC and %DO Loops

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

SAS2VBA2SAS: Automated solution to string truncation in PROC IMPORT Amarnath Vijayarangan, Genpact, India

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

PharmaSUG Paper PO12

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

Validation Summary using SYSINFO

Utilizing SAS for Cross- Report Verification in a Clinical Trials Setting

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Automate Clinical Trial Data Issue Checking and Tracking

One Project, Two Teams: The Unblind Leading the Blind

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Chasing the log file while running the SAS program

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

An Efficient Tool for Clinical Data Check

Data Quality Review for Missing Values and Outliers

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

Summary Table for Displaying Results of a Logistic Regression Analysis

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

Paper S Data Presentation 101: An Analyst s Perspective

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

PharmaSUG Paper AD09

Creating Macro Calls using Proc Freq

Keeping Track of Database Changes During Database Lock

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

Run your reports through that last loop to standardize the presentation attributes

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

Post-Processing.LST files to get what you want

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

ODS/RTF Pagination Revisit

Generating Customized Analytical Reports from SAS Procedure Output Brinda Bhaskar and Kennan Murray, RTI International

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

footnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";

Developing Data-Driven SAS Programs Using Proc Contents

A Macro that can Search and Replace String in your SAS Programs

PharmaSUG Paper TT11

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

Making a List, Checking it Twice (Part 1): Techniques for Specifying and Validating Analysis Datasets

Program Validation: Logging the Log

Let SAS Help You Easily Find and Access Your Folders and Files

PharmaSUG China 2018 Paper AD-62

Checking for Duplicates Wendi L. Wright

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA

A Side of Hash for You To Dig Into

Facilitate Statistical Analysis with Automatic Collapsing of Small Size Strata

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL

The Output Bundle: A Solution for a Fully Documented Program Run

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

Uncommon Techniques for Common Variables

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

ABSTRACT INTRODUCTION WORK FLOW AND PROGRAM SETUP

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

INTRODUCTION THE FILENAME STATEMENT CAPTURING THE PROGRAM CODE

SAS Drug Development Program Portability

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

This paper describes a report layout for reporting adverse events by study consumption pattern and explains its programming aspects.

Using PROC SQL to Generate Shift Tables More Efficiently

ODS for PRINT, REPORT and TABULATE

IT S THE LINES PER PAGE THAT COUNTS Jonathan Squire, C2RA, Cambridge, MA Johnny Tai, Comsys, Portage, MI

Quality Control of Clinical Data Listings with Proc Compare

Out of Control! A SAS Macro to Recalculate QC Statistics

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

A Few Quick and Efficient Ways to Compare Data

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

Using a HASH Table to Reference Variables in an Array by Name. John Henry King, Hopper, Arkansas

TLFs: Replaying Rather than Appending William Coar, Axio Research, Seattle, WA

Setting the Percentage in PROC TABULATE

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

Automating Preliminary Data Cleaning in SAS

Creating an ADaM Data Set for Correlation Analyses

A Mass Symphony: Directing the Program Logs, Lists, and Outputs

Using SAS Metadata as a Key Performance Indicator!

Using Templates Created by the SAS/STAT Procedures

Real Time Clinical Trial Oversight with SAS

Submitting SAS Code On The Side

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

OUT= IS IN: VISUALIZING PROC COMPARE RESULTS IN A DATASET

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Useful Tips When Deploying SAS Code in a Production Environment

Using SAS Macros to Extract P-values from PROC FREQ

Get into the Groove with %SYSFUNC: Generalizing SAS Macros with Conditionally Executed Code

Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China

Omitting Records with Invalid Default Values

Syntax Conventions for SAS Programming Languages

Beginner Beware: Hidden Hazards in SAS Coding

Your Own SAS Macros Are as Powerful as You Are Ingenious

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Cleaning up your SAS log: Note Messages

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

Transcription:

ABSTRACT Paper 7100-2016 3N Validation to Validate PROC COMPARE Output Amarnath Vijayarangan, Emmes Services Pvt Ltd, India In the clinical research world, data accuracy plays a significant role in delivering quality results. Various validation methods are available to confirm data accuracy. Of these, double programming is the most highly recommended and commonly used method to demonstrate a perfect match between production and validation output. PROC COMPARE is one of the SAS procedures used to compare two data sets and confirm the accuracy. In the current practice, whenever a program rerun happens, the programmer must manually review the output file from the PROC COMPARE to ensure an exact match. This is tedious, time-consuming, and error prone because there are more output files to be reviewed and manual intervention is required. The proposed approach programmatically validates the output of PROC COMPARE in all the programs and generates an HTML output file with a Pass/Fail status flag for each output file. The status is flagged as Pass whenever the output file meets the following 3N criteria: 1. NOTE: No unequal values were found. All values compared are exactly equal. 2. Number of observations in base and compared data sets are equal. 3. Number of variables in base and compared data sets are equal. INTRODUCTION Every clinical study involves numerous data sets, tables, listings and graphs to validate and it is very crucial for the reason that these data characterize the subjects in the clinical study. Presently, the output of Tables (T), Listings (L) and Figures (F) are also available as data sets to speed up the validation process. These primary program output data sets are compared with the QC output data sets USING PROC COMPARE to ensure the accuracy. As of now, whenever the program rerun happens, programmer must manually review every output file for the statement "NOTE: No unequal values were found. All values compared are exactly equal". If this statement is present in the output file, it is concluded that two data sets are identical. Unfortunately, this assumption is not always true. Whether it is true or not, it is cumbersome task and time consuming to go through each and every output file when the number of programs or rerun increases and also error prone due to manual intervention. The approach proposed in this paper explains how to programmatically validate the output generated from PROC COMPARE of all the programs and also it ensures the accuracy. TYPICAL PROGRAM LIFE CYCLE A typical work day of a programmer starts with Data sets (D), Tables, Listings and Figures. The process flow and the role of primary programmers, review programmers and statisticians as follows: Display 1. Program Life Cycle 1

If there are any validation comments raised by the reviewer, both the primary and review programmers should first review the findings together and resolve the issues. Still if it is not resolved they discuss with statistician to come to a conclusion. VALIDATION USING PROC COMPARE PROC COMPARE is one of the widely used validation procedure to compare two data sets. The basic syntax is as follows: proc compare base=qcdata compare=datatovalidate; To expedite the validation process, the data sets that are used to generate the production DTLFs are saved as a permanent data set and it is being compared with the validation output data set developed by the reviewer using the above mentioned code. The typical output generated by PROC COMPARE will have four sections namely data set summary, variable summary, observation summary and optionally compare differences if there is any discrepancies found between the data sets. The programmer looks for the statement NOTE: No unequal values were found. All values compared are exactly equal. in the PROC COMPARE output as follows: proc compare base=sashelp.class compare=sashelp.class; Display 2: Proc Compare Expected Output But the presence of this statement does not always guarantee the accuracy of comparison. The PROC COMPARE can be misleading you in its output if the user is not very cautious. The below sample code compares the same data sets sashelp.class in which base data set drops the AGE variable but still PROC COMPARE generates the above mentioned statement in its output by treating BASE data set as reference. proc compare base=sashelp.class(drop=age) compare=sashelp.class; Even though the number of variables are not same still the statement NOTE: No unequal values were found. All values compared are exactly equal. is reported in the output of PROC COMPARE since it matches only the number of variables and observations from base dataset. 2

Display 3: Proc Compare Misleading Output CHALLENGES IN VALIDATION Each clinical study involves creating or reporting of numerous data sets, tables, listings and figures and their validations. 1. Manual checking of 100s of output or redoing the same while regenerating the same output is tedious, time consuming and error prone 2. PROC COMPARE can also be misleading the programmer with the above kind of outputs 3. The supervisor or team lead should await for the update from the programmers to know the status PROPOSED SOLUTION With %threen macro, the above challenges are over come. The steps involved in the solutions are outlined as follows: Step 1 Step 2 Step 3 Primary programmer produces the dataset For TLF, dataset used in / output of proc report Reviewer re-produce the same output using the code developed in parallel to primary program Follow variable naming convention used by programmer Generate the proc compare LST output and save it Use %threen to scan all the compare LST output files %threen produces a HTML output Reviewer review the only one HTML report instead of reviewing many LST files Display 4. Execution flow 3

%THREEN MACRO The functionality of %threen macro is as follows Read in PROC COMPARE LST output files Identify the files to read Read in all files and form single dataset Process the files Search for the potential messages Clean the data to keep only potential messages Reporting Create status flag, hyper link to the output file HTML report output Display 5. %threen Macro Process Flow HTML OUTPUT %threen macro creates the following HTML file. Each output file is hyperlinked to its corresponding file and it helps to easily open the file from the 100s of file for the further review if it all needed. Display 6. %threen Output The following measures of the base and compared data sets are reported. Status variable with Pass / Fail flag whenever the output file meets the following 3N criteria NOTE: No unequal values were found. All values compared are exactly equal Number of observations in Base and Compared data sets are equal Numbers of variables in Base and Compared data sets are equal In addition to that it provides overall summary of both base and compares data sets. 4

CONCLUSION The SAS macro %threen efficiently serves the purpose of validating all the output files generated from PROC COMPARE in a short time and also expedite the process of validation with accuracy. It reduces up to 90% of time spend on the manual review of the output files. It can also be integrated with other programs like the program for checking the log files and making of any project status reports. It has also a limitation like the primary program dataset output and validation program output dataset should be of same form and it does the checks only on the data display and not the cosmetics which have to be handled by the programmer. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Amarnath Vijayarangan Emmes Services Pvt Ltd, Bangalore, India avijayarangan@emmes.com amarnath7@gmail.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. APPENDIX: %THREEN MACRO %let path=c:\; %macro threen; filename pth "&path"; *** Prepare list of output files to read; data lstlist; length lstname $ 500; dval=dopen('pth'); if dval> 0 then do; count=dnum(dval); do i= 1 to count; lstname=dread(dval,i); if upcase(scan(lstname,2,'.'))='lst' then output; keep lstname; filename pth clear; *** Read in all output files; data lst; length filepath $ 500; set lstlist; filepath=cats("&path.\",lstname); infile dummy filevar=filepath length=len end=done missover; do while (not done); input record $varying2000. len; filename=filepath; record=strip(compbl(record)); if record^='' then output; proc sort data=lst; by filename; *** Ensure output file contains only one proc compare; proc freq data=lst noprint; tables filename /out=qc(drop=percent); where record='data Set Summary'; 5

*** Delete filename from further process if more than one proc compare presents; data lst; merge lst qc; by filename; if count=2 then delete; *** Create the necessary flags; data lst; length BDataset BNvar BNObs BLabel CDataset CNvar CNObs CLabel CommonVars CommonObs BNotCVars CNotBVars BNotCObs CNotBObs $ 100; set lst; by filename; if record='the COMPARE Procedure' then id1=100; else id1+1; if record='value Comparison Results for Variables' then id2=100; else id2+1; if record='dataset Created Modified NVar NObs Label' then id3=100; else id3+1; retain BDataset BNvar BNObs BLabel CDataset CNvar CNObs CLabel CommonVars BNotCVars CNotBVars CommonObs BNotCObs CNotBObs; if first.filename then do; BDataset=''; BNvar='';BNObs=''; BLabel=''; CDataset=''; CNvar=''; CNObs=''; CLabel=''; CommonVars=''; BNotCVars=''; CNotBVars=''; CommonObs=''; BNotCObs=''; CNotBObs=''; if id3=101 then do; BDataset=strip(scan(record,1,'')); BNvar=strip(scan(record,4,'')); BNObs=strip(scan(record,5,'')); BLabel=strip(substr(strip(record),(length(strip(record))) -index(strip(reverse(record)),strip(reverse(bnobs))))); if length(strip(blabel))>length(strip(bnobs)) then BLabel=substr(BLabel,1+length(strip(BNobs))); else Blabel=''; if id3=102 then do; CDataset=strip(scan(record,1,'')); CNvar=strip(scan(record,4,'')); CNObs=strip(scan(record,5,'')); CLabel=strip(substr(strip(record),(length(strip(record))) -index(strip(reverse(record)),strip(reverse(cnobs))))); if length(strip(clabel))>length(strip(cnobs)) then CLabel=substr(CLabel,1+length(strip(CNobs))); else Clabel=''; if record=:'number of Variables in Common' then CommonVars=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Variables in',bdataset, 'but not in',cdataset) then BNotCVars=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Variables in',cdataset, 'but not in',bdataset) then CNotBVars=strip(compress(scan(record,2,':'),,'kd')); if record=:'number of Observations in Common' then CommonObs=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Observations in',bdataset, 'but not in',cdataset) then BNotCObs=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Observations in',cdataset, 'but not in',bdataset) then CNotBObs=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Variables in',bdataset, 'but not in',cdataset) then BNotCVars=strip(compress(scan(record,2,':'),,'kd')); if record=:catx('','number of Variables in',cdataset, 'but not in',bdataset) then CNotBVars=strip(compress(scan(record,2,':'),,'kd')); if record=:'note:' then Status='No unequal values were found'; if Status='No unequal values were found' and (BNvar=CNvar) and (BNObs=CNObs) then Status='Pass'; else Status='Fail'; drop id:; if last.filename; lstname=scan(lstname,1,'.'); lstname=cats('<a href=',cats("'",filename,"'"),'>',lstname,'</a>'); filename temp url %include temp; 'http://support.sas.com/rnd/base/ods/odsmarkup/tableeditor/tableeditor.tpl'; *** Create HTML report output; 6

ods tagsets.tableeditor file="&path\3n Validation.html" style=styles.sasweb options( banner_color_odd="pink" GRIDLINE="YES" GRIDLINE_COLOR="yellow" highlight_color='pink' frozen_headers="yes" frozen_rowheaders="yes"); title1 h = 15pt c=orange bold "3N Validation Report" ; title2 h = 12pt c=purple j=l bold "<a href=&path.>&path.</a>" ; proc report data=lst nowd split='*'; column lstname status ('Common Attributes' CommonVars CommonObs) ('Base Data Set Attributes' BDataset BNvar BNObs BLabel BNotCVars BNotCObs) ('Compare Data Set Attributes' CDataset CNvar CNObs CLabel CNotBVars CNotBObs); define lstname / display style(column)= {font_weight=bold just=center cellwidth=3.8 in} 'Output*File*Name' ; define status / display style(column)={just=center cellwidth=0.6 in} 'Status'; define CommonVars / display style(column)={just=center cellwidth=0.3 in} '# of Vars'; define CommonObs /display style(column)={just=center cellwidth=0.3 in} '# of Obs'; define BDataset /display style(column)={just=center cellwidth=0.8 in} 'Name'; define BNvar /display style(column)={just=center cellwidth=0.3 in} '# of Vars'; define BNObs /display style(column)={just=center cellwidth=0.3 in} '# of Obs'; define BLabel /display style(column)={just=center cellwidth=2.8 in} 'Label'; define BNotCVars /display style(column)={just=center cellwidth=0.8 in} '# of Vars*Not In*Compare'; define BNotCObs/display style(column)={just=center cellwidth=0.8 in} '# of Obs*Not In*Compare'; define CDataset /display style(column)={just=center cellwidth=0.8 in} 'Name'; define CNvar /display style(column)={just=center cellwidth=0.3 in} '# of Vars'; define CNObs /display style(column)={just=center cellwidth=0.3 in} '# of Obs'; define CLabel /display style(column)={just=center cellwidth=2.8 in} 'Label'; define CNotBVars /display style(column)={just=center cellwidth=0.9 in} '# of Vars*Not In*Base'; define CNotBObs /display style(column)={just=center cellwidth=0.9 in} '# of Obs*Not In*Base'; ods tagsets.tableeditor close; ods html close; %mend threen; %threen; 7