CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ


ABSTRACT

Comparing different versions of output files is often performed in the validation stage of work. When the number of output files is small, the simplest way to compare different versions is to open each file manually and check it visually. When the files to be compared are in a standard format, a more systematic approach is to use SAS. In certain applications, thousands of output files may be generated for one project. Because it is tedious to enter thousands of file names in the SAS code to accomplish a comparison, there is a need to automate the process so that all file names in the specified folders are read automatically by SAS. This paper shows how the X command in SAS can be used to compare multiple folders without specifying individual file names. The files compared can be Word documents, SAS programs, output logs, text files, and datasets. In addition, the decision on how to compare a specific pair of files is made automatically based on defined file attributes. The summary report uses PROC REPORT to present the comparison outcome. The automatic process presented here is simple, portable, and can be easily combined with macros that compare individual files.

KEYWORDS

X COMMAND, PROC REPORT, QC, VALIDATION, PROC COMPARE

INTRODUCTION

In the pharmaceutical industry, input datasets, programs, output listings, and tables for clinical trial analysis and reporting are usually stored under a standardized folder structure using specified file names. When an update to the files is required, the datasets, programs, and output reports need to be compared. Likewise, files stored in similar folders (such as backup folders) must be evaluated, and testing folders must be compared with production folders.
The comparison process, if completed manually, is cumbersome and time consuming, and when the number of files is large, automating it becomes critical. One can certainly handle these comparisons outside of SAS, especially when no specific output report is needed. However, in certain applications it is necessary to create a comparison report that identifies the differences between specified files in various folders. This report may include statistics and details about those differences. Therefore, it is critical to be able to combine the comparison process and the statistical reporting in the same SAS session. The methodology used here is to run X statements within SAS to automate the comparison process and to generate different reports based on the file attributes. Papers such as Xu, et al. (2007) [1] present macros for comparing individual output files. Applying the automatic file name read-in process extends that capability to multiple output files.

X COMMAND BASICS

Running DOS or Windows Commands from within SAS

The X statement can execute DOS or Windows commands from the SAS session [2]. The X statement has the following syntax:

    X <'Command'>

This paper uses two DOS commands to do the work:

    COMMAND   DESCRIPTION
    CD        Change directories.
    DIR       List the content of the directory.

XWAIT and XSYNC System Option

The XWAIT System Option controls whether the SAS user must type EXIT to return to the SAS session after an X statement or X command has finished executing a DOS command. The NOXWAIT System Option, on the other hand, tells the command processor to return to the SAS session automatically after the specified command is executed, so there is no need to type EXIT. The XSYNC System Option specifies that the operating system command executes synchronously with the SAS session.
That is, control is not returned to the SAS System until the command has completed. The NOXSYNC System Option specifies that the SAS user can execute an X command or X statement and return to the SAS session without closing the window spawned by the X command or X statement. This paper recommends the combination of NOXWAIT and XSYNC because the command prompt window closes automatically when the application finishes, and the SAS System waits for the application to finish.

IMPLEMENTATION AND ALGORITHM

Step 1: Get the file name

X commands and the NOXWAIT and XSYNC options are used to obtain the list of files. In the example illustrated here, there are two folders to be compared. To perform the comparison within SAS, the contents of each folder are first written to a text file. The following key syntax shows how to load the file names in the specified folders, along with the other folder contents and file attributes, into text files automatically. The folders to be compared are set as user defined macro parameters, while the text file names for the folder contents can be arbitrary. In this paper, &basefder and &compfder are the user defined macro parameters for the two folders, and &outfder is the folder in which the text files are written; 'base' is the text file name for folder &basefder and 'compare' is the text file name for folder &compfder.

    options noxwait xsync;
    x "cd &outfder";
    x "dir &basefder. >base.txt";
    x "dir &compfder. >compare.txt";

Step 2: Convert the text files to SAS data sets

Each text file created in Step 1 is read into a SAS dataset by using the INFILE statement. The variables taken from the folder content text file are the file name, the file created or modified date, the file created or modified time, and the file size. The file name contains the suffix or extension, which usually indicates the type of file. An additional user defined macro parameter, &ftype, specifies which type of file is to be compared: only files whose extension equals &ftype are compared. For example, &ftype=sas7bdat compares SAS datasets. The following code illustrates how to read the text files from Step 1 into SAS datasets.
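    data base;
      /* &outfder must end with a path separator: the period after it  */
      /* only terminates the macro variable name.                      */
      infile "&outfder.base.txt" firstobs = 8 delimiter = ' ' missover;
      /* Declare lengths before INPUT; otherwise the character         */
      /* variables default to 8 bytes and long values are truncated.   */
      length size_b $20 filenm $120;
      input date_b mmddyy10. +2 time_b $5. ampm_b $2. +10 size_b $ filenm & $ ;
      format date_b mmddyy10.;
      part1_b = scan(filenm, 1, '.');   /* file name without extension */
      part2_b = scan(filenm, 2, '.');   /* file extension              */
      filenmlb = length(filenm);
      if part2_b = "&ftype";            /* keep the requested type     */
    run;

    data compare;
      infile "&outfder.compare.txt" firstobs = 8 delimiter = ' ' missover;
      length size_c $20 filenm $120;
      input date_c mmddyy10. +2 time_c $5. ampm_c $2. +10 size_c $ filenm & $ ;
      format date_c mmddyy10.;
      part1_c = scan(filenm, 1, '.');
      part2_c = scan(filenm, 2, '.');
      filenmlc = length(filenm);
      if part2_c = "&ftype";
    run;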
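Taking the second period-delimited token as the extension assumes that a file name contains exactly one period. As a hedged sketch of one way to relax that assumption (not part of the original macro), the SCAN function can count tokens from the right when given a negative count. The dataset name ext_demo, the variables part_last and is_match, and all file names are hypothetical and used for illustration only.

```sas
%let ftype = sas7bdat;   /* same role as the &ftype macro parameter */

/* Hypothetical file names; not taken from the paper. */
data ext_demo;
  length filenm $120 part_last $32;
  infile datalines truncover;
  input filenm $120.;
  /* A negative count makes SCAN read tokens from the right, */
  /* so -1 returns the text after the last period.           */
  part_last = scan(filenm, -1, '.');
  /* Case-insensitive test against the requested file type.  */
  is_match = (upcase(part_last) = upcase("&ftype"));
  datalines;
adsl.sas7bdat
adsl.v2.sas7bdat
ADAE.SAS7BDAT
readme.txt
;
run;

proc print data=ext_demo noobs;
run;
```

In this sketch the first three names are flagged with is_match = 1 and readme.txt with is_match = 0. Whether a case-insensitive match is wanted depends on the project's naming conventions.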

Step 3: Summarize the comparison outcome

After the 'base' and the 'compare' datasets have been created from the folder content text files, they are ready for the folder comparison. To start the comparison, the base folder description dataset and the compare folder description dataset are merged by file name. Merging the two datasets by file name can lead to several outcomes: exactly the same file; different file contents but the same size; files with different sizes; files found only in the base folder; files found only in the compare folder.

To identify these scenarios, a flag is introduced to classify the comparison result. When two files share the same file name, the same created date and time, and the same file size, one reasonable assumption is that they are the same file. When two files with the same file name have the same file size but a different created date and/or time, the file contents may or may not be the same. When two files with the same file name have different file sizes, it is almost certain that the file contents differ. The remaining possibility is that the file exists in one folder but not in the other. The flag created in this step is used later for further comparison.

This paper uses PROC FREQ and PROC REPORT to summarize the findings of the comparison.
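The five scenarios can be traced on a small made-up example. The sketch that follows is only an illustration under simplifying assumptions: every file name, date, time, and size is invented, the time and AM/PM marker are stored in one variable, and the size is numeric rather than character.

```sas
/* Toy folder listings; every file name, date, time, and size is made up. */
data base;
  length filenm $20 time_b $8;
  input filenm $ date_b :mmddyy10. time_b $ size_b;
  format date_b mmddyy10.;
  datalines;
a.sas7bdat 01/05/2010 10:15AM 4096
b.sas7bdat 01/05/2010 10:20AM 8192
c.sas7bdat 01/05/2010 10:25AM 8192
d.sas7bdat 01/05/2010 10:30AM 1024
;
run;

data compare;
  length filenm $20 time_c $8;
  input filenm $ date_c :mmddyy10. time_c $ size_c;
  format date_c mmddyy10.;
  datalines;
a.sas7bdat 01/05/2010 10:15AM 4096
b.sas7bdat 02/01/2010 09:00AM 8192
c.sas7bdat 02/01/2010 09:05AM 9216
e.sas7bdat 02/01/2010 09:10AM 2048
;
run;

/* MERGE with BY needs both inputs sorted by the BY variable. */
proc sort data=base;    by filenm; run;
proc sort data=compare; by filenm; run;

data result;
  merge base(in=in1) compare(in=in2);
  by filenm;
  if in1 and in2 then do;
    if date_b = date_c and time_b = time_c and size_b = size_c then flag = 1;
    else if size_b = size_c then flag = 2;   /* same size, other date/time */
    else flag = 3;                           /* different size             */
  end;
  else if in1 then flag = 4;                 /* only in base folder        */
  else flag = 5;                             /* only in compare folder     */
run;

proc print data=result noobs;
  var filenm flag;
run;
```

With these toy rows, files a through e receive flags 1 through 5 in that order, one row per scenario.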
    /* MERGE with BY requires both datasets to be sorted by file name */
    proc sort data=base;    by filenm; run;
    proc sort data=compare; by filenm; run;

    data result;
      merge base(in=in1) compare(in=in2);
      by filenm;
      if in1 = 1 and in2 = 1 then do;
        if date_b = date_c and time_b = time_c and ampm_b = ampm_c
           and size_b eq size_c then flag = 1;
        else if size_b eq size_c then flag = 2;
        else if size_b ne size_c then flag = 3;
      end;
      if in2 = 0 then flag = 4;
      if in1 = 0 then flag = 5;
      label filenm = 'File Name'
            flag   = 'Compare Result';
    run;

    *** Generate summary reports for the discrepancies;
    proc format;
      value compflag
        1 = 'Same File'
        2 = 'Different Files but Same Size'
        3 = 'Files with Different Size'
        4 = 'Files Found Only in Base Folder'
        5 = 'Files Found Only in Compare Folder' ;
    run;

    ods pdf file="&outfder.&report..pdf" ;

    proc freq data=result;
      table flag / nocum nopercent ;
      format flag compflag.;
      title 'Summary result of the folder comparison';
      title2 "Base folder: &basefder";
      title3 "Compare folder: &compfder";
    run;

    proc report data=result nowindows;
      column.. ;
      define..
      compute after ;

    title 'Same File';
    title2 "Base folder: &basefder";
    title3 "Compare folder: &compfder";

Step 4: Compare individual files

When the folder comparison summary suggests that two files are different, the next logical step is to compare the contents of the individual files. If the two files are SAS datasets, PROC COMPARE can be used to compare them in terms of variables and observations. The following syntax directs SAS to go through all SAS dataset pairs from the two compared folders that Step 3 identified as having either different sizes or different dates, and to compare each pair with PROC COMPARE.

    %if &ftype = sas7bdat %then %do;
      /* keep the pairs flagged as possibly or certainly different */
      data diff;
        set result;
        if flag in (2,3) ;
      run;

      /* number of pairs to compare */
      proc sql noprint;
        select count(flag) into :num from diff;
      quit;
      %let num=&num;   /* strip leading blanks */

      /* one macro variable per dataset name */
      proc sql noprint;
        select part1_c into :file1 - :file&num from diff;
      quit;

      libname base "&basefder";
      libname compare "&compfder";

      %do i=1 %to &num;
        proc compare base = base.&&file&i compare = compare.&&file&i ;
        run;
      %end;
    %end;

The application shown here uses PROC COMPARE to compare SAS datasets. Other applications use existing macros, for example Xu, et al. (2007) [1], to compare WORD or RTF documents. Applying the automatic file name read-in process extends that capability to multiple WORD or RTF files.

OUTPUT

The ODS PDF statement and the PROC REPORT procedure generate a summary result stored in a PDF file.

[Figure: snapshot of the comparison report]
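When many dataset pairs are compared in the Step 4 loop, it can help to record whether each PROC COMPARE call found any difference. One possible approach, sketched here as an assumption and not as part of the original macro, is to read the automatic macro variable SYSINFO, which PROC COMPARE sets to a return code. The dataset names ds_old and ds_new and the macro variable cmp_rc are hypothetical.

```sas
/* Two tiny made-up datasets that differ in one value. */
data work.ds_old;
  input id value;
  datalines;
1 10
2 20
;
run;

data work.ds_new;
  input id value;
  datalines;
1 10
2 25
;
run;

/* NOPRINT suppresses the listing; the return code is still set. */
proc compare base=work.ds_old compare=work.ds_new noprint;
run;

/* Copy SYSINFO right away: later steps can overwrite it. */
%let cmp_rc = &sysinfo;

%put NOTE: PROC COMPARE return code = &cmp_rc (0 means no differences were found);
```

In this sketch the return code is nonzero because one value differs. Inside the %do loop of Step 4, the same pattern could store one return code per dataset pair for the summary report.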

CONCLUSION

An automatic process to compare files and generate reports has been explored with the use of X statements within SAS. A generic macro has been developed to accomplish the automation. The examples shown replace a potentially lengthy manual comparison with reports based on file attributes, and the macro can be easily combined with macros that compare individual files.

REFERENCES

[1] Xu, M., Zhou, J. (2007). %DiFF: A SAS Macro to Compare Documents in Word or ASCII Format. In Proceedings of the Pharmaceutical SAS Users Group Conference (PharmaSUG 2007).
[2] SAS online document: "Running DOS or Windows Commands from within SAS."

ACKNOWLEDGMENTS

The authors would like to thank the management team of Merck Research Laboratories for their advice on this paper/presentation.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Simon Lin
Merck & Co., Inc.
126 Lincoln Avenue
P.O. Box 2000
Rahway, NJ 07065
Phone: 732-594-0773
e-mail: Simon_Lin@merck.com

Huei-Ling Chen
Merck & Co., Inc.
126 Lincoln Avenue
P.O. Box 2000
Rahway, NJ 07065
Phone: 732-594-2249
e-mail: Huei-Ling_Chen@merck.com

TRADEMARK

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.