A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

Similar documents
A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

PharmaSUG China Paper 70

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Keeping Track of Database Changes During Database Lock

PharmaSUG Paper PO10

PharmaSUG China Paper 059

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

A SAS Macro to Create Validation Summary of Dataset Report

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Best Practices for E2E DB build process and Efficiency on CDASH to SDTM data Tao Yang, FMD K&L, Nanjing, China

PharmaSUG Paper IB11

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

Let SAS Help You Easily Find and Access Your Folders and Files

PharmaSUG Paper TT11

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL

A Macro that can Search and Replace String in your SAS Programs

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Making a List, Checking it Twice (Part 1): Techniques for Specifying and Validating Analysis Datasets

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

OUT= IS IN: VISUALIZING PROC COMPARE RESULTS IN A DATASET

When Powerful SAS Meets PowerShell TM

What Do You Mean My CSV Doesn t Match My SAS Dataset?

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

The Benefits of Traceability Beyond Just From SDTM to ADaM in CDISC Standards Maggie Ci Jiang, Teva Pharmaceuticals, Great Valley, PA

One Project, Two Teams: The Unblind Leading the Blind

How to validate clinical data more efficiently with SAS Clinical Standards Toolkit

Reading in Data Directly from Microsoft Word Questionnaire Forms

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

Taming a Spreadsheet Importation Monster

3N Validation to Validate PROC COMPARE Output

ODS/RTF Pagination Revisit

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Run your reports through that last loop to standardize the presentation attributes

PharmaSUG Paper CC02

An Alternate Way to Create the Standard SDTM Domains

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

An Efficient Tool for Clinical Data Check

esubmission - Are you really Compliant?

Automate Clinical Trial Data Issue Checking and Tracking

How a Code-Checking Algorithm Can Prevent Errors

PharmaSUG China Big Insights in Small Data with RStudio Shiny Mina Chen, Roche Product Development in Asia Pacific, Shanghai, China

CDISC Variable Mapping and Control Terminology Implementation Made Easy

Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

Quality Control of Clinical Data Listings with Proc Compare

Making the most of SAS Jobs in LSAF

Clinical Data Visualization using TIBCO Spotfire and SAS

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

PDF Multi-Level Bookmarks via SAS

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

PROBLEM FORMULATION, PROPOSED METHOD AND DETAILED DESCRIPTION

PharmaSUG Paper PO22

Not Just Merge - Complex Derivation Made Easy by Hash Object

Benchmark Macro %COMPARE Sreekanth Reddy Middela, MaxisIT Inc., Edison, NJ Venkata Sekhar Bhamidipati, Merck & Co., Inc.

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

SAS Macros of Performing Look-Ahead and Look-Back Reads

Preparing the Office of Scientific Investigations (OSI) Requests for Submissions to FDA

Validation Summary using SYSINFO

Doctor's Prescription to Re-engineer Process of Pinnacle 21 Community Version Friendly ADaM Development

SAS as a Tool to Manage Growing SDTM+ Repository for Medical Device Studies Julia Yang, Medtronic Inc. Mounds View, MN

Advanced Visualization using TIBCO Spotfire and SAS

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

What's the Difference? Using the PROC COMPARE to find out.

BreakOnWord: A Macro for Partitioning Long Text Strings at Natural Breaks Richard Addy, Rho, Chapel Hill, NC Charity Quick, Rho, Chapel Hill, NC

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Using an ICPSR set-up file to create a SAS dataset

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

PhUSE US Connect 2019

PharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

Reproducibly Random Values William Garner, Gilead Sciences, Inc., Foster City, CA Ting Bai, Gilead Sciences, Inc., Foster City, CA

Lab CTCAE the Perl Way Marina Dolgin, Teva Pharmaceutical Industries Ltd., Netanya, Israel

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Top Coding Tips. Neil Merchant Technical Specialist - SAS

Easy CSR In-Text Table Automation, Oh My

A Macro to Keep Titles and Footnotes in One Place

A SAS Solution to Create a Weekly Format Susan Bakken, Aimia, Plymouth, MN

What just happened? A visual tool for highlighting differences between two data sets

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Automation of SDTM Programming in Oncology Disease Response Domain Yiwen Wang, Yu Cheng, Ju Chen Eli Lilly and Company, China

Patricia Guldin, Merck & Co., Inc., Kenilworth, NJ USA

Interactive Programming Using Task in SAS Studio

Have SAS Annotate your Blank CRF for you! Plus dynamically add color and style to your annotations. Steven Black, Agility-Clinical Inc.

PharmaSUG Paper PO12

Plot Your Custom Regions on SAS Visual Analytics Geo Maps

Creating and Executing Stored Compiled DATA Step Programs

Real Time Clinical Trial Oversight with SAS

MedDRA Dictionary: Reporting Version Updates Using SAS and Excel

Oh Quartile, Where Art Thou? David Franklin, TheProgrammersCabin.com, Litchfield, NH

Creating Macro Calls using Proc Freq

IF there is a Better Way than IF-THEN

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

A Few Quick and Efficient Ways to Compare Data

Creating a Patient Profile using CDISC SDTM Marc Desgrousilliers, Clinovo, Sunnyvale, CA Romain Miralles, Clinovo, Sunnyvale, CA

Transcription:

PharmaSUG China 2018 Paper 64 A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China ABSTRACT For an ongoing study, especially for middle-large size studies, regular or irregular data reviewing is an import method to monitor the progress and quality of the study. In this paper, we develop a tool to compare the different data transfers, including added or deleted records, values updated for the existing records and no change of records. INTRODUCTION In clinical trial, for an ongoing study, many people, include medical, data management, statistician and so on are concerned about the progress and quality of the study. Regular or irregular data reviewing is an important method. But it will take a boundary of time and energy due to the huge data, especially for middle-large size studies. For different data transfers, the changes can be generalized into three parts: added or deleted records, values updated for the existing records and no change of records. Figure 1 shows a general sample data (only list one subject and some variables). Figure 1. Sample Data As we see, there are some changes between these two. How to identify the changes clearly with shorter time? A utility tool will be discussed in following part. PROGRAMMING PROCESS FLOW The programming process flow will display in this part. It includes two. 1. Apply PROC COMPARE for all data sets. It will get the comparison results entirely. 1

2. For the exactly equal dataset, the comparison is finished. What we concerned about is the unequal data sets. It includes added or deleted records, value changed and no changes. Figure 2 is the programming process flow. STEP 1 Figure 2. Programming Process Flow Apply PROC COMPARE for all data sets, it will get the comparison result entirely. Filename filelist pipe "dir ""&logpath"" /b "; DATA filelist; infile filelist length=reclen; input filename $varying1024. reclen ; if index(lowcase(filename), ".sas7bdat"); memname=scan(filename,1); set filelist end=eof; call symputx(compress("data" put(_n_,best.)),filename); if eof then call symputx("num",_n_); Firstly, get all data sets in the folder. Then read each data sets name memname into a macro variable data and use macro variable num to represent the total number of data sets. After that, get all the data sets prepared and using PROC COMPARE, related code is listed as follow: %do i=1 %to # Proc compare data=old.&&data&i compare=new.&&data&i outbase outcomp out=diff&i outnoequal listall criterion=10e-9 maxprint=(50,32767); set diff&i end=eof; if eof then call symput("mvar",trim(left(put(_n_,8.)))); %if &mvar=0 %then %do; Proc delete data=diff&i; 2

Here &logpath is the path of your data sets stored. We can save the result as PDF if necessary. At the same time, it necessary to output the unequal data set and named as diff&i. Then check the observation in diff&i. It will be deleted if there is no observation (exactly equal) in diff&i. Figure 3 is the result of unequal. STEP 2 Figure 3. PROC COMPARE Result (Unequal) After PROC COMPARE, the overall comparison result will display. For the unequal data sets, it s time consuming to check the unequal records, especially for middle-large size and multiple variables. We need to develop a new tool to identify the changes clearly. It includes four steps. 1. Find the common variables. For SDTM and ADaM datasets, we can identify the common variables by classes. /* Find common variables*/ Proc contents data=new.&&data&i out=contents noprint; DATA common; set contents; where name in('subjid','visit') or find(name,'decod') or find(name,'trt') or find(name,'testcd') or find(name,'param') or find(name,'qnam'); set common end=eof; call symputx(compress("common" put(_n_,best.)),name); if eof then call symputx("count",_n_); DATA common1; length common common_ $200; retain common common_; common='';common_=''; %do i=1 %to &count; if not missing(common) then common=strip(common) ' ' "&&common&i."; else common="&&common&i."; if not missing(common_) then common_=strip(common_) ',' "'&&common&i.'"; else common_="'&&common&i.'"; set common1; call symputx("common1",common); call symputx("common1_",common_); 3

Macro variable common1 list all common variables and common1_ is the character of them. 2. Merge the new and old data sets by common variables. Check the total number of the observation. It will be flagged as NEW if added records and flagged as DELETE if deleted records. /* FLAG='NEW' and FLAG= DELETED */ if not a and b then FLAG= NEW ; if a and not b then FLAG= DELETE ; 3. Check the records in common and rename the variables except common variables. Rename with prefix OLD for the old and NEW for the new. Apply the loop to compare all the variables between the OLD variables and NEW variables. It will be flagged as NO CHANGES if all the variables are exactly equal. And it will display the details if the value in OLD and NEW is different. /*FLAG='NO CHANGES' and FLAG=details*/ length FLAG $50 TERM $200; retain CHANGE TERM; CHANGE=0; TERM=''; %do i=1 %to &num1; if NEW_&&cname&i=OLD_&&cname&i then CHANGE=CHANGE; else if NEW_&&cname&i^=OLD_&&cname&i then CHANGE=CHANGE+1; if NEW_&&cname&i^=OLD_&&cname&i and not missing(term) then TERM=strip(term) ', ' "&&cname&i." ':' strip(old_&&cname&i) ' to ' strip(new_&&cname&i); else if NEW_&&cname&i^=OLD_&&cname&i then TERM="&&cname&i." ':' strip(old_&&cname&i) ' to ' strip(new_&&cname&i); if CHANGE=0 then FLAG='No CHANGES'; else if CHANGE>0 then FLAG='HAVE CHANGED'; Here num1 and cname are macro variables. And num1 represents the total number of variables except common variables and cname is the name of variables. Variable CHANGE represents the total number of changes and TERM shows the details. 4. Set the three datasets together and all the changes will display in the final data set. RESULT Apply the utility tool described above, it will get the comparison result and list in Figure 4. Figure 4. Details of Unequal 4

From Figure 5, it is clear to obtain the details of these two, even though there are multiple variables changes. CONCLUSION Apply the utility tool to compare data transfer, it will identify the details in shorter time and clearly. It s not only about the data transfer, but also check the data issue. But there is a restriction for the tool. It s better to keep the structure of the two datasets the same, especially the variables type. RECOMMENDED READING Base SAS Procedures Guide SAS For Dummies CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Jun Wang Enterprise: FMD K&L, Inc. Address: 4F,Building 4,Nanan Ruizhi, No.99 Shengtai Road, Jiangning District City, State ZIP: Nanjing, 210000 E-mail: jun.wang@fountain-med.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 5