A SAS Macro to Create Validation Summary of Dataset Report

Similar documents
A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

PharmaSUG Paper TT11

Keeping Track of Database Changes During Database Lock

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Validation Summary using SYSINFO

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Making a List, Checking it Twice (Part 1): Techniques for Specifying and Validating Analysis Datasets

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Automate Clinical Trial Data Issue Checking and Tracking

What Do You Mean My CSV Doesn t Match My SAS Dataset?

PharmaSUG Paper IB11

1 Files to download. 3 Macro to list the highest and lowest N data values. 2 Reading in the example data file

An Efficient Tool for Clinical Data Check

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Conversion of CDISC specifications to CDISC data specifications driven SAS programming for CDISC data mapping

A Useful Macro for Converting SAS Data sets into SAS Transport Files in Electronic Submissions

Program Validation: Logging the Log

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

Clinical Data Visualization using TIBCO Spotfire and SAS

BreakOnWord: A Macro for Partitioning Long Text Strings at Natural Breaks Richard Addy, Rho, Chapel Hill, NC Charity Quick, Rho, Chapel Hill, NC

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

Have SAS Annotate your Blank CRF for you! Plus dynamically add color and style to your annotations. Steven Black, Agility-Clinical Inc.

Data Quality Review for Missing Values and Outliers

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

A Macro that can Search and Replace String in your SAS Programs

PharmaSUG China Paper 059

Tracking Dataset Dependencies in Clinical Trials Reporting

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Better Metadata Through SAS II: %SYSFUNC, PROC DATASETS, and Dictionary Tables

What's the Difference? Using the PROC COMPARE to find out.

Create Metadata Documentation using ExcelXP

Power Data Explorer (PDE) - Data Exploration in an All-In-One Dynamic Report Using SAS & EXCEL

Reducing SAS Dataset Merges with Data Driven Formats

One Project, Two Teams: The Unblind Leading the Blind

Electricity Forecasting Full Circle

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

The Benefits of Traceability Beyond Just From SDTM to ADaM in CDISC Standards Maggie Ci Jiang, Teva Pharmaceuticals, Great Valley, PA

Advanced Visualization using TIBCO Spotfire and SAS

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Validating Analysis Data Set without Double Programming - An Alternative Way to Validate the Analysis Data Set

The Output Bundle: A Solution for a Fully Documented Program Run

OUT= IS IN: VISUALIZING PROC COMPARE RESULTS IN A DATASET

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Automatically Configure SDTM Specifications Using SAS and VBA

SAS Online Training: Course contents: Agenda:

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Go Compare: Flagging up some underused options in PROC COMPARE Michael Auld, Eisai Ltd, London UK

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

PharmaSUG Paper PO12

Using a Control Dataset to Manage Production Compiled Macro Library Curtis E. Reid, Bureau of Labor Statistics, Washington, DC

PharmaSUG Paper AD09

Adjusting for daylight saving times. PhUSE Frankfurt, 06Nov2018, Paper CT14 Guido Wendland

PDF Multi-Level Bookmarks via SAS

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

Base and Advance SAS

New Vs. Old Under the Hood with Procs CONTENTS and COMPARE Patricia Hettinger, SAS Professional, Oakbrook Terrace, IL

SDTM Attribute Checking Tool Ellen Xiao, Merck & Co., Inc., Rahway, NJ

HAVE YOU EVER WISHED THAT YOU DO NOT NEED TO TYPE OR CHANGE REPORT NUMBERS AND TITLES IN YOUR SAS PROGRAMS?

An Introduction to Visit Window Challenges and Solutions

PharmaSUG Paper PO10

Quicker Than Merge? Kirby Cossey, Texas State Auditor s Office, Austin, Texas

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Interactive Programming Using Task in SAS Studio

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

SAS as a Tool to Manage Growing SDTM+ Repository for Medical Device Studies Julia Yang, Medtronic Inc. Mounds View, MN

PharmaSUG Paper CC02

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Automating Preliminary Data Cleaning in SAS

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

Make Your Life a Little Easier: A Collection of SAS Macro Utilities. Pete Lund, Northwest Crime and Social Research, Olympia, WA

How to Create Data-Driven Lists

START CONVERTING FROM TEXT DATE/TIME VALUES

SAS Drug Development Program Portability

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

Creating Macro Calls using Proc Freq

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

CDISC Variable Mapping and Control Terminology Implementation Made Easy

An Automation Procedure for Oracle Data Extraction and Insertion

The Proc Transpose Cookbook

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

ODS DOCUMENT, a practical example. Ruurd Bennink, OCS Consulting B.V., s-hertogenbosch, the Netherlands

Planning to Pool SDTM by Creating and Maintaining a Sponsor-Specific Controlled Terminology Database

This Too Shall Pass: Passing Simple and Complex Parameters In and Out of Macros

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

Text Wrapping with Indentation for RTF Reports

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

A Better Perspective of SASHELP Views

Run your reports through that last loop to standardize the presentation attributes

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

Transcription:

ABSTRACT PharmaSUG 2018 Paper EP-25 A SAS Macro to Create Validation Summary of Dataset Report Zemin Zeng, Sanofi, Bridgewater, NJ This paper will introduce a short SAS macro developed at work to create summary report of dataset validation. The code of the SAS macro is included in the paper and can be easily modified to fit reader s need. The macro is warmly welcomed by the programming group at work and has proven to be very useful and efficient. INTRODUCTION In Pharmaceutical industry, a statistical programming team is usually assigned to support multiple ongoing clinical studies. It is very demanding for a programming lead to manage programming activities effectively, keep a clear big picture of the team progress and document study related files in a well-organized way. Inspired by Chen s paper (Chen, Y. PharmaSUG, 2010), we developed a short SAS macro to create validation summary of dataset report. We shall share the details of the macro in this paper. The organization of the paper would be the following. First we will introduce the settings of the macro and the target output macro creates. Second we will state the key steps in the macro with detailed explanation of technical parts. Finally the code of the macro is attached in the Appendix. We hope our paper will be easily adopted by interested users to serve their needs. SETTING OF THE MACRO In a particular clinical later phase study, our statistical programming team need to create large amount of SDTM and ADaM datasets and then independently validated them. It is very desirable to have a central file to document the details of dataset creation and validation status. As shown in below Table 1. Table 1: Study level SDTM/ADaM data validation report The dataset name, primary programmer and validator columns come from the study assignment Excel sheet. At the study planning stage, we will make up with a tracking excel file for datasets in which all datasets need to be created are listed and corresponding primary and validation programmers are assigned (like Table 2 below). The last column information in Table 1 is based on validation comparison and data clean status. In our studies, we receive monthly data transfer along with data clean status provided by data management. The clean status file is a patient level data, basically says which patient s data is cleaned (identified by USUBJID). At the beginning of the study, usually there are only a few patients. As study progresses to the end like final database lock milestone, all patients in the study should be cleaned. 1

The column Validation Output is the dataset name of the comparison output data. In the validation dataset program, we ask validator to output the difference dataset with name convention Vxx_DIFF where xx is the name of the main dataset. For example, in ADAE dataset validation program, the output difference dataset will be named as VADAE_DIFF. We ask individual programmer to include code like below: title2 "COMPARISON: ADAE between primary and validation"; proc compare data=add.adae compare=qcd.v_adae listall out=qcd.vadae_diff outnoequal outbase outcomp outdif criterion=0.0000001; id usubjid AEREFID aedecod aseq astdt asttm; Table 2: Assignments on SDTM/ADaM dataset creation and validation KEY STEPS IN THE MACRO The macro %validation_data (See APPENDIX for the complete code) has only one input parameter which is primary dataset Library. In order to create summary report as in Table 1 for both SDTM and ADaM datasets, the macro will be called two times, one is for SDTM dataset library, and the second one is the ADaM dataset library. If just calls the macro once (say for SDTM library), then the summary report will only consists of SDTM dataset validation information. Below are the steps the macro will perform: Step 1: check the input library datasets, create macro variable to loop through each dataset; Step 2: get each dataset file attributes. The key SAS function will be FOPTNAME (see the link in the reference for the detailed information). %sysfunc(foptnsme(fileid)) returns the number of attributes of a given file, %sysfunc(foptnsme(fileid, i) ) returns the ith position attribute name, %sysfunc(finfo(fileid, attribute name)) returns the attribute name value (See Table 3 for complete list of attribute names and their values). In the macro output Table 1, Creation date and Validation date columns are both from the Last Modified value in Table 3 of primary and validation datasets, respectively. Table 3: File attributes from FOPTNAME Step 3: Subset dataset with only clean patients, the last column in the macro output Table 1 is the comparison results of subset datasets. Based on the study needs, this step can be skipped if clean patient list is not applicable. 2

Step 4: Analyze the comparison results (saved in vxx_diff dataset) between primary and validation datasets. The macro analyzes the difference dataset (eg. vadae_diff), the contents of the last two columns in the macro output Table 1 are from the analysis results. We list the possible values of the last second column in the macro output Table 1 as below. The same applies to the last column with an addition restriction to clean patients. Validation Findings Conditions Matched If update difference dataset has observation 0. Difference Records: xx% xx is the number of records of difference dataset divide 3. Validation Date is Before Primary If the validation date/time is prior to the primary dataset. Validation Output Data Not Exist If validation dataset doesn t exist CONCLUSION The SAS macro %validation_data is widely used in many clinical studies at work. After every data refresh and dry run, the macro output gives the programming team a nice summary of validation status, inform individual programmer to look into their data for potential issues. The macro output records the details of study dataset validation and serves as a good study document for important study milestones, like final database lock. We highly recommend peers to develop similar macro to manage dataset validations for their clinical studies. REFERENCES Chen Yang (2010), Let your title macro report study progress, Proceedings of the PharmaSUG 2010, http://www.pharmasug.org/cd/papers/ad/ad07.pdf SAS FOPTNAME Function, http://support.sas.com/documentation/cdl/en/lrdict/64316/html/default/viewer.htm#a000209587.htm ACKNOWLEDGMENTS SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Zemin Zeng, Sanofi, Bridgewater, NJ Email: zengzemin@gmail.com / zemin.zeng@sanofi.com APPENDIX: THE CODE OF THE MACRO /********************************************************************* * Program Name : validation_data.sas * Program Purpose : To create dataset validation report * Compound/Study/Analysis : SARxxxxx/xxxxx/CSR * Program Author : Zemin Zeng * Program creation date : 2017-01-15 * Input data files : Library name for primary datasets * Output data files : Validation_data_report_SDD/ADD * System : SAS version 9.4 - WISE environment * Macro call Sample : %validation_data(dtpart=sdd) ********************************************************************** /* ========================== Beginning of Code =================== */ 3

%macro validation_data(dtpart=sdd); * Step 1: check the input library datasets, create macro variable to loop through each dataset; proc sql; create table toc as select libname, memname, memtype from dictionary.members where libname="&dtpart" and memtype='data'; quit; *** get number of datasets and save it in macro variable &nod; data _null_; set toc end=eof; if eof then call symput('nod', compress(_n_)); * Step 2: get each dataset file attributes; %macro att(outdt=, pre=p); DATA &outdt (keep=&pre.dt &pre.own &pre.tm nkey); fid = FOPEN("_file"); numopts = FOPTNUM(fid); DO j = 1 TO numopts; optname = FOPTNAME(fid,i); optval = FINFO(fid, optname); if j = 1 then txt=optval; &pre.dt=upcase(scan(reverse(scan(reverse(txt), 1, '/')), 1, '.')); if j = 2 then &pre.own=optval; if j = 5 then &pre.tm=optval; END; nkey=&i; RUN; %mend; /*To get the subset of cleaned patients*/ proc sort data=sdd.cleanpid out=cleanpid ; by usubjid; where upcase(clean)='yes'; *** Loop through each dataset in TOC; %do i=1 %to &nod; data _null_; set toc; if _n_=%eval(&i); call symput('dtset', lowcase(trim(left(memname)))); /*if corresponding validation dataset not exist*/ %IF not %sysfunc(exist(qcd.v&dtset._diff)) %then %do; %put _ERR: Validation of &dtpart..&dtset output diff dataset does not exist!; filename _file "%sysfunc(pathname(&dtpart))/&dtset..sas7bdat"; %att(outdt=prim, pre=p); data _rpt; set prim(keep=p: nkey); length vdt vown vtm $200; vdt='na'; vown='na'; vtm='na'; vobsr=.; vobs1=.; vobsc=.; %if &dtpart=sdd %then %do; sdtm=0; %else %do; sdtm=1; %else %do; %put v&dtset._diff.sas7bdat exist; * Step 3: Subset dataset with only clean patients; %let dsid = %sysfunc(open(qcd.v&dtset._diff)); %if (&dsid) %then %do; %if %sysfunc(varnum(&dsid,usubjid)) %then %do; proc sort data=qcd.v&dtset._diff out=v&dtset._diff; by usubjid; data cln_&dtset._diff; merge cleanpid(keep=usubjid in=a) v&dtset._diff(in=b); by usubjid; if a and b; %else %do ; data cln_&dtset._diff; set qcd.v&dtset._diff; 4

%let rc = %sysfunc(close(&dsid)); /*to subset to cleanpid--primary*/ %let dsid = %sysfunc(open(&dtpart..&dtset)); %if (&dsid) %then %do; %if %sysfunc(varnum(&dsid,usubjid)) %then %do; data pc&dtset; merge cleanpid(keep=usubjid in=a) &dtpart..&dtset(in=b); by usubjid; if a and b; %else %do ; data pc&dtset; set &dtpart..&dtset; %let rc = %sysfunc(close(&dsid)); %LET DSID = %SYSFUNC(OPEN(qcd.v&dtset._diff)); %LET NUMOBS =%SYSFUNC(ATTRN(&DSID,NLOBS)); %LET RC = %SYSFUNC(CLOSE(&DSID)); %LET DSID = %SYSFUNC(OPEN(&dtpart..&dtset)); %LET PNUMOBS =%SYSFUNC(ATTRN(&DSID,NLOBS)); %LET RC = %SYSFUNC(CLOSE(&DSID)); %LET DSID = %SYSFUNC(OPEN(work.cln_&dtset._diff)); %LET CNUMOBS =%SYSFUNC(ATTRN(&DSID,NLOBS)); %LET RC = %SYSFUNC(CLOSE(&DSID)); %LET DSID = %SYSFUNC(OPEN(work.pc&dtset)); %LET CPNUMOBS =%SYSFUNC(ATTRN(&DSID,NLOBS)); %LET RC = %SYSFUNC(CLOSE(&DSID)); %IF &NUMOBS ne 0 %THEN %DO; %put The number of Observations= &NUMOBS; %END; filename _file "%sysfunc(pathname(&dtpart))/&dtset..sas7bdat"; %att(outdt=prim, pre=p); filename _file "%sysfunc(pathname(qcd))/v&dtset._diff.sas7bdat"; %att(outdt=val, pre=v); data _rpt; merge prim(keep=p: nkey) val(keep=v: nkey); by nkey; vobs1=&numobs; vobsr=%sysevalf(100*(&numobs/&pnumobs)/3); %if &CPNUMOBS ne 0 %then %do; vobsc=%sysevalf(100*(&cnumobs/&cpnumobs)/3); %if &CPNUMOBS = 0 %then %do; vobsc=9999; %if &dtpart=sdd %then %do; sdtm=0; %else %do; sdtm=1; /*end with validation dataset not zero obs*/ proc append base=_result data=_rpt force; /*loop through dataset end*/ /**************************************************** Reporting handling *****************************************************/ *Step 4: Analyze the comparison results; data _result; set _result; length des desc $50; if vobsr=. then des='validation Output Data Not Exist'; else if vobsr=0 then des='matched!'; else des='difference Records: ' strip(put(vobsr, 5.2)) '%'; if vobsc=. then desc='validation Output Data Not Exist'; else if vobsc=0 then desc='matched!'; else desc='difference Records: ' strip(put(vobsc, 5.2)) '%'; if vobsc=9999 then desc='no Clean Data'; Sig=''; if length(ptm)=18 then do; 5

_pdt1=substr(ptm, 1, 9); _pdt2=substr(ptm, 10); dt=input(strip(_pdt1), date9.); _pdtc=strip(put(dt, yymmdd10.)) "T" _pdt2; end; if length(vtm)=18 then do; _vdt1=substr(vtm, 1, 9); _vdt2=substr(vtm, 10); _vdt=input(strip(_vdt1), date9.); _vdtc=strip(put(_vdt, yymmdd10.)) "T" _vdt2; if _vdtc<_pdtc then do; des='validation Date is Before Primary'; desc='validation Date is Before Primary'; end; end; format _vdt yymmdd10.; proc print data=_result; var sdtm pdt pown ptm vdt vown vtm des: sig; *to output listing use gstars macro; %if &dtpart=add %then %do; /*To get the primary and validator's names from the tracking sheet */ data _track(keep=sdtm pdt pown vown); infile "%sysfunc(pathname(doco))/valid_data.csv" truncover firstobs=2; input line $200.; sdtm=input(scan(line, 1, ','), best.); pdt=scan(line, 2, ','); pown=scan(line, 3, ','); vown=scan(line, 4, ','); proc sort; by sdtm pdt; data _result; merge _result(in=a drop=pown vown) _track; by sdtm pdt; if a; /*Call Sanofi macro to create rft output*/ %gslist(filename = Validation_data_report,pgmname = validation,location=qc,dataset = _result,columns = pdt pown ptm vdt vtm vown des desc,vardef = pdt [wrap=y label="primary \par Data"] pown [wrap=y label="owner"] ptm [wrap=y label="creation \par Date"] vdt [wrap=y label="validation \par Output"] vtm [wrap=y label="validation \par Date"] des [wrap=y label="validation \par Findings"] vown [wrap=y label="validator"] desc [wrap=y label="validation \par Findings \par Clean Data"], varby =, varbylab =,orderby =,firstonly =,breakafter =,title1,options options mprint mlogic missing=''; *Stop the macro; %exit: %mend; =%str(summary of validation Dataset Report) = dest=app fontsize=8 stretch=y orient=l Freflist=N &GlbOpt4GS colsize=0.6 0.6 1 0.8 1 0.6 1.6 1.6 ); /*Macro call to output validation report*/ %validation_data(dtpart=sdd); %validation_data(dtpart=add); 6