Tracking Dataset Dependencies in Clinical Trials Reporting


Binoy Varghese, Cybrid Inc., Wormleysburg, PA
Satyanarayana Mogallapu, IT America Inc., Edison, NJ

ABSTRACT

Most clinical trial reporting involves the creation of analysis datasets from raw data. Analysis datasets function as an intermediate point where complex computations are performed on raw data and stored for later use in tables, listings and graphs. Depending on the complexity of the study and the endpoints being analyzed, these datasets may have to be created using not only raw data but also other analysis datasets. If a study has a large number of analysis datasets, manually documenting dependencies is not only time consuming but also an area where human error may lead to severe consequences. The purpose of this paper is to present a technique that automatically stores meta information about datasets, including their dependencies, which can then be used for quality control checks. Examples of such checks are: the output dataset has a time stamp later than the input datasets used to create it, datasets do not have circular dependencies, etc.

INTRODUCTION

Most organizations use a hierarchical folder structure to organize datasets, tables, listings, graphs, programs and associated validation work. Some systems have a built-in framework that automatically tracks the dependency of analysis data on raw and other analysis data. If such a framework is not available, programmers can keep track of dependencies by manually documenting them. The dependency list then determines the order in which analysis dataset programs are run. Maintaining a dependency list is not only a laborious process that involves constant coordination between team members but also poses the risk of inadvertent human errors. If a large number of analysis datasets are being created (e.g., Phase II/Phase III trials), the risk and labor are compounded.
Automatic tracking of dataset dependencies can be enabled by making modifications to the programming infrastructure and introducing %read and %write macro calls in the analysis dataset programs. The infrastructure changes and macros are discussed in detail in the subsequent sections. The complete SAS code for the macros (%read and %write) is provided in the paper and can be used as is in most cases.

FOLDER STRUCTURE AND METADATA INFORMATION

Fig. 1 Typical folder structure

Fig. 2 Modified folder structure

Fig. 1 is a snapshot of a typical folder structure. Fig. 2 shows the relative location where the metadata folder is created. This folder will contain the metadata information generated by the analysis dataset programs using the %read and %write macros. Fig. 3 is a snapshot of the contents of the metadata folder. The dataset meta_a_dem.sas7bdat is automatically created by the analysis dataset program a_dem.sas. The naming convention for a metadata dataset is to prefix the program name with meta_.

Fig. 3 Sample contents of metadata folder

Fig. 4 Data structure and sample information contained in meta_a_dem.sas7bdat

Fig. 4 shows the data structure and typical contents of the meta dataset. The attrib variable identifies the input datasets, the output datasets and the program that created this information. The datetime variable holds the last modified datetime of the input and output datasets and the time at which the program was submitted to the SAS engine for execution. AN and EXTRACT are libnames defined in the autoexec.sas file used by analysis dataset programs.

AUTOEXEC FILE

The autoexec.sas file used by analysis dataset programs has to be modified to include a library reference meta to the metadata folder. The libname meta is used by the %read and %write macros.

    libname meta "<path to metadata folder>";

The sasautos option has to be revised to point to the macro folder, if it is not already pointing to this location. The macro folder will contain read.sas and write.sas.

    options sasautos=("<path to macro folder>") mautosource;

%READ MACRO

The %read macro is used to read datasets from the analysis and extract data libraries into the work data library. The %read macro has 3 parameters:

LIB  source library name (required parameter)
DSN  input dataset name (required parameter)
OUT  output dataset name (optional parameter). If this parameter is not specified in the macro call, the dataset copied to the work library will have the same name as the input dataset.

The algorithm used in the %read macro is divided into 2 parts.

INITIALIZATION

This part of the macro is executed only once during a SAS session. The tasks performed are:
1. Obtain <program name>.
2. Check if the metadata dataset exists. If so, delete the existing dataset.
3. Create the dataset structure and output <program name>.
4. Save the meta_<program name> dataset in the metadata folder.

REPETITION

This part of the macro is executed at each macro invocation during a SAS session. The tasks performed are:
5. Read the dataset from the source data library and store it in the work data library.
6. Obtain the last modified datetime information and append it to the meta_<program name> dataset in the metadata folder.

The complete SAS code for the %read macro is listed below. Copy the code as is and save it as read.sas in the macro folder.
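    %macro read(lib=,dsn=,out=);
    /* Initialization: runs only on the first %read call of the session */
    %if %symexist(firstcall) eq 0 %then %do;
      %global progname;
      /* Derive the program name from the .sas file known to the session */
      proc sql;
        select distinct scan(scan(trim(left(xpath)), -1, "\"),1,'.')
          into :progname
          from sashelp.vextfl
          where index(upcase(xpath),'.SAS');
      quit;
      /* Delete the metadata dataset left by an earlier run, if any */
      proc datasets library=meta nolist;
        delete meta_&progname;
      quit;
      /* Create the metadata dataset; its first record is the program name */
      data meta.meta_&progname;
        length metadata attrib $100;
        format datetime datetime.;
        datetime=input("&sysdate:&systime",datetime.);
        metadata=upcase("&progname");
        attrib='program NAME';
        output;
        label metadata='meta data'
              attrib='attribute'
              datetime='date & Time';
      run;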

    %end;
    /* Flag that initialization is done so later calls skip it */
    %if %symexist(firstcall) eq 0 %then %do;
      %global firstcall;
      %let firstcall=1;
    %end;
    /* Default the output name to the input name */
    %if &out= %then %let out=&dsn;
    /* Repetition: copy the source dataset to WORK */
    data work.&out;
      set &lib..&dsn;
    run;
    /* Capture the PROC CONTENTS attributes of the source dataset.      */
    /* RUN makes the procedure execute before ODS OUTPUT is closed.     */
    ods listing close;
    ods output Attributes=meta._temp_attrib_&lib._&dsn;
    proc contents data=&lib..&dsn;
    run;
    ods output close;
    ods listing;
    /* Keep only the last modified datetime */
    data meta._temp_attrib_&lib._&dsn;
      length metadata attrib $100;
      format datetime datetime.;
      set meta._temp_attrib_&lib._&dsn;
      where compress(upcase(label1))='LASTMODIFIED';
      metadata=upcase("&lib..&dsn");
      attrib="input DATA";
      datetime=nvalue1;
      keep metadata attrib datetime;
    run;
    /* Append the record to the program's metadata dataset */
    proc datasets library=meta nolist;
      append base=meta.meta_&progname data=meta._temp_attrib_&lib._&dsn;
      delete _temp_attrib_&lib._&dsn;
    quit;
    /* Remove duplicate records (same dataset read more than once) */
    proc sql undo_policy=none;
      create table meta.meta_&progname as
        select distinct * from meta.meta_&progname;
    quit;
    %mend read;

%WRITE MACRO

The %write macro call is made after the analysis dataset is created in the work data library. The %write macro stores the analysis dataset in the analysis data library. With regard to Fig. 1, this location is mystudy\data\analysis. The library reference an pointing to this location must be defined in the autoexec.sas file. The %write macro has only 1 required parameter, which is the name of the analysis dataset to be stored. The %write macro does not have an initialization stage, although it uses the macro variable progname that is created by the %read macro. This is based on the assumption that the %write macro call will be made only after the %read macro has been invoked at least once. The tasks performed by the %write macro are:
1. Read the dataset from the work data library and store it in the analysis data library.
2. Obtain the last modified datetime information and append it to the meta_<program name> dataset in the metadata folder.

The complete SAS code for the %write macro is listed below. Copy the code as is and save it as write.sas in the macro folder.

    %macro write(dsn=);
    /* Copy the analysis dataset from WORK to the analysis library */
    proc datasets nolist;

      copy in=work out=an;
      select &dsn;
    quit;
    /* Capture the PROC CONTENTS attributes of the stored dataset */
    ods listing close;
    ods output Attributes=meta._temp_attrib_an_&dsn;
    proc contents data=an.&dsn;
    run;
    ods output close;
    ods listing;
    /* Keep only the last modified datetime */
    data meta._temp_attrib_an_&dsn;
      length metadata attrib $100;
      format datetime datetime.;
      set meta._temp_attrib_an_&dsn;
      where compress(upcase(label1))='LASTMODIFIED';
      metadata=upcase("an.&dsn");
      attrib="output DATA";
      datetime=nvalue1;
      keep metadata attrib datetime;
    run;
    /* Append the record to the program's metadata dataset */
    proc datasets library=meta nolist;
      append base=meta.meta_&progname data=meta._temp_attrib_an_&dsn;
      delete _temp_attrib_an_&dsn;
    quit;
    /* Remove duplicate records */
    proc sql undo_policy=none;
      create table meta.meta_&progname as
        select distinct * from meta.meta_&progname;
    quit;
    %mend write;

SAMPLE ANALYSIS DATASET PROGRAM

Fig. 5 shows %read and %write macro calls in a sample analysis dataset program.

Fig. 5 Sample analysis dataset program

    --SAS PROCESSING
    %read(lib=an,dsn=dem);
    --SAS PROCESSING
    %read(lib=extract,dsn=ae);
    --SAS PROCESSING
    %write(dsn=ae);
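Once programs such as the one in Fig. 5 have run, the time stamp check mentioned in the abstract can be sketched directly against the metadata datasets. The following is a minimal illustration, not part of the authors' macros. It assumes the librefs META, AN and EXTRACT from autoexec.sas are assigned and that the metadata datasets have the variables metadata, attrib and datetime described for Fig. 4. The names ALL_META, SRC, MEMBER and STALE_OUTPUTS are hypothetical.

```sas
/* Sketch only. Assumes librefs META, AN and EXTRACT are assigned.      */
/* ALL_META, SRC, MEMBER and STALE_OUTPUTS are hypothetical names.      */

/* Stack every meta_<program> dataset and remember where each row came from */
data all_meta;
    length member $41;
    set meta.meta_: indsname=src;
    member = src;                      /* for example META.META_A_DEM */
run;

/* Flag outputs whose recorded time stamp is older than the current    */
/* last modified time of one of their inputs                           */
proc sql;
    create table stale_outputs as
    select o.metadata as outputdsn,
           o.datetime as output_dt,
           i.metadata as inputdsn,
           t.modate   as input_now format=datetime20.
    from all_meta as i,
         all_meta as o,
         dictionary.tables as t
    where i.member = o.member
      and i.attrib = 'input DATA'
      and o.attrib = 'output DATA'
      and t.libname in ('AN','EXTRACT')
      and t.memtype = 'DATA'
      and catx('.', t.libname, t.memname) = i.metadata
      and t.modate > o.datetime;
quit;

proc print data=stale_outputs;
run;
```

An empty STALE_OUTPUTS table would suggest that every analysis dataset was written after the current version of each of its inputs. Any row listed points to a program that may need to be rerun.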

USING THE METADATA INFORMATION

SAMPLE APPLICATIONS

#1 GENERATING DATA DEPENDENCY LISTING

Macro %gen_dep_list uses the metadata information to create a dependency list of all analysis datasets. Fig. 6 shows the proc print output generated by the macro.

Fig. 6 Proc Print output generated by %gen_dep_list

The complete SAS code for the %gen_dep_list is listed below.

    libname meta "<path of metadata folder>";
    %macro gen_dep_list;
    /* List the metadata datasets in the META library */
    ods listing close;
    ods output members=memlist(keep=name);
    proc datasets library=meta memtype=data;
    quit;
    ods output close;
    ods listing;
    /* Store each dataset name in &m1..&mN and the count in &n.        */
    /* RUN is required so that &n exists before the %DO loop starts.   */
    data _null_;
      set memlist end=last;
      call symput('m'||compress(put(_n_,best.)),compress(name));
      if last then call symput('n',compress(put(_n_,best.)));
    run;
    %do dcnt=1 %to &n;
      /* Order records: program name, then inputs, then outputs */
      data &&m&dcnt;
        set meta.&&m&dcnt;
        if attrib=:'input' then ordr=1;
        if attrib=:'output' then ordr=2;
        if attrib=:'program' then ordr=0;
      run;
      proc sort data=&&m&dcnt;
        by ordr metadata;
      run;
      /* Build one record per output with a comma-separated input list */
      data &&m&dcnt(keep=pgmname metadata inputdsn rename=(metadata=outputdsn));
        retain inputdsn pgmname;
        length inputdsn $200 pgmname $20;
        set &&m&dcnt;
        by ordr metadata;
        if ordr=0 then pgmname=compress(metadata);
        if ordr=1 and first.ordr then inputdsn=trim(inputdsn)||compress(metadata);
        else if ordr=1 and not first.ordr then inputdsn=trim(inputdsn)||', '||compress(metadata);
        if ordr=2 then output;
      run;
    %end;

    /* Combine the per-program datasets and print the dependency list */
    data metadata;
      set
      %do dcnt=1 %to &n;
        &&m&dcnt
      %end;
      ;
    run;
    proc sort data=metadata;
      by pgmname;
    run;
    proc print data=metadata;
      var pgmname outputdsn inputdsn;
    run;
    %mend gen_dep_list;
    %gen_dep_list;

#2 EXECUTION ORDER FOR ANALYSIS DATASET PROGRAMS

Macro %exe_order uses the metadata information to obtain the execution order for analysis dataset programs. Fig. 7 shows the proc print output generated by the macro.

Fig. 7 Proc Print output generated by %exe_order

PROGRAM          ANALYSIS DATASET     PRIORITY
a_dem.sas        dem.sas7bdat         1
a_ae.sas         ae.sas7bdat          2
a_subjchar.sas   subjchar.sas7bdat    3

Fig. 8 Relationship between programs, analysis datasets and priority values

The algorithm used in this macro is briefly described below:
1. Create a dataset which contains metadata information for all analysis dataset programs.
2. If the analysis dataset program does not use any other analysis dataset, keep only one observation with inputdsn missing; otherwise keep all observations where inputdsn begins with AN.
3. For analysis datasets that have inputdsn missing, assign priority = 1; otherwise assign priority = 0.
4. For any analysis dataset with priority = 0, iterate through the dataset to check whether all input analysis datasets have been assigned a priority value greater than 0. If all input analysis datasets have a non-zero priority value, assign priority = maximum of (priority values of input datasets) + 1; otherwise reset the priority value to 0.
5. Once all analysis datasets have a non-zero priority, the execution order of each program is determined as the maximum priority value assigned to that program.
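Steps 3 and 4 above can be traced on a toy dependency table. The following is a minimal sketch under stated assumptions, not the authors' %exe_order macro. The table DEP is typed in by hand to mirror Fig. 8, and the names DEP, PRI, NEWPRI and %assign_priority are hypothetical. The sketch also stops when a pass assigns no new priority, which is one way to surface the circular dependency check mentioned in the abstract.

```sas
/* Sketch only: DEP, PRI, NEWPRI and %assign_priority are hypothetical. */
/* DEP has one row per (output, analysis input) pair. A blank INPUTDSN  */
/* means the program reads no other analysis dataset.                   */
data dep;
    length outputdsn inputdsn $41;
    infile datalines dsd truncover;
    input outputdsn inputdsn;
    datalines;
AN.DEM,
AN.AE,AN.DEM
AN.SUBJCHAR,AN.DEM
AN.SUBJCHAR,AN.AE
;
run;

%macro assign_priority;
    %local nzero prev;

    /* Step 3: priority 1 when there are no analysis inputs, otherwise 0 */
    proc sql noprint;
        create table pri as
        select outputdsn,
               case when sum(inputdsn ne ' ') = 0 then 1 else 0 end as priority
        from dep
        group by outputdsn;
        select count(*) into :nzero from pri where priority = 0;
    quit;
    %let nzero = &nzero;   /* strip leading blanks */

    /* Step 4: repeat until nothing is left at 0 or a pass makes no progress */
    %do %until (&nzero = 0 or &nzero = &prev);
        %let prev = &nzero;
        proc sql noprint;
            /* Outputs whose inputs all have a non-zero priority */
            create table newpri as
            select d.outputdsn, max(p.priority) + 1 as priority
            from dep as d, pri as p
            where d.inputdsn = p.outputdsn
            group by d.outputdsn
            having min(p.priority) > 0;

            update pri as t
               set priority = (select n.priority from newpri as n
                               where n.outputdsn = t.outputdsn)
             where t.priority = 0
               and t.outputdsn in (select outputdsn from newpri);

            select count(*) into :nzero from pri where priority = 0;
        quit;
        %let nzero = &nzero;
    %end;

    %if &nzero > 0 %then
        %put WARNING: &nzero dataset(s) never received a priority. Check for a circular dependency.;
%mend assign_priority;

%assign_priority
proc print data=pri;
run;
```

For the toy rows above, the intent is that AN.DEM, AN.AE and AN.SUBJCHAR end up with priorities 1, 2 and 3, in line with Fig. 8. If two datasets listed each other as inputs, neither would ever leave priority 0, and the loop would end with the warning instead of running indefinitely.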

The complete SAS code for the %exe_order is listed below.

    %macro exe_order;
    /* List the metadata datasets in the META library */
    ods listing close;
    ods output members=memlist(keep=name);
    proc datasets library=meta memtype=data;
    quit;
    ods output close;
    ods listing;
    /* Store each dataset name in &m1..&mN and the count in &n.        */
    /* RUN is required so that &n exists before the %DO loop starts.   */
    data _null_;
      set memlist end=last;
      call symput('m'||compress(put(_n_,best.)),compress(name));
      if last then call symput('n',compress(put(_n_,best.)));
    run;
    /* Steps 1-2: per program, pair the output with its analysis (AN) inputs */
    %do dcnt=1 %to &n;
      proc sql undo_policy=none;
        create table &&m&dcnt as
          select distinct progname, outputdsn,
                 case when cnt=0 then '' else inputdsn end as inputdsn
          from (select *, sum(index(inputdsn,'AN.')) as cnt
                from (select a.metadata as inputdsn,
                             b.metadata as outputdsn,
                             c.metadata as progname
                      from (select * from meta.&&m&dcnt where attrib='input DATA') a,
                           (select * from meta.&&m&dcnt where attrib='output DATA') b,
                           (select * from meta.&&m&dcnt where attrib='program NAME') c))
          where cnt=0 or index(inputdsn,'AN.') > 0;
      quit;
    %end;
    /* Step 3: initial priorities */
    data metadata;
      set
      %do dcnt=1 %to &n;
        &&m&dcnt
      %end;
      ;
      if inputdsn='' then priority=1;
      else priority=0;
    run;
    /* Step 4: iterate until every dataset has a non-zero priority */
    %let misspr=1;
    %do %while (&misspr > 0);
      proc sql undo_policy=none;
        create table int1 as
          select * from metadata
          where inputdsn in (select outputdsn from metadata where priority ne 0)
          order by progname,outputdsn,inputdsn;
        create table int2 as
          select distinct a.progname, a.inputdsn, a.outputdsn,
                 sum(b.priority,1) as priority
          from int1 a, metadata b

          where a.inputdsn=b.outputdsn
          order by progname,outputdsn,inputdsn;
      quit;
      proc sort data=metadata;
        by progname outputdsn inputdsn;
      run;
      /* Apply the newly computed priorities */
      data metadata(keep=progname outputdsn inputdsn priority);
        merge metadata(in=a) int2(in=b rename=(priority=newpr));
        by progname outputdsn inputdsn;
        if a;
        if b then priority=newpr;
      run;
      /* An output stays at 0 until all of its inputs have a priority */
      proc sql undo_policy=none;
        create table metadata as
          select inputdsn, outputdsn, progname,
                 case when min(priority)=0 then 0 else max(priority) end as priority
          from metadata
          group by progname, outputdsn;
      quit;
      /* Count the outputs still waiting for a priority */
      proc sql;
        select count(*) into :misspr from metadata where priority = 0;
      quit;
    %end;
    /* Step 5: execution order per program */
    proc sql;
      create table pgmordr as
        select distinct progname length=20 format=$20., max(priority) as priority
        from metadata
        group by progname
        order by priority, progname;
    quit;
    proc print data=pgmordr;
      var progname priority;
    run;
    %mend exe_order;
    %exe_order;

CONCLUSIONS

The technique discussed in this paper makes 3 assumptions:
1. Each analysis dataset program within a study will read and write datasets only using the %read and %write macro calls.
2. The metadata folder will contain only datasets created by %read and %write macro calls.
3. The datasets contained within the metadata folder will not be renamed, modified or deleted.

If these assumptions are not met, the metadata information will be incomplete and unusable.

The %read and %write macro calls create one metadata dataset for each program rather than a single dataset for the study that encompasses information pertaining to all analysis datasets. This is because SAS/SHARE is required for concurrent update of a single SAS dataset, whereas this technique avoids the need for concurrent updates.

Although this technique will accommodate programs that generate multiple analysis datasets, it is advisable to use one program to generate only one analysis dataset. This technique has been successfully tested in batch mode using SAS version 9.2 on the Windows operating system.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Binoy Varghese
Cybrid Inc., Wormleysburg, PA
mailme@binoyvarghese.com
www.clinicalsasprogramming.com

Satyanarayana Mogallapu
IT America Inc., Edison, NJ
mogallapuvs@yahoo.com