PharmaSUG 2017 - Paper AD05

An Efficient Solution to Efficacy ADaM Design and Implementation

Chengxin Li, Pfizer Consumer Healthcare, Madison, NJ, USA
Zhongwei Zhou, Pfizer Consumer Healthcare, Madison, NJ, USA

ABSTRACT

In clinical trial data processing, the design and implementation of efficacy datasets are often challenging. The efficacy datasets here refer to the subject-level analysis dataset (ADSL) and the study-level efficacy endpoints datasets (e.g., ADEFF). Both types of datasets are also recommended for FDA submission. To achieve optimal programming with efficient and reusable code, this paper investigates standardization methods for the design and implementation of ADSL and ADEFF datasets.

For ADSL, based on the common SDTM domains and on the components and functions of ADSL, variables are designated into the categories global, project, and study (GPS). The global variables (approximately 80% of all ADSL variables) are specified, derived, and validated only once within a company; the project variables are managed within a therapeutic area or an indication; and the study variables are handled at the individual study level. A global macro implements the ADSL processing: it is called to derive the G-level and P-level variables, and the S-level variables are added by the study programming team. The programming team can therefore focus mainly on study-specific variable derivations.

For ADEFF, this paper introduces a two-layer ADaM design method for generating the efficacy endpoints dataset. The first layer is an interim dataset built by applying timing windows and imputation rules. The second layer, derived from the first, is an endpoints dataset holding binary or continuous endpoints in a vertical or horizontal structure. In each layer, the derivation flows in sequential steps, and the individual steps are implemented as macros wherever possible (e.g., the derivation of LOCF). With this approach, complicated concepts are divided into simpler, manageable steps and then assembled and further polished (i.e., metadata aligned with the specifications). The second-layer dataset supports all the efficacy endpoint analyses; for traceability, it is also recommended to submit the first-layer dataset.

INTRODUCTION

The Clinical Data Interchange Standards Consortium (CDISC, http://www.cdisc.org) has defined a series of data models. CDASH (Clinical Data Acquisition Standards Harmonization) is a data collection standard harmonized with SDTM (Study Data Tabulation Model). SDTM should fully reflect the collected data (e.g., mapping any collected data and deriving only a limited number of variables, with no imputation of missing data). ADaM (Analysis Data Model) datasets should be derived only from SDTM. The key endpoint analyses, inferential analyses, and complicated analyses should be designed into ADaM datasets; however, not every analysis needs a corresponding ADaM dataset, and some simple tables can be created directly from SDTM. The CDISC data processing models are illustrated in Figure 1.

Traceability and analysis-readiness are the two core features of the ADaM design process. ADaM datasets should fully support analyses and facilitate reviews. Several dataset structures are defined in the ADaM standards, such as the Subject-Level Analysis Dataset (ADSL), the Basic Data Structure (BDS), and the Occurrence Data Structure (OCCDS).
The efficacy dataset (ADEFF) is often designed and implemented using a BDS structure and can be challenging to program. Safety analysis datasets, using either a BDS or an OCCDS structure, tend to be more straightforward in terms of design and implementation. This paper summarizes practices for generating the efficacy datasets ADSL and ADEFF, introducing ADSL generation with a GPS-driven method and ADEFF generation with a two-layer ADaM design method, respectively. Here ADSL functions as a supporting dataset for ADEFF.

Figure 1. CDISC Data Processing Model

For illustration purposes, the trial example, where applied, is simplified to a randomized population and a parallel trial design. For missing values, the last observation carried forward (LOCF) approach is assumed. The SDTM QS and LB domains are assumed as the sources for deriving the ADaM efficacy endpoints. The design and implementation of integrated summary of efficacy (ISE) datasets are beyond the scope of this paper.

ADSL DESIGN AND IMPLEMENTATION

ADSL is a required submission dataset, structured as one record per subject and describing attributes of a subject that do not vary over visits during the course of a study. The ADaM Implementation Guide (IG) version 1.1 specifies standard variables for subject identifiers, demographics, population indicators, treatment, trial dates, and trial-level experience variables such as disposition and overall compliance. Additionally, the FDA Study Data Technical Conformance Guide (http://www.fda.gov/downloads/forindustry/datastandards/studydatastandards/ucm384744.pdf) further requires that important baseline subject characteristics and covariates presented in the study protocol also be included in ADSL and other ADaM datasets.

With the above components, ADSL can support the key subject-level evaluations and also provide source variables to other ADaM datasets. The key subject-level evaluations may include the demographic table, the baseline table, the disposition table, and optionally the overall exposure and overall treatment compliance tables. In ADaM occurrence-structured datasets (e.g., ADAE), the denominator used for percentage calculations is also taken directly from ADSL.

The multiplicity of information in ADSL requires multiple domains as its sources. Most variables, however, can be copied or derived directly from common SDTM domains, e.g., DM, DS, EC/EX, and VS. Other variables, such as study-specific baselines, strata, and covariates, require more flexible derivations; their source data may come from LB, QS, or other therapeutic-area SDTM domains. The ADSL variables based on the common domains are formalized as global variables across all studies and are therefore specified, derived, and validated only once but used in every study. Approximately 80% of ADSL variables can be defined as global. The remaining ADSL variables, such as indication- and study-specific baselines and covariates, can be designated at the therapeutic area, project, or study level. For instance, in Virology, the numeric Baseline HCV Viral Load Value (IU/mL) is a project variable, consistently derived from the non-missing LB.LBSTRESN on or before the treatment start date with LB.LBTESTCD='HCVVLD'. In contrast, a Statin Usage variable used in only one study, derived by scanning CM.CMDECOD, is specified only at the study level.

A global macro is developed to implement the processing of the ADSL global variables, and optionally another macro is developed for the project variable derivations. The study-level variables are added specifically by the study programming team. This method is named the GPS navigation method. Based on GPS navigation, the ADSL processing flow is illustrated in Figure 2:

1. Derive Global variables by a global macro call.
2. Derive Project variables by a project macro call.
3. Derive Study variables per the study specification.
4. Generate ADSL by aligning the metadata with the specification, e.g., for variable order.

Figure 2. ADSL Generation with the GPS Navigation Method

There are multiple ways to manage the ADSL GPS metadata for production, such as an MDR (metadata repository) or an Excel sheet in DEFINE.xml style with G, P, and S designated in one additional column. The horizontal structure of ADSL is what makes the GPS navigation method feasible and operational; it may not apply to vertical ADaM structures such as OCCDS or BDS. The method introduced here is still semi-automated in the sense that the S variables are handled specifically by the study team. However, from a project management perspective, the solution is simple, efficient, and easy to implement in production.
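As a rough illustration of this flow, a top-level ADSL program might look like the sketch below; the macro names (%adsl_global, %adsl_project, %align_metadata) and the STATINFL flag are hypothetical placeholders rather than standard macros or variables.

  /* Minimal sketch of the GPS-driven ADSL build (names are illustrative).   */

  /* G layer: common variables from DM, DS, EC/EX, VS - specified, derived,  */
  /* and validated once at the company level                                 */
  %adsl_global(out=work.adsl_g);

  /* P layer: project/indication variables, e.g., baseline HCV viral load    */
  /* derived from LB                                                         */
  %adsl_project(in=work.adsl_g, out=work.adsl_gp);

  /* S layer: study-specific variables coded by the study team, e.g., a      */
  /* statin-usage flag scanned from CM.CMDECOD                               */
  proc sql;
    create table work.statin as
    select distinct usubjid, 'Y' as statinfl length=1
    from sdtm.cm
    where upcase(cmdecod) like '%STATIN%'
    order by usubjid;
  quit;

  data work.adsl_gps;                 /* assumes ADSL_GP is sorted by USUBJID */
    merge work.adsl_gp(in=inadsl) work.statin;
    by usubjid;
    if inadsl;
    if missing(statinfl) then statinfl = 'N';
  run;

  /* Final step: align attributes, labels, and variable order with the ADSL  */
  /* specification/define metadata                                           */
  %align_metadata(in=work.adsl_gps, spec=adsl_spec, out=adam.adsl);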

ADEFF DESIGN AND IMPLEMENTATION

In addition to ADSL, a limited number of SDTM domains are needed as input datasets to develop the efficacy ADaM dataset (ADEFF). The required SDTM domains are commonly LB, QS, or therapeutic-area (TA) specific domains (often called the SDTM efficacy domains). For efficacy analyses, timing windows and imputations are typically defined in the statistical analysis plan (SAP) along with the endpoint definitions. ADEFF should comply with the ADaM Implementation Guide and agency requirements (e.g., the FDA Study Data Technical Conformance Guide).

To achieve a better implementation, a structured design technique should be applied: divide a complex task or concept into several simple modules (procedures), then inter-relate those modules. With this approach, the programming code becomes easier to implement, understand, debug, and maintain. Readability with low complexity is a very important factor in programming (e.g., fewer macro layers, appropriate comments), and structured design facilitates readability, making the implementation easy to understand and review.

Incorporating the above efficacy ADaM design factors, a two-layer design method is developed. Figure 3 illustrates the architecture of the efficacy ADaM dataset design: LB and QS serve as the input SDTM domains and ADSL provides the core variables; ADINTRM is an interim dataset that loads the raw data and applies the time windows and imputation; ADEFF is a binary and/or continuous endpoints dataset, further derived from ADINTRM, that supports the efficacy endpoint analyses (with ADTTE derived from ADEFF where needed).

Figure 3. Architecture of Efficacy Datasets Design

ADSL functions as a driver dataset for the efficacy dataset, providing core variables for demographics and baseline characteristics, analysis population flags, planned treatment groups, covariates, and subgroups. Even though the interim ADaM dataset ADINTRM is not designed to support any analyses, but rather the derivation of the endpoints, ADINTRM should be submitted for traceability purposes. Additionally, ADINTRM may support listings.
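As a rough program-level sketch of this two-layer architecture (the macro names below are hypothetical placeholders, not standard macros):

  /* One possible top-level flow mirroring Figure 3                          */

  /* Layer 1: interim dataset - load raw efficacy records from the SDTM      */
  /* efficacy domains, apply analysis-visit windows and imputation, and      */
  /* attach the core ADSL variables                                          */
  %build_adintrm(qs=sdtm.qs, lb=sdtm.lb, adsl=adam.adsl, out=adam.adintrm);

  /* Layer 2: endpoint dataset - apply the endpoint definitions to           */
  /* AVAL/PARAMCD in ADINTRM to create the binary/continuous endpoints       */
  %build_adeff(in=adam.adintrm, out=adam.adeff);

  /* Optional: time-to-event endpoints further derived from ADEFF            */
  %build_adtte(in=adam.adeff, adsl=adam.adsl, out=adam.adtte);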

Depending on the analyses requested in the SAP, an efficacy ADaM time-to-event dataset (ADTTE) may be further derived from the ADEFF dataset.

Under the two-layer efficacy ADaM design architecture, Figure 4 depicts the sequential derivation flow of ADINTRM; with these steps, the interim ADINTRM can be developed easily. With the randomized population set taken as the base in the first step (step ❶), screen-failure subjects are excluded from the dataset at the beginning, and the supportive core variables are included only at the final step (step ❺); the intermediate datasets are therefore cleaner and easier to manipulate. If a follow-up study visit falls illogically within the on-treatment phase, apply the time windows separately for on or prior to treatment (TRTSDT as the reference) and after treatment (TRTEDT as the reference) in step ❷. LOCF (step ❸), one example of the common imputation methods, is a well-formalized process in programming and can be coded as a macro; when defining the visit shell, one needs to follow the planned visit schedule defined in the protocol. It is preferable to also carry forward the timing variables (e.g., VISIT, VISITNUM, VISITDY, ADT, ADY) from the record supplying the previous non-missing AVAL when those variables are missing. Technically, the derivation of the parameter-invariant variables (step ❹) should be placed at a later programming stage.

❶ Get the ID variables forming the base from ADSL where RANDFL='Y'; load the raw data from the SDTM domains; left join the two datasets.
❷ Derive AVISIT/AVISITN by applying time windows with formats, separately on the ADY of the planned study phases.
❸ Impute missing AVAL using LOCF; when carrying a subject's previous non-missing value, also carry the timing variables.
❹ Derive the parameter-invariant variables, e.g., ABLFL, BASE, AWTDIFF, ANL01FL, PARAMN, etc.
❺ Merge in all core variables from ADSL, such as key demographics, baselines, covariates, by-groups, treatment group, population flags, dates, etc.

Figure 4. Derivation Flow of ADINTRM
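To make step ❷ concrete, the analysis-visit windows can be held in SAS formats keyed on ADY. In the sketch below, the window boundaries, visit names, and dataset names are illustrative only; a second pair of formats referenced to TRTEDT would handle the follow-up phase as described above.

  /* Illustrative analysis-visit windows keyed on study day (ADY).           */
  proc format;
    value avisnw                     /* window -> AVISITN */
      low -<  2  = '0'               /* Baseline          */
        2 -  10  = '1'               /* Week 1            */
       11 -  21  = '2'               /* Week 2            */
       22 - high = '4'               /* Week 4            */
    ;
    value avisw                      /* window -> AVISIT  */
      low -<  2  = 'Baseline'
        2 -  10  = 'Week 1'
       11 -  21  = 'Week 2'
       22 - high = 'Week 4'
    ;
  run;

  data work.adintrm_win;
    set work.adintrm_base;           /* output of step 1 (hypothetical name) */
    if not missing(ady) then do;
      avisitn = input(put(ady, avisnw.), best.);
      avisit  = put(ady, avisw.);
    end;
  run;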

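Step ❸ can likewise be coded as a reusable macro. The simplified sketch below assumes a VISIT_SHELL dataset with one record per planned analysis visit (AVISITN, AVISIT) and at most one input record per subject, parameter, and window; it carries AVAL forward together with the timing variables and flags imputed records with DTYPE='LOCF'.

  /* Simplified LOCF sketch; DTYPE is assumed not to exist yet on the input. */
  %macro locf(in=, out=, shell=visit_shell);

    /* one record per subject x parameter x planned visit */
    proc sql;
      create table _frame as
      select k.usubjid, k.paramcd, s.avisitn, s.avisit
      from (select distinct usubjid, paramcd from &in) as k
           cross join &shell as s
      order by usubjid, paramcd, avisitn;
    quit;

    proc sort data=&in out=_obs;
      by usubjid paramcd avisitn;
    run;

    data &out;
      length dtype $8;
      merge _frame _obs;
      by usubjid paramcd avisitn;
      retain _aval _adt _ady _visit _visitnum _visitdy;
      if first.paramcd then
        call missing(_aval, _adt, _ady, _visit, _visitnum, _visitdy);
      if not missing(aval) then do;        /* observed: remember the record  */
        _aval = aval;   _adt = adt;           _ady = ady;
        _visit = visit; _visitnum = visitnum; _visitdy = visitdy;
      end;
      else if not missing(_aval) then do;  /* missing: carry forward         */
        aval = _aval;   adt = _adt;           ady = _ady;
        visit = _visit; visitnum = _visitnum; visitdy = _visitdy;
        dtype = 'LOCF';
      end;
      /* in production, other variables (e.g., PARAM) would be carried too   */
      drop _:;
    run;

  %mend locf;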
The endpoint is a function of selected items in ADINTRM. Based on ADINTRM, the generation of the endpoint dataset becomes straightforward: apply the endpoint definition to AVAL/PARAMCD and copy the other variables. That processing is therefore not repeated here; instead, a few notes are provided.

1. Avoid floating-point precision issues for binary endpoints, e.g., by using CHG=input((AVAL-BASE), best.) instead of CHG=AVAL-BASE, and similarly PCHG=input((AVAL-BASE)/BASE*100, best.) instead of PCHG=(AVAL-BASE)/BASE*100.

2. Add an IMPUTFL (Imputation Flag) variable. AVAL in an endpoint dataset is a function of selected items in ADINTRM, so the imputation method recorded in DTYPE is traceable only in ADINTRM. Very often, the descriptive analyses are required to be based on observed values, and the IMPUTFL variable is needed in the endpoint dataset to support those analyses. IMPUTFL can be derived as IMPUTFL='Y' when the count of records with non-missing ADINTRM.DTYPE is greater than zero (i.e., sum(ADINTRM.DTYPE ^= '') > 0) per subject, PARAMCD, and AVISIT, regardless of VISIT.

With the macrotized derivations, code reusability is maximized. In the first-layer dataset ADINTRM, the time windows deriving AVISIT and AVISITN can easily be handled with SAS PROC FORMAT. The imputation process is highly standardized as well and can be implemented with a macro covering LOCF, BLOCF (baseline observation carried forward), WOCF (worst observation carried forward), or other common methods. The derivation of the parameter-invariant variable ANL01FL, used to select the unique analysis record from multiple records within the same time window, is also straightforward with a macro parameterized by the selection method: closest to the targeted day (e.g., VISITDY), worst value, or some other predefined rule. In the second-layer dataset ADEFF, the endpoint derivations can be implemented with a macro parameterized by the input dataset name, the endpoint name (PARAM/PARAMCD), the data selection conditions, and the endpoint definition. The SAS macro depicted in Figure 5 demonstrates the implementation of binary response endpoints.

Figure 5. Macro for Binary Endpoint Derivation
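As a minimal sketch in the spirit of such a macro (the macro name, parameters, and the example endpoint below are illustrative, not the authors' production code):

  /* &whr selects the ADINTRM records to use; &respdef is the responder      */
  /* condition evaluated on each selected record.                            */
  %macro bin_endpoint(in=, out=, paramcd=, param=, whr=%str(1), respdef=);
    data &out;
      set &in;
      if (&whr);
      /* re-identify the records as the derived endpoint parameter */
      paramcd = "&paramcd";
      param   = "&param";
      /* AVAL: 1 = responder, 0 = non-responder; missing if inputs missing */
      if nmiss(aval, base) = 0 then aval = (&respdef);
      else aval = .;
    run;
  %mend bin_endpoint;

  /* Example call: responder = at least a 50% reduction from baseline in a   */
  /* hypothetical total-score parameter, using the ANL01FL analysis records  */
  %bin_endpoint(in=adam.adintrm, out=work.resp50,
                paramcd=RESP50,
                param=%str(At Least 50 Percent Reduction From Baseline),
                whr=%str(paramcd='XXTOT' and anl01fl='Y' and avisitn > 0),
                respdef=%str(aval <= 0.5 * base));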

Very often, the validation of an efficacy endpoint dataset requires independent double programming, and the common challenge is to reach an exact match within a competitive timeline. In our practice, the two-layer ADaM design method for developing the efficacy dataset improves not only implementation and review but validation as well: the two-layer design allows the validation of the second-layer dataset to proceed in parallel even when the first-layer dataset is not yet a full match. The architecture (Figure 3) can optionally be documented in the Data Guide to facilitate reviews.

DISCUSSION

Practically, to reduce complexity, the ADSL implementation can be further divided by trial design type, e.g., parallel vs. cross-over. One specification and program for parallel designs and another for cross-over designs keep the implementation and maintenance simple.

It would be possible to implement an efficacy endpoint dataset with an automated concept under this two-layer design architecture. However, the multiplicity of endpoints from study to study and TA to TA would make such an implementation too complex to manage. At this point, it is sufficient to keep the programming at the individual study level but with the same design architecture and coding structure across studies and projects, using this two-layer ADaM design method.

The methods discussed in this paper are focused on the study level and exclude the integrated summary of efficacy (ISE). Further investigation is needed to efficiently develop efficacy datasets for ISE analyses using a similar optimal programming philosophy.

ACKNOWLEDGEMENT

The authors would like to thank Anne LeMoigne for the paper review and invaluable input.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Chengxin Li
Enterprise: Pfizer Consumer Healthcare
Address: 1 Giralda Farms
City, State ZIP: Madison, NJ 07940
Work Phone: 793 4014046
E-mail: chengxin.li@pfizer.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.