ADaM and traceability: Chiesi experience
BIAS Seminar «Data handling and reporting in clinical trials with SAS»
Glauco Cappellini
22-Feb-2013

Agenda
- Chiesi Model for Biometrics
- Regulatory Background
- ADaM: Key Principles
- ADaM: Main Elements
- Analysis Data Subject-Level (ADSL)
- Basic Data Structure (BDS)
- Metadata
- Conclusions

Chiesi's view on biometrics activities
In recent years Chiesi has adopted a strategy of conducting clinical trials by outsourcing the majority of biometrics activities to external, specialized providers.
An internal biometrics group has been maintained, devoted to:
- Writing study protocols in the context of the development program;
- Supervising CROs' activities and reviewing documents;
- Providing input for regulatory submissions (i.e. supplementary analyses).

Chiesi's growth and implications for biometrics activities
[Map: Chiesi expansion in terms of sales, distinguishing local companies and local partners. Countries shown: USA, United Kingdom, Belgium, Germany, France, Netherlands, CEE Countries, Russia, Italy, Spain, China, Pakistan, Brazil, Morocco, Algeria, Tunisia, Egypt, Greece, Turkey.]
Chiesi's worldwide expansion has required the biometrics group to:
- Interact with regulatory agencies outside the EU (typically in the US);
- Handle increased trial complexity in terms of therapeutic areas, geographic location and providers involved.
This led us to adopt internal data standards to ease data management and statistical activities.

REGULATORY BACKGROUND

FDA Data Requirements for Electronic Submissions
In 2005 FDA issued a guidance on «Providing Regulatory Submissions in Electronic Format - Human Pharmaceutical Product Applications and Related Submissions Using the eCTD Specifications».
Data submitted to FDA (CRTs):
- Data Tabulations: observations in tabular format
- Data Listings: domain views by subject, by visit
- Define: metadata description document
- Patient Profiles: complete view of all subject data
- Analysis Files: custom datasets to support analyses

International Data Standards
To promote the development and adoption of international data standards, FDA has constantly interacted with the Clinical Data Interchange Standards Consortium (CDISC), a not-for-profit organization whose mission is "to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare".

CDISC Standard for Analysis Datasets: ADaM
Analysis Data Model (ADaM) is the CDISC data specification for analysis datasets.
- Analysis Data Model: Version 2.1, released Dec 2009
- ADaM Implementation Guide: Version 1.0, released Dec 2009

ADaM: a constant interaction between CDISC and FDA
"Since inception, the CDISC ADaM team has been encouraged and informed by FDA statistical and medical reviewers. [...] ADaM standard has been developed to meet the needs of the FDA and industry" (Analysis Data Model, v2.1, CDISC)
"Specifications for Analysis datasets for human drug product clinical and analytical studies are provided by the ADaM." (Study Data Specification, v2.0, FDA)

ADaM: KEY PRINCIPLES

ADaM Key Principles (I)
Analysis datasets should be analysis-ready:
- Allow replication of the analysis with little or no programming or complex data manipulation;
- The «one proc away» principle eliminates or greatly reduces the amount of programming required of the statistical reviewer.
Source: S.Wilson, 2012

ADaM Key Principles (II)
Analysis datasets should be usable by currently available tools:
- Be in the format of V5 transport files (.xpt);
- Be usable by SAS, JMP, S+, etc.
The macros needed for conversion are available free of charge at: http://www.sas.com/fda-esub
Source: S.Wilson, 2012

ADaM Key Principles (III)
Analysis datasets should facilitate clear and unambiguous communication and provide a level of traceability:
- Clear and unambiguous communication of the science and statistics of the trial is essential, and it relies heavily on the availability of machine-readable metadata;
- The processes of analysis dataset creation and analysis results generation should be clearly described and documented.
Source: S.Wilson, 2012

Traceability: a key requirement
"Traceability refers to the completeness of the information about every step in a process chain." (Wikipedia)
Data become traceable if the sequence of source data, transformation logic and target data is available for the whole study.
Data traceability chain: CRF -> SDTM Datasets -> Analysis Datasets -> Analysis Results

Traceability: a complex task
[Diagram. Data flow: CRF -> SDTM Datasets -> Analysis Datasets -> Analysis Results. Information flow elements: Data Spec, Annotated CRF, SAP, Define.xml, Analysis Dataset Metadata. Legend: Input/Output, Files/DB, Metadata.]

ADaM: MAIN ELEMENTS

Data Structures (I)
Analysis Data Subject-Level (ADSL)
- One record per subject;
- Describes each subject's experience in a trial.
Basic Data Structure (BDS)
- One or more records per subject, per analysis parameter, per analysis time point;
- Supports the majority of parametric and non-parametric statistical analyses.

Metadata: data about data
Metadata are defined as data providing information about one or more aspects of the data, such as:
- Data attributes (e.g. type of variables, controlled terminology);
- Data traceability (e.g. data origin, algorithms used for calculations).
Adapted from: http://geospatialmetadata.blogspot.it/

ADaM Traceability
Two levels of traceability are defined in the Analysis Data Model:
- Metadata traceability
  - Creates the link between the analysis variable and its source dataset or variable;
  - Required for ADaM compliance.
- Data point traceability
  - Creates the link between a single data point and its predecessor record;
  - To be provided only if feasible.

ANALYSIS DATA SUBJECT-LEVEL (ADSL)

ADSL dataset
Contains all variables important for describing a subject's experience in the trial:
- Critical demographic variables (e.g. age, sex, race)
- Disease factors (e.g. disease onset, disease severity)
- Treatment code / group
- Other relevant factors that could affect response (e.g. smoking)
- Variables used as strata for randomization
- Important event dates (e.g. treatment start and stop dates)
- Study population (e.g. analysis set flags)

Issue 1: Variables to be included in the ADSL dataset
The variables that need to be included in the ADSL are the ones that:
- Describe the status of the enrolled subjects prior to treatment;
- Group the subjects in some way for analysis purposes.
For the choice, it is suggested to follow the ICH Guidance (ICH E3, Section 11.2):
"Relevant individual subject demographic and baseline data [...] for all individual subjects randomized [...] should be presented in a by-subject tabular listing." (Analysis Data Model, v2.1, CDISC)
So the best practice is to include in the ADSL all the variables needed to produce the standard Demography, Baseline Characteristics and Disposition tables.

Issue 2: Population flags management
- Subject-level population flags are required for every population defined in the SAP;
- Null values are not allowed;
- Programming the same population flag more than once, during the creation of different datasets, is to be avoided.

Issue 3: Traceability solution for ADSL (I)
Because of the «one record per subject» structure of the ADSL dataset, data-point traceability is sometimes lacking.
As a simple example, let's consider the following case, where three treatment administrations are foreseen by the protocol and only the first one is of interest:

ADSL dataset
USUBJID | SITEID | AGE | ARM | RANDDT | TRTSTDT
05-0001 | 05 | 57 | Placebo | 9-Jan-2011 | 16-Jan-2011

SDTM EX dataset
DOMAIN | USUBJID | EXSEQ | EXSTDTC | EXENDTC
EX | 05-0001 | 1 | 2011-01-16 | 2011-01-16
EX | 05-0001 | 2 | 2011-02-13 | 2011-02-13
EX | 05-0001 | 3 | 2011-03-13 | 2011-03-13

[Variable Metadata panel: content not recoverable]
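
The derivation behind the ADSL value in this example can be sketched in a few lines. This is only an illustration, written in Python rather than SAS, and it assumes that TRTSTDT is the earliest EXSTDTC per subject; `derive_trtstdt` is a hypothetical helper name, not part of any standard. Note that the result carries the date but not the EXSEQ of the record it came from, which is the gap in data-point traceability described here.

```python
from datetime import date

# Rows mirror the SDTM EX example (ISO 8601 dates stored as text).
ex = [
    {"USUBJID": "05-0001", "EXSEQ": 1, "EXSTDTC": "2011-01-16"},
    {"USUBJID": "05-0001", "EXSEQ": 2, "EXSTDTC": "2011-02-13"},
    {"USUBJID": "05-0001", "EXSEQ": 3, "EXSTDTC": "2011-03-13"},
]


def derive_trtstdt(ex_rows):
    """Return {USUBJID: earliest exposure start date} (hypothetical helper)."""
    first = {}
    for row in ex_rows:
        start = date.fromisoformat(row["EXSTDTC"])
        subject = row["USUBJID"]
        if subject not in first or start < first[subject]:
            first[subject] = start
    return first


# Assumption: TRTSTDT is defined as the first exposure start date.
trtstdt = derive_trtstdt(ex)
```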

Issue 3: Traceability solution for ADSL (II)
Another option, as suggested by Minjoe, is to create an intermediate BDS-like dataset to store the traceability information related to first study treatment exposure:

SDTM EX dataset
DOMAIN | USUBJID | EXSEQ | EXSTDTC | EXENDTC
EX | 05-0001 | 1 | 2011-01-16 | 2011-01-16
EX | 05-0001 | 2 | 2011-02-13 | 2011-02-13
EX | 05-0001 | 3 | 2011-03-13 | 2011-03-13

PADSL (BDS-like)
USUBJID | PARAMCD | AVAL | SRCDOM | SRCVAR | SRCSEQ
05-0001 | TRTSTDT | 16-Jan-2011 | EX | EXSTDTC | 1

ADSL dataset
USUBJID | SITEID | AGE | ARM | RANDDT | TRTSTDT
05-0001 | 05 | 57 | Placebo | 9-Jan-2011 | 16-Jan-2011

Source: S.Minjoe, PharmaSUG 2012 Paper DS18

BASIC DATA STRUCTURE (BDS)

BDS datasets (I)
One or more records per subject, per analysis parameter, per analysis time point.
It supports the majority, but not all, of the parametric and non-parametric statistical analyses.
It focuses on variables that are related to the analysis parameter (PARAM/PARAMCD):
- Descriptions of the analysis parameter and its value;
- Information regarding the subject being analysed (i.e. identifiers and flags);
- Analysis-enabling variables (i.e. baseline, covariates);
- Timing variables to describe the collection windows;
- Supportive variables to ensure traceability.

BDS datasets (II)
- At least one subject-level population flag is required;
- Record-level flags are to be used when other selection variables are insufficient to identify the exact set of records used for one or more analyses;
- Redundancy is highly recommended in BDS datasets: all ADSL variables can be copied to BDS datasets to support traceability or enable analysis.
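
Copying ADSL variables into a BDS dataset, instead of re-deriving them, can be sketched as a simple lookup by USUBJID. This is a minimal illustration in Python rather than SAS; the function name `add_adsl_vars`, the ITTFL value and the AVAL values are assumptions made for the example, not study data.

```python
# Hypothetical ADSL content keyed by subject; ITTFL is derived once, here only.
adsl = {"05-0001": {"AGE": 57, "ARM": "Placebo", "ITTFL": "Y"}}

# Hypothetical BDS rows before subject-level variables are attached.
bds = [
    {"USUBJID": "05-0001", "PARAMCD": "SYSBP", "AVISIT": "Week 1", "AVAL": 130},
    {"USUBJID": "05-0001", "PARAMCD": "SYSBP", "AVISIT": "Week 3", "AVAL": 133},
]


def add_adsl_vars(bds_rows, adsl_by_subject, variables):
    """Return copies of the BDS rows with the chosen ADSL variables attached."""
    merged_rows = []
    for row in bds_rows:
        subject_vars = adsl_by_subject[row["USUBJID"]]
        merged = dict(row)  # copy, so the input rows stay untouched
        for var in variables:
            merged[var] = subject_vars[var]
        merged_rows.append(merged)
    return merged_rows


bds_with_flags = add_adsl_vars(bds, adsl, ["ARM", "ITTFL"])
```

Because the flag is read from ADSL and never recomputed, every dataset that copies it agrees on the population by construction.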

Issue 1: Parameters to be added as new columns
When a new dependent variable needs to be added, clear rules have been defined in the ADaM-IG. The creation of a new column is allowed only if all the following requirements are met:
1. The new dependent variable is a function of AVAL and, optionally, BASE;
2. It is parameter-invariant;
3. It does not involve any transform of BASE.
In all other cases, the new variable has to be added as a collection of new rows.

Issue 1: Parameters to be added as new columns
As an example, consider some real data we recently received. Instead of adhering to the ADaM-IG and adding new rows for FEV1 trough values at 12 h and AUC(0-12h), both were added as new columns, violating the rule.
"Adherence to these rules maintains compliance; without the rules, we would have no standard."
Source: N.Freimark, PharmaSUG 2012 Paper DS16
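
The column-versus-rows rule can be illustrated with two derivations. Change from baseline (CHG = AVAL - BASE) meets all three requirements and becomes a new column, whereas a log transform also needs a transformed BASE and therefore goes in as new rows under its own parameter. The sketch below is in Python rather than SAS; the input values, the function names and the parameter code LSYSBP are assumptions made for illustration.

```python
import math

# Hypothetical BDS rows for one parameter; values are illustrative only.
rows = [
    {"USUBJID": "3000", "PARAMCD": "SYSBP", "AVISIT": "Week 1",
     "AVAL": 130.0, "BASE": 145.0},
    {"USUBJID": "3000", "PARAMCD": "SYSBP", "AVISIT": "Week 3",
     "AVAL": 133.0, "BASE": 145.0},
]


def add_chg_column(bds_rows):
    """CHG is a function of AVAL and BASE, is the same for every parameter,
    and leaves BASE untransformed, so it qualifies as a new column."""
    return [dict(r, CHG=r["AVAL"] - r["BASE"]) for r in bds_rows]


def add_log_parameter(bds_rows, new_paramcd):
    """A log transform requires a transformed BASE, so the result is added
    as new rows under a new (hypothetical) PARAMCD, not as a column."""
    new_rows = [
        dict(r, PARAMCD=new_paramcd,
             AVAL=math.log(r["AVAL"]), BASE=math.log(r["BASE"]))
        for r in bds_rows
    ]
    return bds_rows + new_rows


with_chg = add_chg_column(rows)
with_log = add_log_parameter(rows, "LSYSBP")
```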

Issue 2: Record-level identification flags management
- Record-level flags can be defined when the available information is not sufficient to identify the exact set of records to be used for one or more specific analyses;
- Unlike population-level flags, record-level flags can be null;
- In some situations the combination of CRITy and CRITyFL can be used to identify the analysis records that satisfy a specific criterion.

Issue 3: Traceability of derived data (I)
Two variables are foreseen by the ADaM-IG to identify and characterize derived data:
- PARAMTYP = DERIVED, to identify derived parameters;
- DTYPE = <method of derivation>, to describe the method used for the derivation.
In the metadata it is necessary to describe precisely the algorithm used for the derivation.
Furthermore, as specified in the ADaM-IG section 4.4.1: "retaining in one dataset all data used in the determination of the analysis parameter value will provide the clearest traceability [...] within the standard ADaM BDS".
This approach would also ease the generation of sensitivity analyses by reviewers.
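
A minimal sketch of this idea, using Last Observation Carried Forward as the derivation method: observed records are kept as they are, and each imputed record is added alongside them with DTYPE and PARAMTYP filled in. The code is Python rather than SAS, the function name `locf` is hypothetical, and the input values are the systolic blood pressure readings used in this section's worked example. As a simplifying assumption, any earlier observed value (including baseline) may be carried forward; a real derivation may need to restrict this according to the SAP.

```python
# Observed values per scheduled visit, in visit order; None marks a missing value.
observed = [
    {"USUBJID": "3000", "VISIT": "Baseline", "AVAL": 145},
    {"USUBJID": "3000", "VISIT": "Week 1", "AVAL": 130},
    {"USUBJID": "3000", "VISIT": "Week 2", "AVAL": None},
    {"USUBJID": "3000", "VISIT": "Week 3", "AVAL": 133},
]


def locf(rows):
    """Build analysis rows for one subject and one parameter.

    Observed rows are retained unchanged (blank DTYPE/PARAMTYP); a missing
    visit is filled from the most recent observed row and flagged as derived.
    """
    analysis_rows = []
    last_observed = None
    for row in rows:
        if row["AVAL"] is not None:
            last_observed = row
            analysis_rows.append(
                {**row, "AVISIT": row["VISIT"], "DTYPE": "", "PARAMTYP": ""}
            )
        elif last_observed is not None:
            analysis_rows.append({
                "USUBJID": row["USUBJID"],
                "VISIT": last_observed["VISIT"],  # where the value was collected
                "AVISIT": row["VISIT"],           # where the value is analysed
                "AVAL": last_observed["AVAL"],
                "DTYPE": "LOCF",
                "PARAMTYP": "DERIVED",
            })
    return analysis_rows


advs = locf(observed)
```

Keeping VISIT (collection) and AVISIT (analysis) side by side is what lets a reviewer see at a glance which record fed the imputed value.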

Issue 3: Traceability of derived data (II)
As an example, let's consider an analysis of systolic blood pressure performed adopting the Last Observation Carried Forward (LOCF) principle:

SDTM VS dataset
USUBJID | VSTESTCD | VSSTRESN | VISIT | VISITNUM
3000 | SYSBP | 145 | Baseline | 1
3000 | SYSBP | 130 | Week 1 | 2
3000 | SYSBP | . | Week 2 | 3
3000 | SYSBP | 133 | Week 3 | 4

To ensure traceability and ease sensitivity analysis, all the input data should be included in the analysis dataset in addition to the derived records:

ADVS dataset
USUBJID | VISIT | AVISIT | ADY | PARAMCD | AVAL | DTYPE | PARAMTYP
3000 | Baseline | Baseline | -1 | SYSBP | 145 | |
3000 | Week 1 | Week 1 | 7 | SYSBP | 130 | |
3000 | Week 1 | Week 2 | 7 | SYSBP | 130 | LOCF | DERIVED
3000 | Week 3 | Week 3 | 22 | SYSBP | 133 | |
3000 | | | | SYSBP | 131 | AVERAGE | DERIVED

METADATA

ADaM metadata: Chiesi's experience
Each CRO provides a slightly different solution, ranging from .xls files to a complete define.xml.
Pros and cons have been identified and will be discussed in order to optimize the information received for each trial and to define a minimum set of requirements that match our internal quality standards.
The following elements will be discussed:
- Hyperlinks;
- Analysis Traceability;
- Validation.

Hyperlinks (I)
- Hyperlinks are very useful for direct access to the data or to further documentation (e.g. the SAS programme used to generate the analysis dataset, or the SAP section where the derivation of a specific variable is defined);
- They also make the document much easier to browse;
- They are easily added when a define.xml is provided.

Hyperlinks (II)
Hyperlinks can also be added when a define.pdf or define.xls file is provided.

Analysis Traceability (I)
Complete traceability from analysis results to SDTM datasets, and ultimately to source data, is required by both the ADaM model and FDA.
As an example of good practice, consider the following table:

Analysis Traceability (II)
Analysis Result Metadata
Metadata field | Metadata
Display Identifier | Table 14.2.6.2
Display name | Analysis of Dyspnoea Index (ITT Population)
Result identifier | Analysis of Covariance
PARAM | Focal Score
PARAMCD | FS
Analysis variable | AVAL
Reason | Secondary efficacy analysis as pre-specified in the protocol
DATASET | ADQS
Selection criteria | ITTFL='Y' and PARAMCD='FS'
Documentation | See SAP Section 4.7.4.5
Programming statements | See ADQS.sas

Analysis Traceability (III)
Analysis Variable Metadata
Parameter Identifier | Variable Name | Variable Label | Type | Format | Controlled Terms | Source
*DEFAULT* | PARAM | Parameter | text | $8. | Functional Impairment, Magnitude of Task, Magnitude of Effort | QS.QSTEST
FS | PARAM | Parameter | text | $8. | Focal Score | Focal Score is assigned to the total score record
*DEFAULT* | AVAL | Analysis Value | integer | 8. | | QS.QSSTRESN
FS | AVAL | Analysis Value | integer | 8. | | Sum of TDI domain scores, see SAP section 14.2
*DEFAULT* | DTYPE | Derivation Type | text | $40. | | Not applicable, therefore blank
FS | DTYPE | Derivation Type | text | $40. | | DTYPE='SUM' when PARAM='Focal Score'
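
One practical benefit of analysis result metadata is that the selection criteria can be applied mechanically to the analysis dataset. The sketch below stores the criteria from Table 14.2.6.2 (ITTFL='Y' and PARAMCD='FS') as a dictionary and filters records with them. It is written in Python rather than SAS; the dictionary layout, the function name `select_records`, the ADQS rows and the parameter code FI are all hypothetical and serve only to illustrate the idea.

```python
# Analysis result metadata, reduced to the fields needed for record selection.
result_metadata = {
    "display": "Table 14.2.6.2",
    "dataset": "ADQS",
    "analysis_variable": "AVAL",
    "selection": {"ITTFL": "Y", "PARAMCD": "FS"},
}

# Hypothetical ADQS rows; subject IDs, values and the code "FI" are made up.
adqs = [
    {"USUBJID": "001", "ITTFL": "Y", "PARAMCD": "FS", "AVAL": 3},
    {"USUBJID": "001", "ITTFL": "Y", "PARAMCD": "FI", "AVAL": 1},
    {"USUBJID": "002", "ITTFL": "N", "PARAMCD": "FS", "AVAL": 2},
]


def select_records(rows, criteria):
    """Keep the rows whose variables equal every value in the criteria."""
    return [
        row for row in rows
        if all(row.get(var) == value for var, value in criteria.items())
    ]


selected = select_records(adqs, result_metadata["selection"])
analysis_values = [row[result_metadata["analysis_variable"]] for row in selected]
```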

Validation
- CDISC ADaM Validation Checks, v1.2, has been recently released.
- OpenCDISC validator, an open source tool available at http://www.opencdisc.org/, should be used for SDTM, ADaM and define.xml validation.
- The Validation Checks and the OpenCDISC validator are not meant to define the whole spectrum of ADaM compliance, including content and well-defined metadata.

CONCLUSIONS

Advantages of standardisation: Agency's view
Since inception, FDA has recognised the potential benefit of standard analysis datasets for optimizing the review process.
The same data structures, variable names, data locations and metadata across different trials, applications and Sponsors:
- Ensure transparency and traceability of data;
- Allow the development of standard software for clinical data review;
- Ease meta-analyses and analyses across drug classes.

Advantages of standardisation: Sponsor's view
As a Sponsor, we have decided to adopt international standards, since we see the following clear benefits:
- Data integration from different vendors;
- Data integration from different trials for submission;
- Post hoc analyses as required by regulatory authorities and payer agencies.

TAKE HOME MESSAGE: GO FOR STANDARDS!

Acknowledgements
- Annamaria Muraro, Head of Statistics and Data Management
- Chiesi Statisticians: Roberta Baronio, Anna Compagnoni, Debora Santoro, Stefano Vezzoli
- Chiesi Data Managers: Maria Bocchi, Elena Carzana, Isabella Montagna

Glauco Cappellini
Statistician & Data Manager
Chiesi Farmaceutici
email: g.cappellini@chiesi.com