Building Traceability for End Points in Analysis Datasets Using SRCDOM, SRCVAR, and SRCSEQ Triplet
Xiangchen (Bob) Cui, Tathabbai Pakalapati, Qunming Dong
Vertex Pharmaceuticals, Cambridge, MA
2010 Vertex Pharmaceuticals Incorporated

Outline
- Introduction: Traceability in ADaM Programming
- Introduction: SRCDOM, SRCVAR, and SRCSEQ Triplet
- Establishing Traceability in Sweat Chloride Analysis Dataset (ADSW) Using SRC Triplet
- Conclusion
- Questions/Comments/Thoughts

The Key Principles of ADaM as Stated in ADaM IG v1.0
1. Analysis data should be analysis ready
2. Provide a level of traceability back to the input data

Introduction: What is Traceability?
Per ADaM IG v1.0: the property that enables the understanding of the data's lineage and/or the relationship between an element and its predecessor(s).
Two levels of traceability:
1. Metadata traceability
2. Data point traceability

Metadata?
Metadata: the information about data.
The ADaM programming specification and Define.xml provide metadata traceability:
1. Programming specification
2. Define.xml

Introduction: Data Point Traceability
- Enables users to go directly to the specific predecessor(s)
- Should be implemented if practically feasible
- Can be very useful when a reviewer is trying to trace a complex manipulation path
- Is established by providing clear links in the data to the specific data values used as input for an analysis value
- It may not always be practical or feasible to provide data point traceability via record-identifier variables from the source dataset(s)

How To Build Data Point Traceability? (I)
ADaM IG v1.0 recommends numerous ways to build data point traceability. A few examples:
1. Inclusion of a supportive row or column for traceability even if not required for analysis (VISIT, --SEQ)
2. PARAMTYP=DERIVED
3. Populating variable DTYPE with the derivation method

How To Build Data Point Traceability? (II)
4. Populating ANLxxFL to support the accurate selection of records for analysis
5. CRITy/CRITyFL/CRITyFN variables to define and detect the presence of a criterion
6. Combining the triplet of SRCDOM, SRCVAR, and SRCSEQ with derived variables (ADaM IG v1.0, page 38)

Goals of Building Traceability in ADaM Data
1. Facilitate transparency
2. Build confidence in analysis results
3. Facilitate programming validation
4. Speed up the overall review process for FDA reviewers
5. Build a good relationship with FDA reviewers

Definition: Using Triplet: SRCDOM, SRCVAR, and SRCSEQ to Build Traceability (I)
- SRCDOM (Source Domain): The 2-character identifier of the SDTM domain that relates to AVAL or AVALC.
- SRCVAR (Source Variable): The name of the column (in the SDTM domain identified by SRCDOM) that relates to AVAL or AVALC.
- SRCSEQ (Source Sequence Number): The sequence number --SEQ of the row (in the SDTM domain identified by SRCDOM) that relates to AVAL or AVALC.

Definition: Using Triplet: SRCDOM, SRCVAR, and SRCSEQ to Build Traceability (II)
Per the definition in ADaM IG v1.0:
1. SRCDOM should be an SDTM domain.
2. SRCVAR should be a variable from that SDTM domain.
3. SRCSEQ should be a --SEQ value from that SDTM domain.

Limitation of SRCDOM, SRCVAR, and SRCSEQ
Limitation of the triplet:
- One source SDTM domain
- One source record
- One source variable
- Only a direct copy of the source value
The triplet cannot keep traceability when an analysis parameter in ADaM is derived from multiple SDTM records.

Solutions to the Limitation
- This presentation provides examples that resolve the limitation by slightly modifying the usage of the SRC--- triplet without losing its actual definition.
- Alternatively, use another method, shown in the next slide.

Another Method: Adding Variable Pair: RLCRIT and RLFACT
- Use metadata to describe the derivation rules of the derived parameters
- Add variables RLCRIT and RLFACT to connect derived records and their source records:
  - RLCRIT: relation criteria, storing the data source (ADaM or SDTM datasets) with the source variables in order
  - RLFACT: relation factors, storing the values of those source variables in the same order
Refer to:
1. Zhu, Songhui and Yan, Lin. Methods of Building Traceability for ADaM Data. PharmaSUG 2011.
2. Cui, Xiangchen (Bob), Liu, Hongyu, and Pakalapati, Tathabbai. Examples of Building Traceability in CDISC ADaM Datasets for FDA Submission. SAS Global Forum, April 2012.

Introduction
In this presentation, the examples come from efficacy ADaM datasets in the cystic fibrosis therapeutic area. We will discuss ADSW (Sweat Chloride Analysis Dataset) today.

Structure of Sweat Chloride in SDTM.SW
Note: Records with even SWSEQ values are the volume records; records with odd SWSEQ values were used for analysis.

Analysis Value of Sweat Chloride (ADSW.AVAL) at an Analysis Visit (AVISIT) in ADaM per SAP
1. Analysis value at each analysis visit (AVISIT): average of sweat chloride at both left and right arms collected at a nominal visit
2. Baseline analysis visit: average of sweat chloride from the pre-dose scheduled visits
3. Average on-treatment sweat chloride: average of sweat chloride from the on-treatment scheduled visits
Note: 2 and 3 are averages of the analysis values from 1, i.e. values from ADaM.ADSW, not SDTM.SW.

Building Data Point Traceability for These Three Averages in ADSW
- Create variable ASWSEQ (Analysis Sequence Number), starting from 1000, to indicate the newly created records
- PARAMTYP=DERIVED
- DTYPE=AVERAGE
- Populate the variables SRCDOM, SRCVAR, and SRCSEQ on these newly created records

Traceability For Average Sweat Chloride At Both Left And Right Arms At Every Analysis Visit
1. PARAMTYP=DERIVED tells the reviewer that these records are derived in ADaM, with sequence numbers above 1000.
2. SRCSEQ=23 for AVISIT=Day 14 indicates that only the left-arm sweat chloride is used to populate AVAL, because the right-arm sweat chloride is missing.
3. SRCSEQ lists the sequence numbers, separated by $, of the SDTM SW records used in deriving AVAL. Hence SRCSEQ is a concatenation of two sequence numbers: a deviation from the IG!

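The per-visit derivation described above can be sketched in SAS as follows. This is a minimal illustration, not the production program: the input dataset sw, the subject ID, the sequence numbers, and the chloride values are all made up, and AVISIT is assumed to map one-to-one to the nominal VISIT.

```sas
/* Hypothetical SW-like input: one chloride record per arm per visit. */
/* Subject ID, sequence numbers, and results are illustrative only.   */
data sw;
  length usubjid $10 swtestcd $8 visit $12;
  infile datalines dlm=',' dsd truncover;
  input usubjid $ swseq swtestcd $ visit $ visitnum swstresn;
  datalines;
S001,1,SW_CL_L,SCREENING,1,102
S001,3,SW_CL_R,SCREENING,1,98
S001,5,SW_CL_L,DAY 1,2,100
S001,7,SW_CL_R,DAY 1,2,104
S001,23,SW_CL_L,DAY 14,4,60
S001,25,SW_CL_R,DAY 14,4,.
;
run;

proc sort data=sw;
  by usubjid visitnum swseq;
run;

data adsw_visit;
  length avisit $12 paramtyp $8 dtype $20 srcdom $4 srcvar $8 srcseq $40;
  retain total nobs aswseq srcseq;
  set sw;
  by usubjid visitnum;
  /* Derived records are numbered 1001, 1002, ... within a subject */
  if first.usubjid then aswseq = 1000;
  if first.visitnum then do;
    total = 0;
    nobs = 0;
    srcseq = '';
  end;
  /* Only non-missing arm results contribute to AVAL and to SRCSEQ */
  if not missing(swstresn) then do;
    total = total + swstresn;
    nobs = nobs + 1;
    srcseq = catx('$', srcseq, swseq);
  end;
  if last.visitnum and nobs > 0 then do;
    aswseq = aswseq + 1;
    avisit = visit;          /* assumption: AVISIT equals nominal VISIT */
    aval = total / nobs;
    paramtyp = 'DERIVED';
    dtype = 'AVERAGE';
    srcdom = 'SW';
    srcvar = 'SWSTRESN';
    output;
  end;
  keep usubjid avisit aswseq aval paramtyp dtype srcdom srcvar srcseq;
run;
```

With this sample input, the Day 14 record carries SRCSEQ=23 because the right-arm result is missing, while the Screening record carries SRCSEQ=1$3.
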
Traceability For Baseline Analysis Visit (I)
- ASWSEQ is set to 0.5 more than the sequence number of the last predose analysis visit (1002+0.5), which creates a new sorting key in ADSW.
- SRCDOM=ADSW, SRCVAR=AVAL, and SRCSEQ=1001$1002 for AVISIT=Baseline indicate that the analysis value (AVAL) is derived from the AVAL values in ADSW with sequence numbers 1001 and 1002 (Screening and Day 1 visits, respectively).
- The SRC triplet above shows another deviation from the ADaM IG!

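A minimal SAS sketch of the baseline record described above, under stated assumptions: the small adsw dataset is invented for illustration, the predose visits are assumed to be SCREENING and DAY 1, and bl_srcseq is a hypothetical work variable.

```sas
/* Hypothetical per-visit ADSW records (values are illustrative only). */
data adsw;
  length usubjid $10 avisit $12 paramtyp $8 dtype $20
         srcdom $4 srcvar $8 srcseq $40;
  infile datalines dlm=',' dsd truncover;
  input usubjid $ aswseq avisit $ aval srcdom $ srcvar $ srcseq $;
  paramtyp = 'DERIVED';
  dtype = 'AVERAGE';
  datalines;
S001,1001,SCREENING,100,SW,SWSTRESN,1$3
S001,1002,DAY 1,102,SW,SWSTRESN,5$7
S001,1003,DAY 7,70,SW,SWSTRESN,9$11
;
run;

proc sort data=adsw;
  by usubjid aswseq;
run;

data baseline;
  length bl_srcseq $40;
  retain total nobs bl_srcseq;
  set adsw;
  where avisit in ('SCREENING', 'DAY 1');   /* assumed predose visits */
  by usubjid;
  if first.usubjid then do;
    total = 0;
    nobs = 0;
    bl_srcseq = '';
  end;
  if not missing(aval) then do;
    total = total + aval;
    nobs = nobs + 1;
    bl_srcseq = catx('$', bl_srcseq, aswseq);
  end;
  if last.usubjid and nobs > 0 then do;
    aval = total / nobs;
    aswseq = aswseq + 0.5;   /* sorts right after the last predose visit */
    avisit = 'Baseline';
    srcdom = 'ADSW';         /* source is ADaM, not SDTM */
    srcvar = 'AVAL';
    srcseq = bl_srcseq;
    output;
  end;
  drop total nobs bl_srcseq;
run;

/* Interleave the new record into ADSW using ASWSEQ as the sort key */
data adsw_all;
  set adsw baseline;
  by usubjid aswseq;
run;
```

For the sample subject this yields a Baseline record with ASWSEQ=1002.5 and SRCSEQ=1001$1002, placed between Day 1 and Day 7.
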
Traceability For Average On-Treatment Sweat Chloride
- Defined as the average of the on-treatment analysis values (Day 7, 14, 21, and 28)
- For traceability, it is handled the same way as the Baseline analysis visit, with SRCDOM=ADSW and SRCVAR=AVAL.

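One way a validator or reviewer might use such records is to re-derive AVAL from the rows that SRCSEQ points to. The sketch below assumes SRCSEQ holds $-separated ASWSEQ values (not visit names) and uses an invented adsw dataset; src_check and src_mismatch are hypothetical names.

```sas
/* Hypothetical ADSW extract: two visit records and one derived average. */
data adsw;
  length usubjid $10 avisit $12 srcdom $4 srcvar $8 srcseq $40;
  infile datalines dlm=',' dsd truncover;
  input usubjid $ aswseq avisit $ aval srcdom $ srcvar $ srcseq $;
  datalines;
S001,1001,SCREENING,100,SW,SWSTRESN,1$3
S001,1002,DAY 1,102,SW,SWSTRESN,5$7
S001,1002.5,Baseline,101,ADSW,AVAL,1001$1002
;
run;

proc sql;
  /* Join each ADSW-sourced record to the rows listed in its SRCSEQ */
  create table src_check as
  select d.usubjid,
         d.avisit,
         d.srcseq,
         d.aval        as aval_reported,
         mean(s.aval)  as aval_rederived,
         count(s.aval) as n_sources
  from adsw as d
       inner join adsw as s
         on  d.usubjid = s.usubjid
         and findw(strip(d.srcseq), strip(put(s.aswseq, best12.)), '$') > 0
  where d.srcdom = 'ADSW' and d.srcvar = 'AVAL'
  group by d.usubjid, d.avisit, d.srcseq, d.aval;
quit;

/* Keep only records whose reported AVAL disagrees with the re-derivation */
data src_mismatch;
  set src_check;
  if abs(aval_reported - aval_rederived) > 1e-8;
run;
```

For the sample data the Baseline record re-derives to the reported value, so src_mismatch should come out empty.
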
An Example of Traceability For Average On-Treatment Sweat Chloride (I)

An Example of Traceability For Average On-Treatment Sweat Chloride (II)
- How about changing SRCSEQ=1005$1006$1007$1008 into the more meaningful SRCSEQ=DAY 7$DAY 14$DAY 21$DAY 28?
- This gives better communication and easier review of the data records.

An Example of Traceability For Average On-Treatment Sweat Chloride (III)
- SRCSEQ=DAY 7$DAY 14$DAY 21$DAY 28
- SRCSEQ is from VISIT, not ASWSEQ anymore!
- No sequence numbers are involved!
- A severe deviation from the ADaM IG!

An Example of Traceability For Average On-Treatment Sweat Chloride (III)

An Example of Traceability For Average On-Treatment Sweat Chloride (I)

Snapshot of Traceability For Average On-Treatment Sweat Chloride for Multiple Subjects (I)

How Much To Pay For Traceability?
No free lunch!
- Write a clear ADaM programming specification
- Extra SAS code

Write A Clear ADaM Programming Specification
Specification columns: Variable Name, Variable Label, Type, Length, Controlled Terms or Formats, Origin, Role, Comments, Core

SRCDOM (Source Domain)
  Type: Char; Length: 4; Origin: Derived; Role: Analysis; Core: Perm
  Comments:
  - Equal to "ADSW" for derived records (PARAMTYP="DERIVED") where AVISIT is equal to "Baseline" or "Average through Day 28"
  - Equal to "SW" for all other derived records (PARAMTYP="DERIVED")

SRCVAR (Source Variable)
  Type: Char; Length: 8; Origin: Derived; Role: Analysis; Core: Perm
  Comments:
  - Equal to "AVAL" for derived records (PARAMTYP="DERIVED") where AVISIT is equal to "Baseline"
  - Equal to "AVAL" for derived records (PARAMTYP="DERIVED") where AVISIT is equal to "Average through Day 28"
  - Equal to "SWSTRESN" for all other derived records (PARAMTYP="DERIVED")

SRCSEQ (Source Sequence Number)
  Type: Char; Length: 40; Origin: Derived; Role: Analysis; Core: Perm
  Comments:
  - For AVISIT="Baseline", SRCSEQ lists the sequence numbers (SW.SWSEQ) of the records used to derive the analysis baseline value. Example: if the sequence numbers of Screening and Day 1 are 1 and 2 respectively, then SRCSEQ is equal to 1$2.
  - For AVISIT="Average through Day 28", SRCSEQ lists the analysis visits (ADSW.AVISIT) of the records used to derive the analysis value (average of Day 7, Day 14, Day 21, Day 28). Example: if Day 7, Day 14, Day 21, and Day 28 are used to derive the average, then SRCSEQ is equal to Day 7$Day 14$Day 21$Day 28.
  - For other derived records, SRCSEQ lists the sequence numbers (SW.SWSEQ) of the records used to derive the analysis value at each analysis visit. Example: if the sequence numbers of SW_CL_L and SW_CL_R at Day 7 are 4 and 5 respectively, then SRCSEQ is equal to 4$5 for AVISIT=DAY 7.

Extra SAS Code for Average On-Treatment (Highlighted in Red on the Original Slide)

    /* Calculate the average of Day 1 through Day 28: the average of Day 7, 14, 21, */
    /* and 28 by nominal visit, regardless of safety follow-up records.             */
    proc sort data=sw_cl_m2 out=for_avg;
      by usubjid swtestcd adtm visitnum swseq;
      where visit in ('DAY 7','DAY 14','DAY 21') or (visit='DAY 28' and swtptnum<=0);
    run;

    data average;
      length DTYPE $20 SRCSEQ $40 SRCDOM $4 SRCVAR $8;
      retain total nobs 0 SRCSEQ '';
      set for_avg(drop=srcseq);
      by usubjid swtestcd adtm visitnum;
      /* Start the running total and the visit list at the first record of each test */
      if first.swtestcd then do;
        total=swstresn;
        nobs=1;
        SRCSEQ=strip(visit);
      end;
      /* Otherwise accumulate the value and append the visit name, separated by $ */
      else do;
        total=total+swstresn;
        nobs=nobs+1;
        SRCSEQ=strip(SRCSEQ)||"$"||strip(visit);
      end;
      /* Output one derived record per test, keyed 0.5 after the last source record */
      if last.swtestcd then do;
        DTYPE='AVERAGE';
        SWSTRESN=total/nobs;
        SWSEQ=SWSEQ+0.5;
        SRCDOM='ADSW';
        SRCVAR='AVAL';
        output;
      end;
    run;

Is it legal? Is it CDISC ADaM compliant?
- CDISC ADaM Validation Checks Version 1.2
- Rule 180 (out of 243 rules) applies only to SRCDOM
- By that rule, it is illegal!

Will We Get Caught Out?
- We uploaded all ADaM datasets from the study into OpenCDISC Validator for compliance checking.
- No errors or warning messages were reported for the SRC triplet!

Will We Get Caught Out? (Cont'd)
- OpenCDISC Validator implements 130 of the 243 rules in CDISC ADaM Validation Checks Version 1.2.
- From the OpenCDISC Validator website: "The following is a listing of CDISC ADaM version 1.0 validation rules implemented in OpenCDISC Validator. The rules are based on the validation checks published by CDISC ADaM team."

Snapshot of OpenCDISC Validator ADaM Validation Rules from Its Web Site

Is it legal? (Cont'd)
- Rule 180 has not been implemented in OpenCDISC Validator yet.
- OpenCDISC Validator cannot catch it!
- It seems that we are lucky.

The Key Message Here:
- Should rule 180 be relaxed to allow an ADaM domain, in addition to a single SDTM domain?
- Should the CDISC ADaM team consider modifying the rule for the SRC triplet?
1. SRCDOM: either an SDTM domain or an ADaM domain
2. SRCVAR: either an SDTM variable or an ADaM variable
3. SRCSEQ: a single SDTM sequence number, multiple SDTM sequence numbers, or values of other variables

Conclusion
Incorporating traceability features in ADaM datasets:
- Helps in effective program validation
- Speeds up the review process
- Provides transparency in submitted analysis data
- Supports a good relationship with the FDA
Applying SRCDOM, SRCVAR, and SRCSEQ to establish traceability is an art, not rocket science.
The CDISC ADaM team should consider modifying the rule for the SRC triplet in order to better establish traceability in ADaM programming!

Questions / Comments / Thoughts