IS03: An Introduction to SDTM Part II. Jennie Mc Guirk

SDTM Framework
1. Where should the data go?
2. What type of information should it contain?
3. What is the minimum information needed?

SDTM Framework: Re-cap
- Data Class: General Observation, Special Purpose, Relationship, Trial Design
- Variable Role: Identifier, Topic, Timing, Qualifier
- Core Variables: Required, Expected, Permissible

SDTM Small Print
- Naming Conventions
- Subject Identifiers
- Sequence Variable
- Relationships and Linking
- Controlled Terminology
- Date Formats
- Reference Start Date & Study Days
- Handling Text
- Original & Standard Results
- Missing & Multiple Values
- Timing and Timepoints
- Splitting Domains

Naming Conventions - Datasets
- 2-letter code, with exceptions: split domains, SUPP-- and RELREC
- SDTM IG Appendix C2 lists 30 SDTM reserved codes:
  - Events (5): AE, CE, MH, DS, DV
  - Findings (12): EG, IE, LB, PC, PE, PP, QS, VS, DA, MB, MS, SC
  - Findings About (1): FA
  - Interventions (3): CM, EX, SU
  - Trial Design (5): TA, TE, TI, TS, TV
  - Special Purpose (4): CO, DM, SE, SV
- 1 reserved code relating to analysis datasets (AD)
- The SDTM IG also reserves codes beginning with X, Y and Z for sponsor-defined (custom) domains

Naming Conventions - Variables
- 8-character limit for variable names (40-character limit for labels)
- The IG defines variable names per dataset class, where -- stands for the two-letter domain code (e.g. --STAT becomes LBSTAT in LB)
- Fragment names (Appendix D)
- Guidelines for SUPP-- QNAMs and --TESTCDs
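The naming rules above can be sketched as a small check. This is a minimal illustration, not part of the IG: the function names `variable_name` and `label_ok` are hypothetical, and only the 8-character and 40-character limits come from the slide.

```python
def variable_name(domain: str, fragment: str) -> str:
    """Replace the '--' placeholder with a two-letter domain code (hypothetical helper)."""
    name = f"{domain}{fragment}".upper()
    if len(name) > 8:
        raise ValueError(f"{name} exceeds the 8-character variable name limit")
    return name


def label_ok(label: str) -> bool:
    """Return True when a variable label fits the 40-character limit."""
    return len(label) <= 40


print(variable_name("LB", "STAT"))    # LBSTAT
print(label_ok("Completion Status"))  # True
```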

Subject Identifiers (USUBJID & SUBJID)
- The term "subject" should be used where applicable, consistent with the recommendation in FDA guidance. It refers generically to both patients and healthy volunteers.
- USUBJID: Unique Subject Identifier
  - Uniquely identifies a subject across all studies
  - Must be unique for each trial participant (subject) across all trials in the submission
  - No two (or more) subjects, across all trials in the submission, may have the same USUBJID
  - The same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials
- SUBJID: Subject Identifier for the Study
  - Uniquely identifies a subject within a trial

Sequence Variable (--SEQ)
- The Sequence Number (--SEQ) uniquely identifies a record for a given USUBJID within a domain
- Required in all domains (except DM)
- Conventions for values are sponsor-defined
- Values may or may not be sequential, depending on data processes and sources
- Necessary to link observations between domains, such as:
  - Linking parent and supplemental qualifier observations
  - Relating records together (RELREC, CO)
- Example: Subject 1234 has 6 conmeds. CMSEQ is a sequential number 1 through 6, uniquely identifying each observation for that USUBJID.
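One possible way to assign --SEQ is a per-subject counter. The sketch below assumes records are plain dictionaries already sorted in the desired order. The function name `assign_seq` and the sample treatments are illustrative only, and sponsors may use other conventions, since values need not be sequential.

```python
from collections import defaultdict


def assign_seq(records, seq_var="CMSEQ"):
    """Number records 1, 2, 3... within each USUBJID (one sponsor convention of many)."""
    counters = defaultdict(int)
    numbered = []
    for rec in records:
        counters[rec["USUBJID"]] += 1
        numbered.append({**rec, seq_var: counters[rec["USUBJID"]]})
    return numbered


# Subject 1234 with 6 conmeds (treatment names are made up for the example)
conmeds = [{"USUBJID": "1234", "CMTRT": f"DRUG {i}"} for i in "ABCDEF"]
for rec in assign_seq(conmeds):
    print(rec["USUBJID"], rec["CMSEQ"], rec["CMTRT"])
```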

Relationships and Linking
- Relationships within a domain: --GRPID
- Relationships across domains: RELREC, CO
- Non-standard questions: SUPP--

Relationships and Linking: --GRPID
- Relationships within a domain
- CMGRPID represents a relationship between observations
- Example: Subject 1234 has 6 conmeds
  - CMSEQ = 1, 2, 3 are related (Combination Therapy 1)
  - CMSEQ = 4, 5, 6 are related (Combination Therapy 2)

Relationships and Linking: RELREC
- Relationships across domains
- Example: a related Adverse Event and Disposition Event
- RELID is the relationship identifier
- RDOMAIN identifies the related domains
- IDVAR & IDVARVAL identify the observations that are related

Relationships and Linking: CO
- Relationships across domains
- RDOMAIN identifies the related domain
- IDVAR & IDVARVAL identify the observations that are related

Relationships and Linking: SUPPs
- Relationship to non-standard questions
- LB family:
  - Parent = LB
  - Child = SUPPLB
  - Link via LBSEQ
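The parent/child link can be sketched as a lookup on USUBJID plus the IDVAR/IDVARVAL pair. This is a minimal sketch that assumes records are plain dictionaries and that IDVARVAL is stored as character. The function name `merge_supp` and the QNAM value LBCLSIG are illustrative, not taken from the slides.

```python
def merge_supp(parent, supp, idvar="LBSEQ"):
    """Attach each SUPP-- QNAM/QVAL pair to its parent record as an extra column."""
    lookup = {}
    for q in supp:
        if q["IDVAR"] != idvar:
            continue  # this sketch only handles links made through one IDVAR
        key = (q["USUBJID"], q["IDVARVAL"])
        lookup.setdefault(key, {})[q["QNAM"]] = q["QVAL"]
    # IDVARVAL is character, so the numeric sequence is converted before matching
    return [{**rec, **lookup.get((rec["USUBJID"], str(rec[idvar])), {})} for rec in parent]


lb = [
    {"USUBJID": "1234", "LBSEQ": 1, "LBTESTCD": "GLUC"},
    {"USUBJID": "1234", "LBSEQ": 2, "LBTESTCD": "ALT"},
]
supplb = [
    {"RDOMAIN": "LB", "USUBJID": "1234", "IDVAR": "LBSEQ", "IDVARVAL": "1",
     "QNAM": "LBCLSIG", "QVAL": "N"},
]
for rec in merge_supp(lb, supplb):
    print(rec)
```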

Controlled Terminology
- Certain variables are subject to controlled terms
- Values come from a pre-defined list
- Represented in 1 of 4 ways in the IG

Date Formats
- Dates in SDTM are represented in ISO 8601 format: YYYY-MM-DDThh:mm:ss
- Dates in SDTM are character, enabling partial dates
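Because --DTC values are character, a partial date can be written by truncating the ISO 8601 string at the first unknown component. The helper below is a minimal sketch of that idea. The name `iso8601_dtc` is hypothetical, and it only handles components missing from the right-hand end, not gaps in the middle.

```python
def iso8601_dtc(year=None, month=None, day=None, hour=None, minute=None, second=None):
    """Build a --DTC string, stopping at the first missing component."""
    parts = [
        (year, "{:04d}", ""),
        (month, "{:02d}", "-"),
        (day, "{:02d}", "-"),
        (hour, "{:02d}", "T"),
        (minute, "{:02d}", ":"),
        (second, "{:02d}", ":"),
    ]
    out = ""
    for value, fmt, sep in parts:
        if value is None:
            break
        out += sep + fmt.format(value)
    return out


print(iso8601_dtc(2013, 10, 15, 9, 30, 0))  # 2013-10-15T09:30:00
print(iso8601_dtc(2013, 10))                # 2013-10
print(iso8601_dtc(2013))                    # 2013
```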

Reference Start Date (RFSTDTC)
- The Subject Reference Start Date (RFSTDTC) is designated as Study Day 1
- It usually relates to the day the subject was first exposed to study drug
- The preceding date is designated as Study Day -1
- There is no Study Day 0
- Timeline: Day -n ... Day -2, Day -1, Day 1 (RFSTDTC), Day 2 ... Day n

Study Day (--STDY)
- Sequential days relative to a reference point
- All Study Day values are integers
- Calculating Study Day:
  - If --DTC is on or after RFSTDTC: --DY = (date portion of --DTC) - (date portion of RFSTDTC) + 1
  - If --DTC precedes RFSTDTC: --DY = (date portion of --DTC) - (date portion of RFSTDTC)

Study Day (--STDY): example
Date        Study Day
13OCT2013   Day -2
14OCT2013   Day -1
15OCT2013   Day 1 (RFSTDTC)
16OCT2013   Day 2
17OCT2013   Day 3
No Study Day 0!
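The Study Day rule can be sketched in a few lines. This is an illustrative helper, not an official algorithm: the name `study_day` is hypothetical, and it assumes both inputs are ISO 8601 strings and returns None when either lacks a complete date portion.

```python
from datetime import date


def study_day(dtc: str, rfstdtc: str):
    """Return --DY for an ISO 8601 --DTC relative to RFSTDTC; there is no Study Day 0."""
    if len(dtc) < 10 or len(rfstdtc) < 10:
        return None  # partial date: study day cannot be calculated
    delta = (date.fromisoformat(dtc[:10]) - date.fromisoformat(rfstdtc[:10])).days
    return delta + 1 if delta >= 0 else delta


rfstdtc = "2013-10-15"
for dtc in ["2013-10-13", "2013-10-14", "2013-10-15", "2013-10-16", "2013-10-17"]:
    print(dtc, study_day(dtc, rfstdtc))
```

With RFSTDTC on 15 October 2013 this gives -2, -1, 1, 2, 3 for the five dates, so day 0 never appears.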

Handling Text
- Casing
  - Upper case (recommended)
  - Exceptions include:
    - Comments / free text
    - --TEST in Findings domains
    - External dictionary text (e.g. MedDRA)
    - Unit symbols (e.g. mg/dl)
- Free Text
  - General comments: free text collected on a dedicated CRF page and/or related to one or more SDTM domains is stored in the CO (Comments) domain
  - "Specify" values for:
    - Result qualifier variables
    - Non-result qualifier variables
    - Topic variables
  - Free text responses to specific questions

Handling Specify Text
- Topic variables
- Result qualifier variables
- Non-result qualifier variables
- Remember: the length limit for all variables in SDTM is $200

Original & Standard Results
- --ORRES: original result in a Findings domain (e.g. LB)
- Expected variable; should be populated, with the exception of:
  1. --STAT = NOT DONE (status variable)
  2. --DRVFL = Y (result is derived)
- When --ORRES is populated:
  1. --STRESC (standard character result) must be populated
  2. --STRESN (standard numeric result) should be populated when the result is numeric
- --STRESC is derived by converting the values in --ORRES to values with standard units
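The rule that --STRESN is populated only for numeric results can be sketched as follows. This assumes --STRESC has already been converted to standard units. The helper name `derive_stresn` is hypothetical and the unit conversion itself is out of scope here.

```python
import math


def derive_stresn(stresc):
    """Return --STRESN as a float when --STRESC is numeric, otherwise None."""
    try:
        value = float(stresc)
    except (TypeError, ValueError):
        return None  # character results such as NEGATIVE have no numeric form
    return value if math.isfinite(value) else None


print(derive_stresn("5.4"))       # 5.4
print(derive_stresn("NEGATIVE"))  # None
```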

Missing Values
- Missing values should be represented by nulls
- Note: this is a change from previous versions of the SDTMIG, which allowed sponsors to define their own conventions for missing values
- When groups of tests are not performed:

Variable   Value                    Example
--TESTCD   --ALL                    LBALL
--TEST     Name of module           Labs Data
--CAT      Name of group of tests   Urinalysis
--ORRES    Null
--STAT     NOT DONE                 NOT DONE
--REASND   If collected             Not collected

- An individual missing --TESTCD will have --STAT = NOT DONE
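The "group of tests not done" pattern can be sketched as a single record builder. This is an illustrative helper under the assumption that records are dictionaries and that null is stored as an empty string. The function name `not_done_group_record` is hypothetical.

```python
def not_done_group_record(domain, usubjid, module_name, category, reason=""):
    """Build one record that stands for a whole group of tests that was not performed."""
    return {
        "DOMAIN": domain,
        "USUBJID": usubjid,
        f"{domain}TESTCD": f"{domain}ALL",  # --ALL, e.g. LBALL
        f"{domain}TEST": module_name,       # name of the module
        f"{domain}CAT": category,           # name of the group of tests
        f"{domain}ORRES": "",               # null: no result was collected
        f"{domain}STAT": "NOT DONE",
        f"{domain}REASND": reason,          # only if a reason was collected
    }


rec = not_done_group_record("LB", "1234", "Labs Data", "Urinalysis", "Not collected")
print(rec)
```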

Multiple Values
- Intervention topic variable
    Example: TYLENOL AND BENADRYL
    Action:  Split
    Result:  CMTRT = TYLENOL; CMTRT = BENADRYL
- Event topic variable
    Example: HEADACHE AND NAUSEA
    Action:  Split
    Result:  AETERM = HEADACHE; AETERM = NAUSEA
- Findings result variable
    Example: ATRIAL FIBRILLATION AND ATRIAL FLUTTER
    Action:  Split
    Result:  EGORRES = ATRIAL FIBRILLATION; EGORRES = ATRIAL FLUTTER
- Non-result qualifier
    Example: AE location, "check all that apply"
    Action:  MULTIPLE, with SUPP--
    Result:  AELOC = MULTIPLE; SUPPAE.QNAM = AELOC1, AELOC2, ... AELOCn
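Both actions can be sketched in code. The helpers below are illustrative only: the names `split_topic` and `multiple_qualifier` are hypothetical, the split on the word AND is deliberately naive (a real term could legitimately contain "AND"), and --SEQ would need to be reassigned after splitting.

```python
import re


def split_topic(record, var):
    """Split a topic or result value joined by AND into one record per value."""
    values = [v.strip() for v in re.split(r"\s+AND\s+", record[var]) if v.strip()]
    return [{**record, var: v} for v in values]


def multiple_qualifier(values, qnam_prefix="AELOC"):
    """Return the parent qualifier value plus any SUPP-- rows for the individual values."""
    if len(values) <= 1:
        return (values[0] if values else ""), []
    supp = [{"QNAM": f"{qnam_prefix}{i}", "QVAL": v} for i, v in enumerate(values, start=1)]
    return "MULTIPLE", supp


print(split_topic({"USUBJID": "1234", "CMTRT": "TYLENOL AND BENADRYL"}, "CMTRT"))
print(multiple_qualifier(["ARM", "LEG"]))
```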

Timing & Timepoint Variables
- --STRF identifies the start of an observation relative to the sponsor-defined reference period
- --ENRF identifies the end of an observation relative to the sponsor-defined reference period
- Reference period: RFSTDTC to RFENDTC
- Values: BEFORE, DURING, AFTER, DURING/AFTER, U (for unknown)
- Timeline: BEFORE | RFSTDTC | DURING | RFENDTC | AFTER, with DURING/AFTER spanning the last two intervals

Timing & Timepoint Variables
When to use --STRF and --ENRF?
1. When the CRF collects relative timing information in lieu of a date
2. Some sponsors may wish to derive --STRF and --ENRF for analysis or reporting purposes even when dates are collected
* Sponsors are cautioned not to use --STRF and --ENRF for both (1) and (2), as this blurs the distinction between collected and derived values within the domain.
** Sponsors wishing to derive these values for reporting purposes are instead encouraged to use supplemental variables or analysis datasets for the derived data.

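Timing & Timepoint Variables
- Represent timing information relative to a specific time point
- --STRTPT (Start Relative to Reference Time Point), with the reference time point given in --STTPT (Start Reference Time Point)
- --ENRTPT (End Relative to Reference Time Point), with the reference time point given in --ENTPT (End Reference Time Point)
- Values relative to the time point: BEFORE, COINCIDENT, AFTER
- Example of a reference time point: date of withdrawal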

Timing & Timepoint Variables

Variable   Reference            Values
--STRF     RFSTDTC to RFENDTC   BEFORE, DURING, DURING/AFTER, AFTER, U
--ENRF     RFSTDTC to RFENDTC   BEFORE, DURING, DURING/AFTER, AFTER, U
--STRTPT   --STTPT              BEFORE, COINCIDENT, AFTER, U
--ENRTPT   --ENTPT              BEFORE, COINCIDENT, AFTER, ONGOING, U

Splitting Domains
- Why split domains?
  - Size restrictions: the dataset exceeds size limitations
  - Ease of use: store topically related observations together
- Considerations when splitting domains:
  - Split by category (--CAT), e.g. LBCAT = HEMATOLOGY, CHEMISTRY
  - Split dataset names can be up to four characters in length, e.g. LBHM, LBCH
  - The value of the DOMAIN variable is consistent across the separate datasets, e.g. DOMAIN = LB in all
  - Variables have the same attributes across the split datasets
  - Permissible variables included in one split dataset need not be included in all split datasets
  - --SEQ must be unique within USUBJID for all records across all the split datasets, and must relate in the same way to SUPPs, CO and RELREC
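The --SEQ and DOMAIN considerations can be checked by treating the split datasets as if they were appended together. The sketch below assumes each split dataset is a list of dictionaries keyed by its dataset name. The function name `check_split` and the sample records are illustrative.

```python
from collections import Counter


def check_split(split_datasets, domain="LB"):
    """Return (records with the wrong DOMAIN, duplicated USUBJID/--SEQ keys)."""
    seq_var = f"{domain}SEQ"
    wrong_domain = []
    keys = Counter()
    for name, records in split_datasets.items():
        for rec in records:
            if rec["DOMAIN"] != domain:
                wrong_domain.append((name, rec["USUBJID"], rec[seq_var]))
            keys[(rec["USUBJID"], rec[seq_var])] += 1
    duplicates = sorted(key for key, n in keys.items() if n > 1)
    return wrong_domain, duplicates


split = {
    "LBHM": [{"DOMAIN": "LB", "USUBJID": "1234", "LBSEQ": 1, "LBCAT": "HEMATOLOGY"}],
    "LBCH": [{"DOMAIN": "LB", "USUBJID": "1234", "LBSEQ": 1, "LBCAT": "CHEMISTRY"}],
}
print(check_split(split))  # LBSEQ 1 is reused across the split datasets
```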

Splitting Domains
[Figure: split datasets for Study 1 and Study 2]
In short, if you append the split datasets together, they should look and behave as if they had never been split.

What we covered
- Naming Conventions
- Subject Identifiers
- Sequence Variable
- Relationships and Linking
- Controlled Terminology
- Date Formats
- Reference Start Date & Study Days
- Handling Text
- Original & Standard Results
- Missing & Multiple Values
- Timing and Timepoints
- Splitting Domains
