NCI/CDISC or User Specified CT

Q: When to specify CT?

CT should be provided for every variable with a finite set of valid values (e.g., the variable AESEV in ADAE can have the values MILD, MODERATE or SEVERE). Even if a variable has only one valid value, it may be advantageous to specify a codelist for validation purposes. Multiple variables can reference the same codelist (e.g., the NY codelist is used for ITTFL, PPROTFL, SAFFL, etc.).

The column "Controlled Terms, Codelist or Format" in the SDTM IG and the column "Codelist / Controlled Terms" in the ADaM IG are generally a good reference when deciding whether a specific variable needs CT.

Note that ISO 8601 is a presentation format rather than CT; therefore no CT reference is specified when compiling the metadata for an ISO 8601 formatted datetime or duration. The stylesheet provided with the Define-XML v2.0 package automatically populates the Controlled Terms/Codelist column with "ISO 8601" where applicable. Also note that SAS-specific formats (e.g., date9. for ADaM numeric date variables) are considered Display Formats rather than CT. This type of metadata is shown in the Length/Display Format column by the stylesheet mentioned above.

Variable | Label | Key | Type | Length / Display Format | Controlled Terms or Format | Source/Derivation/Comment
Race | Race | | text | 20 | <Race> | Predecessor: DM.RACE
VISIT1DT | Date of Visit 1 | | integer | date9. | | Derived: SV.SVSTDTC when SV.VISITNUM=1, converted to SAS date
RFSTDTC | Subject Reference Start Date/Time | | date | | ISO8601 | Predecessor: DM.RFSTDTC

For many variables (e.g., SEX, RACE, AESEV) the valid values are defined in the NCI/CDISC Terminology files. For other variables (e.g., SEXN, PARAMCD, PARAM) it is up to the producer to define the valid values.

Q: What if valid values differ based on different values of another variable?
Different sets of valid values may be applicable for a given variable in a vertically structured dataset (e.g., ADaM BDS datasets) when looking at different subsets of records defined by where clauses (see example below).

Example

USUBJID | PARAMCD | PARAM | AVAL | AVALC
1 | CGI0101 | CGI01-Severity of Illness | 1 | Normal, not at all ill
1 | CGI0102 | CGI01-Global Improvement | 1 | Very much improved

Codelist (CGI0101N) for AVAL WHERE PARAMCD="CGI0101":
0=Not assessed
1=Normal, not at all ill
2=Borderline mentally ill
3=Mildly ill
4=Moderately ill
5=Markedly ill
6=Severely ill
7=Among the most extremely ill patients

Codelist (CGI0102N) for AVAL WHERE PARAMCD="CGI0102":
0=Not assessed
1=Very much improved
2=Much improved
3=Minimally improved
4=No change
5=Minimally worse
6=Much worse
7=Very much worse

Note that the decodes in this example also form the valid list of values for AVALC for the given where conditions. Value-level metadata can be used to show which codelist is associated with AVAL/AVALC based on the values of PARAMCD, as shown in the tables below.

Parameter Value List - ADQSCGI [AVAL]

Variable | Where | Type | Length / Display Format | Controlled Terms or Format | Source/Derivation/Comment
AVAL | PARAMCD = "CGI0101" (CGI01-Severity of Illness) | integer | 8 | CGI0101N | Derived: QS.QSSTRESN where
AVAL | PARAMCD = "CGI0102" (CGI01-Global Improvement) | integer | 8 | CGI0102N | Derived: QS.QSSTRESN where

Parameter Value List - ADQSCGI [AVALC]

Variable | Where | Type | Length / Display Format | Controlled Terms or Format | Source/Derivation/Comment
AVALC | PARAMCD = "CGI0101" (CGI01-Severity of Illness) | text | 40 | CGI0101 | Derived: QS.QSORRES where
AVALC | PARAMCD = "CGI0102" (CGI01-Global Improvement) | text | 40 | CGI0102 | Derived: QS.QSORRES where

Q: What to specify, the full list of applicable values for the specific study or only the values that actually occurred?

All values in the permissible value set for the study should be included, whether they are represented in the submitted data or not (see SDTM IG 3.1.2, Section 4.1.3.3 CONTROLLED TERMINOLOGY VALUES). For example, if it was possible to classify severity as MILD, MODERATE or SEVERE, but only events of mild severity were reported, the full list of possible values, i.e. MILD, MODERATE, SEVERE, would be included in the define.xml file. The same rule applies to ADaM.
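The where-clause logic and the "full permissible set" rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `decode_aval` and `unobserved_values` are hypothetical and are not part of Define-XML or any CDISC tool. Only the codelist names and values come from the example above.

```python
# Hypothetical value-level metadata: (variable, PARAMCD) -> codelist name.
VALUE_LEVEL_CODELISTS = {
    ("AVAL", "CGI0101"): "CGI0101N",
    ("AVAL", "CGI0102"): "CGI0102N",
}

# Codelists copied from the example: coded value -> decode.
CODELISTS = {
    "CGI0101N": {
        0: "Not assessed",
        1: "Normal, not at all ill",
        2: "Borderline mentally ill",
        3: "Mildly ill",
        4: "Moderately ill",
        5: "Markedly ill",
        6: "Severely ill",
        7: "Among the most extremely ill patients",
    },
    "CGI0102N": {
        0: "Not assessed",
        1: "Very much improved",
        2: "Much improved",
        3: "Minimally improved",
        4: "No change",
        5: "Minimally worse",
        6: "Much worse",
        7: "Very much worse",
    },
}


def decode_aval(paramcd, aval):
    """Return the decode (the AVALC value) for AVAL where PARAMCD = paramcd."""
    codelist = CODELISTS[VALUE_LEVEL_CODELISTS[("AVAL", paramcd)]]
    if aval not in codelist:
        raise ValueError(f"AVAL={aval!r} is not valid where PARAMCD={paramcd!r}")
    return codelist[aval]


def unobserved_values(paramcd, observed):
    """Return permissible AVAL values that never occurred in the data.

    These values still belong in the codelist in define.xml.
    """
    codelist = CODELISTS[VALUE_LEVEL_CODELISTS[("AVAL", paramcd)]]
    return sorted(set(codelist) - set(observed))
```

The same AVAL value decodes differently depending on PARAMCD, which is why a single codelist attached to AVAL at the variable level would not be sufficient here.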
Q: Do we include missing values in CT?

A null value should not be included in the permissible value set. A null value is implied for any list of controlled terms unless the variable is Required (see SDTM IG 3.1.2, Section 4.1.3.3 CONTROLLED TERMINOLOGY VALUES). Consequently, null values should not be included in the CT for ADaM either.

Q: When to use Enumerated Items vs. when to use Codelists?

Define-XML provides two types of CT definitions:
Enumerated Item Lists contain a simple list of valid values.
Code Lists provide a decode for each valid value.

Use enumerated item lists if the set of values itself is sufficient for data interpretation (e.g., CT for RACE). Use codelists if decodes for the valid values facilitate data interpretation (e.g., CTs for ARMCD, VISITNUM, PARAMCD, and ADaM numeric variables with suffix N, like SEXN or RACEN, which are decoded to the values of their primary character counterparts).

Age Group [CL.AGEGR1]
<65
65-80
>80

Age Group (N) [CL.AGEGR1N]
1 | <65
2 | 65-80
3 | >80

Q: How to deal with related CTs (e.g., VISIT-VISITNUM, AVISITN-AVISIT, ARMCD-ARM, PARAMCD-PARAM)?

Variables that have related CT, such as AVISIT & AVISITN or PARAMCD & PARAM, should have a separate CT for each related variable, and DECODE should be populated with the valid value of the related variable. For example, in the table below, the value of DECODE for PARAMCD and PARAMN matches the submission value for PARAM.

CTLIST | CTLIST DESCRIPTION | XMLTYPE | SUBMISSION VALUE | DECODE | CT Type
ADVSPARAMCD | Vitals Parameter Code | text | HEIGHTCM | Height (cm) | Codelist
ADVSPARAMN | Vitals Parameter (N) | integer | 1 | Height (cm) | Codelist
ADVSPARAM | Vitals Parameter | text | Height (cm) | | Enumerated Item List

There is no rule that precludes the use of codelists (permitted values + display values) for CT with self-explanatory values. For documentation purposes, however, it is sufficient to list the valid values for variables like AESEV. The display value does not need to be provided.
Severity/Intensity Scale for Adverse Events [CL.AESEV, C66769]
MILD [C41338]
MODERATE [C41339]
SEVERE [C41340]

When needed for other operational purposes, e.g., to facilitate display generation when style guides recommend different casing (mixed case instead of upper case), the valid values could also be provided in a codelist.

Severity/Intensity Scale for Adverse Events [CL.AESEV, C66769]
MILD [C41338] | Mild
MODERATE [C41339] | Moderate
SEVERE [C41340] | Severe

Correspondingly, there is no requirement that decodes in the codelist for the ADaM numeric code variable (e.g., AESEVN) match the spelling/casing of the respective character variable values (e.g., AESEV).

AESEVN [CL.AESEVN]
1 | Mild
2 | Moderate
3 | Severe

Q: Are there any CT naming conventions?

For NCI/CDISC CT, the CodeList Name attribute must exactly match the CodeList Name from the published Controlled Terminology ODM (see Define-XML v2.0, Section 5.3.12). These codelist names should not be used for sponsor-specific CT.

Examples for NCI/CDISC CT

Codelist Reference in SDTM or ADaM Implementation Guide | CodeList Name
NY | No Yes Response
OUT | Outcome of Event
AGEU | Age Unit
DTYPE | Derivation Type

Q: Extended CT?

If an NCI/CDISC CT is defined as extensible by CDISC, sponsor-specific additional values can be added. Before defining an additional value, check whether the proposed value is merely a synonym for an available CT value. If it is not, the sponsor-specific value can be added but must be marked as an extended value in define.xml.
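As a hedged illustration of marking an extended value, the sketch below builds a CodeList element with Python's standard `xml.etree.ElementTree`. It assumes that Define-XML v2.0 flags sponsor-added terms with the attribute `def:ExtendedValue="Yes"` and uses the namespace URIs shown. Verify both against the specification before relying on them. The codelist OID, name and terms are made up for illustration, and `build_codelist` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for ODM v1.3 and Define-XML v2.0.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"
ET.register_namespace("odm", ODM_NS)
ET.register_namespace("def", DEF_NS)


def build_codelist(oid, name, published_values, extended_values):
    """Build a CodeList of EnumeratedItems; sponsor additions are flagged."""
    codelist = ET.Element(
        f"{{{ODM_NS}}}CodeList",
        {"OID": oid, "Name": name, "DataType": "text"},
    )
    for value in published_values:
        ET.SubElement(
            codelist, f"{{{ODM_NS}}}EnumeratedItem", {"CodedValue": value}
        )
    for value in extended_values:
        # Assumption: extended terms carry def:ExtendedValue="Yes".
        ET.SubElement(
            codelist,
            f"{{{ODM_NS}}}EnumeratedItem",
            {"CodedValue": value, f"{{{DEF_NS}}}ExtendedValue": "Yes"},
        )
    return codelist


# Made-up extensible codelist with one sponsor-specific term.
example = build_codelist(
    "CL.EXAMPLE",
    "Example Extensible Codelist",
    published_values=["TERM A", "TERM B"],
    extended_values=["SPONSOR TERM"],
)
print(ET.tostring(example, encoding="unicode"))
```

Keeping published and sponsor-added values in separate inputs makes the synonym check described above an explicit step before a term is placed in `extended_values`.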
Q: How to deal with different subsets of CT?

Some variables in SDTM or ADaM datasets share a common NCI/CDISC CT reference (e.g., the standard "No Yes Response" (NY) CT is referenced for the SDTM variables IESTRESC, --BLFL, etc. and also for the ADaM variables ABLFL, ITTFL, ITTRFL, etc.). However, only a subset of the values defined in the respective NCI/CDISC CT may be applicable for a specific variable (e.g., values N and Y are applicable for ITTFL whereas only Y is applicable for --BLFL or ABLFL).

From a user's perspective, it is useful to know which specific set of values is applicable for a certain variable. The information is also useful for validation. This means that different codelist subsets are needed for variables with different sets of valid values. However, this conflicts with the codelist naming business rule that requires the exact CodeList Name as specified in the published NCI/CDISC Controlled Terminology ODM. Therefore the following convention is suggested:

Assign a unique name to each required subset of a CT. If the CT is defined by NCI/CDISC, the following naming convention is suggested: <Codelist name as published in the NCI/CDISC ODM><unique subset identifier suffix>. The subset identifier suffix distinguishes between different subsets of CT applicable for different variables.

Example showing two subsets of the CDISC No Yes Response CT

CodeList | CodeList Name | CodeList Coded Value | CodeList Decode
CL.NY | No Yes Response | N | No
CL.NY | No Yes Response | Y | Yes
CL.NY_Y | No Yes Response (Subset for 'Y') | Y | Yes

When the define.xml includes both the full CDISC CT definition as defined in the NCI EVS source and one or more CT subsets, a subset identifier is added to the CodeList Name for each included CT subset. When the only reference to a CT in a define.xml file is to a subset, the codelist name does not need to include the suffix.

Q: Where to specify which NCI/CDISC CT version was used for the trial?
Define-XML v2.0 does not include a specific element or attribute for the NCI/CDISC CT version used in a given study. The NCI/CDISC CT version should therefore be stated in the study data reviewer's guide for SDTM and in the analysis data reviewer's guide for ADaM.

External CT

Q: Which External Codelists are commonly included in define.xml files for submissions?

References to the MedDRA dictionary for coding of adverse events and medical history and references to the WHO Drug dictionary for coding of concomitant medications are commonly used.

Q: Are there any naming conventions for external codelists?

There are no naming conventions for external codelist references.

Q: Is there any choice when to refer to an external codelist and when to specify a trial specific one?
In the SDTM define.xml example published with the Define-XML v2.0 package, the external codelist ISO3166 is referenced for the SDTM variable COUNTRY in the domain DM.

Variable | Label | Key | Type | Length | Controlled Terms or Format | Origin | Derivation/Comment
COUNTRY | Country | | text | 3 | ISO3166 | Assigned |

External Dictionaries

Reference Name | External Dictionary | Dictionary Version
ISO3166 (CL.ISO3166) | ISO3166 |

An alternative, particularly for trials conducted in only a small number of countries, would be to specify a codelist reference with the name Country (NCI C-Code: C66786) and to provide the individual country values applicable for the trial in a codelist. This would allow the reviewer to see the countries in which the trial was conducted.

Variable | Label | Key | Type | Length | Controlled Terms or Format | Origin | Derivation/Comment
COUNTRY | Country | | text | 3 | <Country> [DEU = Germany, USA = United States] | Assigned |

Country [CL.COUNTRY, C66786]
DEU [C16636] | Germany
USA [C17234] | United States

Q: For which variables do we generally specify an external codelist reference?

All variables that are derived from different hierarchy levels of external dictionaries should reference the respective dictionary. Since a data type must be specified for every external dictionary reference, it may be necessary to create two different references to the same dictionary version, i.e. one for the character variables and one for the numeric variables.
Variable | Label | Type | Controlled Terms or Format
AELLT | Lowest Level Term | text | Adverse Event Dictionary
AELLTCD | Lowest Level Term Code | integer | Adverse Event Dictionary (numeric codes)
AEDECOD | Dictionary-Derived Term | text | Adverse Event Dictionary
AEPTCD | Preferred Term Code | integer | Adverse Event Dictionary (numeric codes)
AEHLT | High Level Term | text | Adverse Event Dictionary
AEHLTCD | High Level Term Code | integer | Adverse Event Dictionary (numeric codes)
AEHLGT | High Level Group Term | text | Adverse Event Dictionary
AEHLGTCD | High Level Group Term Code | integer | Adverse Event Dictionary (numeric codes)
AEBODSYS | Body System or Organ Class | text | Adverse Event Dictionary
AEBDSYCD | Body System or Organ Class Code | integer | Adverse Event Dictionary (numeric codes)
AESOC | Primary System Organ Class | text | Adverse Event Dictionary
AESOCCD | Primary System Organ Class Code | integer | Adverse Event Dictionary (numeric codes)
External Dictionaries

Reference Name | External Dictionary | Dictionary Version
Adverse Event Dictionary (CL.AEDICT) | MedDRA | 12.0
Adverse Event Dictionary (numeric codes) (CL.AEDICT_N) | MedDRA | 12.0

Q: What should be specified if multiple different versions of the same dictionary were used, e.g. in integrated analyses?

Create a unique reference name for every specific version of the external dictionary used in the datasets. A use case might be when data from multiple studies using different versions of a coding dictionary are pooled for integrated analyses. The analyses will be done on the primary variables (e.g., AEDECOD, AEBODSYS), which are mapped to a common dictionary version. Variables like DECDORG1 and BDSYORG1 provide traceability to original (or prior) analyses.

Variable | Label | Type | Controlled Terms or Format
AEDECOD | Dictionary-Derived Term | text | Adverse Event Dictionary
AEBODSYS | Body System or Organ Class | text | Adverse Event Dictionary
DECDORG1 | PT in Original Dictionary 1 | text | Prior Adverse Event Dictionary
BDSYORG1 | SOC in Original Dictionary 1 | text | Prior Adverse Event Dictionary

External Dictionaries

Reference Name | External Dictionary | Dictionary Version
Adverse Event Dictionary (CL.AEDICT) | MedDRA | 14.1
Prior Adverse Event Dictionary (CL.AEDICTP) | MedDRA | 12.0
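The pooled-analysis example above can be expressed as a small lookup, shown here as a minimal Python sketch. The `ExternalDictionary` class, the mapping `VARIABLE_REFERENCES` and the two helper functions are hypothetical names introduced for illustration. The reference names, OIDs and versions are taken from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalDictionary:
    oid: str
    reference_name: str
    dictionary: str
    version: str


# One uniquely named reference per dictionary version used in the datasets.
DICTIONARIES = [
    ExternalDictionary("CL.AEDICT", "Adverse Event Dictionary", "MedDRA", "14.1"),
    ExternalDictionary(
        "CL.AEDICTP", "Prior Adverse Event Dictionary", "MedDRA", "12.0"
    ),
]

# Variable -> OID of the external dictionary reference it uses.
VARIABLE_REFERENCES = {
    "AEDECOD": "CL.AEDICT",
    "AEBODSYS": "CL.AEDICT",
    "DECDORG1": "CL.AEDICTP",
    "BDSYORG1": "CL.AEDICTP",
}


def reference_names_are_unique(dictionaries):
    """Check that no two references share a reference name or an OID."""
    names = [d.reference_name for d in dictionaries]
    oids = [d.oid for d in dictionaries]
    return len(set(names)) == len(names) and len(set(oids)) == len(oids)


def dictionary_version_for(variable):
    """Return (dictionary, version) for the reference used by a variable."""
    by_oid = {d.oid: d for d in DICTIONARIES}
    ref = by_oid[VARIABLE_REFERENCES[variable]]
    return ref.dictionary, ref.version
```

With this layout, `dictionary_version_for("AEDECOD")` resolves to the common version used for analysis, while `dictionary_version_for("DECDORG1")` resolves to the prior version kept for traceability.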