

PharmaSUG 2017 - Paper DS24

ADQRS: Basic Principles for Building Questionnaire, Rating and Scale Datasets

Nancy Brucken, inVentiv Health, Ann Arbor, MI
Karin LaPann, Shire, Lexington, MA

ABSTRACT

Questionnaires, ratings and scales (QRS) are frequently used as primary and secondary analysis endpoints in clinical trials. The Submission Data Standards (SDS) QRS sub-team has compiled a considerable library of SDTM supplements defining standards for the collection and storage of QRS data. The ADaM ADQRS sub-team has been formed to develop addenda to these supplements, which will define standards for corresponding analysis datasets. This paper represents the current thinking of the ADQRS sub-team regarding basic principles for building QRS analysis datasets.

INTRODUCTION

The CDISC website provides the following definition of a questionnaire: "Questionnaires are named, stand-alone measures designed to provide an assessment of a concept. Questionnaires have a defined standard structure, format, and content; consist of conceptually related items that are typically scored; and have documented methods for administration and analysis. Questionnaires consist of defined questions with a defined set of potential answers. Most often, questionnaires have as their primary purpose the generation of a quantitative statistic to assess a qualitative concept." So, from an analysis viewpoint, questionnaires are a set of related questions that are generally combined in some manner to produce one or more scores, which are then summarized.

Ratings are a classification or ranking of someone or something based on a comparative assessment of their quality, standard, or performance. Functional tests are an example of ratings, and have been separated from questionnaires by the QRS team into the FT domain. The same is true for scales, which are defined by a set of criteria that make up a single measurement, such as clinical classifications.
Clinical Classifications instruments are also stored separately from questionnaires, in the Disease Response and Clinical Classification (RS) domain. In this sub-team, we are concentrating primarily on questionnaires in the QS domain, since their analysis is generally more complex.

Over the past several years, more than 100 different SDTM QRS supplements have been created to provide standards for the collection and storage of responses from questionnaires, ratings and scales. These supplements contain sample Case Report Forms (CRFs) for use in collecting the data, and controlled terminology for the QSTEST and QSTESTCD values representing the data in the SDTM QS domain. However, they stop short of describing corresponding datasets to be used for summarization and analysis of this data. The ADaM ADQRS sub-team was formed to carry this work forward, by developing basic standards for QRS analysis datasets that could be used by a variety of analysis methods.

ADQRS DATASET BASIC PRINCIPLES

During the process of developing the first ADaM QRS addendum, the ADQRS team has settled on some basic principles for analysis dataset structure and naming conventions. Even if you are creating analysis datasets for questionnaires, ratings or scales that are not covered by current QRS standards, following these conventions will help ensure that your datasets comply with CDISC standards. In all cases, the intent is to provide traceability from the analysis datasets back to their corresponding records in the SDTM QS domain.

DATASET NAMING CONVENTIONS

ADaM dataset names are currently limited to a maximum length of 8 characters, and must begin with AD, thus leaving only 6 characters available for the actual dataset name. We therefore recommend excluding QS, RS and FT from the dataset name, in order to gain 2 more characters, and using the remaining characters to hold an abbreviated version of the QRS name. For example, the QRS addendum for the Geriatric Depression Scale (Short Form) recommends using ADGDSSF for the dataset name. If there is concern that this practice might make it more difficult to distinguish ADQRS datasets from other analysis datasets, the dataset label may be used for this purpose; the QS origin may also be documented in define.xml.

This convention also implies that the QS domain should be split into separate analysis datasets for each individual questionnaire, rating or scale. These instruments are generally analyzed separately, and the associated datasets may be rather large. In addition, scoring calculations may vary widely for different questionnaires, so separating them simplifies the programming logic and makes them easier to use.

DATASET VARIABLE CONVENTIONS

In the QS domain, QSCAT usually contains the most commonly used acronym of the actual questionnaire, rating or scale. We recommend storing that value in PARCAT1 in the corresponding ADaM dataset, since per ADaM standards, PARCAT1 is a categorization of PARAM. While PARCAT1 will thus be identical on all records in the dataset, its presence will be helpful if multiple QRS analysis datasets are subsequently combined. Similarly, we recommend that PARCAT2 be used to hold the associated questionnaire sub-scale name, if it exists. This practice allows for easier calculation of sub-scale scores, as well as the ability to quickly subset to only questions associated with a particular sub-scale. PARCAT2 can be null on some records if a question is not part of a sub-scale. For maximum traceability back to QS, QSTEST and QSTESTCD values should be copied directly into PARAM and PARAMCD, respectively.
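As a minimal sketch of these variable conventions (illustrative Python, not part of the addendum; the `qs_to_adam` helper, the sub-scale lookup, and all example values are invented, not actual controlled terminology):

```python
# Hypothetical sub-scale lookup: maps QSTESTCD to a sub-scale name, or None
# when a question is not part of a sub-scale (so PARCAT2 stays null).
SUBSCALE = {"GDS0201": None, "GDS0203": "Dysphoria"}

def qs_to_adam(qs: dict) -> dict:
    """Map one simplified SDTM QS record to the ADaM variable conventions."""
    return {
        "PARAM":   qs["QSTEST"],     # copied directly for traceability
        "PARAMCD": qs["QSTESTCD"],   # copied directly for traceability
        "PARCAT1": qs["QSCAT"],      # instrument acronym, same on all records
        "PARCAT2": SUBSCALE.get(qs["QSTESTCD"]),  # null if no sub-scale
    }

row = qs_to_adam({"QSTEST": "GDS02-Example Question 1",
                  "QSTESTCD": "GDS0201",
                  "QSCAT": "GDS SHORT FORM"})
```

In a real study the QSTEST/QSTESTCD values would come from the instrument's published controlled terminology rather than the invented strings above.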
QSSTRESN should also be copied over directly from QS into the analysis dataset, so that any changes to AVAL as a result of imputation or recoding can be easily compared back to the original value. Note that in the SDTM standard for QS, QSORRES contains the original text string of the answer, while QSSTRESC contains the character representation of QSSTRESN (e.g., when QSSTRESN = 2, then QSSTRESC = "2"). It is not necessary to copy QSSTRESC into AVALC, as it is unlikely that the character representation of the numeric code will be used for analysis or reporting. Instead, we suggest storing the value of QSORRES in AVALC, but only if it is needed for analysis. In any case, it may be helpful for traceability to bring QSORRES itself into the analysis dataset.

DATASET STRUCTURE CONVENTIONS

If sub-scale and/or total scores are included in QS because they were provided with the raw data as captured data, and are subsequently recalculated in the analysis dataset because of missing value imputation, or for any other reason, we recommend keeping the original QS records in the analysis dataset, with QSTEST and QSTESTCD copied into PARAM and PARAMCD. Separate records should then be created for the calculated scores, using the same QS QSTESTCD root for PARAMCD, plus additional characters representing the specific sub-scale or total score. For example, the individual GDSSF questions have QSTESTCD/PARAMCD values beginning with GDS02, and the total score, if provided in QS, would have QSTESTCD = "GDS0216". The calculated GDSSF total score has a PARAMCD value of GDS02TOT. This makes it easier to distinguish between collected and recalculated sub-scale and total scores, and allows for comparison between the two versions. ABLFL (analysis baseline flag) and ANLzzFL (analysis flags) should be set only on sub-scale and total score records, unless individual question records are analyzed separately.
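The structure convention above can be sketched as follows (illustrative Python, not the addendum's implementation; records are simplified to dicts, the scoring is a plain sum with no imputation, and all values are invented):

```python
# Simplified SDTM QS rows for one subject/visit: individual questions plus
# a collected total score (QSTESTCD = "GDS0216") provided with the raw data.
qs_rows = [
    {"QSTESTCD": "GDS0201", "QSSTRESN": 1},
    {"QSTESTCD": "GDS0202", "QSSTRESN": 0},
    {"QSTESTCD": "GDS0216", "QSSTRESN": 2},
]

# Keep every original QS record, copying QSTESTCD into PARAMCD.
adam_rows = [{"PARAMCD": r["QSTESTCD"], "AVAL": r["QSSTRESN"]} for r in qs_rows]

# Add a separate record for the recalculated total, built from the same
# GDS02 root plus a suffix, so the collected (GDS0216) and recalculated
# (GDS02TOT) totals remain distinguishable and comparable.
recalc_total = sum(r["QSSTRESN"] for r in qs_rows
                   if r["QSTESTCD"] != "GDS0216")
adam_rows.append({"PARAMCD": "GDS02TOT", "AVAL": recalc_total})
```

Under the flag conventions above, ABLFL and any ANLzzFL flags would then be populated only on the GDS02TOT record, not on the individual question records.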
Setting flags only on sub-scale and total score records helps avoid cluttering the dataset with unnecessary flag values which won't be used for analysis. Additionally, CHG and BASE should only be computed for records that will be analyzed, and not for every individual question.

ADDITIONAL CONSIDERATIONS

Whether to include responses to all questions in the ADQRS datasets, or only the summary records being analyzed, should be decided on a per-questionnaire basis, so this information will be included in the ADQRS addenda for the individual questionnaires. However, regulatory agencies have expressed a preference for including both the individual questions and summary records in the same dataset. They recalculate all of the scores to compare against values provided by the sponsor as part of their review process, and it is easier for them if both individual questions and summary records are all stored in the same dataset. However, large datasets may take longer to process. A compromise is to create the full dataset, containing all individual and summary records, and then pull off only the summary records into a separate dataset, which is then used as input to the analysis and reporting programs. All questionnaire scoring instructions should be documented in define.xml, along with all imputation rules used for handling of missing values.

ADQRS DATASET EXAMPLES

Here is a partial example of the variable-level metadata being proposed for the GDSSF analysis dataset:

Variable | Label | Controlled Terms or Format | Source/Derivation
STUDYID | Study Identifier | | QS.STUDYID
USUBJID | Unique Subject Identifier | | QS.USUBJID
SITEID | Site Identifier | | ADSL.SITEID
ITTFL | Intent-To-Treat Population Flag | Y, N | ADSL.ITTFL. Conditional; must include at least one population flag. Derivation should be defined by the sponsor according to analysis needs.
TRTP | Planned Treatment | | Conditional; should include at least one treatment variable, i.e. TRTP, TRTA, TRTSEQP or TRT01P.
PARAM | Parameter | SDTM codelist C106664 plus "GDS02-Total Score" | QS.QSTEST for responses to individual questions copied in from SDTM. Set PARAM to 'GDS02-Total Score' for the derived total score.
PARAMCD | Parameter Code | SDTM codelist C106665 plus GDS02TOT | QS.QSTESTCD for responses to individual questions copied in from SDTM. Set PARAMCD to 'GDS02TOT' for the derived total score.
PARAMN | Parameter Number | | Number the questions in the order they appear on the form, and assign a higher value to the derived total score record.
PARCAT1 | Parameter Category 1 | GDS SHORT FORM | Set to QS.QSCAT for records that come directly from SDTM, and to GDS SHORT FORM for any derived records (i.e. Total Scores, Last Observed Value (LOV)).
VISIT | Visit Name | | QS.VISIT. Recommended for traceability.
VISITNUM | Visit Number | | QS.VISITNUM. Recommended for traceability.
AVISIT | Analysis Visit | | AVISIT represents the analysis visit for the record. Refer to the analysis plan for additional details.
AVISITN | Analysis Visit (N) | | A numeric representation of AVISIT.
QSDTC | Date/Time of Finding | ISO 8601 | QS.QSDTC. Recommended for traceability.
ADT | Analysis Date of Finding | DATE9. | Date portion of QS.QSDTC converted to numeric and displayed in a format such as DATE9. For derived total score records where PARAMCD="GDS02TOT", this information will just be copied over from the individual records that make up that questionnaire.
ADY | Analysis Relative Day | | Example derivation: ADY = ADT - ADSL.TRTSDT + 1 if ADT >= TRTSDT; else ADY = ADT - ADSL.TRTSDT if ADT < TRTSDT.
QSORRES | Finding in Original Units | | QS.QSORRES
AVAL | Analysis Value | | See Value Level Metadata in Section 8.2.3.
AVALCAT1 | Analysis Value Category 1 | Normal, Possible Depression, Probable Depression | Populate on derived total score records (where PARAMCD="GDS02TOT"): if 0<=AVAL<=5 then AVALCAT1="Normal"; else if 6<=AVAL<=9 then AVALCAT1="Possible Depression"; else if 10<=AVAL<=15 then AVALCAT1="Probable Depression". Source: categories described on the GDSSF author's website, http://web.stanford.edu/~yesavage/gds.html.
BASE | Baseline Value | | BASE=AVAL where ABLFL='Y'. Only the derived total score records where PARAMCD="GDS02TOT" should have BASE populated.
CHG | Change from Baseline | | Only populate for post-baseline total score records (where PARAMCD="GDS02TOT" and ADY>1): CHG=AVAL-BASE.
DTYPE | Derivation Type | AVERAGE, LOV, MOV | For PARAMCD="GDS02TOT": if the published scoring instructions are followed and 1-5 responses are missing on the questionnaire, populate DTYPE with 'AVERAGE'. Other missing value imputations for individual questions are shown in the example with DTYPE='MOV', and imputations for a completely missing total score are indicated with DTYPE='LOV'. Otherwise leave null.
ABLFL | Analysis Baseline Record Flag | Y | ABLFL='Y' on the record defined as the baseline value. This flag will only be populated on the derived total score records (PARAMCD="GDS02TOT").
ANL01FL | Analysis Record Flag 01 | Y | Set ANL01FL='Y' for scheduled visits (VISIT is not of the form UNSCHEDULED XXX.XX) associated with the derived total score records (PARAMCD="GDS02TOT"). Permissible and useful for selecting records that appear in tables.
QSSEQ | Sequence Number | | QS.QSSEQ. QSSEQ is included to provide traceability back to SDTM.
INVNAM | Investigator Name | | ADSL.INVNAM. Permissible if needed for analysis.

Table 1. Sample Variable Metadata

Note that in this dataset, we are keeping both the individual question and total score records. AVALC is not included, as the individual question responses do not have a corresponding text value. DTYPE indicates the type of imputation used for missing values, thus allowing for easy identification of imputed responses. VISITNUM, QSDTC, QSORRES and QSSEQ have all been copied directly from the original SDTM records to provide traceability.

Another important aspect of questionnaire data is the need for detailed parameter value-level metadata (VLM). Most questionnaires contain one or more derived parameters. These need to be clearly defined when mapping from SDTM to ADaM, and the derivations for derived scores can be listed in greater detail in a VLM tab on the ADaM specification. Here is an example being proposed for the GDSSF analysis dataset:

Variable | Where | Source/Derivation
AVAL | PARAMCD ne "GDS02TOT" | QS.QSSTRESN (value is the result copied in from SDTM)
AVAL | PARAMCD = "GDS02TOT" (GDS02-Total Score) | 1) The sum of QS.QSSTRESN for QS.QSTESTCD in ('GDS0201'-'GDS0215') if no questions are missing. 2) If <=5 responses are missing on a questionnaire, then AVAL is 15*(sum of QS.QSSTRESN for QS.QSTESTCD in ('GDS0201'-'GDS0215') / number of non-missing questions for QS.QSTESTCD in ('GDS0201'-'GDS0215')). If the result is a fraction, round up to the nearest integer. 3) If more than 5 responses are missing, do not calculate a total score, and do not create a total score record.

Table 2. Sample Parameter Value-Level Metadata

Complex questionnaires such as the Columbia Suicide Severity Rating Scale (CSSRS) provide additional challenges for ADaM. This scale is collected with different data entry screens for baseline, 1 month, 6 months, and lifetime. The QSTEST/QSTESTCD values for these panels have different prefixes for each time point, and will need to be standardized to enable analysis across time points. Thus, in this case, individual questions should be mapped to new PARAMCD/PARAM values.
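One way to implement such a re-mapping is a lookup keyed on QSTESTCD (an illustrative Python sketch; the code-to-time-point pairs below are invented for illustration and would need to be taken from the instrument's SDTM supplement in practice):

```python
# Hypothetical mapping from time-point-specific QSTESTCD values to a single
# analysis parameter, carrying the time point in ATPT.
TIMEPOINT_MAP = {
    "CSS0102": ("NONSPEC", "Baseline"),
    "CSS0202": ("NONSPEC", "Month 1"),
}

def standardize(qstestcd: str, qsstresn: float) -> dict:
    """Collapse a time-point-specific question onto one PARAMCD/ATPT pair."""
    paramcd, atpt = TIMEPOINT_MAP[qstestcd]
    # QSTESTCD is kept on the record for traceability back to SDTM.
    return {"PARAMCD": paramcd, "ATPT": atpt,
            "QSTESTCD": qstestcd, "AVAL": qsstresn}
```

With such a lookup, records collected on different panels land under one parameter and can be analyzed across time points.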
For example, Non-Specific Suicidal Thought will have a different QSTESTCD value at each time point, as follows: QS.CSS0102, QS.CSS0202, QS.CSS0302A, QS.CSS1002A, QS.CSS1002B, QS.CSS0302B. These can be aggregated under PARAMCD = "NONSPEC", with the different time points identified in ATPT (Baseline, Month 1, Month 6, Lifetime). QSTESTCD can be kept in the dataset as well to maintain traceability back to SDTM. The same logic applies to PARAM as well.

FUTURE PLANS

In order to help with the creation of future ADaM addenda, we are planning to create a document template for use in their development. We will also be developing an ADaM version of the current SDTM CDISC Operational Procedure 017 (COP 017), to enable teams to create their own ADaM addenda for ADQRS datasets and submit them for CDISC approval. In addition, controlled terminology for the ADaM PARAM and PARAMCD values will be registered with the NCI team.

CONCLUSION

This paper has described some basic principles and approaches for developing analysis datasets for use with data from questionnaires, ratings and scales. Following these recommendations will help ensure that your analysis datasets conform to CDISC standards, in addition to providing traceability between the analysis datasets and the QS domain.

REFERENCES

CDISC Questionnaires, Ratings and Scales Web Page. 2017. Available at https://www.cdisc.org/foundational/qrs.

ACKNOWLEDGMENTS

The authors wish to thank the ADaM ADQRS sub-team for their valuable suggestions regarding this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Nancy Brucken
inVentiv Health
Nancy.Brucken@inventivhealth.com

Karin LaPann
Shire
KLaPann0@Shire.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.