THE DATA DETECTIVE: HINTS AND TIPS FOR INDEPENDENT PROGRAMMING QC. PhUSE. Presented by Bethan Thomas


THE DATA DETECTIVE: HINTS AND TIPS FOR INDEPENDENT PROGRAMMING QC
PhUSE 2016
Presented by Bethan Thomas

What this presentation will cover
And what this presentation will not cover
- What is a data detective?
- Writing a validation program
- Maintaining program independence
- Identifying data discrepancies
- Helpful hints and tips
- Other considerations
- QC process
- Complete step-by-step guide

What is a Data Detective?
And what is independent double programming?
- Being a programmer often feels like you're a detective
  - Solving problems
  - Identifying root causes
- Independent double programming
  - Two programmers, one aim
  - A method of thoroughly checking outputs and achieving high quality
- When those two programmers differ
  - You have two suspects!
  - A detective is needed to solve the mystery!

Writing an independent program
And maintaining that independence
- Make no assumptions
  - The man in the balaclava may not have robbed the bank!
  - The man in the smart suit may not be innocent!
- Create a safety net, but don't duplicate work
- Ensure great programming practice
- Use all relevant documentation
  - Familiarise yourself with the Protocol, CRF, SAP and IGs
  - Refer to them regularly and whenever in doubt
- Maintain independence
  - Be an unbiased detective
  - Do not view each other's programs; use %INCLUDE
  - Discuss, don't dictate

Detecting Data Discrepancies 1
Differing order variables
- Matching numbers of observations
- No obvious pattern of mismatching observations
- Mismatching on most variables
- Both programmers to check key variables

Detecting Data Discrepancies 2
Differing order variables or order variables differing?
- Possibly due to differing order variables
  - E.g. one is using AVISITN, the other ADT or VISITNUM
  - E.g. one is using PARAM, the other is using PARAMCD
- Possibly differing values of order variables
  - E.g. VISITNUM numbered differently for unscheduled visits
- Pattern or pairing in mismatching rows
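A minimal sketch of ruling out differing order variables before comparing. It assumes hypothetical WORK datasets named main and qc that both contain USUBJID, PARAMCD and AVISITN; the key list is an assumption and should follow the sort order given in the dataset specification.

```sas
/* Sort copies of both datasets by the same agreed keys so that    */
/* PROC COMPARE pairs equivalent records. OUT= leaves the original */
/* datasets untouched.                                             */
proc sort data=main out=main_srt;
  by usubjid paramcd avisitn;
run;

proc sort data=qc out=qc_srt;
  by usubjid paramcd avisitn;
run;

/* With the keys as ID variables, records whose key values exist   */
/* in only one dataset are listed by key value rather than by      */
/* observation number.                                             */
proc compare base=main_srt compare=qc_srt listall;
  id usubjid paramcd avisitn;
run;
```

If the chosen keys do not uniquely identify a record, PROC COMPARE writes a message about duplicate ID values to the log, which is itself a hint that the key list needs another variable.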

Detecting Data Discrepancies 3
Using source data and documentation 1
The aim of independent double programming is not simply for the data to match but for it to be correct. Data should be an accurate reflection of the source and conform to the necessary formats.
Example 1: AVISIT mapping of unscheduled visits when the ADPE specification states, "Populate for scheduled assessments."
- The datasets are identical except that QC has populated AVISIT with "Visit 3", whereas the primary dataset has AVISIT set to null in the equivalent records.
- Reference the schedule of assessments. Visit 3 is scheduled; however, a Physical Examination is not planned at this visit.
- The validation programmer populated AVISIT in all cases unless the value of VISITNUM indicated an unscheduled visit (e.g. VISITNUM=4.01). The primary programmer only populated it where a Physical Exam was specifically scheduled.
- Check the SAP to see if it provides more detail on how it classifies unscheduled visits and how they should be handled for analysis.
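One way to make a pattern like the one in Example 1 visible is to cross-tabulate the visit variables in each dataset. The sketch below assumes hypothetical WORK datasets main and qc holding the primary and QC versions of ADPE, each containing VISITNUM and AVISIT.

```sas
/* LIST prints one row per VISITNUM/AVISIT combination and MISSING */
/* treats a null AVISIT as its own level, so visits where AVISIT   */
/* was left blank show up in the table.                            */
proc freq data=main;
  tables visitnum*avisit / list missing;
  title 'Primary: VISITNUM by AVISIT';
run;

proc freq data=qc;
  tables visitnum*avisit / list missing;
  title 'QC: VISITNUM by AVISIT';
run;

title;
```

Reading the two tables side by side shows which VISITNUM values each programmer mapped to an analysis visit; the schedule of assessments and the SAP then decide which mapping is correct.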

Detecting Data Discrepancies 3
Using source data and documentation 2
An example from SDTM. The snapshots below come from the Main and QC datasets for a Biospecimen Events (BE) domain. Gene Expression on 8th January is in the Main dataset but is not present in QC.

Detecting Data Discrepancies 3
Using source data and documentation 2
- Refer to raw data
- Refer to CRF

Inside the detective's toolkit
The FREQ procedure
When observation counts differ, it can be difficult to know where to start looking. Calculate frequencies by one or more variables and use those variables in the ID statement of the PROC COMPARE.
Good choices:
- Test/parameter
- Visit/timepoint
- Subject
- DTYPE/PARAMTYP
- Grouping qualifiers
Poor choices:
- Sequence number
- Date/day
- Flags
- Free text
- Results
Add further BY variables or subset the data to narrow down to an issue that can be investigated. This is also useful for checking mappings of coded variables.
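The counting approach described above can be sketched as follows. The dataset names main and qc and the choice of PARAMCD and AVISIT as counting variables are assumptions for illustration; any of the "good choices" could be substituted.

```sas
/* Count records per parameter and visit in each dataset. MISSING  */
/* keeps null levels as their own rows and DROP= removes PERCENT,  */
/* which would differ whenever the totals differ.                  */
proc freq data=main noprint;
  tables paramcd*avisit / missing out=main_cnt (drop=percent);
run;

proc freq data=qc noprint;
  tables paramcd*avisit / missing out=qc_cnt (drop=percent);
run;

/* Compare the counts, matching rows on the counting variables.    */
/* LISTALL reports combinations present in only one dataset.       */
proc compare base=main_cnt compare=qc_cnt listall;
  id paramcd avisit;
run;
```

A difference in COUNT for one PARAMCD/AVISIT combination narrows the search to that subset, which can then be compared again with subject added to the TABLES request.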

A QC Program and a program to QC
Keep it separate
There are lots of programmatic ways of identifying discrepancies and their causes:
- Subset (variables or records)
- Re-sort
- Calculate frequencies
- Modify data
Keep these in a separate program. Use temporary datasets; do not overwrite data.

The LISTALL option
And a warning about ID variables
- The LISTALL option lists observations or variables present in one dataset but not the other, as well as comparing observations present in both datasets.
- Coupled with ID variables, this is particularly helpful. E.g. when comparing counts by PARAMCD, the output might state that PARAMCD='SYSBP' is only found in the Main dataset.
- If the LISTALL option is used without ID variables, the output would simply state that the last observation is found in the Main dataset only, and would not point to the specific parameter.
- If ID is used without LISTALL, the following misleading output can appear:

SAS Shortcut Keys
As the same techniques can be used for QC-ing any kind of dataset, you can save time by creating a SAS shortcut or abbreviation. This is very straightforward but differs depending on the version of SAS; check SAS help for details.
- In SAS Enterprise Guide, go to the Program menu and into Editor Macros.
- In older versions of SAS, this can be found in the Tools menu under Keyboard Macros.

Template (complete the dataset names after adam. and qadam., and uncomment statements as needed):

    data main;
      set adam.;
      subject = scan(usubjid,2,'-') || '-' || scan(usubjid,3,'-');
      * if;
      * where;
      * keep;
      * drop;
    run;

    data qc;
      set qadam.;
      subject = scan(usubjid,2,'-') || '-' || scan(usubjid,3,'-');
      * if;
      * where;
      * keep;
      * drop;
    run;

    /*proc sort data=main;*/
    /*  by ;*/
    /*run;*/
    /*proc freq data=main noprint;*/
    /*  table subject / out=main (drop=percent);*/
    /*run;*/

    /*proc sort data=qc;*/
    /*  by ;*/
    /*run;*/
    /*proc freq data=qc noprint;*/
    /*  table subject / out=qc (drop=percent);*/
    /*run;*/

    proc compare base=main compare=qc listall;
    /*  id subject ;*/
    run;

Some final hints and tips
- Remember to check visually for obvious anomalies; this also guards against the rare cases where both programmers make identical mistakes. Check for:
  - Truncation
  - Missing data
  - Implausible values
  - Incorrect mapping from source
- Use the relevant validation checkers.
- Compile a QC checklist covering the tasks and checks required for each output type (SDTM, ADaM, TFL) to ensure thoroughness and consistency. Add study-specific checks to the list if necessary.
- Continually refer to documentation (Protocol, CRF, SAP, shells, CDISC documentation).
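As an aid to the visual truncation check, the sketch below flags character values that completely fill their variable's defined length. It assumes a hypothetical WORK dataset named main; the helper names _var and _i are illustrative, and ignoring variables of length 1 (typically flags) is an assumption to reduce noise.

```sas
/* Write a log message for every character value whose length     */
/* equals the defined length of its variable. Such values are      */
/* candidates for truncation and should be checked against source. */
data _null_;
  set main;
  array chars {*} _character_;   /* all character variables in main */
  length _var $32;               /* declared after the array, so it */
                                 /* is not itself checked           */
  do _i = 1 to dim(chars);
    if vlength(chars{_i}) > 1
       and lengthn(chars{_i}) = vlength(chars{_i}) then do;
      _var = vname(chars{_i});
      put 'Possible truncation: obs=' _n_ 'variable=' _var;
    end;
  end;
run;
```

A value that exactly fills its variable is not necessarily truncated, so each message is a prompt to look at the source data rather than a confirmed finding.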