OpenCDISC Validator 1.4: What's New? Bay Area CDISC Implementation Network, 23 May 2013. David Borbas, Sr. Director, Data Management, Jazz Pharmaceuticals, Inc.
Disclaimers: The opinions expressed in this presentation are not the official views or policies of Jazz Pharmaceuticals, Inc.
Topics: What is OpenCDISC? Why is it important? How the FDA uses OpenCDISC Validator. Using the application. Understanding the validation reports.
OpenCDISC: An open-source application built with Java and XML. It can check (validate) SDTM, ADaM, SEND, and Define.xml files, and it has limited support for generating Define.xml. Its checks are based on published standards, e.g., the draft Janus validation rules, plus user-suggested rules. It is currently used in the FDA submission checking process.
FDA Review: Past, Present & Future (diagram of the FDA review environment). Sponsor data in eCTD format for an NDA or sNDA go to the FDA Electronic Document Room servers. Sponsor validation review: past, WebSDM; present, OpenCDISC Validator; future, 2013? FDA review tools: WebSDM/Empirica, JReview, SAS/JMP, OpenCDISC Validator, Data Fit, and JANUS data views (future). Future: additional data visualization tools, with JMP Clinical, ireview, and Spotfire under evaluation.
OpenCDISC Application: Website: http://www.opencdisc.org/. PC specs: minimum 1 GB RAM; any Java-capable PC; Java version 1.5 or later is required (run the check-java.bat file to confirm your version). A command-line option is available: http://www.opencdisc.org/projects/validator/usingopencdisc-validator
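The Java requirement above (1.5 or later) can also be checked from a script. The sketch below is a hypothetical Python equivalent of that check, not the contents of check-java.bat; it assumes you have already captured the text printed by `java -version`. It handles both the legacy "1.x" numbering (1.6.0_45 means Java 6) and the newer scheme (11.0.2, 17).

```python
import re


def version_from_output(output: str) -> str:
    """Pull the quoted version out of `java -version` output, e.g. 'java version "1.6.0_45"'."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no quoted version found in java -version output")
    return match.group(1)


def java_major_version(version: str) -> int:
    """Return the major Java version: "1.6.0_45" -> 6, "11.0.2" -> 11, "17" -> 17."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy "1.x" numbering used up to Java 8
    return int(first)


def meets_minimum(version: str, minimum_major: int = 5) -> bool:
    """True if the version is at least Java 1.5 (major version 5)."""
    return java_major_version(version) >= minimum_major
```

One way to capture the text is `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, since `java -version` typically writes to standard error, then pass it to `meets_minimum(version_from_output(...))`.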
What Is New in Version 1.4 - 1: Support for SDTM v1.3 and SDTMIG v3.1.3 (SDTMIG 3.1.2 Amendment 1 was removed and upgraded to SDTMIG 3.1.3). Controlled Terminology: added the ability to specify CDISC Controlled Terminology versions. ADaM and SEND validation: added the ability to specify SEND or ADaM validation types.
What Is New in Version 1.4 - 2: New SDTM validation rules (rule count: 227 in Validator 1.3 vs. 360 in Validator 1.4). Updates to SDTM, SEND, and ADaM rules. Better support for split datasets. Ability to specify MedDRA and CDISC Controlled Terminology versions. Ability to specify SEND or ADaM validation types.
GUI Screen View - 1
Directories - config. Note: version 1.4 is shown in the right panel on this and all following slides.
Directories - data
Directories - cdisc: a new directory, not present in version 1.3.
Directories - cdisc/sdtm: terminology directories.
Directories - 2012-12-21
Terminology Files: To update the terminology files, create a new directory and name it with the terminology version date. Here, 2013-04-12 was newly added after OpenCDISC Validator version 1.4 was downloaded.
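The update step above (one directory per terminology version, named by its date) can be scripted. This is a minimal sketch under an assumption: the path config/data/cdisc/sdtm is inferred from the directory slides and may differ in your installation, and the helper names are hypothetical. The terminology file itself still has to be copied into the new directory.

```python
import re
from datetime import date
from pathlib import Path

_VERSION_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_version_date(name: str) -> bool:
    """True for names like "2013-04-12" that are also valid calendar dates."""
    if not _VERSION_DATE.fullmatch(name):
        return False
    try:
        date.fromisoformat(name)
    except ValueError:
        return False
    return True


def terminology_root(validator_home: Path) -> Path:
    # ASSUMPTION: layout inferred from the directory slides (config/data/cdisc/sdtm);
    # check it against your own installation.
    return validator_home / "config" / "data" / "cdisc" / "sdtm"


def add_terminology_version(validator_home: Path, version_date: str) -> Path:
    """Create the directory for a new terminology version, named by its date."""
    if not is_version_date(version_date):
        raise ValueError(f"expected a YYYY-MM-DD date, got {version_date!r}")
    target = terminology_root(validator_home) / version_date
    target.mkdir(parents=True, exist_ok=True)
    return target


def installed_versions(validator_home: Path) -> list:
    """List dated terminology directories, oldest first."""
    root = terminology_root(validator_home)
    if not root.is_dir():
        return []
    # ISO dates sort chronologically when sorted as strings.
    return sorted(p.name for p in root.iterdir() if p.is_dir() and is_version_date(p.name))
```

Naming directories with zero-padded ISO dates means a plain string sort lists versions in release order, which makes the newest version easy to spot.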
Updated Terminology File added!
MedDRA - 1: The same process as for terminology versions.
MedDRA - 2: For each MedDRA version, add that version's MedDRA ASCII files to a directory.
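Before pointing the validator at a new MedDRA version, it can help to confirm each version directory is complete. This sketch assumes a hypothetical MedDRA root that holds one subdirectory per version (e.g. "15.1") and a core file list taken from a standard MedDRA ASCII release; which files OpenCDISC actually reads is an assumption to confirm against its documentation.

```python
from pathlib import Path

# ASSUMPTION: the core ASCII files of a standard MedDRA release. Confirm which files
# your OpenCDISC version actually reads before relying on this list.
CORE_MEDDRA_FILES = ("soc.asc", "hlgt.asc", "hlt.asc", "pt.asc", "llt.asc", "mdhier.asc")


def missing_meddra_files(version_dir: Path) -> list:
    """Return the core files absent from one MedDRA version directory (case-insensitive)."""
    present = {p.name.lower() for p in version_dir.iterdir() if p.is_file()}
    return [name for name in CORE_MEDDRA_FILES if name not in present]


def check_meddra_root(meddra_root: Path) -> dict:
    """Map each version directory (e.g. "15.1") to its missing files; an empty list means complete."""
    return {
        d.name: missing_meddra_files(d)
        for d in sorted(meddra_root.iterdir())
        if d.is_dir()
    }
```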
MedDRA versions in the GUI
Using MedDRA with Define.xml: OpenCDISC Validator is programmed to use the MedDRA version specified in the define.xml file, so running validation with a define.xml file checks coded terms against that MedDRA version. NOTE: You cannot override the MedDRA version specified in the define.xml file by selecting a different MedDRA version from the Options menu of the GUI. If you do not use a define.xml file, or the define.xml file does not specify a MedDRA version, you can choose the version from a drop-down list in the Options panel of the GUI.
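The precedence rule above (the define.xml version wins; the GUI drop-down is only a fallback) can be sketched as follows. The sketch assumes the define.xml records the dictionary version the ODM 1.3 way, as an `ExternalCodeList` element with `Dictionary` and `Version` attributes; how OpenCDISC itself reads the file is not shown here, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def meddra_version_from_define(define_xml: str) -> Optional[str]:
    """Return the Version of the first ExternalCodeList whose Dictionary is MedDRA."""
    root = ET.fromstring(define_xml)
    for elem in root.iter():
        # Compare local names so the ODM namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "ExternalCodeList":
            if elem.get("Dictionary", "").strip().upper() == "MEDDRA":
                return elem.get("Version")
    return None


def effective_meddra_version(define_xml: Optional[str], gui_choice: Optional[str]) -> Optional[str]:
    """The define.xml version wins; the GUI drop-down choice is used only as a fallback."""
    if define_xml is not None:
        from_define = meddra_version_from_define(define_xml)
        if from_define:
            return from_define
    return gui_choice
```

A practical consequence: if define.xml names an older MedDRA version than the one in your Data Management Plan, changing the GUI selection will not help; the define.xml itself has to be corrected.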
Running OpenCDISC Validator - GUI: Start in the main Validator directory where you unzipped the files and find the file client.bat. This file starts the GUI, which for most users is the easiest way to run validation checks on CDISC standard files. Run client.bat to start the program.
Client Application - Options: Validate (data) or Generate Define.xml. Standard choices: SDTM, ADaM, Define.xml, SEND, Custom (Tabular). Source / input files: select their location; supported formats are SAS V5 Transport and delimited files (CSV, tab, etc.). Configuration: depends on the standard chosen. Define.xml: optional. Reports: use the Report Settings link to customize the report file name and other settings.
Viewing the Validation Report - 1: Once the validation is complete, you will see a brief summary of the validation statistics.
Viewing the Validation Report - 2: Once the validation is complete, you have three options: view the validation report by clicking the View Report button; start another validation or Define.xml generation by clicking the New Session button; or exit the GUI by choosing File > Exit or clicking the red X to close the window, then confirming the exit.
Validation Report - 1: At the end of a validation run, you can choose to view the report; selecting this option opens the spreadsheet file. You can review the report, save the file to another location, or attach it to an email. Since version 1.3, the Excel report has been revised to improve readability: the Issue Summary tab is now grouped by domain, and terminology issues are collapsed into instances of distinct terms, which reduces the size of the Details tab.
Validation Report - 2
Formats: a CSV file (unlimited rows) or an Excel workbook with 4 tabbed worksheets:
1. Dataset Summary: counts of Errors (High), Warnings (Medium), and Infos (Low) for each dataset.
2. Issue Summary: issue counts by rule ID.
3. Details: dataset, variables, values, rules, and messages; contains the complete list of validation findings, sorted by domain name.
4. Rules: each rule ID with its message, description, and type.
The report file name and headers can include the protocol, date, time, and other text, which is useful when communicating with vendors and the internal team.
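The Dataset Summary and Issue Summary tabs are, in effect, counts over the Details rows. As an illustration of that relationship, this sketch rebuilds both summaries from a CSV export; the column names "Domain", "Rule ID", and "Type" are hypothetical and should be matched to the headers in your own report.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize(details_csv: str):
    """Build Dataset Summary and Issue Summary counts from Details-style CSV rows.

    ASSUMPTION: the column names "Domain", "Rule ID", and "Type" are hypothetical.
    """
    by_dataset = defaultdict(Counter)  # domain -> counts per issue type (Error/Warning/Info)
    by_rule = Counter()                # (domain, rule id) -> number of findings
    for row in csv.DictReader(io.StringIO(details_csv)):
        by_dataset[row["Domain"]][row["Type"]] += 1
        by_rule[(row["Domain"], row["Rule ID"])] += 1
    dataset_summary = {domain: dict(counts) for domain, counts in by_dataset.items()}
    return dataset_summary, dict(by_rule)
```

Keying the issue counts by (domain, rule ID) mirrors the Issue Summary grouping by domain described for the revised report.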
Validation Report - 3: The Details worksheet can be filtered and sorted to identify the individual data items you want to investigate, by dataset, by rule, or by type of check. Two characteristics of this list are worth highlighting because they aid the interpretation of your results.
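If you post-process the Details rows outside Excel (for example, from the CSV output), the filter-and-sort step above can be sketched like this. The column names are hypothetical; findings without a record number sort ahead of row-level findings within the same dataset and rule.

```python
def _record_number(row: dict) -> int:
    """Record numbers may be blank for global findings; treat those as 0 so they sort first."""
    value = row.get("Record")
    return int(value) if value not in (None, "") else 0


def filter_details(rows, dataset=None, rule_id=None, issue_type=None):
    """Keep Details rows that match every criterion given, sorted by dataset, rule, record.

    ASSUMPTION: the column names "Domain", "Rule ID", "Type", and "Record" are hypothetical.
    """
    selected = [
        r for r in rows
        if (dataset is None or r["Domain"] == dataset)
        and (rule_id is None or r["Rule ID"] == rule_id)
        and (issue_type is None or r["Type"] == issue_type)
    ]
    return sorted(selected, key=lambda r: (r["Domain"], r["Rule ID"], _record_number(r)))
```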
Validation Report - 4: Unique row findings: if a finding relates to a specific row in the data, a record number is present and the specific variable(s) and value(s) are identified. Character values placed in numeric fields, or numeric values placed in text fields, are treated as unique records in the findings list. Global findings: Controlled Terminology findings for a variable that occur across multiple records are grouped and counted by each distinct term.
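The two behaviors above can be illustrated with a small sketch: row-level findings keep their record number, while terminology findings for the same dataset, variable, and value are collapsed into one entry with a count. The finding structure here is hypothetical and only models the described behavior; it is not the validator's internal format.

```python
from collections import Counter


def collapse_findings(findings):
    """Keep row-level findings as-is; count terminology findings per distinct term.

    ASSUMPTION: each finding is a dict with "kind" ("row" or "terminology"),
    "dataset", "variable", "value", and "record" (a hypothetical shape).
    """
    row_level = []
    term_counts = Counter()
    for f in findings:
        if f["kind"] == "terminology":
            term_counts[(f["dataset"], f["variable"], f["value"])] += 1
        else:
            row_level.append(f)  # record number, variable, and value stay visible
    return row_level, term_counts
```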
Interpreting the Validation Results - 1: What OpenCDISC can automate for you: dataset structure (variable names / labels); data integrity checks (references to DM subjects, presence of baseline flags, dates after disposition, consistency of result units); terminology checks; referential integrity; start date before end date; disposition references (sometimes).
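To make two of the automated check types above concrete, here is a hypothetical re-implementation in plain Python: subjects referenced in a domain but absent from DM, and records whose end date precedes the start date. These are illustrations of the idea, not OpenCDISC's rules or rule IDs; only complete YYYY-MM-DD dates are compared, and partial ISO 8601 dates are skipped.

```python
from datetime import date


def subjects_missing_from_dm(domain_rows, dm_rows):
    """USUBJIDs used in a domain that have no record in DM."""
    dm_subjects = {r["USUBJID"] for r in dm_rows}
    return sorted({r["USUBJID"] for r in domain_rows} - dm_subjects)


def end_before_start(rows, prefix):
    """Return (USUBJID, sequence number) for records whose end date precedes the start date.

    `prefix` is the domain prefix, e.g. "AE" for AESTDTC / AEENDTC / AESEQ.
    Only complete dates are compared; partial dates such as "2013-04" are skipped.
    """
    problems = []
    for r in rows:
        start = r.get(prefix + "STDTC", "")
        end = r.get(prefix + "ENDTC", "")
        if len(start) >= 10 and len(end) >= 10:
            if date.fromisoformat(end[:10]) < date.fromisoformat(start[:10]):
                problems.append((r["USUBJID"], r[prefix + "SEQ"]))
    return problems
```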
Interpreting the Validation Results - 2: What you still have to do: Dataset structure: do you have the right variables per the specification? Custom domains. Data integrity: the baseline flag check does not detect two baseline flags for the same subject. Review terminology flags: if a codelist is extensible, is the added value correct? Data validation, which is more content-focused: right subjects, right dates, right codelists, right test codes.
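Because the validator does not catch duplicate baseline flags, a small supplementary check can. This sketch counts records flagged "Y" per key and reports keys flagged more than once. The default key includes the test code, since a baseline is typically flagged per subject and test; pass `keys=("USUBJID",)` to match the per-subject wording above. Variable names follow the LB domain and are the only assumption.

```python
from collections import Counter


def duplicate_baseline_flags(rows, flag_var="LBBLFL", keys=("USUBJID", "LBTESTCD")):
    """Return the keys that have more than one record with the baseline flag set to "Y"."""
    counts = Counter(
        tuple(r[k] for k in keys)
        for r in rows
        if r.get(flag_var) == "Y"
    )
    return sorted(key for key, n in counts.items() if n > 1)
```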
Interpreting the Validation Results - 3: False positive results may occur. Lab tests without units will generate errors SD0026 and SD0029, e.g., urine pH = 5 or specific gravity = 1.012. Terminology: extensible codelists produce non-matches against the terminology file where the sponsor has added codes or values not present in it, e.g., Oxygen Saturation as O2SAT in VSTESTCD. Some program bugs may generate false error messages; post these to the OpenCDISC forum.
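Known false positives like those above can be pre-sorted so reviewers spend time on real issues. In this sketch the sets of unitless lab tests and sponsor codelist extensions are hypothetical, sponsor-maintained lists (PH and SPGRAV stand in for urine pH and specific gravity), and the finding fields are assumed. Anything marked as expected still needs a written explanation, e.g., in the Issue Summary comments.

```python
# HYPOTHETICAL sponsor-maintained lists; replace them with your own.
UNITLESS_LAB_TESTS = {"PH", "SPGRAV"}
SPONSOR_EXTENSIONS = {("VSTESTCD", "O2SAT")}
UNIT_RULES = {"SD0026", "SD0029"}  # rules the slide reports for lab tests without units


def is_expected_false_positive(finding: dict) -> bool:
    """True if a finding matches a known, documented false-positive pattern.

    ASSUMPTION: findings are dicts with "rule_id", "kind", and optionally
    "testcd", "variable", and "value".
    """
    if finding.get("rule_id") in UNIT_RULES and finding.get("testcd") in UNITLESS_LAB_TESTS:
        return True
    if finding.get("kind") == "terminology":
        if (finding.get("variable"), finding.get("value")) in SPONSOR_EXTENSIONS:
            return True
    return False
```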
Interpreting the Validation Results - 4: View of Validation Report
OpenCDISC Forum Feedback on This New Rule! - 1: SD1082 (variable length is too long for actual data) is graded as an ERROR in the AE domain. Remember: all errors should be addressed, and fixed if possible, prior to e-submission. Note that the variable length issue is not an SDTM compliance issue; it is very specific to the SAS format used for regulatory submission.
OpenCDISC Reply - 2: Defining requirements for variable lengths is a very interesting and controversial topic. This week we had a productive discussion at the FDA/PhUSE Data Validation workgroup meeting. There are several basic and mutually exclusive needs or risks:
1. Datasets should be re-sized (not compressed!) to minimal size, for a very long list of reasons, e.g., data transfer, archival storage, analysis tool limitations, and hardware limitations.
2. Variables in SAS XPORT datasets should have consistent, predefined lengths to avoid data truncation during data integration. There are many imperfect and incomplete recommendations on how to achieve this, e.g., --TESTCD variables should always be 8 characters long.
3. Variable length should be defined by the data collection process, e.g., set to the maximum length of a value in your data collection or controlled terminology codelist.
Rule Change - 3: New Release v1.4.1. Note that the variable length issue is not an SDTM compliance issue; it is very specific to the SAS format used for regulatory submission. OpenCDISC Validator will remove all current variable length checks, and only one new rule will be introduced. Before sending data to the FDA, you need to re-size your data so that each variable's length matches its actual maximum value length. Note that this is needed only for the actual data transfer to the FDA; you can do whatever you want before then. It is very easy to do. Profiles will be updated soon. Watch for the v1.4.1 OpenCDISC Validator release in the coming weeks.
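The re-sizing step above starts with knowing each character variable's longest actual value. This sketch computes those lengths in bytes (SAS XPORT lengths are byte counts; for plain ASCII data, bytes equal characters) and lists variables whose declared length exceeds what the data need, which is the situation SD1082 describes. The data structures are hypothetical; the actual re-sizing would be done in your SAS programs.

```python
def max_byte_lengths(rows, char_vars):
    """Longest actual value, in bytes, for each character variable (never below 1)."""
    lengths = {var: 1 for var in char_vars}  # SAS character lengths must be at least 1
    for row in rows:
        for var in char_vars:
            value = (row.get(var) or "").rstrip()  # ignore trailing blanks used as padding
            lengths[var] = max(lengths[var], len(value.encode("utf-8")))
    return lengths


def oversized_variables(declared, rows):
    """Map variable -> (declared length, needed length) where the declared length is too long."""
    actual = max_byte_lengths(rows, list(declared))
    return {
        var: (declared[var], actual[var])
        for var in declared
        if declared[var] > actual[var]
    }
```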
Suggested Process for OpenCDISC Validator Review
Setup / parameters: confirm the SDTM version; confirm the MedDRA version against the Data Management Plan; confirm the terminology version; include the define.xml file if present; set report parameters (study name / number / dates / other text / Excel message limit).
Run validation.
Review the error report: update the Issue Summary tab with comments; refer to the Details tab as needed; identify and report any new bugs; consider submitting an update to the terminology team.
If this is the final dataset for submission to CBER, complete the Word document with the validation findings (CBER document UCM209772).
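The setup checks in this process lend themselves to a short pre-run checklist. This sketch uses a hypothetical `RunSetup` record for the chosen settings and compares it against the study's expected SDTM version, the MedDRA version from the Data Management Plan, and the expected terminology version; all names are illustrative, not part of OpenCDISC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunSetup:
    """Settings chosen for one validation run (hypothetical structure)."""
    sdtm_version: str
    meddra_version: str
    terminology_version: str
    define_xml: Optional[str]  # path to define.xml, or None if not supplied


def setup_issues(setup: RunSetup, expected_sdtm: str, dmp_meddra: str,
                 expected_terminology: str, define_exists: bool) -> List[str]:
    """Compare the run setup with the study's expectations before validating."""
    issues = []
    if setup.sdtm_version != expected_sdtm:
        issues.append(f"SDTM version {setup.sdtm_version} != expected {expected_sdtm}")
    if setup.meddra_version != dmp_meddra:
        issues.append(f"MedDRA version {setup.meddra_version} != Data Management Plan {dmp_meddra}")
    if setup.terminology_version != expected_terminology:
        issues.append(f"Terminology version {setup.terminology_version} != expected {expected_terminology}")
    if define_exists and setup.define_xml is None:
        issues.append("define.xml exists but is not included in the run")
    return issues
```

An empty list means the setup matches expectations; any entries should be resolved before running validation.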
Questions?
Thank you! David Borbas, RN, MIS, Sr. Director, Data Management, Jazz Pharmaceuticals, Inc. David.Borbas@jazzpharma.com, P. 650-496-2637