Converting the define.xml to a Relational Database to Enable Printing and Validation
Lex Jansen, Octagon Research Solutions, Inc.
Leading the Electronic Transformation of Clinical R&D
PharmaSUG 2009, Portland, OR

Contents
- Regulatory landscape
- Data Definition Tables
- What is XML
- What is the define.xml
- Displaying the define.xml
- Define.xml in the SDTM/ADaM CDISC pilot (printing issue)
- define.xml as a relational data model
- Validating the define.xml
- SAS XML Mapper
- SAS ODS PDF / SAS ODS PS
- Bookmark support

Regulatory landscape
Regulatory Landscape (FDA)
July 2004
- FDA adds Study Data Specifications v1.0 to the draft eCTD guidance
- This specification references the CDISC SDTM for data tabulation datasets

Regulatory Landscape (FDA)
March 2005
- Study Data Specifications v1.1 updates the specifications for data set documentation:
  - data definitions
  - annotated case report forms (CRFs)
- The specification for the data definitions for datasets provided using the CDISC SDTM is included in the Case Report Tabulation Data Definition Specification (define.xml), developed by the CDISC define.xml Team
- Data definitions for other data sets follow "Providing Regulatory Submissions in Electronic Format - NDA" (1999), that is, the define.pdf

Regulatory Landscape (FDA)
2006
- CDISC SDTM / ADaM Pilot Project: a collaborative pilot project with FDA and industry to test how well the submission of CDISC-compliant data sets and associated metadata meets the needs of both medical and statistical FDA reviewers
- Generation of an ICH E3/eCTD clinical study report (CSR) using the CDISC data models
- Data Definition Tables were provided in XML format (CRT-DDS, define.xml)

Regulatory Landscape (FDA)
April 2006
- FDA issues the final Guidance for Industry: "Providing Regulatory Submissions in Electronic Format - Human Pharmaceutical Product Applications and Related Submissions Using the eCTD Specifications"
- Application Table of Contents: XML instead of PDF
- This guidance now has the following reference for datasets: see the associated document "Study Data Specifications" for details on providing datasets and related files (e.g., data definition file (define.xml), program files)

Regulatory Landscape (FDA)
September 29, 2006
- FDA announces the withdrawal of the three guidances for Providing Submissions in Electronic Submission Format (NDAs, ANDAs, and Annual Reports for NDAs and ANDAs)
- These guidances are being withdrawn because they are no longer consistent with the eCTD guidance
- After December 31, 2007, eCTD is the preferred format for electronic submissions going to CDER:
  - consistent with FDA's technical capabilities
  - more efficient than other choices

Regulatory Landscape (FDA)
December 11, 2006
- FDA proposes to amend the regulations governing the format for data submissions for NDAs, BLAs and ANDAs and their supplements and amendments
- The proposal mandates that data must be submitted and provided in an electronic format that the FDA can process, review, and archive
- The proposal would require the use of the standardized data structure, terminology, and code sets contained in current FDA guidance (SDTM) to allow for more efficient and comprehensive data review
- Provides a two-year transition period for implementation
- This would be a Notice of Proposed Rulemaking in March 2007

Regulatory Landscape (FDA)
- Currently a date of September 2009 has been given

Regulatory Landscape (FDA)
To summarize:
- Sponsors are submitting clinical study data in electronic format to the FDA
  - Currently as version 5 SAS transport files (XML in the future)
- The Data Definition file helps reviewers to understand the data
- The format for the Data Definition file has changed from PDF to XML

Data Definition Tables in PDF
Data Definition Tables - PDF
- 1999 guidance: the sponsor has to document submitted data by including data definition tables (define.pdf) and annotated case report forms (blankcrf.pdf)

Data Definition Tables in XML
Data Definition Tables - XML
- Example from the CDISC SDS Metadata Team (2005)

What is XML
XML
- XML stands for eXtensible Markup Language
- XML was designed to transport and store data, not to display data (display and content are separated)
- Like HTML, XML is a markup language that makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value")
- XML does not DO anything
- XML tags are not predefined; you must define your own (self-descriptive!) tags

XML
- HTML specifies what each tag and attribute means, and often how the text between them will look in a browser
- XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it
- In other words, if you see "<p>" in an XML file, do not assume it is a paragraph. Depending on the context, it may be a price, a parameter, a person (or maybe something that does not start with a "p"?)

XML
An XML file is well-formed if it conforms to the rules of XML syntax:
- A single element (the root element) contains all other elements in the document (define.xml: <ODM>)
- Elements have to be properly opened and closed
- Elements do not overlap, i.e., they are properly nested
- Attributes are properly quoted
- The document does not contain illegal characters. Example: if < is part of the content, it should be written as &lt;
A conforming XML parser is not allowed to process an XML document that is not well-formed

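
The well-formedness rules above can be tried out with any conforming parser. The following is a minimal sketch in Python (not the SAS used elsewhere in this deck), assuming only the standard-library xml.etree.ElementTree module. The XML fragments are hypothetical and only loosely shaped like a define.xml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser refuses to continue once a syntax rule is violated.
        return False


# Hypothetical fragments, one per rule discussed above.
GOOD = '<ODM><Study OID="S1"><GlobalVariables/></Study></ODM>'
OVERLAPPING = '<ODM><Study><GlobalVariables></Study></GlobalVariables></ODM>'
UNQUOTED_ATTRIBUTE = '<ODM><Study OID=S1/></ODM>'
RAW_LESS_THAN = '<ODM><Comment>AGE < 18</Comment></ODM>'
ESCAPED_LESS_THAN = '<ODM><Comment>AGE &lt; 18</Comment></ODM>'

if __name__ == "__main__":
    for name in ("GOOD", "OVERLAPPING", "UNQUOTED_ATTRIBUTE",
                 "RAW_LESS_THAN", "ESCAPED_LESS_THAN"):
        print(name, is_well_formed(globals()[name]))
```

A well-formedness check like this only covers syntax. It says nothing about whether the document follows the define.xml schema.
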
XML
Predefined entities:
    Character   Entity
    &           &amp;
    <           &lt;
    >           &gt;
    "           &quot;
    '           &apos;
The entities &amp; and &lt; MUST be used in place of & and < within the text value of an element or attribute

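
As an illustration of the predefined entities, here is a small Python sketch (again not SAS) that uses the standard-library xml.sax.saxutils helpers. The function names escape_text and escape_all and the sample string are made up for this example.

```python
from xml.sax.saxutils import escape, unescape


def escape_text(value: str) -> str:
    """Escape the characters that must never appear raw: & and < (and >)."""
    return escape(value)


def escape_all(value: str) -> str:
    """Escape all five predefined entities, including both quote characters."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


def unescape_all(value: str) -> str:
    """Reverse escape_all()."""
    return unescape(value, {"&quot;": '"', "&apos;": "'"})


if __name__ == "__main__":
    sample = "AGE < 18 & SEX = 'M'"   # hypothetical comment text
    print(escape_text(sample))
    print(escape_all(sample))
```
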
XML
An XML file is valid if it conforms to a specific XML schema:
- An XML schema is a description of a type of XML document
- It defines constraints on the structure and contents of documents of that type
- A schema defines allowed elements and attributes, the order of elements, the overall structure, etc.
- A schema might describe that the content of a certain element that contains a datetime value is only valid if the value conforms to the ISO 8601 standard

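
To give a feel for what a content constraint like the ISO 8601 example means in practice, here is a rough Python sketch. It is not a schema validator: it assumes that checking one value with the standard library's datetime.fromisoformat is close enough for illustration, and that function accepts only a subset of ISO 8601. The helper name is hypothetical.

```python
from datetime import datetime


def is_iso8601_datetime(value: str) -> bool:
    """Return True if value parses as an ISO 8601 style date/time.

    Assumption: datetime.fromisoformat() stands in for the real schema
    check; it covers common forms such as 2009-06-01T12:30:00 only.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for candidate in ("2009-06-01T12:30:00", "06/01/2009", "2009-13-01T00:00:00"):
        print(candidate, is_iso8601_datetime(candidate))
```
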
What is the define.xml
define.xml
- Case Report Tabulation Data Definition Specification (CRT-DDS, define.xml)
- Production version: 1.0.0
- Based on ODM version 1.2.1
- Maintained by CDISC's XML Technologies Team (formerly known as the ODM team)
- A new version of define.xml is expected in 2009, with additional metadata for SDTM and supporting SDTM V3.1.2

define.xml Specifications
Displaying the define.xml
Displaying the define.xml
- From the CDISC SDS Metadata Team (2007): define.xml + XSL style sheet = HTML

Define.xml in the SDTM/ADaM CDISC pilot
Displaying the define.xml
- SDTM / ADaM Pilot (published 2008): adds analysis metadata
- define.xml + XSL style sheet = HTML

Data Definition Tables XML - Printing
CDISC SDTM / ADaM Pilot project report:
A major issue identified by the regulatory review team was the difficulty in printing the Define file. The style sheet used in the pilot submission package was developed with the primary target of web browser rendering, which is not readily suited to printing. Reviewers who attempted to print the Define file found that the file did not fit on portrait pages, that page breaks were not clean, and that printing only a portion of the file was difficult. Opening the document in another application (e.g., Microsoft Word) provided a work-around, but was not an option that was user friendly or efficient.

Data Definition Tables XML - Printing
CDISC SDTM / ADaM Pilot project report:
This problem could be viewed as an implementation issue that sponsors will need to handle, after discussing the issue with their FDA reviewers. For example, a sponsor might choose to provide two versions of the style sheet: XML for viewing and PDF for printing. Ideally, a reminder of the issue would be included somewhere in the CRT-DDS guidance (e.g., a note that consideration be given to how the sponsor will respond to a request from reviewers for a print-friendly version of the style sheet). It should be noted that the regulatory review team for the pilot project emphasized that the ability to print the document would be essential for the future use of XML files.

Data Definition Tables XML - Printing
- The PDF format is the de facto standard for printable documents on the web
- PDF is platform independent (no browser issues)
- How can we create a PDF file from the define.xml?

Define.xml -> define.pdf
How to create the PDF rendition of define.xml?
- Original metadata in SAS -> SAS ODS PDF -> define.pdf
- But what if we only have the SAS .XPT files and the define.xml?
- Use XML-based tools to convert XML to PDF (FOP, XSL Formatting Objects Processor)
- Possible, but very complicated to develop in-house when not familiar with XML technology
© 2009 Octagon Research Solutions, Inc. All Rights Reserved.

Define.xml -> define.pdf
How to create the PDF rendition of define.xml?
- FOP: XSL Formatting Objects Processor

Define.xml -> define.pdf
SAS solution:
- Convert the XML hierarchy to a relational data model in the form of (2-dimensional) SAS data sets
- Once we have the define.xml content in SAS datasets, we can use SAS to create a PDF rendition (with ODS PDF)

Relational data model
How to convert the XML hierarchy to a relational data model in the form of (2-dimensional) SAS data sets?
- Solution: SAS XML Mapper
- SAS XML Mapper: a free stand-alone Java client application available on the SAS product distribution disks
- Uses XPath to create a MAP file that maps hierarchical XML to rows and columns in SAS

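
The idea of mapping a hierarchy to rows and columns can be sketched outside SAS as well. The Python example below is not the SAS XML Mapper and does not use its MAP file format. It assumes a heavily simplified, namespace-free fragment that only resembles a define.xml. The table and column names (such as ItemGroupDef_OID) are chosen for illustration. Each repeating element becomes one row, and the parent's OID is carried down as a key so the child table can be joined back to its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: no namespaces, only a few attributes.
DEFINE_FRAGMENT = """
<ODM>
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1">
      <ItemGroupDef OID="DM" Name="DM">
        <ItemRef ItemOID="STUDYID" OrderNumber="1"/>
        <ItemRef ItemOID="USUBJID" OrderNumber="2"/>
      </ItemGroupDef>
      <ItemGroupDef OID="AE" Name="AE">
        <ItemRef ItemOID="STUDYID" OrderNumber="1"/>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def to_tables(xml_text):
    """Flatten the hierarchy into two row lists: item groups and item refs."""
    root = ET.fromstring(xml_text)
    item_groups, item_refs = [], []
    # The path plays the role of the XPath expression in a map file.
    for group in root.findall("./Study/MetaDataVersion/ItemGroupDef"):
        item_groups.append({"ItemGroupDef_OID": group.get("OID"),
                            "Name": group.get("Name")})
        for ref in group.findall("./ItemRef"):
            item_refs.append({
                "ItemGroupDef_OID": group.get("OID"),  # key back to the parent
                "ItemOID": ref.get("ItemOID"),
                "OrderNumber": int(ref.get("OrderNumber")),
            })
    return item_groups, item_refs


if __name__ == "__main__":
    groups, refs = to_tables(DEFINE_FRAGMENT)
    for row in groups + refs:
        print(row)
```
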
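Define.xml -> define.pdf
In SAS:
    /* The define.xml file; the fileref has the same name as the libref below */
    FILENAME DEFINE "C:\Projects\CDISC\Define.xml\define.xml";
    /* The map file created with SAS XML Mapper */
    FILENAME SXLEMAP "C:\Projects\CDISC\Define.xml\DefineXML.map";
    /* Read define.xml through the map as a read-only library */
    LIBNAME DEFINE XML XMLMAP=SXLEMAP ACCESS=READONLY;
    /* Copy the mapped tables to an already assigned libref (outlib) */
    PROC COPY IN=define OUT=outlib;
    RUN;
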
Define.xml -> define.pdf
- Solution: SAS XML Mapper + SAS ODS PDF
- You will need to create a relational data model to convert the XML hierarchy to 2-dimensional SAS data sets

define.xml as a relational model
SAS XML Mapper
Define.xml -> SAS datasets
Validating the define.xml
Define.xml -> Validation
- Some process used metadata (data sets, variables, codelists) to create:
  - define.xml
  - SAS transport files
Validating the define.xml:
1. Well-formedness
2. Against the schema
3. Against the CRT-DDS specification
4. Against the SAS transport files
5. Against the SDTM spec

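
Once the define.xml content is available as tables, a check such as number 4 in the list above comes down to comparing two sets of names. The Python sketch below is only an illustration under stated assumptions: both inputs are plain dictionaries mapping a dataset name to its variable names, the function name compare_variables is hypothetical, and reading the actual transport files is left out.

```python
def compare_variables(define_vars, dataset_vars):
    """Compare documented variables with the variables actually present.

    Both arguments map a dataset name to a set of variable names.
    Returns a sorted list of (dataset, variable, finding) tuples.
    """
    findings = []
    for dataset in sorted(set(define_vars) | set(dataset_vars)):
        documented = define_vars.get(dataset, set())
        actual = dataset_vars.get(dataset, set())
        for name in sorted(documented - actual):
            findings.append((dataset, name, "in define.xml but not in transport file"))
        for name in sorted(actual - documented):
            findings.append((dataset, name, "in transport file but not in define.xml"))
    return findings


if __name__ == "__main__":
    # Hypothetical metadata for two domains.
    from_define = {"DM": {"STUDYID", "USUBJID", "AGE"}, "AE": {"STUDYID", "AETERM"}}
    from_xpt = {"DM": {"STUDYID", "USUBJID", "SEX"}, "AE": {"STUDYID", "AETERM"}}
    for finding in compare_variables(from_define, from_xpt):
        print(finding)
```
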
Define.xml -> SAS datasets: Validation
- Some process used metadata (data sets, variables, codelists) to create:
  - define.xml
  - SAS transport files
- Use SAS XML Mapper to convert define.xml to SAS data sets
- Validation: define.xml as SAS datasets
- Use SAS ODS to create define.pdf

Define.xml -> define.pdf
Creating the PDF file with SAS ODS PDF:
- Tables can be created with PROC REPORT
- Text can be created with ODS PDF TEXT
- Text can be created with the ODS DATA step object (preproduction in 9.1.3), the next generation of DATA _NULL_

Define.xml -> define.pdf
    ODS ESCAPECHAR = '^';
    FOOTNOTE j=r 'page ^{thispage} of ^{lastpage}';
    ODS PDF FILE="DefineXML.pdf" STYLE=printer COMPRESS=1
            TITLE="Define.xml to Define.pdf" UNIFORM
            AUTHOR="Lex Jansen, Octagon Research Solutions, Inc."
            KEYWORDS="SAS, CDISC, XML, define.xml, CRT-DDS";
    /* Generate one ODS PDF TEXT statement per observation */
    DATA _NULL_;
      SET out.metadataversion;
      CALL EXECUTE('ODS PDF TEXT="' ||
                   "^S={FONT_WEIGHT=BOLD} FILETYPE = ^S={FONT_WEIGHT=BOLD}" ||
                   Strip(ODM_FileType) || '";');
    RUN;

Define.xml -> define.pdf
    /* Write each referenced document as a line of text that links to the file */
    DATA _null_;
      SET out.referencedoc;
      CALL EXECUTE('ODS PDF TEXT="' ||
                   '^S={CELLHEIGHT=0.5IN CELLWIDTH=8IN ' ||
                   'URL=""' || Strip(leaf_href) || '""}' ||
                   Strip(title) || "(" || Strip(leaf_href) || ')";');
    RUN;

Define.xml -> define.pdf
SAS ODS PDF / PS - Links
    /* Link each code list reference through the $CTLnk. format */
    DEFINE CodeListRef_CodeListOID / DISPLAY WIDTH=30
           "Controlled Terminology" STYLE=[URL=$CTLnk.];
    /* Link comments that cite a computational method to that section */
    COMPUTE ItemDef_Comment;
      urlstring="#computationalalgorithms";
      IF INDEX(ItemDef_Comment, 'See Computational Method:') THEN
        CALL DEFINE('_c7_','url', urlstring);
    ENDCOMP;
    /* Link CRF origins to the annotated CRF */
    COMPUTE ItemDef_Origin;
      LENGTH urlstring $200;
      urlstring="annotatedcrf.pdf";
      IF INDEX(ItemDef_Origin, 'CRF') THEN
        CALL DEFINE('_c6_','url', urlstring);
    ENDCOMP;

Define.xml -> define.pdf
SAS ODS PDF / PS - Bookmarks
- There is not a lot of control in SAS 9.1.3 (better in 9.2)
- ODS PROCLABEL
- PROC REPORT CONTENTS= option
- PROC DOCUMENT: move bookmarks... Sorry, not for PROC REPORT (in 9.1.3)

Define.xml -> define.pdf
SAS ODS PDF / PS - Bookmarks
    %MACRO REP_Domain(domain, label);
      ODS PDF ANCHOR="&Domain";
      ODS PROCLABEL = "&label (&domain)";
      PROC REPORT DATA=Items CONTENTS="";
        WHERE ItemGroupDef_OID="&domain";
      RUN;
    %MEND REP_Domain;
    /* Call the macro once per observation */
    DATA _null_;
      SET ItemGroupDef_ItemRef;
      CALL EXECUTE('%REP_Domain(' || STRIP(ItemGroupDef_OID) || "," ||
                   STRIP(ItemGroupDef_label) || ");");
    RUN;

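
The CALL EXECUTE pattern above is data-driven code generation: one generated statement per row of metadata. For readers who want to see the string building in isolation, here is a small Python sketch. It assumes the rows are available as dictionaries with the same column names as in the SAS step; the helper name macro_calls is hypothetical. It only builds the text of the macro calls and does not run SAS.

```python
def macro_calls(rows):
    """Build one %REP_Domain(...) call per metadata row.

    rows: iterable of dicts with ItemGroupDef_OID and ItemGroupDef_label.
    Values are stripped of surrounding blanks, as STRIP() does in SAS.
    """
    calls = []
    for row in rows:
        oid = row["ItemGroupDef_OID"].strip()
        label = row["ItemGroupDef_label"].strip()
        calls.append("%REP_Domain({0},{1});".format(oid, label))
    return calls


if __name__ == "__main__":
    # Hypothetical rows, padded with blanks like fixed-width SAS values.
    rows = [
        {"ItemGroupDef_OID": "DM  ", "ItemGroupDef_label": "Demographics   "},
        {"ItemGroupDef_OID": "AE  ", "ItemGroupDef_label": "Adverse Events "},
    ]
    for call in macro_calls(rows):
        print(call)
```

One point to keep in mind with this pattern: a label that itself contains a comma would be read as an extra positional macro parameter, so such values need quoting before the call is generated.
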
Define.xml -> define.pdf
SAS ODS PDF / PS - Bookmarks
- Instead of PDF, create PostScript (ODS PS)
- Post-process the PostScript to fix the bookmarks
- Use Distiller to create the PDF

Define.xml -> define.pdf
- The PDF rendition is being used in the 2nd CDISC FDA Integrated Safety Data Pilot