Reducing Statisticians' Programming Load: Automated Statistical Analysis with SAS and XML

Michael C. Palmer, Zurich Biostatistics, Inc., Morristown, NJ
Cecilia A. Hale, Zurich Biostatistics, Inc., Morristown, NJ

ABSTRACT

Statisticians often spend more time programming and supervising the programming for tables than they spend on the statistical analyses reported in the tables. Features new in versions 7 and 8 of SAS, coupled with XML, make it possible to reduce this programming load by automating analysis steps. In the automated system, a statistician prepares table shells and database maps with publishing software that saves both as XML files. SAS programs interpret the XML, retrieve data, and combine the interpreted XML and retrieved data to build XML files. Publishing software renders the XML files to PDF or other formats. The result is that clinical data pass from an analysis data set into a report-quality document without a statistician or programmer writing or setting up table programs or macros. As table content is revised, the system can revise the report-quality tables. Table shells are reusable within a report and across reports.

INTRODUCTION

The first step in implementing a statistical analysis plan for a clinical trial is purely statistical. Questions of clinical interest are translated into statistical hypotheses, the data are analyzed, and, from the statistical results, inferences are made in the clinical realm about drug safety and efficacy.

The second step involves presenting the results in a way that supports the inferences. Statisticians often find that the programming, or supervision of programmers, for results presentation such as statistical tables takes a burdensome amount of time. The availability of new software technology, coupled with new features in versions 7 and 8 of the SAS system, makes it possible to automate some steps in statistical table production and, as a consequence, to reduce the programming load.
The new technology is XML (Extensible Markup Language). XML is a standard for electronic documents and data exchange. It was developed by the World Wide Web Consortium (W3C) and released in February 1998. XML is non-proprietary and platform independent. SAS versions 7 and 8 support an experimental XML driver as part of the Output Delivery System (ODS), but the work reported here does not use that driver. Only stable, fully supported features of SAS software were used. The new features in versions 7 and 8 that enabled the work reported here include regular expression functions and 32k length for variables and format decode values.

DOCUMENTS REPLACE PROGRAMS

Statistical table programming can be largely replaced by two documents. One document is a table shell that contains style and composition information, but no content. The other document is a database map that connects the shell to specific records in a database of table content. These two documents are built with the familiar graphical user interface in commercial publishing software. The result is a classic automation. The labor-intensive programming tasks traditionally required to build statistical tables are replaced by the quicker and cheaper document interface, and a higher-quality, fully word-processed table is produced.

Figure 1. Table shell

Figure 1 shows a typical table shell in the automated system. This shell has the style and composition of the final table, but in place of table content, whether text or numeric, it holds row and column indices. Style features in the shell include increased font size and bolding for the 1,1 entry, center alignment for 1,1 and 2,1, shading for 2,1, and italics and reduced font size for 9,1. Composition features in the shell include borders, variable cell dimensions, and the varied number of columns for each row.

Figure 2. Database map

Figure 2 shows a fragment of a database map.
Each cell in the table shell has one or more corresponding entries in the map that point to data for the final table. The "DATASET" column in the map identifies a data set for each shell entry. Under the "CONDITION" column, "KEYS=" gives sort keys for the retrieved data in the final table, and "RULES=" gives subsetting criteria to apply to the data set to retrieve data.

These two documents contain all of the information needed to produce a table. Style and composition information is in the table shell and content look-up information is in the database map. With the legacy method of creating statistical tables, that is, programming, this information would exist in program code, difficult to locate and difficult to revise or reuse once located. The segregation of style and composition from table content facilitates the revision and reuse of each. The table shell can be used as-is with a different database map, for the same study or a different study. Table content, including text such as titles, row headers, and column labels, can be revised by editing a database instead of modifying a program.
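The look-up role of the database map can be illustrated with a minimal sketch. It is written in Python rather than SAS, and it is not the authors' implementation. The paper names only the "DATASET" and "CONDITION" columns and the "KEYS=" and "RULES=" entries, so the semicolon and comma syntax of the condition string, the names `MapEntry`, `parse_condition`, and `retrieve`, and all data values below are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class MapEntry:
    cell: tuple      # (row, column) index of the shell cell this entry fills
    dataset: str     # value of the DATASET column
    keys: list       # sort keys taken from KEYS=
    rules: dict      # subsetting criteria taken from RULES=

def parse_condition(condition):
    """Parse a hypothetical 'KEYS=a,b; RULES=x=1,y=2' string (syntax assumed)."""
    keys, rules = [], {}
    for part in condition.split(";"):
        name, _, value = part.strip().partition("=")
        if name.upper() == "KEYS":
            keys = [k.strip() for k in value.split(",") if k.strip()]
        elif name.upper() == "RULES":
            for rule in value.split(","):
                var, _, val = rule.partition("=")
                rules[var.strip()] = val.strip()
    return keys, rules

def retrieve(entry, database):
    """Apply the subsetting rules to the data set, then sort by the keys."""
    rows = [r for r in database[entry.dataset]
            if all(str(r.get(var)) == val for var, val in entry.rules.items())]
    return sorted(rows, key=lambda r: tuple(r[k] for k in entry.keys))

# Made-up content records, for illustration only.
database = {
    "efficacy": [
        {"param": "SBP", "stat": "MEAN", "trt": 2, "value": 128.4},
        {"param": "SBP", "stat": "MEAN", "trt": 1, "value": 131.0},
        {"param": "DBP", "stat": "MEAN", "trt": 1, "value": 84.2},
    ]
}
keys, rules = parse_condition("KEYS=trt; RULES=param=SBP,stat=MEAN")
entry = MapEntry(cell=(3, 2), dataset="efficacy", keys=keys, rules=rules)
selected = [r["value"] for r in retrieve(entry, database)]
```

Because the subsetting criteria and sort order live in the map entry rather than in code, pointing the same shell cell at different content means editing the entry, not the program.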

Figure 3. Screen capture of table shell in publishing software

The table shell and database map documents were built in Arbortext's Adept Publisher. Adept is a full-featured commercial, off-the-shelf publishing package with a graphical user interface. As Figure 3 shows, Adept resembles Microsoft Word, except for the document map on the left side of the figure. The document map shows the markup language behind the document. Unlike Word, Adept can represent documents in XML and save the documents as XML. XML documents can be passed among applications for processing, between Adept and SAS in this case, and that capability guided the choice of publishing software.

The populated, publishable table is built by Zurich Biostatistics' proprietary SAS programs. The software reads, interprets, and implements the XML versions of the table shell and database map documents. The shell and map can be built before the table content database is populated.

SIMPLE PROGRAMMING REPLACES COMPLICATED PROGRAMMING

Figure 4. Table content database (panels: Data set 1, Data set 2, Data set 3, Data set 4)

The table content database has a simple structure that is uniform for all data sets in the database (Figure 4). Each record in the table content database has just one element of table content. Along with the table content, there is a set of unique keys and administrative variables such as a date and time stamp. Keys identify features of the statistical analysis, the variables analyzed, and as many other factors as are required to uniquely identify an element of content. The keys are used in the database map to select data for retrieval and to sort the retrieved data for the final table. Every statistical result that will appear in a table is in the database and every text item in a table (titles, row labels, column headings, for instance) has an associated numeric index in the database. The indices are decoded to text when the table is created.
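The record structure described above can be sketched as follows, in Python rather than SAS. This is an illustration under stated assumptions, not the authors' schema: the field names, the `ContentRecord` class, the helper functions, the decode table, and every value are hypothetical. The paper specifies only that each record carries one element of content, a set of unique keys, administrative variables such as a date and time stamp, and, for text items, a numeric index that is decoded when the table is built.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Made-up decode table standing in for a SAS format with long decode values.
TEXT_DECODE = {
    101: "Summary of Systolic Blood Pressure by Treatment",
    102: "Placebo",
    103: "Active",
}

@dataclass(frozen=True)
class ContentRecord:
    analysis: str                     # key: which statistical analysis
    variable: str                     # key: which variable was analyzed
    element: str                      # key: which statistic or text item
    stamp: datetime                   # administrative variable
    value: Optional[float] = None     # a numeric result, if any
    text_index: Optional[int] = None  # numeric index of a text item, if any

    @property
    def key(self):
        return (self.analysis, self.variable, self.element)

def keys_are_unique(records):
    """True when no two records share the same combination of keys."""
    keys = [r.key for r in records]
    return len(keys) == len(set(keys))

def display(record, decode=TEXT_DECODE):
    """Decode a text index to text, or format a numeric result."""
    if record.text_index is not None:
        return decode[record.text_index]
    return f"{record.value:.1f}"

stamp = datetime(2000, 1, 1, 12, 0)  # arbitrary illustrative time stamp
records = [
    ContentRecord("ANOVA1", "SBP", "TITLE", stamp, text_index=101),
    ContentRecord("ANOVA1", "SBP", "MEAN_PLACEBO", stamp, value=131.04),
]
shown = [display(r) for r in records]
```

Holding one element of content per record keeps every data set in the same shape, so a single retrieval routine can serve any table.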
The table content database is populated as the final step in statistical analysis programs. To do this, the usual statistical analysis programs are written and results that will be placed in a table are saved with an output data set from the relevant SAS PROC or from ODS. In versions 7 and 8, every PROC in SAS/STAT supports ODS. The output data sets are processed to assign keys and administrative variables to each result and the results are output to the table content database. In the table content database, text exists in numeric variables. Each numeric value is decoded to text when the table is built. The existence of 32k format decodes in versions 7 and 8 makes this a practical way to handle text. This simple programming to dice up the output data sets is all of the programming needed to produce the final tables.

SOFTWARE-TO-SOFTWARE TALK REPLACES MEETINGS BETWEEN TABLE PROGRAMMERS AND STATISTICIANS

Publishing software, such as Adept, is specialized for the creation and maintenance of report-quality documents. SAS is specialized for statistical analysis, but is a clumsy tool for document creation. Statisticians' programming load can be reduced by choosing the best tool for each task in implementing a statistical analysis plan. The methods discussed in this paper make it possible to choose SAS for statistical analysis and publishing software for report creation and maintenance. For this to work, the two tools have to talk to each other, and XML is the language of this conversation.

Zurich Biostatistics developed a set of programs written in SAS that read the XML documents prepared in the publishing software, interpret them, implement them, and build styled, composed, populated tables in XML. The finished tables in XML are returned to the publishing software for rendering to paper or electronic formats such as Adobe's PDF. The FDA's recent guidance on electronic submissions recommends PDF for electronic documents submitted to the agency. These programs, along with the architecture of table shell, database map, table content database, and finished table, are termed Tekoa Technology(SM).

AUTOMATED WORD-PROCESSED FINAL PRODUCT REPLACES LINE-PRINTER OUTPUT

Figure 5. Published table as PDF in Adobe Acrobat Reader

REUSE OF DOCUMENTS REPLACES REWORK OF PROGRAMS

The separation of table style and composition in reusable table shell documents, on the one hand, and table content, including text, in data sets, on the other hand, eliminates the need to revise programs as tables are revised. Information for a specific table, either how it looks or its content, simply does not exist in program code. Revisions to table content, text or numeric, are edits of the table content database. The pre-existing table shell document is reused. Table shells can be reused within a project or across projects without modification or programming.

Tekoa Technology is a classic automation. It's cheaper and quicker than the legacy process of manual programming and it also produces a higher quality product. The word-processed statistical tables automatically produced with Tekoa Technology include the style and composition features available in the commercial publishing software. These are features such as proportional fonts, font sizes, shading, italics, superscripts, subscripts, vertical text alignment, horizontal text alignment, page x of y numbering, and page-specific footnotes.
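The idea of filling a styled shell with content, so that style never passes through program code, can be sketched as follows. This is a Python illustration that uses the standard library's `xml.etree.ElementTree`. It is not the authors' SAS code, and the `table`, `row`, and `cell` markup with its attributes is a hypothetical stand-in, not the XML that Adept actually saves. The function `populate` and the content values are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical shell markup: each cell holds a "row,column" index in place of
# content, and style is carried as attributes on the cell.
SHELL_XML = """<table>
  <row><cell align="center" weight="bold">1,1</cell></row>
  <row><cell shade="yes">2,1</cell><cell>2,2</cell></row>
</table>"""

def populate(shell_xml, content):
    """Replace each row,column index with its content; leave style untouched."""
    root = ET.fromstring(shell_xml)
    for cell in root.iter("cell"):
        row, col = (int(part) for part in cell.text.split(","))
        # A missing entry raises KeyError instead of leaving a bare index behind.
        cell.text = content[(row, col)]
    return ET.tostring(root, encoding="unicode")

# Made-up content keyed by shell index, as a database map lookup might supply it.
content = {(1, 1): "Summary of Adverse Events", (2, 1): "Treatment", (2, 2): "N"}
filled = populate(SHELL_XML, content)
cells = list(ET.fromstring(filled).iter("cell"))
```

The same shell string can be passed to `populate` with a different content dictionary, which mirrors the reuse of a table shell across tables and studies.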
Style features are built into the table shell when it is constructed in the graphical user interface and automatically propagated in the finished table that is built in Tekoa's SAS programs or added from a style sheet. There is no manual editing of the table.

CONCLUSION

SAS can work with publishing software and XML in an automated system to implement the statistical analysis plan for a clinical trial. The automated system uses documents instead of programming to apply the full range of publishing software style and composition features to statistical results. Compared to the legacy system of manual programming, the automated system produces a higher quality product with less programming.

APPENDIX

Published table as paper output

REFERENCES

XML spec: www.w3.org/tr/rec-xml
Scientific American article on XML: www.scientificamerican.com/1999/0599issue/0599bosak.html

ACKNOWLEDGMENTS

SAS is a registered trademark of SAS Institute Inc. XML is a trademark of Massachusetts Institute of Technology. W3C is a registered trademark of the World Wide Web Consortium. Adept is a registered trademark of Arbortext, Inc. Word is a trademark of Microsoft Corporation. Adobe is a registered trademark of Adobe Systems, Inc. Tekoa Technology is a service mark of Zurich Biostatistics, Inc.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Michael Palmer or Cecilia Hale
Zurich Biostatistics, Inc.
45 Park Place South, PMB 178
Morristown, NJ 07960
Work Phone: 973-727-0025
Email: mcpalmer@zbi.net or cahale@zbi.net
Web: www.zbi.net