Controlling OpenCDISC using R
Martin Gregory
PhUSE 2012, Budapest, October 2012


1. Motivation
2. Solution
3. Details
4. Conclusion

Introduction
OpenCDISC Validator:
- is a project of OpenCDISC
- is a Java application
- checks compliance with rules defined by CDISC
- provides a graphical user interface (GUI)
- provides a command line interface (CLI)
- allows user extensions to the rules

Introduction
We are concerned with:
- running the validator in a production environment
- automating the running of the validator
We will not cover:
- interpreting the results
- evaluating the implementation of the CDISC rules

Features of the validator GUI

Features of the validator CLI
The Command Line Interface (CLI) allows control of:
- whether a validation or generation task is to be run;
- which type of standard to use (SDTM, ADaM, Define.xml, SEND, Custom);
- the location of the files to be validated;
- the OpenCDISC configuration file, define file and codelists to use; and
- the location and format of the report to produce.
Full details of the options supported can be obtained by executing the command:
    java -jar .../lib/validator-cli-1.3.jar -help

Limitations of the GUI

Limitations of the CLI
- all or nothing source specification
- commands are very long
- for a given version, paths and some options are fixed
- a bare directory name cannot be specified

Specify calls in a parameter file
- one line per call
- only variable information in the file:
  - files to validate, allowing relative paths
  - OpenCDISC validator rules (configuration file) to use
  - name and location of report
  - report type
  - optionally, the version of the validator to use
- answers all the limitations:
  - inputs identified once
  - selection of files can be processed by specifying a call per file
  - only required information need be specified
  - define.xml used if present

Specify calls in a parameter file
Sample parameter file:
    # Fields separated by white space:
    # source config report-location report-type [version]
    sdtm/ae.xpt sdtm-3.1.2 report/sdtm_ae.xls Excel
    sdtm/dm.xpt sdtm-3.1.2 report/sdtm_dm.xls Excel 1.3
    adam/*.xpt adam-1.0 report/adam.xls Excel 1.2.1
    adam/*.xpt ~config/my-adam-1.0.xml report/my-adam.csv Csv

Use function/script to read the parameter file

Features required in the scripting language
- independent of operating system
- good operating system interface to:
  - run programs and collect their output
  - manage files
- anonymous, arbitrary and dynamic data structures
- support for regular expressions
- hide operating system details

Scripting language candidates
- Perl
  - satisfies all conditions, but implementations for Microsoft Windows differ
- Python
  - satisfies all conditions, same implementation on each OS
  - does not require Python to be installed
- R
  - satisfies all conditions, same implementation on each OS
  - more likely to be already available
  - support for validated environments

R function runocdv()
runocdv() signature:
    runocdv(file,
            ocd.path=c("1.3"="/opt/opencdisc/v1.3",
                       "1.2.1"="/opt/OpenCDISC/V1.2.1"),
            ocd.def.ver="1.3",
            ocd.java="/usr/lib/java/bin/java",
            log=NA)
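The named vector ocd.path maps a version string to an installation directory, and ocd.def.ver names the version to use when a call does not give one. A minimal sketch of how such a lookup might work is shown here; pick_ocd_home() is a hypothetical helper for illustration and is not taken from runocdv() itself.

```r
# Hypothetical helper, not part of runocdv(): resolve the installation
# directory for a requested validator version, falling back to the default.
ocd.path <- c("1.3" = "/opt/opencdisc/v1.3", "1.2.1" = "/opt/OpenCDISC/V1.2.1")
ocd.def.ver <- "1.3"

pick_ocd_home <- function(version = NA, paths = ocd.path, default = ocd.def.ver) {
  # A missing or empty version means "use the default version"
  if (is.na(version) || !nzchar(version)) version <- default
  if (!version %in% names(paths)) {
    stop("unknown OpenCDISC version: ", version)
  }
  # Index the named vector by version string and drop the name
  unname(paths[version])
}

home_old <- pick_ocd_home("1.2.1")
home_default <- pick_ocd_home()
```

Indexing by name keeps the list of supported versions in one place: adding a new validator release only needs a new element in ocd.path.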

R script call.runocdv.r
- Function runocdv() can only be called from R.
- We use an R script, call.runocdv.r, which can be called from the OS.
- For UNIX-like OS, the path to Rscript is included in the definition of the script:
      #! /usr/local/bin/Rscript
  and so it can be called as:
      call.runocdv.r file
- On Microsoft Windows it may be necessary to call Rscript explicitly:
      \path-to-r\bin\Rscript call.runocdv.r file
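A wrapper script of this kind might look roughly like the sketch below. The actual contents of call.runocdv.r are not shown in the slides, so check_args() and main() are hypothetical names, and the sketch assumes that runocdv() has already been defined, for example by sourcing its file.

```r
#!/usr/local/bin/Rscript
# Sketch of a command-line wrapper; the real call.runocdv.r may differ.

# Hypothetical helper: expect exactly one argument, the parameter file.
check_args <- function(args) {
  if (length(args) != 1) {
    stop("usage: call.runocdv.r file")
  }
  args[1]
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  ocdfile <- check_args(args)
  # Assumption: runocdv() was made available beforehand, e.g. with source()
  runocdv(ocdfile)
}

# In the script itself the last line would simply be:
# main()
```

commandArgs(trailingOnly = TRUE) returns only the arguments that follow the script name, so the same code works whether the script is started through the "#!" line or by calling Rscript explicitly.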

runocdv() log file
Sample log file:
    Source: /data/projects/runocdv-v1.0/sdtm/ae.xpt
    Define: /data/projects/runocdv-v1.0/sdtm/define.xml
    Standard: /opt/opencdisc/v1.3/config/config-sdtm-3.1.2.xml
    Report: /data/projects/runocdv-v1.0/report/sdtm-ae.xls
    Type: Excel
    OpenCDISC: 1.3
    ------------------------------------------------------------
    OpenCDISC Validator Messages
    ------------------------------------------------------------
    Beginning validation, please wait...
    The validation has completed.
    ------------------------------------------------------------
    Log: no log was created.

Details: Overall flow of the function

Overall flow of the function
- initialise
- read the parameter file
- check the parameters and construct 2-D arrays for:
  - command arguments
  - messages
- execute commands, writing messages to the log

Initialisation
- check that the parameter file was specified and exists
- set up some vectors of valid values
- initialise the log file and write the header:
      writelog(c(paste("NOTE: runocdv.r Revision: 1.0 on server ", Sys.info()["nodename"]),
                 paste("NOTE: processing job list", basename(ocdfile), "at", Sys.time()),
                 ""))
  which produces
      NOTE: runocdv.r Revision: 1.0 on server green
      NOTE: processing job list src2_1.3.ocd at 2012-07-29 13:44:53
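The slides call writelog() but do not show its definition. One way such a helper could be written is sketched below, assuming it appends lines to a log file, or prints to the console when no log file is given. make_writelog() and its argument are hypothetical.

```r
# Hypothetical logging helper: returns a writelog() function bound to one
# log file. With log = NA the messages go to the console instead.
make_writelog <- function(log = NA) {
  function(lines) {
    if (is.na(log)) {
      writeLines(lines)
    } else {
      con <- file(log, open = "a")   # open in append mode
      on.exit(close(con))
      writeLines(lines, con)         # one line per element, "" gives a blank line
    }
    invisible(lines)
  }
}

logfile <- tempfile(fileext = ".log")
writelog <- make_writelog(logfile)
writelog(c("NOTE: first message", ""))
writelog("NOTE: second message")
log_lines <- readLines(logfile)
```

Opening the connection in append mode on every call is slower than keeping it open, but it means the log is complete on disk even if a later step stops with an error.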

Reading the parameter file
We want the input data as a list of vectors:

                      List element
    Vector element    Call 1                Call 2                Call 3
    source            sdtm/ae.xpt           sdtm/dm.xpt           adam/*.xpt
    config            sdtm-3.1.2            sdtm-3.1.2            adam-1.0
    report            report/sdtm-ae.xls    report/sdtm-dm.xls    report/adam.xls
    report type       Excel                 Excel                 Excel
    version           1.3                   1.3                   1.2.1

    fid <- file(ocdfile, "r")
    fid.contents <- readLines(fid, -1, TRUE)
    # drop comment lines and blank lines
    fid.jobs <- fid.contents[!grepl("^([[:space:]]*#|[[:space:]]*$)", fid.contents)]
    # split each remaining line into its white-space separated fields
    ocd.param <- strsplit(fid.jobs, "[[:space:]]")
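The reading step can be tried without a file on disk by feeding a few sample lines through a text connection. This is a sketch under stated assumptions: it splits on runs of white space ("[[:space:]]+") so that repeated blanks do not produce empty fields, and it fills in a default version for lines that omit the optional fifth field. The names sample_lines, params and default_version are illustrative only.

```r
# Sample job lines, including comments, a blank line and repeated blanks
sample_lines <- c(
  "# Fields separated by white space:",
  "# source config report-location report-type [version]",
  "sdtm/ae.xpt sdtm-3.1.2 report/sdtm_ae.xls Excel",
  "",
  "adam/*.xpt  adam-1.0  report/adam.xls  Excel  1.2.1"
)

con <- textConnection(sample_lines)
contents <- readLines(con)
close(con)

# Keep only lines that are neither comments nor blank
jobs <- contents[!grepl("^([[:space:]]*#|[[:space:]]*$)", contents)]

# One character vector per call; "+" collapses runs of white space
params <- strsplit(trimws(jobs), "[[:space:]]+")

# Assumption: a missing fifth field means "use the default version"
default_version <- "1.3"
params <- lapply(params, function(p) {
  if (length(p) < 5) c(p, default_version) else p
})
```

The result is a list with one element per call, each a vector of five fields in the order source, config, report, report type, version.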

Constructing the command arguments
For each call, i.e. each element of ocd.param, we check that:
- at least one source file exists:
      if (length(Sys.glob(ocd.param[[i]][1])) == 0) {
        # write error message
      } else {...}
- the configuration file exists
- the report directory exists:
      if (!file.exists(dirname(ocd.param[[i]][3]))) {
        # write error message and stop processing this job
      } else {...}
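The source and report checks can be exercised against a temporary directory. In this sketch check_job() is a hypothetical function that collects error messages for one call instead of writing them to the log. The configuration check is left out because the slides do not show how a configuration name is resolved to a file.

```r
# Hypothetical check for one call p = c(source, config, report, type, version).
# Returns a character vector of error messages, empty if all checks pass.
check_job <- function(p) {
  errors <- character(0)
  # Sys.glob() expands wildcards; no match means no source file exists
  if (length(Sys.glob(p[1])) == 0) {
    errors <- c(errors, paste("ERROR: no source file matches", p[1]))
  }
  # The report file need not exist yet, but its directory must
  if (!file.exists(dirname(p[3]))) {
    errors <- c(errors, paste("ERROR: report directory does not exist:", dirname(p[3])))
  }
  errors
}

# Build a small throwaway project: <work>/sdtm/ae.xpt
work <- tempfile("ocd")
dir.create(file.path(work, "sdtm"), recursive = TRUE)
invisible(file.create(file.path(work, "sdtm", "ae.xpt")))

ok_job <- c(file.path(work, "sdtm", "*.xpt"), "sdtm-3.1.2",
            file.path(work, "sdtm", "ae.xls"), "Excel", "1.3")
bad_job <- c(file.path(work, "adam", "*.xpt"), "adam-1.0",
             file.path(work, "report", "adam.xls"), "Excel", "1.3")

ok_errors <- check_job(ok_job)    # no messages expected
bad_errors <- check_job(bad_job)  # one message per failed check
```

Collecting messages per call, rather than stopping at the first problem, matches the idea of keeping a messages array alongside the command arguments.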

Forming canonical paths
- The normalizePath() function constructs a canonically correct path for the operating system
- Here we construct the java command specifying the appropriate jar file:
      ocd.args[[i]][5] <- paste(ocd.java, "-jar",
                                normalizePath(paste(ocd.path[ocd.param[[i]][5]],
                                                    "/lib/validator-cli-",
                                                    ocd.param[[i]][5], ".jar", sep=""),
                                              mustWork=FALSE),
                                sep=" ")
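The same construction can be wrapped in a small function to see what command string results. build_java_cmd() is a hypothetical name; the paths are the defaults from the runocdv() signature and need not exist on the machine running the sketch, which is why mustWork=FALSE is used.

```r
# Hypothetical wrapper around the path construction shown on the slide.
build_java_cmd <- function(version, ocd.path, ocd.java) {
  # <install dir>/lib/validator-cli-<version>.jar
  jar <- paste(ocd.path[version], "/lib/validator-cli-", version, ".jar", sep = "")
  # mustWork = FALSE: no error or warning if the jar file does not exist
  paste(ocd.java, "-jar", normalizePath(jar, mustWork = FALSE), sep = " ")
}

cmd <- build_java_cmd(
  version  = "1.3",
  ocd.path = c("1.3" = "/opt/opencdisc/v1.3", "1.2.1" = "/opt/OpenCDISC/V1.2.1"),
  ocd.java = "/usr/lib/java/bin/java"
)
```

On Microsoft Windows normalizePath() returns the path with backslashes, so the exact string differs by platform. If an installation path may contain spaces, wrapping it in shQuote() before pasting is one option.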

Executing the validator
- The system() function executes a command using the user's default shell (UNIX) or directly (Microsoft Windows)
- intern=TRUE captures stdout and stderr to a character vector
- ignore.stderr=FALSE is the default
      ocd.jcmd.out <- system(ocd.java.cmd, intern=TRUE)
- and write the messages to the log file:
      writelog(ocd.jcmd.out)
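Capturing the output of an external program can be tried without Java by running R's own Rscript as the child process. This sketch uses system2() rather than system(), which is a substitution: system2() takes the command and its arguments separately and has explicit stdout and stderr arguments. R's documentation indicates that on UNIX-like systems system(..., intern=TRUE) collects only stdout unless stderr is redirected, for example with "2>&1", so stderr=TRUE is requested explicitly here.

```r
# Use the Rscript that belongs to the running R installation as a
# stand-in for the java command.
rscript <- file.path(R.home("bin"), "Rscript")

# stdout = TRUE and stderr = TRUE return both streams as a character vector,
# one element per output line. Arguments are not quoted automatically,
# hence the shQuote().
child_out <- system2(rscript,
                     args = c("-e", shQuote("cat(6 * 7)")),
                     stdout = TRUE, stderr = TRUE)

# A non-zero exit code is reported in the "status" attribute
child_status <- attr(child_out, "status")
if (!is.null(child_status)) {
  warning("child process exited with status ", child_status)
}
```

The captured vector can be handed to the logging function as it is, since each element is already one line of output.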

Conclusion
runocdv():
- allows repeated runs
- avoids error-prone manual selections
- enables batch processing
- saves reports where you want them
- handles multiple versions of the validator
- is independent of operating system
Additionally, R is eminently suitable as a scripting language.
Note: Code available on the PhUSE Wiki