XML in the biopharmaceutical sector


XML holds out the opportunity to integrate data across both the enterprise and the network of biopharmaceutical alliances, with little technological dislocation and at a fraction of the cost of other integration solutions.

Dr Lara Marks and Emmett Power, Silico Research Limited

Make no mistake, XML is coming, and it will change the way that biopharmaceutical companies and their partners and suppliers integrate their information systems. At Silico Research, we recently conducted an analysis of the deployment of XML technology in the biopharmaceutical sector. As a result of the research, we concluded that by 2004 XML would be one of the most important technologies used to achieve data integration across pharmaceutical, biotechnology and genomic research. Biopharmaceutical, genomic and technology companies are advised to address the issue of XML and formulate an organisation-wide XML strategy without delay.

XML technologies go to the heart of data integration. Over the next five years the pharmaceutical industry will shift from a wet, bench-driven research model to one driven by information and computers. As that shift takes place, data integration will become a major source of competitive advantage or disadvantage.

The Human Genome Project and new emerging technologies (such as bioinformatics, cheminformatics, pharmacogenomics, simulation and modelling) have created massive growth in the amount of data that pharmaceutical companies have to confront in order to bring drugs to market. Data integration and analysis promise deeper insights into the biology and chemistry underlying drug actions and target diseases and, at the same time, radically shorter development times. This has led biopharmaceutical companies to invest heavily in emerging big-ticket technologies designed to integrate data across the enterprise; these include bioinformatic platform technologies, data warehouses and enterprise information portals.
Such efforts have met with mixed results. At times, integration efforts have been derailed by overoptimistic expectations of the technology and poor implementation, combined with dramatically shifting patterns of information creation and usage. There are reasons to be optimistic that XML, the latest integration technology, could avoid many of the problems that have beset earlier integration technologies.

What is XML?

The Extensible Markup Language, XML for short, is a mark-up language, much like HTML, which is widely used to present data on the Internet and other networks. Unlike HTML, however, which was designed to describe the presentation of data, XML was designed to describe the structure and content of data and to facilitate the transfer of that data between applications and over networks, including the Internet. XML achieves this by allowing data to be stored within an XML document in a structured, tagged format; this enables the data to be interpreted and modified by any XML-compatible application.

XML is derived from the Standard Generalised Markup Language (SGML), which is the international standard for defining the structure and content of electronic documents. HTML is an application of SGML, but with a very limited scope.

XML is extensible. This represents one of its great strengths in that, unlike HTML, the tags used to structure the data can be extended according to the needs of the user. In HTML the tags are fixed and invariable, whereas in XML they can be extended to suit the needs of the organisation using the format. This is achieved in XML by writing a specific Document Type Definition (DTD) that maps the structure of the data. The DTD is shared with other users of the document, enabling them to read, map and process the data contained in it. Using DTDs, developers can create self-defining text tags to identify a piece of data for use in other applications. So if, for example, the deploying company decides that it needs a tag in the document representing Sequence, that tag can be added to the DTD and shared with all applications using the DTD.

Taking this principle to its logical conclusion, developers and users are writing their own customised DTDs to suit the needs of the specific data they are working with. Several specialised DTDs have been written for the pharmaceutical, biotechnology and genomic sectors.

Table 1. Key biochemical and pharmaceutical DTDs.

One example of a specialist DTD is the widely used Bioinformatic Sequence Markup Language (BSML), developed by Visual Genomics (now part of LabBooks) for comparing genetic data from multiple sources and platforms. Using BSML, scientists can compare genetic data, run applications across that data and display it in a special BSML browser written for the purpose.

Another strength of XML is that it opens the door to data representation, because the language can be used to model data: XML can define the structure of elements and data, and the inter-relationship of data and data elements. Developers can therefore create 2D representations of chemical and biological structures using data from a number of sources, or represent genomic data in a graphical format. This creates powerful opportunities for developers in the pharmaceutical, biotechnology and genomic sectors, where complex data is increasingly interpreted visually. The data representation function of XML is fully captured in specialist browsers like the BIOML (BIOpolymer Markup Language) browser.

XML also facilitates data integration. Most database APIs (application programming interfaces) are defined in a particular programming language, so that results are returned in that language's native data types. While Structured Query Language (SQL) is a widely accepted method for specifying the database question, there is not yet a language-independent and database-neutral specification for the response. XML affords an opportunity to provide one. As an open standard, XML offers a route for moving data between programming languages. This has proved highly attractive in a sector like pharmaceuticals, characterised by a large number of heterogeneous applications using a number of programming languages to derive complex data in a number of formats.

XML core technologies

Four technologies are central to XML:

Document Type Definitions (DTDs)

A DTD is a file of declarations that defines the tags used in an XML document. DTDs define the data schema and structure of the document.
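As a minimal sketch of what such a definition looks like in practice, the Python example below embeds a small DTD in a document and reads the data back with the standard library's `xml.etree.ElementTree` module. The element names (`SequenceRecord`, `Organism`, `Sequence`) and the sample values are hypothetical, chosen only to echo the Sequence tag mentioned in the text; they are not taken from BSML, BIOML or any other published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and values are illustrative only and
# do not come from any published DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE SequenceRecord [
  <!ELEMENT SequenceRecord (Organism, Sequence)>
  <!ELEMENT Organism (#PCDATA)>
  <!ELEMENT Sequence (#PCDATA)>
  <!ATTLIST Sequence type (dna | rna | protein) #REQUIRED>
]>
<SequenceRecord>
  <Organism>Homo sapiens</Organism>
  <Sequence type="dna">ATGGCCCTGTGG</Sequence>
</SequenceRecord>
"""


def read_record(xml_text):
    """Parse the document and pull the tagged data into a plain dictionary."""
    root = ET.fromstring(xml_text)
    sequence = root.find("Sequence")
    return {
        "organism": root.findtext("Organism"),
        "type": sequence.get("type"),
        "residues": sequence.text,
    }


if __name__ == "__main__":
    print(read_record(DOCUMENT))
```

Python's standard-library parser checks that the document is well formed but does not validate it against the DTD, so a record that omitted `Organism` would still load here; enforcing the declared structure would need a validating parser.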
For example, the BIOpolymer Markup Language (BIOML) DTD uses tags like organism label, chromosome label, gene label and DNA label to define and structure the biological data. This allows any XML-enabled application to use the BIOML DTD to access and manipulate the data. An XML document can refer to a DTD in an external document (on a website, for example) or contain the DTD itself. In order for two applications to share data through XML, they must use the same tags and labels, through the adoption of an agreed DTD or a parser.

XML parsers

The system receiving XML data needs a parser that can break an XML document down into its data elements. The parser includes a facility that maps elements from the DTD to data structures in the receiving system. The parser extracts the actual data from the textual representation and creates either events or new data structures from it. Parsers also check whether documents conform to the XML standard and have a correct structure. This is essential for the automatic processing of XML documents. A validating parser not only checks a document for conformance to the general XML rules but also enforces a particular DTD, checking whether all necessary elements are present and whether their order is as specified in the DTD.

Namespaces

A namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, and is designed to avoid name collisions and promote industry-standard DTDs. The URI acts as a unique identifier for a vocabulary rather than as the location of a DTD, so that remote applications exchanging data can tell apart identically named tags from different sources.

XSL

XSL stylesheets are documents that, when processed with an XSLT transformation engine, turn an XML document into another form of mark-up (XML or HTML). This is useful for piping XML documents from one version of a DTD to another. A number of biological stylesheets are hosted by bioxml.org.
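The parser and namespace descriptions above can be illustrated with a short sketch. The Python example below uses the standard library's `xml.etree.ElementTree.iterparse` to receive parser events, and shows how two `label` tags from different vocabularies stay distinct because each is qualified by its namespace URI. The URIs, element names and values are hypothetical and exist only for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and element names, chosen for illustration only.
GENE_NS = "http://example.org/ns/gene"
ASSAY_NS = "http://example.org/ns/assay"

DOCUMENT = f"""<experiment xmlns:g="{GENE_NS}" xmlns:a="{ASSAY_NS}">
  <g:label>geneA</g:label>
  <a:label>Plate 7</a:label>
</experiment>"""


def labels_by_vocabulary(xml_text):
    """Walk parser events and group the text of <label> tags by namespace URI."""
    found = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    # An "end" event fires each time the parser finishes reading an element.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # ElementTree reports qualified names as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            uri, local_name = elem.tag[1:].split("}", 1)
            if local_name == "label":
                found[uri] = elem.text
    return found


if __name__ == "__main__":
    for uri, text in labels_by_vocabulary(DOCUMENT).items():
        print(uri, "->", text)
```

Both elements share the local name `label`, yet the receiving code can map each one to the right data structure because the parser hands over the full namespace-qualified name.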
XML in the biopharmaceutical sector

XML deployment is widespread in the pharmaceutical, biotechnology and genomic sectors. Of the executives surveyed by Silico Research, 75% said that they are currently deploying XML as part of their R&D infrastructure or product range. Virtually all those who are not deploying XML today expect to be doing so by 2003.

But most deployments are trial deployments. Typically, pharmaceutical and biotechnology companies are trialling the technology in a few sites and across a few applications before implementing full-scale deployment. This is accounted for by two factors. The first is the newness of the technology. XML is in the early stages of development and this is acting as an inhibitor to its full-scale adoption. Both users and vendors are standing back to watch how the technology develops. The second factor is the rapid pace of development in XML. This is making companies adopt a wait-and-see attitude before making a commitment to the technology.

The highest deployment of XML is in the discovery stages of the drug development process. There are three reasons for this. First, discovery and development have high integration needs; compared with other parts of the research pipeline, early stage research processes such as discovery have a strong need to integrate and manipulate complex data sets. Second, early stage teams in drug discovery processes typically have better IT skills than later stage teams, for example those in clinical trials. This makes it easier for early stage teams to experiment with new technologies. Finally, early stage discovery and development teams are culturally more open to experimenting with new technologies than those involved in later stage functions.

Pharmaceutical applications of XML

Today, most companies rely upon internally generated DTDs designed to achieve a particular objective or to link specific data sources.
Of the publicly available DTDs, three show significant usage: Bioinformatic Sequence Markup Language (BSML), used for the annotation of biopolymer sequence information; BIOpolymer Markup Language (BIOML), for genetic sequences; and Genome Annotation Markup Elements (GAME), for annotating biosequence features.

Open issues and constraints

A number of open issues and constraints need to be considered with respect to XML-based technologies.

The newness of XML makes it difficult to assess its usefulness in the long term.

As XML becomes more widely adopted, we expect to see the number of DTDs explode from 50 today to at least 500. This will create DTD name clashes and make it difficult to determine which DTD to adopt and when to update.

Within the enterprise, XML is, and will continue to be, simply one of many data formats. Many pharmaceutical companies require the complex splitting and merging of data to and from multiple sources, and the combination of dependent data from relational databases. It is not clear that XML will help in this.

XML builds on text-based documents, but text-based documents make poor use of bandwidth. As the throughput of document processing increases, many companies will run into network bandwidth utilisation problems with XML.
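To give a rough feel for the bandwidth concern raised above, the Python sketch below builds a set of hypothetical sequence records with the standard library and compares the size of the serialised XML with the size of the sequence data alone. The element names, the record layout and the 20-character sequences are assumptions made for illustration; real overhead depends entirely on the DTD and the data involved.

```python
import xml.etree.ElementTree as ET


def markup_overhead(n_records):
    """Return (total XML bytes, sequence payload bytes) for n hypothetical records."""
    root = ET.Element("SequenceSet")
    payload_bytes = 0
    for i in range(n_records):
        record = ET.SubElement(root, "SequenceRecord", id=str(i))
        residues = "ACGT" * 5  # illustrative 20-character sequence
        ET.SubElement(record, "Sequence", type="dna").text = residues
        payload_bytes += len(residues.encode("utf-8"))
    total_bytes = len(ET.tostring(root, encoding="utf-8"))
    return total_bytes, payload_bytes


if __name__ == "__main__":
    total, payload = markup_overhead(500)
    print(f"{total} bytes of XML carry {payload} bytes of sequence data "
          f"({total / payload:.1f}x)")
```

With short data values, the tags and attributes account for most of the bytes sent, which is the effect the text describes; longer values per tag would narrow the gap.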

Conclusion

XML holds out the opportunity to integrate data across the enterprise and across the network of partnerships and alliances that biopharmaceutical companies are building. Moreover, it holds out the possibility of doing so with little technological dislocation and at a fraction of the cost of other integration solutions. The key to realising these opportunities lies, as always, in planning and execution.

Dr Lara Marks is a Visiting Senior Research Associate at the CGHPSS Unit at Cambridge University and an honorary Senior Lecturer at the London School of Hygiene & Tropical Medicine. She is undertaking research into the impact of genomics on the pharmaceutical industry. Dr Marks is the author of a number of books and papers on pharmaceutical and healthcare issues, including a recently published and widely reviewed book for Yale University Press on the discovery and development of the oral contraceptive.

Emmett Power is a leading European analyst of new information technologies focusing on bioinformatics, data warehousing and other infrastructure and analytical technologies. He advises leading pharmaceutical and technology companies, and is the author of a number of reports covering pharmaceutical information technologies.

Note: Silico Research Limited is a pharmaceutical technology-based analysis organisation located in London and Cambridge in the UK. The company can be contacted via its website: www.silico-Research.com.

Innovations in Pharmaceutical Technology