An End-to-End Web Services-based Infrastructure for Biomedical Applications

Sriram Krishnan*, Kim K. Baldridge, Jerry P. Greenberg, Brent Stearn and Karan Bhatia
* sriram@sdsc.edu

Modeling and Analysis Across Scales and Systems
[Figure: diagram labels: Laptop, Workstation, Clusters, Supercomputer, Grid; Molecular, Macromolecular, Molecular Assemblies, Cellular, Organ]

Goals
- Enabling integration across multi-scale biomedical applications
- Leveraging geographically distributed, disparate computational and data resources

Functional Requirements
- Making biomedical applications Grid-aware
  - Remote execution on Grid resources
  - Use of Grid-based schedulers
- Support for multiple concurrent users
- Access via disparate user interfaces
- Use of standards-based security mechanisms
- Integration across multi-scale applications via the use of workflow tools

Towards a Services Oriented Architecture
- Applications are wrapped as services
  - Provide transparent execution on Grid resources
- Users are free to use clients of their choice
- Multiple standards-based security alternatives to choose from
- Services exchange strongly typed data defined using XML schemas
  - Aids in the creation of complex workflows

Talk Outline
- Motivation for a Services Oriented Architecture
- Overall end-to-end architecture
- Technical Details and Challenges
- Sample User Interfaces
- Status and Evaluation
- Conclusions

Architecture Overview
[Figure: architecture diagram with labels: Gemstone, PMV, Informnet; State Mgmt, Application Services, Security Services (GAMA); three Globus instances; Condor pool, SGE Cluster, PBS Cluster]

Technical Details and Challenges
- Application Services
  - Data Typing and Operations
  - State Management
- Scheduling
- Security

Data Typing Issues

ATOM 1 CA ALA 403 302.952 172.086 61.234 0.070 2.275
ATOM 2 CA PHE 404 300.425 169.247 61.082 0.070 2.275
ATOM 3 CA VAL 405 303.060 166.741 60.099 0.070 2.275
ATOM 4 CA HSD 406 302.667 164.496 63.103 0.070 2.275
ATOM 5 CA TRP 407 299.201 163.824 61.775 0.070 2.275
ATOM 6 CA VAL 409 303.586 160.385 60.021 0.070 2.275
ATOM 7 CA GLY 410 301.476 158.708 62.646-0.020 2.275
ATOM 8 CA GLU 3 280.442 157.436 68.520 0.070 2.275
ATOM 9 CA ILE 4 281.745 159.036 71.747 0.070 2.275
ATOM 10 CA VAL 5 281.352 158.451 75.502 0.070 2.275
ATOM 11 CA HSD 6 281.014 161.480 77.762 0.070 2.275
ATOM 12 CA ILE 7 282.779 161.559 81.125 0.070 2.275
ATOM 13 CA GLN 8 281.483 163.609 84.125 0.070 2.275
ATOM 14 CA ALA 9 284.564 163.755 86.377 0.070 2.275
ATOM 15 CA GLY 10 284.153 165.418 89.764-0.020 2.275
ATOM 16 CA GLN 11 281.233 167.582 90.892 0.070 2.275
ATOM 17 CA CYS 12 281.680 170.431 88.465 0.070 2.275
ATOM 18 CA GLY 13 282.299 168.195 85.512-0.020 2.275
ATOM 19 CA ASN 14 279.242 166.471 86.869 0.070 2.275
ATOM 20 CA GLN 15 277.035 169.516 87.517 0.070 2.275
ATOM 21 CA ILE 16 277.798 170.436 83.911 0.070 2.275
ATOM 22 CA GLY 17 276.950 166.924 82.798-0.020 2.275
ATOM 23 CA ALA 18 273.419 167.698 83.872 0.070 2.275
ATOM 24 CA LYS 19 273.486 170.809 81.708 0.070 2.275
ATOM 25 CA PHE 20 275.293 169.251 78.771 0.070 2.275
ATOM 26 CA TRP 21 272.734 166.498 78.760 0.070 2.275
ATOM 27 CA TYR 52 275.692 159.625 72.557 0.070 2.275
ATOM 28 CA PRO 63 273.156 158.690 77.534 0.020 2.275
ATOM 29 CA ARG 64 276.471 156.904 77.238 0.070 2.275
ATOM 30 CA ALA 65 278.499 158.910 79.729 0.070 2.275
ATOM 31 CA ILE 66 280.286 158.199 82.986 0.070 2.275

- Lack of common, strongly defined data types
  - All applications use custom file-based formats
- Application integration via third-party tools is extremely difficult to perform
  - Need application-specific transformation scripts
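To make the "transformation script" burden concrete, here is a minimal Python sketch that turns records like the ATOM lines above into a typed structure. The field meanings (serial, atom name, residue name, residue number, x, y, z, charge, radius) are an assumption based on the PQR-style layout, and the `Atom` class and `parse_atom` function are hypothetical names, not part of any tool named in the talk. Note the extra step needed for values fused by fixed-width output, such as `62.646-0.020`.

```python
import re
from dataclasses import dataclass


@dataclass
class Atom:
    # Field meanings are assumed from the PQR-style layout of the records.
    serial: int
    name: str
    residue: str
    residue_number: int
    x: float
    y: float
    z: float
    charge: float
    radius: float


# A minus sign directly after a digit marks two numbers that were fused
# together by fixed-width formatting (e.g. "62.646-0.020").
_FUSED = re.compile(r"(?<=\d)-")


def parse_atom(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record into a typed Atom."""
    fields = _FUSED.sub(" -", line).split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError(f"unexpected record: {line!r}")
    return Atom(
        int(fields[1]), fields[2], fields[3], int(fields[4]),
        *(float(value) for value in fields[5:10]),
    )
```

Every application with its own file format needs a parser of this kind, which is the integration cost that strongly typed XML schemas are meant to remove.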

Application Services
- Requests and responses are strongly typed
  - Use of XML Schemas to define data structures passed around
- Application functionality exposed as science-oriented WSDL operations
- Available services: APBS, GAMESS, QMView, LigPrep
- Implementation details
  - Services wrap scientific codes - no (or minimal) modification required to these codes
  - Software tools used - Apache Axis, Jakarta Tomcat

Electrostatic Potential Input
- Molecule
  - Molecule Name
  - Molecule Radius
  - Atoms [ ]
    - Field Name, Atom Name, Atom Number, Atomic Number, Element Name, Residue Name, Residue Number, Coordinates, Atomic Charge, Atomic Radius, Symmetry Unique
- Calculation Parameters
  - Equation Type
  - Boundary Conditions
  - Charge Discretization
  - Surface Coefficients
  - Energy Output Type
  - Force Output Type
  - Output Types
- Grid Parameters
  - No. of Grid points: {x, y, z}
  - Fine & Coarse Grids
    - Length: {x, y, z}
    - Center: {x, y, z}
- Parallel Parameters
  - Overlap Factor
  - Processor Grid: {x, y, z}
- Physical Parameters
  - SASA Coefficient
  - Solvent Temperature
  - System Temperature
  - Protein Dielectric
  - Solvent Dielectric
  - Ions [ ]
    - Charge, Concentration, Radius

Workflows and Strong Data Typing
[Figure: workflow diagram with nodes: Unnatural Ligand, GAMESS, Ligand, LigPrep, Prepared Ligand, Protein + Natural Ligand, Protein, PDB2PQR, Prepared Protein, Complex, APBS, QMView, Ligand-Protein Interaction. Credit: Baldridge, Greenberg, Amoreira, Kondric]
- GAMESS Service: More accurate ligand information via GAMESS-XML
- LigPrep Service: Generation of conformational spaces
- PDB2PQR Service: Protein preparation
- APBS Service: Generation of electrostatic information
- QMView Service: Visualization of electrostatic potential file
- Applications:
  - Electrostatics and docking
  - High-throughput processing of ligand-protein interaction studies
  - Use of small molecules (ligands) to turn on or off a protein function

Service Operations
- Operations can be invoked synchronously or asynchronously
- Synchronous operations:
  - Block until the operation is finished
  - Outputs returned as a response to the initial request
  - Suitable for short jobs
- Asynchronous operations:
  - Return immediately with a jobid
  - Can query for job status and outputs using the jobid
  - Suitable for long-running jobs
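The two invocation styles can be sketched in a few lines of Python. This is an in-process toy, not the actual service interface: the class `JobService`, its method names, and the status strings are all hypothetical, and a thread pool stands in for remote execution on Grid resources.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobService:
    """Toy stand-in for an application service (hypothetical API)."""

    def __init__(self, app):
        self._app = app                      # callable that does the real work
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._jobs = {}                      # job id -> Future

    def run_blocking(self, inputs):
        # Synchronous: the caller waits, outputs come back in the response.
        return self._app(inputs)

    def launch(self, inputs):
        # Asynchronous: hand the work off and return a job id immediately.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._app, inputs)
        return job_id

    def query_status(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() else "DONE"

    def get_outputs(self, job_id):
        # Waits for the job to finish if necessary, then returns its outputs.
        return self._jobs[job_id].result()
```

A client of a long-running job would call `launch`, keep the returned id, and poll `query_status` until the job is no longer running before fetching outputs.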

State Management
- Application services are stateful
  - Metadata about job inputs and outputs
  - Job status for asynchronous jobs
  - Job history
- Use of a database for storing/retrieving service state
  - Access to PostgreSQL database via JDBC
- Future Work: Web Service Resource Framework (WSRF) integration
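A minimal sketch of such a state store follows. The talk describes PostgreSQL accessed via JDBC; this example swaps in Python's built-in `sqlite3` module purely so that it is self-contained, and the table layout, column names, status values, and the `JobStateStore` class are assumptions for illustration.

```python
import sqlite3


class JobStateStore:
    """Keeps job metadata, status and history in a relational table."""

    def __init__(self, path=":memory:"):
        # sqlite3 is a stand-in here; the slides describe PostgreSQL via JDBC.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
            "inputs TEXT, outputs TEXT)"
        )

    def create(self, job_id, inputs):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO jobs (job_id, status, inputs) "
                "VALUES (?, 'PENDING', ?)",
                (job_id, inputs),
            )

    def update(self, job_id, status, outputs=None):
        with self._db:
            self._db.execute(
                "UPDATE jobs SET status = ?, outputs = ? WHERE job_id = ?",
                (status, outputs, job_id),
            )

    def status(self, job_id):
        row = self._db.execute(
            "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        return row[0] if row else None

    def history(self):
        # Jobs in insertion order, as (job_id, status) pairs.
        return self._db.execute(
            "SELECT job_id, status FROM jobs ORDER BY rowid"
        ).fetchall()
```

Keeping state in a database rather than in service memory lets status queries for asynchronous jobs survive a restart of the service container.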

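Scheduling
[Figure: job submission path, with labels in flow order:]
- Application Service
- 1. Job request using RSL via Globus CoG Kit
- Globus Gatekeeper
- 2, 3 (steps unlabeled in the figure)
- Scheduler-specific Globus Job-manager
- Compute Resources
- 4. Scheduler-independent job handle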
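For readers unfamiliar with RSL, the following Python sketch assembles a job description string in the Globus Resource Specification Language. It only builds the text; submission through the CoG Kit is not shown. The helper name `build_rsl` and the example path are hypothetical, and only a handful of common RSL attributes (`executable`, `arguments`, `count`, `directory`) are covered.

```python
def build_rsl(executable, arguments=(), count=1, directory=None):
    """Build a simple GRAM RSL job description string."""
    parts = [f"(executable={executable})"]
    if arguments:
        for arg in arguments:
            # Quote escaping is deliberately left out of this sketch.
            if '"' in arg:
                raise ValueError("arguments containing quotes are not handled")
        quoted = " ".join(f'"{arg}"' for arg in arguments)
        parts.append(f"(arguments={quoted})")
    parts.append(f"(count={count})")
    if directory:
        parts.append(f"(directory={directory})")
    # A leading "&" marks a conjunction of attribute relations.
    return "&" + "".join(parts)


# Hypothetical install path and input file name, for illustration only.
print(build_rsl("/usr/bin/apbs", ["input.in"], count=4))
```

Because the same description is handed to the gatekeeper regardless of whether Condor, SGE or PBS sits behind it, the application service does not need scheduler-specific code.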

Security
- GSI-based transport level (SSL) authentication
  - Use of Java CoG libraries and Tomcat to provide a secure socket connection
- Simple grid-map based authorization provided as an Axis Handler
  - Every Axis request passes through a chain of handlers before the target service is invoked
  - The grid-map Authorization Handler verifies whether the client is authorized to access the service by looking up the grid-map using the client's Distinguished Name (DN)
- Future Work: SAML-based authorization techniques
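The grid-map lookup itself is simple, as this Python sketch suggests. It is not the Axis handler from the talk (which is Java); the function names and the sample DN are hypothetical, and it assumes the usual grid-mapfile layout of a quoted DN followed by a local account name on each line.

```python
import shlex


def load_gridmap(text):
    """Parse grid-mapfile text into {distinguished name: local account}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comment lines
        parts = shlex.split(line)         # honours the quotes around the DN
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def authorize(gridmap, client_dn):
    """Return the mapped local account, or refuse the request."""
    if client_dn not in gridmap:
        raise PermissionError(f"DN not in grid-map: {client_dn}")
    return gridmap[client_dn]


# Hypothetical entry, for illustration only.
SAMPLE = '"/C=US/O=Example/OU=Lab/CN=Jane Doe" jdoe\n'
print(authorize(load_gridmap(SAMPLE), "/C=US/O=Example/OU=Lab/CN=Jane Doe"))
```

In a handler chain, a check like `authorize` would run after transport-level authentication has established the client's DN and before the request reaches the target service.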

Certificate Management: GAMA
[Figure: GAMA architecture, with labels in the order they appear:]
- Portal server 1, Portal server 2: gama, gridportlets, DB, GridSphere, Servlet container, Java keystore
- Stand-alone applications
- Operations: create user, import user, retrieve credential
- GAMA server: AXIS Web Services wrapper, CACL, MyProxy, CAS, Servlet container, Java keystore

User Interfaces
- Web services are language and platform independent
- Can be accessed via a multitude of clients
  - Java
    - Gridsphere-based Web portals
    - Workflow tools: Kepler, Informnet, etc.
  - Python
    - Python Molecular Viewer (PMV)
    - Workflow tools: Vision
  - JavaScript
    - Gemstone: Mozilla-based Web services front-end

Gemstone
[Figure: Gemstone screenshot showing:]
- Registry of Services
- Data Repository
- Dynamic User Interface

Initial Evaluation
- Commodity SOAP toolkits are not ideal for transferring large inputs and outputs
- XML representation of molecule data (originally in PQR format) is approximately an order of magnitude larger
  - Larger transfer times
- Axis de-serialization is very expensive for large inputs
  - Large memory footprint
  - Very time-consuming
- However, Web service overhead is still at least an order of magnitude smaller than actual execution times
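The size increase is easy to reproduce in miniature. The Python sketch below wraps one whitespace-separated record in per-field XML elements and compares the lengths. The element names are hypothetical and do not reflect the actual service schema, so the resulting ratio is only indicative; real figures depend on the schema and the SOAP toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real schema is not shown in the slides.
FIELDS = ("serial", "name", "residue", "residueNumber",
          "x", "y", "z", "charge", "radius")


def record_to_xml(line):
    """Wrap each field of one ATOM record in its own XML element."""
    values = line.split()[1:]             # drop the leading "ATOM" tag
    atom = ET.Element("atom")
    for field, value in zip(FIELDS, values):
        ET.SubElement(atom, field).text = value
    return ET.tostring(atom, encoding="unicode")


line = "ATOM 1 CA ALA 403 302.952 172.086 61.234 0.070 2.275"
xml_text = record_to_xml(line)
print(f"raw: {len(line)} chars, XML: {len(xml_text)} chars, "
      f"ratio: {len(xml_text) / len(line):.1f}x")
```

Most of the extra bytes are opening and closing tags repeated for every atom, which is why the overhead grows with molecule size rather than staying constant.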

SOAP Performance: Alternatives
- Parsing techniques
  - Streaming
  - Pull-based
- Binary XML
  - More compact representation of data
  - More efficient data transport and parsing
  - Smaller memory footprint
- Data Format Description Language (DFDL)
  - Definition of structure of binary and character files
  - Files transferred in their native formats
  - Smaller sizes, hence faster transfer
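As an illustration of the streaming idea, the Python sketch below processes atoms one at a time with the standard library's incremental parser instead of building the whole document tree first. The element names (`molecule`, `atom`, `charge`) and the function `total_charge` are hypothetical, and this is not the Axis de-serializer discussed in the talk.

```python
import io
import xml.etree.ElementTree as ET


def total_charge(xml_stream):
    """Sum atomic charges while reading the document incrementally."""
    total = 0.0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "atom":
            total += float(elem.findtext("charge"))
            # Release the atom's children once consumed; the emptied
            # element itself stays attached to its parent.
            elem.clear()
    return total


doc = io.BytesIO(
    b"<molecule>"
    b"<atom><charge>0.070</charge></atom>"
    b"<atom><charge>-0.020</charge></atom>"
    b"</molecule>"
)
print(total_charge(doc))
```

Because each atom is discarded after use, peak memory depends far more on the size of one record than on the size of the whole molecule.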

Summary
An end-to-end infrastructure for Grid-enabling biomedical applications that provides:
- Remote execution on Grid resources
- Access to schedulers
- State management
- Concurrent access via disparate interfaces
- Standards-based security
- Ability to use workflow tools for coupling multi-scale biomedical applications

Status, Software and Demos
- Application services: http://nbcr.net/services
  - Alpha version of APBS service available for download and testing
  - Opal, a toolkit for wrapping legacy scientific applications, available for alpha testing
  - GAMESS, QMView, LigPrep services available soon
- Gemstone: http://grid-devel.sdsc.edu/gemstone
- GAMA: http://grid-devel.sdsc.edu/gama
- Demos at SC2005: NCRR Booth (656): 11/14 7-8PM, SDSC Booth (1838): 11/15 5-6PM 11/15 12-2PM, 11/17 12-2PM

Appendix

Computational Infrastructure for Multiscale Modeling
[Figure: layered diagram, with labels in the order they appear: Set of Biomedical Applications (QMView, GAMESS, APBS, Autodock); Rich Clients (PMV, ADT, APBSCommand, Vision); Infrastructure; Continuity, Gtomo2, TxBR; Web Portals (Continuity, Telescience Portal); Computational Grid; Web Services; Workflow; Middleware]

Sample Service: APBS
- Operations provided:
  - calculatebindingenergy
  - calculatesolvationenergy
  - calculateelectrostaticpotential
- Operations accept and return strongly typed parameters in XML format
  - Described by an XML Schema
- Data binding provided by stub generators in various languages
  - WSDL2Java provided by Apache Axis
  - WSDL2PY provided by Python ZSI

PMV APBS Client: Michel Sanner, et al.
[Figure: PMV screenshot, with callouts:]
- Calculation Parameters
- Molecule Data
- URL of APBS Web service
- Invoke remote Web service

(Incomplete) Acknowledgements
- Phil Papadopoulos
- Steve Mock
- Kurt Mueller
- Sandeep Chandra
- Nadya Williams
- Peter Arzberger
- Wilfred Li
- Robert Konecny
- Michel Sanner
- Wibke Sudholt
- APBS Team