Data Access and Analysis with Distributed, Federated Data Servers in climateprediction.net

Similar documents
Public Resource Distributed Modelling. Dave Stainforth, Oxford University. MISU, Stockholm 8 th March 2006

Shankersinh Vaghela Bapu Institue of Technology

Geoffrey Fox Community Grids Laboratory Indiana University

Web Programming Paper Solution (Chapter wise)

Second OMG Workshop on Web Services Modeling. Easy Development of Scalable Web Services Based on Model-Driven Process Management

XML Web Service? A programmable component Provides a particular function for an application Can be published, located, and invoked across the Web

Security Principles for Public-Resource Modeling Research

Computational Web Portals. Tomasz Haupt Mississippi State University

Edinburgh Research Explorer

Web-APIs. Examples Consumer Technology Cross-Domain communication Provider Technology

Global Servers. The new masters

04 Webservices. Web APIs REST Coulouris. Roy Fielding, Aphrodite, chp.9. Chp 5/6

Exercise SBPM Session-4 : Web Services

DOC // JAVA TOMCAT WEB SERVICES TUTORIAL EBOOK

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva


Overview. Communication types and role of Middleware Remote Procedure Call (RPC) Message Oriented Communication Multicasting 2/36

02267: Software Development of Web Services

Chapter 4 Communication

DS 2009: middleware. David Evans

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

Usage of the Astro Runtime

REST Web Services Objektumorientált szoftvertervezés Object-oriented software design

CPET 581 E-Commerce & Business Technologies. Topics

Servlet Performance and Apache JServ

Web Services Amazon Ecommerce Service, Comparison with other Web Services

Service Oriented Architectures Visions Concepts Reality

Web Engineering (CC 552)

Agent-Enabling Transformation of E-Commerce Portals with Web Services

Java Applets, etc. Instructor: Dmitri A. Gusev. Fall Lecture 25, December 5, CS 502: Computers and Communications Technology

Enterprise Java Unit 1- Chapter 3 Prof. Sujata Rizal Introduction to Servlets

ICAT Job Portal. a generic job submission system built on a scientific data catalog. IWSG 2013 ETH, Zurich, Switzerland 3-5 June 2013

A short introduction to Web Services

Copyright 2014 Blue Net Corporation. All rights reserved

Introduction. Software Trends. Topics for Discussion. Grid Technology. GridForce:

Remote Procedure Calling

Presented By: Ian Kelley

Java and i. A Salesforce Recipe: Integrating Java and RPGLE

Index Introduction Setting up an account Searching and accessing Download Advanced features

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

The CEDA Web Processing Service for rapid deployment of earth system data services

3C05 - Advanced Software Engineering Thursday, April 29, 2004

ReST 2000 Roy Fielding W3C

StratusLab Cloud Distribution Installation. Charles Loomis (CNRS/LAL) 3 July 2014

Adaptation of Web service architecture in distributed embedded systems

AIM Enterprise Platform Software IBM z/transaction Processing Facility Enterprise Edition 1.1.0

Oracle 10g and IPv6 IPv6 Summit 11 December 2003

SSC - Web applications and development Introduction and Java Servlet (I)

Introduction to Web Services & SOA

13. Databases on the Web

WWW, REST, and Web Services

Develop Mobile Front Ends Using Mobile Application Framework A - 2

The EC Presenting a multi-terabyte dataset MWF via ER the web

Incorporating applications to a Service Oriented Architecture

How A Website Works. - Shobha

Christian Benjamin Ries 1 and Christian Schröder 1. Wilhelm-Bertelsmann-Straße 10, Bielefeld, Germany. 1. Introduction

Traditional Web Based Systems

A RESTful Approach to the Management of Cloud Infrastructure. Swit Phuvipadawat Murata Laboratory

AWS Lambda: Event-driven Code in the Cloud

REST Easy with Infrared360

Outline. S: past, present and future Some thoughts. The 80s. Interfaces - 60s & 70s. Duncan Temple Lang Department of Statistics UC Davis

Chapter 17 Web Services Additional Topics

Research and Design Application Platform of Service Grid Based on WSRF

The Challenge of Volunteer Computing With Lengthy Climate Model Simulations

Introduction to Web Services & SOA

Developing Web Applications

PROVENANCE Contract Number: Enabling and Supporting Provenance in Grids for Complex Problems. Title: Functional Prototype

The Umbilical Cord And Alphabet Soup

Introduction to Worklight Integration IBM Corporation

Data exchange using EDAMIS. Item 2.1 of the agenda SISAI-Metadata Working Group meeting 2015

Web Services and SOA. The OWASP Foundation Laurent PETROQUE. System Engineer, F5 Networks

IST 220: Application Layer

The COLDEX Metadata Synchronisation Service (MSS) and other services

Web System and Technologies (Objective + Subjective)

NonStop as part of a modern state of the art IT Infrastructure

Lecture 1: January 22

Avid Interplay Production Web Services Version 2.0

Flash: an efficient and portable web server

2001 Sun Microsystems

Copyright 2016 Pivotal. All rights reserved. Cloud Native Design. Includes 12 Factor Apps

Oracle9i Application Server Architecture and Com

Seminar report Google App Engine Submitted in partial fulfillment of the requirement for the award of degree Of CSE

Cloud Computing & Visualization

Introduction to Grid Computing

The Astro Runtime. for data access. Noel Winstanley Jodrell Bank, AstroGrid. with the part of Noel played by John Taylor, IfA Edinburgh/AstroGrid

Distribution and web services

Opal: Wrapping Scientific Applications as Web Services

Application of TRIE data structure and corresponding associative algorithms for process optimization in GRID environment

THUR 9:00 AM UTILIZING BPM FOR MODERNIZATION

Leveraging Web Services Application Integration. David S. Linthicum CTO Mercator

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Notes. Submit homework on Blackboard The first homework deadline is the end of Sunday, Feb 11 th. Final slides have 'Spring 2018' in chapter title

Today: World Wide Web! Traditional Web-Based Systems!

Workflow, Planning and Performance Information, information, information Dr Andrew Stephen M c Gough

Distributed Systems Principles and Paradigms

Using Data Science to deliver Workforce & Labour Market Insights. Gary Gan Co-Founder, JobKred

CICS TS V4.2 - Connectivity

Today: World Wide Web

Transcription:

Data Access and Analysis with Distributed, Federated Data Servers in climateprediction.net Neil Massey 1 neil.massey@comlab.ox.ac.uk Tolu Aina 2, Myles Allen 2, Carl Christensen 1, David Frame 2, Daniel Goodman 1, Jamie Kettleborough 3, Andrew Martin 1, Stephen Pascoe 3, David Stainforth 2 1: Computing Laboratory, 2: Department of Physics, 3: Rutherford Appleton Laboratory EGU 2005, 24 th to 29 th April, Vienna.

Contents Overview of project Analysing data over distributed data nodes Using web services to access and analyse data Notification of results in an asynchronous workflow Using web services to wrapper legacy code Presenting interfaces to data

climateprediction.net The Goals Harness the power of idle PCs to help quantify uncertainty in predictions of the 21st century climate To improve public understanding of the nature of uncertainty in climate prediction The Method Invite the public to download a full resolution, 3D climate model and run it locally on each PC Use each PC to run a single member of a massive, perturbed physics ensemble Provide visualisation software and educational packages to maintain interest and facilitate school and undergraduate projects etc.

Distribution of client users...... ~100,000 volunteers, 130 countries

Distribution of data nodes ~100,000 volunteers, 130 countries 5 data nodes

Network

The client Public donated computing time running under the BOINC infrastructure Clients download a full GCM, integrate for 45 years, ~2 to 6 weeks Clients can track progess with a visualisation program Produce ~600MB of data of which ~12MB is uploaded directly to a data node Also upload meta-data to a central database

The client

Data nodes Rely on donated server space Data is federated across the nodes Do not want end users to FTP the raw data Instead: Provide a secure, robust, efficient, scalable environment for data discovery and analysis Transparently provide access to the meta-data and data simultaneously Allow multiple interfaces to access the data

Distributed analysis of data Design problems / goals: Data set is federated across data nodes Only one copy of the data set Data set is large Servers are donated, potentially no root access Analysis are computationally expensive, may take several days Design decisions: Copy data set instead of moving Just in time data access Distributed analysis Service oriented architecture Asynchronous service request, delivery Web services for infrastructure

Web services to analyse data Why web services? Lightweight way to build grid like infrastructure Open, standardised protocols XML RPC without firewall problems Security capabilities present in the software stack Lots of industry input (Microsoft, IBM, Sun, etc.) Momentum in the UK academic community (WSRF, OMII) Technology Java Virtual Machine Apache HTTP server, Apache Tomcat, Apache Axis ~30MB install, not including JVM

Composing analysis Decompose analysis into component parts Example: computing normalised differences» Collect data as URIs by querying database» Compute the difference» Normalise the difference Present an API to scientists to allow the composition Popular components as modules Method to chain them together Incorporate data and meta-data Allows analysis to be distributed Must move and copy data between the parts

Moving and copying data Data too large to move around as SOAP message Move data via URI reference to data Move data only when it is needed Analysis will produce intermediate results Store these in a transitory manner Copy persistent data but move intermediate data Present results to scientists in the same transitory store Garbage collection When to collect How big a cache?

Notification of results Results could take a long time to generate Architecture is asynchronous Several options for notifying that computation is complete: Email Dynamically changing web site RSS feed Results are then available via FTP web service

Wrappering legacy code in web services Many legacy applications that can be used for analysis Too much effort to re-implement them in WS Most applications can be wrappered C++, FORTRAN, Python, Perl, even command line tools SOAP libraries exist for many languages However, full WS stack does not This includes security And WSDL generation Example: using a CDAT function as a web service

Wrappering legacy code: localhost web service If SOAP libraries exist for the legacy language: Write a simple Java interface to the function Write the actual function in the legacy language, wrappering it within SOAP function calls The Java interface, calls the legacy function using SOAP, over localhost and using a designated port Secure port using iptables Advantages: Easy to implement Fits web services model Disadvantages: Have to dump the data to the file system Have to write serializers in both languages Cannot use with command line tools

Wrappering legacy code: Java Native Interface JNI allows running of native code To run a Python program:» Embed the Python interpreter in a C program» Import and run the Python from the embedded interpreter» Write a Java wrapper» Use JNI to generate the glue Advantages: Don t need to dump data: can just stream or pass a pointer! Can run command line tool Can use externals Disadvantages: Tricky to get right Have to ensure data is in compatible format for each language

Connecting Interfaces Different interfaces for different levels of users Scientists School children Custom software Write the web services first: Avoids replication Almost anything can connect to them (Java, Python, C++, PHP 5, Perl, etc.) Increases maintainability: one common API Increases flexibility of web service API

Connecting interfaces

Connecting Interfaces Example Simple web portal to query parameter combinations from the database <url masked for web presentation> Workflow: User connects to portal with browser. Python server page generates dynamic content User selects parameters, clicks submit query Form data is sent via HTTP POST to CGI Python CGI script CGI script processes POST data, creates webservice call Web service does processing and returns SOAP to the CGI script CGI script caches results and creates an ID for the cache CGI script calls another PSP page with returned data PSP script presents data to the user as HTML User selects List Runs HTML generated by PSP calls another PSP script with cache ID PSP generates HTML to present to user

References climateprediction.net: http://www.climateprediction.net Public Computing: Reconnecting People to Science: Dr. David P. Anderson, http://boinc.berkeley.edu/madrid.html BOINC: A System for Public-Resource Computing and Storage Dr. David P. Anderson, http://boinc.berkeley.edu/grid_paper_04.pdf OMII: http://www.omii.ac.uk/mission.htm Java Native Interface: http://java.sun.com/docs/books/tutorial/native1.1/ Python http://www.python.org neil.massey@comlab.ox.ac.uk