Grid and Component Technologies in Physics Applications

Similar documents
GRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS

Remote Visualization, Analysis and other things

Overview Interactive Data Language Design of parallel IDL on a grid Design of IDL clients for Web/Grid Service Status Conclusions

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Distributed systems. Distributed Systems Architectures. System types. Objectives. Distributed system characteristics.

Sriram Krishnan

Enabling a SuperFacility with Software Defined Networking

The National Fusion Collaboratory

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

Data Access and Analysis with Distributed, Federated Data Servers in climateprediction.net

Distributed Systems Architectures. Ian Sommerville 2006 Software Engineering, 8th edition. Chapter 12 Slide 1

Extreme I/O Scaling with HDF5

JAVA COURSES. Empowering Innovation. DN InfoTech Pvt. Ltd. H-151, Sector 63, Noida, UP

CAS 703 Software Design

Geoffrey Fox Community Grids Laboratory Indiana University

Appendix A - Glossary(of OO software term s)

Scalable Computing: Practice and Experience Volume 10, Number 4, pp

The ITAPS Mesh Interface

Grid Middleware and Globus Toolkit Architecture

Research and Design Application Platform of Service Grid Based on WSRF

Introduction to Grid Computing

Service Oriented Architectures Visions Concepts Reality

Opal: Wrapping Scientific Applications as Web Services

Agent-Enabling Transformation of E-Commerce Portals with Web Services

High Throughput WAN Data Transfer with Hadoop-based Storage

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Center for Scalable Application Development Software: Application Engagement. Ewing Lusk (ANL) Gabriel Marin (Rice)

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Chapter 18 Distributed Systems and Web Services

Grid Service for Visualization and Analysis of Remote Fusion Data

Supporting Application- Tailored Grid File System Sessions with WSRF-Based Services

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Introduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project

Chapter 1: Distributed Information Systems

SOFTWARE ARCHITECTURES ARCHITECTURAL STYLES SCALING UP PERFORMANCE

FuncX: A Function Serving Platform for HPC. Ryan Chard 28 Jan 2019

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.

Programming with MPI

DS 2009: middleware. David Evans

OpenACC Course. Office Hour #2 Q&A

A QoS-aware CCM for DRE System Development

Sistemi ICT per il Business Networking

System types. Distributed systems

Distributed Systems Theory 4. Remote Procedure Call. October 17, 2008

CA464 Distributed Programming

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

IOS: A Middleware for Decentralized Distributed Computing

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Verteilte Systeme (Distributed Systems)

Service-Oriented Architecture

What is it? What does it do?

Presented By: Ian Kelley

Lecture 15: Frameworks for Application-layer Communications

Understanding StoRM: from introduction to internals

Lecture 15: Frameworks for Application-layer Communications

C exam. IBM C IBM WebSphere Application Server Developer Tools V8.5 with Liberty Profile. Version: 1.

RedGRID: Related Works

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Two Phase Commit Protocol. Distributed Systems. Remote Procedure Calls (RPC) Network & Distributed Operating Systems. Network OS.

Data Transfer and Sharing within Web Service Workflows

1.264 Lecture 16. Legacy Middleware

Chapter 4 Communication

Kepler Scientific Workflow and Climate Modeling

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

Using Resources of Multiple Grids with the Grid Service Provider. Micha?Kosiedowski

The MOSIX Scalable Cluster Computing for Linux. mosix.org

Grid Computing with Voyager

Unit 5: Distributed, Real-Time, and Multimedia Systems

An Overview of CSDMS, the Community Surface Dynamics Modeling System and the Earth System Bridge Project

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Communication. Distributed Systems Santa Clara University 2016

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

Design The way components fit together

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Semantic SOA - Realization of the Adaptive Services Grid

Functional Requirements for Grid Oriented Optical Networks

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

Distributed Systems Principles and Paradigms

Scibox: Online Sharing of Scientific Data via the Cloud

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

A Survey Paper on Grid Information Systems

Data Intensive processing with irods and the middleware CiGri for the Whisper project Xavier Briand

A Grid-Enabled Component Container for CORBA Lightweight Components

Distributed Data Management with Storage Resource Broker in the UK

Web Services for Visualization

ebusiness Suite goes SOA

Parallel I/O and Portable Data Formats I/O strategies

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009

Introduction of PDE.Mart

TOPLink for WebLogic. Whitepaper. The Challenge: The Solution:

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

General Plasma Physics

N. Marusov, I. Semenov

Index Introduction Setting up an account Searching and accessing Download Advanced features

Edinburgh Research Explorer

Transcription:

Grid and Component Technologies in Physics Applications Svetlana Shasharina Nanbor Wang, Stefan Muszala and Roopa Pundaleeka. sveta@txcorp.com Tech-X Corporation Boulder, CO ICALEPCS07, October 15, 2007

Outline - our experience in bridging computer science and physics Company Intro Grids Fusion Grid Service Components and CCA Babel Components in Fusion Simulation Project Grids and Components for Physics 2 ICALEPCS 2007

Disclaimer All definitions are not strict and rather personal Opinions are based on our experience in bringing Grids and Components into physics applications Check with the official web sites and papers for rigorous definitions of the technologies and their success stories Grids and Components for Physics 3 ICALEPCS 2007

Tech-X Corporation (www.txcorp.com) bridges physics and computer science Founded in 1994, now ~50 people Accelerator physics (traditional and advanced) Fusion theory Plasma modeling Computer Science C++, Java, Python Grids, Globus Web Services CORBA XML Visualization Data management Workflows Components Mission-critical systems (Real Time and Fault Tolerant apps) Small Business Innovation Research (DOE, NASA, DOD) SciDAC program Lead in FACETS Significant in COMPASS TASCS (CCA) Grids and Components for Physics 4 ICALEPCS 2007

Grids are aggregated and secured resources +??? Grids are computers connected by network (WAN, otherwise we deal with cluster or super computing - not our subject). This aggregation of resources is used by a certain group (Virtual Organization) so that more resources are available to each member and the outsiders are separated. Needs for certificates (authentication and authorization) If this aggregation is just a sum of constituents, it is not valuable Why do I need to install Globus, learn how to use it, get and update my certificate if I can login to any machine, run a job and scp my data? Grid should add value, hence one needs middleware that adds value by providing default services and allows adding new ones Grids and Components for Physics 5 ICALEPCS 2007

Services make Grids Services do anything not trivial on distributed resources Some are provided by Grid software (for example, file transfer and job managing services of Globus) Some can be built using Grid software (example is coming) Services are implementations of abstractions so that the resources are virtualized Data represented by an attribute (give me all files with current larger than x) or computers represented by baselines (run my job anywhere as long as the node is configured right) Services can: Save storage (no need to replicate data) Save travel funds (no need to travel to access resources) Allow new types of research (global data and global network analysis and optimization) Create complex data access and distribution system needed for large experiments like LHC and ITER Grid - many resources for many users securely using for something non-trivial Grids and Components for Physics 6 ICALEPCS 2007

Fusion Grid Service (H5WS) is an example of an service built using Globus to provide services for fusion community Motivation Data is generated remotely but needs to be analyzed locally Only subsets of data are needed HDF5 is popular (file format and API) but works on local files Hence the goal is a service With methods mimicking HDF5 API Bringing data into client memory with optional caching Efficient data transfers (not SOAP) Desired API Query dataset for its metadata (rank, dimensions and type) Bring a dataset Bring a hyberslab (a cube with specified stride) Bring a whole file Globus (4.0.2) was chosen as technology GridFTP C bindings (for using with HDF5 C API) Political favorite at the time Grids and Components for Physics 7 ICALEPCS 2007

Start service development by defining interface (types and functions) in WSDL User-defined data type for dataset metadata Using integers (for type and rank) and sequence types (for dimensions) Get metadata Using WSDL native types and our user-defined type define a functions Metadata is obtained through a SOAP call Need something else (not SOAP) for data itself so the method (in the client wrapper) is composite: Treat all data as 1D array (reshape as needed) Ask for the metadata Extract data on the server side into a file (do not know how to do from-memory-to-memory transfers) Allocate the memory on the client Use GridFTP C API to dump the file into the client memory Grids and Components for Physics 8 ICALEPCS 2007

H5WS has C++ client wrapper for abstraction and delegation to GridFTP and server delegating to HDF5 API Grids and Components for Physics 9 ICALEPCS 2007

We compared performance with other technologies Our H5WS (based on Globus Web Service with GridFTP) CORBA (TAO-1.5.2) Allows binary data type (octet) C++ bindings gsoap-2.7 Allows DIME attachments C++ bindings Performance tests: Establish connection Get metadata Allocate memory Extract dataset (with extra step of temporary file for Globus) Close connection Grids and Components for Physics 10 ICALEPCS 2007

Three test settings of typical use LAN - 2 Tech-X Linus boxes (grid and storage2) ESnet - NERSC (davinci) and PPPL (grid0) WAN - NERSC to Tech-X Setup Bottleneck bandwidth (Mbyte/sec) RTT (msec) BDP LAN 12.5 0.27 3.4 KB ESnet 125.0 72.0 9.0 MB WAN 0.19 162.0 31.4 KB RTT - round trip time BDP - bandwidth delay product defines how much data is in the pipe Grids and Components for Physics 11 ICALEPCS 2007

In LAN, H5WS is too expensive (security and temp file), CORBA saturates connection, gsoap is in between 15 Data Access Throughput over LAN Throughput (MB/s) 10 5 0 CORBA gsoap H5WS 0 5 10 15 20 25 30 35 40 45 Data Size (MB) Grids and Components for Physics 12 ICALEPCS 2007

In ESNet H5WS wins with 2 streams, all technologies do not reach the bottleneck (need larger buffers) 1.00 Data Access Throughput over ESNET Throughput (MB/s) 0.80 0.60 0.40 0.20 0.00 0 5 10 15 20 25 30 35 40 45 Data Size (MB) H5WS/P1 CORBA gso ap H5WS/P2 Grids and Components for Physics 13 ICALEPCS 2007

Many streams of H5WS work well for wide long pipe (ESNet) Grids and Components for Physics 14 ICALEPCS 2007

H5WS wins in WAN and is close to bottleneck, CORBA and gsoap underperform 0.20 Data Access Throughput over WAN Throughput (MB/s) 0.16 0.12 0.08 0.04 0.00 H5WS CORBA gsoap 0 5 10 15 20 25 30 35 40 45 Data Size (MB) Grids and Components for Physics 15 ICALEPCS 2007

Conclusions-1 Using Globus C WS with GridFTP gave the best overall performance (LAN is not important) but Imposition of their build system and directory structure Hacking to create Makefiles to add INCLUDES and LIBS Putting implementation into the generated directory (or one can use dlopen with implementations in shared libs). What should be in the repo then? Had to ask for a patch that would allow use of service by people who did not deployed it Debugging nightmare (you compile first, then log in as globus user and deploy; might deploy or not, might not run - start over as you) Running all services in one container You might find other solutions that are: Lighter (gsoap, glite,?) Easier to develop (?) Have better support for scientific data types than WSDL (CORBA and CCA/Babel/RMI) Grids and Components for Physics 16 ICALEPCS 2007

Components: compose, reuse and interchange parts Interact through interfaces (encapsulation) Separate implementations from interfaces (can swap implementation) Can be deployed independently Distinguish between two kinds of interfaces Provides-port (part of the interface that describes what is offered and can be used by other components) Uses-port (part of the interface describing what is needed) Provides and Uses ports connect to compose an executable This requirement separates components from objects (which can support first two requirements through good design, abstract classes and inheritance, can be deployed as libraries, use pointers to connect) Grids and Components for Physics 17 ICALEPCS 2007

n-linear diffusion equation can be solved using three components "#(x,t) "t = " % "# ( ' D($,x) * "x & "x ) Driver holds time and current value of the field and goes through cycles (starts by being called by the framework to Go). It asks Solver to Advance. Solver knows how to Advance the field (given time step and diffusion coefficient) but needs Transport to EvalTransort (calculate D) Transport model knows how to EvalTransort Grids and Components for Physics 18 ICALEPCS 2007

Using three components makes sense if you need to solve many diffusion equations One can have many components with the same ports but different implementation Work with multiple solvers (different discretization schemes, different implicitness) Use multiple diffusion models (glf23, mmm95 for anomalous transport in tokamaks, for example) Swap components without disturbing all of them as the interface and obtain many combinations Grids and Components for Physics 19 ICALEPCS 2007

Scientific applications have more requirements Support for multiple languages (integrated modeling requires combining multiple legacy codes in one executable). Fortran, specifically! Support for High Performance Computing (everything is petascale nowadays) Remote invocation (separation of remote HPC part from local data analysis and visualization) Grids and Components for Physics 20 ICALEPCS 2007

Commercial components do not satisfy additional criteria CORBA supports most of the requirements but HPC multilanguage implementations of CCM (CORBA Component Model) - only Java and C++. Fortran even for regular CORBA objects. Java Beans language specific no HPC COM is platform specific (Microsoft) Plus natural mistrust of scientists to commercial products Grids and Components for Physics 21 ICALEPCS 2007

Scientific community started their own frameworks ESMF (Earth Systems Modeling Framework) Domain specific C++ two types of interfaces SWMF (Space Weather Modeling Framework) Domain specific F90 two types of interfaces CCA (Common Component Architecture) TASCS SciDAC Satisfies all required and desired criteria Grids and Components for Physics 22 ICALEPCS 2007

CCA is the winner - supports all possible features ESMF SWMF CCA CCM Interface Exclusively In and Out Interfaces Multiple Interfaces Events ~ Asynchronous Calls Attributes Internal State and Multiple Instances Language Neutrality Separation of Implementation Distributed Components High-Performance Support Grids and Components for Physics 23 ICALEPCS 2007

Babel tool can be useful in mixed language applications Part of CCA Bindings for C, C++, Java, Python and Fortran Great support from Babel developers Uses Scientific Interface Definition Language (SIDL) for describing interfaces SIDL is valuable for language independent description of interfaces even if you do not generate binding SIDL is similar to C++ but adds in, inout, out qualifiers for arguments (similar to CORBA IDL) : package newpackage version 1.0{ class newclass{ void dowork(inout double varx ); } }; Grids and Components for Physics 24 ICALEPCS 2007

Babel is a key tool for multilanguage interoperability and location transparency Implementation in F compiled as a library: recursive subroutine newpackage_newclass_dowork_mi (self, varx, exception)!implementation goes here end subroutine newpackage_newclass_dowork_mi Called from C++ as: main(){ newpackage::newclass B = newpackage::newclass::_create( ); B.doWork(5.); //Done by Fortran } Called from Java on a remote machine: public static void main (String args []) { Sidl.rmi.ProtocolFactory.addProtocol ("simplehandle", "simple.rmi.simhandle"); newpackage.newclass obj = newpackage.newclass.connect ("simhandle://hostname:9000/1000"); obj.dowork (100.0);} Grids and Components for Physics 25 ICALEPCS 2007

Fusion Simulation Project aims for whole device modeling and uses ~components Core and Edge - FACETS (Framework Application for Core-Edge Transport Simulations) MHD and RF - SWIM (Simulation of rf Wave Interaction with Magnetohydrodynamics) Edge - CPES (Center for Plasma Edge Simulations) Look at components but use very loose definition of those SWIM has Python wrappers for codes FACETS has C++ wrappers and uses Babel to call out to Fortran transport, edge and wall codes CPES uses Kepler (loosely coupled Java objects) to describe workflows Europeans (ITM - ITER Tokamak Moleling): XML for interfaces Kepler for orchestration Grids and Components for Physics 26 ICALEPCS 2007

FACETS tested Babel before committing and confirmed good performance Calling F from C Grids and Components for Physics 27 ICALEPCS 2007

Transport modules are wrapped into SIDL in FACETS C++ Class Interfaces C++ Methods C++ Methods BABEL BABEL Fortran WallPSI Fortran FMCFM Interfaces w 1,w 2,...,w N Interfaces f 1,f 2,...,f M mmm95 glf23 tglf Grids and Components for Physics 28 ICALEPCS 2007

Conclusions-2 Many ways to define components, many levels of commitment to component model CCA is the most promising framework Migration to CCA is not easy Define and standardize interfaces using the codes native languages Express interfaces in SIDL Can now use in multilanguage applications and might stop here, or Define Uses and Provides ports using SIDL Compose using CCA tools Ready to go! Migration to CCA will be easier soon using Bocca (no need to know SIDL in details) For workflows you might want to look at Kepler. Grids and Components for Physics 29 ICALEPCS 2007