A Component Framework for HPC Applications

Nathalie Furmento, Anthony Mayer, Stephen McGough, Steven Newhouse, and John Darlington

Parallel Software Group, Department of Computing, Imperial College of Science, Technology and Medicine, 180 Queen's Gate, London SW7 2BZ, UK
icpc-sw@doc.ic.ac.uk
http://www-icpc.doc.ic.ac.uk/components/

Abstract. We describe a general component software framework designed for demanding grid environments that provides optimal performance for the assembled component application. This is achieved by separating the high level abstract description of the composition from the low level implementations. These implementations are chosen at run time by performance analysis of the composed application on the currently available resources. We show through the solution of a simple linear algebra problem that the framework introduces minimal overheads while always selecting the most effective implementation.

Keywords: Component Composition, Grid Computing, Performance Optimisation, High Level Abstraction.

1 Introduction

Within high performance and scientific computing there has been an increasing interest in component based design patterns. The high level of abstraction enables efficient end-user programming by the scientific community, while at the same time encapsulation encourages software development and reuse. With the emergence of computational grids [1] it is increasingly likely that applications built from components will be deployed upon a wide range of heterogeneous hardware resources.

Component based design reveals both potential difficulties and opportunities within such a dynamic environment. Difficulties may arise where a component's performance is inhibited by the heterogeneous nature of the environment. For example, an application consisting of tightly coupled components deployed across distant platforms will suffer performance penalties not present when executed locally.
However, the abstraction provided by the component methodology allows for a separation of concerns which enables specialisation and optimisation without reducing the flexibility of the high level design. Such resource and performance aware optimisation helps offset the difficulties of grid-based computation.

Research supported by the EPSRC grant GR/N13371/01 on equipment provided by the HEFCE/JREI grants GR/L26100 and GR/M92455.

R. Sakellariou et al. (Eds.): Euro-Par 2001, LNCS 2150, pp. 540-548, 2001. © Springer-Verlag Berlin Heidelberg 2001

Algorithmic skeletons [2] have illustrated how a high level abstraction can be married to an efficient low-level high performance code, utilising the knowledge of the abstraction to optimise the implementation composition. We propose a component architecture that exploits this meta-data to optimise performance. In this paper we describe an implementation of this architecture, which consists of an abstract component description language incorporating composition and performance information and a run time framework, together with a simple example of a component application and its native implementations. We first consider an overview of the architecture, and then examine different aspects of the framework.

2 Overview of the Component Architecture

We present a layered component architecture consisting of a high level language (an XML schema [3]), an execution framework and a component repository containing the component implementations and meta-data [4]. The run time framework executes the application, selecting implementations from the repository according to the meta-data, resource availability and the application description. The design and deployment cycle using this component framework is determined by its layered abstraction.

1. Component Construction. Component interfaces are designed and placed within the repository, together with any implementations, which may be written in native code. The developer places meta-data describing the implementation and its performance characteristics in the component repository.

2. Problem. The end-user builds an application within a graphical problem solving environment (PSE). The PSE is connected to the component repository and is used to produce a high level XML representation of the problem. This representation is implementation independent, using only the component interface information.

3. Run Time Representation. The XML representation is automatically converted into a set of executable Java objects, which act as the run time support for the application. This utilises the meta-data.

4. Implementation Selection. The run time representation is deployed into an execution environment. Here available implementations are selected according to performance models and the available resources. High performance native code is loaded and executed by the run time representation. Implementation selection may occur more than once for a given component: changing circumstances, either through changing resources or computation, may force the selection of a more efficient implementation.

Fig. 1. GUI showing repository and composed application

3 XML Representation

The XML representation is the highest level of abstraction. The end-user composes the XML representation by means of an application builder tool. While we have provided a graphical user interface that enables rapid application development (see Figure 1), our Component XML (or CXML) provides a suitable target language for a number of possible end-user tools, without imposing constraints on their configuration.

    <application>
      <network>
        <instance componentname="source" componentpackage="icpc.linearsource" id="1">
          <property name="degrees of freedom" value="100"/>
        </instance>
        <instance componentname="solver" componentpackage="icpc.linearsolver" id="2"/>
        <instance componentname="displayvector" componentpackage="icpc.matrix" id="3"/>
        <dataflow sinkcomponent="2" sinkport="matrix" sourcecomponent="1" sourceport="matrix"/>
        ...
      </network>
    </application>

The application builder produces a CXML application description document, as shown in the example above. This consists of the <network> and <repository> information. The network represents a composition of component instances together with any user customisations. The composition is a simple typed port and network mechanism, consisting of <instance> elements together with <dataflow> connectors.
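The network part of such a document can be read with the XML parser that ships with the JDK. The following is only a minimal sketch, assuming the element and attribute names exactly as they appear in the fragment above; the class `CxmlNetworkReader` and its methods are hypothetical and are not part of the framework described here. It resolves each <dataflow> connector to the names of the instances it joins and rejects a connector that refers to an unknown instance id.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the <network> part of a CXML document. */
final class CxmlNetworkReader {

    // The network fragment from the text, with the elided part ("...") left out.
    static final String EXAMPLE =
          "<application><network>"
        + "<instance componentname=\"source\" componentpackage=\"icpc.linearsource\" id=\"1\">"
        + "<property name=\"degrees of freedom\" value=\"100\"/></instance>"
        + "<instance componentname=\"solver\" componentpackage=\"icpc.linearsolver\" id=\"2\"/>"
        + "<instance componentname=\"displayvector\" componentpackage=\"icpc.matrix\" id=\"3\"/>"
        + "<dataflow sinkcomponent=\"2\" sinkport=\"matrix\""
        + " sourcecomponent=\"1\" sourceport=\"matrix\"/>"
        + "</network></application>";

    private static Document parse(String cxml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(cxml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed CXML document", e);
        }
    }

    /** Returns one "source.port -> sink.port" line per <dataflow> element. */
    static List<String> describeDataflows(String cxml) {
        Document doc = parse(cxml);

        // Map instance id to component name so connectors can be reported by name.
        Map<String, String> nameById = new HashMap<>();
        NodeList instances = doc.getElementsByTagName("instance");
        for (int i = 0; i < instances.getLength(); i++) {
            Element instance = (Element) instances.item(i);
            nameById.put(instance.getAttribute("id"), instance.getAttribute("componentname"));
        }

        List<String> result = new ArrayList<>();
        NodeList flows = doc.getElementsByTagName("dataflow");
        for (int i = 0; i < flows.getLength(); i++) {
            Element flow = (Element) flows.item(i);
            String source = nameById.get(flow.getAttribute("sourcecomponent"));
            String sink = nameById.get(flow.getAttribute("sinkcomponent"));
            if (source == null || sink == null) {
                throw new IllegalArgumentException("dataflow refers to an unknown instance id");
            }
            result.add(source + "." + flow.getAttribute("sourceport")
                    + " -> " + sink + "." + flow.getAttribute("sinkport"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describeDataflows(EXAMPLE)) {
            System.out.println(line);   // prints "source.matrix -> solver.matrix"
        }
    }
}
```

A real builder tool would also check the port types against the repository data; this sketch only checks that both ends of a connector exist.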
The connectors are attached between source and sink ports according to the component type. The user may customise the component by specifying simple values that are recorded as <property> elements. The choice of components within the builder tool is determined by those contained within the repository. The repository CXML data provides the interface information for the <component> types, specifying <port> and <property> elements, thus ensuring that connections in a network correspond to a meaningful composition. The types and default properties specified in the repository CXML allow customisation where required. The repository component information also indicates component inheritance and package information.

The repository also contains information needed to create the run time representation, namely the meta-data for the methods in the component implementation. These are represented by <implementation> and <action> elements. The <action> elements specify the bindings to ports and the location of corresponding performance data. The package and location information for the executables is stored alongside the implementation data in the repository. These are referred to with <object> elements.

    <repository>
      <component package="icpc.linearsource" name="source" version="1">
        <property type="external" name="degrees of freedom" value="1000"/>
        <port behaviour="out" objectpackage="icpc.matrix" objectname="dgerc" portname="matrix"/>
        <port behaviour="out" objectpackage="icpc.matrix" objectname="vector" portname="vector"/>
        <implementation language="java" platform="java" url="file:.">
          <action portname="matrix">
            <binding method="getmatrix">... </binding>
            <classperformancemodel type="initial" url="http:" />
          </action>
        </implementation>
        <implementation language="c" platform="linux" url="file:.">... </implementation>
      </component>
      <object package="icpc.matrix" name="dgerc" version="1">
        <method name="getmatrix" type="action">
          <argument mode="out" typename="dgerc" typepackage="icpc.matrix" />
        </method>
      </object>
    </repository>

4 Run Time Representation

When deployed onto a resource the CXML application description is converted into a run time representation, consisting of a network of Java proxy objects (JPOs), corresponding to the component instances in the application description document. This abstraction of the component application enables the dynamic selection of implementations, cross-component optimisation and implementation independent management.

Each JPO provides a black box abstraction of the component's implementation and acts as the run time interface for the component. Any interaction between components occurs at the level of the JPO. This provides a means to trap method calls and select the appropriate implementation when required.

The network of JPOs is created and linked by means of automatic code generation. The complete application description, including the relevant repository data, is compiled from CXML into Java source code. The JPOs, together with the connections and property information, are represented by this glue code. The JPO contains Port Objects, Implementation Objects, Data Objects and Execution Objects. See Figure 2.

Fig. 2. A component's Java Proxy Object

Port Objects (PDO). The CXML <port> element has an attribute (behaviour in the repository listing above) that specifies whether it is an IN or OUT port. Each OUT <port> element has a corresponding PDO. Each IN <port> is represented by a reference to a PDO of the appropriate type. When the JPOs are created they are connected according to the application description by setting each IN port to refer to the appropriate PDO on the connected component. Each OUT <port> may offer many methods, which may then be called by any code within the component containing the corresponding IN port. It can be seen that this system provides a pull model of computation. During execution calls to a PDO are trapped, and an implementation is selected by examining the relevant IDOs.

Implementation Objects (IDO). Each IDO is associated with a port. There are likely to be a number of different IDOs for each port, each representing a different implementation of the port's methods. The IDO contains the performance model meta-data for a given implementation, together with binding code that maps the port's methods to those of the actual implementation. When a method is called the PDO evaluates the relevant performance models and makes a decision as to which implementation to use. The binding code within the IDO maps the methods of the PDO to the actual implementation within the Execution Object.

Execution Objects (EO). Where an implementation is written in a native language (such as C), the execution object is a lightweight wrapper that makes use of the Java Native Interface [5]. This is responsible for the loading, unloading and execution of the native libraries.
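As an illustration of the wrapper idea, the sketch below shows one way an Execution Object might hide whether a kernel runs through JNI or in pure Java. Everything in it is an assumption made for illustration: the `ExecutionObject` interface, the `norm` kernel and the library name `icpc_sketch_kernel` are hypothetical and do not come from the framework. The only JDK facilities relied on are the `native` method modifier and `System.loadLibrary`, which throws `UnsatisfiedLinkError` when the library cannot be found.

```java
import java.util.Arrays;
import java.util.List;

final class ExecutionObjectSketch {

    /** Hypothetical EO contract; the paper does not publish this interface. */
    interface ExecutionObject {
        String platform();
        boolean load();             // true if the implementation can run on this host
        double norm(double[] v);    // stand-in for a real numerical kernel
    }

    /** Pure Java EO: nothing to load, the kernel runs inside the JVM. */
    static final class JavaExecutionObject implements ExecutionObject {
        public String platform() { return "java"; }
        public boolean load() { return true; }
        public double norm(double[] v) {
            double sum = 0.0;
            for (double x : v) sum += x * x;
            return Math.sqrt(sum);
        }
    }

    /** Native EO: a thin JNI wrapper around a C library (the library name is made up). */
    static final class NativeExecutionObject implements ExecutionObject {
        private native double nativeNorm(double[] v);   // would be implemented in C

        public String platform() { return "linux"; }
        public boolean load() {
            try {
                System.loadLibrary("icpc_sketch_kernel");   // hypothetical library
                return true;
            } catch (UnsatisfiedLinkError e) {
                return false;                               // library not installed here
            }
        }
        public double norm(double[] v) { return nativeNorm(v); }
    }

    /** Returns the first EO, in preference order, whose code could be loaded. */
    static ExecutionObject firstLoadable(List<ExecutionObject> candidates) {
        for (ExecutionObject eo : candidates) {
            if (eo.load()) return eo;
        }
        throw new IllegalStateException("no execution object could be loaded");
    }

    /** Prefers the native EO and falls back to Java when the library is absent. */
    static ExecutionObject defaultSelection() {
        return firstLoadable(Arrays.<ExecutionObject>asList(
                new NativeExecutionObject(), new JavaExecutionObject()));
    }

    public static void main(String[] args) {
        ExecutionObject eo = defaultSelection();
        System.out.println(eo.platform() + " norm = " + eo.norm(new double[] {3.0, 4.0}));
    }
}
```

On a machine without the made-up library, the native EO fails to load and the Java EO is used. In the framework itself the choice is driven by performance models rather than by a fixed preference order.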
For pure Java implementations the EO acts as an intermediary between the actual implementation codes and the IDO. It should be noted that while there are possibly many IDOs for each port (corresponding to the different implementation options) there may also be many IDOs for a given EO. This is because a particular implementation may make use of shared objects and libraries appropriate for a number of different method calls from distinct ports. To maintain consistency a component will only select one EO at any given time. This is illustrated in Figure 3.

Fig. 3. Port, Implementation and Execution Objects

Data Objects (DDO). To enable the migration of the component code between resources, and to allow the implementation selection to be altered during execution, persistent data is stored outside the execution object. Thus the JPO needs to retain a reference to any data that may be used in subsequent method invocations. The format of the data clearly depends upon the implementation used to generate it. This reference, together with mappings between the data format and Java, is stored in a DDO. Access to the persistent data is by method calls exposed in a PDO. By forcing calls to use the port mechanism, data access (often a significant part of a computation) is taken into account by the performance modelling and selection system. Where data is unavailable the PDO examines the respective IDOs as described above. Property values (which are stored as pure Java objects) are referred to by a DDO in the same way as any other data type. When the JPOs are created from the application description property values are assigned directly.

5 Implementation Selection

An application begins executing with an initial method call. The relevant PDO is responsible for the choice of implementation. By forwarding the relevant performance models from the IDOs to the framework a decision, based upon estimated running time, is made as to which implementation to use. The performance model usually refers to other calls the component needs to make. These calls are then forwarded to their respective PDOs, where they are trapped and the performance models returned. Thus a composite performance model for the application is built.
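The selection procedure described above can be sketched as a recursive estimate over ports. This is only an illustrative model, not the framework's code: the classes `Port` and `Implementation`, the cost functions and all coefficients are made up, and the call graph is assumed to be acyclic. Each implementation carries a local cost model and the ports it calls; a port's estimate is the cheapest implementation's local cost plus the estimates of the ports that implementation pulls from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

final class SelectionSketch {

    /** Hypothetical stand-in for an IDO: a local cost model plus the ports it calls. */
    static final class Implementation {
        final String name;
        final DoubleUnaryOperator localSeconds;   // estimated seconds for problem size n
        final List<Port> calls;

        Implementation(String name, DoubleUnaryOperator localSeconds, Port... calls) {
            this.name = name;
            this.localSeconds = localSeconds;
            this.calls = Arrays.asList(calls);
        }
    }

    /** Hypothetical stand-in for a PDO: traps a call and picks an implementation. */
    static final class Port {
        final String name;
        private final List<Implementation> offered = new ArrayList<>();

        Port(String name) { this.name = name; }

        Port offer(Implementation impl) { offered.add(impl); return this; }

        /** Composite model: local cost plus the best estimate of every port called. */
        double estimate(Implementation impl, double n) {
            double total = impl.localSeconds.applyAsDouble(n);
            for (Port callee : impl.calls) {
                total += callee.estimate(callee.select(n), n);
            }
            return total;
        }

        /** Dry run over the call graph; returns the implementation with the lowest estimate. */
        Implementation select(double n) {
            Implementation best = null;
            double bestSeconds = Double.POSITIVE_INFINITY;
            for (Implementation impl : offered) {
                double seconds = estimate(impl, n);
                if (seconds < bestSeconds) {
                    bestSeconds = seconds;
                    best = impl;
                }
            }
            if (best == null) throw new IllegalStateException("no implementation for " + name);
            return best;
        }
    }

    /** A solver port with two implementations; all coefficients are invented. */
    static Port exampleSolverPort() {
        Port matrix = new Port("matrix")
                .offer(new Implementation("java-source", n -> 1e-7 * n * n));
        return new Port("solution")
                .offer(new Implementation("direct", n -> 2e-9 * n * n * n, matrix))
                .offer(new Implementation("iterative", n -> 1e-6 * n * n, matrix));
    }

    public static void main(String[] args) {
        Port solver = exampleSolverPort();
        for (double n : new double[] {100, 2000}) {
            System.out.println("n=" + (int) n + " -> " + solver.select(n).name);
        }
    }
}
```

With these invented coefficients the cubic "direct" model is cheaper for small sizes and the quadratic "iterative" model for large ones, so the selection changes as the problem size property changes.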
The port mechanism exposes data movement between components, allowing the data transfer time to be assessed and subsequently optimised to eliminate unnecessary data copying. The technique is entirely deterministic, and results in a dry run of the application. As each JPO may possess a number of possible implementations this procedure results in a call graph.

The performance model may also refer to data within the component, for example a property representing the number of entries in a matrix. As these properties may change during execution, or may not be known at start-up, the implementation decision can only take place during execution. The run time representation allows such a change of implementation with little overhead.

6 Example & Results

A composite application that solves a set of linear equations consists of three component instances: a simple linear equation generator, a solver, and a simple display component. See Figure 4. This example was used to provide the XML fragments already shown, and is now used to provide experimental results.

Fig. 4. Linear Equation Solver

A set of real linear equations is generated from a component with a Java implementation. These equations are solved by one of two linear solvers written in C. The first is a direct solver using LU factorisation, whilst the second is an iterative Biconjugate Gradient method (BCG) [6]. These two algorithms have different time and memory requirements, allowing the JPO to select the most appropriate implementation for the user-defined problem size. In this example the display component is only used to verify that the solutions are correct.

All experiments were performed on a single 900MHz PC running Linux. The Java Native Interface [5] was used to allow access to the C implementations. Figures 5 and 6 illustrate the performance models for the LU and BCG implementations. The performance models are generated by fitting curves to actual performance results. In both cases two curves are fitted to the data, each covering a range of problem sizes. Note that for the LU solver only the second curve is shown, as the first curve deals only with small matrix sizes and is not visible on the graph.

Figure 7 illustrates the execution time for the network, along with the solution times for a benchmark BCG and LU solver. For small matrix sizes the LU solution is more efficient, thus the JPO selects this implementation.
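The form of the fitted curves is not stated, so the following is only a sketch of how a two-range performance model of this kind might be produced. It assumes, purely for illustration, a model of the form t = c * n^p with a fixed exponent p, for which the least-squares coefficient has the closed form c = sum(t_i * n_i^p) / sum(n_i^(2p)). The class name, the split point and the timings are all synthetic and are not the measurements reported here.

```java
final class PerformanceModelFit {

    /** Least-squares c for t ~= c * n^exponent, using samples with lo <= n < hi. */
    static double fitCoefficient(double[] sizes, double[] seconds,
                                 double exponent, double lo, double hi) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] < lo || sizes[i] >= hi) continue;
            double x = Math.pow(sizes[i], exponent);
            num += seconds[i] * x;
            den += x * x;
        }
        if (den == 0.0) {
            throw new IllegalArgumentException("no samples in [" + lo + ", " + hi + ")");
        }
        return num / den;
    }

    /** Two-range model: one curve below the split size, another from the split upwards. */
    static final class Model {
        final double split;
        final double exponent;
        final double below;
        final double above;

        Model(double split, double exponent, double below, double above) {
            this.split = split;
            this.exponent = exponent;
            this.below = below;
            this.above = above;
        }

        double predict(double n) {
            return (n < split ? below : above) * Math.pow(n, exponent);
        }
    }

    static Model fit(double[] sizes, double[] seconds, double exponent, double split) {
        return new Model(split, exponent,
                fitCoefficient(sizes, seconds, exponent, 0.0, split),
                fitCoefficient(sizes, seconds, exponent, split, Double.POSITIVE_INFINITY));
    }

    /** Synthetic timings generated from two known cubic curves, split at size 400. */
    static Model syntheticExample() {
        double[] sizes = {100, 200, 300, 500, 1000, 2000};
        double[] seconds = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            seconds[i] = (sizes[i] < 400 ? 1e-9 : 3e-9) * Math.pow(sizes[i], 3);
        }
        return fit(sizes, seconds, 3.0, 400.0);
    }

    public static void main(String[] args) {
        Model model = syntheticExample();
        System.out.println("predicted seconds at n=1500: " + model.predict(1500));
    }
}
```

Because the synthetic data lie exactly on the two curves, the fit recovers the generating coefficients; real timings would scatter around the fitted curves.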
It can be seen from the graph that both the network and LU execution times are close. For matrices of size 725 and larger the BCG method becomes more efficient and the JPO selects this approach. This is again visible on the graph as the network results follow the BCG solution.

Fig. 5. Run time for LU solver (time in seconds against matrix size; series: time to solve using LU factorisation, performance model for size > 420)

Fig. 6. Run time for BCG solver (time in seconds against matrix size; series: time to solve using BCG, two performance models split at size 310)

Fig. 7. Run time for the example system (time in seconds against matrix size; series: Network, BCG, LU)

Fig. 8. Overhead from component network (% of execution time against matrix size)

The overhead incurred from the use of the network solution is shown in Figure 8. This is the proportion of execution time that is not spent in computation. It can be seen from this graph that initially the overheads are high, with the value decreasing to less than 0.1% as the matrix size increases. The large overheads, and variations in results, for small matrices are partly due to the inaccuracy in timings of the system. Fluctuations in the BCG solution times can be attributed to the variable number of iterations required, which affects the relative proportion of the overhead.

7 Conclusion

We have shown that the solution of a problem described as a set of interacting components can be implemented successfully. Furthermore it has been shown that the cost of performing the computation through such a framework is negligible, in comparison to the overall execution, when the problem size is large. In this case, the overhead appears to be independent of the problem size. Hence it is clear that the separation of concerns together with layered abstraction provides flexibility without sacrificing the performance HPC applications demand. The techniques developed here are not restricted to the example system, but are applicable to arbitrary component networks and application domains. We are continuing to develop this framework for distributed parallel components.

References

1. I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
2. J. Darlington, P. Au, M. Ghanem, and Y. Guo. Co-ordinating heterogenous parallel computation. In Proceedings of International Euro-Par Conference, pages 601-614. Springer, August 1996. Distinguished Paper.
3. W3 Consortium. XML: extensible Markup Language. http://www.w3c.org/xml.
4. S. Newhouse, A. Mayer, and J. Darlington. A Software Architecture for HPC Grid Applications. In Proceedings of International Euro-Par Conference, pages 686-689. Springer, August/September 2000.
5. R. Gordon. Essential JNI, Java Native Interface. Prentice Hall, 1998.
6. William H. Press. Numerical recipes in C: the art of scientific computing. Cambridge University Press, 1988.