Top-down definition of Network Centric Operating System features


Position paper submitted to the Workshop on Network Centric Operating Systems, Bruxelles, 16-17 March 2005

Marco Danelutto
Dept. Computer Science, Univ. Pisa
mailto://marcod@di.unipi.it
http://www.di.unipi.it/~marcod

Thesis

Network programming is a hard task. The same holds for grid programming, if we view grids as networks with specific middleware on top of the operating systems of the component processing nodes. The programmer has to take into account network latencies and faults, dynamic node availability and load, performance/efficiency requirements and constraints, etc.

Figure 1

Building on our long experience with high-performance, structured parallel programming models for workstation clusters/networks and grids, we propose to derive a sort of reduced instruction set to be included in a network centric operating system, in order to support a layered assignment of responsibilities in the development of network/grid applications. We claim that running grid applications requires only a small set of basic features, including efficient data/file transfer, resource inventory/discovery, remote commanding, accounting and communication. We also claim that these few mechanisms can be exploited through a mix of static (compiler-based) and dynamic (run-time-system-based) techniques to implement the programming model at hand, much as RISC processor instructions were exploited to implement completely different high-level programming languages [1][2].

Therefore we propose to study an approach like the one summarized in Figure 1. Typical, effective grid applications should be studied to understand precisely which grid mechanisms they exploit. In the meantime, typical operating systems have to be analyzed to understand which mechanisms suitable to support grid programming they provide, and how efficient these mechanisms are. Then, by carefully analyzing the results of these two steps, the set of mechanisms actually needed in a network centric operating system can be derived. Such mechanisms can be added to existing operating systems, or a new operating system including them can be designed. Which of the two options to take depends on the number of mechanisms needed and on the cost of introducing them into (one of) the existing operating systems. Such an approach should avoid the risk of fixing too many features in the operating system.

Our experience in the GRID.it project supports this approach. In this project we developed a high-level, structured, parallel grid programming environment that is compiled on top of the existing Globus middleware. The features of Globus and of the underlying operating system that are actually used to run programs developed with this environment form a very small set of mechanisms. Almost none of the policies supported by Globus were actually used. This is possible because the implementation of the programming environment is highly structured, as shown in Figure 2.

Figure 2

In the GRID.it framework we used the following features of the underlying middleware/OS:

Resource discovery: the compiler generates an XML file specifying the computing resources needed to implement the parallel grid program at hand. Each resource eventually runs a GRID.it process implementing a single node of the overall application graph. The XML file may also include constraints on the features required of each node (CPUs, memory, disk, network bandwidth, etc.). Compiler tools and run time/loader software perform the mapping of logical resources to physical ones.
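The resource specification and mapping step described above can be sketched as follows. This is a minimal illustration, not the GRID.it format: the XML element and attribute names (`resources`, `node`, `cpus`, `mem_mb`), the first-fit matching rule and the hard-coded inventory (standing in for whatever a discovery service would report) are all hypothetical assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical compiler output: one <node> per logical node of the
# application graph, with resource constraints as attributes.
SPEC = """
<resources>
  <node id="emitter" cpus="1" mem_mb="256"/>
  <node id="worker0" cpus="2" mem_mb="1024"/>
  <node id="worker1" cpus="2" mem_mb="1024"/>
  <node id="collector" cpus="1" mem_mb="256"/>
</resources>
"""

# Hypothetical physical inventory, as a discovery service might report it.
INVENTORY = {
    "hostA": {"cpus": 2, "mem_mb": 2048},
    "hostB": {"cpus": 4, "mem_mb": 4096},
}

def parse_spec(xml_text):
    """Return a list of (logical_id, constraints) pairs from the XML spec."""
    root = ET.fromstring(xml_text.strip())
    nodes = []
    for node in root.findall("node"):
        constraints = {
            "cpus": int(node.get("cpus", "1")),
            "mem_mb": int(node.get("mem_mb", "0")),
        }
        nodes.append((node.get("id"), constraints))
    return nodes

def map_logical_to_physical(logical_nodes, inventory):
    """First-fit policy: each logical node goes to the first host with enough
    free CPUs and memory; capacity is consumed as nodes are placed."""
    free = {host: dict(cap) for host, cap in inventory.items()}  # copy, do not mutate input
    mapping = {}
    for logical_id, need in logical_nodes:
        for host, cap in free.items():
            if cap["cpus"] >= need["cpus"] and cap["mem_mb"] >= need["mem_mb"]:
                cap["cpus"] -= need["cpus"]
                cap["mem_mb"] -= need["mem_mb"]
                mapping[logical_id] = host
                break
        else:
            raise RuntimeError(f"no host satisfies the constraints of {logical_id}")
    return mapping

if __name__ == "__main__":
    print(map_logical_to_physical(parse_spec(SPEC), INVENTORY))
```

Because the matching rule lives in the compiler/loader code, replacing first-fit with a different placement policy would not require any change in the middleware.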
Policies concerning process mapping and scheduling are decided by the compiler/run time/loader associated with the programming language, rather than being delegated to the middleware or to the underlying operating system. The resources needed to execute the program are automatically discovered at run time.

Code/data staging: process code is generated for the resources chosen to execute the program and is then staged to these resources for execution. Data needed at the different resources can also be staged before the actual computation starts.

Remote commanding: processes whose code has been staged to remote resources must, of course, be started. Remote commanding is also needed to monitor and control the execution of these processes. Sensors, callbacks or events can also be very useful to implement application monitoring.

Inter-PE communication: the processes deployed and run on the remote resources need to communicate and synchronize with each other. Communications are scheduled in the process code by the compiler and/or adjusted by the run time.

Authentication: remote commands, as well as staging commands, should be executed in an authenticated framework. All the transactions needed for authentication are actually performed by the code generated by the compiler.

We currently borrow resource discovery, code staging, remote commanding and authentication from Globus, while communications are performed through plain TCP/IP.
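Since TCP delivers a byte stream rather than discrete messages, inter-PE communication over plain TCP/IP needs some message framing. The sketch below uses one common choice, a 4-byte big-endian length prefix. It runs on a local `socket.socketpair()`, which is also a connected byte stream, so no network is required. The wire format and the helper names `send_msg`, `recv_exact` and `recv_msg` are illustrative assumptions, not the GRID.it run-time protocol.

```python
import socket
import struct

# Hypothetical wire format: 4-byte unsigned length in network byte order,
# followed by the payload bytes.
HEADER = struct.Struct("!I")

def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one framed message (header + payload) on a stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may legitimately return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock: socket.socket) -> bytes:
    """Receive one message framed by send_msg."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

if __name__ == "__main__":
    # A connected pair of stream sockets stands in for two processing elements.
    pe_a, pe_b = socket.socketpair()
    with pe_a, pe_b:
        for msg in (b"task:42", b"", b"x" * 1000):
            send_msg(pe_a, msg)
        received = [recv_msg(pe_b) for _ in range(3)]
    print([len(m) for m in received])  # [7, 0, 1000]
```

Each message pays for a header and at least one system call on each side. This per-message overhead is why small messages on a TCP stream are comparatively expensive.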

In a network centric operating system perspective, we would like:

To have a better, faster communication mechanism, in particular one that efficiently supports collective communications as well as small data communications. Collective communications are currently emulated on top of TCP (unless the unreliable UDP protocol is used), while small data communications are quite costly because of the high latency overheads typical of networks.

To have a more efficient (that is, faster and reliable) discovery mechanism. Resource discovery is needed at the very beginning, to find the resources necessary to run the application. It is also needed whenever the set of resources executing the application has to be dynamically restructured to adapt to changing features (e.g. load, accessibility, etc.) of the available nodes. Therefore, faster and more responsive mechanisms than those provided by services such as Globus MDS are necessary.

To have safe and secure high-performance staging and remote commanding features, combining distributed execution and security.

Some of these features can be provided as extensions of existing operating systems. Faster communication mechanisms, for instance, can be added to existing operating systems. If point-to-point communications are the main interaction mechanism between the nodes participating in the distributed grid computation, mechanisms other than those assumed by plain TCP can be adopted, since TCP was designed, developed and optimized with a different communication paradigm in mind. Other features can either be included in the operating system or kept outside, in a further middleware layer. A discovery mechanism, for instance, can be implemented on top of an extended operating system supplying fast, reliable collective communications. In this case discovery becomes part of the run time support of the programming environment rather than a mechanism (possibly with policies) of the operating system/middleware.
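The cost of emulating collective communications with point-to-point messages depends heavily on how the emulation is scheduled. A broadcast in which the root sends to every other process needs n-1 sequential sends from the root. A binomial-tree schedule, a standard technique that is not specific to this paper, lets every process that already holds the data forward it, and completes in ceil(log2 n) rounds. The sketch below only computes and checks the schedule (who sends to whom in each round); the transport and the function names are illustrative assumptions.

```python
def binomial_broadcast_schedule(n: int, root: int = 0):
    """Rounds of (src, dst) point-to-point sends broadcasting from root to n ranks.

    Ranks are handled relative to the root. In the round with step s
    (s = 1, 2, 4, ...), every relative rank r < s, which already holds the
    data, sends it to relative rank r + s if that rank exists.
    """
    if n < 1:
        raise ValueError("need at least one process")
    rounds = []
    step = 1
    while step < n:
        pairs = [((r + root) % n, (r + step + root) % n)
                 for r in range(step) if r + step < n]
        rounds.append(pairs)
        step *= 2
    return rounds

def reached_ranks(n: int, root: int = 0) -> set:
    """Replay the schedule, checking that every sender already has the data."""
    has_data = {root}
    for pairs in binomial_broadcast_schedule(n, root):
        if any(src not in has_data for src, _ in pairs):
            raise AssertionError("a rank sends before receiving")
        has_data.update(dst for _, dst in pairs)
    return has_data

if __name__ == "__main__":
    for n in (5, 8, 64):
        rounds = binomial_broadcast_schedule(n)
        print(f"{n} ranks: {len(rounds)} tree rounds vs {n - 1} sends from the root alone")
```

With a fast point-to-point primitive in the operating system, such a schedule could be executed by the run time. This is one way collective operations, and a discovery service built on them, could live above the OS layer.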
Background and interest

The Department of Computer Science of the University of Pisa is actively participating in several grid-related projects. Marco Vanneschi is currently the scientific coordinator of the SSA GridCOORD, and Marco Danelutto leads the Virtual Institute on Programming Models in the CoreGRID NoE. Both are involved in the Italian national FIRB research project GRID.it, the former as national coordinator, the latter as leader of the Programming Environment work package. Several other groups in the Department participate in the above-mentioned or other grid-related research projects. One group does research on algorithms for web and grid related problems, another on knowledge mining on grids, and different groups program various applications on grid architectures, including, for example, biomedical applications. The Dept. of Computer Science of Pisa hosts about 70 professors (full and associate) and associate researchers.

The Vanneschi/Danelutto group is active in the design and implementation of programming environments for efficient parallel applications on grids. In particular, the approach adopted is based on a layered implementation of the programming environment. A compiler layer performs static optimizations, and a run time layer implements dynamic optimizations. Finally, a grid abstract machine layer abstracts, from the existing middleware, the features the run time needs to control (handle) all the grid-related features (problems) of the application execution. The compiler and run time layers together implement autonomic control procedures that leave the programmer almost completely unaware of the existence of the grid (invisible grid). Overall, the grid abstract machine is intended to provide the outline of a possible GRID OS.
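As a hedged illustration of the kind of autonomic control procedure mentioned above, the sketch below shows the control step for a task farm. From the measured mean task inter-arrival time T_a and the mean worker service time T_w, it picks the smallest number of workers n with T_w / n <= T_a. It then clamps n to the number of nodes available. The farm model is a standard one from the structured parallel programming literature. The function names, the clamping rule and the sample measurements are assumptions made for illustration, not the GRID.it/ASSIST run-time logic.

```python
import math

def farm_degree(t_arrival: float, t_service: float, max_workers: int) -> int:
    """Smallest worker count n such that t_service / n <= t_arrival,
    clamped to [1, max_workers]. Both times use the same (arbitrary) unit."""
    if t_arrival <= 0 or t_service <= 0:
        raise ValueError("times must be positive")
    needed = math.ceil(t_service / t_arrival)
    return max(1, min(needed, max_workers))

def control_loop(samples, max_workers, current=1):
    """Replay (t_arrival, t_service) measurements and collect every
    reconfiguration (old_degree, new_degree) the run time would request."""
    actions = []
    for t_a, t_w in samples:
        target = farm_degree(t_a, t_w, max_workers)
        if target != current:
            actions.append((current, target))
            current = target
    return actions

if __name__ == "__main__":
    # Hypothetical measurements: the arrival rate doubles, then drops.
    observed = [(10.0, 40.0), (5.0, 40.0), (5.0, 40.0), (20.0, 40.0)]
    print(control_loop(observed, max_workers=6))
```

The programmer never sees this loop. It belongs to the run time, while adding or removing workers would rely only on the staging and remote commanding mechanisms of the lower layers.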
Preliminary results following these intuitions have already been achieved within the three-year Italian national research project GRID.it [10], which is going to end in 2005. One of the project results will be a prototype implementation of a component-based parallel programming environment exploiting the layered structure described above [5][6][7][9]. Our group also has previous experience in developing efficient communication mechanisms as extensions of existing operating systems [3][4], and in designing efficient network-wide shared memory/caching mechanisms [8]. In particular, we designed a Linux module that processes short messages through raw IP sockets efficiently (10% to 15% better than plain TCP/IP) [3].

Interests

We want to study the feasibility of deriving a grid operating system design following a top-down approach. First, the features needed by an effective programming environment for grids are analyzed, experimented with and assessed. Then the features and mechanisms needed in the run time supporting the programming environment are fixed and assessed. Finally, the mechanisms needed in the operating system are distilled. We think that the mechanisms to be implemented efficiently at the operating system level to support grid computing can be derived by looking at the way typical grid applications exploit current grid middleware features. We also believe it is fundamental to design a clear hierarchy of responsibilities that assigns to each layer the implementation of the proper mechanisms and policies. For example, we firmly believe that the grid operating system layer only has to provide basic mechanisms (staging, communication, commanding, resource inventory), leaving the implementation of policies to the upper layers (run time system and compiler).

Expectation from a GridOS

A European coordinated/integrated research activity on Network Centric (Grid?) operating systems can lead to the definition of a basic set of mechanisms that have to be efficiently implemented in the operating system layer in a fully layered scenario. In this scenario the middleware directly accesses the operating system, but user/programmer access to the middleware layer is completely mediated by proper compiler layers providing the programmer with higher-level programming abstractions. Once the set of necessary mechanisms has been identified, suitable extensions of existing open source operating systems that efficiently implement such mechanisms can be designed. We point out that the efficiency of a very basic set of operations is the most important feature expected.
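The split between mechanisms and policies argued for above can be sketched as follows. In this sketch the lower layer exposes only mechanisms (resource inventory, staging, process start, message send) and makes no decisions. A run-time policy object decides where each process is placed. All class and method names are hypothetical. The in-memory `FakeGridOS` merely stands in for a real operating system layer so that the example runs.

```python
from typing import Dict, List, Protocol, Tuple

class GridMechanisms(Protocol):
    """Hypothetical OS-level mechanisms: no policy decisions here."""
    def inventory(self) -> Dict[str, int]: ...  # host -> current load
    def stage(self, host: str, code: bytes) -> None: ...
    def start(self, host: str, name: str) -> None: ...
    def send(self, host: str, msg: bytes) -> None: ...

class FakeGridOS:
    """In-memory stand-in that only records what it is asked to do."""
    def __init__(self, loads: Dict[str, int]):
        self.loads = dict(loads)
        self.staged: Dict[str, List[bytes]] = {h: [] for h in loads}
        self.running: Dict[str, List[str]] = {h: [] for h in loads}
        self.sent: List[Tuple[str, bytes]] = []

    def inventory(self) -> Dict[str, int]:
        return dict(self.loads)

    def stage(self, host: str, code: bytes) -> None:
        self.staged[host].append(code)

    def start(self, host: str, name: str) -> None:
        self.running[host].append(name)
        self.loads[host] += 1

    def send(self, host: str, msg: bytes) -> None:
        self.sent.append((host, msg))

class LeastLoadedPlacement:
    """Run-time policy: place each process on the currently least-loaded host."""
    def deploy(self, os_layer: GridMechanisms,
               processes: Dict[str, bytes]) -> Dict[str, str]:
        placement = {}
        for name, code in processes.items():
            loads = os_layer.inventory()
            host = min(loads, key=lambda h: (loads[h], h))  # ties broken by host name
            os_layer.stage(host, code)
            os_layer.start(host, name)
            placement[name] = host
        return placement

if __name__ == "__main__":
    grid = FakeGridOS({"n1": 2, "n2": 0, "n3": 1})
    plan = LeastLoadedPlacement().deploy(
        grid, {"emitter": b"<emitter code>", "w0": b"<worker code>", "w1": b"<worker code>"})
    print(plan)
```

Replacing `LeastLoadedPlacement` with another policy requires no change to the mechanism layer. Keeping the lower layer stable while upper layers evolve is the property a top-down design aims for.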
Mechanisms that can be used to implement/emulate the mechanisms needed by grid or network parallel and distributed applications are already present in every existing operating system. What is needed is a very efficient implementation of such mechanisms. To be clearer, we do not simply need yet another discovery mechanism to look for available resources: TCP/IP (possibly UDP) can be used for that, provided suitable numbers/ranges of firewall-safe ports are used. We need an efficient, possibly dedicated, communication mechanism to perform resource discovery.

(Group) References

[1] M. Danelutto, M. Vanneschi. A RISC approach to GRID. TR-05-02, Dept. Computer Science, University of Pisa, January 24, 2005, http://compass2.di.unipi.it/tr/
[2] M. Danelutto. A RISC approach to the GRID. Talk at the IFIP 10.3 workshop, London, Nov. 1st, 2003, slides available at http://www.di.unipi.it/~marcod/papers/ifiplon.pdf
[3] M. Danelutto, A. Rampini. Fast short messages on a Linux cluster. Proceedings of the HPCN 2001 Conference (High Performance Computing and Networking, Hertzberger, Hoekstra and Williams editors), LNCS No. 2110, Springer Verlag, June 2001, pages 393-402
[4] M. Danelutto, C. Pucci. A compact, thread-safe communication library for efficient cluster computing. Proceedings of the HPCN 2000 Conference, LNCS No. 1823, Springer Verlag, 2000
[5] M. Aldinucci, M. Coppola, M. Danelutto, M. Vanneschi, C. Zoccolo. ASSIST as a Research Framework for High-performance Grid Programming Environments. Chapter in Grid Computing: Software Environments and Tools, Jose C. Cunha and Omer F. Rana (Eds), Springer Verlag, 2004
[6] M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, M. Vanneschi, C. Zoccolo. ASSIST demo: A High Level, High Performance, Portable, Structured Parallel Programming Environment at Work. Proceedings of Euro-Par 2003, H. Kosch, L. Boszormenyi and H. Hellwagner editors, LNCS No. 2790, Springer Verlag, pages 1295-1300
[7] M. Danelutto. HPC the easy way: new technologies for high performance applications deployment. Journal of Systems Architecture, Vol. 49, Issues 10-11, Nov. 2003, pages 399-419, Elsevier
[8] M. Aldinucci, M. Torquati. Accelerating Apache farms through ad-hoc distributed scalable object repository. Proc. of Intl. Conference Euro-Par 2004: Parallel and Distributed Computing, Pisa, Italy, LNCS No. 3149, Springer, Sept. 2004
[9] M. Aldinucci, S. Campa, M. Coppola, S. Magini, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, C. Zoccolo. Targeting heterogeneous architectures in ASSIST: experimental results. Proc. of Intl. Conference Euro-Par 2004: Parallel and Distributed Computing, Pisa, Italy, LNCS No. 3149, Sept. 2004
[10] GRID.it project web site, 2005, http://www.grid.it