Position paper submitted to the Workshop on Network Centric Operating Systems, Bruxelles, 16-17 March 2005

Top-down definition of Network Centric Operating System features

Marco Danelutto
Dept. Computer Science, Univ. Pisa
mailto://marcod@di.unipi.it
http://www.di.unipi.it/~marcod

Thesis

Network programming (and grid programming, if we view grids as networks with specific middleware on top of the operating systems of the component processing nodes) is a hard task, because it must take into account network latencies and faults, dynamic node availability and load, performance/efficiency requirements and constraints, etc.

Figure 1

Building on our long experience in the field of high performance, structured parallel programming models for workstation clusters/networks and grids, we propose to derive a sort of reduced instruction set to be included in a network centric operating system to support layered responsibility in the development of network/grid applications. We claim that only a small set of basic features is needed to run grid applications: efficient data/file transfer, resource inventory/discovery, remote commanding, accounting and communication. We also claim that these few mechanisms can be exploited using a mix of static (compiler based) and dynamic (run time system based) techniques to implement the programming model at hand, much in the way RISC processor instructions were exploited to implement completely different high level programming languages [1][2].

Therefore we propose to study an approach like the one summarized in Figure 1. Typical, effective grid applications should be studied to understand exactly which set of grid mechanisms they exploit. Meanwhile, typical operating
systems have to be analyzed to understand which mechanisms suitable to support grid programming they provide, and how efficient those mechanisms are. Then, by carefully analyzing the results of these two steps, the set of mechanisms actually needed in a network centric operating system can be derived. Such mechanisms can be added to existing operating systems, or a new operating system including them can be designed. The choice between the two depends on the number of mechanisms needed and on the cost of introducing them into (one of) the existing operating systems. Such an approach should guarantee that we do not run the risk of fixing too many features in the operating system.

Our experience in the GRID.it project supports this approach. In this project we developed a high level, structured, parallel grid programming environment that is compiled on top of existing Globus middleware. The features of both Globus and the underlying operating system actually used to run the programs developed with this environment amount to a very small set of mechanisms. Almost none of the policies supported by Globus were actually used. This is possible because the implementation of the programming environment is highly structured, as shown in Figure 2.

Figure 2

In the GRID.it framework we used the following features of the underlying middleware/OS:

Resource discovery: the compiler generates an XML file specifying the computing resources needed to implement the parallel grid program at hand. Each resource eventually runs a GRID.it process implementing a single node of the overall application graph. The XML file may also include constraints specifying the required node features (CPUs, memory, disk, network bandwidth, etc.). Compiler tools and run time/loader software perform the mapping of logical resources to physical ones.
Policies concerning process mapping and scheduling are decided by the compiler/run time/loader associated with the programming language, rather than being delegated to the middleware or to the underlying operating system. The resources needed to execute the program are automatically discovered at run time.

Code/data staging: process code is generated to suit the resources chosen to execute the program, and it is then staged to these resources for execution. Data needed at the different resources can also be staged before the actual computation starts.

Remote commanding: processes whose code has been staged to remote resources have to be started, of course. But remote commanding is also needed to both monitor and control the execution of these processes. Sensors, callbacks or events can also be very useful to implement application monitoring.

Inter-PE communication: the processes deployed and run at the remote resources need to communicate and synchronize with each other. Communications are scheduled in the process code by the compiler and/or adjusted by the run time.

Authentication: remote commands, as well as staging commands, should be executed in an authenticated framework. All the transactions needed for authentication are actually performed by the code generated by the compiler.

We currently borrow resource discovery, code staging, remote commanding and authentication from Globus, while communications are performed through plain TCP/IP.
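
As an illustration of the resource discovery feature listed above, the following minimal Python sketch emits an XML resource specification with constraints and then maps logical resources to physical ones. It is not the GRID.it implementation: the element and attribute names (resources, resource, constraint, min), the node and host names, and the best-fit mapping rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical nodes of an application graph, with their constraints.
LOGICAL_NODES = [
    {"id": "emitter", "cpus": 1, "memory_mb": 256},
    {"id": "worker0", "cpus": 2, "memory_mb": 1024},
    {"id": "collector", "cpus": 1, "memory_mb": 512},
]

# Hypothetical inventory, in the form a discovery mechanism might return it.
INVENTORY = [
    {"host": "n1.example.org", "cpus": 1, "memory_mb": 512},
    {"host": "n2.example.org", "cpus": 4, "memory_mb": 2048},
    {"host": "n3.example.org", "cpus": 1, "memory_mb": 256},
]


def build_resource_spec(nodes):
    """Compiler side: emit the XML resource specification (assumed schema)."""
    root = ET.Element("resources")
    for node in nodes:
        res = ET.SubElement(root, "resource", id=node["id"])
        for name in ("cpus", "memory_mb"):
            ET.SubElement(res, "constraint", name=name, min=str(node[name]))
    return ET.tostring(root, encoding="unicode")


def parse_resource_spec(xml_text):
    """Loader side: read back (resource id, {constraint name: minimum}) pairs."""
    root = ET.fromstring(xml_text)
    specs = []
    for res in root.findall("resource"):
        needs = {c.get("name"): int(c.get("min")) for c in res.findall("constraint")}
        specs.append((res.get("id"), needs))
    return specs


def map_resources(xml_text, inventory):
    """Map each logical resource to the smallest free host that satisfies it."""
    free = list(inventory)
    mapping = {}
    for node_id, needs in parse_resource_spec(xml_text):
        fitting = [h for h in free if all(h.get(k, 0) >= v for k, v in needs.items())]
        if not fitting:
            raise LookupError(f"no free host satisfies {node_id}: {needs}")
        # Best fit: prefer the least capable host that is still sufficient.
        chosen = min(fitting, key=lambda h: (h["cpus"], h["memory_mb"]))
        free.remove(chosen)
        mapping[node_id] = chosen["host"]
    return mapping


if __name__ == "__main__":
    spec = build_resource_spec(LOGICAL_NODES)
    print(spec)
    print(map_resources(spec, INVENTORY))
```

In this sketch the mapping policy lives entirely in map_resources, that is, in the loader/run time layer; the inventory is the only thing requested from the layer below.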
In a Network Centric Operating System perspective, we would like:

To have a better, faster communication mechanism, in particular one supporting efficient collective communications as well as small data communications. The former are currently emulated on top of TCP, unless the unreliable UDP protocol is used; the latter are quite costly because of the usual high latency network overheads.

To have a more efficient (that is, faster and more reliable) discovery mechanism. Resource discovery is needed at the very beginning to find the resources necessary to run the application, but it is also needed when the set of resources used to execute the application has to be dynamically restructured to adapt to changing features (e.g. load, accessibility, etc.) of the available nodes. Therefore, faster, more responsive mechanisms than those provided by services such as Globus MDS are necessary.

To have safe, secure and high performance staging and remote commanding features, which are needed to combine distributed execution and security.

Some of these features can be provided as extensions of existing operating systems. Faster communication mechanisms, for instance, can be added to existing operating systems. If point-to-point communications are the main interaction mechanism between the nodes participating in the distributed grid computation, mechanisms different from those of plain TCP can be adopted, given that TCP was designed, developed and optimized with a different communication paradigm in mind.

Other features can either be included in the operating system or kept outside, in a further middleware layer. A discovery mechanism can be implemented on top of an extended operating system supplying fast, reliable collective communications, for instance. In this case discovery will be part of the run time support of the programming environment rather than a mechanism (possibly with policies) of the operating system/middleware.
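
To make concrete what emulating a collective communication on top of point-to-point channels involves, the following Python sketch compares two send schedules for a broadcast among n nodes. It only computes the schedules and opens no connections. The function names and the cost model (one round is a set of point-to-point sends that can proceed in parallel) are assumptions made for the example, and the binomial tree is a standard textbook technique rather than something taken from GRID.it.

```python
def linear_broadcast(n):
    """Node 0 sends to every other node in turn: one send per round."""
    return [[(0, dst)] for dst in range(1, n)]


def binomial_broadcast(n):
    """In each round, every node already holding the data forwards it to one new node."""
    rounds = []
    have = 1  # nodes 0 .. have-1 currently hold the data
    while have < n:
        rounds.append([(src, src + have) for src in range(have) if src + have < n])
        have *= 2
    return rounds


def delivered(rounds, n):
    """Replay a schedule and check that all n nodes end up holding the data."""
    holders = {0}
    for sends in rounds:
        received = set()
        for src, dst in sends:
            if src not in holders:
                return False  # a node cannot forward data it has not received yet
            received.add(dst)
        holders |= received
    return holders == set(range(n))


if __name__ == "__main__":
    for n in (5, 8, 64):
        print(n, len(linear_broadcast(n)), len(binomial_broadcast(n)))
```

Under this simplified model the number of rounds drops from n - 1 to about log2(n), but every round still pays the full point-to-point latency, which is one reason a native collective mechanism would be attractive.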
Background and interest

The Department of Computer Science of the University of Pisa is actively participating in several projects related to grids. Marco Vanneschi is currently the scientific coordinator of the SSA GridCOORD; Marco Danelutto leads the Virtual Institute on Programming models in the CoreGRID NoE. Both are involved in the Italian national FIRB research project GRID.it, the former as national coordinator, the latter as the person responsible for the Programming environment work package. Several other groups in the Department participate in the above mentioned or in other research projects related to grids. One group is conducting research on algorithms for web and grid related problems, another is conducting research on knowledge mining on grids, and other groups are programming different applications on grid architectures, including, as an example, biomedical applications. The Dept. of Computer Science of Pisa hosts about 70 professors (full and associate) and associate researchers.

The Vanneschi/Danelutto group is active in the design and implementation of programming environments suitable for implementing efficient, parallel applications for grids. In particular, the approach adopted is based on a layered implementation of the programming environment: a compiler layer performs static optimizations, a run time layer implements dynamic optimizations and, finally, a grid abstract machine layer abstracts from the existing middleware those features that are needed to allow the run time to control (handle) all the grid related features (problems) in the application execution. The compiler and run time layers, in conjunction, implement autonomic control procedures that leave the programmer almost completely unaware of the existence of the grid (invisible grid). Overall, the grid abstract machine is supposed to provide the schema of a possible GRID OS.
Preliminary results have already been achieved following these intuitions in the framework of the three year Italian national research project GRID.it [10], which is going to end in 2005. One of the project results will be an implementation of a prototype, component based, parallel programming environment exploiting the layered structure described above [5][6][7][9]. Our group also has previous experience in the development of efficient communication mechanisms as extensions of existing operating systems [3][4], as well as in the design of efficient network-wide shared memory/caching mechanisms [8]. In particular, we designed a Linux module that processes short messages through IP RAW sockets efficiently (10% to 15% better than plain TCP/IP) [3].
Interests

We want to study the feasibility of deriving a grid operating system design according to a top down approach: first, the features needed by an effective programming environment for grids are analyzed, experimented with and assessed. Then the features and mechanisms needed in the run time supporting the programming environment are fixed and assessed. Finally, the mechanisms needed in the operating system are distilled.

We think that the mechanisms we need to implement efficiently at the operating system level to support grid computing can be derived by looking at the way typical grid applications exploit current grid middleware features.

We also believe that it is fundamental to design a clear hierarchy of responsibilities that assigns to each layer the implementation of the proper mechanisms and policies. As an example, we firmly believe that the grid operating system layer only has to provide basic mechanisms (staging, communication, commanding, resource inventory), leaving the implementation of policies to the upper layers (run time system and compiler).

Expectation from a GridOS

A European coordinated/integrated research activity on Network Centric (Grid?) operating systems can lead to the definition of a basic set of mechanisms that have to be efficiently implemented in the operating system layer in a fully layered scenario. In this scenario middleware directly accesses the operating system, but user/programmer access to the middleware layer is completely mediated by proper compiler layers providing the programmer with higher level programming abstractions. Once the set of necessary mechanisms has been identified, suitable extensions of existing, open source operating systems can be designed that efficiently implement such mechanisms.

We point out that the efficiency of a very basic operation set is the most important feature expected.
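
The separation between mechanisms and policies argued for above can be sketched in a few lines of Python. The sketch is purely illustrative: the GridMechanisms interface, its method names, the in-memory stand-in and the least-loaded placement policy are all hypothetical, and the communication mechanism is omitted for brevity.

```python
from abc import ABC, abstractmethod


class GridMechanisms(ABC):
    """Hypothetical operating system layer: mechanisms only, no placement policy."""

    @abstractmethod
    def inventory(self):
        """Return a {host: current load} snapshot."""

    @abstractmethod
    def stage(self, host, name, payload):
        """Copy code or data to a host."""

    @abstractmethod
    def command(self, host, action, name):
        """Start or stop a previously staged process on a host."""


class InMemoryGrid(GridMechanisms):
    """Toy stand-in for the mechanisms, so that the sketch runs on one machine."""

    def __init__(self, loads):
        self.loads = dict(loads)
        self.files = {host: {} for host in loads}
        self.running = {host: [] for host in loads}

    def inventory(self):
        return dict(self.loads)

    def stage(self, host, name, payload):
        self.files[host][name] = payload

    def command(self, host, action, name):
        if action == "start":
            if name not in self.files[host]:
                raise FileNotFoundError(name)
            self.running[host].append(name)
            self.loads[host] += 1
        elif action == "stop":
            self.running[host].remove(name)
            self.loads[host] -= 1
        else:
            raise ValueError(action)


def deploy_least_loaded(grid, name, payload):
    """Run time layer policy: pick the least loaded host, then stage and start."""
    loads = grid.inventory()
    host = min(sorted(loads), key=loads.get)  # ties are broken by host name
    grid.stage(host, name, payload)
    grid.command(host, "start", name)
    return host


if __name__ == "__main__":
    grid = InMemoryGrid({"a": 2, "b": 0, "c": 1})
    print([deploy_least_loaded(grid, f"proc{i}", b"code") for i in range(3)])
```

Replacing deploy_least_loaded with a different policy requires no change to the mechanism layer, which is the property the layered scenario aims at.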
Mechanisms that can be used to implement or emulate what grid or network parallel and distributed applications need are already present in every existing operating system. What is needed is a very efficient implementation of such mechanisms. To be clearer, we do not simply need yet another discovery mechanism to look for available resources. TCP/IP (UDP, possibly) can be used to do that, provided suitable numbers/ranges of firewall safe ports are used. We need an efficient, possibly dedicated communication mechanism to perform resource discovery.

(Group) References

[1] M. Danelutto, M. Vanneschi. A RISC approach to GRID. TR-05-02, Dept. Computer Science, University of Pisa, January 24, 2005. http://compass2.di.unipi.it/tr/
[2] M. Danelutto. A RISC approach to the GRID. Talk at the IFIP 10.3 workshop, London, Nov. 1st, 2003. Slides available at http://www.di.unipi.it/~marcod/papers/ifiplon.pdf
[3] M. Danelutto, A. Rampini. Fast short messages on a Linux cluster. Proceedings of the HPCN 2001 Conference (High Performance Computing and Networking), Hertzberger, Hoekstra and Williams editors, LNCS No. 2110, Springer Verlag, pages 393-402, June 2001.
[4] M. Danelutto, C. Pucci. A compact, thread-safe communication library for efficient cluster computing. Proceedings of the HPCN 2000 Conference, LNCS No. 1823, Springer Verlag, 2000.
[5] M. Aldinucci, M. Coppola, M. Danelutto, M. Vanneschi, C. Zoccolo. ASSIST as a Research Framework for High-performance Grid Programming Environments. Chapter in Grid Computing: Software Environments and Tools, Jose C. Cunha and Omer F. Rana (Eds), Springer Verlag, 2004.
[6] M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, M. Vanneschi, C. Zoccolo. ASSIST demo: A High Level, High Performance, Portable, Structured Parallel Programming Environment at Work. Proceedings of Euro-Par 2003, H. Kosch, L. Boszormenyi and H. Hellwagner editors, LNCS No. 2790, Springer Verlag, pages 1295-1300.
[7] M. Danelutto. HPC the easy way: new technologies for high performance applications deployment. Journal of Systems Architecture, Vol. 49, Issues 10-11, pages 399-419, Elsevier, Nov. 2003.
[8] M. Aldinucci, M. Torquati. Accelerating Apache farms through ad-hoc distributed scalable object repository. In Proc. of Intl. Conference EuroPar2004: Parallel and Distributed Computing, Pisa, Italy, LNCS No. 3149, Springer Verlag, Sept. 2004.
[9] M. Aldinucci, S. Campa, M. Coppola, S. Magini, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, C. Zoccolo. Targeting heterogeneous architectures in ASSIST: experimental results. In Proc. of Intl. Conference EuroPar2004: Parallel and Distributed Computing, Pisa, Italy, LNCS No. 3149, Springer Verlag, Sept. 2004.
[10] GRID.it project web site, 2005. http://www.grid.it