Design of πit: an Aldor library to express parallel programs


Extended Abstract

Niklaus Mannhart
Institute for Scientific Computing
ETH-Zentrum, CH-8092 Zurich, Switzerland
mannhart@inf.ethz.ch

October 31, 1997

Abstract

We describe the implementation of asynchronous function calls and of remote calls to partially evaluated functions in πit. Both are high-level constructs that hide low-level programming details. In the talk we present examples and show results.

1 Introduction

πit (pi-it) is an Aldor [10] library for parallel computation, currently under development at ETH Zurich. It is designed to be architecture independent (heterogeneous). It runs on networks of workstations, where a workstation can be a single-processor or a multiprocessor machine (with shared or distributed memory). The library is portable across different Unix platforms and can easily be compiled on all systems that Aldor supports. The goals of πit are:

1. Architecture independence (portability).
2. Efficient execution of parallel programs written with πit.
3. A high-level interface that hides low-level programming details, especially the synchronization of parallel programs.

The design of πit is close¹ to that of PAC++ [7], Paclib [5], Sugarbush [4] and Parsac/DTS [3]. The differences lie in two places. The first is the language: Aldor is a well-defined high-level language for computer algebra, and it is more efficient. The second is the parallel programming model. The current model follows a data-parallel paradigm, using Aldor's generators for parallel iteration (map) and reduction.

¹ An overview of current systems is given in [9].

In this paper we concentrate on the design of πit, its current status and future work. The next section gives an overview of how the parallel programming model of πit is implemented. The third section specifies what a full implementation of the features required by πit's data-parallel programming model needs (remote calls to partially evaluated functions).

2 Parallel iterator with πit

The Aldor language defines the concept of a generator. Each data collection (list, array) in the Aldor library has its own generator, which iterates over the elements of the data structure. For example, iterating over a list looks like this:

    L: List(R) := createfromacomputation();  -- result from a computation
    G: Generator(R) := generator(L);          -- generator over the items of L
    for item in G repeat {                    -- iteration using the generator
        -- do something with item
    }

πit extends this scheme by using generators to split data collections and to apply parallel map and reduction. The following example shows a parallel map of a function f and the reconstruction of a vector:

    V: Vector(R) := createfromcomputation();
    G: Generator(SubVector(R)) := split(V, k);  -- split V into k blocks
    Vnew := merge(pmap(f, G));                   -- apply f to the blocks in parallel, then merge

Note that merge accepts a generator of sub-vectors as input and returns a vector. The semantics of a split/merge pair depends on its implementation.

Several map functions are implemented, and each defines the semantics of the sequence of generated elements. For example, one variant generates the sequence of results in the same order as the inputs. Because it is built on generators, πit can also handle speculative parallelism (infinite generators). In that case pmap returns a control structure (pmapctrl), which controls the spawned parallel tasks. Parallel map is implemented on top of the remote function calls and the synchronization described in the next section.
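The split/pmap/merge pattern above can be imitated outside Aldor. The following Python sketch is only an analogy, not πit's API. Python generators stand in for Aldor generators, a local thread pool stands in for πit's remote schedulers, and `split`, `pmap`, `merge` and `preduce` are hypothetical functions modelled on the names in the text. `pmap` here is the order-preserving variant. It collects its whole input before starting, so it does not model the speculative (infinite generator) case handled by pmapctrl.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split(v: List[T], k: int) -> Iterator[List[T]]:
    """Yield k contiguous blocks of v (hypothetical analogue of split(V, k))."""
    n = len(v)
    for b in range(k):
        # Block b covers v[b*n//k : (b+1)*n//k]; block sizes differ by at most one.
        yield v[b * n // k:(b + 1) * n // k]


def pmap(f: Callable[[T], U], gen: Iterable[T], workers: int = 4) -> Iterator[U]:
    """Order-preserving parallel map over the items produced by a generator."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map collects the whole input first and yields results in
        # input order, so this variant is not suitable for infinite generators.
        yield from pool.map(f, gen)


def merge(blocks: Iterable[List[T]]) -> List[T]:
    """Concatenate a generator of blocks back into one list (analogue of merge)."""
    out: List[T] = []
    for block in blocks:
        out.extend(block)
    return out


def preduce(op: Callable[[T, T], T], blocks: Iterable[List[T]], workers: int = 4) -> T:
    """Reduce every block in parallel, then combine the partial results.
    Assumes op is associative and that no block is empty."""
    partials = pmap(lambda block: reduce(op, block), blocks, workers)
    return reduce(op, partials)


def square_block(block: List[int]) -> List[int]:
    return [x * x for x in block]


if __name__ == "__main__":
    V = list(range(10))
    Vnew = merge(pmap(square_block, split(V, 3)))
    print(Vnew)                                      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    print(preduce(lambda a, b: a + b, split(V, 3)))  # 45
```

Splitting into contiguous blocks and merging by concatenation is one possible split/merge pair. As the text notes, other pairs (for example strided blocks) would give merge a different semantics.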
3 Remote function call

Remote procedure calls (RPC) [1] are a well-known method for transferring control from one process to another. When the call completes, the return value is sent back to the caller. From the user's point of view, a remote procedure call looks like a local procedure call. Unfortunately, the caller is blocked until the remote procedure call returns. To remove this limitation we can use the fork/join paradigm in the sense of the PRAM model [6]. To distinguish (blocking) remote procedure calls from the fork/join method, we call the latter asynchronous function calls.
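The difference between a blocking remote procedure call and an asynchronous function call can be illustrated with Python's `concurrent.futures`. This is a hedged, single-machine sketch: a local thread pool plays the role of the remote machine, and `remote_call`, `fork` and `join` are hypothetical helper names, not πit functions.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a remote machine: a local pool of worker threads.
_remote = ThreadPoolExecutor(max_workers=2)


def slow_square(x: int) -> int:
    time.sleep(0.2)  # simulate a long-running remote computation
    return x * x


def remote_call(f, *args):
    """Blocking RPC style: submit the call and wait for its result."""
    return _remote.submit(f, *args).result()


def fork(f, *args) -> Future:
    """Asynchronous function call: return a handle immediately."""
    return _remote.submit(f, *args)


def join(handle: Future):
    """Synchronize: blocks only if the result has not been produced yet."""
    return handle.result()


if __name__ == "__main__":
    start = time.perf_counter()
    total = remote_call(slow_square, 3) + remote_call(slow_square, 4)
    print("blocking:", total, round(time.perf_counter() - start, 1), "s")

    start = time.perf_counter()
    ha = fork(slow_square, 3)
    hb = fork(slow_square, 4)  # both calls now run concurrently
    # ... the caller is free to do other work here ...
    total = join(ha) + join(hb)
    print("fork/join:", total, round(time.perf_counter() - start, 1), "s")
```

With two workers, the blocking version should take roughly the sum of both delays. The fork/join version should take roughly one delay, because the caller only waits at `join`.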

3.1 πit features

When a remote function is called, πit stores the function together with its arguments in an internal list and returns a join variable, so the caller's flow of control continues immediately. The internal data structure is a job, which consists of a function, its arguments and a location for the result.

Jobs are executed by the scheduler, which takes a submitted job and decides which processor does the work. The scheduler packs the arguments and sends the job to the scheduler of another processor, which reconstructs the job and puts it in its own job queue. The schedulers handle the communication of the arguments and of the result, which is written to the specified location.

When the flow of control needs to synchronize with the end of the computation, it calls the synchronization routine (join) on the forked call. The flow blocks only if the result of the computation has not been written yet. The Aldor code for a remote function call looks as follows:

    jv: JoinVariable;               -- jv: join variable
    result: R;                      -- location where the result is written
    jv := fork(f, a, result, jv);   -- implicit mapping: the scheduler picks the processor
    -- more code
    join(jv);                       -- synchronisation
    -- here the result has been written

A join variable can be associated with several forked subcomputations; it behaves much like a counter of forked computations. Note that this example also admits a simpler notation:

    fr := f(a);        -- returns a future of type Futur(R)
    result := fr::R;   -- coercion to a value of type R

The concept described above has the following advantages:

1. The user does not have to care on which machine the remote function is executed.
2. Different schedulers can be implemented on different machines without affecting any user code. This is important because SMP machines need a different scheduler than single-processor machines.

The user can nonetheless force a function to be executed on a specific machine. In the current implementation, arguments are passed by value and the called function is expected to be free of side effects.
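The job, scheduler and join-variable mechanism can be modelled in a few lines of Python. This is a single-process sketch under stated assumptions, not πit's implementation:

- Threads play the role of processors.
- One shared queue stands in for a greedy scheduler. There is no packing of arguments, no remote scheduler and no work stealing.
- `JoinVariable`, `Job`, `Scheduler` and `fork` are hypothetical names modelled on the text.

The join variable is a counter of outstanding jobs, and a one-element list serves as the "location for the result".

```python
import threading
from dataclasses import dataclass
from queue import Queue
from typing import Any, Callable, List, Tuple


class JoinVariable:
    """Counts forked jobs that have not finished yet (hypothetical analogue)."""

    def __init__(self) -> None:
        self._pending = 0
        self._cond = threading.Condition()

    def add(self) -> None:
        with self._cond:
            self._pending += 1

    def done(self) -> None:
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def join(self) -> None:
        """Block only while at least one associated job is still running."""
        with self._cond:
            self._cond.wait_for(lambda: self._pending == 0)


@dataclass
class Job:
    """Function + arguments + location for the result."""
    f: Callable[..., Any]
    args: Tuple[Any, ...]
    result: List[Any]  # one-element list used as the result location
    jv: JoinVariable


class Scheduler:
    """Toy greedy scheduler: worker threads pull jobs from one shared queue."""

    def __init__(self, workers: int = 2) -> None:
        self._queue: "Queue[Job]" = Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, job: Job) -> None:
        self._queue.put(job)

    def _run(self) -> None:
        while True:
            job = self._queue.get()
            try:
                job.result[0] = job.f(*job.args)  # write result to its location
            except Exception as exc:  # keep the worker alive; store the error
                job.result[0] = exc
            finally:
                job.jv.done()


def fork(sched: Scheduler, f: Callable[..., Any], args: Tuple[Any, ...],
         result: List[Any], jv: JoinVariable) -> JoinVariable:
    """Register the job with jv, hand it to the scheduler, return at once."""
    jv.add()  # count the job before it can possibly finish
    sched.submit(Job(f, args, result, jv))
    return jv


if __name__ == "__main__":
    sched = Scheduler()
    jv = JoinVariable()
    r1, r2 = [None], [None]
    jv = fork(sched, pow, (2, 10), r1, jv)
    jv = fork(sched, sum, ([1, 2, 3],), r2, jv)  # same join variable, two jobs
    jv.join()  # waits for both forked jobs
    print(r1[0], r2[0])  # 1024 6
```

Because the counter is incremented before the job is queued, `join` cannot return while a job registered before the call is still running. This mirrors the rule that the flow blocks only if a result has not been written yet.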
Schedulers are list based; greedy, centralized and work-stealing [2] versions are implemented.

3.2 Partially Evaluated Functions

In Aldor, types and functions are first class: both may be manipulated and constructed in the same way as any other value, e.g. functions can return new functions or new types. For example, given the function f(a: Integer)(b: Integer): Integer, the statement g := f(5) assigns the partially evaluated function f(5) to g. To obtain the final result, g must be applied to an integer argument.

Naturally, we would like πit to support remote partially evaluated functions, i.e. g should be executable on a different machine. Partially evaluated functions (pef) are created dynamically at runtime, so the asynchronous function call described in the previous section does not work for them. To send a pef over the network, its closure has to be sent. A closure represents a function together with the bindings of the function's free variables [8]. In other words, a closure consists of the compiled code of a function plus its environment (a set of name-value pairs). Unfortunately, the current version of Aldor does not give access to the internal closure structure, so another solution has to be found.

πit solves the problem as follows:

1. When f(i) is applied, the function is not partially evaluated. Instead, the argument i is stored in an internal structure and a handle g is returned.
2. When g(j) is applied, the call is distributed like an asynchronous function call.
3. On the remote machine, f is first partially evaluated to f(i) and then called with the argument j.
4. The result is sent back as in the asynchronous function call.

Additionally, every partial evaluation is cached on the processor where it takes place. If g(j) is applied on a processor where g has not been evaluated yet, g := f(i) is evaluated there before g(j). Hence f(i) can be evaluated on many different processors, so any side effects of f(i) may occur on several of them.

4 Conclusion and future work

We have presented the solutions that πit provides for remote function invocation and for remote calls to partially evaluated functions. First examples have shown good results, and we are working on more complex examples to see how πit behaves in such cases. We also plan to make compiler or runtime changes in Aldor so that closures can be sent over the network; the limitation of partially evaluated functions described in the previous section would then disappear. We further plan to implement a distributed virtual shared memory based on variables. This would save communication time, because large data structures would no longer have to be sent over the network.

5 Acknowledgment

I am grateful to Thierry Gautier for his careful comments on this extended abstract and for the fruitful collaboration during the implementation of πit described in this paper. I would also like to thank Preda Mihailescu for reading earlier drafts of this paper.

References

[1] A. D. Birrell and B. J. Nelson. Implementing Remote Procedure Calls. ACM Transactions on Computer Systems, 2(1):39–59, Feb. 1984.

[2] R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'95), Santa Barbara, California, pages 207–216, Jul. 1995.

[3] Tilmann Bubeck, Martin Hiller, Wolfgang Küchlin, and Wolfgang Rosenstiel. Distributed Symbolic Computation with DTS. In Afonso Ferreira and José Rolim, editors, Proceedings of Parallel Algorithms for Irregularly Structured Problems, LNCS 980. Springer, Sep. 1995.

[4] Bruce Char. Progress report on a system for general-purpose parallel symbolic algebraic computation. In International Symposium on Symbolic and Algebraic Computation, ISSAC.

[5] Hoon Hong et al. PACLIB User Manual. Research Institute for Symbolic Computation (RISC-Linz), Oct.

[6] S. Fortune and J. Wyllie. Parallelism in Random Access Machines. In Symposium on Theory of Computing, pages 114–118. ACM, 1978.

[7] Th. Gautier and J. L. Roch. PAC++ system and parallel algebraic numbers computation. In Hoon Hong, editor, Parallel Symbolic Computation, volume 5 of Lecture Notes Series on Computing, pages 145–153. World Scientific Publishing Co. Pte. Ltd., Sep.

[8] S. L. Peyton Jones and D. R. Lester. Implementing Functional Languages. Prentice Hall, Englewood Cliffs, 1992.

[9] J. L. Roch and G. Villard. Parallel computer algebra. ISSAC'97 tutorial, Jul. 1997.

[10] Stephen M. Watt, Peter Broadbery, Samuel S. Dooley, Pietro Iglio, Scott C. Morrison, Jonathan M. Steinbach, and Robert S. Sutor. A First Report on the A# Compiler. In International Symposium on Symbolic and Algebraic Computation, ISSAC, pages 25–31. ACM, 1994.
