
ALEWIFE SYSTEMS MEMO #12

A Synchronization Library for ASIM

Beng-Hong Lim (bhlim@masala.lcs.mit.edu)
Laboratory for Computer Science
Room NE
January 9, 1992

Abstract

This memo describes the functions in the synchronization library provided for programs written for ASIM and acts as a user's manual. Mul-T provides futures and binary semaphores as primitive synchronization mechanisms. For experimenting with other synchronization constructs, we have extended the language to include J-structures, L-structures, mutual-exclusion locks, counting semaphores and barriers. An extension of futures to allow thread placement directives is also provided.

1 Introduction

A synchronization library is provided in ASIM for users to experiment with various synchronization mechanisms. This memo assumes knowledge of programming in the ASIM environment. See Alewife Memo 13 for a description of the ASIM environment.

The library contains implementations of mutual-exclusion locks, counting semaphores, J-structures, L-structures, and barriers. These supplement the synchronization mechanisms already present in Mul-T on ASIM, viz., futures and binary semaphores.

This memo will briefly describe the synchronization mechanisms and the user-callable functions associated with each mechanism. These functions are automatically linked with the user program by the ASIM compilation process.

The Alewife project is funded in part by NSF Experimental Systems grant # MIP , in part by DARPA contract # N K-0825, in part by a NSF Presidential Young Investigator Award, and in part by LSI Logic and IBM.

2 Waiting for synchronization

Some of the functions described below include a <cost-limit> parameter. This parameter controls the method of waiting for failed synchronizations. The following describes how to use it.

Traditionally, when a synchronization attempt fails, the thread executing the synchronization either spins or blocks. The synchronization mechanisms implemented in the library give the user control over the waiting method. Besides always spinning and always blocking, the user can make the thread spin for some amount of time before blocking if the synchronization condition has not yet been satisfied. This is specified via a "spin-cost" threshold. The thread will spin until the cost of spinning is above the "spin-cost" threshold, and then block.

Specifying a threshold of 0 yields an "always block" algorithm, while a threshold of most-positive-fixnum will effectively yield an "always spin" algorithm (unless you can afford to wait for ASIM to simulate most-positive-fixnum cycles). Specifying a threshold equal to the cost of blocking yields a strongly 2-competitive algorithm, which means that the cost of waiting is guaranteed to be no more than 2 times the cost of an optimal off-line waiting algorithm. The variable *blocking-ovh* is set to 1000 by default and is used as the default "spin-cost" threshold.

Note that on a given run the cost of an "always block" or an "always spin" algorithm may be less than twice optimal, so the 2-competitive algorithm is not guaranteed to be the best alternative. A more detailed description of two-phase waiting strategies can be found in [4].

3 Mutual Exclusion Locks

Mutual exclusion locks can be atomically acquired and released. They can be used to protect access to critical sections of code.

(make-lock) - creates and returns a lock object.
(lock? l) - returns #t if l is a lock object.
(lock-failed? l) - tries to acquire lock l. Returns #t if it failed to acquire the lock, #f if successful.
(spin-lock l) - tries to acquire lock l, using a 2-competitive waiting algorithm with spinning. Context switching is disabled while in the spin phase.
(%spin-lock l cost-limit) - tries to acquire lock l, spinning until the spinning cost exceeds cost-limit. Context switching is disabled while in the spin phase.
(sspin-lock l) - tries to acquire lock l, using a 2-competitive waiting algorithm with switch-spinning.
(%sspin-lock l cost-limit) - tries to acquire lock l, switch-spinning until the spin cost exceeds cost-limit.
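The cost-limit argument taken by the % variants above is the "spin-cost" threshold from Section 2, and the 2-competitive claim can be checked with a little arithmetic. The sketch below is a Python model, not ASIM or Mul-T code. The names two_phase_cost, optimal_cost, BLOCKING_OVH and ALWAYS_SPIN are illustrative, and the cost model (spinning costs one unit per cycle waited, blocking costs a flat amount) is an assumption made for the example.

```python
BLOCKING_OVH = 1000  # assumed to mirror the memo's default for *blocking-ovh*


def two_phase_cost(wait_time, spin_limit, block_cost=BLOCKING_OVH):
    """Cost of spinning for up to spin_limit cycles, then blocking.

    wait_time is how long the condition takes to become true. Spinning is
    assumed to cost one unit per cycle; blocking is a flat block_cost.
    """
    if wait_time <= spin_limit:
        return wait_time             # satisfied during the spin phase
    return spin_limit + block_cost   # spun to the limit, then blocked


def optimal_cost(wait_time, block_cost=BLOCKING_OVH):
    """Off-line optimum: knows wait_time in advance, picks the cheaper option."""
    return min(wait_time, block_cost)


ALWAYS_SPIN = float("inf")  # stands in for most-positive-fixnum

for wait in (10, 999, 1001, 50_000):
    print(wait,
          two_phase_cost(wait, 0),             # always block
          two_phase_cost(wait, ALWAYS_SPIN),   # always spin
          two_phase_cost(wait, BLOCKING_OVH),  # 2-competitive
          optimal_cost(wait))
```

Under these assumptions a 50,000-cycle wait costs the two-phase strategy 2000 units, exactly twice the optimum of 1000, while "always block" happens to hit the optimum. The two-phase choice bounds the worst case without being the best choice on every run.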

(unlock l) - unlocks l and releases all waiters.

4 FIFO Mutual Exclusion Locks

FIFO locks work like mutual exclusion locks except that there is a value associated with the lock. Successful lock attempts lock and return the value of the lock. Failed lock attempts immediately block and queue the thread on a first-in-first-out queue. Releasing the lock writes a new value into the lock and also releases the waiter at the head of the FIFO queue, if any.

(make-fifo-lock) - creates and returns a FIFO lock with an initial value of 0.
(lock-fifo-lock l) - tries to acquire lock l. Returns the FIFO lock value when the lock is successfully acquired.
(release-fifo-lock l value) - writes value into the lock and releases the first waiter in the FIFO queue, if any exist.

5 Binary Semaphores

(make-semaphore) - creates and returns a semaphore object.
(semaphore? x) - returns #t if x is a semaphore object, #f otherwise.
(semaphore-p sem) - waits by switch-spinning if the semaphore value is 0; sets the value to 0 and returns if the semaphore value is 1.
(semaphore-v sem) - sets the semaphore value to 1.
(semaphore-conditional-p sem) - returns #t if the semaphore value is 0; otherwise sets the value to 0 and returns #f.

6 Counting semaphores

Counting semaphores are semaphores that can take on values that are nonnegative integers. Although binary semaphores can be used to implement counting semaphores, the implementation provided here takes advantage of the hardware full/empty bits for a more efficient implementation.

(make-counting-semaphore initval) - creates and returns a semaphore with value set to initval.
(c-semaphore-p c-sem) - decrements the semaphore value and returns if the value is > 0; otherwise waits until the value is positive, then decrements.
(%c-semaphore-p c-sem cost-limit) - like c-semaphore-p, but the spin cost-limit can be specified.
(c-semaphore-v c-sem) - increments the semaphore value.
(%c-semaphore-v c-sem cost-limit) - like c-semaphore-v, but the spin cost-limit can be specified.
(fetch-and-add c-sem value) - increments the semaphore value by <value>, and returns the pre-incremented value.
(%fetch-and-add c-sem value cost-limit) - like fetch-and-add, but the spin cost-limit can be specified.

7 Barriers

There are 2 implementations of barriers: a simple barrier, and a software combining tree barrier. In a simple barrier, all threads arriving at the barrier increment the barrier count and, except for the last arrival, wait at a single release flag. In a tree barrier, the count and release flag are distributed throughout a tree. A description of a combining tree barrier can be found in [6].

The simple barrier implementation is not as scalable as the tree barrier due to potential contention. For barriers with more than 4 participants, the tree barrier performs better if all the participants arrive at approximately the same time.

7.1 Simple Barrier

(create-barrier nthreads) - create and return a barrier for <nthreads> threads.
(barrier b) - wait at barrier b for all processes to reach it.
(%barrier b cost-limit) - like barrier, but cost-limit can be specified.

7.2 Tree Barrier

(make-tree-barrier <nthreads> <bf>) - create and return a barrier for <nthreads> threads organized in a combining tree with a maximum branching factor of <bf>.
(make-dist-tree-barrier <nthreads> <tpp> <bf>) - create and return a barrier for <nthreads> threads organized in a combining tree with a maximum branching factor of <bf*tpp>. The nodes of the tree are distributed more or less evenly among the processors.
(tree-barrier <b> <thread-id> cost-limit) - wait at barrier <b> using <thread-id>. This implies that the user will need some scheme for uniquely numbering the threads participating in the tree barrier from 0 to <nthreads>-1.

8 J-structures

J-structures are one-dimensional vectors with presence bits associated with each element.
A newly allocated J-structure has the presence bit turned off for each element. A write to the J-structure using iset turns the presence bit on. A reference to an element with the presence bit turned off suspends the referencing task, while a reference with the presence bit turned on acts like a normal vector reference. This provides for synchronization between producers of J-structure values and consumers of J-structure values. A description of J-structures can be found in [4].

J-structures can be used to implement I-structures [1]. One major difference between I-structures and J-structures is that J-structures can be reset and reused. reset-istruct unsets all the presence bits in the J-structure.

(make-jstruct <len>) - Create and return a J-structure of length <len>.
(jref js i) - Reference element i in J-structure js. Wait for the value if the presence bit is unset, using a competitive waiting algorithm.
(set (jref js i) val) - Set the value of element i in J-structure js to value val. Also set the presence bit for that element and release any waiters.
(reset-jstruct js) - Reset the presence bits for each element in J-structure js. Signal an error if there are any waiters waiting on any of the J-structure elements. It is the programmer's responsibility to ensure that there are no longer any waiters on any J-structure element before resetting the J-structure.

When waiting for a J-structure element, the maximum spinning cost can be set by using set-max-jref-sspin.

(set-max-jref-sspin cycles) - Switch-spin until the cost is greater than cycles before blocking on the J-structure element.

9 L-structures

L-structures are also one-dimensional vectors with presence bits associated with each element. However, the read and write operations on L-structures are different. L-structures support 3 operations: a locking read, an unlocking write, and a non-locking read. A locking read waits until a slot is full before emptying the slot and returning the value. An unlocking write writes a value to an empty slot and sets it to full, releasing any waiters.
A non-locking read returns the value found in a slot if full; otherwise it returns an invalid value. An L-structure therefore allows mutually exclusive access to each of its slots.

The synchronizing L-structure reads and writes can be used to implement M-structures [2]. However, L-structures are different from M-structures in that they allow multiple non-locking readers.

(make-lstruct <len>) - Create and return an L-structure of length <len>.
(lref ls i) - Read and lock element i in L-structure ls. Wait if the presence bit is unset, using a competitive waiting algorithm.
(lpeek ls i) - Read element i in L-structure ls. If the presence bit is set, return the value read; otherwise return an invalid lpeek value.
(invalid-lpeek? value) - Returns #t if value is an invalid value returned by lpeek, #f otherwise.
(lset ls i val) - Set the value of element i in L-structure ls to value val. Also set the presence bit for that element and release any waiters.

When waiting for an L-structure element, the maximum spinning cost can be set by using set-max-lref-sspin.

(set-max-lref-sspin cycles) - Switch-spin until the cost is greater than cycles before blocking on the L-structure element.

10 Futures

A description of futures can be found in [3]. Although Mul-T already provides futures as a thread spawning and synchronizing primitive, we extended the Alewife environment to provide explicit placement directives for futures.

(future-on <pnum> <body>) - Enqueue a new task on processor <pnum> to be executed by that processor. Using a value of nil for <pnum> causes the future to be spawned locally.

When waiting for an unresolved future, the maximum spinning cost can be set by using set-max-future-sspin.

(set-max-future-sspin cycles) - Switch-spin until the cost is greater than cycles before blocking on the future.

References

[1] Arvind, R. S. Nikhil, and K. K. Pingali. I-Structures: Data Structures for Parallel Computing. In Proceedings of the Workshop on Graph Reduction (Springer-Verlag Lecture Notes in Computer Science 279), September/October.
[2] Paul S. Barth, Rishiyur S. Nikhil, and Arvind. M-structures: Extending a parallel, non-strict, functional language with state. In Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture, August.
[3] David A. Kranz, R. Halstead, and E. Mohr. Mul-T: A High-Performance Parallel Lisp. In Proceedings of SIGPLAN '89, Symposium on Programming Languages Design and Implementation, June.
[4] Beng-Hong Lim and Anant Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. Technical report, MIT VLSI Memo , February.
[5] Eric Mohr, David A. Kranz, and Robert H. Halstead. Lazy task creation: A technique for increasing the granularity of parallel programs. In Proceedings of Symposium on Lisp and Functional Programming, June.
[6] Pen-Chung Yew, Nian-Feng Tzeng, and Duncan H. Lawrie. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Transactions on Computers, C-36(4):388-395, April.
