Höllische Programmiersprachen (graduate seminar, winter semester 2014/2015)

Determinism and Reliability in the Context of Parallel Programming

Raphael Arias
Technische Universität München
January 19, 2015

Abstract

Parallel computation is an essential concept in modern programming. As microprocessor clock frequencies have almost stopped increasing, performance improvements are currently achieved mostly by parallelizing algorithms or programs to run on multiple cores, processors, or machines. Common parallelization models provide means such as semaphores or monitors to synchronize the parallel threads or processes of an application, but parallel programming with these models is extremely easy to get wrong, and subtle errors are bound to emerge. In a deterministic-by-construction model of parallel programming, non-determinism caused by scheduling differences or race conditions is guaranteed not to exist. However, previous models built on single-assignment variables limit the variety of programs that can be written. This paper describes a recent approach based on multiple-assignment variables that use so-called monotonic writes and threshold reads.

1 Introduction

Parallel computation is an essential concept in modern programming. As microprocessor clock frequencies have almost stopped increasing, performance improvements are currently achieved mostly by parallelizing algorithms or programs to run on multiple cores, processors, or machines. Common parallelization models provide means such as semaphores or monitors to synchronize the parallel threads or processes of an application, but parallel programming with these models is extremely easy to get wrong, and subtle errors are bound to emerge. This difficulty provides a strong motivation for deterministic-by-construction models. In these models, valid programs are deterministic and thus guaranteed not to produce race conditions or deadlocks, which are otherwise very common.

A common model for deterministic-by-construction parallelism is that of I-variables (IVars) or I-structures, as proposed by Arvind et al. [1]. I-structures are single-assignment variables that serve to communicate between different parallel processes or threads and have two fundamental states: uninitialized and initialized. While uninitialized, an I-structure blocks when read, until a different process fills it with content. Once initialized, it can never be assigned a new value again. While I-structures are doubtlessly useful, they have a rather narrow application range, and some algorithms or programs cannot make use of them. For instance, Kuper and Newton present a graph problem [3] that cannot be solved efficiently using I-structures. They introduce a more general model of LVars, which are lattice-based data structures. In this model, multiple assignments to the same variable are allowed, as long as the variable changes monotonically with respect to a specific lattice.

Throughout the remainder of this paper, LVars will be presented in further detail. For this, some basic concepts will be introduced first. Section 2 elaborates on I-structures, and section 3 defines the concept of a lattice. In section 4, LVars are examined in more detail. Section 4.3 provides some examples of commonly used data structures and how they fit into the LVar paradigm. Finally, section 5 summarizes the most important points and draws some conclusions.

2 A closer look at I-structures

As mentioned above, I-structures or IVars are single-assignment variables, meaning they are assigned a value only once and can thereafter never be changed. When an uninitialized IVar is accessed for reading, it blocks the reading process or thread until another process or thread initializes the IVar. This prevents different executions of the same program from reading different values when processes or threads access the variables in varying order. An I-structure will always have the same value, independent of the time or thread/process of access.

I-structures are useful to model parallel computations where intermediary results can be shared between threads or processes but are never changed. A simple example is that of a matrix computation, where each value in the matrix depends on the elements north, northwest and west of it [1]. Arvind et al. call this a wavefront, as an analogy to the way the computation progresses over the matrix, as can be seen in fig. 1. The computation instructions are:

    A[1, j] = 1
    A[i, 1] = 1
    A[i, j] = A[i-1, j] + A[i-1, j-1] + A[i, j-1]
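To make the recurrence concrete, the following minimal Haskell sketch computes it as a lazily self-referential array. Lazy evaluation stands in for the IVar's blocking read: each cell is defined exactly once, and demanding a not-yet-computed cell simply forces its computation. This is a purely illustrative, sequential analogue with names of our choosing, not the parallel implementation of [1].

    import Data.Array

    -- The wavefront recurrence: first row and column are 1, every other
    -- cell is the sum of its north, northwest and west neighbors. The
    -- array refers to itself, which laziness makes well-defined.
    wavefront :: Int -> Array (Int, Int) Integer
    wavefront n = a
      where
        a = array ((1, 1), (n, n))
              [ ((i, j), cell i j) | i <- [1 .. n], j <- [1 .. n] ]
        cell 1 _ = 1
        cell _ 1 = 1
        cell i j = a ! (i - 1, j) + a ! (i - 1, j - 1) + a ! (i, j - 1)

    main :: IO ()
    main = print (wavefront 4 ! (4, 4))

In Arvind et al.'s parallel setting the same pattern applies: whichever thread computes a cell writes it exactly once, and readers of that cell block until the write has happened.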

Figure 1: An illustration that shows the wavefront characteristic of the computation [1].

This scenario is perfectly suited for IVars, as each entry in the matrix is computed exactly once and never needs to be changed again. In turn, each entry is needed for the computations of the elements east, southeast and south of it. The order in which an entry is accessed does not matter, and if it is accessed for reading before being populated with a value, the read blocks. In the following section, a case where such a direct correspondence between the IVar concept and an algorithm is not possible will be examined. This will serve as a motivation for the generalization to LVars.

2.1 An imperfect scenario for IVars

This scenario is presented by Kuper and Newton [3] and shows some limitations of the IVar model. The concrete problem they examine is a graph algorithm for finding connected components in directed graphs. Specifically, they phrase the problem as follows [3]:

In a directed graph, find the connected component containing a vertex v, and compute a (possibly expensive) function f over all vertices in that component, making the set of results available asynchronously to other computations.

Note that the arguably most important requirement is not explicitly mentioned in this problem statement: the computation needs to be deterministic. According to the authors, most existing parallel approaches to the problem use a non-deterministic traversal of the graph. This does not negatively influence the outcome of these approaches (which is still deterministic), but it is not an admissible solution when working in a deterministic-by-construction model. It is unclear how I-structures could be used to help solve this problem: they are useful neither for accumulating intermediate results nor for marking already visited nodes [3].

The problem itself can be solved without too much difficulty when the requirement of asynchronously making the set of results accessible to others is ignored. When this requirement is taken into account, purely functional (and thus deterministic) attempts to solve the problem do not provide acceptable solutions. LVars present an elegant solution to this problem. Before they are introduced in section 4, some basic terminology and definitions regarding lattices will be presented.

3 Lattices

To properly understand the inner workings of LVars, it is necessary to examine lattices first. In this section we give an introduction to this concept. There are two ways to define lattices, and both of them are used, depending on the context: lattices can be defined in an algebraic and in a relational way. The two definitions can be proven equivalent [2]. Here, we start with the relational characterization of lattices, as it seems slightly more intuitive.

3.1 A relational characterization of lattices

When defining lattices according to their relational characteristics, the relation observed is always a partial order of some sort.

3.1.1 Definition: Semilattice

A join-semilattice is a set with a partial order ≤, such that every pair of elements (and hence every nonempty finite subset) has a least upper bound, or join, also represented by ∨. In contrast, a meet-semilattice is a set with a partial order ≤, such that every pair of elements has a greatest lower bound, or meet, also represented by ∧.

3.1.2 Definition: Lattice

A set with a partial order that makes it both a join- and a meet-semilattice (every pair of elements has both a least upper and a greatest lower bound) is called a lattice.
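The relational definition can be made tangible with a few lines of Haskell: given a finite poset as an explicit list of elements and an order predicate, the join of two elements is the upper bound that lies below all other upper bounds. All names here are ours, purely for illustration; the example poset is the power set of {1, 2, 3} ordered by inclusion, as in section 3.4 below.

    import Data.List (subsequences)

    -- Least upper bound of x and y in an explicitly given finite poset:
    -- collect all common upper bounds, then pick one that is below every
    -- other upper bound. Nothing means no join exists, i.e. this pair
    -- violates the join-semilattice condition.
    lub :: [a] -> (a -> a -> Bool) -> a -> a -> Maybe a
    lub elems leq x y =
      case [u | u <- uppers, all (leq u) uppers] of
        (u : _) -> Just u
        []      -> Nothing
      where
        uppers = [u | u <- elems, leq x u, leq y u]

    -- Subset order on lists standing in for sets.
    subsetOf :: Eq a => [a] -> [a] -> Bool
    subsetOf xs ys = all (`elem` ys) xs

    main :: IO ()
    main = print (lub (subsequences [1, 2, 3]) subsetOf [1] [3])
    -- prints Just [1,3]: the join of {1} and {3} under inclusion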

3.2 An algebraic characterization of lattices

Here it is assumed that the reader has some preliminary knowledge of algebraic terminology. We again define semilattices first, as lattices are most easily defined along the lines of semilattices.

3.2.1 Definition: Semilattice

A semilattice is an algebraic structure (E, ∘) (consisting of a set E and an operation ∘) in which the binary operation ∘ is associative (meaning a ∘ (b ∘ c) = (a ∘ b) ∘ c, for all a, b, c ∈ E), commutative (a ∘ b = b ∘ a, for all a, b ∈ E), and idempotent (a ∘ a = a, for all a ∈ E). Depending on the choice of operation and the partial order it induces on the set, one speaks of join- or meet-semilattices. For instance, substituting ∨ for ∘, where ∨ induces the partial order ≤ such that a ≤ b ⟺ a ∨ b = b, we obtain a join-semilattice.

3.2.2 Definition: Lattice

According to the algebraic definition, a lattice is an algebraic structure (E, ∨, ∧) such that (E, ∨) is a join-semilattice and (E, ∧) is a meet-semilattice.

3.3 Boundedness of semilattices

A semilattice (E, ∘) is called bounded if E contains an identity element 1 such that ∀a ∈ E. a ∘ 1 = a. Intuitively this means that a bounded join-semilattice contains a minimal element and a bounded meet-semilattice contains a maximal element. This notion is important, since LVars are defined on top of bounded join-semilattices, as will be discussed in section 4.

3.4 Some intuitive examples

In this section, some intuitive examples of lattice-like structures will be presented.

1. The Boolean lattice (B, ∨, ∧). Note that ∨ and ∧ here refer to the actual logical operators. Obviously the operations fulfill the (semi)lattice axioms defined in section 3.2: both ∨ and ∧ are associative, commutative and idempotent. The semilattices are also bounded: let B = {T, F}; then the join-semilattice (B, ∨) has a minimal element F such that ∀b ∈ B. b ∨ F = b. This holds dually for (B, ∧) and its maximal element T.

2. The power set lattice (2^X, ∪, ∩), where 2^X denotes the power set of X and ∪ and ∩ are set union and intersection, respectively. As above, the axioms hold trivially, and the minimal and maximal elements can easily be identified as ∅ and X, with set inclusion as the ordering. In this example, the relational definition and its correspondence with the algebraic one shine through quite clearly. For instance, let X = {1, 2, 3}. Then, in (2^X, ⊆), the least upper bound of the elements {1} and {3} with respect to the ⊆-relation is obtained by taking the union of both.
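The algebraic view translates almost verbatim into Haskell: a type class supplies the single binary operation, the two examples above become instances, and the induced order of section 3.2.1 is a one-liner. This is an illustrative sketch of ours, not code from [2] or [3]; the algebraic laws cannot be enforced by the compiler and remain the instance writer's obligation.

    import qualified Data.Set as Set

    -- Algebraic view: a carrier type with one binary operation that is
    -- (by obligation, not by construction) associative, commutative and
    -- idempotent.
    class JoinSemilattice a where
      join :: a -> a -> a

    -- Example 1: the Boolean join-semilattice; join is logical "or".
    instance JoinSemilattice Bool where
      join = (||)

    -- Example 2: the power-set join-semilattice; join is set union.
    instance Ord a => JoinSemilattice (Set.Set a) where
      join = Set.union

    -- The partial order induced by the operation: a <= b iff a `join` b == b.
    -- e.g. leq (Set.fromList [1]) (Set.fromList [1, 3]) == True
    leq :: (JoinSemilattice a, Eq a) => a -> a -> Bool
    leq a b = join a b == b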

4 The LVar, a lattice-based data structure

So far this paper has laid the groundwork for LVars by first examining I-structures in section 2 and then lattices in section 3. In this section LVars will be formally defined and explained. Kuper and Newton define λLVar, a parallel call-by-value λ-calculus extended by a store and by put- and get-operations. This store is shared among threads or processes and can be written to or read from using the put- and get-operations, respectively. As the store only contains LVars, determinism is preserved. LVars are Kuper and Newton's generalization of IVars, with the important difference that LVars can be written multiple times, as long as the value grows monotonically with respect to a specific ordering, that is, as long as the value written to the LVar and the current value of the LVar have a valid least upper bound.

4.1 LVars and lattices

The λLVar definition depends on D, a bounded join-semilattice augmented with a greatest element (recall that bounded join-semilattices are only guaranteed to have a minimal, not a maximal element). The axioms for the semilattice D are defined by the authors as follows:

- D has a least element ⊥, corresponding to the LVar's empty, uninitialized state.
- D has a greatest element ⊤, corresponding to the error state reached when conflicting updates to an LVar are made. A conflicting update means that a value is written to an initialized LVar which is incompatible with its current value.
- There exists a partial ordering ⊑ on D such that ∀d ∈ D. ⊥ ⊑ d ⊑ ⊤.
- There is a join, or least upper bound, for every pair of elements in D.

Two processes or threads that independently compute an update for an LVar will cause it to contain the least upper bound, or join, of the two updates. Using the least upper bound guarantees that updates (put) to an LVar always have a deterministic outcome, regardless of the order in which they are performed. This becomes obvious when one recalls that not only every pair of states has a least upper bound, but every nonempty finite subset of states has one. Thus performing all the updates in an arbitrary order leads to exactly this least upper bound.
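This order-independence can be checked directly in a few lines of Haskell, here over the power-set lattice of section 3.4 with Set.union playing the role of the join. The sketch is a self-contained demonstration of ours, not part of λLVar.

    import qualified Data.Set as Set
    import Data.List (permutations)

    -- Fold the same collection of updates into an initially empty
    -- "LVar" in every possible order; with a lattice join as the update
    -- rule, every order yields the same final state, namely the least
    -- upper bound (here: the union) of all updates.
    allOrdersAgree :: [Set.Set Int] -> Bool
    allOrdersAgree updates =
      let results = [ foldl Set.union Set.empty order
                    | order <- permutations updates ]
      in all (== head results) results

    main :: IO ()
    main = print (allOrdersAgree (map Set.fromList [[1], [2, 3], [3, 4]]))
    -- prints True

Enumerating all permutations is a brute-force check that is only feasible for a handful of updates, but it captures exactly the determinism argument above. With updates covered, the mechanism for reading LVars can now be examined in more detail.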

Figure 2: A visualization of some lattices. (a) shows the lattice corresponding to an LVar that simulates IVar behavior. (b) shows the lattice for pairs of binary-valued IVars; note that the getSnd call (basically a wrapper for get on the second tuple component) will block until the state of the LVar crosses the "tripwire", that is, until the second component is assigned. (c) shows a lattice corresponding to a natural-number-valued LVar [3].

4.2 Threshold reads

LVars use a concept called threshold reads for accessing the value of an LVar. This is probably the one aspect that requires some getting used to. Recall that IVars, the single-assignment variables, block when being read as long as they have not yet been initialized. With LVars, there is a similar blocking mechanism; however, an LVar read may also need to block on initialized states, depending on the actual content of the LVar and the form of content the developer is interested in. LVars solve this using threshold reads. The programmer specifies a threshold set Q, a nonempty, pairwise incompatible subset of D (pairwise incompatible meaning that the least upper bound of any two distinct elements of the set is ⊤). This set contains the thresholds for reading the LVar. Performing a get operation on an LVar will block as long as the value of the LVar is not yet at or above (with respect to the lattice) any element of Q. Once the value is at or above such an element d′, get unblocks and returns that d′. Note that this means that get always returns an element of the threshold set.
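The following pure Haskell sketch captures the decision a threshold read makes; it is illustrative only and not the λLVar or LVish get operation. Blocking is modeled by returning Nothing.

    -- Threshold read: given the lattice order, a threshold set q
    -- (assumed nonempty and pairwise incompatible) and the current
    -- state of the LVar, return the crossed threshold element, or
    -- Nothing where the real get would block.
    thresholdGet :: (a -> a -> Bool)  -- the lattice's partial order
                 -> [a]               -- threshold set Q
                 -> a                 -- current state of the LVar
                 -> Maybe a
    thresholdGet leq q cur =
      case [d | d <- q, d `leq` cur] of
        (d : _) -> Just d    -- state at or above threshold d: unblock
        []      -> Nothing   -- below every threshold: get would block

    -- With the natural-number lattice of fig. 2(c) and Q = {3}:
    --   thresholdGet (<=) [3] (5 :: Int) == Just 3
    --   thresholdGet (<=) [3] (1 :: Int) == Nothing

Pairwise incompatibility is what makes this deterministic: if two distinct elements of Q were both at or below the current state, their join ⊤ would be as well, so outside the error state at most one threshold can ever be crossed, and the returned element never depends on timing.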

4.3 Commonly used data structures examined in the context of LVars

In this section, we examine some commonly used data structures and point out how they can function as LVars. One of the simplest such structures is a natural-number-valued IVar. Recall that an IVar only allows the assignment of one value; thus any update (put) with a different value must be considered conflicting and must result in the error state. The semilattice D then looks as follows: D = {⊥, ⊤} ∪ ℕ, where ∀d ∈ D. ⊥ ⊑ d and ∀d ∈ D. d ⊑ ⊤, and the elements of ℕ are pairwise incomparable, so that any two distinct values conflict.

It is also possible to phrase pairs of such IVars as LVars. At the beginning, both components, and thus the pair as a whole, are uninitialized (⊥). Then either component can be updated independently; a repeated update of an already-initialized component with a different value, however, leads to the error state. The semilattice looks as follows: D = {⊤} ∪ ({⊥} ∪ ℕ)², with least element (⊥, ⊥), where (a, b) ⊑ (c, d) ⟺ (a ⊑ c ∧ b = d) ∨ (b ⊑ d ∧ a = c), the partial order being the reflexive-transitive closure of this one-step relation. The approach for tuples can be generalized to arrays of arbitrary size and, consequently, to matrices.

Another very intuitive structure is the set. If an update to a set-valued LVar is made, the store takes the least upper bound, or join, of the current state and the update. Recall from the example in section 3.4 that lattices on sets work well with set union as the join operation; the ordering is the usual set inclusion. Thus the semilattice for an LVar of Integer sets looks like this: D = {⊥, ⊤} ∪ 2^ℤ, ordered by ⊆. Note that, unlike in the IVar case above, there are no conflicting updates to an LVar of this type: the ⊤ state is unreachable, as any two elements have a least upper bound in 2^ℤ.
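The three lattices of this section can be written down as plain join functions, sketched below in Haskell. The type and function names are illustrative choices of ours, not the LVish API.

    import qualified Data.Set as Set

    -- (1) A natural-number-valued IVar encoded as an LVar: bottom, one
    -- payload value, or the error top. Joining two distinct payloads
    -- yields Top; rejoining the same payload is a harmless no-op.
    data IVarState = Bot | Full Integer | Top
      deriving (Eq, Show)

    joinIVar :: IVarState -> IVarState -> IVarState
    joinIVar Bot x = x
    joinIVar x Bot = x
    joinIVar Top _ = Top
    joinIVar _ Top = Top
    joinIVar (Full a) (Full b)
      | a == b    = Full a
      | otherwise = Top            -- conflicting update: error state
    -- e.g. joinIVar (Full 1) (Full 2) == Top

    -- (2) A pair of such IVars: components join independently; a pair
    -- containing Top represents the error state of the pair lattice.
    joinPair :: (IVarState, IVarState) -> (IVarState, IVarState)
             -> (IVarState, IVarState)
    joinPair (a, b) (c, d) = (joinIVar a c, joinIVar b d)

    -- (3) An Integer-set LVar: join is plain union, so any two states
    -- are compatible and the error state is unreachable.
    joinSet :: Set.Set Integer -> Set.Set Integer -> Set.Set Integer
    joinSet = Set.union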

5 Conclusion

This paper presented an introduction to deterministic parallel programming. After its importance was motivated, the ideas and features of LVars were discussed. To that end, some basic concepts of lattices were introduced, and the idea of IVars (of which LVars are a generalization) was presented. Then, some examples were examined to show the practical applicability of LVars to very common data structures. It is quite clear that LVars are a very interesting concept: they seem to be more generally useful than IVars or I-structures, and monotonic writes are a very useful addition. It remains to be seen whether LVars will find broad acceptance in the developer community. That will also depend on whether the λLVar calculus is made available to developers in mainstream programming languages.

References

[1] Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst., 11(4):598–632, October 1989.

[2] Rudolf Berghammer. Ordnungen, Verbände und Relationen mit Anwendungen. Springer Vieweg, Wiesbaden, 2012.

[3] Lindsey Kuper and Ryan R. Newton. LVars: Lattice-based data structures for deterministic parallelism. In Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing, FHPC '13, pages 71–84, New York, NY, USA, 2013. ACM.