Polylogarithmic Concurrent Data Structures from Monotone Circuits


JAMES ASPNES, Yale University
HAGIT ATTIYA, Technion
KEREN CENSOR-HILLEL, MIT

The paper presents constructions of useful concurrent data structures, including max registers and counters, with step complexity that is sublinear in the number of processes, n. This result avoids a well-known lower bound by having step complexity that is polylogarithmic in the number of values the object can take or the number of operations applied to it.

The key step in these implementations is a method for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register is constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈log m⌉ atomic register operations per write or read. An unbounded max register is constructed with cost O(min(log v, n)) to read or write a value v. Max registers are used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. One application is a simple, linearizable, wait-free counter with a cost of O(min(log n log v, n)) to perform an increment and O(min(log v, n)) to perform a read, where v is the current value of the counter. For polynomially-many increments, this becomes O(log^2 n), an exponential improvement on the best previously known upper bounds of O(n) for exact counting and O(n^{4/5+ɛ}) for approximate counting.

Finally, it is shown that the upper bounds are almost optimal. It is shown that for deterministic implementations, even if they are only required to satisfy solo-termination, min(⌈log m⌉, n − 1) is a lower bound on the worst-case complexity for an m-valued bounded max register, which is exactly equal to the upper bound for m ≤ 2^{n−1}, and min(n − 1, log m − log(⌈log m⌉ + k)) is a lower bound for the read operation of an m-valued k-additive-accurate counter, which is a bounded counter in which a read operation is allowed to return a value within an additive error of ±k of the number of increment operations linearized before it. Furthermore, even in a solo-terminating randomized implementation of an n-valued max register with an oblivious adversary and global coins, there exist simple schedules in which, with high probability, the worst-case step complexity of a read operation is Ω(log n / log log n) if the write operations have polylogarithmic step complexity.

Categories and Subject Descriptors: D.1.3 [Software]: Programming Techniques - Concurrent programming; E.1 [Data]: Data Structures; F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity - Nonnumerical Algorithms and Problems

General Terms: Algorithms, Theory

Additional Key Words and Phrases: distributed computing, shared memory, max registers, counters, monotone circuits

A preliminary version of this paper appeared in Proceedings of the 28th Annual ACM Symposium on Principles of Distributed Computing (PODC 2009), pages 36-45, 2009. James Aspnes is partially supported by NSF grant CNS. Hagit Attiya is partially supported by the Israel Science Foundation (grant number 953/06). Keren Censor-Hillel is supported by the Simons Postdoctoral Fellows Program. Part of this author's work was done as a Ph.D. student at the Department of Computer Science, Technion, and supported in part by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities.

1. INTRODUCTION

A critical aspect of programming contemporary multiprocessing systems is the implementation of concurrent data structures, e.g., getting the largest value stored in a data structure, or counting. It is important to find methods for building efficient concurrent data structures in shared-memory systems, where n processes communicate by reading and writing to shared multi-writer multi-reader registers.

One successful approach to building concurrent data structures is to employ the atomic snapshot abstraction [Afek et al. 1993; Anderson 1993; Aspnes and Herlihy 1990]. An atomic snapshot object is composed of components, each of which is typically updated by a different process; the components can be read atomically. By applying a specific function to the components read, we can obtain a specific data structure. For example, to obtain a max register, supporting a write operation and a ReadMax operation that returns the largest value previously written, the function returns the component with the maximum value; to obtain a counter, supporting an increment operation and a ReadCounter operation, the function adds up the contribution from each process. These constructions take a linear (in n) number of steps, due to the cost of implementing atomic snapshots [Inoue and Chen 1994]. Indeed, Jayanti et al. [2000] show that, for many common data structures, including max registers and counters, implementations must use Ω(n) space and operations must take Ω(n) steps in the worst case. This seems to indicate that we cannot do better than snapshots.

However, careful inspection of Jayanti, Tan, and Toueg's lower bound proof reveals that it holds only when there are numerous operations on the data structure. Thus, it does not rule out the possibility of having sub-linear algorithms when the number of operations is bounded, or, more generally, the existence of algorithms whose complexity depends on the number of operations. Such data structures are useful for many applications, either because they have a limited life-time, or because several instances of the data structure can be used.

Our Contributions. In this paper, we present polylogarithmic implementations of key data structures that support only bounded values. The cornerstone of our constructions, and our first example, is an implementation of a max register that beats the Ω(n) lower bound of Jayanti et al. [2000] when log m is o(n) (throughout the paper, log denotes log_2). If the number of values is bounded by m, its cost per operation is O(log m); for an unbounded set of values, the cost is O(min(log v, n)), where v is the value of the register.

To implement a counter, instead of simply summing the individual process contributions, as in a snapshot-based implementation of a counter, we can use a tree of max registers to compute this sum: take an O(log n)-depth tree of two-input adders, where the output of each adder is a max register. To increment, walk up the tree, recomputing all values on the path. The cost of a read operation is O(min(log v, n)), where v is the current value of the counter, and the cost of an increment operation is O(min(log n log v, n)). When the number of increments is polynomial, this has O(log^2 n) cost, which is an exponential improvement over the trivial upper bound of O(n) using snapshots. The resulting counter is wait-free and linearizable.
More generally, we show how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied.

Monotone circuits expose the parallelism inherent in the dependency of the data structure's values on the arguments to the operations. Formally, a monotone circuit computes a function over some finite alphabet of size m, which is assumed to be totally ordered. The circuit is represented by a directed acyclic graph where each node corresponds to a gate that computes a function of the outputs of its predecessors. Nodes with in-degree zero are input nodes; nodes with out-degree zero are output nodes. Each gate g, with k inputs, computes some monotone function f_g of its inputs. Monotonicity means that if x_i ≤ y_i for all i, then f_g(x_1, ..., x_k) ≤ f_g(y_1, ..., y_k). The cost of writing a new value to an input to the circuit is bounded by O(Sd min(⌈log m⌉, n)), where m is the size of the alphabet for the circuit, d is the number of inputs to each gate, and S is the number of gates whose value changes as the result of the write. The cost of reading the output value is min(⌈log m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition, called monotone consistency, which we will show is still useful for many applications.

So far we have only described upper bounds. We show a lower bound of min(⌈log m⌉, n − 1) steps for the read operation of any solo-terminating deterministic implementation of an m-valued max register; since this matches our upper bound exactly when m ≤ 2^{n−1}, our implementation is optimal. A bound of min(n − 1, log m − log(⌈log m⌉ + k)) steps is shown to hold for the read operation of any solo-terminating deterministic implementation of an m-valued k-additive-accurate counter, which is a bounded counter in which a read operation is allowed to return a value within an additive error of ±k of the number of increment operations linearized before it.

Furthermore, we show a lower bound for randomized implementations of a max register which exhibits a tradeoff between the number of steps required for read and write operations. It is shown that in any randomized implementation of an n-valued max register, there exist simple schedules containing n − 1 partial write operations and one read operation in which, with probability 1 − o(1), one of the following holds:

(1) The write operation with maximum value takes more than w steps.
(2) The read operation returns an incorrect value.
(3) The read operation takes Ω(log n/(log w + log log n)) steps.

This applies for an oblivious adversary, which determines the entire schedule in advance, even when requiring only solo-termination, and allowing global coins, which are random values shared by all processes (as opposed to local coins). In particular, this tradeoff shows an Ω(log n / log log n) lower bound on the worst-case step complexity of read operations for any randomized max register whose write operations have polylogarithmic step complexity.

Related Work. Previously, the linear lower bound of Jayanti et al. [2000] motivated, e.g., Aspnes and Censor [2009] to switch to approximate counting, but even this implementation has O(n^{4/5+ɛ}) cost. Our construction improves on this for the case of bounded counters. Another approach that improves upon the snapshot-based implementation of many data structures is to use f-arrays, as proposed by Jayanti [2002].
An f-array is a data structure that supports computation of a function f over the components of an array. Instead of having a process take a snapshot of the array and then locally apply f to the result, Jayanti implements an f-array by having the write operations update a predetermined location according to the new value of f, which requires a read operation to read only that location. This construction is then extended to a tree algorithm.

For implementing an f-array of m registers, where f can be any common aggregate function, including the maximum value or the sum of values, this reduces the number of steps required to O(log m) for a write operation, while a read operation takes O(1) steps. These implementations use base objects that are stronger than read/write registers (specifically, LL/SC), while we restrict our base objects to multi-writer multi-reader registers. Hence, we do not use this approach in this paper.

Model. We assume the standard model of an asynchronous shared-memory system (cf. [Attiya and Welch 2004]), which consists of a set of n processes P = {p_0, ..., p_{n−1}}. Processes communicate by reading and writing to shared multi-writer multi-reader registers. Each step consists of some local computation and one shared memory event, which is either a read or a write to some register. The schedule according to which processes take steps is determined by an adversary. The system is asynchronous, which implies that there are no timing assumptions, and specifically no bounds on the time between two steps of a process, or between steps of different processes.

We are interested in wait-free implementations, in which any operation on the data structure terminates within a finite number of its own steps regardless of the schedule chosen by the adversary (even if all other processes stop taking steps). The cost of an implementation is measured by the maximum number of steps required for any operation on the data structure. Our lower bounds hold even for the weaker solo-termination condition, in which an operation is only required to terminate within a finite number of its own steps if there are no concurrent steps by operations of other processes.

An implementation should also satisfy some consistency condition. The consistency condition should specify the semantic requirements of all possible executions, including possibly both complete operations, which start and terminate during the execution, and incomplete operations, which begin during the execution but still have low-level shared memory accesses pending. While the sequential semantics of a data structure has to address only non-concurrent operations, our data structures are concurrent, which means that during an execution some operations may overlap, in which case their shared-memory accesses are interleaved. Operations whose shared memory accesses are not interleaved are called non-overlapping. Since a consistency condition has to specify the semantic requirements for all possible concurrent executions, it also has to address the possibility of overlapping operations.

Some of our implementations are linearizable [Herlihy and Wing 1990]. This means that for any execution, there is a sequence that contains all the completed operations, as well as some of the incomplete ones, that (1) extends the order of non-overlapping operations, and (2) preserves the sequential semantics of the implemented object. Other implementations are not linearizable, but satisfy what we call monotone consistency, a weaker yet natural consistency condition defined formally in Section 4.

In a randomized implementation, a process is allowed to flip coins as part of its local computation, and to choose its next step according to their results. We require an operation of a randomized implementation of a data structure to be correct with high probability, i.e., with probability 1 − o(1).
The step complexity of an operation in a randomized implementation is an upper bound on the number of steps required for that operation with high probability. This implies that in a randomized implementation, with probability 1 − o(1), every operation returns the correct value within its step complexity.

2. MAX REGISTERS

Our basic data structure is a max register, which is an object r that supports a WriteMax(r, t) operation with an argument t that records the value t in r, and a ReadMax(r) operation returning the maximum value written to the object r. A max register may be either bounded or unbounded. For a bounded max register, we assume that the values it stores are in the range {0, ..., m − 1}, where m is the size of the register. We assume that any non-negative integer can be stored in an unbounded max register. In general, we will be interested in unbounded max registers, but will consider bounded max registers in some of our constructions and lower bounds.

One way to implement max registers is by using snapshots. Given a linear-time snapshot protocol (e.g., [Inoue and Chen 1994]), a WriteMax operation for process p_i updates location a[i], while a ReadMax operation takes a snapshot of all locations and returns the maximum value. Assuming no bounds on the size of snapshot array elements, this gives an implementation of an unbounded max register with linear cost (in the number of processes n) for both WriteMax and ReadMax. We show below how to build more efficient max registers: a recursive construction that gives costs logarithmic in the size of the register for both WriteMax and ReadMax. We also describe (in Section 6.4) a non-linearizable, Monte Carlo implementation with only one atomic register read per ReadMax, but with a drastically increased cost for WriteMax; while impractical, it is useful for illustrating the limitations of our lower bounds.

In the remainder of this section we show how to construct a max register recursively from a tree of increasingly large max registers. Each node in this tree is a one-bit register that directs the reader left or right depending on its value. (See Figure 1.) Intuitively, we can think of a read of the max register as following a path through this tree, constructing the sequence of bits found in the registers along the path and then treating this sequence as an element of a prefix-free code that can be decoded to obtain the actual value. A write operation similarly walks down along the path determined by the encoding of its input, testing to see if a larger value has already been written. If it finds one (because it sees a 1 in a register where its bit-vector would put a 0), it exits without changing any of the bits in the tree; this mechanism prevents out-of-date writes from introducing non-linearizable values that would otherwise be observed by out-of-date reads. But if the writer successfully reaches a leaf of the tree without seeing evidence of a larger value, it sets the 1 bits on its path that are needed to cause a reader to follow it, in bottom-up order. This ordering ensures that no reader is diverted to the new path until it has been completely marked.

For convenience of both the implementation and the correctness proof, it is easiest to describe this process recursively, where we think of a max register as built from a one-bit atomic register switching between two smaller max registers. The smallest such registers are trivial MaxReg_0 objects, which are max registers r with size m = 1 that support only the value 0. The implementation of MaxReg_0 requires zero space and zero step complexity: WriteMax(r, 0) is a no-op, and ReadMax(r) always returns 0. To get larger max registers, we combine smaller ones recursively (see Figure 1).
The base objects will consist of at most one snapshot-based max register as described earlier (used to limit the depth of the tree in the unbounded construction), together with a large number of trivial MaxReg_0 objects and read/write registers. A recursive MaxReg object has three components: two MaxReg objects called r.left and r.right, where r.left is a bounded max register of size m, and one 1-bit multi-writer register called r.switch. The resulting object is a max register whose size is the sum of the sizes of r.left and r.right, or unbounded if r.right is unbounded. Pseudocode for the max-register implementation is given in Algorithm 1. Writing a value t to r is done using the WriteMax(r, t) procedure. For values t less than m, a process first checks that r.switch is off, and only then writes t to r.left. For larger values, the process writes t − m to r.right and then sets r.switch. Reading the maximal value is done using the ReadMax(r) procedure. Here the process uses r.switch to choose whether to read r.left or r.right; if it reads r.right, it adds m to the result.

[Fig. 1. Implementing a max register.]

shared data: switch, a 1-bit multi-writer register, initially 0;
             left, a MaxReg object of size m, initially 0;
             right, a MaxReg object of arbitrary size, initially 0

procedure WriteMax(r, t)
    if t < m then
        if r.switch = 0 then
            WriteMax(r.left, t)
    else
        WriteMax(r.right, t − m)
        r.switch ← 1

procedure ReadMax(r)
    if r.switch = 0 then
        return ReadMax(r.left)
    else
        return ReadMax(r.right) + m

Algorithm 1: Max register implementation
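To make the recursion concrete, here is a minimal single-threaded Python sketch of Algorithm 1 (ours, not from the paper). The atomicity of the switch register is assumed rather than implemented, so the sketch models the recursive structure and control flow only; make_maxreg builds the balanced tree analyzed in Section 2.1 below.

# Sketch of Algorithm 1 (ours). Plain attributes stand in for atomic
# registers, so this models control flow, not true concurrency.

class MaxReg0:
    """Trivial max register of size 1: supports only the value 0."""
    size = 1
    def write_max(self, t):
        pass                              # WriteMax(r, 0) is a no-op
    def read_max(self):
        return 0

class MaxReg:
    """Recursive max register of size left.size + right.size."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.switch = 0                   # the 1-bit switch register

    @property
    def size(self):
        return self.left.size + self.right.size

    def write_max(self, t):
        m = self.left.size
        if t < m:
            if self.switch == 0:          # skip if a larger value exists
                self.left.write_max(t)
        else:
            self.right.write_max(t - m)
            self.switch = 1               # set switch only after writing

    def read_max(self):
        if self.switch == 0:
            return self.left.read_max()
        return self.right.read_max() + self.left.size

def make_maxreg(k):
    """Balanced MaxReg_k of size 2**k."""
    if k == 0:
        return MaxReg0()
    return MaxReg(make_maxreg(k - 1), make_maxreg(k - 1))

r = make_maxreg(4)                        # a 16-valued max register
for v in (3, 11, 7):
    r.write_max(v)                        # the write of 7 is suppressed
assert r.read_max() == 11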

An important property of this implementation is that it preserves linearizability, as shown in the following lemma.

LEMMA 2.1. If r.left and r.right are linearizable max registers, so is r.

PROOF. We assume that each of the MaxReg objects r.left and r.right is linearizable. Thus, we can associate each operation on them with one linearization point and treat these operations as atomic. In addition, we can associate each read or write to the register r.switch with a single linearization point, since it is atomic. We now consider a schedule of ReadMax(r) and WriteMax(r, t) operations. These consist of reads and writes to r.switch and of ReadMax and WriteMax operations on r.left and r.right. We divide the operations on r into three categories:

C_left: ReadMax(r) operations that read 0 from r.switch, and WriteMax(r, t) operations with t < m that read 0 from r.switch.
C_right: ReadMax(r) operations that read 1 from r.switch, and WriteMax(r, t) operations with t ≥ m (i.e., that write 1 to r.switch).
C_switch: WriteMax(r, t) operations with t < m that read 1 from r.switch.

Inspection of the code shows that each completed operation on r falls into exactly one of these categories. Notice that an operation is in C_left if and only if it invokes an operation on r.left, an operation is in C_right if and only if it invokes an operation on r.right, and an operation is in C_switch if and only if it invokes no operation on r.left or r.right. We order the operations by the following four rules:

(1) We order all operations of C_left before those of C_right.
(2) An operation op in C_switch is ordered at the latest point possible before any operation op′ that starts after op finishes. If two operations are ordered at the same point, then they appear according to the order in which they are encountered.
(3) Within C_left we order the operations according to the order in which they access r.left, i.e., by the order of their respective linearization points in accessing r.left.
(4) Within C_right we order the operations according to the order in which they access r.right, i.e., by the order of their respective linearization points in accessing r.right.

It is easy to verify that these rules are well-defined. We first prove that these rules preserve the execution order of non-overlapping operations. For two operations in the same category this is clearly implied by rules 2-4. Since rule 1 shows that two operations from C_left and C_right are also properly ordered, because an operation that starts after an operation in C_right finishes cannot be in C_left, it is left to consider the case that one operation is in C_switch and the other is either in C_left or in C_right. In this case, rule 2 implies that their order preserves the execution order.

We now prove that this order satisfies the specification of a max register, i.e., if a ReadMax(r) operation op returns t then t is the largest value written by operations on r of type WriteMax that are ordered before op. This requires showing that there is a WriteMax(r, t) operation op_w ordered before op, and that there is no WriteMax(r, t′) operation op′_w with t′ > t ordered before op. This is obtained by using the linearizability of the components. If op returns a value t < m (i.e., it is in C_left) then this is the value that is returned from its invocation op′ of ReadMax(r.left). By the linearizability of r.left, there is a WriteMax(r.left, t) operation op′_w ordered before op′ in the linearization of r.left. By rule 3, this implies that the WriteMax(r, t) operation op_w which invoked op′_w is ordered before op.
A similar argument for r.right applies if op returns a value t ≥ m. To prove that no operation of type WriteMax with a larger value is ordered before op, we assume, towards a contradiction, that there is a WriteMax(r, t′) operation op′_w with t′ > t that is ordered before op. If op returns a value t < m (i.e., it is in C_left) then op′_w cannot be in C_right, otherwise it would be ordered after op, by rule 1. Moreover, op′_w cannot be in C_switch, since rule 2 implies that op starts after op′_w finishes and

hence must also read 1 from r.switch, which is a contradiction to op ∈ C_left. Therefore, op′_w ∈ C_left, but this contradicts the linearizability of r.left. If op returns a value t ≥ m (i.e., it is in C_right) then op′_w cannot be in C_left because t′ > t. Moreover, op′_w cannot be in C_switch, since t′ > t ≥ m. Therefore, op′_w is in C_right, which contradicts the linearizability of r.right.

Using Lemma 2.1, we can build a max register whose structure corresponds to an arbitrary binary tree, where each internal node of the tree is represented by a recursive max register and each leaf is a MaxReg_0, or, for the rightmost leaf, a MaxReg_0 or a snapshot-based MaxReg, depending on whether we want a bounded or an unbounded max register. There are several natural choices, as we will discuss next.

2.1. Using a balanced binary tree

To construct a bounded max register of size 2^k, we use a balanced binary tree. Let MaxReg_k be a recursive max register built from two MaxReg_{k−1} objects, with MaxReg_0 being the trivial max register defined previously. Then MaxReg_k has size 2^k for all k. It is linearizable by induction on k, using Lemma 2.1 for the induction step. We can also easily compute an exact upper bound on the cost of ReadMax and WriteMax on a MaxReg_k object. For k = 0, both ReadMax and WriteMax perform no operations. For larger k, each ReadMax operation performs one register read and then recurses to perform a single ReadMax operation on a MaxReg_{k−1} object, while each WriteMax performs either a register read or a register write plus at most one recursive call to WriteMax. Thus:

THEOREM 2.2. A MaxReg_k object implements a linearizable max register for which every ReadMax operation requires exactly k register reads, and every WriteMax operation requires at most k register operations.

In terms of the size of the max register, operations on a max register that supports m values, where 2^{k−1} < m ≤ 2^k, each take at most ⌈log m⌉ steps. Note that this cost does not depend on the number of processes n; indeed, it is not hard to see that this implementation works even with infinitely many processes.

2.2. Using an unbalanced binary tree

In order to implement max registers that support unbounded values, we use unbalanced binary trees. Bentley and Yao [1976] provide several constructions of unbalanced binary trees with the property that the i-th leaf sits at depth O(log i). The simplest of these, called B_1, constructs the tree by encoding each positive integer using a modified version of a classic variable-length code known as the Elias delta code [Elias 1975]. In this code, each positive integer N = 2^k + l with 0 ≤ l < 2^k is represented by the bit sequence δ(N) = 1^k 0 β(l), where β(l) is the k-bit binary expansion of l. The first few such encodings are 0, 100, 101, 11000, 11001, 11010, 11011, 1110000, .... If we interpret a leading 0 bit as a direction to the left subtree and a leading 1 bit as a direction to the right subtree, this gives a binary tree that consists of an infinitely long rightmost path (corresponding to the increasingly long prefixes 1^k), where the i-th node in this path (i = 1, 2, ...) has a left subtree that is a balanced binary tree with 2^{i−1} leaves. (A similar construction is used by Attiya and Fouren [2001].)
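As a quick illustration, here is a small Python sketch (ours, not from the paper) of the modified Elias delta encoding; the length of δ(v + 1) matches the read cost derived next.

# Sketch (ours): the modified Elias delta code used by the B_1 tree.
# delta(N) = '1'*k + '0' + beta(l), where N = 2**k + l, 0 <= l < 2**k,
# and beta(l) is the k-bit binary expansion of l.

def delta(n: int) -> str:
    assert n >= 1
    k = n.bit_length() - 1            # n = 2**k + l with 0 <= l < 2**k
    l = n - (1 << k)
    beta = format(l, 'b').zfill(k) if k > 0 else ''
    return '1' * k + '0' + beta

# The path for value v in the B_1 tree is delta(v + 1); its length is
# 2k + 1 = 2*floor(log2(v + 1)) + 1, the cost of reading value v.
for v in range(8):
    assert len(delta(v + 1)) == 2 * ((v + 1).bit_length() - 1) + 1
print([delta(n) for n in range(1, 9)])
# ['0', '100', '101', '11000', '11001', '11010', '11011', '1110000']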
Let us consider what happens if we build a max register using the B_1 tree (see Figure 2). A ReadMax operation that reads the value v will follow the path corresponding to δ(v + 1), and in fact will read precisely this sequence of bits from the switch registers in each recursive max register along the path. This gives a cost to read value v that is equal to |δ(v + 1)| = 2⌊log(v + 1)⌋ + 1.

[Fig. 2. An unbalanced max register.]

Similarly, the cost of WriteMax(v) will be at most 2⌊log(v + 1)⌋ + 1. Both of these costs are unbounded for unbounded values of v. For ReadMax operations, there is an additional complication: repeated concurrent WriteMax operations might set each switch just before the ReadMax reaches it, preventing the ReadMax from terminating. Another complication is in proving linearizability, as the induction does not terminate without trickery like truncating the structure just below the last node actually used by any completed operation. For these reasons, we prefer to backstop the tree with a single snapshot-based max register that replaces the entire subtree at position 1^n, where n is the number of processes. Using this construction, we have:

THEOREM 2.3. There is a linearizable implementation of MaxReg for which every ReadMax operation that returns value v requires min(2⌊log(v + 1)⌋ + 1, O(n)) register reads, and every WriteMax operation with value v requires at most min(2⌊log(v + 1)⌋ + 1, O(n)) register operations.

If constant factors are important, the multiplicative factor of 2 can be reduced to 1 + o(1) by using a more sophisticated unbalanced tree; the interested reader should consult [Bentley and Yao 1976] for examples. Note that the infinite-tree construction does give an obstruction-free algorithm, since any operation does terminate when running alone.

3. COUNTERS

An immediate application of max registers is a construction of a linearizable m-valued counter for n processes with step complexity O(min(log n log m, n)) for increment operations and O(min(log m, n)) for read operations. This is a special case of a general construction of a large class of monotone data structures based on monotone circuits that we present in Section 4, but we describe the construction first in its own terms, without depending on the general construction. The complexity bounds are an exponential improvement on the best previously known upper bound of O(n) for exact counting, and on the bound O(n^{4/5+ɛ} ((1/δ) log n)^{O(1/ɛ)}), where ɛ is a small constant parameter, for approximate counting which is δ-accurate [Aspnes and Censor 2009].

Pseudocode for the counter is given as Algorithm 2. The counter is structured as a binary tree of max registers. Each max register has an index, which is a bit vector x_1 x_2 ... x_k, where k is the depth of the register in the tree. At the top of the tree is a root max register whose index is the empty sequence. With n processes, the leaves have depth log n, and each process is assigned to the leaf whose index x_1 x_2 ... x_{log n} corresponds to the id of the process, expressed in binary. Each leaf register records the number of increments done by the corresponding process. Internal nodes of the tree record the total number of increments done by all processes whose leaves lie below the node. These values are, of course, not updated immediately; instead, when a process increments its leaf register, it takes responsibility for propagating the new value up the path to the root. It does so by reading both children of each node along the path, and writing their sum to the node itself, before proceeding to the next node and repeating the process. A read operation is carried out by simply reading the root node.

shared data: MaxReg objects r[x_1 ... x_k] for each bit vector x_1 ... x_k with 0 ≤ k ≤ log n, all initially 0

procedure CounterIncrement()
    let x_1 ... x_{log n} represent the current process's identity
    r[x_1 ... x_{log n}] ← r[x_1 ... x_{log n}] + 1
    for k ← log n − 1 down to 0 do
        v_0 ← r[x_1 ... x_k 0]
        v_1 ← r[x_1 ... x_k 1]
        r[x_1 ... x_k] ← v_0 + v_1

procedure ReadCounter()
    return r[]

Algorithm 2: Counter implementation
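Before turning to the linearizability argument, here is a single-threaded Python sketch of Algorithm 2 (ours, not from the paper); dictionary entries stand in for the max registers and atomicity is assumed, so only the data flow of increment and read is modeled.

# Sketch of Algorithm 2 (ours): a counter built from a binary tree of
# max registers, indexed by bit vectors.

class TreeCounter:
    def __init__(self, depth: int):
        self.depth = depth                         # leaves at depth log n
        # one max register r[x_1 ... x_k] per bit vector, all initially 0
        self.r = {'': 0}
        for k in range(1, depth + 1):
            for i in range(2 ** k):
                self.r[format(i, 'b').zfill(k)] = 0

    def increment(self, pid: int):
        leaf = format(pid, 'b').zfill(self.depth)  # process id in binary
        self.r[leaf] += 1                          # only pid writes this leaf
        for k in range(self.depth - 1, -1, -1):    # walk up toward the root
            prefix = leaf[:k]
            v0 = self.r[prefix + '0']              # ReadMax on left child
            v1 = self.r[prefix + '1']              # ReadMax on right child
            self.r[prefix] = max(self.r[prefix], v0 + v1)   # WriteMax

    def read(self) -> int:
        return self.r['']                          # ReadMax on the root

c = TreeCounter(depth=2)                           # supports n = 4 processes
for pid in (0, 3, 3):
    c.increment(pid)
assert c.read() == 3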

To prove linearizability, we treat each subtree as a counter object in its own right. We can then consider some counter C rooted at some position x_1 ... x_k as built from a max register r[x_1 ... x_k] together with two counters C_0 and C_1, corresponding to the subtrees rooted at x_1 ... x_k 0 and x_1 ... x_k 1, respectively. We can think of the execution of CounterIncrement up through the first log n − k iterations of the loop as carrying out an increment operation on one of C_0 or C_1, followed by reading both counters and writing the sum of the values read to r[x_1 ... x_k]. We will show by induction on the height of r[x_1 ... x_k] in the tree that these internal increment and read operations are linearizable; when we reach the root register r[], this will show that the full counter is linearizable. The base of this induction is the leaf register r[x_1 ... x_{log n}], which is trivially linearizable as only one process ever updates this max register, letting us assign as linearization point for increments the corresponding WriteMax and for reads the corresponding ReadMax. At higher levels of the tree, we have the following lemma:

LEMMA 3.1. If C_0 and C_1 are linearizable unit-increment counters, then so is C.

PROOF. Each increment operation of C is associated with a value equal to C_0 + C_1 at the point where it increments C_0 or C_1, treating C_0 and C_1 as atomic counters according to the induction hypothesis. An increment operation with an associated value v is linearized when a value v′ ≥ v is first written to the output max register r[x_1 ... x_k]. A read operation is linearized at the point where it reads the output max register r[x_1 ... x_k] (which we consider to be atomic). To see that the linearization point for increment v occurs within the interval of the operation, observe that no increment can write a value v′ ≥ v to r[x_1 ... x_k] before increment v finishes incrementing the relevant sub-counter C_0 or C_1, because before then C_0 + C_1 < v. Moreover, the increment v cannot finish before some v′ ≥ v is first written to r[x_1 ... x_k], because v itself writes a value v′ ≥ v before it finishes. Since the read operations are also linearized within their execution intervals, this order is consistent with the order of non-overlapping operations. This gives a valid sequential execution, since we now have exactly one increment operation associated with every integer up to any value read from C, and there are exactly v increment operations ordered before a read operation that returns v.

THEOREM 3.2. There is an implementation of a linearizable m-valued unit-increment counter for n processes where a read operation takes O(min(log m, n)) low-level register operations and an increment operation takes O(min(log n log m, n)) low-level register operations.

PROOF. Linearizability follows from the preceding argument. For the complexity, observe that the read operation has the same cost as ReadMax, while an increment operation requires reading and updating O(1) max registers per iteration, at a cost of O(min(log m, 2^i)) for the i-th iteration. The full cost of an increment is obtained by summing this quantity as i goes from 0 to log n.

Note that for a polynomial number of increments, an increment takes O(log^2 n) steps. It is also possible to use unbounded max registers, in which case the value m in the cost of a read or increment is replaced by the current value of the counter.

4. MONOTONE CIRCUITS

In this section, we show how a max register can be used to construct more sophisticated data structures from arbitrary monotone circuits. The cost of operations on the data structure will be a function of the number of gates whose values need to be updated when an input changes and the size of the alphabet. For each monotone circuit, we can construct a corresponding monotone data structure. This data structure supports operations WriteInput and ReadOutput, where each WriteInput operation updates the value of one of the inputs to the circuit and each ReadOutput operation returns the value of one of the outputs. Like the circuit as a whole, the effects of WriteInput operations are monotone: attempts to set an input to a value less than or equal to its current value have no effect. The resulting data structure always provides monotone consistency, which is generally weaker than linearizability:

Definition 4.1. A monotone data structure is monotone consistent if the following properties hold in any execution:

(1) For each output, there is a total ordering < on all ReadOutput operations for it, such that if some operation R_1 finishes before some other operation R_2 starts, then R_1 < R_2, and if R_1 < R_2, then the value returned by R_1 is less than or equal to the value returned by R_2.

(2) The value v returned by any ReadOutput operation satisfies f(x_1, ..., x_k) ≤ v, where each x_i is the largest value written to input i by a WriteInput operation that completes before the ReadOutput operation starts, or the initial value of input i if no such operation exists.
(3) The value v returned by any ReadOutput operation satisfies v ≤ f(y_1, ..., y_k), where each y_i is the largest value written to input i by a WriteInput operation that starts before the ReadOutput operation completes, or the initial value of input i if no such operation exists.

The intuition here is that the values at each output appear to be non-decreasing over time (the first condition), all completed WriteInput operations are always observed by ReadOutput (the second condition), and no spurious larger values are observed by ReadOutput (the third condition). But when operations are concurrent, it may be that some ReadOutput operations return intermediate values that are not consistent with any fixed ordering of WriteInput operations, violating linearizability (an example is given in Subsection 5.1).

We convert a monotone circuit to a monotone data structure by assigning a max register to each input and each gate output in the circuit. We assume that these max registers are initialized to a default minimum value, so that the initial state of the data structure will be consistent with the circuit. A WriteInput operation on this data structure updates an input (using WriteMax) and propagates the resulting changes through the circuit, as described in procedure WriteInput. A ReadOutput operation reads the value of some output node by performing a ReadMax operation on the corresponding output. The cost of a ReadOutput operation is the same as that of a ReadMax operation: O(min(log m, n)). The cost of a WriteInput operation depends on the structure of the circuit: in the worst case, it is O(Sd min(log m, n)), where S is the number of gates reachable from the input and d is the maximum number of inputs to each gate.

procedure UpdateGate(g)
    let x_1, ..., x_d be the inputs to g
    for i ← 1 to d do
        y_i ← ReadMax(x_i)
    WriteMax(g, f_g(y_1, ..., y_d))

procedure WriteInput(g, v)
    WriteMax(g, v)
    let g_1, ..., g_S be a topological sort of all gates reachable from g
    for i ← 1 to S do
        UpdateGate(g_i)

procedure ReadOutput(g)
    return ReadMax(g)

Algorithm 3: Monotone circuit implementation
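The following Python sketch (ours; the class, its representation, and all names are assumptions, not the paper's API) mirrors Algorithm 3: each input and each gate output is modeled as a max register, WriteInput propagates through a topological sort of the reachable gates, and ReadOutput performs a single ReadMax.

# Sketch of Algorithm 3 (ours). Gates form a DAG; each gate's output
# lives in a "max register" modeled as a dict entry (atomicity assumed).

class MonotoneCircuit:
    def __init__(self):
        self.val = {}        # node name -> current max-register value
        self.fn = {}         # gate name -> (monotone function, input names)
        self.succ = {}       # node name -> gates it feeds into

    def add_input(self, name, initial=0):
        self.val[name] = initial
        self.succ.setdefault(name, [])

    def add_gate(self, name, f, inputs, initial=0):
        self.val[name] = initial
        self.fn[name] = (f, inputs)
        self.succ.setdefault(name, [])
        for i in inputs:
            self.succ.setdefault(i, []).append(name)

    def _update_gate(self, g):
        f, inputs = self.fn[g]
        ys = [self.val[i] for i in inputs]       # ReadMax on each input
        self.val[g] = max(self.val[g], f(*ys))   # WriteMax of f_g(y_1..y_d)

    def write_input(self, x, v):
        self.val[x] = max(self.val[x], v)        # WriteMax on the input
        for g in self._gates_reachable_from(x):  # in topological order
            self._update_gate(g)

    def read_output(self, g):
        return self.val[g]                       # ReadMax on the output

    def _gates_reachable_from(self, x):
        # topological order via reversed DFS postorder over the DAG
        order, seen = [], set()
        def dfs(u):
            for w in self.succ.get(u, []):
                if w not in seen:
                    seen.add(w)
                    dfs(w)
            if u in self.fn:
                order.append(u)
        dfs(x)
        return list(reversed(order))

# Usage: a two-level adder tree, i.e., the counter circuit of Section 3.
c = MonotoneCircuit()
for name in ('x0', 'x1', 'x2', 'x3'):
    c.add_input(name)
c.add_gate('a', lambda u, v: u + v, ['x0', 'x1'])
c.add_gate('b', lambda u, v: u + v, ['x2', 'x3'])
c.add_gate('out', lambda u, v: u + v, ['a', 'b'])
c.write_input('x1', 2)
c.write_input('x2', 3)
assert c.read_output('out') == 5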

THEOREM 4.2. For any fixed monotone circuit C, the WriteInput and ReadOutput operations based on that circuit are monotone consistent.

PROOF. Consider some execution of a collection of WriteInput and ReadOutput operations. We think of this execution as consisting of a sequence of atomic WriteMax and ReadMax operations, and use time to refer to points in the execution. The first clause in Definition 4.1 follows immediately from the linearizability of max registers, since we can just order ReadOutput operations by the order of their internal ReadMax operations. For the remaining two clauses, we will jump ahead to the third, upper-bound, clause first. The proof is slightly simpler than the proof for the lower bound, and it allows us to develop tools that we will use for the proof of the second clause.

For each input x_i, let V_i^t be the maximum value written to the register representing x_i at or before time t, or its initial value if no such write exists. For any gate g, let C_g(x_1, ..., x_n) be the function giving the output of g when the original circuit C is applied to x_1, ..., x_n (see Figure 3). For simplicity, we allow C in this case to include internal gates, output gates, and the registers representing inputs (which we can think of as zero-input gates). We thus can define C_g recursively by C_g(x_1, ..., x_n) = x_i when g = x_i is an input gate, and C_g(x_1, ..., x_n) = f_g(C_{g_{i1}}(x_1, ..., x_n), ..., C_{g_{ik}}(x_1, ..., x_n)) when g is an internal or output gate with inputs g_{i1}, ..., g_{ik}.

[Fig. 3. A gate g in a circuit computes a function of its inputs f_g(g_{i1}, ..., g_{ik}). The inputs to the circuit are x_1, ..., x_n.]

Let g^t be the actual output of g in our execution at time t, i.e., the contents of the max register representing the output of g. We claim that for all g and t, g^t ≤ C_g(V_1^t, ..., V_n^t). The proof is by induction on t and the structure of C. In the initial state, all max registers are at their default minimum value and the induction hypothesis holds. Suppose now that some max register g changes its value at time t. If this max register represents an input, the new value corresponds to some input supplied directly to WriteInput, and we have g^t = C_g(V_1^t, ..., V_n^t). If the max register represents an internal or output gate, its value is written during some call to UpdateGate, and is equal to f_g(g_{i1}^{t_1}, g_{i2}^{t_2}, ..., g_{ik}^{t_k}), where each g_{ij} is some register read by this call to UpdateGate and t_j < t is the time at which it is read. Because max register values can only increase over time, we have, for each j, g_{ij}^{t_j} ≤ g_{ij}^t = g_{ij}^{t−1} ≤ C_{g_{ij}}(V_1^{t−1}, ..., V_n^{t−1}), by the induction hypothesis and the fact that only gate g changes at time t. This last quantity is in turn at most C_{g_{ij}}(V_1^t, ..., V_n^t), as only gate g changes at time t. By monotonicity of f_g we then

get

g^t = f_g(g_{i1}^{t_1}, g_{i2}^{t_2}, ..., g_{ik}^{t_k}) ≤ f_g(C_{g_{i1}}(V_1^t, ..., V_n^t), ..., C_{g_{ik}}(V_1^t, ..., V_n^t)) = C_g(V_1^t, ..., V_n^t),

as claimed, which completes the proof of clause 3.

We now consider clause 2, which gives a lower bound on output values. For each time t and input x_i, let v_i^t be the maximum value written to the max register representing x_i by a WriteInput operation that finishes at or before time t. We wish to show that for any output gate g, g^t ≥ C_g(v_1^t, ..., v_n^t). As with the upper bound, we proceed by induction on t and the structure of C. But the induction hypothesis is slightly more complicated, in that in order to make the proof go through we must take into account which gate we are working with when choosing which input values to consider. For each gate g, let v_i^t(g) be the maximum value written to input register x_i by any instance of WriteInput that completes UpdateGate(g) at or before time t, or its initial value if no such operation exists. Our induction hypothesis is that at each time t and for each gate g, g^t ≥ C_g(v_1^t(g), ..., v_n^t(g)). In general we have v_i^t ≤ v_i^t(g), since the set of operations that gives v_i^t is contained in the set of operations that gives v_i^t(g): any process that writes to some input x_i that affects the value of g as part of some WriteInput operation must complete UpdateGate(g) before finishing the operation. Hence having g^t ≥ C_g(v_1^t(g), ..., v_n^t(g)) implies g^t ≥ C_g(v_1^t, ..., v_n^t).

The claim holds for t = 0, since we assume a consistent initialization of the circuit. Suppose now that some max register g changes its value at time t. If g is an input, the induction hypothesis holds trivially. Otherwise, consider the set of all WriteInput operations that write to g at or before time t. Among these operations, one of them is the last to complete UpdateGate(g′) for some input g′ to g. Let this event occur at time t′ < t, and call the process that completes this operation p. We now consider the effect of the UpdateGate(g) procedure carried out as part of this WriteInput operation. Because no other operation completes an UpdateGate procedure for any input g_{ij} to g between t′ and t, we have that for each such input and each i, v_i^t(g_{ij}) = v_i^{t′}(g_{ij}). Since the ReadMax operation on each g_{ij} in p's call to UpdateGate(g) occurs after time t′, it obtains a value that is at least g_{ij}^{t′} ≥ C_{g_{ij}}(v_1^{t′}(g_{ij}), ..., v_n^{t′}(g_{ij})), by the induction hypothesis, and this value is at least C_{g_{ij}}(v_1^t(g_{ij}), ..., v_n^t(g_{ij})), by the monotonicity of C_{g_{ij}} and the previous observation on the relation between v_i^{t′}(g_{ij}) and v_i^t(g_{ij}). But then

g^t ≥ f_g(C_{g_{i1}}(v_1^t(g_{i1}), ..., v_n^t(g_{i1})), ..., C_{g_{ik}}(v_1^t(g_{ik}), ..., v_n^t(g_{ik})))
    ≥ f_g(C_{g_{i1}}(v_1^t(g), ..., v_n^t(g)), ..., C_{g_{ik}}(v_1^t(g), ..., v_n^t(g)))
    = C_g(v_1^t(g), ..., v_n^t(g)).

5. APPLICATIONS

In this section we consider applications of the circuit-based method for building data structures described in Section 4. Most of these applications will be variants on counters, as these are the main example of monotone data structures currently found in the literature. Because we are working over a finite alphabet, all of our counters will be bounded.

5.1. Generalized counters

A generalized counter takes an argument to its increment operation, which allows the counter to be incremented by any non-negative amount. We can construct a generalized counter from a circuit consisting of a binary tree of adders, where each gate in the circuit computes the sum of its inputs and each input to the circuit is assigned to a distinct process to avoid lost updates. This gives an implementation essentially identical to Algorithm 2, except that now an increment may start by adding an arbitrary non-negative value to the leaf register. As in Algorithm 2, the step complexity of an increment is O(min(log n log m, n)) and of a read is O(min(log m, n)).

Unfortunately, unlike in the unit-increment case, it is not hard to see that our generalized counter is not linearizable, even though it satisfies monotone consistency. The reason is that read operations may return intermediate values that are not consistent with any ordering of the increments. Here is a small example of a non-linearizable execution, which we present to illustrate the differences between linearizability and monotone consistency. Consider an execution with three writers, and look at what happens at the top gate g in the circuit. Imagine that process p_0 executes a WriteInput operation with argument 0, p_1 executes a WriteInput operation with argument 1, and p_2 executes a WriteInput operation with argument 2. Let p_1 and p_2 arrive at the top gate through different intermediate gates g_1 and g_2; we also assume that each process reads g_2 before g_1 when executing UpdateGate(g). Now consider an execution in which p_0 arrives at g first, reading 0 from g_2 just before p_2 writes 2 to g_2. Process p_2 then reads g_2 and g_1 and computes the sum 2, but does not write it yet. Process p_1 now writes 1 to g_1 and p_0 reads it, causing p_0 to compute the sum 1, which it writes to the output gate. Process p_2 now finishes by writing 2 to the output gate. If both these values are observed by readers, we have a non-linearizable schedule, as there is no sequential ordering of the increments 0, 1, and 2 that will yield both output values. However, for restricted applications, we can obtain a fully linearizable object, as shown in the following sections.

5.2. Threshold objects

Another variant of a shared counter that is linearizable is a threshold object. This counter allows increment operations, and supports a read operation that returns a binary value indicating whether a predetermined threshold has been crossed. We implement a threshold object with threshold T by having increment operations act as in the generalized counter, and having a read operation return 1 if the value it reads from the output gate is at least T, and 0 otherwise. We show that this implementation is linearizable even with non-uniform increments, where the requirement is that a read operation returns 1 if and only if the sum of the increment operations linearized before it is at least T.

LEMMA 5.1. The implementation of a threshold object C with threshold T by a monotone data structure with the procedures WriteInput and ReadOutput is linearizable.

PROOF. We use monotone consistency to prove linearizability for the threshold object C. Let C_1 and C_2 be the sub-counters that are added to the final output gate g.
We order read operations according to the ordering implied by monotone consistency, which is consistent with the order of non-overlapping read operations, and implies that once a read operation returns 1, every following read operation returns 1. We order write operations according to their execution order, which is clearly consistent with the order of non-overlapping operations.
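To illustrate the construction, here is a small Python sketch of a threshold object (ours, not from the paper), layered on the TreeCounter sketch from Section 3; all names are ours and atomicity is again assumed, whereas the construction above works over the general monotone circuit of Algorithm 3.

# Sketch (ours) of a threshold object with threshold T over an adder
# tree. Read returns 1 iff the root sum has reached T; since the root
# max register is non-decreasing, the returned bit only goes from 0 to 1.

class ThresholdObject:
    def __init__(self, depth: int, T: int):
        self.tree = TreeCounter(depth)      # TreeCounter from Section 3
        self.T = T

    def add(self, pid: int, amount: int):
        # generalized (non-uniform) increment: bump the leaf total, then
        # recompute the sums along the path to the root, as in Algorithm 2
        leaf = format(pid, 'b').zfill(self.tree.depth)
        self.tree.r[leaf] += amount
        for k in range(self.tree.depth - 1, -1, -1):
            p = leaf[:k]
            total = self.tree.r[p + '0'] + self.tree.r[p + '1']
            self.tree.r[p] = max(self.tree.r[p], total)   # WriteMax

    def read(self) -> int:
        return 1 if self.tree.read() >= self.T else 0

t = ThresholdObject(depth=2, T=5)
t.add(0, 3)
assert t.read() == 0
t.add(3, 2)
assert t.read() == 1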


More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

On the Max Coloring Problem

On the Max Coloring Problem On the Max Coloring Problem Leah Epstein Asaf Levin May 22, 2010 Abstract We consider max coloring on hereditary graph classes. The problem is defined as follows. Given a graph G = (V, E) and positive

More information

Subset sum problem and dynamic programming

Subset sum problem and dynamic programming Lecture Notes: Dynamic programming We will discuss the subset sum problem (introduced last time), and introduce the main idea of dynamic programming. We illustrate it further using a variant of the so-called

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman, M.S. Ramanujan, and Saket Saurabh Presenting: Hen Sender 1 Introduction 2 Abstract. It is well known that in a bipartite (and more generally in a Konig)

More information

Sorting Networks. March 2, How much time does it take to compute the value of this circuit?

Sorting Networks. March 2, How much time does it take to compute the value of this circuit? Sorting Networks March 2, 2005 1 Model of Computation Can we do better by using different computes? Example: Macaroni sort. Can we do better by using the same computers? How much time does it take to compute

More information

Greedy Algorithms CHAPTER 16

Greedy Algorithms CHAPTER 16 CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often

More information

Lecture 9 March 4, 2010

Lecture 9 March 4, 2010 6.851: Advanced Data Structures Spring 010 Dr. André Schulz Lecture 9 March 4, 010 1 Overview Last lecture we defined the Least Common Ancestor (LCA) and Range Min Query (RMQ) problems. Recall that an

More information

Linearizable Iterators

Linearizable Iterators Linearizable Iterators Supervised by Maurice Herlihy Abstract Petrank et. al. [5] provide a construction of lock-free, linearizable iterators for lock-free linked lists. We consider the problem of extending

More information

FINAL EXAM SOLUTIONS

FINAL EXAM SOLUTIONS COMP/MATH 3804 Design and Analysis of Algorithms I Fall 2015 FINAL EXAM SOLUTIONS Question 1 (12%). Modify Euclid s algorithm as follows. function Newclid(a,b) if a

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

Discrete mathematics

Discrete mathematics Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many

More information

arxiv: v1 [cs.dc] 8 May 2017

arxiv: v1 [cs.dc] 8 May 2017 Towards Reduced Instruction Sets for Synchronization arxiv:1705.02808v1 [cs.dc] 8 May 2017 Rati Gelashvili MIT gelash@mit.edu Alexander Spiegelman Technion sashas@tx.technion.ac.il Idit Keidar Technion

More information

Approximate Shared-Memory Counting Despite a Strong Adversary

Approximate Shared-Memory Counting Despite a Strong Adversary Despite a Strong Adversary James Aspnes Keren Censor Processes can read and write shared atomic registers. Read on an atomic register returns value of last write. Timing of operations is controlled by

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES)

A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES) Chapter 1 A SIMPLE APPROXIMATION ALGORITHM FOR NONOVERLAPPING LOCAL ALIGNMENTS (WEIGHTED INDEPENDENT SETS OF AXIS PARALLEL RECTANGLES) Piotr Berman Department of Computer Science & Engineering Pennsylvania

More information

Chapter 19. Sorting Networks Model of Computation. By Sariel Har-Peled, December 17,

Chapter 19. Sorting Networks Model of Computation. By Sariel Har-Peled, December 17, Chapter 19 Sorting Networks By Sariel Har-Peled, December 17, 2012 1 19.1 Model of Computation It is natural to ask if one can perform a computational task considerably faster by using a different architecture

More information

Fork Sequential Consistency is Blocking

Fork Sequential Consistency is Blocking Fork Sequential Consistency is Blocking Christian Cachin Idit Keidar Alexander Shraer May 14, 2008 Abstract We consider an untrusted server storing shared data on behalf of clients. We show that no storage

More information

k-abortable Objects: Progress under High Contention

k-abortable Objects: Progress under High Contention k-abortable Objects: Progress under High Contention Naama Ben-David 1, David Yu Cheng Chan 2, Vassos Hadzilacos 2, and Sam Toueg 2 Carnegie Mellon University 1 University of Toronto 2 Outline Background

More information

1 Minimum Cut Problem

1 Minimum Cut Problem CS 6 Lecture 6 Min Cut and Karger s Algorithm Scribes: Peng Hui How, Virginia Williams (05) Date: November 7, 07 Anthony Kim (06), Mary Wootters (07) Adapted from Virginia Williams lecture notes Minimum

More information

Algorithms Dr. Haim Levkowitz

Algorithms Dr. Haim Levkowitz 91.503 Algorithms Dr. Haim Levkowitz Fall 2007 Lecture 4 Tuesday, 25 Sep 2007 Design Patterns for Optimization Problems Greedy Algorithms 1 Greedy Algorithms 2 What is Greedy Algorithm? Similar to dynamic

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Algorithms that Adapt to Contention

Algorithms that Adapt to Contention Algorithms that Adapt to Contention Hagit Attiya Department of Computer Science Technion June 2003 Adaptive Algorithms / Hagit Attiya 1 Fast Mutex Algorithm [Lamport, 1986] In a well-designed system, most

More information

Fork Sequential Consistency is Blocking

Fork Sequential Consistency is Blocking Fork Sequential Consistency is Blocking Christian Cachin Idit Keidar Alexander Shraer Novembe4, 008 Abstract We consider an untrusted server storing shared data on behalf of clients. We show that no storage

More information

c 2003 Society for Industrial and Applied Mathematics

c 2003 Society for Industrial and Applied Mathematics SIAM J. COMPUT. Vol. 32, No. 2, pp. 538 547 c 2003 Society for Industrial and Applied Mathematics COMPUTING THE MEDIAN WITH UNCERTAINTY TOMÁS FEDER, RAJEEV MOTWANI, RINA PANIGRAHY, CHRIS OLSTON, AND JENNIFER

More information

Notes on Binary Dumbbell Trees

Notes on Binary Dumbbell Trees Notes on Binary Dumbbell Trees Michiel Smid March 23, 2012 Abstract Dumbbell trees were introduced in [1]. A detailed description of non-binary dumbbell trees appears in Chapter 11 of [3]. These notes

More information

Cache-Oblivious Traversals of an Array s Pairs

Cache-Oblivious Traversals of an Array s Pairs Cache-Oblivious Traversals of an Array s Pairs Tobias Johnson May 7, 2007 Abstract Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious

More information

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o Reconstructing a Binary Tree from its Traversals in Doubly-Logarithmic CREW Time Stephan Olariu Michael Overstreet Department of Computer Science, Old Dominion University, Norfolk, VA 23529 Zhaofang Wen

More information

ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007

ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007 ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007 This examination is a three hour exam. All questions carry the same weight. Answer all of the following six questions.

More information

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions Dr. Amotz Bar-Noy s Compendium of Algorithms Problems Problems, Hints, and Solutions Chapter 1 Searching and Sorting Problems 1 1.1 Array with One Missing 1.1.1 Problem Let A = A[1],..., A[n] be an array

More information

CS525 Winter 2012 \ Class Assignment #2 Preparation

CS525 Winter 2012 \ Class Assignment #2 Preparation 1 CS525 Winter 2012 \ Class Assignment #2 Preparation Ariel Stolerman 2.26) Let be a CFG in Chomsky Normal Form. Following is a proof that for any ( ) of length exactly steps are required for any derivation

More information

Analyze the obvious algorithm, 5 points Here is the most obvious algorithm for this problem: (LastLargerElement[A[1..n]:

Analyze the obvious algorithm, 5 points Here is the most obvious algorithm for this problem: (LastLargerElement[A[1..n]: CSE 101 Homework 1 Background (Order and Recurrence Relations), correctness proofs, time analysis, and speeding up algorithms with restructuring, preprocessing and data structures. Due Thursday, April

More information

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions.

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions. CS 787: Advanced Algorithms NP-Hardness Instructor: Dieter van Melkebeek We review the concept of polynomial-time reductions, define various classes of problems including NP-complete, and show that 3-SAT

More information

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri 1 Introduction Today, we will introduce a fundamental algorithm design paradigm,

More information

Fixed Parameter Algorithms

Fixed Parameter Algorithms Fixed Parameter Algorithms Dániel Marx Tel Aviv University, Israel Open lectures for PhD students in computer science January 9, 2010, Warsaw, Poland Fixed Parameter Algorithms p.1/41 Parameterized complexity

More information

Applied Algorithm Design Lecture 3

Applied Algorithm Design Lecture 3 Applied Algorithm Design Lecture 3 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 3 1 / 75 PART I : GREEDY ALGORITHMS Pietro Michiardi (Eurecom) Applied Algorithm

More information

Multi-Way Search Trees

Multi-Way Search Trees Multi-Way Search Trees Manolis Koubarakis 1 Multi-Way Search Trees Multi-way trees are trees such that each internal node can have many children. Let us assume that the entries we store in a search tree

More information

Deterministic Conflict-Free Coloring for Intervals: from Offline to Online

Deterministic Conflict-Free Coloring for Intervals: from Offline to Online Deterministic Conflict-Free Coloring for Intervals: from Offline to Online AMOTZ BAR-NOY Brooklyn College and the Graduate Center, City University of New York PANAGIOTIS CHEILARIS City University of New

More information

9.1 Cook-Levin Theorem

9.1 Cook-Levin Theorem CS787: Advanced Algorithms Scribe: Shijin Kong and David Malec Lecturer: Shuchi Chawla Topic: NP-Completeness, Approximation Algorithms Date: 10/1/2007 As we ve already seen in the preceding lecture, two

More information

ACONCURRENT system may be viewed as a collection of

ACONCURRENT system may be viewed as a collection of 252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 3, MARCH 1999 Constructing a Reliable Test&Set Bit Frank Stomp and Gadi Taubenfeld AbstractÐThe problem of computing with faulty

More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

The Relative Power of Synchronization Methods

The Relative Power of Synchronization Methods Chapter 5 The Relative Power of Synchronization Methods So far, we have been addressing questions of the form: Given objects X and Y, is there a wait-free implementation of X from one or more instances

More information

Scan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8

Scan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8 CSE341T/CSE549T 09/17/2014 Lecture 8 Scan and its Uses 1 Scan Today, we start by learning a very useful primitive. First, lets start by thinking about what other primitives we have learned so far? The

More information

15.4 Longest common subsequence

15.4 Longest common subsequence 15.4 Longest common subsequence Biological applications often need to compare the DNA of two (or more) different organisms A strand of DNA consists of a string of molecules called bases, where the possible

More information

Decreasing the Diameter of Bounded Degree Graphs

Decreasing the Diameter of Bounded Degree Graphs Decreasing the Diameter of Bounded Degree Graphs Noga Alon András Gyárfás Miklós Ruszinkó February, 00 To the memory of Paul Erdős Abstract Let f d (G) denote the minimum number of edges that have to be

More information

Recoloring k-degenerate graphs

Recoloring k-degenerate graphs Recoloring k-degenerate graphs Jozef Jirásek jirasekjozef@gmailcom Pavel Klavík pavel@klavikcz May 2, 2008 bstract This article presents several methods of transforming a given correct coloring of a k-degenerate

More information

The divide and conquer strategy has three basic parts. For a given problem of size n,

The divide and conquer strategy has three basic parts. For a given problem of size n, 1 Divide & Conquer One strategy for designing efficient algorithms is the divide and conquer approach, which is also called, more simply, a recursive approach. The analysis of recursive algorithms often

More information

Fully Dynamic Maximal Independent Set with Sublinear Update Time

Fully Dynamic Maximal Independent Set with Sublinear Update Time Fully Dynamic Maximal Independent Set with Sublinear Update Time ABSTRACT Sepehr Assadi University of Pennsylvania Philadelphia, PA, USA sassadi@cis.upenn.com Baruch Schieber IBM Research Yorktown Heights,

More information

Space vs Time, Cache vs Main Memory

Space vs Time, Cache vs Main Memory Space vs Time, Cache vs Main Memory Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 (Moreno Maza) Space vs Time, Cache vs Main Memory CS 4435 - CS 9624 1 / 49

More information

Cantor s Diagonal Argument for Different Levels of Infinity

Cantor s Diagonal Argument for Different Levels of Infinity JANUARY 2015 1 Cantor s Diagonal Argument for Different Levels of Infinity Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely Abstract These notes develop the classic Cantor

More information

EDAA40 At home exercises 1

EDAA40 At home exercises 1 EDAA40 At home exercises 1 1. Given, with as always the natural numbers starting at 1, let us define the following sets (with iff ): Give the number of elements in these sets as follows: 1. 23 2. 6 3.

More information

On Universal Cycles of Labeled Graphs

On Universal Cycles of Labeled Graphs On Universal Cycles of Labeled Graphs Greg Brockman Harvard University Cambridge, MA 02138 United States brockman@hcs.harvard.edu Bill Kay University of South Carolina Columbia, SC 29208 United States

More information

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes Greedy Algorithms CLRS Chapters 16.1 16.3 Introduction to greedy algorithms Activity-selection problem Design of data-compression (Huffman) codes (Minimum spanning tree problem) (Shortest-path problem)

More information

Anonymous and Fault-Tolerant Shared-Memory Computing

Anonymous and Fault-Tolerant Shared-Memory Computing Distributed Computing manuscript No. (will be inserted by the editor) Rachid Guerraoui Eric Ruppert Anonymous and Fault-Tolerant Shared-Memory Computing the date of receipt and acceptance should be inserted

More information

Verifying a Border Array in Linear Time

Verifying a Border Array in Linear Time Verifying a Border Array in Linear Time František Franěk Weilin Lu P. J. Ryan W. F. Smyth Yu Sun Lu Yang Algorithms Research Group Department of Computing & Software McMaster University Hamilton, Ontario

More information

Parameterized graph separation problems

Parameterized graph separation problems Parameterized graph separation problems Dániel Marx Department of Computer Science and Information Theory, Budapest University of Technology and Economics Budapest, H-1521, Hungary, dmarx@cs.bme.hu Abstract.

More information

3 SOLVING PROBLEMS BY SEARCHING

3 SOLVING PROBLEMS BY SEARCHING 48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not

More information

MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS

MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS MANUEL FERNÁNDEZ, NICHOLAS SIEGER, AND MICHAEL TAIT Abstract. In 99, Bollobás and Frieze showed that the threshold for G n,p to contain a spanning

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Suggested Reading: Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Probabilistic Modelling and Reasoning: The Junction

More information

On k-dimensional Balanced Binary Trees*

On k-dimensional Balanced Binary Trees* journal of computer and system sciences 52, 328348 (1996) article no. 0025 On k-dimensional Balanced Binary Trees* Vijay K. Vaishnavi Department of Computer Information Systems, Georgia State University,

More information

Recursion and Structural Induction

Recursion and Structural Induction Recursion and Structural Induction Mukulika Ghosh Fall 2018 Based on slides by Dr. Hyunyoung Lee Recursively Defined Functions Recursively Defined Functions Suppose we have a function with the set of non-negative

More information

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to

More information

Distributed Sorting. Chapter Array & Mesh

Distributed Sorting. Chapter Array & Mesh Chapter 9 Distributed Sorting Indeed, I believe that virtually every important aspect of programming arises somewhere in the context of sorting [and searching]! Donald E. Knuth, The Art of Computer Programming

More information

Cuckoo Hashing for Undergraduates

Cuckoo Hashing for Undergraduates Cuckoo Hashing for Undergraduates Rasmus Pagh IT University of Copenhagen March 27, 2006 Abstract This lecture note presents and analyses two simple hashing algorithms: Hashing with Chaining, and Cuckoo

More information