
Tel Aviv University
Raymond and Beverly Sackler Faculty of Exact Sciences
School of Computer Science

AUTOMATIC FINE-GRAINED SYNCHRONIZATION

by Guy Golan Gueta

under the supervision of Prof. Mooly Sagiv and Prof. Eran Yahav
and the consultation of Dr. G. Ramalingam

A thesis submitted for the degree of Doctor of Philosophy

Submitted to the Senate of Tel Aviv University
April 2015


To my loved ones, Sophie, Yasmin, Ortal and Ariel.


Abstract

Automatic Fine-Grained Synchronization
Guy Golan Gueta
School of Computer Science, Tel Aviv University

A key challenge in writing concurrent programs is synchronization: ensuring that concurrent accesses and modifications to shared mutable state do not interfere with each other in undesirable ways. An important correctness criterion for synchronization is atomicity, i.e., the synchronization should ensure that a code section (a transaction) appears to execute atomically. Realizing efficient and scalable synchronization that correctly ensures atomicity is considered a challenging task.

In this thesis, we address the problem of achieving correct and efficient atomicity by developing and enforcing certain synchronization protocols. We present three novel synchronization approaches that utilize program-specific information using compile-time and run-time techniques.

The first approach leverages the shape of shared memory in order to transform a sequential library into an atomic library (i.e., into a library in which each operation appears to execute atomically). The approach is based on domination locking, a novel fine-grained locking protocol designed specifically for concurrency control in object-oriented software with dynamic pointer updates. We present a static algorithm that automatically enforces domination locking in a sequential library that is implemented using a dynamic forest. We show that our algorithm can successfully add effective fine-grained locking to libraries where performing locking manually is challenging.

The second approach transforms atomic libraries into transactional libraries, which ensure atomicity of sequences of operations. The central idea is to create a library that exploits information (foresight) provided by its clients. The foresight restricts the cases that must be considered by the library, thereby permitting more efficient synchronization. This approach is based on a novel synchronization protocol built on a notion of dynamic right-movers. We present a static analysis to infer the foresight information required by the approach, allowing a compiler to automatically insert the foresight information into the client. This relieves the client programmer of this burden and simplifies writing client code. We show a generic implementation technique to realize the approach in a given library. We show that this approach enables enforcing atomicity of a wide selection of real-life

Java composite operations. Our experiments indicate that the approach enables realizing efficient and scalable synchronization for real-life composite operations.

Finally, we show an approach that enables using multiple transactional libraries. This approach is applicable to a special case of transactional libraries in which the synchronization is based on locking that exploits semantic properties of the library operations. This approach realizes semantic-based fine-grained locking that builds on the commutativity properties of the libraries' operations and on the program's dynamic pointer updates. We show that this approach leads to effective synchronization. In some cases, it improves the applicability and the performance of our second approach.

We formalize the above approaches and prove that they guarantee atomicity and deadlock freedom. We show that our approaches provide a variety of tools to effectively deal with common cases in concurrent programs.

Acknowledgements

First and foremost, I would like to express my deep gratitude and appreciation to my advisors, Mooly Sagiv and Eran Yahav. Their guidance, inspiration, knowledge, and optimism were crucial for the completion of this thesis.

I would like to thank G. Ramalingam for his guidance, help, and support throughout the work on this thesis. This thesis would have been impossible without his guidance and help.

I would like to thank Alex Aiken and Nathan Bronson for interesting discussions, joint work, and for the enjoyable visits at Stanford University.

I would like to thank Mooly's group for many fruitful discussions and for being such a wonderful combination of research colleagues and friends: Ohad Shacham, Shachar Itzhaky, Omer Tripp, Ofri Ziv, Oren Zomer, Ghila Castelnuovo, Ariel Jarovsky, Hila Peleg, and Or Tamir.


Contents

1 Introduction
  1.1 Automatic Fine-Grained Locking
  1.2 Transactional Libraries
  1.3 Composition of Transactional Libraries via Semantic Locking

2 Fine-Grained Locking using Shape Properties
  2.1 Overview
  2.2 Preliminaries
  2.3 Domination Locking
  2.4 Enforcing DL in Forest-Based Libraries
      Eager Forest-Locking
      Enforcing EFL
      Example for Dynamically Changing Forest
  2.5 Performance Evaluation
      General Purpose Data Structures
      Specialized Implementations

3 Transactional Libraries with Foresight
  3.1 Overview
      Serializable and Serializably-Completable Executions
      Serializably-Completable Execution: A Characterization
      Synchronization Using Foresight
      Realizing Foresight Based Synchronization
  3.2 Preliminaries
      Libraries
      Clients
  3.3 Foresight-Based Synchronization
      The Problem
      The Client Protocol
      Dynamic Right Movers
      Serializability
      B-Serializable-Completability
      E-Completability
      Special Cases
  3.4 Automatic Foresight for Clients
      Annotation Language
      Inferring Calls to mayuse Procedures
      Implementation for Java Programs
  3.5 Implementing Libraries with Foresight
      The Basic Approach
      Using Dynamic Information
      Optimistic Locking
      Further Extensions
  3.6 Java Threads and Transactions
  3.7 Experimental Evaluation
      Applicability and Precision Of The Static Analysis
      Comparison To Hand-Crafted Implementations
      Evaluating The Approach On Realistic Software
  3.8 Java Implementation of the Transactional Maps Library
      Base Library
      Extended Library
      Utilizing Dynamic Information by Handcrafted Optimization
      API Adapter

4 Composition of Transactional Libraries via Semantic Locking
  4.1 Semantic Locking Basics
      ADTs With Semantic Locking
      Automatic Atomicity
  4.2 Automatic Atomicity Enforcement
      Enforcing S2PL
      Lock Ordering Constraints
      Enforcing OS2PL on Acyclic Graphs
      Optimizations
      Handling Cycles via Coarse-Grained Locking
      Using Specialized Locking Operations
  4.3 Implementing ADTs with Semantic Locking
  4.4 Performance Evaluation
      Benchmarks
      Performance

5 Related Work
  5.1 Synchronization Protocols
  5.2 Concurrent Data Structures
  5.3 Automatic Synchronization

6 Conclusions and Future Work

Bibliography


Chapter 1

Introduction

Concurrency is widely used in software systems because it helps reduce latency, increase throughput, and provide better utilization of multi-core machines [35, 69]. However, writing concurrent programs is considered a difficult and error-prone process, due to the need to consider all possible interleavings of the code fragments that execute in parallel.

Atomicity. Atomicity is a fundamental correctness property of code sections in concurrent programs. Intuitively, a code section is said to be atomic if for every (arbitrarily interleaved) program execution there is an equivalent execution, with the same overall behavior, in which the atomic code section is not interleaved with other parts of the program. In other words, an atomic code section can be seen as a code section that is always executed in isolation. Atomic code sections simplify reasoning about concurrent programs, since one can assume that each atomic code section is never interleaved with other parts of the program.

Several variants of the atomicity property have been defined and used in the literature on databases and shared-memory systems, where each variant has its own semantic properties and is aimed at a specific set of scenarios and considerations (e.g., [45, 55, 75, 86]). For example, linearizability [55] is a variant of atomicity that is commonly used to describe implementations of shared libraries (and shared objects): an operation of a linearizable library can be seen as if it takes effect instantaneously at some point between its invocation and its response. Linearizability ignores the actual implementation details of the shared library; instead, it considers only the library's behavior from the point of view of its clients.

The Problem. In this thesis we address the problem of automatically ensuring atomicity of code sections by realizing efficient and scalable synchronization. One of the main challenges is to guarantee atomicity in a scalable way, restricting parallelism only where necessary. The synchronization should not impose a run-time overhead so high that it becomes worthless; i.e., it should have better performance than simple alternatives, such as a single global lock [89].
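To make the kind of code section at issue concrete, the following small Java fragment (not taken from the thesis; the class and method names are hypothetical) shows a composite operation that is intended to be atomic. Without synchronization, two threads can interleave between the get and the put, so both may observe the same old value and one update is lost.

    import java.util.HashMap;
    import java.util.Map;

    class Registry {
        private final Map<String, Integer> counts = new HashMap<>();

        // Intended to be atomic: "insert 1 if absent, otherwise increment".
        // Without synchronization, two threads may interleave between get()
        // and put(), and one of the updates is lost.
        void record(String key) {
            Integer current = counts.get(key);
            if (current == null) {
                counts.put(key, 1);
            } else {
                counts.put(key, current + 1);
            }
        }
    }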

Solutions for enforcing atomicity that are implemented in practice are predominantly handcrafted and tend to lead to concurrency bugs (e.g., see [33, 58, 79]). Automatic approaches (see Section 5.3) allow a programmer to declaratively specify the atomic code sections, leaving it to a compiler and run-time to implement the necessary synchronization. However, existing automatic approaches have not been widely adopted due to various concerns [27, 34, 36, 69, 84, 89], including high run-time overhead, poor performance, and limited ability to handle irreversible operations (such as I/O operations).

Specialized Synchronization. In this thesis we present several approaches for automatic atomicity enforcement, where each approach is designed to handle a restricted class of programs (and scenarios). The idea is to produce synchronization that enforces atomicity by exploiting the restricted properties of the programs. For each approach we describe a specialized synchronization protocol, and realize the protocol by using a combination of compile-time and run-time techniques. The synchronization protocols are designed to ensure efficient and scalable atomicity, without leading to deadlocks and without using any rollback mechanism.

The presented approaches deal with two different aspects of the synchronization problem for concurrent programs. In the first approach, we deal with the code inside the libraries (i.e., the library implementation); whereas in the other two approaches, we deal with code that utilizes the libraries' API.

1.1 Automatic Fine-Grained Locking

In Chapter 2, we present an approach that leverages the shape of shared memory (i.e., the shape of the heap's object graph) to transform a sequential library into a linearizable library [55]. This approach is based on the paper "Automatic Fine-Grain Locking using Shape Properties", which was presented at OOPSLA 2011 [40].

A library encapsulates shared data with a set of procedures, which may be invoked by concurrently executing threads. Given the code of a library, our goal is to add correct fine-grained locking that ensures linearizability and permits a high degree of parallelism. Specifically, we are interested in locking in which each shared object has its own lock, and locks may be released before the end of the computation. The main insight of this approach is to use the shape of the pointer data structures to simplify reasoning about fine-grained locking and to automatically infer efficient and correct fine-grained locking.

Domination Locking. We define a new fine-grained locking protocol called Domination Locking. Domination Locking is a set of conditions that guarantee atomicity and deadlock-freedom. Domination Locking is designed to handle dynamically-manipulated recursive data structures by leveraging natural

domination properties of paths in dynamically-changing data structures. This protocol is a strict generalization of several related fine-grained locking protocols, such as dynamic tree locking and dynamic DAG locking [17, 19, 28].

Automatic Fine-Grained Locking. We then present an automatic technique to enforce the conditions of Domination Locking. The technique is applicable to libraries where the shape of the shared heap is a forest. The technique allows the shape of the heap to change dynamically, as long as the shape is a forest between invocations of library operations. We show that our technique adds efficient and scalable fine-grained locking in several practical data structures where it is hard to produce similar locking manually. We demonstrate the applicability of the method on balanced search trees [16, 46], a self-adjusting heap [81], and specialized data structure implementations [18, 72].

1.2 Transactional Libraries

Linearizable libraries provide operations that appear to execute atomically. However, clients often need to perform a sequence of library operations that appears to execute atomically, referred to hereafter as an atomic composite operation. In Chapter 3, we consider the problem of extending a linearizable library to support arbitrary atomic composite operations by clients.

We introduce a novel approach in which the library ensures atomicity of composite operations by exploiting information provided by its clients. We refer to such libraries as transactional libraries. Our basic methodology requires the client code to demarcate the sequence of operations for which atomicity is desired and to provide declarative information to the library (foresight) about the library operations that the composite operation may invoke. It is the library's responsibility to ensure the desired atomicity, exploiting the foresight information for effective synchronization.

Example. The idea is demonstrated in the code fragment shown in Figure 1.1. This code uses a shared Counter (a shared library) by invoking its Get and Inc operations. The code provides information (foresight) about the possible future Counter operations: at line 2 it indicates that any operation may be invoked (after line 2), at line 4 it indicates that only Inc may be invoked (after line 4), and finally at line 9 it indicates that no more operations will be invoked (after line 9). This information is utilized by the Counter implementation in order to efficiently ensure that this code fragment always executes atomically. A detailed version of this example is described in Chapter 3.

Our approach is based on the paper "Concurrent Libraries with Foresight", which was presented at PLDI 2013 [41].

 1  */ {
 3    c = Get();
 5    while (c > 0) {
 6      c = c-1;
 7      Inc();
 8    }
10  }

Figure 1.1: Code that provides information (foresight) about future possible operations.

Foresight-Based Synchronization. We first present a formalization of this approach. We formalize the desired goals and present a sufficient correctness condition. As long as the clients and the library extension satisfy the correctness condition, all composite operations are guaranteed atomicity without deadlocks. Our sufficiency condition is broad and permits a range of implementation options and fine-grained synchronization. It is based on a notion of dynamic right-movers (Section 3.3.3), which generalizes the traditional notions of static right-movers and commutativity [61, 66].

Our approach decouples the implementation of the library from the client. Thus, the correctness of the client does not depend on the way the foresight information is used by the library implementation. The client only needs to ensure the correctness of the foresight information.

Automatic Foresight for Clients. We then present a static analysis to infer the foresight information required by our approach, allowing a compiler to automatically insert the foresight information into the client code. This relieves the client programmer of this burden and simplifies writing atomic composite operations.

Library Extension Realization. Our approach permits the use of customized, hand-crafted implementations of the library extension. However, we also present a generic technique for extending a linearizable library with foresight. The technique is based on a novel variant of the tree locking protocol in which the tree is designed according to semantic properties of the library's operations.

We used our generic technique to implement a single general-purpose Java library for Map data structures. Our library permits composite operations to simultaneously work with multiple instances of Map data structures. (We focus on Maps because Shacham [78] observed that Maps are heavily used for implementing composite operations in real-life concurrent programs.) We use our library and the static analysis to enforce atomicity of a selection of real-life Java composite operations, including composite operations that manipulate multiple instances of Map data structures. Our experiments indicate that our approach enables realizing efficient and scalable synchronization for real-life composite operations.
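Lines 2, 4, and 9 of Figure 1.1 hold the foresight calls themselves; their exact form is given in Chapter 3 (the table of contents mentions mayuse procedures). The following Java sketch is only an illustration of the idea: mayUseAll, mayUse, and mayUseNone are hypothetical stand-ins, not the thesis' API.

    // Hypothetical sketch of the decrement-to-zero composite operation from
    // Figure 1.1, with explicit foresight calls (the names are assumptions).
    class CounterClient {
        private final Counter counter; // a shared, transactional Counter library

        CounterClient(Counter counter) {
            this.counter = counter;
        }

        void drain() {
            counter.mayUseAll();          // line 2: any operation may follow
            int c = counter.get();        // line 3
            counter.mayUse("Inc");        // line 4: from here on, only Inc is used
            while (c > 0) {               // lines 5-8
                c = c - 1;
                counter.inc();
            }
            counter.mayUseNone();         // line 9: no further operations
        }
    }

    // Minimal interface assumed by the sketch above.
    interface Counter {
        int get();
        void inc();
        void mayUseAll();
        void mayUse(String op);
        void mayUseNone();
    }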

1.3 Composition of Transactional Libraries via Semantic Locking

In Chapter 4, we present an approach for handling composite operations that use multiple transactional libraries. Our approach is described in the short paper "Automatic Semantic Locking", which was presented at PPoPP 2014 [42]. This approach is also used in the paper "Automatic Scalable Atomicity via Semantic Locking", which was presented at PPoPP 2015 [43]. Our approach can be seen as a combination of Chapter 3 and approaches for automatic lock inference (e.g., [68]).

In this approach, we restrict the synchronization that can be implemented in the libraries to synchronization that resembles locking; this synchronization is similar to the semantic-aware locking from the database literature. We refer to such libraries as libraries with semantic locking. We describe a static algorithm that enforces atomicity of code sections that use multiple libraries with semantic locking. We implement this static algorithm and show that it produces efficient and scalable synchronization.
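To convey the flavor of semantic locking (this sketch is not the thesis' algorithm or API; all names are hypothetical), a composite operation over two map ADTs might acquire locks that cover only the keys it will touch, so composite operations over disjoint keys run in parallel, while a fixed global lock order prevents deadlocks.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Toy "map ADT with semantic locking": rather than one lock per map,
    // clients lock the individual keys their operations will touch. This only
    // illustrates the flavor of semantic (operation- and argument-aware)
    // locking; it is not the construction used in the thesis.
    class LockableMap<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();
        private final Map<K, ReentrantLock> keyLocks = new ConcurrentHashMap<>();

        void lockKey(K key) {
            keyLocks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
        }

        void unlockKey(K key) {
            keyLocks.get(key).unlock();
        }

        V get(K key)             { return data.get(key); }
        V remove(K key)          { return data.remove(key); }
        void put(K key, V value) { data.put(key, value); }
    }

    class MoveExample {
        // Composite operation over two maps: move the value stored under key
        // from src to dst. Locks are taken in a fixed global order (here, by
        // identity hash; collisions are ignored in this sketch), so concurrent
        // moves in opposite directions cannot deadlock.
        static <K, V> void move(LockableMap<K, V> src, LockableMap<K, V> dst, K key) {
            LockableMap<K, V> first =
                System.identityHashCode(src) <= System.identityHashCode(dst) ? src : dst;
            LockableMap<K, V> second = (first == src) ? dst : src;
            first.lockKey(key);
            second.lockKey(key);
            try {
                V v = src.remove(key);
                if (v != null) {
                    dst.put(key, v);
                }
            } finally {
                second.unlockKey(key);
                first.unlockKey(key);
            }
        }
    }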


Chapter 2

Fine-Grained Locking using Shape Properties

In this chapter, we consider the problem of turning a sequential library into a linearizable library [55]. Our goal is to provide a synchronization method that guarantees atomicity of the library operations in a scalable way, restricting parallelism only where necessary. We are interested in a systematic method that is applicable to a large family of libraries, rather than a method specific to a single library.

Fine-Grained Locking. One way to achieve scalable multi-threading is to use fine-grained locking (e.g., [19]). In fine-grained locking, one associates, e.g., each shared object with its own lock, permitting multiple operations to simultaneously operate on different parts of the shared state. Reasoning about fine-grained locking is challenging and error-prone. As a result, programmers often resort to coarse-grained locking, leading to limited scalability.

The Problem. We would like to automatically add fine-grained locking to a library. A library encapsulates shared data with a set of procedures, which may be invoked by concurrently executing threads. Given the code of a library, our goal is to add correct locking that ensures atomicity and permits a high degree of parallelism. Specifically, we are interested in locking in which each shared object has its own lock, and locks may be released before the end of the computation. Our main insight is that we can use the restricted shape of pointer data structures to simplify reasoning about fine-grained locking and to automatically infer efficient and correct fine-grained locking.

Domination Locking. We define a new fine-grained locking protocol called Domination Locking. Domination Locking is a set of conditions that guarantees atomicity and deadlock-freedom. Domination Locking is designed to handle dynamically-manipulated recursive data structures by leveraging natural domination properties of dynamic data structures.

Figure 2.1: An example of a Treap data structure. [The figure shows a tree whose root has key 10 and priority 99, with left child (key 5, priority 20) and right child (key 15, priority 72); the right child's children are (key 12, priority 30) and (key 18, priority 50).]

Automatic Fine-Grained Locking. We present an automatic technique to enforce the conditions of Domination Locking. The technique is applicable to libraries where the shape of the shared memory is a forest. The technique allows the shape of the heap to change dynamically, as long as the shape is a forest between invocations of library operations. In contrast to existing lock inference techniques, which are based on two-phase locking (see Section 5.3), our technique is able to release locks at early points of the computation. Finally, as we demonstrate in Section 2.4 and Section 2.5, our technique adds effective and scalable fine-grained locking in several practical data structures where it is extremely hard to manually produce similar locking. Our examples include balanced search trees [16, 46], a self-adjusting heap [81], and specialized data structure implementations [18, 72].

Motivating Example. Consider a library that implements the Treap data structure [16]. A Treap is a search tree that is simultaneously a binary search tree (on the key field) and a heap (on the priority field). An example is shown in Figure 2.1. If priorities are assigned randomly, the resulting structure is equivalent to a random binary search tree, providing good asymptotic bounds for all operations. The Treap implementation consists of three procedures: insert, remove, and lookup.

Manually adding fine-grained locking to the Treap's code is challenging, since it requires considering many subtle details of the Treap's code. In contrast, our technique can add fine-grained locking to the Treap's code without considering its exact implementation details. (In other words, our technique does not need to understand the actual code of the Treap.)

For example, consider the Treap's remove operation shown in Figure 2.2. To achieve concurrent execution of its operations, we must release the lock on the root, while an operation is still in progress, once it is safe to do so. Either of the loops (starting at Lines 4 or 12) can move the current context to a subtree, after which the root (and, similarly, other nodes) should be unlocked. Several parts of this procedure implement tree rotations that change the order among the Treap's nodes, complicating

any correctness reasoning that depends on the order among nodes.

Figure 2.3 shows an example of manual fine-grained locking of the Treap's remove operation. Manually adding fine-grained locking to the code took an expert several hours and was an extremely error-prone process. In several cases, the expert released a lock too early, resulting in an incorrect concurrent algorithm (e.g., the release operation in Line 28). Our technique is able to automatically produce fine-grained concurrency in the Treap's code by relying on its tree shape. This is in contrast to existing alternatives, such as manually enforcing hand-over-hand locking, that require a deep understanding of code details.

Note that the dynamic tree locking protocol [17] is sufficient to ensure atomicity and deadlock-freedom for the Treap example. In fact, the locking shown in Figure 2.3 satisfies the conditions of the dynamic tree locking protocol. But in contrast to the domination locking protocol, which can be automatically enforced in the Treap's code, none of the existing synchronization techniques (see Section 5.3) can automatically enforce the dynamic tree locking protocol for the Treap (even though the Treap is a single tree).

2.1 Overview

In this section, we present a brief informal description of our approach.

Domination Locking. We define a new locking protocol, called Domination Locking (abbreviated DL). DL is a set of conditions that are designed to guarantee atomicity and deadlock freedom for operations of a well-encapsulated library. DL differentiates between a library's exposed and hidden objects: exposed objects (e.g., the Treap's root) act as the intermediary between the library and its clients, with pointers to such objects being passed back and forth between the library and its clients, while the clients are completely unaware of hidden objects (e.g., the Treap's intermediate nodes). The protocol exploits the fact that all operations must begin with one or more exposed objects and traverse the heap graph to reach hidden objects.

The protocol requires the exposed objects passed as parameters to an operation to be locked in a fashion similar to two-phase locking. However, hidden objects are handled differently. A thread is allowed to acquire a lock on a hidden object if the locks it holds dominate the hidden object. (A set S of objects is said to dominate an object u if every path (in the heap graph) from an exposed object to u contains some object in S.) In particular, hidden objects can be locked even after other locks have been released, thus enabling early release of other locked objects (hidden as well as exposed). This simple protocol generalizes several fine-grained locking protocols defined for dynamically changing graphs [17, 19, 28] and is applicable in more cases (i.e., the conditions of DL are weaker). We use the DL conditions as the basis for our automatic technique.

 1  boolean remove(Node par, int key) {
 2    Node n = null;
 3    n = par.right;              // the right child of par holds the root
 4    while (n != null && key != n.key) {
 5      par = n;
 6      n = (key < n.key) ? n.left : n.right;
 7    }
 8    if (n == null)
 9      return false;             // search failed, no change
10    Node nl = n.left;
11    Node nr = n.right;
12    while (true) {              // n is the node to be removed
13      Node bestChild = (nl == null ||
14        (nr != null && nr.prio > nl.prio)) ? nr : nl;
15      if (n == par.left)
16        par.left = bestChild;
17      else
18        par.right = bestChild;
19      if (bestChild == null)
20        break;                  // n was a leaf
21      if (bestChild == nl) {
22        n.left = nl.right;      // rotate nl into n's spot
23        nl.right = n;
24        nl = n.left;
25      } else {
26        n.right = nr.left;      // rotate nr into n's spot
27        nr.left = n;
28        nr = n.right;
29      }
30      par = bestChild;
31    }
32    return true;
33  }

Figure 2.2: Removing an element from a Treap by locating it and then rotating it into a leaf position. (Our technique can add fine-grained locking to this code without understanding its details.)
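Figures 2.2, 2.3, and 2.5 assume a Node type for the Treap's nodes; its declaration is not reproduced in this excerpt, so the following minimal Java sketch of the fields that remove uses (key, prio, left, right) is an assumption made for readability, not the thesis' definition.

    // Minimal node type assumed by the Treap figures: a binary-search-tree key,
    // a heap priority, and left/right child pointers. In the locking figures,
    // each object also serves as its own lock.
    class Node {
        int key;    // binary search tree order is on key
        int prio;   // heap order is on prio
        Node left;
        Node right;

        Node(int key, int prio) {
            this.key = key;
            this.prio = prio;
        }
    }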

 1  boolean remove(Node par, int key) {
 2    Node n = null;
 3    acquire(par);
 4    n = par.right;
 5    if (n != null) acquire(n);
 6    while (n != null && key != n.key) {
 7      release(par);
 8      par = n;
 9      n = (key < n.key) ? n.left : n.right;
10      if (n != null) acquire(n);
11    }
12    if (n == null) { release(par); return false; }
13    Node nl = n.left;  if (nl != null) acquire(nl);
14    Node nr = n.right; if (nr != null) acquire(nr);
15    while (true) {
16      Node bestChild = (nl == null ||
17        (nr != null && nr.prio > nl.prio)) ? nr : nl;
18      if (n == par.left)
19        par.left = bestChild;
20      else
21        par.right = bestChild;
22      release(par);
23      if (bestChild == null)
24        break;
25      if (bestChild == nl) {
26        n.left = nl.right;
27        nl.right = n;
28        // release(nl);          // an erroneous release statement
29        nl = n.left;
30        if (nl != null) acquire(nl);
31      } else {
32        n.right = nr.left;
33        nr.left = n;
34        nr = n.right;
35        if (nr != null) acquire(nr);
36      }
37      par = bestChild;
38    }
39    return true;
40  }

Figure 2.3: The Treap's remove code with manual fine-grained locking.

Figure 2.4: An execution of the Treap's remove (Figure 2.2) in which the tree shape is violated: the node pointed to by nr and bestChild has two predecessors. [The figure shows the Treap of Figure 2.1 mid-rotation, with remove's local variables par, n, nl, nr, and bestChild pointing into the tree.]

Automatic Locking of Forest-Based Libraries. Our technique is able to automatically enforce DL in a way that releases locks at early points of the computation. Specifically, the technique is applicable to libraries whose heap graphs form a forest at the end of any complete sequential execution (of any sequence of operations). Note that existing shape analyses for sequential programs can be used to automatically verify whether a library satisfies this precondition (e.g., [76, 88]). In particular, we avoid the need to explicitly reason about concurrent executions.

For example, the Treap is a tree at the end of any of its operations when executed sequentially. Note that, during some of its operations (insert and remove), its tree shape is violated by a node with multiple predecessors (caused by the tree rotations). An example of such a tree violation (caused by the rotations in remove) is shown in Figure 2.4.

Our technique uses the following locking scheme: a procedure invocation maintains a lock on the set of objects directly pointed to by its local variables (called the immediate scope). When an object goes out of the immediate scope of the invocation (i.e., when the last variable pointing to that object is assigned some other value), the object is unlocked if it has (at most) one predecessor in the heap graph (i.e., if it does not violate the forest shape). If a locked object has multiple predecessors when it goes out of the immediate scope of the invocation, then it is unlocked eventually, when the object again has at most one predecessor. The forest condition guarantees that every lock is eventually released.

To realize this scheme, we use a pair of reference counts to track incoming references from the heap and from the local variables of the current procedure. All the updates to the reference counts can be done easily by instrumenting every assignment statement, allowing a relatively simple compile-time transformation. While we defer the details of the transformation to Section 2.4, Figure 2.5 shows the transformed implementation of remove (from Figure 2.2). ASNL and ASNF are macros that perform an assignment to a local variable and a field, respectively, update the reference counts, and conditionally acquire or release locks according to the above locking scheme.

 1  boolean remove(Node par, int key) {
 2    Node n = null;
 3    Take(par);
 4    ASNL(n, par.right);
 5    while (n != null && key != n.key) {
 6      ASNL(par, n);
 7      ASNL(n, (key < n.key) ? n.left : n.right);
 8    }
 9    if (n == null) {
10      ASNL(par, null);
11      ASNL(n, null);
12      return false;
13    }
14    Node nL = null; ASNL(nL, n.left);
15    Node nR = null; ASNL(nR, n.right);
16    while (true) {
17      Node bestCh = null; ASNL(bestCh, (nL == null ||
18        (nR != null && nR.prio > nL.prio)) ? nR : nL);
19      if (n == par.left)
20        ASNF(par.left, bestCh);
21      else
22        ASNF(par.right, bestCh);
23      if (bestCh == null) {
24        ASNL(bestCh, null);
25        break;
26      }
27      if (bestCh == nL) {
28        ASNF(n.left, nL.right);
29        ASNF(nL.right, n);
30        ASNL(nL, n.left);
31      } else {
32        ASNF(n.right, nR.left);
33        ASNF(nR.left, n);
34        ASNL(nR, n.right);
35      }
36      ASNL(par, bestCh);
37      ASNL(bestCh, null);
38    }
39    ASNL(par, null); ASNL(n, null); ASNL(nL, null);
40    ASNL(nR, null);
41    return true;
42  }

Figure 2.5: Augmenting remove with macros to dynamically enforce domination locking.
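The macros themselves are defined in Section 2.4, which lies outside this excerpt; the following Java sketch of Take, ASNL, and ASNF is therefore only one possible reading of the scheme described above (per-object lock plus heap and local reference counts). The node fields and helper names are assumptions, not the thesis' instrumentation, and since Java has no macros, ASNL is rendered as a helper that the transformed code would call as v = asnl(v, expr).

    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of the instrumentation behind Figure 2.5, assuming every node
    // carries its own lock and two reference counts:
    //   heapRefs  - incoming pointers from fields of other nodes
    //   localRefs - local variables of the current invocation pointing at it
    // A node is unlocked once no local variable points to it and it has at
    // most one heap predecessor (i.e., it no longer violates the forest shape).
    final class LockingRuntime {

        static final class INode {
            final ReentrantLock lock = new ReentrantLock();
            int heapRefs;
            int localRefs;
            int key, prio;
            INode left, right;
        }

        // Take(x): lock an argument object at procedure entry.
        static void take(INode x) {
            if (x != null) acquire(x);
        }

        // ASNL(v, e): local-variable assignment v = e; the caller stores the
        // returned value back into v.
        static INode asnl(INode oldVal, INode newVal) {
            if (newVal != null) {
                acquire(newVal);         // target enters the immediate scope
                newVal.localRefs++;
            }
            if (oldVal != null) {
                oldVal.localRefs--;
                maybeRelease(oldVal);    // old target may leave the immediate scope
            }
            return newVal;
        }

        // ASNF(x.f, y): field assignment; the actual store x.f = y is performed
        // by the transformed code, while this helper maintains the heap counts.
        static void asnf(INode oldTarget, INode newTarget) {
            if (newTarget != null) newTarget.heapRefs++;
            if (oldTarget != null) {
                oldTarget.heapRefs--;
                maybeRelease(oldTarget);
            }
        }

        private static void acquire(INode x) {
            if (!x.lock.isHeldByCurrentThread()) x.lock.lock();
        }

        private static void maybeRelease(INode x) {
            // Release only objects that are out of the immediate scope and have
            // at most one predecessor in the heap graph.
            if (x.localRefs == 0 && x.heapRefs <= 1 && x.lock.isHeldByCurrentThread()) {
                x.lock.unlock();
            }
        }
    }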

Main Contributions. The main contributions of this chapter can be summarized as follows:

- We introduce a new locking protocol entitled Domination Locking. We show that domination locking can be enforced and verified by considering only sequential executions [17]: if domination locking is satisfied by all sequential executions, then atomicity and deadlock freedom are guaranteed in all executions, including non-sequential ones.

- We present an automatic technique to generate fine-grained locking by enforcing the domination locking protocol for libraries where the heap graph is guaranteed to be a forest in between operations. Our technique can handle any temporary violation of the forest shape constraint, including temporary cycles.

- We present a performance evaluation of our technique on several examples, including balanced search trees [16, 46], a self-adjusting heap [81], and specialized data structure implementations [18, 72]. The evaluation shows that our automatic locking provides good scalability and performance comparable to hand-crafted locking (for the examples where hand-crafted locking solutions were available).

- We discuss extensions and additional applications of our approach.

2.2 Preliminaries

Our goal is to augment a library with concurrency control that guarantees strict conflict-serializability [75] and linearizability [55]. In this section we formally define what a library is and the notion of strict conflict-serializability for libraries.

Syntax and Informal Semantics. A library defines a set of types and a set of procedures that may be invoked by clients of the library, potentially concurrently. A type consists of a set of fields of type boolean, integer, or pointer to a user-defined type. The types are private to the library: an object of a type T defined by a library M can be allocated or dereferenced only by procedures of library M. However, pointers to objects of type T can be passed back and forth between the clients of library M and the procedures of library M. Dually, types defined by clients are private to the client. Pointers to client-defined types may be passed back and forth between the clients and the library, but the library cannot dereference such pointers (or allocate objects of such types).

Procedures have parameters and local variables, which are private to the invocation of the procedure. (Thus, these are thread-local variables.) There are no static or global variables shared by different invocations of procedures. (However, our results can be generalized to support them.)

stms ::= skip | x = e(y1, ..., yk) | assume(b) | x = new R() | x = y.f | x.f = y | acquire(x) | release(x) | return(x)

Figure 2.6: Primitive instructions; b stands for a local boolean variable, and e(y1, ..., yk) stands for an expression over local variables.

We assume that the body of a procedure is represented by a control-flow graph. We refer to the vertices of a control-flow graph as program points. The edges of a control-flow graph are annotated with primitive instructions, shown in Figure 2.6. Conditionals are encoded by annotating control-flow edges with assume statements. Without loss of generality, we assume that a heap object can be dereferenced only in a load (x = y.f) or store (x.f = y) instruction. Operations to acquire or release a lock refer to a thread-local variable (that points to the heap object to be locked or unlocked). The other primitive instructions reference only thread-local variables.

We present a semantics for a library independent of any specific client. We define a notion of execution that covers all possible executions of the library that can arise with any possible client, while restricting attention to the part of the program state owned by the library. (In effect, our semantics models what is usually referred to as a most-general client of the library.) For simplicity, we assume that each procedure invocation is executed by a different thread, which allows us to identify procedure invocations using a thread id. We refer to each invocation of a procedure as a transaction. We model a procedure invocation as the creation of a new thread with an appropriate thread-local state.

We describe the behavior of a library by the relation →. A transition σ → σ' represents the fact that a state σ can be transformed into a state σ' by executing a single instruction. Transactions share a heap consisting of an (unbounded) set of heap objects. Any object allocated during the execution of a library procedure is said to be a library (owned) object. In fact, our semantics models only library-owned objects. Any library object that is returned by a library procedure is said to be an exposed object. Other library objects are hidden objects. Note that an exposed object remains exposed forever. A key idea encoded in the semantics is that at any point during execution a new procedure invocation may occur. The only assumption made is that any library object passed as a procedure argument is exposed, i.e., the object was returned by some earlier procedure invocation.

Each heap-allocated object serves as a lock for itself. Locks are exclusive (i.e., a lock can be held by at most one transaction at a time). A transaction trying to acquire a lock (by an acquire statement) that is held by another transaction is blocked until the lock is available (i.e., it is not held by any transaction). Locks are reentrant; an acquire statement has no impact when it refers to a lock that is already held by the current transaction. A transaction cannot release a lock that it does not hold. Whenever a new object is allocated, its boolean fields are initialized to false, its integer fields are initialized to 0, and its pointer fields are initialized to null. Local variables are initialized in the same manner.

v ∈ Val = Loc ∪ Z ∪ {true, false, null}
ρ ∈ E = V → Val
h ∈ H = Loc → (F → Val)
l ∈ L = Loc
s ∈ S = K × E × 2^L
σ ∈ Σ = H × (Loc → {true, false}) × (T → S)

Figure 2.7: Semantic domains.

Table 2.1: The semantics of primitive instructions. For brevity, we use the shorthands σ = ⟨h, r, ϱ⟩ and ϱ(t) = ⟨k, ρ, L⟩, and omit (k, k') = e ∈ CFG_t from all side conditions. [The table gives, for each primitive instruction of Figure 2.6, the induced transition σ →_{t,e} σ' and its side condition; its individual rows are not reproduced here.]

Semantics. Figure 2.7 defines the semantic domains of a library state and the meta-variables ranging over them. Let t ∈ T be the domain of transaction identifiers. A state σ = ⟨h, r, ϱ⟩ ∈ Σ of a library is a triple: h assigns values to fields of dynamically allocated objects, where a value v ∈ Val can be a location, an integer, a boolean value, or null; r maps exposed objects to true and hidden objects to false; finally, ϱ associates a transaction t with its transaction-local state ϱ(t). A transaction-local state s = ⟨k, ρ, L⟩ ∈ S consists of: k is the value of the transaction's program counter, ρ records the values of its local

variables, and L is the transaction's lock set, which records the locks that the transaction holds.

The behavior of a library is described by the relations → and ↪. The relation → is a subset of Σ × (T × (K × K)) × Σ, and is defined in Table 2.1 [1]. A transition σ →_{t,e} σ' represents the fact that σ can be transformed into σ' via transaction t executing the instruction annotating the control-flow edge e. The invocation of a new transaction is modeled by the relation ↪ ⊆ Σ × T × Σ; we say that ⟨h, r, ϱ⟩ ↪_t σ' if σ' = ⟨h, r, ϱ[t ↦ s]⟩, where t ∉ dom(ϱ) and s is any valid initial local state, i.e., s = ⟨entry, ρ, ∅⟩, where entry is the entry vertex and ρ maps local variables and parameters to appropriate initial values (based on their type). In particular, ρ must map any pointer parameter of a type defined by the library to an exposed object (i.e., an object u in h such that r(u) = true). We write σ ⇒ σ' if there exists t such that σ ↪_t σ', or there exist t, e such that σ →_{t,e} σ'.

Running Transactions. Each control-flow graph of a procedure has two distinguished control points: an entry site, from which the transaction starts, and an exit site, at which the transaction ends (if a CFG edge is annotated with a return statement, then this edge points to the exit site of the procedure). We say that a transaction t is running in a state σ if t is not at its entry site or exit site. An idle state is a state in which no transaction is running.

Executions. The initial state σ_I has an empty heap and no transactions. A sequence of states π = σ_0, ..., σ_k is an execution if the following hold: (i) σ_0 is the initial state, and (ii) for 0 ≤ i < k, σ_i ⇒ σ_{i+1}. An execution π = σ_0, ..., σ_k is a complete execution if σ_k is idle. An execution π = σ_0, ..., σ_k is a sequential execution if for each 0 ≤ i ≤ k at most one transaction in σ_i is running. An execution is non-interleaved if transitions of different transactions are not interleaved (i.e., for every pair of transactions t_i ≠ t_j, either all the transitions executed by t_i come before any transition executed by t_j, or vice versa). Note that a sequential execution is a special case of a non-interleaved execution. In a sequential execution, a new transaction starts executing only after all previous transactions have completed execution. In a non-interleaved execution, a new transaction can start executing before a previous transaction completes execution, but the execution is not permitted to include transitions by the previous transaction once the new transaction starts executing. We say that a sequential execution is completable if it is a prefix of a complete sequential execution.

Schedules. The schedule of an execution π = σ_0, ..., σ_k is a sequence (t_0, e_0), ..., (t_{k-1}, e_{k-1}) such that for 0 ≤ i < k: σ_i →_{t_i,e_i} σ_{i+1}, or σ_i ↪_{t_i} σ_{i+1} and e_i = e_init (where e_init is disjoint from all edges in the CFG). We say that a sequence ξ = (t_0, e_0), ..., (t_{k-1}, e_{k-1}) is a feasible schedule if ξ is the schedule of some execution.

[1] For simplicity of presentation, we use an idempotent variant of acquire (i.e., acquire has no impact when the lock is already owned by the current transaction). We note that this variant is permitted by the Lock interface from the java.util.concurrent.locks package, and can easily be implemented in languages such as Java and C++.

The schedule of a transaction t in an execution is the (possibly non-contiguous) subsequence of the execution's schedule consisting only of t's transitions. Notice that each feasible schedule uniquely defines a single execution because: (i) we assume that there exists a single initial state; and (ii) each instruction defined in Table 2.1 is a partial function (in our semantics, nondeterminism is modeled by permitting CFG nodes with several outgoing edges).

Graph Representation. The heap (shared memory) of a state identifies an edge-labelled multidigraph (a directed graph in which multiple edges are allowed between the same pair of vertices), which we call the heap graph. Each heap-allocated object is represented by a vertex in the graph. A pointer field f in an object u that points to an object v is represented by an edge (u, v) labelled f. (Note that the heap graph represents only objects owned by the library; objects owned by the client are not represented in the heap graph.)

We define the allocation id of an object in an execution to be the pair (t, i) if the object was allocated by the i-th transition executed by a transaction t. An object o_1 in an execution π_1 corresponds to an object o_2 in an execution π_2 iff their allocation ids are the same. We compare states and objects belonging to different executions modulo this correspondence relation.

Strict Conflict-Serializability and Linearizability. Given an execution, we say that two transitions conflict if: (i) they are executed by two different transactions, and (ii) they access some common object (i.e., read or write fields of the same object). Executions π and π' are said to be conflict-equivalent if they consist of the same set of transactions, the schedule of every transaction t is the same in both executions, and the executions agree on the order between conflicting transitions (i.e., the i-th transition of a transaction t precedes and conflicts with the j-th transition of a transaction t' in π iff the former precedes and conflicts with the latter in π'). Conflict-equivalent executions produce the same state [86].

An execution is conflict-serializable if it is conflict-equivalent to a non-interleaved execution. We say that an execution π is strict conflict-serializable if it is conflict-equivalent to a non-interleaved execution π' in which a transaction t_1 completes execution before a transaction t_2 whenever t_1 completes execution before t_2 in π.

Assume that all sequential executions of a library satisfy a given specification Φ. In this case, a strict conflict-serializable execution is also linearizable [56] with respect to the specification Φ. [2] Thus, correctness in sequential executions combined with strict conflict-serializability is sufficient to ensure linearizability.

[2] Strict conflict-serializability guarantees the atomicity and the run-time order required by the linearizability property. Moreover, note that according to the linearizability property (as defined in [56]), the execution may contain transactions that will never be able to complete.

The above definitions can also be used for feasible schedules because (as explained earlier) a feasible schedule uniquely defines a single execution.

2.3 Domination Locking

In this section we present the Domination Locking protocol (abbreviated DL). We show that if every sequential execution of a library satisfies DL and is completable, then every concurrent execution of the library is strict conflict-serializable and is a prefix of a complete execution (i.e., atomicity and deadlock freedom are guaranteed). The locking protocol is parameterized by a total order on all heap objects, which remains fixed over the whole execution.

Definition 2.1. Let ⪯ be a total order on heap objects. We say that an execution satisfies the Domination Locking protocol with respect to ⪯ if it satisfies the following conditions:

1. A transaction t can access a field of an object u only if u is currently locked by t.

2. A transaction t can acquire an exposed object u only if t has never acquired an exposed object v such that u ⪯ v.

3. A transaction t can acquire an exposed object only if t has never released a lock.

4. A transaction t can acquire a hidden object u only if every path from an exposed object to u includes an object that is locked by t.

Intuitively, the protocol works as follows. Requirement (1) prevents race conditions in which two transactions try to update an object neither has locked. Requirements (2) and (3) deal with exposed objects. Very little can be assumed about an object that has been exposed; references to it may reside anywhere and be used at any time by other transactions that know nothing about the invariants t is maintaining. Thus, as is standard, requirements (2) and (3) ensure that all transactions acquire locks on exposed objects in a consistent order, preventing deadlocks. The situation with hidden objects is different, and we know more: other threads can only gain access to t's hidden objects through some chain of references starting at an exposed object, and so it suffices for t to guard each such potential access path with a lock.

Another way of understanding the protocol is that previous proposals (e.g., [28, 59, 63, 80]) treat all objects as exposed, whereas domination locking also takes advantage of the information hiding of abstract data types to impose a different, and weaker, requirement on encapsulated data. In particular, no explicit order is imposed on the acquisition or release of locks on hidden objects, provided condition (4) is maintained.
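As a reading aid (not an excerpt from the thesis; the data structure and names are hypothetical), the following Java sketch shows the shape of a traversal that respects the four conditions above: the single exposed object (the list handle) is locked first and exactly once, and each hidden node is locked only while the thread still holds a lock on its unique predecessor, which dominates it. Earlier locks can then be released, so hand-over-hand locking arises as a special case of DL.

    import java.util.concurrent.locks.ReentrantLock;

    // A singly linked list whose handle is the only exposed object; the nodes
    // are hidden. contains() follows the domination locking conditions:
    //   (1) every field access happens under the object's lock,
    //   (2)+(3) the only exposed object (the handle) is locked once, up front,
    //   (4) a hidden node is locked only while its predecessor, which dominates
    //       every path from the handle to it, is still locked.
    class HandOverHandList {
        private static final class Cell {
            final ReentrantLock lock = new ReentrantLock();
            int value;
            Cell next;
        }

        private final ReentrantLock lock = new ReentrantLock(); // handle lock
        private Cell head; // hidden objects, reachable only through the handle

        boolean contains(int v) {
            lock.lock();                    // lock the exposed handle first
            Cell cur = head;
            if (cur == null) { lock.unlock(); return false; }
            cur.lock.lock();                // dominated by the handle lock
            lock.unlock();                  // early release of the handle
            while (true) {
                if (cur.value == v) { cur.lock.unlock(); return true; }
                Cell next = cur.next;
                if (next == null) { cur.lock.unlock(); return false; }
                next.lock.lock();           // dominated by cur's lock
                cur.lock.unlock();          // release the predecessor
                cur = next;
            }
        }
    }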


More information

Linearizability of Persistent Memory Objects

Linearizability of Persistent Memory Objects Linearizability of Persistent Memory Objects Michael L. Scott Joint work with Joseph Izraelevitz & Hammurabi Mendes www.cs.rochester.edu/research/synchronization/ Workshop on the Theory of Transactional

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 31 October 2012

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 31 October 2012 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 31 October 2012 Lecture 6 Linearizability Lock-free progress properties Queues Reducing contention Explicit memory management Linearizability

More information

Milind Kulkarni Research Statement

Milind Kulkarni Research Statement Milind Kulkarni Research Statement With the increasing ubiquity of multicore processors, interest in parallel programming is again on the upswing. Over the past three decades, languages and compilers researchers

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 2 Thomas Wies New York University Review Last week Programming Languages Overview Syntax and Semantics Grammars and Regular Expressions High-level

More information

UNIT:2. Process Management

UNIT:2. Process Management 1 UNIT:2 Process Management SYLLABUS 2.1 Process and Process management i. Process model overview ii. Programmers view of process iii. Process states 2.2 Process and Processor Scheduling i Scheduling Criteria

More information

Formal Specification and Verification

Formal Specification and Verification Formal Specification and Verification Introduction to Promela Bernhard Beckert Based on a lecture by Wolfgang Ahrendt and Reiner Hähnle at Chalmers University, Göteborg Formal Specification and Verification:

More information

Object Ownership in Program Verification

Object Ownership in Program Verification Object Ownership in Program Verification Werner Dietl 1 and Peter Müller 2 1 University of Washington wmdietl@cs.washington.edu 2 ETH Zurich peter.mueller@inf.ethz.ch Abstract. Dealing with aliasing is

More information

1 Introduction. 3 Syntax

1 Introduction. 3 Syntax CS 6110 S18 Lecture 19 Typed λ-calculus 1 Introduction Type checking is a lightweight technique for proving simple properties of programs. Unlike theorem-proving techniques based on axiomatic semantics,

More information

Hierarchical Pointer Analysis for Distributed Programs

Hierarchical Pointer Analysis for Distributed Programs Hierarchical Pointer Analysis for Distributed Programs Amir Kamil Computer Science Division, University of California, Berkeley kamil@cs.berkeley.edu April 14, 2006 1 Introduction Many distributed, parallel

More information

2 nd Semester 2009/2010

2 nd Semester 2009/2010 Chapter 16: Concurrency Control Departamento de Engenharia Informática Instituto Superior Técnico 2 nd Semester 2009/2010 Slides baseados nos slides oficiais do livro Database System Concepts c Silberschatz,

More information

6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts,

6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts, 6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts, lacurts@mit.edu 0. Introduction - Last time: Atomicity via logging. We're good with atomicity now. - Today: Isolation - The problem: We have

More information

Summary: Open Questions:

Summary: Open Questions: Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/22891 holds various files of this Leiden University dissertation Author: Gouw, Stijn de Title: Combining monitoring with run-time assertion checking Issue

More information

Chapter 15 : Concurrency Control

Chapter 15 : Concurrency Control Chapter 15 : Concurrency Control What is concurrency? Multiple 'pieces of code' accessing the same data at the same time Key issue in multi-processor systems (i.e. most computers today) Key issue for parallel

More information

The Pointer Assertion Logic Engine

The Pointer Assertion Logic Engine The Pointer Assertion Logic Engine [PLDI 01] Anders Mφller Michael I. Schwartzbach Presented by K. Vikram Cornell University Introduction Pointer manipulation is hard Find bugs, optimize code General Approach

More information

Induction and Semantics in Dafny

Induction and Semantics in Dafny 15-414 Lecture 11 1 Instructor: Matt Fredrikson Induction and Semantics in Dafny TA: Ryan Wagner Encoding the syntax of Imp Recall the abstract syntax of Imp: a AExp ::= n Z x Var a 1 + a 2 b BExp ::=

More information

TRANSACTION PROCESSING CONCEPTS

TRANSACTION PROCESSING CONCEPTS 1 Transaction CHAPTER 9 TRANSACTION PROCESSING CONCEPTS A Transaction refers to a logical unit of work in DBMS, which comprises a set of DML statements that are to be executed atomically (indivisibly).

More information

Optimizing Closures in O(0) time

Optimizing Closures in O(0) time Optimizing Closures in O(0 time Andrew W. Keep Cisco Systems, Inc. Indiana Univeristy akeep@cisco.com Alex Hearn Indiana University adhearn@cs.indiana.edu R. Kent Dybvig Cisco Systems, Inc. Indiana University

More information

Linked Lists: The Role of Locking. Erez Petrank Technion

Linked Lists: The Role of Locking. Erez Petrank Technion Linked Lists: The Role of Locking Erez Petrank Technion Why Data Structures? Concurrent Data Structures are building blocks Used as libraries Construction principles apply broadly This Lecture Designing

More information

Formal Syntax and Semantics of Programming Languages

Formal Syntax and Semantics of Programming Languages Formal Syntax and Semantics of Programming Languages Mooly Sagiv Reference: Semantics with Applications Chapter 2 H. Nielson and F. Nielson http://www.daimi.au.dk/~bra8130/wiley_book/wiley.html The While

More information

CS 2112 Lecture 20 Synchronization 5 April 2012 Lecturer: Andrew Myers

CS 2112 Lecture 20 Synchronization 5 April 2012 Lecturer: Andrew Myers CS 2112 Lecture 20 Synchronization 5 April 2012 Lecturer: Andrew Myers 1 Critical sections and atomicity We have been seeing that sharing mutable objects between different threads is tricky We need some

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Fall 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

Advanced Databases. Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch

Advanced Databases. Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch Advanced Databases Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Multiple Granularity Allow data items to be of various sizes and

More information

Runtime assertion checking of multithreaded Java programs

Runtime assertion checking of multithreaded Java programs Master Thesis Runtime assertion checking of multithreaded Java programs An extension of the STROBE framework Author: Jorne Kandziora Supervisors: dr. M. Huisman dr. C.M. Bockisch M. Zaharieva-Stojanovski,

More information

15 418/618 Project Final Report Concurrent Lock free BST

15 418/618 Project Final Report Concurrent Lock free BST 15 418/618 Project Final Report Concurrent Lock free BST Names: Swapnil Pimpale, Romit Kudtarkar AndrewID: spimpale, rkudtark 1.0 SUMMARY We implemented two concurrent binary search trees (BSTs): a fine

More information

Chapter 12 : Concurrency Control

Chapter 12 : Concurrency Control Chapter 12 : Concurrency Control Chapter 12: Concurrency Control Lock-Based Protocols Timestamp-Based Protocols Validation-Based Protocols Multiple Granularity Multiversion Schemes Insert and Delete Operations

More information

Part 3: Beyond Reduction

Part 3: Beyond Reduction Lightweight Analyses For Reliable Concurrency Part 3: Beyond Reduction Stephen Freund Williams College joint work with Cormac Flanagan (UCSC), Shaz Qadeer (MSR) Busy Acquire Busy Acquire void busy_acquire()

More information

Qualifying Exam in Programming Languages and Compilers

Qualifying Exam in Programming Languages and Compilers Qualifying Exam in Programming Languages and Compilers University of Wisconsin Fall 1991 Instructions This exam contains nine questions, divided into two parts. All students taking the exam should answer

More information

Software Engineering using Formal Methods

Software Engineering using Formal Methods Software Engineering using Formal Methods Introduction to Promela Wolfgang Ahrendt 03 September 2015 SEFM: Promela /GU 150903 1 / 36 Towards Model Checking System Model Promela Program byte n = 0; active

More information

Transactional Interference-less Balanced Tree

Transactional Interference-less Balanced Tree Transactional Interference-less Balanced Tree Technical Report Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran Virginia Tech, Blacksburg, VA, USA. Abstract. In this paper, we present TxCF-Tree, a balanced

More information

UNIT-IV TRANSACTION PROCESSING CONCEPTS

UNIT-IV TRANSACTION PROCESSING CONCEPTS 1 Transaction UNIT-IV TRANSACTION PROCESSING CONCEPTS A Transaction refers to a logical unit of work in DBMS, which comprises a set of DML statements that are to be executed atomically (indivisibly). Commit

More information

TVLA: A SYSTEM FOR GENERATING ABSTRACT INTERPRETERS*

TVLA: A SYSTEM FOR GENERATING ABSTRACT INTERPRETERS* TVLA: A SYSTEM FOR GENERATING ABSTRACT INTERPRETERS* Tal Lev-Ami, Roman Manevich, and Mooly Sagiv Tel Aviv University {tla@trivnet.com, {rumster,msagiv}@post.tau.ac.il} Abstract TVLA (Three-Valued-Logic

More information

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the

More information

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

Software Engineering using Formal Methods

Software Engineering using Formal Methods Software Engineering using Formal Methods Introduction to Promela Wolfgang Ahrendt & Richard Bubel & Reiner Hähnle & Wojciech Mostowski 31 August 2011 SEFM: Promela /GU 110831 1 / 35 Towards Model Checking

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

Type Checking and Type Equality

Type Checking and Type Equality Type Checking and Type Equality Type systems are the biggest point of variation across programming languages. Even languages that look similar are often greatly different when it comes to their type systems.

More information

On the Complexity of Partially-Flow-Sensitive Alias Analysis

On the Complexity of Partially-Flow-Sensitive Alias Analysis On the Complexity of Partially-Flow-Sensitive Alias Analysis 13 N. RINETZKY Tel Aviv University G. RAMALINGAM Microsoft Research India M. SAGIV Tel Aviv University and E. YAHAV IBM T. J. Watson Research

More information

Automatic Scalable Atomicity via Semantic Locking

Automatic Scalable Atomicity via Semantic Locking Automatic Scalable Atomicity via Semantic Locking Guy Golan-Gueta Yahoo Labs, Israel ggolan@yahoo-inc.com G. Ramalingam Microsoft Research, India grama@microsoft.com Mooly Sagiv Tel Aviv University, Israel

More information

Reminder from last time

Reminder from last time Concurrent systems Lecture 5: Concurrency without shared data, composite operations and transactions, and serialisability DrRobert N. M. Watson 1 Reminder from last time Liveness properties Deadlock (requirements;

More information

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Philip A. Bernstein Microsoft Research Redmond, WA, USA phil.bernstein@microsoft.com Sudipto Das Microsoft Research

More information

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads.

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Poor co-ordination that exists in threads on JVM is bottleneck

More information

State Machine Diagrams

State Machine Diagrams State Machine Diagrams Introduction A state machine diagram, models the dynamic aspects of the system by showing the flow of control from state to state for a particular class. 2 Introduction Whereas an

More information

Synchronization SPL/2010 SPL/20 1

Synchronization SPL/2010 SPL/20 1 Synchronization 1 Overview synchronization mechanisms in modern RTEs concurrency issues places where synchronization is needed structural ways (design patterns) for exclusive access 2 Overview synchronization

More information

Parallel linked lists

Parallel linked lists Parallel linked lists Lecture 10 of TDA384/DIT391 (Principles of Conent Programming) Carlo A. Furia Chalmers University of Technology University of Gothenburg SP3 2017/2018 Today s menu The burden of locking

More information

Thirty one Problems in the Semantics of UML 1.3 Dynamics

Thirty one Problems in the Semantics of UML 1.3 Dynamics Thirty one Problems in the Semantics of UML 1.3 Dynamics G. Reggio R.J. Wieringa September 14, 1999 1 Introduction In this discussion paper we list a number of problems we found with the current dynamic

More information

Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming

Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming Written Presentation: JoCaml, a Language for Concurrent Distributed and Mobile Programming Nicolas Bettenburg 1 Universitaet des Saarlandes, D-66041 Saarbruecken, nicbet@studcs.uni-sb.de Abstract. As traditional

More information

Lecture 22 Concurrency Control Part 2

Lecture 22 Concurrency Control Part 2 CMSC 461, Database Management Systems Spring 2018 Lecture 22 Concurrency Control Part 2 These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used from

More information

Reasoning About Lock Placements

Reasoning About Lock Placements Reasoning About Lock Placements Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv Stanford University, Tufts University, MIT, Tel Aviv University Abstract. A lock placement describes,

More information

Programming Languages Third Edition. Chapter 7 Basic Semantics

Programming Languages Third Edition. Chapter 7 Basic Semantics Programming Languages Third Edition Chapter 7 Basic Semantics Objectives Understand attributes, binding, and semantic functions Understand declarations, blocks, and scope Learn how to construct a symbol

More information

Linearizable Iterators

Linearizable Iterators Linearizable Iterators Supervised by Maurice Herlihy Abstract Petrank et. al. [5] provide a construction of lock-free, linearizable iterators for lock-free linked lists. We consider the problem of extending

More information

12 Abstract Data Types

12 Abstract Data Types 12 Abstract Data Types 12.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define the concept of an abstract data type (ADT). Define

More information

Goal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3

Goal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3 Goal of Concurrency Control Concurrency Control Transactions should be executed so that it is as though they executed in some serial order Also called Isolation or Serializability Weaker variants also

More information

CS4215 Programming Language Implementation. Martin Henz

CS4215 Programming Language Implementation. Martin Henz CS4215 Programming Language Implementation Martin Henz Thursday 15 March, 2012 2 Chapter 11 impl: A Simple Imperative Language 11.1 Introduction So far, we considered only languages, in which an identifier

More information

Applications of Formal Verification

Applications of Formal Verification Applications of Formal Verification Model Checking: Introduction to PROMELA Prof. Dr. Bernhard Beckert Dr. Vladimir Klebanov SS 2010 KIT INSTITUT FÜR THEORETISCHE INFORMATIK KIT University of the State

More information

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue Review of last lecture Design of Parallel and High-Performance Computing Fall 2016 Lecture: Linearizability Motivational video: https://www.youtube.com/watch?v=qx2driqxnbs Instructor: Torsten Hoefler &

More information

A Causality-Based Runtime Check for (Rollback) Atomicity

A Causality-Based Runtime Check for (Rollback) Atomicity A Causality-Based Runtime Check for (Rollback) Atomicity Serdar Tasiran Koc University Istanbul, Turkey Tayfun Elmas Koc University Istanbul, Turkey RV 2007 March 13, 2007 Outline This paper: Define rollback

More information

High-Level Small-Step Operational Semantics for Software Transactions

High-Level Small-Step Operational Semantics for Software Transactions High-Level Small-Step Operational Semantics for Software Transactions Katherine F. Moore Dan Grossman The University of Washington Motivating Our Approach Operational Semantics Model key programming-language

More information

Linearizability of Persistent Memory Objects

Linearizability of Persistent Memory Objects Linearizability of Persistent Memory Objects Michael L. Scott Joint work with Joseph Izraelevitz & Hammurabi Mendes www.cs.rochester.edu/research/synchronization/ Compiler-Driven Performance Workshop,

More information

Chapter 4 Defining Classes I

Chapter 4 Defining Classes I Chapter 4 Defining Classes I This chapter introduces the idea that students can create their own classes and therefore their own objects. Introduced is the idea of methods and instance variables as the

More information

! Why is synchronization needed? ! Synchronization Language/Definitions: ! How are locks implemented? Maria Hybinette, UGA

! Why is synchronization needed? ! Synchronization Language/Definitions: ! How are locks implemented? Maria Hybinette, UGA Chapter 6: Process [& Thread] Synchronization CSCI [4 6] 730 Operating Systems Synchronization Part 1 : The Basics! Why is synchronization needed?! Synchronization Language/Definitions:» What are race

More information