Combining Techniques Application for Tree Search Structures


RAYMOND AND BEVERLY SACKLER FACULTY OF EXACT SCIENCES
BLAVATNIK SCHOOL OF COMPUTER SCIENCE

Combining Techniques Application for Tree Search Structures

Thesis submitted in partial fulfillment of the requirements for the M.Sc. degree in the School of Computer Science, Tel-Aviv University

by Vladimir Budovsky

The research work for this thesis has been carried out at Tel-Aviv University under the supervision of Prof. Yehuda Afek and Prof. Nir Shavit

June 2010

CONTENTS

1. Introduction
   1.1 Flat Combining
   1.2 Skip Lists
2. The Flat Combined Skip Lists
   2.1 Naive Flat Combined Skip List
   2.2 Flat Combined Skip List with Multiple Combiners
   2.3 Flat Combined Skip List with Hints
3. Performance
   3.1 Performance Comparison of Flat Combined Skip Lists vs JDK ConcurrentSkipListSet
   3.2 Flat Combining Mechanism Experimental Verifications
4. Conclusions

LIST OF FIGURES

1.1 Skip list of height 4. May be considered either as a collection of fat nodes or as a 2-d list
1.2 Skip list traversal with key 12. Traversed predecessors are shown; start_level is 3
2.1 Multi-combiner skip list. Every node with height >= 3 is a combiner node
3.1 Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution
3.2 Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality
3.3 Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution
3.4 Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality
3.5 FC skip list implementation vs multi-lock one, naive implementations, uniform keys distribution
3.6 FC skip list implementation vs multi-lock one, naive implementations, high access locality
3.7 FC skip list implementation vs multi-lock one, hints implementations, uniform keys distribution
3.8 FC skip list implementation vs multi-lock one, hints implementations, high access locality
3.9 Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution
3.10 Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality
3.11 Hints mechanism success rate for pure update workloads
3.12 The connection between FC intensity and throughput per thread for pure update workloads
3.13 Lock-free skip list CAS per update, CAS success rate and throughput per thread for pure update workloads

LISTINGS

2.1 Set of Integers Interface
2.2 Flat combining definitions
2.3 Node definition
2.4 Wait-free contains is the same for all skip lists
2.5 add, Naive implementation
2.6 scanandcombine, common implementation
2.7 Physical add and remove, Naive implementation
2.8 Multi-combiner remove implementation
2.9 Optimistic (hinted) FCRequest and add implementation
2.10 Optimistic (hinted) doadd and verify implementation
3.1 Optimistic (hinted) multi-lock add method implementation

ACKNOWLEDGEMENTS

I would like to thank all those who made this thesis possible. I am extremely grateful to my advisors, Prof. Yehuda Afek and Prof. Nir Shavit, who introduced me to the world of multiprocessors and distributed algorithms and whose supervision and support enabled me to advance my understanding of the subject. My sincere thanks to Ms. Moran Tzafrir for teaching me what an everyday researcher's work is about and for supplying me with an arsenal of essential tools for my work. Finally, I am grateful to my family, and especially to my sister Elena, for their patience and encouragement.

ABSTRACT

Flat combining (FC) is a new synchronization paradigm that dramatically reduces synchronization costs. As was recently shown, this technique brings significant performance gains to several popular parallel data structures, such as stacks, queues and shared counters. Moreover, applying the combining paradigm keeps the code as simple as code synchronized via a single global lock. However, the question of its applicability to other classes of parallel data structures has not yet been answered. This work deals with the application of the FC paradigm to binary tree-like data structures. As shown below, combining is hardly suitable for these cases. The limits of FC use are studied, and a criterion for its applicability is justified.

1. INTRODUCTION

Multi- and many-core computers are becoming more and more common these days. We are witnessing the development of chips with tens of cores that consume no more space and energy than a desktop processor. In light of this trend, the development of scalable and correct data structures becomes extremely important. The simplest and most straightforward solution is to derive a concurrent data structure from a sequential one, using a global lock as the synchronization primitive. Unfortunately, this solution does not scale even for a relatively small number of cores. Another approach is to design fine-grained synchronization schemes using multiple locks or non-blocking read-modify-write atomic operations. This method usually requires a full redesign and reimplementation of the algorithm. An additional drawback of fine-grained and, especially, lock-free synchronization is its high complexity. It is very difficult to formally prove the correctness of such data structures (see, for example, the proofs in [3] and [4]).

1.1 Flat Combining

The flat combining [7] programming paradigm achieves a high level of concurrency while preserving code simplicity. The main idea behind flat combining is to attach a public action registry to an existing sequential data structure. Each thread, before accessing the shared data, publishes its request in the registry and then tries to acquire the global lock. The winning thread becomes the combiner, scans the registry and performs all the requests it finds. The other threads simply wait for their requests to be fulfilled, spinning on a thread-local Done flag. There are several benefits to this strategy:

The synchronization cost is low compared to a global lock, since there is only one round of competition for the shared lock, and every thread, whether it wins or loses, returns with its request performed.

The combiner can use its knowledge of all the requests and fulfill some of them without accessing the data structure at all. For a stack, for example, the combiner may collect push/pop pairs and return the results to the appropriate callers; this well-known technique is called elimination. For a shared counter, the combiner can compute the total change and update the data structure only once; this technique, called combining, is also widely used.

The variants of the FC algorithm are described in detail in Chapter 2 (The Flat Combined Skip Lists); the sketch below illustrates the generic publish/lock/combine cycle.
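The following minimal Java sketch is only a hedged illustration of that cycle, not the thesis code: the names (FlatCombiner, Request, execute) and the use of a Supplier to represent a published operation are illustrative assumptions.

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Illustrative sketch of the publish/lock/combine cycle described above.
class FlatCombiner {
    static class Request {
        volatile Supplier<Object> op;   // published action, null once fulfilled
        volatile Object result;         // written by the combiner
    }

    private final Request[] registry;   // one slot per thread
    private final AtomicBoolean lock = new AtomicBoolean(false);

    FlatCombiner(int maxThreads) {
        registry = new Request[maxThreads];
        for (int i = 0; i < maxThreads; i++) registry[i] = new Request();
    }

    Object execute(int threadId, Supplier<Object> op) {
        Request r = registry[threadId];
        r.op = op;                                          // 1. publish the request
        while (true) {
            if (!lock.get() && lock.compareAndSet(false, true)) {
                try {
                    combine();                              // 2. the winner serves everyone
                } finally {
                    lock.set(false);
                }
                return r.result;
            }
            while (lock.get()) {                            // 3. losers spin
                if (r.op == null) return r.result;          //    somebody did my work
                Thread.yield();
            }
        }
    }

    private void combine() {
        for (Request r : registry) {
            Supplier<Object> pending = r.op;
            if (pending != null) {
                r.result = pending.get();                   // run the action sequentially
                r.op = null;                                // volatile write releases the waiter
            }
        }
    }
}

In the skip list variants presented below, the same cycle appears specialized with an opcode field and a per-node FCData registry.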

Flat combining has proven very efficient for data structures with hot spots, such as a stack head, queue ends, or a priority queue head, and it also shows good results when synchronization costs are high. For example, lock-free synchronous queues [16] demonstrate good throughput but moderate scalability, which can be improved using elimination or FC techniques. However, the question of FC's usefulness for data structures without pronounced bottlenecks and without high synchronization costs remains open. This work studies the applicability of flat combining to binary tree-like data structures: structures with O(log n) access time that allow range operations.

1.2 Skip Lists

Tree search structures are probably the most popular and widespread data structures; it is hard to find a computer science or software engineering area that does not use them. Their practical applications start with the ubiquitous red-black tree [6], used in nearly every algorithms library, including the C++ STL [17] and the Java SDK [18], and the AVL tree [1], which is very popular for search-dominated workloads; continue with the various B-trees [2], which are useful for block-organized memories; and finish with specialized suffix tries, splay trees, spatial search trees, persistent trees, etc. Since all of the above algorithms deal with large amounts of data, and many of them run inside operating systems or serve as search indexes inside databases, distributed and multi-threaded solutions for search trees are the focus of many research and commercial projects. A comprehensive survey of concurrent binary search trees is given in [13]. The common problem with all the search trees mentioned above is that they are either static (they do not allow add/remove without a full rebuild) or need a re-balancing mechanism after updates in order to preserve logarithmic access time. In most cases, the re-balancing scope is unknown prior to the update, and that makes the design of fine-grained synchronization for binary search trees a very complicated task. That is why skip lists were chosen as the basic data structure for this research. There were several reasons for the decision: the skip list is simple and has no re-balancing overheads, which simplifies measurements, and the skip list is the only known concurrent lock-free binary search structure.

The skip list was invented [15] in 1990 as a probabilistic alternative to binary search trees. A skip list is a linked list of fat nodes (Figure 1.1), where each node has a randomly chosen height (number of levels). Every node has a unique key, and the nodes appear in key order in the list. At every level it occupies, a node is connected to its successor at the same level.

Fig. 1.1: Skip list of height 4. May be considered either as a collection of fat nodes or as a 2-d list.

The random height is chosen using a geometric distribution with parameter p > 1: every node has level 0, and a node that has level i also has level i+1 with probability 1/p. In practice, p is usually chosen between 2 and 4. This distribution gives an expected maximal node height of O(log N), and between every two consecutive nodes of height k, p-1 nodes of height k-1 are expected to appear. It is useful to add two immutable nodes, head and tail, with the highest possible level, and to maintain the actual highest level (start_level) on every add or remove. Alternatively, the skip list may be represented as a collection of sorted lists with unique keys L_1, L_2, ..., L_k, such that i > j implies L_i is contained in L_j, where all entries with equal keys form vertical lists. The latter representation is especially convenient for lock-free implementations, where all updates are implemented through atomic read-and-update operations.

Denote the successor of node n at level l by next_l(n), and the key of n by key(n). The simple sequential list works in the following way:

Initially, the empty list contains head and tail with keys -∞ and +∞, respectively. The head node is connected to tail at every possible level, and the actual start_level is 0.

List traversal with key k starts from node n = head at level l = start_level, and proceeds at this level searching for the pair of nodes (pred, succ) such that next_l(pred) == succ and key(pred) < k ≤ key(succ). Then l = l - 1 and n = pred are set, and the search is repeated. The process continues until level 0 is reached. Figure 1.2 illustrates the pred nodes observed during a traversal with key 12.

Fig. 1.2: Skip list traversal with key 12. Traversed predecessors are shown; start_level is 3.

contains(k) simply calls the traversal with key k. It is unnecessary to proceed to the bottom level: once the desired key is found, the traversal is interrupted and the found node is returned.

add(k) starts by generating a random height h, as described above. After that, the traversal algorithm is performed, collecting the h bottom-most pred and succ nodes. If the node is not found (for the pure set implementation), a new node of height h with key k is linked to the collected nodes.

remove(k) starts with the traversal. Once the node succ with key(succ) == k is observed on its highest level h, all traversed pred nodes are collected. After reaching the bottom level, every collected next_i(pred) reference is set to next_i(succ), and the memory of succ is freed.

After every update operation, start_level is verified and updated if needed. There are two cases: when adding a node with height h > start_level, start_level is set to h; when removing a node of height start_level, the highest level h such that next_h(head) != tail is found and start_level is set to h.

Note that the traversal algorithm performs O(1) expected steps at each level, and that the number of levels is expected to be logarithmic in the number of nodes; therefore, the skip list has expected logarithmic access time.

The above scheme, up to some small variations, is used in most lock-based concurrent skip lists, and our implementations use it as well. The differences between the implementations ([14], [8], [11]) concern the locking schemes and the state flags devised to preserve consistency, linearizability [10] and the skip list invariants. Lock-free skip lists, in contrast, cannot maintain the skip list invariants: doing so would require multi-location read-and-update atomic operations, which are unsupported on most existing platforms. The lock-free implementations ([5], [9]) use relaxed skip list algorithms, where the question of node existence is answered only at the bottom list level, the other levels are regarded as a sort of index allowing the bottom level to be reached in expected logarithmic time, and the skip list structure may be violated at particular moments of the execution.
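To make the geometric height distribution described above concrete, the following small Java sketch draws a node height in that way. It is only an illustration with p = 2 and an assumed height cap; the thesis implementations use the fast generator from [12] instead.

import java.util.concurrent.ThreadLocalRandom;

final class SkipListLevels {
    static final int MAX_HEIGHT = 32;   // assumed cap on node height
    static final double P = 2.0;        // each extra level is added with probability 1/p

    // Returns a height in [1, MAX_HEIGHT]; height h occurs with probability
    // roughly (1 - 1/p) * (1/p)^(h - 1), so the expected maximal height of
    // an N-node list is O(log N).
    static int randomHeight() {
        int height = 1;                 // every node has level 0
        while (height < MAX_HEIGHT
                && ThreadLocalRandom.current().nextDouble() < 1.0 / P) {
            height++;                   // grow one more level with probability 1/p
        }
        return height;
    }
}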

2. THE FLAT COMBINED SKIP LISTS

All our FC skip list variants are implemented both in Java and in C++ with minimal differences. The C++ implementations require memory management and explicit memory barriers, while in the Java implementations the memory barriers are introduced implicitly through volatile store/load operations. We have chosen to present only the Java implementations in order to avoid memory management issues and to have a clear and standard competitor: all performance comparisons use the Java SDK lock-free ConcurrentSkipListSet [18]. The flat combined skip list implements the simplest integer set interface:

Listing 2.1: Set of Integers Interface

public interface SimpleIntSet {
    /**
     * Add an item to the map.
     * @param key key to add
     * @return true if added, false if the key already exists in the map
     */
    boolean add(int key);

    /**
     * Remove an item from the map.
     * @param key key to remove
     * @return true if removed, false if the key does not exist in the map
     */
    boolean remove(int key);

    /**
     * Verify whether an item is in the map.
     * @param key key to look up
     * @return true if the item exists, false otherwise
     */
    boolean contains(int key);
}

The add and remove methods use the flat combining paradigm, while the contains method is implemented wait-free. The coexistence of flat combining and wait-free methods requires special treatment of the linearization points, since the flat combining data is invisible to the lock-free contains. Define FCData and FCRequest:

Listing 2.2: Flat combining definitions

class FCRequest {
    int key;                        // Key
    boolean response;               // Operation result
    volatile int opcode = NONE;     // Action
}

class FCData {
    public FCRequest requests[];    // Submitted requests
    public AtomicInteger lock;      // FC node lock
}

The FCData may be attached to one or several skip list nodes. The skip list node class is:

Listing 2.3: Node definition

class Link {
    public Link next;
    public Node node;
    public Link up;
    public Link down;
}

class Node {
    public int numlevels() {        // Node height
        return links.length;
    }
    // A node is an FC node when it has FC data
    public boolean isfcnode() {
        return fc_data != null;
    }
    public Link at(int index) {     // Get the link at a level
        return links[index];
    }
    public Link bottom() {          // The bottom link
        return links[0];
    }
    public Link top() {             // The top link
        return links[links.length - 1];
    }
    public final int key;
    public volatile boolean deleted = false;
    public volatile boolean fully_connected = false;
    public FCData fc_data;
    // 2D list of links with random access;
    // a Link contains references to the next, up and down links
    private Link[] links;
}

So far, the skip list is a regular single-threaded one, save for two details: the deleted and fully_connected flags, and the FCData reference (which is non-null for flat combining nodes). The contains method is also very similar to the single-threaded implementation:

Listing 2.4: Wait-free contains is the same for all skip lists

public boolean contains(int inKey) {
    int level = start_level;           // Adaptable start level
    Link pred = head.at(level);
    Link curr = null;

    for (; level >= 0; level--, pred = pred.down) {
        curr = pred.next;
        while (inKey > curr.node.key) {
            pred = curr;
            curr = pred.next;
        }
        if (inKey == curr.node.key)
            return (!curr.node.deleted &&
                    curr.node.fully_connected);
    }
    return false;
}

The only distinguishing detail is the check of the deleted and fully_connected flags. The difference comes with the add and remove implementations. We will present implementations for several flat combined list variants.

2.1 Naive Flat Combined Skip List

The first and simplest implementation is the Naive FC list. It has exactly one combiner node (the head). A thread performing an add or remove action:
1. Puts its FCRequest into the head node's FCData.
2. Tries to acquire the lock.
3. If it succeeded, scans and fulfills the requests.
4. Otherwise, the thread spins on its own request completion flag and checks the lock state. If the request has been fulfilled, the thread returns with the desired result; otherwise, if the lock is unlocked, it continues from step 2.

Listing 2.5 presents the add method implementation.

Listing 2.5: add, Naive implementation

public boolean add(int key) {
    // Put my request into the node's fc_data
    FCRequest my_request =
        head.fc_data.req_ary[ThreadId.getThreadId()];
    my_request.key = key;
    // Volatile write, from here the combiner sees it
    my_request.opcode = ADD;
    AtomicInteger lock = fc_node.fc_data.lock;
    do {
        if (0 == lock.get() &&              // TTAS lock
            lock.compareAndSet(0, 0xFF)) {
            // Perform all found requests
            scanandcombine(fc_node);
            lock.set(0);                    // Unlock
            return my_request.response;
        } else {
            do {
                Thread.yield();             // Give up the processor
                // Somebody did my work
                if (my_request.opcode == NONE)
                    return my_request.response;
            } while (0 != lock.get());
        }
    } while (true);
}

The remove method differs from the above only by the REMOVE opcode. All the work is performed within the scanandcombine method, which is the same for all the following implementations:

Listing 2.6: scanandcombine, common implementation

protected void scanandcombine(Node fc_node) {
    for (FCRequest curr_req : fc_node.fc_data.requests) {
        switch (curr_req.opcode) {
        case ADD:
            curr_req.response = doadd(fc_node, curr_req.key,
                                      curr_req.pred_ary, curr_req.succ_ary);
            curr_req.opcode = NONE;         // Release the waiting thread
            break;
        case REMOVE:
            curr_req.response = doremove(fc_node, curr_req.key,
                                         curr_req.pred_ary, curr_req.succ_ary);
            curr_req.opcode = NONE;         // Release the waiting thread
            break;
        }
    }
}

Here, the combiner thread scans all the requests and performs the modifications. Both the doadd and doremove methods receive containers for the predecessor and successor nodes: a technical detail that allows memory reuse in the case of the Naive list, but which is used in a different way in the other implementations. Besides this, the fc_node parameter indicates the start node for the search; it is not relevant for the single-combiner list, but it is important for the multi-combiner one described below. The doadd/doremove methods act exactly as in the single-threaded skip list:

Listing 2.7: Physical add and remove, Naive implementation

private boolean doadd(Node fc_node, int key,
                      RandomAccessList<Link> pred_ary,
                      RandomAccessList<Link> succ_ary) {
    // The new node height has to be known in advance
    // in order to restrict the nodes collection.
    int top_level = randomLevel();
    // Find the placement and the nodes to connect.
    Node found_node = find(fc_node, key, pred_ary,
                           succ_ary, top_level, true);
    if (found_node == null) {               // Node not in the map
        Node new_node = new Node(key, top_level, false);
        Link new_link = new_node.bottom();
        RandomAccessList<Link>.BiDirIterator prediter = pred_ary.begin();
        RandomAccessList<Link>.BiDirIterator succiter = succ_ary.begin();
        // Connect the new node
        for (int level = 0; level < top_level; ++level,
                new_link = new_link.up) {
            new_link.next = succiter.data;
            prediter.data.next = new_link;
            prediter = prediter.next();
            succiter = succiter.next();
        }
        // Linearization point
        new_node.fully_connected = true;
        return true;
    }
    return false;
}

private boolean doremove(Node fc_node, int key,
                         RandomAccessList<Link> pred_ary,
                         RandomAccessList<Link> succ_ary) {
    // Find the node to delete and its predecessors.
    Node found_node = find(fc_node, key, pred_ary,
                           succ_ary, fc_node.numlevels(), false);

    if (found_node != null) {
        int top_level = found_node.numlevels();
        // Get the link on the top level
        Link lnk = found_node.top();
        // Topmost predecessor
        RandomAccessList<Link>.BiDirIterator prediter = pred_ary.rbegin();
        found_node.deleted = true;          // Logical delete (linearization point)
        for (int level = 0; level < top_level; ++level,
                lnk = lnk.down, prediter = prediter.prev()) {
            // Physical delete
            prediter.data.next = lnk.next;
        }
        return true;
    }
    return false;
}

In this implementation we use the fast random number generator described in [12]; a similar one is adopted in the JDK's lock-free list. Consider the properties of the above skip list implementation.

Property 2.1.1. The Naive skip list is deadlock-free.

Proof. The implementation uses only one lock. Therefore, a deadlock-free implementation of the lock implies deadlock freedom of the data structure.

Property 2.1.2. Naive skip list update operations do not overlap each other and have a strict total order.

Proof. Consider two arbitrary update operations on the list. All modifications are performed by the combiner thread during a combining session (Listing 2.6). The combining sessions are strictly ordered by the single lock and do not overlap, so if the operations belong to different sessions, their order is defined by the lock acquisition order. Otherwise, if the updates belong to the same session, the order is defined by the combining algorithm: the combiner performs the updates sequentially, and no two modifications overlap.

Proposition 2.1.3. The Naive skip list is linearizable.

Proof. Select the linearization points for skip list updates:
For add: the statement in Listing 2.7 where the fully_connected flag is set to true (marked as the linearization point).
For remove: the statement in Listing 2.7 where the deleted flag is set to true (the logical delete).
Use the linearizability of OptimisticSkipList proved in [8]. Note that, by Property 2.1.2, all updates performed on our skip list may be regarded as performed by a single dedicated thread. Therefore, since the initial preconditions are identical for both OptimisticSkipList and the Naive list, and modifications of the next references and of the deleted and fully_connected flags appear in program order exactly as in OptimisticSkipList, the Naive skip list state may be considered exactly equal to that of an OptimisticSkipList in which all modifications are performed by a single thread. Then, for each possible concurrent run on the Naive skip list,

there is a run on OptimisticSkipList in which both skip lists' states, defined by the next references and the flags, are identical at every point in time, and so the OptimisticSkipList linearization order is applicable to the Naive skip list.

As expected, flat combining in this implementation exposes a sequential bottleneck very comparable to a global lock. In Chapter 3 (Performance) this estimation is verified.

2.2 Flat Combined Skip List with Multiple Combiners

The second attempt is the introduction of several combiners, which allows several modifications to be made simultaneously and, therefore, improves scalability. The multi-combiner skip list is implemented with statically distributed immutable combiners. The idea is to divide the skip list into non-intersecting parts, such that every part is managed by some combiner node. The multi-combiner skip list is shown in Figure 2.1.

Fig. 2.1: Multi-combiner skip list. Every node with height >= 3 is a combiner node.

Suppose that we start from an initially filled skip list of size N and have to add c < N combiners. We choose a height h_c such that the number of nodes with height h >= h_c is at least c, and make these nodes combiner nodes by adding FCData to each one. In this work, only static multi-combiner skip lists are studied. Dynamic lists may be devised by altering the h_c value; the process requires consecutive locking of all FC node layers, converting the needed layer to combiners or non-combiners, and re-scheduling all pending combining requests. Since, by its essence, flat combining has to use a very small number of combiners (otherwise it does not differ from a sort of fine-grained synchronization), this process is rare and not expensive.

The multi-combiner skip list acts very similarly to the single-combiner one. As mentioned earlier, the contains method is exactly the same, while the single difference in add/remove is that the requests are placed in the appropriate combiner nodes instead of the head. The updating thread:
1. Finds the combiner node fc_node responsible for the modification area.
2. Puts its FCRequest into fc_node's FCData.
3. Tries to acquire the FCData lock.
4. If it succeeded, scans and fulfills the requests.
5. Otherwise, spins on its own request completion flag and checks the lock state. If the request is fulfilled, it returns with the desired result; otherwise, if the lock is unlocked, it continues from step 3.

Listing 2.8: Multi-combiner remove implementation

public boolean remove(int key) {
    // Get the responsible combiner
    Node fc_node = findcombiner(key);
    // Put my request into the node's fc_data
    FCRequest my_request =
        fc_node.fc_data.req_ary[ThreadId.getThreadId()];
    my_request.key = key;
    // Volatile write, from here the combiner sees it
    my_request.opcode = REMOVE;
    AtomicInteger lock = fc_node.fc_data.lock;
    do {
        // TTAS lock
        if (0 == lock.get() &&
            lock.compareAndSet(0, 0xFF)) {
            // Perform all found requests
            scanandcombine(fc_node);
            // Unlock
            lock.set(0);
            return my_request.response;
        } else {
            do {
                Thread.yield();
                // Somebody did my work
                if (my_request.opcode == NONE)
                    return my_request.response;
            } while (0 != lock.get());
        }
    } while (true);
}

The method findcombiner is wait-free and is implemented similarly to contains. It has three differences:
1. The search goes down only to the lowest combiner level and does not proceed to the bottom.
2. The search returns the lowest combiner predecessor of the key.
3. Since combiners are immutable, there is no need to check their deleted flag.
A hedged sketch of such a traversal is given below. The properties of the multi-combiner skip list are similar to those of the Naive list.
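The thesis does not list findcombiner itself, so the following is only a sketch of one possible shape, reusing the Node and Link fields from Listing 2.3 and the traversal style of Listing 2.4; the combiners_level field (the lowest level still populated by combiner nodes) is an assumption.

// Hypothetical sketch of the wait-free findcombiner described above.
// head, start_level and combiners_level are assumed fields of the list.
Node findcombiner(int key) {
    int level = start_level;
    Link pred = head.at(level);
    while (true) {
        Link curr = pred.next;
        // Advance along the current level; combiner nodes are immutable,
        // so no deleted check is needed here.
        while (key > curr.node.key) {
            pred = curr;
            curr = pred.next;
        }
        if (level == combiners_level)
            return pred.node;       // lowest combiner predecessor of the key
        level--;                    // descend only down to the combiner level,
        pred = pred.down;           // never to the bottom of the list
    }
}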

Property 2.2.1. The multi-combiner skip list is deadlock-free.

Proof. As follows from the algorithm, no thread tries to hold more than one lock at a time; hence a deadlock is impossible.

Practically, the multi-combiner design divides the data structure into a disjoint set of single-combiner Naive lists. Call these lists combining clusters, and call the combiner responsible for a cluster the cluster head. Then the properties of the Naive FC list apply to every combining cluster. Instead of a strict total order, all update operations of the multi-combiner list form a strict partial order, in which operations on different clusters are commutative: they can be reordered without affecting the final state of the data structure.

Proposition 2.2.2. The multi-combiner skip list is linearizable.

Proof. Follows from the linearizability of each cluster and the fact that linearizability is compositional (Theorem 1 of [10]).

The multi-combiner skip list scales much better than the single-combiner one, but it still performs a lot of work sequentially. The next attempt is to reduce this part of the execution by means of a hints mechanism.

2.3 Flat Combined Skip List with Hints

The hints mechanism is inspired by the optimistic skip list [8]. The idea is to collect, in a wait-free optimistic manner, the links that have to be updated, then to acquire the lock, verify (and re-find, if needed) the links and perform the update. Listing 2.9 shows the FCRequest structure supplemented with hints, and the add method.

Listing 2.9: Optimistic (hinted) FCRequest and add implementation

class FCRequest {
    int key;                            // Key
    boolean response;                   // Operation result
    volatile int opcode = NONE;         // Action
    int top_level;                      // Hints size
    RandomAccessList<Link> pred_ary;    // Collected hints
    RandomAccessList<Link> succ_ary;    // Collected hints
}

public boolean add(int key) {
    // Get the responsible combiner
    Node fc_node = findcombiner(key);
    FCRequest my_request =
        fc_node.fc_data.req_ary[ThreadId.getThreadId()];
    // We have to know the level prior to find in order
    // to restrict the hints size
    int top_level = randomLevel();
    Node found_node;
    do {
        // Find the placement and fill the hints data
        found_node = find(fc_node, key, my_request.pred_ary,
                          my_request.succ_ary, top_level, true, true);
    } while (found_node != null && found_node.deleted);
    // Node already exists
    if (found_node != null)
        return false;
    // Put my request into the node's fc_data
    my_request.top_level = top_level;
    my_request.key = key;
    // Volatile write, from here the combiner sees it
    my_request.opcode = ADD;
    AtomicInteger lock = fc_node.fc_data.lock;
    do {
        // TTAS lock
        if (0 == lock.get() &&
            lock.compareAndSet(0, 0xFF)) {
            // Perform all found requests
            scanandcombine(fc_node);
            // Unlock
            lock.set(0);
            return my_request.response;
        } else {
            do {
                Thread.yield();
                // Somebody did my work
                if (my_request.opcode == NONE)
                    return my_request.response;
            } while (0 != lock.get());
        }
    } while (true);
}

The internal doadd and doremove (Listing 2.10) methods are also slightly modified, since we have to verify, and re-fill if needed, the collections of predecessors and successors. The verify method checks that all collected nodes are correct, i.e. they are non-deleted and connected, each predecessor's next reference points to the appropriate successor, and the keys of the collected nodes suit the requested key.

Listing 2.10: Optimistic (hinted) doadd and verify implementation

private boolean doadd(Node fc_node, int key, int top_level,
                      RandomAccessList<Link> pred_ary,
                      RandomAccessList<Link> succ_ary) {
    Node found_node = null;
    // Verify the data and re-fill it if needed
    if (!verify(key, pred_ary, succ_ary, top_level)) {
        found_node = find(fc_node, key, pred_ary,
                          succ_ary, top_level, true, false);
    }
    // From here, as in the Naive list
    ...
}

protected boolean verify(int key,
                         RandomAccessList<Link> predary,
                         RandomAccessList<Link> succary,
                         int top_level) {
    RandomAccessList<Link>.BiDirIterator prediter = predary.begin();
    RandomAccessList<Link>.BiDirIterator succiter = succary.begin();
    for (int ilevel = 0; ilevel < top_level; ++ilevel,
            prediter = prediter.next(), succiter = succiter.next()) {
        Link pred = prediter.data;
        Link next = succiter.data;
        if (pred.node.deleted || next.node.deleted ||
            !pred.node.fully_connected ||
            !next.node.fully_connected ||
            pred.next != next ||
            pred.node.key >= key || next.node.key < key)
            return false;
    }
    return true;
}

Like its predecessors, the hinted skip list is deadlock-free and linearizable. The deadlock freedom is obvious, since this implementation uses exactly the same locking scheme as the previous ones. The linearizability can be derived from the fact that if verify fails, the hints skip list algorithm is identical to the Naive one; otherwise, a successful verify guarantees that the state of all the memory that has to be updated is identical to its state at the time the data was collected, and therefore all preconditions mentioned in the linearizability proof of OptimisticSkipList hold, and the proof applies to the hints skip list as well. The hints mechanism is applied to both the single- and multi-combiner lists. As shown in Chapter 3 (Performance), the optimistic approach is very efficient, especially when the update rate is not high.

3. PERFORMANCE

For the performance verification, we use the skip lists described above and several additional data structures designed to assess the impact of flat combining. The JDK ConcurrentSkipListSet by Doug Lea is used as the main competitor: by now, it is one of the most efficient and scalable skip list implementations. Computations were performed on a Sun SPARC Enterprise T5140 server powered by two UltraSPARC T2 Plus processors. Each processor contains eight cores running eight hardware threads each, which gives 128 hardware threads in total per system. The notation for the benchmarked algorithms is:

FC-Naive-0: Naive FC list with 0 non-head combiners.

FC-Hints-64: hinted FC list with at least 64 non-head combiners; the combiner distribution algorithm was described in Section 2.2.

JDK: JDK ConcurrentSkipListSet (based on ConcurrentSkipListMap).

ML-0, ML-64: multi-lock skip lists with 0 and 64 non-head locks, respectively; a data structure designed to isolate the combining effect from the combiner distribution effect. Essentially, it is the multi-combiner skip list with the FCData structures replaced by simple locks: the updating thread locks the appropriate locking node, makes the update and releases the lock, instead of running the whole combining algorithm.

ML-hints-0, ML-hints-64: multi-lock optimistic skip lists with 0 and 64 non-head locks, respectively, using the hints mechanism exactly as the flat combining lists do.

FC-Ideal-64: an artificial FC list made from the FC list with hints. Here we assume that the hints are always successful, and the combiner's only work is to update the next references. This data structure gives an indication of the maximal FC skip list performance when the combiner fulfills all of its requests sequentially.

Experiments were performed on data structures with a fixed initial size. Before selecting this size, the base skip list implementations were roughly benchmarked over a wide range of sizes, from one hundred to a few million keys. The relations between the run times of the different skip list implementations were very similar across sizes, and therefore any initial size was representative enough to show the qualitative differences between the algorithms.

The access locality factor was introduced to simulate different workloads. Suppose that the experiment is performed for the key space S = {1, 2, ..., N}. The access locality factor k, 1 <= k <= N, is defined in the following way: the keys in the benchmark are uniformly selected from S_k = {t, t+1, ..., t + N/k}, where t is selected uniformly from S at the start of the run and is changed slowly during the execution. An access locality factor of 1 therefore corresponds to uniformly distributed keys from S. Increasing the factor means that the keys are selected from a smaller interval, and so the contention increases.

3.1 Performance Comparison of Flat Combined Skip Lists vs JDK ConcurrentSkipListSet

The first group of benchmarks compares the throughput of the flat combining skip list implementations with that of the JDK ConcurrentSkipListSet. Figure 3.1 presents the benchmark results for Naive flat combining using uniformly distributed keys. The graphs show that the single-combiner implementation fails to compete with the JDK list even for read-dominated loads, while the implementation with 64 combiners shows scalability even for write-only loads. The picture changes dramatically when the workload locality increases. Figure 3.2 depicts the same data structures, where all requests are selected from 1/128 of the total key space. In this case, the Naive FC skip list loses to the JDK one even for read-dominated workloads once the number of running threads grows large enough, and multiple combiners do not help.

The next group of runs deals with the improved optimistic skip list, using the hints mechanism described above in Chapter 2 (The Flat Combined Skip Lists). Figure 3.3 shows the benchmark results for uniformly distributed requests, while Figure 3.4 depicts the runs with high access locality. The presented graphs show a significant performance gain due to the optimistic approach. For read-dominated workloads, both the single- and multi-combiner lists perform better than the JDK one for all workload localities. For higher update rates, the multi-combiner list competes well with the JDK data structure, while the single-combiner one shows a lack of scalability, especially for high access locality. So far, we can conclude that at least the hinted variant of the combining skip list is a simple and effective alternative to the JDK solution.

It is clear that for read-dominated workloads the lock-free list performs worse than lists with lock-protected updates and a lock-free contains. The first reason for the more efficient reads is that the FC lists' contains (Listing 2.4) performs only two volatile reads, while the lock-free implementations require all next references to be volatile and therefore need about log N volatile reads. The second reason is that all known lock-free skip list implementations decide about a node's presence only after reaching the bottom skip list level, whereas our implementation stops as soon as a node with the desired key is found on any level. However, it is not yet clear what impact the combiner mechanism has on the presented results.
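As a hedged illustration of the access-locality workload defined at the beginning of this chapter, the sketch below draws keys from a window of size N/k. It is not the thesis benchmark harness, and the policy by which the window start t drifts is an assumption, since the text only states that t changes slowly during the run.

import java.util.concurrent.ThreadLocalRandom;

// Illustrative key generator for key space S = {1,...,N} with access
// locality factor k: keys are uniform on a window of size N/k starting
// at t, and t moves occasionally (the drift rate here is made up).
final class LocalKeyGenerator {
    private final int n;        // size of the key space S
    private final int window;   // N / k, the locally accessed interval
    private int t;              // current window start

    LocalKeyGenerator(int n, int localityFactor) {
        this.n = n;
        this.window = Math.max(1, n / localityFactor);
        this.t = ThreadLocalRandom.current().nextInt(n);
    }

    int nextKey() {
        // Move the window start from time to time to model the slow change of t.
        if (ThreadLocalRandom.current().nextInt(100_000) == 0)
            t = ThreadLocalRandom.current().nextInt(n);
        return 1 + (t + ThreadLocalRandom.current().nextInt(window)) % n;
    }
}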

Fig. 3.1: Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution

Fig. 3.2: Naive FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality

Fig. 3.3: Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution

Fig. 3.4: Hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality

3.2 Flat Combining Mechanism Experimental Verifications

In this section we experimentally verify in depth the impact of FC on skip list behavior. The first experiments compare the flat combining implementations with a specially designed multi-lock skip list. The multi-lock skip list is derived from the flat combining one by replacing FCData with a simple lock. It has single- and multi-lock implementations, exactly as the FC skip list has, and may be extended with the hints mechanism as well. The add method of the multi-lock skip list with hints is shown in Listing 3.1. The doadd method called after the lock is acquired is identical to the flat combined one presented in Listing 2.10.

Listing 3.1: Optimistic (hinted) multi-lock add method implementation

public boolean add(int key) {
    // Get the responsible lock node
    Node lock_node = findlocknode(key);

    // We have to know the level prior to find in order
    // to restrict the hints size
    int top_level = randomLevel();
    // Thread-local hints lists
    int thread_id = ThreadId.getThreadId();
    RandomAccessList<Link> succ_ary = this.succ_ary[thread_id];
    RandomAccessList<Link> pred_ary = this.pred_ary[thread_id];

    Node found_node;
    do {
        found_node = find(lock_node, key, pred_ary,
                          succ_ary, top_level, true, true);
    } while (found_node != null && found_node.deleted);
    if (found_node != null)
        return false;
    // Acquire the lock and perform the modification
    AtomicInteger lock = lock_node.node_lock;
    do {    // TTAS lock
        if (0 == lock.get() && lock.compareAndSet(0, 0xFF)) {
            doadd(thread_id, lock_node, key, pred_ary,
                  succ_ary, top_level);
            // Release the lock
            lock.set(0);
            return true;
        } else {
            // Give up the processor
            Thread.yield();
        }
    } while (true);
}

Instead of placing a request and running the flat combining algorithm, the updating thread finds the appropriate lock node, acquires the lock and performs the change itself.

The following graphs compare the multi-lock and the Naive FC skip lists. We can see that for both low (Figure 3.5) and high (Figure 3.6) locality, and for any update rate, both lists behave very similarly. The multi-lock skip list even tends to perform slightly better than its FC counterpart for low access locality. This may be explained by the additional overheads that flat combining introduces: the combiner thread has to read and maintain the FC registry and to write back the operation results. All this, if not compensated by the FC gains described above, leads to a performance decrease.

The benchmarks of the hinted versions of the multi-lock and FC skip lists are shown in Figures 3.7 and 3.8 for low and high access locality. The introduction of the hints mechanism improves the performance of both lists, but it does not change the ratio between the algorithms: both behave very similarly, with a slight preference for the multi-lock skip list at low access locality.

As mentioned before, flat combining, besides relieving the contention bottleneck, allows the knowledge about all pending requests to be used for optimizing data structure updates. For tree-like data structures, and for skip lists in particular, the elimination and combining techniques can be applied to optimize the data structure traversal, but it is very hard to use them to optimize the data structure update. For the next group of experiments, we assumed that the traversal is perfectly optimized, i.e. our hints mechanism never fails. In practice, we replaced the verify method in Listing 2.10 with one that always returns true, and supplied every node with additional dummy next references; the combiner, instead of writing to the real next references, updates an equal quantity of dummy ones. These benchmarks are presented in Figures 3.9 and 3.10, and show that the FC skip list with an ideal hints mechanism competes well with the lock-free one, and fails only for high access locality and more than 50% update rate; so verification and improvement of the hints mechanism makes sense.

The next graph (Figure 3.11) shows the efficiency of our hints mechanism. As follows from the graph, the hints are very close to ideal for uniform access and fall to about 50% failures when the number of threads grows to 64. This result explains the scalability turning point between 16 and 32 threads for high access locality and a high update rate. Note that for the ideal hints list the turning point also exists, but it appears slightly later and is not as sharp. So the problematic scalability of the FC list is probably caused by flat combining itself.

Fig. 3.5: FC skip list implementation vs multi-lock one, naive implementations, uniform keys distribution

Fig. 3.6: FC skip list implementation vs multi-lock one, naive implementations, high access locality

Fig. 3.7: FC skip list implementation vs multi-lock one, hints implementations, uniform keys distribution

Fig. 3.8: FC skip list implementation vs multi-lock one, hints implementations, high access locality

Fig. 3.9: Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, uniform keys distribution

Fig. 3.10: Ideal hints FC skip list implementation vs JDK lock-free ConcurrentSkipListSet, high access locality

Fig. 3.11: Hints mechanism success rate for pure update workloads

Fig. 3.12: The connection between FC intensity and throughput per thread for pure update workloads

The next two benchmarks, performed for a pure update workload, are intended to answer the question of why the lock-free list scales better than the FC one. To estimate the flat combining load, we introduce the FC intensity, a factor showing the additional combiner work. It is calculated in the following way:

    FC intensity = (fulfilled requests per FC session - 1) / (number of threads)

This number is 0 for a single-threaded execution, and it tends to 1 for a large number of threads, when one combiner fulfills the requests of all the other threads. Figure 3.12 shows the FC intensity together with the throughput per thread for different numbers of combiners and workload localities; an increase in FC intensity is accompanied by a decrease in throughput (note that ideal scalability is a horizontal line). The jump in intensity between 16 and 32 threads corresponds well with the graphs in Figures 3.3 and 3.4 for the 50% add / 50% remove workload. The jump may be explained in the following way: starting from some number of threads, the combiner has no time to complete all the requests during the period in which a released thread prepares its next request, and so the competition for the lock never ceases. On the other hand, for 64 combiners and low locality the jump does not happen, and the algorithm is scalable.

Figure 3.13 shows the lock-free list statistics for a pure update workload. As follows from the graphs, the CAS success rate never drops below 75% and the number of CAS operations per update remains small, which explains the good scalability of the algorithm.

Fig. 3.13: Lock-free skip list CAS per update, CAS success rate and throughput per thread for pure update workloads

4. CONCLUSIONS

We studied several approaches to applying the flat combining technique to skip-list-based maps. As shown on the skip list example, for structures that allow concurrent updates, fine-grained and especially lock-free synchronization are preferable to FC. This conclusion does not completely rule out the usefulness of FC for such structures, since for read-dominated workloads and for several update request distributions flat combining behaves better than lock-free synchronization. It is also possible that on different hardware the FC approach will show better scalability. A breakthrough could also come from improvements to the FC algorithm. It is possible, for example, to transform FC into a sort of job dispatcher: having all the requests, it can form mutually non-conflicting groups, so that the waiting threads can execute them without synchronization. Such a design faces the problem of additional FC overhead for sorting and analyzing the requests, but it may be applicable to NUMA or client-server architectures. It would also be interesting to study FC implementations for other popular data structures, such as B-trees or red-black trees, where lock-free alternatives do not exist and fine-grained locking requires complicated read-write locks. FC's benefits of simplicity and provable linearizability may be valuable in these cases. Another, albeit auxiliary, data structure, the multi-lock skip list, may be interesting in itself. It showed characteristics as good as the FC skip list, but it is simpler, needs less memory and gives more uniform latency for update requests. The idea of building a small index protected by locks (locked or FC layers) on top of an entirely wait-free data structure body can replace hand-over-hand fine-grained synchronization schemes for tree-like structures.

BIBLIOGRAPHY

[1] Adelson-Velskii, G. M., and Landis, E. M. An algorithm for the organization of information. Soviet Math. Doklady 3 (1962).
[2] Bayer, R., and McCreight, E. Organization and maintenance of large ordered indices. In SIGFIDET '70: Proceedings of the 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (New York, NY, USA, 1970), ACM.
[3] Colvin, R., Groves, L., Luchangco, V., and Moir, M. Formal verification of a lazy concurrent list-based set algorithm. In CAV (2006).
[4] Doherty, S., Groves, L., Luchangco, V., and Moir, M. Formal verification of a practical lock-free queue algorithm. In FORTE (2004), Springer.
[5] Fraser, K. Practical lock freedom. PhD thesis, Cambridge University Computer Laboratory. Also available as Technical Report UCAM-CL-TR-579.
[6] Guibas, L. J., and Sedgewick, R. A dichromatic framework for balanced trees. In SFCS '78: Proceedings of the 19th Annual Symposium on Foundations of Computer Science (Washington, DC, USA, 1978), IEEE Computer Society.
[7] Hendler, D., Incze, I., Shavit, N., and Tzafrir, M. Flat combining and the synchronization-parallelism tradeoff. In SPAA (2010).
[8] Herlihy, M., Lev, Y., Luchangco, V., and Shavit, N. A simple optimistic skiplist algorithm. In SIROCCO '07: Proceedings of the 14th International Conference on Structural Information and Communication Complexity (Berlin, Heidelberg, 2007), Springer-Verlag.
[9] Herlihy, M., and Shavit, N. The Art of Multiprocessor Programming. Morgan Kaufmann.
[10] Herlihy, M. P., and Wing, J. M. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems 12 (1990).
[11] Lotan, I., and Shavit, N. Skiplist-based concurrent priority queues. In Proc. of the 14th International Parallel and Distributed Processing Symposium (IPDPS) (2000).


More information

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems

Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Optimization of thread affinity and memory affinity for remote core locking synchronization in multithreaded programs for multicore computer systems Alexey Paznikov Saint Petersburg Electrotechnical University

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

Unit 6: Indeterminate Computation

Unit 6: Indeterminate Computation Unit 6: Indeterminate Computation Martha A. Kim October 6, 2013 Introduction Until now, we have considered parallelizations of sequential programs. The parallelizations were deemed safe if the parallel

More information

Flat Parallelization. V. Aksenov, ITMO University P. Kuznetsov, ParisTech. July 4, / 53

Flat Parallelization. V. Aksenov, ITMO University P. Kuznetsov, ParisTech. July 4, / 53 Flat Parallelization V. Aksenov, ITMO University P. Kuznetsov, ParisTech July 4, 2017 1 / 53 Outline Flat-combining PRAM and Flat parallelization PRAM binary heap with Flat parallelization ExtractMin Insert

More information

Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors

Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors Hoard: A Fast, Scalable, and Memory-Efficient Allocator for Shared-Memory Multiprocessors Emery D. Berger Robert D. Blumofe femery,rdbg@cs.utexas.edu Department of Computer Sciences The University of Texas

More information

arxiv: v3 [cs.ds] 3 Apr 2018

arxiv: v3 [cs.ds] 3 Apr 2018 arxiv:1711.07746v3 [cs.ds] 3 Apr 2018 The Hidden Binary Search Tree: A Balanced Rotation-Free Search Tree in the AVL RAM Model Saulo Queiroz Email: sauloqueiroz@utfpr.edu.br Academic Department of Informatics

More information

CS 351 Design of Large Programs Programming Abstractions

CS 351 Design of Large Programs Programming Abstractions CS 351 Design of Large Programs Programming Abstractions Brooke Chenoweth University of New Mexico Spring 2019 Searching for the Right Abstraction The language we speak relates to the way we think. The

More information

Scalable Producer-Consumer Pools based on Elimination-Diraction Trees

Scalable Producer-Consumer Pools based on Elimination-Diraction Trees Scalable Producer-Consumer Pools based on Elimination-Diraction Trees Yehuda Afek, Guy Korland, Maria Natanzon, and Nir Shavit Computer Science Department Tel-Aviv University, Israel, contact email: guy.korland@cs.tau.ac.il

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential

More information

Flat Combining and the Synchronization-Parallelism Tradeoff

Flat Combining and the Synchronization-Parallelism Tradeoff Flat Combining and the Synchronization-Parallelism Tradeoff Danny Hendler Ben-Gurion University hendlerd@cs.bgu.ac.il Itai Incze Tel-Aviv University itai.in@gmail.com Moran Tzafrir Tel-Aviv University

More information

Locking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming

Locking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming Locking Granularity CS 475, Spring 2019 Concurrent & Distributed Systems With material from Herlihy & Shavit, Art of Multiprocessor Programming Discussion: HW1 Part 4 addtolist(key1, newvalue) Thread 1

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

Design Tradeoffs in Modern Software Transactional Memory Systems

Design Tradeoffs in Modern Software Transactional Memory Systems Design Tradeoffs in Modern Software al Memory Systems Virendra J. Marathe, William N. Scherer III, and Michael L. Scott Department of Computer Science University of Rochester Rochester, NY 14627-226 {vmarathe,

More information

Advanced Multiprocessor Programming Project Topics and Requirements

Advanced Multiprocessor Programming Project Topics and Requirements Advanced Multiprocessor Programming Project Topics and Requirements Jesper Larsson Trä TU Wien May 5th, 2017 J. L. Trä AMP SS17, Projects 1 / 21 Projects Goal: Get practical, own experience with concurrent

More information

Concurrent Data Structures Concurrent Algorithms 2016

Concurrent Data Structures Concurrent Algorithms 2016 Concurrent Data Structures Concurrent Algorithms 2016 Tudor David (based on slides by Vasileios Trigonakis) Tudor David 11.2016 1 Data Structures (DSs) Constructs for efficiently storing and retrieving

More information

Parallel linked lists

Parallel linked lists Parallel linked lists Lecture 10 of TDA384/DIT391 (Principles of Conent Programming) Carlo A. Furia Chalmers University of Technology University of Gothenburg SP3 2017/2018 Today s menu The burden of locking

More information

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Philip A. Bernstein Microsoft Research Redmond, WA, USA phil.bernstein@microsoft.com Sudipto Das Microsoft Research

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Multiprocessor scheduling

Multiprocessor scheduling Chapter 10 Multiprocessor scheduling When a computer system contains multiple processors, a few new issues arise. Multiprocessor systems can be categorized into the following: Loosely coupled or distributed.

More information

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,

More information

Modern High-Performance Locking

Modern High-Performance Locking Modern High-Performance Locking Nir Shavit Slides based in part on The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Locks (Mutual Exclusion) public interface Lock { public void lock();

More information

Linearizable Iterators

Linearizable Iterators Linearizable Iterators Supervised by Maurice Herlihy Abstract Petrank et. al. [5] provide a construction of lock-free, linearizable iterators for lock-free linked lists. We consider the problem of extending

More information

Lightweight Remote Procedure Call

Lightweight Remote Procedure Call Lightweight Remote Procedure Call Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy ACM Transactions Vol. 8, No. 1, February 1990, pp. 37-55 presented by Ian Dees for PSU CS533, Jonathan

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract A simple correctness proof of the MCS contention-free lock Theodore Johnson Krishna Harathi Computer and Information Sciences Department University of Florida Abstract Mellor-Crummey and Scott present

More information

Deterministic Jumplists

Deterministic Jumplists Nordic Journal of Computing Deterministic Jumplists Amr Elmasry Department of Computer Engineering and Systems Alexandria University, Egypt elmasry@alexeng.edu.eg Abstract. We give a deterministic version

More information

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree Chapter 4 Trees 2 Introduction for large input, even access time may be prohibitive we need data structures that exhibit running times closer to O(log N) binary search tree 3 Terminology recursive definition

More information

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency

More information

Linked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Linked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Linked Lists: Locking, Lock- Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Last Lecture: Spin-Locks CS. spin lock critical section Resets lock

More information

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating

More information

A Non-Blocking Concurrent Queue Algorithm

A Non-Blocking Concurrent Queue Algorithm A Non-Blocking Concurrent Queue Algorithm Bruno Didot bruno.didot@epfl.ch June 2012 Abstract This report presents a new non-blocking concurrent FIFO queue backed by an unrolled linked list. Enqueue and

More information

Database Management and Tuning

Database Management and Tuning Database Management and Tuning Concurrency Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 8 May 10, 2012 Acknowledgements: The slides are provided by Nikolaus

More information

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Anders Gidenstam Håkan Sundell Philippas Tsigas School of business and informatics University of Borås Distributed

More information

Conflict Detection and Validation Strategies for Software Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Lecture 5 Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Reading: Randomized Search Trees by Aragon & Seidel, Algorithmica 1996, http://sims.berkeley.edu/~aragon/pubs/rst96.pdf;

More information

Dynamic Memory Allocation. Gerson Robboy Portland State University. class20.ppt

Dynamic Memory Allocation. Gerson Robboy Portland State University. class20.ppt Dynamic Memory Allocation Gerson Robboy Portland State University class20.ppt Harsh Reality Memory is not unbounded It must be allocated and managed Many applications are memory dominated Especially those

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Last Lecture: Spin-Locks

Last Lecture: Spin-Locks Linked Lists: Locking, Lock- Free, and Beyond Last Lecture: Spin-Locks. spin lock CS critical section Resets lock upon exit Today: Concurrent Objects Adding threads should not lower throughput Contention

More information

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors

Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,

More information

Concurrent Objects. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Concurrent Objects. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Companion slides for The by Maurice Herlihy & Nir Shavit Concurrent Computation memory object object 2 Objectivism What is a concurrent object? How do we describe one? How do we implement

More information

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap Håkan Sundell Philippas Tsigas OPODIS 2004: The 8th International Conference on Principles of Distributed Systems

More information

Using RDMA for Lock Management

Using RDMA for Lock Management Using RDMA for Lock Management Yeounoh Chung Erfan Zamanian {yeounoh, erfanz}@cs.brown.edu Supervised by: John Meehan Stan Zdonik {john, sbz}@cs.brown.edu Abstract arxiv:1507.03274v2 [cs.dc] 20 Jul 2015

More information

The New Java Technology Memory Model

The New Java Technology Memory Model The New Java Technology Memory Model java.sun.com/javaone/sf Jeremy Manson and William Pugh http://www.cs.umd.edu/~pugh 1 Audience Assume you are familiar with basics of Java technology-based threads (

More information

15 418/618 Project Final Report Concurrent Lock free BST

15 418/618 Project Final Report Concurrent Lock free BST 15 418/618 Project Final Report Concurrent Lock free BST Names: Swapnil Pimpale, Romit Kudtarkar AndrewID: spimpale, rkudtark 1.0 SUMMARY We implemented two concurrent binary search trees (BSTs): a fine

More information

The Relative Power of Synchronization Methods

The Relative Power of Synchronization Methods Chapter 5 The Relative Power of Synchronization Methods So far, we have been addressing questions of the form: Given objects X and Y, is there a wait-free implementation of X from one or more instances

More information

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014 Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 8 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus

More information

Understanding Task Scheduling Algorithms. Kenjiro Taura

Understanding Task Scheduling Algorithms. Kenjiro Taura Understanding Task Scheduling Algorithms Kenjiro Taura 1 / 48 Contents 1 Introduction 2 Work stealing scheduler 3 Analyzing execution time of work stealing 4 Analyzing cache misses of work stealing 5 Summary

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

Algorithm 23 works. Instead of a spanning tree, one can use routing.

Algorithm 23 works. Instead of a spanning tree, one can use routing. Chapter 5 Shared Objects 5.1 Introduction Assume that there is a common resource (e.g. a common variable or data structure), which different nodes in a network need to access from time to time. If the

More information

Lecture 13: AVL Trees and Binary Heaps

Lecture 13: AVL Trees and Binary Heaps Data Structures Brett Bernstein Lecture 13: AVL Trees and Binary Heaps Review Exercises 1. ( ) Interview question: Given an array show how to shue it randomly so that any possible reordering is equally

More information

Implementation of Process Networks in Java

Implementation of Process Networks in Java Implementation of Process Networks in Java Richard S, Stevens 1, Marlene Wan, Peggy Laramie, Thomas M. Parks, Edward A. Lee DRAFT: 10 July 1997 Abstract A process network, as described by G. Kahn, is a

More information

Revisiting the Combining Synchronization Technique

Revisiting the Combining Synchronization Technique Revisiting the Combining Synchronization Technique Panagiota Fatourou Department of Computer Science University of Crete & FORTH ICS faturu@csd.uoc.gr Nikolaos D. Kallimanis Department of Computer Science

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

The Prioritized and Distributed Synchronization in Distributed Groups

The Prioritized and Distributed Synchronization in Distributed Groups The Prioritized and Distributed Synchronization in Distributed Groups Michel Trehel and hmed Housni Université de Franche Comté, 16, route de Gray 25000 Besançon France, {trehel, housni} @lifc.univ-fcomte.fr

More information

[ DATA STRUCTURES ] Fig. (1) : A Tree

[ DATA STRUCTURES ] Fig. (1) : A Tree [ DATA STRUCTURES ] Chapter - 07 : Trees A Tree is a non-linear data structure in which items are arranged in a sorted sequence. It is used to represent hierarchical relationship existing amongst several

More information

Introduction to OS Synchronization MOS 2.3

Introduction to OS Synchronization MOS 2.3 Introduction to OS Synchronization MOS 2.3 Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Introduction to OS 1 Challenge How can we help processes synchronize with each other? E.g., how

More information

How invariants help writing loops Author: Sander Kooijmans Document version: 1.0

How invariants help writing loops Author: Sander Kooijmans Document version: 1.0 How invariants help writing loops Author: Sander Kooijmans Document version: 1.0 Why this document? Did you ever feel frustrated because of a nasty bug in your code? Did you spend hours looking at the

More information

Efficient pebbling for list traversal synopses

Efficient pebbling for list traversal synopses Efficient pebbling for list traversal synopses Yossi Matias Ely Porat Tel Aviv University Bar-Ilan University & Tel Aviv University Abstract 1 Introduction 1.1 Applications Consider a program P running

More information

Stretch-Optimal Scheduling for On-Demand Data Broadcasts

Stretch-Optimal Scheduling for On-Demand Data Broadcasts Stretch-Optimal Scheduling for On-Demand Data roadcasts Yiqiong Wu and Guohong Cao Department of Computer Science & Engineering The Pennsylvania State University, University Park, PA 6 E-mail: fywu,gcaog@cse.psu.edu

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

Programming Paradigms for Concurrency Lecture 6 Synchronization of Concurrent Objects

Programming Paradigms for Concurrency Lecture 6 Synchronization of Concurrent Objects Programming Paradigms for Concurrency Lecture 6 Synchronization of Concurrent Objects Based on companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Thomas

More information

Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL

Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL Whatever can go wrong will go wrong. attributed to Edward A. Murphy Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL 251 Literature Maurice Herlihy and Nir Shavit. The Art of Multiprocessor

More information

Notes on Binary Dumbbell Trees

Notes on Binary Dumbbell Trees Notes on Binary Dumbbell Trees Michiel Smid March 23, 2012 Abstract Dumbbell trees were introduced in [1]. A detailed description of non-binary dumbbell trees appears in Chapter 11 of [3]. These notes

More information

Coarse-grained and fine-grained locking Niklas Fors

Coarse-grained and fine-grained locking Niklas Fors Coarse-grained and fine-grained locking Niklas Fors 2013-12-05 Slides borrowed from: http://cs.brown.edu/courses/cs176course_information.shtml Art of Multiprocessor Programming 1 Topics discussed Coarse-grained

More information

CSE 374 Programming Concepts & Tools

CSE 374 Programming Concepts & Tools CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 22 Shared-Memory Concurrency 1 Administrivia HW7 due Thursday night, 11 pm (+ late days if you still have any & want to use them) Course

More information