A Skiplist-based Concurrent Priority Queue with Minimal Memory Contention

Jonatan Lindén and Bengt Jonsson, Uppsala University, Sweden
December 18, 2013

Contributions
Motivation: improve the performance of a concurrent Discrete Event Simulator.
Outcome: a new lock-free, skiplist-based priority queue with a new representation for logically deleted nodes, which minimizes contention. The queue is linearizable and improves performance over existing algorithms by 30-80% on multiprocessors.

Outline
Contributions; Background (priority queues, skiplists, the standard solution to increase concurrency); The problem: contention; Our algorithm; Correctness; Evaluation.

Priority Queues
A priority queue is a set of (key, value) pairs with two operations: Insert(key, value), and DeleteMin(), which removes and returns the pair with the smallest key.
Applications: Discrete Event Simulation, numerical algorithms.
Implementations: traditionally built on top of heaps or tree data structures. Skiplists [Pugh:1990] have been used for several parallel implementations.
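To make the two-operation interface concrete, here is a minimal sequential sketch backed by a binary heap (Python's standard `heapq` module), i.e. the traditional heap-based implementation mentioned above. The class name `HeapPQ` is hypothetical, and there is no concurrency control at all.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class HeapPQ:
    """Sequential priority queue of (key, value) pairs (hypothetical name)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        # Insertion counter breaks ties, so values never have to be comparable.
        self._seq = itertools.count()

    def insert(self, key: float, value: Any) -> None:
        heapq.heappush(self._heap, (key, next(self._seq), value))

    def delete_min(self) -> Optional[Tuple[float, Any]]:
        """Remove and return the pair with the smallest key, or None if empty."""
        if not self._heap:
            return None
        key, _, value = heapq.heappop(self._heap)
        return key, value


pq = HeapPQ()
for k, v in [(5, "e"), (1, "a"), (3, "c")]:
    pq.insert(k, v)
print(pq.delete_min())  # (1, 'a')
print(pq.delete_min())  # (3, 'c')
```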

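Skiplist
A skiplist is a layered linked list. The lowest-level list is ordered and defines the logical state; the higher-level lists are shortcuts, which give a probabilistic guarantee of logarithmic search time. The smallest element is at the beginning of the lowest-level list, which is the entry point for DeleteMin.
[Figure: a skiplist between a head node H and a tail node T, illustrating Insert(4).]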
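A sequential sketch of a skiplist used as a priority queue, assuming Pugh-style random tower heights (each extra level with probability 1/2) and a cap of 16 levels; both parameters, and all names, are assumptions of this sketch. It is not thread-safe. It shows why DeleteMin is cheap: the minimum is always the first node of the lowest-level list, and it is reachable directly from the head at every level it occupies.

```python
import random
from typing import Any, List, Optional, Tuple

MAX_LEVEL = 16  # assumed cap on tower height


class Node:
    def __init__(self, key: float, value: Any, height: int) -> None:
        self.key = key
        self.value = value
        self.next: List[Optional["Node"]] = [None] * height


class SkipListPQ:
    """Sequential skiplist priority queue (illustrative sketch, not thread-safe)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        # Head sentinel H has a full-height tower; None plays the role of tail T.
        self.head = Node(float("-inf"), None, MAX_LEVEL)

    def _random_height(self) -> int:
        h = 1
        while h < MAX_LEVEL and self.rng.random() < 0.5:
            h += 1
        return h

    def insert(self, key: float, value: Any) -> None:
        preds = [self.head] * MAX_LEVEL
        x = self.head
        # Search top-down: the higher levels act as shortcuts.
        for lvl in reversed(range(MAX_LEVEL)):
            while x.next[lvl] is not None and x.next[lvl].key < key:
                x = x.next[lvl]
            preds[lvl] = x
        node = Node(key, value, self._random_height())
        for lvl in range(len(node.next)):
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node

    def delete_min(self) -> Optional[Tuple[float, Any]]:
        # The minimum is the first node of the lowest-level list.
        first = self.head.next[0]
        if first is None:
            return None
        # At every level it occupies, nothing precedes it, so unlink it at the head.
        for lvl in range(len(first.next)):
            self.head.next[lvl] = first.next[lvl]
        return first.key, first.value


pq = SkipListPQ(seed=1)
for k in [4, 2, 7, 1]:
    pq.insert(k, str(k))
print([pq.delete_min()[0] for _ in range(4)])  # [1, 2, 4, 7]
```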

Concurrent skiplist-based priority queues
Skiplists are easy to make concurrent, and they scale well when concurrent threads access different parts of the structure. The bottleneck in a priority queue is that concurrent DeleteMin operations all try to remove the same element.

Standard solution
The standard solution to increase concurrency splits deletion into two steps: logical deletion sets a delete flag, and physical deletion unlinks the node after the logical deletion has succeeded.
To make this lock-free, all writes are performed using Compare-and-Swap (CAS), and the delete flag is colocated with each next pointer, e.g., in its lowest-order bit [Harris 2001], which makes physical deletion safe.
[Figure: a list between head H and tail T, showing the delete flag.]
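A minimal sketch of the standard scheme on the lowest-level list only. `AtomicMarkableRef` is a hypothetical Python stand-in (its `cas` argument order mirrors Java's `AtomicMarkableReference.compareAndSet`) for a pointer whose lowest-order bit holds the delete flag; a lock emulates the atomicity of a hardware CAS. As in [Harris 2001], the flag in a node's own next pointer means that this node is deleted. Insert and traversal-time helping are omitted.

```python
import threading
from typing import Any, Optional, Tuple


class AtomicMarkableRef:
    """(reference, mark) pair updated as one unit. A lock stands in for a CAS."""

    def __init__(self, ref: Any = None, mark: bool = False) -> None:
        self._pair = (ref, mark)
        self._lock = threading.Lock()

    def get(self) -> Tuple[Any, bool]:
        return self._pair

    def cas(self, exp_ref: Any, new_ref: Any, exp_mark: bool, new_mark: bool) -> bool:
        with self._lock:
            ref, mark = self._pair
            if ref is exp_ref and mark == exp_mark:
                self._pair = (new_ref, new_mark)
                return True
            return False


class Node:
    def __init__(self, key: float, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = AtomicMarkableRef(nxt)  # flag here = THIS node is deleted


def delete_min_standard(head: Node) -> Optional[float]:
    """Logical deletion of the first node, then its physical unlinking."""
    while True:
        first, _ = head.next.get()
        if first is None:
            return None
        succ, deleted = first.next.get()
        if deleted:
            # Already logically deleted by someone else: help unlink, then retry.
            head.next.cas(first, succ, False, False)
            continue
        # 1) Logical deletion: set the flag in first's own next pointer.
        if first.next.cas(succ, succ, False, True):
            # 2) Physical deletion: a second CAS unlinks first from the head.
            head.next.cas(first, succ, False, False)
            return first.key


head = Node(float("-inf"), Node(1, Node(2, Node(3))))  # H -> 1 -> 2 -> 3 -> T
print(delete_min_standard(head), delete_min_standard(head))  # 1 2
```

In this sketch every successful DeleteMin performs two CASes on locations at the very front of the list (`first.next` and `head.next`), which are the locations concurrent DeleteMins compete for.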

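Contention
The bottleneck is the CAS in DeleteMin. There are several types of contention: (i) multiple CASes compete for the same locations near the head of the list; (ii) updates must be propagated to reads, so concurrent operations that read these locations, e.g., an Insert(6), are serialized behind them.
[Figure: a list between head H and tail T, with the locations modified by CAS and a serialized Insert(6).]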

Our algorithm

Our solution
Key idea: no physical deletion after logical deletion! Instead, nodes are deleted in batches, by updating the pointers in the head node. This requires that the logically deleted nodes form a prefix of the list.
With the standard representation this does not work: a concurrent Insert(1) is linked in front of the deleted nodes, whereas to preserve the prefix it should be inserted after them.
[Figure: Insert(1) on a list between head H and tail T, showing where the new node should be inserted.]
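A sequential sketch (all names hypothetical) of why batch deletion needs the prefix property and why the standard flag placement cannot guarantee it. DeleteMin here only sets flags, with no physical deletion. A key-ordered Insert(1) only needs its predecessor to be undeleted, and that predecessor is the head, which is never deleted, so the new node lands in front of the deleted nodes.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = nxt
        self.deleted = False  # standard scheme: the flag belongs to the node itself


def to_list(head: Node) -> List[Tuple[int, bool]]:
    out, x = [], head.next
    while x is not None:
        out.append((x.key, x.deleted))
        x = x.next
    return out


def deleted_form_prefix(head: Node) -> bool:
    """True iff every deleted node precedes every live node."""
    seen_live = False
    x = head.next
    while x is not None:
        if x.deleted and seen_live:
            return False
        seen_live = seen_live or not x.deleted
        x = x.next
    return True


def insert_by_key(head: Node, key: int) -> None:
    # Standard placement: after the last node with a smaller key. The head's
    # own flag is clear, so a CAS on head.next would succeed even though the
    # nodes after it are logically deleted.
    x = head
    while x.next is not None and x.next.key < key:
        x = x.next
    x.next = Node(key, x.next)


# H -> 2 -> 3 -> 5 -> T; two DeleteMins have only set flags on 2 and 3.
head = Node(-1, Node(2, Node(3, Node(5))))
head.next.deleted = True
head.next.next.deleted = True
insert_by_key(head, 1)  # Insert(1) lands in front of the deleted nodes
print(to_list(head))  # [(1, False), (2, True), (3, True), (5, False)]
print(deleted_form_prefix(head))  # False
```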

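Our solution
To guarantee the prefix property, store the delete flag together with the predecessor's next pointer, i.e., a node is marked as deleted by a flag in the pointer that leads to it. After a concurrent Insert(1), the deleted nodes are still a prefix!
[Figure: Insert(1) on a list between head H and tail T.]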
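A single-threaded sketch of the new representation on the lowest-level list. The assumptions are this sketch's own: `MarkedRef` is a hypothetical stand-in for a pointer whose low bit holds the flag (a lock emulates an atomic CAS), and head and tail are sentinels with keys -inf and +inf. The flag in `x.next` now means that x's successor is deleted. DeleteMin walks over marked pointers and marks the first unmarked one. Insert skips the whole deleted prefix before comparing keys, and its CAS expects an unmarked pointer, so it never links a node inside the prefix.

```python
import threading
from typing import List, Optional, Tuple


class MarkedRef:
    """(successor, flag) pair updated atomically; the flag means the node this
    pointer LEADS TO is deleted. A lock stands in for a hardware CAS."""

    def __init__(self, ref: Optional["Node"] = None, mark: bool = False) -> None:
        self._pair = (ref, mark)
        self._lock = threading.Lock()

    def get(self) -> Tuple[Optional["Node"], bool]:
        return self._pair

    def cas(self, exp: tuple, new: tuple) -> bool:
        with self._lock:
            if self._pair[0] is exp[0] and self._pair[1] == exp[1]:
                self._pair = new
                return True
            return False


class Node:
    def __init__(self, key: float) -> None:
        self.key = key
        self.next = MarkedRef()


def delete_min(head: Node, tail: Node) -> Optional[float]:
    x = head
    while True:
        succ, deleted = x.next.get()
        if succ is tail:
            return None  # every node in the list is logically deleted
        if deleted:
            x = succ  # walk over the deleted prefix
        elif x.next.cas((succ, False), (succ, True)):
            return succ.key  # logically deleted the first live node
        # otherwise the CAS lost a race: re-read x.next and try again


def insert(head: Node, tail: Node, key: float) -> None:
    node = Node(key)
    while True:
        x = head
        succ, deleted = x.next.get()
        # Skip the deleted prefix, then all live nodes with smaller keys.
        while succ is not tail and (deleted or succ.key < key):
            x = succ
            succ, deleted = x.next.get()
        node.next = MarkedRef(succ)
        # Fails if x.next changed, e.g. a DeleteMin just marked succ as deleted.
        if x.next.cas((succ, False), (node, False)):
            return


def live_keys(head: Node, tail: Node) -> List[float]:
    keys, x = [], head
    while True:
        succ, deleted = x.next.get()
        if succ is tail:
            return keys
        if not deleted:
            keys.append(succ.key)
        x = succ


head, tail = Node(float("-inf")), Node(float("inf"))
head.next = MarkedRef(tail)
for k in (2, 3, 5):
    insert(head, tail, k)
print(delete_min(head, tail), delete_min(head, tail))  # 2 3
insert(head, tail, 1)  # lands after the deleted nodes 2 and 3
print(live_keys(head, tail))  # [1, 5]
```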

Our solution
Resolving conflicts between Insert and DeleteMin.
[Figure: animation of a concurrent Insert(1) and DeleteMin on a list between head H and tail T.]
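An illustrative scenario (not a transcription of the figure) under the same assumed pointer-plus-flag model, with the steps of both operations executed by hand in one thread. Insert(1) decides to link its node between H and 2, but a DeleteMin marks the pointer to 2 first. Because Insert's CAS expects an unmarked pointer, it fails, and the retried search links 1 after the now-deleted 2. `MarkedRef` and `Node` are hypothetical names.

```python
import threading
from typing import Optional, Tuple


class MarkedRef:
    """(successor, flag) pair; flag = the successor is deleted. Lock models CAS."""

    def __init__(self, ref: Optional["Node"] = None, mark: bool = False) -> None:
        self._pair = (ref, mark)
        self._lock = threading.Lock()

    def get(self) -> Tuple[Optional["Node"], bool]:
        return self._pair

    def cas(self, exp: tuple, new: tuple) -> bool:
        with self._lock:
            if self._pair[0] is exp[0] and self._pair[1] == exp[1]:
                self._pair = new
                return True
            return False


class Node:
    def __init__(self, key: float, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = MarkedRef(nxt)


tail = Node(float("inf"))
five = Node(5, tail)
two = Node(2, five)
head = Node(float("-inf"), two)  # H -> 2 -> 5 -> T, nothing deleted yet

# Insert(1) searches: the first live node (2) is larger, so it plans to link
# the new node between H and 2, expecting an UNMARKED pointer to 2.
expected = head.next.get()  # (two, False)
one = Node(1, two)

# A DeleteMin runs in between and logically deletes 2 by marking H's pointer.
print(head.next.cas((two, False), (two, True)))  # True

# Insert(1)'s CAS fails, because the pointer to 2 now carries the delete flag...
print(head.next.cas(expected, (one, False)))  # False

# ...so it retries: the new search walks past the deleted 2 and links 1 after it.
one.next = MarkedRef(five)
print(two.next.cas((five, False), (one, False)))  # True
# Result: H -> 2 (deleted) -> 1 -> 5 -> T; the deleted nodes are still a prefix.
```

If Insert's CAS had won the race instead, DeleteMin's CAS on `head.next` would have failed, and its retry would have deleted the new node 1, which is then the minimum.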

Physical batch deletion
Physical batch deletion updates the pointers in the head node. It is done by DeleteMin when the prefix of deleted nodes exceeds a threshold.
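A sketch of batch physical deletion under the same assumed pointer-plus-flag model. It works on the lowest level only; the head's upper-level pointers, which the slide says are also updated, are omitted. `THRESHOLD = 2` is an arbitrary demo value. One design choice is assumed here rather than taken from the slides: the CAS on the head keeps the newest deleted node in front instead of skipping it, because a concurrent Insert may be linking a node right after the last deleted node, and unlinking that node could lose the insertion.

```python
import threading
from typing import List, Optional, Tuple

THRESHOLD = 2  # hypothetical; the real threshold is a tuning parameter


class MarkedRef:
    """(successor, flag) pair; flag = the successor is deleted. Lock models CAS."""

    def __init__(self, ref: Optional["Node"] = None, mark: bool = False) -> None:
        self._pair = (ref, mark)
        self._lock = threading.Lock()

    def get(self) -> Tuple[Optional["Node"], bool]:
        return self._pair

    def cas(self, exp: tuple, new: tuple) -> bool:
        with self._lock:
            if self._pair[0] is exp[0] and self._pair[1] == exp[1]:
                self._pair = new
                return True
            return False


class Node:
    def __init__(self, key: float, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = MarkedRef(nxt)


def build(keys: List[float]) -> Tuple[Node, Node]:
    tail = Node(float("inf"))
    nxt = tail
    for k in sorted(keys, reverse=True):
        nxt = Node(k, nxt)
    return Node(float("-inf"), nxt), tail


def delete_min(head: Node, tail: Node, threshold: int = THRESHOLD) -> Optional[float]:
    obs_first, _ = head.next.get()  # first node as observed from the head
    x, offset = head, 0
    while True:
        succ, deleted = x.next.get()
        if succ is tail:
            return None
        if deleted:  # walk over the logically deleted prefix
            x, offset = succ, offset + 1
            continue
        if x.next.cas((succ, False), (succ, True)):  # logical deletion
            if offset >= threshold:
                # Batch physical deletion: one CAS on the head drops the prefix,
                # keeping succ (the newest deleted node) in front.
                head.next.cas((obs_first, True), (succ, True))
            return succ.key
        # CAS failed (concurrent Insert or DeleteMin): re-read x.next and retry.


def physical_length(head: Node, tail: Node) -> int:
    n, x = 0, head.next.get()[0]
    while x is not tail:
        n, x = n + 1, x.next.get()[0]
    return n


head, tail = build([1, 2, 3, 4, 5])
print([delete_min(head, tail) for _ in range(3)])  # [1, 2, 3]
print(physical_length(head, tail))  # 3: nodes 1 and 2 were unlinked by one CAS
```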

Correctness
The algorithm is a linearizable concurrent priority queue. The correctness proof in the paper is based on assertional reasoning. It follows rather easily after establishing that the lowest-level list consists of a deleted prefix followed by a sorted list, which defines the logical state of the queue. We have also modeled the algorithm in SPIN and performed extensive state-space exploration.
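The invariant the proof rests on can be stated as an executable check on a snapshot of the lowest-level list. `logical_state` is a hypothetical helper: it takes (key, deleted) pairs in list order, verifies that the deleted nodes form a prefix and that the live keys after it are sorted, and returns the logical contents of the queue. Keys in the deleted prefix are not required to be ordered relative to the live suffix, since an Insert may place a small key after them.

```python
from typing import List, Sequence, Tuple


def logical_state(entries: Sequence[Tuple[int, bool]]) -> List[int]:
    """Check 'deleted prefix, then sorted live list'; return the live keys.

    Raises ValueError if the invariant does not hold."""
    i = 0
    while i < len(entries) and entries[i][1]:
        i += 1  # skip the deleted prefix
    live = entries[i:]
    if any(deleted for _, deleted in live):
        raise ValueError("a deleted node follows a live node: not a prefix")
    keys = [k for k, _ in live]
    if any(a > b for a, b in zip(keys, keys[1:])):
        raise ValueError("live suffix is not sorted")
    return keys


print(logical_state([(2, True), (3, True), (1, False), (5, False)]))  # [1, 5]
```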

Evaluation

Comparison of maximal throughput
Compared against other lock-free skiplist-based priority queues:
Sundell & Tsigas (ST): only a single logically deleted node is allowed in the lowest-level list.
Herlihy & Shavit (HS): a lock-free adaptation of [Lotan & Shavit]; logically deleted nodes need not form a prefix.

Comparison of maximal throughput
Benchmark: 50% Insert, 50% DeleteMin on a 4-socket Intel Sandy Bridge machine.
[Figure: throughput (M operations/s) vs. number of threads for New, HS and ST.]
Result: 30-80% improvement compared to HS. With 1-8 threads, all cores are on a single socket (shared L3 cache).

Evaluation
Benchmark: DES workload on a 4-socket AMD Bulldozer machine.
[Figure: throughput (M operations/s) vs. number of threads for New, HS and ST.]
Result: 30-80% improvement compared to HS. With 1-2 threads, the cores share an L2 cache.

More resources
A BSD-licensed implementation, the SPIN model, and an extended technical report are available. Note: the Restructure algorithm (which updates the head pointers) in the version in the proceedings contains a performance bug; please see the extended technical report.

Conclusions
We presented a new linearizable, lock-free, skiplist-based priority queue algorithm with a new representation of logical deletion. This reduces the number of CASes to critical shared memory locations, giving a 30-80% performance improvement over existing such algorithms.
Thank you.


More information

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model

More information

Linked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Linked Lists: Locking, Lock- Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Linked Lists: Locking, Lock- Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Coarse-Grained Synchronization Each method locks the object Avoid

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Spring 2017-2018 Outline 1 Priority Queues Outline Priority Queues 1 Priority Queues Jumping the Queue Priority Queues In normal queue, the mode of selection is first in,

More information

Sorting and Searching

Sorting and Searching Sorting and Searching Lecture 2: Priority Queues, Heaps, and Heapsort Lecture 2: Priority Queues, Heaps, and Heapsort Sorting and Searching 1 / 24 Priority Queue: Motivating Example 3 jobs have been submitted

More information

Lock-Free Concurrent Binomial Heaps

Lock-Free Concurrent Binomial Heaps Lock-Free Concurrent Binomial Heaps Gavin Lowe Department of Computer Science, University of Oxford gavin.lowe@cs.ox.ac.uk August 21, 2018 Abstract We present a linearizable, lock-free concurrent binomial

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

M4 Parallelism. Implementation of Locks Cache Coherence

M4 Parallelism. Implementation of Locks Cache Coherence M4 Parallelism Implementation of Locks Cache Coherence Outline Parallelism Flynn s classification Vector Processing Subword Parallelism Symmetric Multiprocessors, Distributed Memory Machines Shared Memory

More information

Chapter 6 Heaps. Introduction. Heap Model. Heap Implementation

Chapter 6 Heaps. Introduction. Heap Model. Heap Implementation Introduction Chapter 6 Heaps some systems applications require that items be processed in specialized ways printing may not be best to place on a queue some jobs may be more small 1-page jobs should be

More information

Drop the Anchor: Lightweight Memory Management for Non-Blocking Data Structures

Drop the Anchor: Lightweight Memory Management for Non-Blocking Data Structures Drop the Anchor: Lightweight Memory Management for Non-Blocking Data Structures Anastasia Braginsky Computer Science Technion anastas@cs.technion.ac.il Alex Kogan Oracle Labs alex.kogan@oracle.com Erez

More information

Sorting and Searching

Sorting and Searching Sorting and Searching Lecture 2: Priority Queues, Heaps, and Heapsort Lecture 2: Priority Queues, Heaps, and Heapsort Sorting and Searching 1 / 24 Priority Queue: Motivating Example 3 jobs have been submitted

More information

Concurrent specifications beyond linearizability

Concurrent specifications beyond linearizability Concurrent specifications beyond linearizability Éric Goubault Jérémy Ledent Samuel Mimram École Polytechnique, France OPODIS 2018, Hong Kong December 19, 2018 1 / 14 Objects Processes communicate through

More information

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Håkan Sundell School of Business and Informatics University of Borås, 50 90 Borås E-mail: Hakan.Sundell@hb.se Philippas

More information

Non-blocking Priority Queue based on Skiplists with Relaxed Semantics

Non-blocking Priority Queue based on Skiplists with Relaxed Semantics UNLV Theses, Dissertations, Professional Papers, and Capstones 5-1-2017 Non-blocking Priority Queue based on Skiplists with Relaxed Semantics Ashok Adhikari University of Nevada, Las Vegas, ashokadhikari42@gmail.com

More information

CS106X Programming Abstractions in C++ Dr. Cynthia Bailey Lee

CS106X Programming Abstractions in C++ Dr. Cynthia Bailey Lee CS106X Programming Abstractions in C++ Dr. Cynthia Bailey Lee 2 Today s Topics: 1. Binary tree 2. Heap Priority Queue Emergency Department waiting room operates as a priority queue: patients are sorted

More information

Locking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming

Locking Granularity. CS 475, Spring 2019 Concurrent & Distributed Systems. With material from Herlihy & Shavit, Art of Multiprocessor Programming Locking Granularity CS 475, Spring 2019 Concurrent & Distributed Systems With material from Herlihy & Shavit, Art of Multiprocessor Programming Discussion: HW1 Part 4 addtolist(key1, newvalue) Thread 1

More information

PrimeBase XT. A transactional engine for MySQL. Paul McCullagh SNAP Innovation GmbH

PrimeBase XT. A transactional engine for MySQL.  Paul McCullagh SNAP Innovation GmbH PrimeBase XT A transactional engine for MySQL Paul McCullagh SNAP Innovation GmbH Our Company SNAP Innovation GmbH was founded in 1996, currently 25 employees. Purpose: develop and support PrimeBase database,

More information

Java Performance: The Definitive Guide

Java Performance: The Definitive Guide Java Performance: The Definitive Guide Scott Oaks Beijing Cambridge Farnham Kbln Sebastopol Tokyo O'REILLY Table of Contents Preface ix 1. Introduction 1 A Brief Outline 2 Platforms and Conventions 2 JVM

More information

Workload Characterization and Optimization of TPC-H Queries on Apache Spark

Workload Characterization and Optimization of TPC-H Queries on Apache Spark Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 3 Nov 2017

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 3 Nov 2017 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 3 Nov 2017 Lecture 1/3 Introduction Basic spin-locks Queue-based locks Hierarchical locks Reader-writer locks Reading without locking Flat

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

The SkipTrie: Low-Depth Concurrent Search without Rebalancing

The SkipTrie: Low-Depth Concurrent Search without Rebalancing The SkipTrie: Low-Depth Concurrent Search without Rebalancing Rotem Oshman University of Toronto rotem@cs.toronto.edu Nir Shavit MIT shanir@csail.mit.edu ABSTRACT To date, all concurrent search structures

More information

High Performance Transactions in Deuteronomy

High Performance Transactions in Deuteronomy High Performance Transactions in Deuteronomy Justin Levandoski, David Lomet, Sudipta Sengupta, Ryan Stutsman, and Rui Wang Microsoft Research Overview Deuteronomy: componentized DB stack Separates transaction,

More information

From Lock-Free to Wait-Free: Linked List. Edward Duong

From Lock-Free to Wait-Free: Linked List. Edward Duong From Lock-Free to Wait-Free: Linked List Edward Duong Outline 1) Outline operations of the locality conscious linked list [Braginsky 2012] 2) Transformation concept from lock-free -> wait-free [Timnat

More information

Characterizing the Performance and Energy Efficiency of Lock-Free Data Structures

Characterizing the Performance and Energy Efficiency of Lock-Free Data Structures Characterizing the Performance and Energy Efficiency of Lock-Free Data Structures Nicholas Hunt Paramjit Singh Sandhu Luis Ceze University of Washington {nhunt,paramsan,luisceze}@cs.washington.edu Abstract

More information

6.852: Distributed Algorithms Fall, Class 15

6.852: Distributed Algorithms Fall, Class 15 6.852: Distributed Algorithms Fall, 2009 Class 15 Today s plan z z z z z Pragmatic issues for shared-memory multiprocessors Practical mutual exclusion algorithms Test-and-set locks Ticket locks Queue locks

More information

Energy-centric DVFS Controlling Method for Multi-core Platforms

Energy-centric DVFS Controlling Method for Multi-core Platforms Energy-centric DVFS Controlling Method for Multi-core Platforms Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom Seoul National University, Korea MuCoCoS 2012 Salt Lake City, Utah Abstract Goal To

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Dr. Malek Mouhoub Department of Computer Science University of Regina Fall 2002 Malek Mouhoub, CS3620 Fall 2002 1 6. Priority Queues 6. Priority Queues ffl ADT Stack : LIFO.

More information

High-Performance Composable Transactional Data Structures

High-Performance Composable Transactional Data Structures University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) High-Performance Composable Transactional Data Structures 2016 Deli Zhang University of Central Florida

More information

Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model

Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model Manuel Pöter TU Wien, Faculty of Informatics Vienna, Austria manuel@manuel-poeter.at Jesper Larsson Träff

More information

Distributed Computing Group

Distributed Computing Group Distributed Computing Group HS 2009 Prof. Dr. Roger Wattenhofer, Thomas Locher, Remo Meier, Benjamin Sigg Assigned: December 11, 2009 Discussion: none Distributed Systems Theory exercise 6 1 ALock2 Have

More information

Multiprocessor Scheduling. Multiprocessor Scheduling

Multiprocessor Scheduling. Multiprocessor Scheduling Multiprocessor Scheduling Will consider only shared memory multiprocessor or multi-core CPU Salient features: One or more caches: cache affinity is important Semaphores/locks typically implemented as spin-locks:

More information

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication

Important Lessons. A Distributed Algorithm (2) Today's Lecture - Replication Important Lessons Lamport & vector clocks both give a logical timestamps Total ordering vs. causal ordering Other issues in coordinating node activities Exclusive access to resources/data Choosing a single

More information

Heap Model. specialized queue required heap (priority queue) provides at least

Heap Model. specialized queue required heap (priority queue) provides at least Chapter 6 Heaps 2 Introduction some systems applications require that items be processed in specialized ways printing may not be best to place on a queue some jobs may be more small 1-page jobs should

More information

Introduction. CS3026 Operating Systems Lecture 01

Introduction. CS3026 Operating Systems Lecture 01 Introduction CS3026 Operating Systems Lecture 01 One or more CPUs Device controllers (I/O modules) Memory Bus Operating system? Computer System What is an Operating System An Operating System is a program

More information

High-Performance Key-Value Store on OpenSHMEM

High-Performance Key-Value Store on OpenSHMEM High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background

More information

Lock Oscillation: Boosting the Performance of Concurrent Data Structures

Lock Oscillation: Boosting the Performance of Concurrent Data Structures Lock Oscillation: Boosting the Performance of Concurrent Data Structures Panagiota Fatourou FORTH ICS & University of Crete Nikolaos D. Kallimanis FORTH ICS The Multicore Era The dominance of Multicore

More information

Course Syllabus. Operating Systems

Course Syllabus. Operating Systems Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling

More information

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems Concurrency unlocked Programming Bingsheng Wang TM Operating Systems 1 Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional

More information

Application Programming

Application Programming Multicore Application Programming For Windows, Linux, and Oracle Solaris Darryl Gove AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris

More information