Split-Ordered Lists: Lock-Free Extensible Hash Tables. Pierre LaBorde


Nir Shavit
- Tel-Aviv University, Israel
- Ph.D. from the Hebrew University
- Professor at the School of Computer Science, Tel-Aviv University (1992)
- 2004 Gödel Prize winner
- Co-author of The Art of Multiprocessor Programming

Outline
- Hashing
- Concurrency
- Algorithm
- Implementation
- Performance

Hash Table
- Maps keys to values
- Ideally, maps each possible key to a unique slot index
- In practice, hash collisions are normal
- Constant average cost per operation
- Efficient

Hash Table
[Figure: example hash table mapping names to phone numbers]

Collision Resolution
- Bucket chains (linked lists) sorted by the key field
- Disadvantages of linked lists:
  - next-pointer overhead in every node
  - poor processor-cache locality

Chained Hashing
[Figure: chained hashing diagram]

Extensible Hash Table
- Treats a hash value as a bit string
- Soft real-time
- Array of buckets
- The table only grows in size (it never shrinks)

Outline: Concurrency

Concurrent Hash Table
- Operations: Insert, Delete, Find
- Concurrent operations must be synchronized

Synchronization
- Critical section
- Race condition
- Locking
- Mutex

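Lock-Free
- Guarantees that at least one thread makes progress
- Wait-freedom: a stronger guarantee, where every thread completes its operation in a finite number of steps
- General design problems with locks: long delays, threads waiting for locks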

Difficulty
- Synchronization problems:
  - Deadlock
  - Livelock
  - Starvation
  - Priority inversion

Avoiding Locks
- Atomic primitives: CAS (compare-and-swap), LL/SC (load-linked/store-conditional)
- Operate on a single word
- Hardware locks
- Fine granularity

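Resizing Problem
- Resizing requires moving items between buckets
- Each move must appear atomic: removing an item from one list and inserting it into another cannot be done with a single-word CAS
- Other options: helping (threads help complete other threads' pending operations)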

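Lock-Free Linked List
- Validity of the next pointer
- Mark nodes for deletion
- Bit stealing: keep the mark in an unused low bit of the next pointer
- The mark and the pointer are updated by the same CAS
- Straightforward implementation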

Michael's Lock-Free Linked List
    /* <mark, next>: a deletion mark and a next pointer in one CAS-able word */
    struct MarkPtrType { <mark, next>: <bool, NodeType *> };
    struct NodeType { key_t key; MarkPtrType <mark, next>; };
    /* thread-private variables */
    MarkPtrType *prev;
    MarkPtrType <pmark, cur>;
    MarkPtrType <cmark, next>;

List: Find
    int list_find(MarkPtrType *head, so_key_t key) {
    try_again:
        prev = head;
        <pmark, cur> = *prev;
        while (1) {
            if (cur == NULL) return 0;
            <cmark, next> = cur-><mark, next>;
            ckey = cur->key;
            if (*prev != <0, cur>)       /* prev changed or was marked: restart */
                goto try_again;
            if (!cmark) {
                if (ckey >= key)
                    return ckey == key;
                prev = &(cur-><mark, next>);
            } else {                     /* cur is marked: try to unlink it */
                if (CAS(prev, <0, cur>, <0, next>))
                    delete_node(cur);
                else
                    goto try_again;
            }
            <pmark, cur> = <cmark, next>;
        }
    }

List: Insert
    int list_insert(MarkPtrType *head, NodeType *node) {
        key = node->key;
        while (1) {
            if (list_find(head, key)) return 0;   /* key already present */
            node-><mark, next> = <0, cur>;
            if (CAS(prev, <0, cur>, <0, node>)) return 1;
        }
    }

List: Delete
    int list_delete(MarkPtrType *head, so_key_t key) {
        while (1) {
            if (!list_find(head, key)) return 0;
            /* logical delete: set the mark in cur's <mark, next> word */
            if (!CAS(&(cur-><mark, next>), <0, next>, <1, next>)) continue;
            /* physical delete: unlink cur, or let list_find clean it up */
            if (CAS(prev, <0, cur>, <0, next>)) delete_node(cur);
            else list_find(head, key);
            return 1;
        }
    }

Outline: Algorithm

Algorithm
- Split-ordering
- Avoids the resizing problem: instead of moving the items among the buckets, "move the buckets among the items"

Resizing Revisited
- Moving an item
- Ability to split sublists recursively
- Recursive split-ordering

Split-Ordered Hash Table
[Figures: four diagrams of the split-ordered hash table]

Insertion
[Figures: four diagrams stepping through an insertion]

Operations
- Hash the key to a bucket and compute its split-order key
- Follow the bucket's pointer to its dummy node in the list
- Traverse the list from there

Split-Ordered Hash Table
[Figure: split-ordered hash table]

Outline: Implementation

Implementation
- Modular design
- Michael's lock-free lists
- Memory management

Fetch and Increment
    int fetch-and-inc(int *p) {
        do { old = *p; } while (!CAS(p, old, old+1));
        return old;
    }
    int fetch-and-dec(int *p) {
        do { old = *p; } while (!CAS(p, old, old-1));
        return old;
    }

Hash Table: Initialize Bucket
    void initialize_bucket(uint bucket) {
        parent = GET_PARENT(bucket);
        if (T[parent] == UNINITIALIZED)
            initialize_bucket(parent);
        dummy = new node(so_dummykey(bucket));
        if (!list_insert(&(T[parent]), dummy)) {
            /* another thread inserted this bucket's dummy first: use it */
            delete dummy;
            dummy = cur;
        }
        T[bucket] = dummy;
    }

Hash Table: Insert
    int insert(so_key_t key) {
        node = new node(so_regularkey(key));
        bucket = key % size;
        if (T[bucket] == UNINITIALIZED)
            initialize_bucket(bucket);
        if (!list_insert(&(T[bucket]), node)) {
            delete_node(node);
            return 0;
        }
        csize = size;
        if (fetch-and-inc(&count) / csize > MAX_LOAD)
            CAS(&size, csize, 2 * csize);   /* if this fails, another thread already doubled size */
        return 1;
    }

Hash Table: Find
    int find(so_key_t key) {
        bucket = key % size;
        if (T[bucket] == UNINITIALIZED)
            initialize_bucket(bucket);
        return list_find(&(T[bucket]), so_regularkey(key));
    }

Hash Table: Delete
    int delete(so_key_t key) {
        bucket = key % size;
        if (T[bucket] == UNINITIALIZED)
            initialize_bucket(bucket);
        if (!list_delete(&(T[bucket]), so_regularkey(key)))
            return 0;
        fetch-and-dec(&count);
        return 1;
    }

Complexity
- Distribution of keys
- Scheduling of threads

Outline: Performance

Throughput
[Figure: throughput results]

Varying Preinsertions
[Figure: results with a varying number of preinserted items]

Conclusion
- Robust even with a non-uniform hash function
- Performance loss: low load, non-multiprogrammed
- Medium to high load

Bibliography
- Split-Ordered Lists: Lock-Free Extensible Hash Tables
- The Art of Multiprocessor Programming
- Wikipedia


Extendible Hashing for Concurrent Operations and Distributed Data

After Splitting
[Figure: buckets after a split]

Locks
