Lock-free Programming

Similar documents
Program Graph. Lecture 25: Parallelism & Concurrency. Performance. What does it mean?

Concurrent Preliminaries

Marwan Burelle. Parallel and Concurrent Programming

Fine-grained synchronization & lock-free programming

CS4961 Parallel Programming. Lecture 12: Advanced Synchronization (Pthreads) 10/4/11. Administrative. Mary Hall October 4, 2011

Lecture 10: Avoiding Locks

Last Time. Think carefully about whether you use a heap Look carefully for stack overflow Especially when you have multiple threads

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

LOCKLESS ALGORITHMS. Lockless Algorithms. CAS based algorithms stack order linked list

Cache Coherence and Atomic Operations in Hardware

Learning from Bad Examples. CSCI 5828: Foundations of Software Engineering Lecture 25 11/18/2014

Lecture 8: September 30

Håkan Sundell University College of Borås Parallel Scalable Solutions AB

Lecture #7: Implementing Mutual Exclusion

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 31 October 2012

Order Is A Lie. Are you sure you know how your code runs?

EECS 482 Introduction to Operating Systems

Other consistency models

Programmazione di sistemi multicore

Fine-grained synchronization & lock-free data structures

3. Concurrency Control for Transactions Part One

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 17 November 2017

Part 1 Fine-grained Operations

Efficient and Reliable Lock-Free Memory Reclamation Based on Reference Counting

Locking: A necessary evil? Lecture 10: Avoiding Locks. Priority Inversion. Deadlock

15 418/618 Project Final Report Concurrent Lock free BST

Tom Hart, University of Toronto Paul E. McKenney, IBM Beaverton Angela Demke Brown, University of Toronto

G52CON: Concepts of Concurrency

Administrivia. p. 1/20

CS 241 Honors Concurrent Data Structures

Lecture 5: Synchronization w/locks

Scalable Locking. Adam Belay

The Java Memory Model

Summary: Issues / Open Questions:

CS 167 Final Exam Solutions

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

Multiprocessor Support

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory

FINE-GRAINED SYNCHRONIZATION. Operating Systems Engineering. Sleep & Wakeup [chapter #5] 15-Jun-14. in the previous lecture.

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Concurrent Programming Lecture 10

ATOMIC COMMITMENT Or: How to Implement Distributed Transactions in Sharded Databases

Ext3/4 file systems. Don Porter CSE 506

Control Hazards. Prediction

Computer Architecture and Engineering CS152 Quiz #5 May 2th, 2013 Professor Krste Asanović Name: <ANSWER KEY>

Page Replacement. (and other virtual memory policies) Kevin Webb Swarthmore College March 27, 2018

CSE332: Data Abstractions Lecture 23: Programming with Locks and Critical Sections. Tyler Robison Summer 2010

CS 220: Introduction to Parallel Computing. Introduction to CUDA. Lecture 28

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 5 Programming with Locks and Critical Sections

Linux kernel synchronization. Don Porter CSE 506

Linux kernel synchronization

Parallel Computer Architecture and Programming Written Assignment 3

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Transactionalizing Legacy Code: An Experience Report Using GCC and Memcached. Trilok Vyas, Yujie Liu, and Michael Spear Lehigh University

Rsyslog: going up from 40K messages per second to 250K. Rainer Gerhards

Multiprocessor Synchronization

MULTITHREADING AND SYNCHRONIZATION. CS124 Operating Systems Fall , Lecture 10

Readers-Writers Problem. Implementing shared locks. shared locks (continued) Review: Test-and-set spinlock. Review: Test-and-set on alpha

CS510 Advanced Topics in Concurrency. Jonathan Walpole

CPS 310 midterm exam #1, 2/19/2016

(Refer Slide Time: 1:26)

230 Million Tweets per day

CSE 120 Principles of Operating Systems

A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 4 Shared-Memory Concurrency & Mutual Exclusion

Split-Ordered Lists: Lock-Free Extensible Hash Tables. Pierre LaBorde

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430

Need for synchronization: If threads comprise parts of our software systems, then they must communicate.

CSE332: Data Abstractions Lecture 22: Shared-Memory Concurrency and Mutual Exclusion. Tyler Robison Summer 2010

Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 21 November 2014

An Almost Non- Blocking Stack

Problem Set 2. CS347: Operating Systems

CSE380 - Operating Systems

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

EECS 482 Introduction to Operating Systems

Multiprocessors and Locking

Introduction to Embedded Systems. Lab Logistics

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Midterm Exam Solutions and Grading Guidelines March 3, 1999 CS162 Operating Systems

Mutex Implementation

[537] Locks. Tyler Harter

Characterizing the Performance and Energy Efficiency of Lock-Free Data Structures

Today: Segmentation. Last Class: Paging. Costs of Using The TLB. The Translation Look-aside Buffer (TLB)

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

CS 111. Operating Systems Peter Reiher

Exploring mtcp based Single-Threaded and Multi-Threaded Web Server Design

G Programming Languages - Fall 2012

Control Hazards. Branch Prediction

Progress Guarantees When Composing Lock-Free Objects

Timers 1 / 46. Jiffies. Potent and Evil Magic

Proving liveness. Alexey Gotsman IMDEA Software Institute

Kernel Scalability. Adam Belay

Review: Easy Piece 1

Verifying Concurrent Programs

Page 1. Goals for Today" Atomic Read-Modify-Write instructions" Examples of Read-Modify-Write "

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Transcription:

Lock-free Programming Nathaniel Wesley Filardo April 10, 2006

Outline Motivation Lock-Free Linked List Insertion Lock-Free Linked List Deletion Some real algorithms?

Motivation Review of atomic primitives Locks can be expensive

Review of atomic primitives ˆ XCHG (ptr, val) atomically: ˆ old val = *ptr ˆ *ptr = val ˆ return old val ˆ CAS (ptr, expect, new) atomically: ˆ if ( *ptr!= expect ) return *ptr; ˆ else return XCHG (ptr, new); ˆ Note that CAS is no harder - it s a read and a write; the logic is free (it s on the chip).

Locks can be expensive ˆ Consider XCHG style locks which use while( xchg( &locked, LOCKED ) == LOCKED ) as their core operation. ˆ Each xchg flushes the processor pipeline... ˆ We could spend a long time here waiting or yielding... ˆ This implies we ll have very high latency on contention...

Lock-Free Linked List Insertion Lock-Free Linked List Node Insertion into a Lock-free Linked List: Successful case Insertion into a Lock-free Linked List: Race case

Lock-Free Linked List Node ˆ Node definition is simple: void* data void* next

Insertion into a Lock-free Linked List: Successful case Setup A next C next B next ˆ Some thread constructs the bottom node B; wishes to place it between the two above, A and C.

Insertion into a Lock-free Linked List: Successful case First step A next C next B next ˆ Thread points B node s next into list at C.

Insertion into a Lock-free Linked List: Successful case First step A next C set by CAS B next next ˆ CAS used to point previous node A to new node B. ˆ... ˆ So wait, what s the cleverness?

Insertion into a Lock-free Linked List: Race case First step B next A next D next C next ˆ Two threads point their respective nodes B and C into list at D ˆ Both of them try to CAS the previous node s (A s) next pointer...

Insertion into a Lock-free Linked List: Race case One thread goes B next CAS A C next next D next ˆ One of the two goes (here the thread owning B won)...

Insertion into a Lock-free Linked List: Race case And the other... B next CAS A CAS fail! C next next D next ˆ And the other (owning C)... ˆ But the expect value doesn t match, so the linked list structure is OK. ˆ So this thread tries again and does the same dance...

That s great! ˆ Yes, if we want an insert-and-read only list, then it s fine! ˆ How many datastructures are like that?

Deletion is easy? ˆ Can we just prune the node? ˆ Given B next A next ˆ Can t we just transition via CAS to C next B next A next C next ˆ Yes, but can we reclaim that memory?

Deletion is easy? Continued ˆ Can t we just transition via CAS to B next A next C next ˆ There might be another thread touching the upper node (B)! ˆ Can t touch that memory at all! ˆ In particular, can t free() it!

Compromise? ˆ So, for a deleted node (often logically deleted node )... ˆ Let s just leave it detatched from the list, marking it somehow as deleted. B INVALID A next C next ˆ Other threads will fail their operations and restart. ˆ We might have a free list of available nodes, even... ˆ Some real-world implementations do this, leaving as an exercise to syncrhonize all threads to delete the the list and free list when everybody s done.

Compromise? Now reusing that memory... ˆ We might have a somewhat complex case of a sorted list 1 next 6 next 5 next

Compromise? Now reusing that memory... ˆ Thread X trying to insert 3 after 1 races against somebody deleting 5. ˆ So we now have 1 next 6 next 3 next 5 INVALID ˆ There is a deleted node ( 5, bottom right) that was the next of 1 when thread X started running

Compromise? Now reusing that memory (part 2) ˆ Thread Y now reclaims deleted node, pushes in 2 and points to 6. ˆ Trying for a sorted list with 1 next 6 next 3 next 2 next ˆ Thread X still trying to insert 3 after 1. Been preempted for a while ˆ Anybody see the problem yet?

Compromise? Now reusing that memory (part 3) ˆ Thread Y now inserts the reclaimed node where it belongs! (using CAS, of course) ˆ Trying for a sorted list with 1 next 6 next 3 next 2 next ˆ Thread X still trying to insert 3 after 1. Been preempted for a while ˆ The dotted line indicates what X expects to see! ˆ How about now?

Compromise? Now reusing that memory (part 4) ˆ Thread X wakes up, and the CAS works (!) giving instead 1 next 6 next 3 next 2 next

Compromise? Earth-Shattering KABOOM! Figure: There was supposed to be an... [mar()]

Compromise? Woah, what just happened? 1 next 6 next 3 next 2 next ˆ But, but, but... {1, 3, 2, 6} isn t sorted! ˆ This is called The ABA problem: the pointer changed meaning but we didn t notice.

Full fledged deletion & reclaim OK, so how do we actually do this? ˆ It turns out that we need a more sophisticated delete function. Look at [Fomitchev and Ruppert(2004)] or [Michael(2002a)] (or others) for more details. ˆ Generation counters are a simple way to solve ABA (usually requires use of CASn - acts on n words at once; much slower than CAS) ˆ But that doesn t solve memory reclaim - for these we need more sophisticated algorithms (which also solve ABA for us): ˆ Hazard Pointers ( Safe Memory Reclaimation or just SMR ) [Michael(2002b)] and [Michael(2004)] ˆ Wait-free reference counters [Sundell(2005)]

Some real algorithms? ˆ [Fomitchev and Ruppert(2004)] gives a simple, non-reclaimable lock-free linked/skip-list algorithm. ˆ [Michael(2002a)] specifies a CAS-based lock-free list-based sets and hash tables using SMR as a refinement of the above. ˆ Their performance figures are worth looking at. Summary: fine-grained locks (lock per node) show linear-time increase with # threads, their algorithm shows essentially constant time!

Marvin the martian, URL http://www.snowflake-designs.com/images/ Marvin%20Martian%201.jpg. M. Fomitchev and E. Ruppert, PODC pp. 50 60 (2004), URL http://www.research.ibm.com/people/m/ michael/podc-2002.pdf. M. M. Michael, SPAA pp. 73 83 (2002a), URL http://portal.acm.org/ft_gateway.cfm?id= 564881\&type=pdf\&coll=GUIDE\&dl=ACM\&CFID= 73232202\&CFTOKEN=1170757. M. M. Michael, PODC pp. 1 10 (2002b), URL http://www.research.ibm.com/people/m/michael/ podc-2002.pdf.

M. M. Michael, IEEECS pp. 1 10 (2004), URL http://www.research.ibm.com/people/m/michael/ podc-2002.pdf. H. Sundell (IEEE, 2005), 1530-2075/05, URL http:// ieeexplore.ieee.org/iel5/9722/30685/01419843. pdf?tp=\&arnumber=1419843\&isnumber=30685. Wikipedia, Lock-free and wait-free algorithms (2006a), URL http://en.wikipedia.org/wiki/lock-free_ and_wait-free_algorithms. Wikipedia, Non-blocking synchronization (2006b), URL http://en.wikipedia.org/wiki/non-blocking_ synchronization.

Acknowledgements ˆ Dave Eckhardt (de0u) and Bruce Maggs (bmm) for moral support and big-picture guidance ˆ Jess Mink (jmink), Matt Brewer (mbrewer), and Mister Wright (mrwright) for being victims of beta versions of this lecture.