Lecture 10: Avoiding Locks

CSC 469H1F, Fall 2006. Angela Demke Brown (with thanks to Paul McKenney)

Locking: A necessary evil?
- Locks are an easy-to-understand solution to the critical section problem
  - Protect shared data from corruption due to simultaneous updates
  - Protect against inconsistent views of intermediate states
- But locks have lots of problems
  - Deadlock
  - Priority inversion
  - Not fault tolerant
  - Convoying
  - Expensive, even when uncontended
  - Not easy to use correctly!

Deadlock
- Textbook definition: a set of threads blocked waiting for events that can only be caused by another thread in the same set
- Classic example: thread P acquires lock A and then requests lock B, while thread Q acquires lock B and then requests lock A; each waits forever for the other
- Self-deadlock is also a big issue
  - A thread holds a lock on a shared data structure and is interrupted
  - The interrupt handler needs the same lock!
  - Solutions exist (e.g., disable interrupts while holding the lock), but add complexity

Priority Inversion
- A lower-priority thread acquires a spinlock
- A higher-priority thread becomes runnable, preempts it, needs the lock, and starts spinning
- The lock holder can't run and release the lock (it may eventually get to run on another CPU, but meanwhile the high-priority thread spins)
- Solutions exist (e.g., disable preemption while holding a spinlock, implement priority inheritance, etc.), but add complexity

Not fault tolerant
- If the lock holder crashes, or suffers an indefinite delay, no one makes progress

Convoying
- Suppose we have a set of threads doing similar work per thread, started at different times, occasionally accessing shared data
  - E.g., a multi-threaded web server
- Expect accesses to shared objects (and hence the times when locks are needed) to be spread out over time
- A delay of the lock holder allows the other threads to catch up
- The lock becomes contended, and tends to stay that way

Expensive, even when uncontended
- [Figure: cost of synchronization operations; McKenney, 2005]

Causes: Deeper Memory Hierarchy
- Then (1984): CPUs connected directly to memory
- Now (2005): each CPU sits behind an L1/L2/L3 cache hierarchy in front of memory
- Memory speeds have not kept up with CPU speeds
  - 1984: no caches needed, since instructions were slower than memory accesses
  - 2005: 3-4 level cache hierarchies, since instructions are orders of magnitude faster than memory accesses
- Synchronization ops typically execute at memory speed

Causes: Deeper Pipelines
- Then (1984): many cycles per instruction (fetch, execute, retire)
- Now (2005): many instructions per cycle, with 20-stage pipelines
  - CPU logic executes instructions out-of-order to keep the pipeline full
- Synchronization instructions must not be reordered
  - Otherwise the CPU could execute instructions inside the critical section before the entry instructions complete
- So synchronization stalls the pipeline

Performance
- The main issue with lock performance used to be contention
  - Techniques were developed to reduce overheads in the contended case
- Today, the issue is degraded performance even when locks are always available
- Together with the other concerns about locks, this motivates a quick look at lock performance

Hash Table Microbenchmark
- Read-only workload
- Best case with brlock gets only a 2X speedup on 4 CPUs
  - brlock is the Linux "Big Reader" lock: a per-CPU reader lock; writers must acquire all of the per-CPU locks
- [Figure: McKenney, 2005]

Locks: A necessary evil?
- Idea: don't lock if we don't need to
- Non-Blocking Synchronization (NBS)
  - "Non-blocking" refers to progress guarantees in the presence of thread failures; it does not mean that individual threads never sleep or get interrupted
  - Wait-free: everyone makes progress
  - Lock-free: someone makes progress
  - Obstruction-free: someone makes progress in the absence of contention
- We won't worry about these distinctions; we use "lockless" to describe strategies that avoid locking

NBS Basics
- Make changes optimistically; roll back and retry if a conflict is detected

    atomic_inc(int *counter) {
        int value;
        do {
            value = *counter;
        } while (!CAS(counter, value, value+1));
    }

- Complex updates (e.g., modifying multiple values in a structure) are hidden behind a single commit point using atomic instructions

Example: Stack Data Structure
Lock-based synchronization:

    typedef struct node_s {
        int val;
        struct node_s *next;
    } node_t;

    typedef struct stack_s {
        node_t *top;
        lock_t *stack_lock;
    } stack_t;

    void push(stack_t *S, node_t *n) {
        lock(S->stack_lock);
        n->next = S->top;
        S->top = n;
        unlock(S->stack_lock);
    }

    node_t *pop(stack_t *S) {
        node_t *rnode = NULL;
        lock(S->stack_lock);
        if (S->top != NULL) {
            rnode = S->top;
            S->top = S->top->next;
        }
        unlock(S->stack_lock);
        return rnode;
    }

Non-blocking stack (take 1)

    typedef struct node_s {
        int val;
        struct node_s *next;
    } node_t;

    typedef node_t *stack_t;

    void push(stack_t *S, node_t *n) {
        node_t *first;
        do {
            first = *S;
            n->next = first;
        } while (!CAS(S, first, n));
    }

    node_t *pop(stack_t *S) {
        node_t *first, *second;
        do {
            first = *S;
            if (first != NULL)
                second = first->next;
            else
                return NULL;
        } while (!CAS(S, first, second));
        return first;
    }

What's wrong?

ABA Problem
- CAS(x, y, z) succeeds if the value stored at x matches y
- Thread Ti starts pop() on a stack A -> C -> D, reading first = A and second = C, then is interrupted
- Meanwhile thread Tj runs to completion: a = pop(); b = pop(); push(n); push(a); so the stack is now A -> N -> D
- When Ti resumes, the head is A again, so its CAS succeeds and installs the stale second pointer (C), corrupting the stack

One Solution
- Include a version number with every pointer: pointer_t = <pointer, version>
- Increment the version number (atomically) every time you modify the pointer
- A change to the version number guarantees the CAS will fail if the pointer has changed
- Requires a double-word CAS operation (not every architecture provides this)
- May restrict reuse of memory

Using NBS
- Good for simple data structures and update-heavy workloads
- Good when you need the NBS constraints/guarantees:
  - Progress in the face of failure
  - Linearizability: everyone agrees on all intermediate states
- But both constraints are often irrelevant!

Constraints Irrelevant?
- Real systems don't fail the way theoretical ones do
  - Software bugs are not always fail-stop
  - Preemption/interruption is not a failure, and can be controlled by the system programmer or by scheduler-conscious synchronization
  - A page fault is not a failure: over-provision memory; if shared data really is paged out, it will have to be brought into memory before progress is made anyway
- We don't always need the intermediate states, just the final one
  - Linearizability implies dependency, which limits parallelism
  - If events are unrelated and asynchronous, does it matter which happened first?