Locking: A necessary evil? Lecture 10: Avoiding Locks

Locking: A necessary evil?
Lecture 10: Avoiding Locks
CSC 469H1F Fall 2006, Angela Demke Brown (with thanks to Paul McKenney)

Locks are an easy-to-understand solution to the critical section problem.
They protect shared data from corruption due to simultaneous updates, and
they protect against inconsistent views of intermediate states.
But locks have lots of problems:
Deadlock
Priority inversion
Not fault tolerant
Convoying
Expensive, even when uncontended
Not easy to use correctly!

Deadlock
Textbook definition: a set of threads blocked waiting for an event that can
only be caused by another thread in the same set.
Classic example (a minimal pthreads sketch follows below):
P: Get A ... Get B
Q: Get B ... Get A
Self-deadlock is also a big issue: a thread holds a lock on a shared data
structure and is interrupted, and the interrupt handler needs the same lock!
Solutions exist (e.g., disable interrupts while holding the lock), but they
add complexity.

Priority Inversion
A lower-priority thread gets the lock. A higher-priority thread becomes
runnable, preempts it, needs the lock, and starts spinning. The lock holder
can't run and release the lock (it may get to run on another CPU).
Solutions exist (e.g., disable preemption while holding the lock, or
implement priority inheritance, as sketched below), but they add complexity.
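A minimal pthreads sketch of the P/Q deadlock above (thread and lock names
are illustrative, not from the original slides): each thread acquires the
two mutexes in opposite orders, so each can end up holding one lock while
waiting forever for the other.

    #include <pthread.h>

    /* Illustrative sketch of the AB-BA deadlock: P takes A then B, Q takes
     * B then A. If each grabs its first lock before the other's second
     * acquire, both block forever. */
    pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

    void *P(void *arg) {
        pthread_mutex_lock(&A);        /* Get A */
        pthread_mutex_lock(&B);        /* Get B: may block forever */
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
        return NULL;
    }

    void *Q(void *arg) {
        pthread_mutex_lock(&B);        /* Get B */
        pthread_mutex_lock(&A);        /* Get A: may block forever */
        pthread_mutex_unlock(&A);
        pthread_mutex_unlock(&B);
        return NULL;
    }

    int main(void) {
        pthread_t tp, tq;
        pthread_create(&tp, NULL, P, NULL);
        pthread_create(&tq, NULL, Q, NULL);
        pthread_join(tp, NULL);        /* never returns if the deadlock hits */
        pthread_join(tq, NULL);
        return 0;
    }

A consistent lock ordering (always A before B) would prevent this particular
deadlock.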

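One of the priority-inversion mitigations mentioned above, priority
inheritance, can be requested from POSIX mutexes where the platform supports
it; the lock holder then temporarily inherits the priority of its
highest-priority waiter so it can run and release the lock. A rough sketch
(the helper name is mine):

    #include <pthread.h>

    /* Sketch: create a mutex with the priority-inheritance protocol
     * (requires _POSIX_THREAD_PRIO_INHERIT support). */
    int make_pi_mutex(pthread_mutex_t *m) {
        pthread_mutexattr_t attr;
        int err;

        pthread_mutexattr_init(&attr);
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        if (err == 0)
            err = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
        return err;     /* 0 on success */
    }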
Not fault tolerant
If the lock holder crashes, or suffers an indefinite delay, no one makes
progress.

Convoying
Suppose we have a set of threads with similar work per thread, started at
different times, that occasionally access shared data (e.g., a multi-threaded
web server). We expect accesses to shared objects (and hence the times when
locks are needed) to be spread out over time. But a delay of the lock holder
allows the other threads to catch up: the lock becomes contended and tends
to stay that way.

Expensive, even when uncontended
Cause: deeper memory hierarchy.
[Figure: then, a CPU talking directly to memory; now, a CPU behind L1/L2/L3
caches in front of memory. McKenney, 2005]
Memory speeds have not kept up with CPU speeds. In 1984 no caches were
needed, since instructions were slower than memory accesses; by 2005 we had
3-4 level cache hierarchies, since instructions are orders of magnitude
faster than memory accesses. Synchronization operations typically execute at
memory speed.
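To make the uncontended-cost point concrete, here is a rough single-threaded
microbenchmark sketch (mine, not from the slides) comparing a plain
increment with an increment wrapped in a never-contended pthread mutex; even
with no contention, the atomic operations and barriers inside lock/unlock
cost far more than the increment itself. Exact numbers vary by machine.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 10000000L

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        volatile long counter = 0;
        double t0, t1, t2;

        t0 = seconds();
        for (long i = 0; i < ITERS; i++)
            counter++;                      /* plain increment */
        t1 = seconds();
        for (long i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&m);         /* never contended */
            counter++;
            pthread_mutex_unlock(&m);
        }
        t2 = seconds();

        printf("plain:  %.1f ns/iter\n", (t1 - t0) / ITERS * 1e9);
        printf("locked: %.1f ns/iter\n", (t2 - t1) / ITERS * 1e9);
        return 0;
    }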

Cause: deeper pipelines.
[Figure: then, a simple fetch/execute/retire pipeline; now, a 20-stage
out-of-order pipeline]
In 1984 CPUs took many cycles per instruction; by 2005 they retire many
instructions per cycle, with 20-stage pipelines and out-of-order execution
logic that keeps the pipeline full. Synchronization instructions must not be
reordered, or you could execute instructions inside the critical section
without completing the entry instructions. So synchronization stalls the
pipeline.

Performance
The main issue with lock performance used to be contention, and techniques
were developed to reduce overheads in the contended case. Today the issue is
degraded performance even when locks are always available, together with the
other concerns about locks.

Quick look at lock performance
Hash table microbenchmark, read only: the best case, using brlock, gets only
a 2x speedup on 4 CPUs. (brlock is the Linux "big reader" lock: a per-CPU
reader lock where writers must acquire all of the per-CPU locks; a
simplified sketch follows below.)

Locks: necessary evil?
Idea: don't lock if we don't need to. This is Non-Blocking Synchronization
(NBS). "Non-blocking" refers to progress guarantees in the presence of
thread failures; it does not mean that individual threads do not sleep or
get interrupted.
Wait-free: everyone makes progress.
Lock-free: someone makes progress.
Obstruction-free: someone makes progress in the absence of contention.
We won't worry about these distinctions; we use "lockless" to describe
strategies that avoid locking. (McKenney, 2005)
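A simplified sketch of the brlock idea (my own illustration, not the actual
Linux implementation): each reader locks only its own per-CPU slot, so
readers never touch a shared lock, while a writer must sweep and acquire
every slot. NCPUS and the slot-per-reader mapping are assumptions.

    #include <pthread.h>

    #define NCPUS 4

    typedef struct {
        pthread_mutex_t cpu_lock[NCPUS];   /* one lock per reader slot */
    } brlock_t;

    void brlock_init(brlock_t *br) {
        for (int i = 0; i < NCPUS; i++)
            pthread_mutex_init(&br->cpu_lock[i], NULL);
    }

    /* Readers: cheap, touch only their own slot. */
    void br_read_lock(brlock_t *br, int cpu)   { pthread_mutex_lock(&br->cpu_lock[cpu]); }
    void br_read_unlock(brlock_t *br, int cpu) { pthread_mutex_unlock(&br->cpu_lock[cpu]); }

    /* Writers: expensive, must acquire all slots (in a fixed order to
     * avoid deadlock), but assumed rare. */
    void br_write_lock(brlock_t *br) {
        for (int i = 0; i < NCPUS; i++)
            pthread_mutex_lock(&br->cpu_lock[i]);
    }

    void br_write_unlock(brlock_t *br) {
        for (int i = NCPUS - 1; i >= 0; i--)
            pthread_mutex_unlock(&br->cpu_lock[i]);
    }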

NBS Basics
Make the change optimistically; roll back and retry if a conflict is
detected:

    void atomic_inc(int *counter) {
        int value;
        do {
            value = *counter;
        } while (!CAS(counter, value, value + 1));
    }

Complex updates (e.g., modifying multiple values in a structure) are hidden
behind a single commit point using atomic instructions.

Example: Stack Data Structure
Lock-based synchronization:

    typedef struct node_s {
        int val;
        struct node_s *next;
    } node_t;

    typedef struct stack_s {
        node_t *top;
        lock_t *stack_lock;
    } stack_t;

    void push(stack_t *S, node_t *n) {
        lock(S->stack_lock);
        n->next = S->top;
        S->top = n;
        unlock(S->stack_lock);
    }

    node_t *pop(stack_t *S) {
        node_t *rnode = NULL;
        lock(S->stack_lock);
        if (S->top != NULL) {
            rnode = S->top;
            S->top = S->top->next;
        }
        unlock(S->stack_lock);
        return rnode;
    }

Non-blocking stack (take 1):

    typedef struct node_s {
        int val;
        struct node_s *next;
    } node_t;

    typedef node_t *stack_t;

    void push(stack_t *S, node_t *n) {
        node_t *first;
        do {
            first = *S;
            n->next = first;
        } while (!CAS(S, first, n));
    }

    node_t *pop(stack_t *S) {
        node_t *first, *second;
        do {
            first = *S;
            if (first == NULL)
                return NULL;
            second = first->next;
        } while (!CAS(S, first, second));
        return first;
    }

What's wrong? Recall that CAS(x, y, z) succeeds if the value stored at x
matches y. Consider a stack A -> B -> C -> D.
Thread Ti starts pop(), reads first = A and second = B, then is preempted.
Thread Tj runs a = pop(); b = pop(); push(n); push(a); and B's memory is
freed and reused (e.g., as an int). The stack is now A -> N -> C -> D.
When Ti resumes, its CAS succeeds because the top is still A, but it sets
the top to B, which is no longer a valid node. This is the ABA problem.
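CAS(addr, old, new) in these slides is pseudocode, not a standard C
function. One way to make the examples on this page compile (an assumption
on my part, using the GCC/Clang __sync builtin) is shown below, together
with a small self-contained demo of the optimistic-increment pattern:

    #include <pthread.h>
    #include <stdio.h>

    /* Sketch: express the slides' CAS(addr, old, new) with the GCC/Clang
     * __sync builtin, which returns nonzero iff *addr still held old and
     * was atomically replaced by new. */
    #define CAS(addr, old, new) __sync_bool_compare_and_swap((addr), (old), (new))

    static int counter = 0;

    static void atomic_inc(int *c) {
        int value;
        do {
            value = *c;                          /* read current value */
        } while (!CAS(c, value, value + 1));     /* retry if someone else won */
    }

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++)
            atomic_inc(&counter);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        printf("counter = %d (expect 400000)\n", counter);
        return 0;
    }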

One Solution
Include a version number with every pointer: pointer_t = <pointer, version>.
Increment the version number (atomically) every time you modify the pointer;
the change to the version number guarantees that the CAS will fail if the
pointer has changed in the meantime. This requires a double-word CAS
operation (not every architecture provides one), and it may restrict reuse
of memory. (A sketch using a double-word CAS appears at the end of these
notes.)

Using NBS
NBS is good for simple, update-heavy data structures, and for when you need
its constraints/guarantees:
progress in the face of failure, and
linearizability, i.e., everyone agrees on all intermediate states.
Both constraints are often irrelevant!

Constraints Irrelevant?
Real systems don't fail the way theoretical ones do; software bugs are not
always fail-stop.
A preemption or interrupt is not a failure, and it can be controlled by the
system programmer or by scheduler-conscious synchronization.
A page fault is not a failure: over-provision memory, and if shared data
really is paged out, it will have to be brought into memory before progress
can be made anyway.
We don't always need the intermediate states, just the final one.
Linearizability implies dependency and limits parallelism: if events are
unrelated and asynchronous, does it matter which happened first?
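A rough sketch of the <pointer, version> fix (my own illustration, not from
the slides), assuming a platform with a 16-byte compare-and-swap, e.g.
cmpxchg16b on x86-64 reached through the GCC/Clang __atomic builtins
(compile with -mcx16, possibly linking -latomic). The version counter is
bumped on every update, so a stale <pointer, version> pair can never match
even if the same pointer value reappears at the top of the stack.

    #include <stdint.h>

    /* node_t matches the stack example earlier; pointer_t packs the top
     * pointer together with a version counter so the CAS compares both. */
    typedef struct node_s {
        int val;
        struct node_s *next;
    } node_t;

    typedef struct {
        node_t   *ptr;
        uintptr_t version;
    } __attribute__((aligned(16))) pointer_t;

    typedef pointer_t stack_t;

    void push(stack_t *S, node_t *n) {
        pointer_t old, new_top;
        do {
            __atomic_load(S, &old, __ATOMIC_ACQUIRE);
            n->next = old.ptr;
            new_top.ptr = n;
            new_top.version = old.version + 1;   /* bump version on every update */
        } while (!__atomic_compare_exchange(S, &old, &new_top, 0,
                                            __ATOMIC_RELEASE, __ATOMIC_RELAXED));
    }

    node_t *pop(stack_t *S) {
        pointer_t old, new_top;
        do {
            __atomic_load(S, &old, __ATOMIC_ACQUIRE);
            if (old.ptr == NULL)
                return NULL;
            new_top.ptr = old.ptr->next;
            new_top.version = old.version + 1;   /* stale <ptr,version> can't match */
        } while (!__atomic_compare_exchange(S, &old, &new_top, 0,
                                            __ATOMIC_RELEASE, __ATOMIC_RELAXED));
        return old.ptr;
    }

Note that the "may restrict reuse of memory" caveat from the slide still
applies: pop() reads old.ptr->next, so a popped node's memory cannot be
returned to a general-purpose allocator while other threads might still hold
a reference to it.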