Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL

Size: px
Start display at page:

Download "Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL"

Transcription

1 Whatever can go wrong will go wrong. attributed to Edward A. Murphy Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL 251

2 Literature Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, Florian Negele. Combining Lock-Free Programming with Cooperative Multitasking for a Portable Multiprocessor Runtime System. ETH-Zürich, A substantial part of the following material is based on Florian Negele's Thesis. Florian Negele, Felix Friedrich, Suwon Oh and Bernhard Egger, On the Design and Implementation of an Efficient Lock-Free Scheduler, 19th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)

3 Problems with Locks Deadlock Livelock Starvation Parallelism? Progress Guarantees? Reentrancy? Granularity? Fault Tolerance?

4 Politelock 254

5 Lock-Free 255

6 Definitions Lock-freedom: at least one algorithm makes progress even if other algorithms run concurrently, fail or get suspended. Implies system-wide progress but not freedom from starvation. implies Wait-freedom: each algorithm eventually makes progress. Implies freedom from starvation. 256

7 Progress Conditions Blocking Non-Blocking Someone make progress Everyone makes progress Deadlock-free Starvation-free Lock-free Wait-free Art of Multiprocessor Programming 257

8 Goals Lock Freedom Portability Progress Guarantees Reentrant Algorithms Hardware Independence Simplicity, Maintenance

9 Guiding principles 1. Keep things simple 2. Exclusively employ non-blocking algorithms in the system Use implicit cooperative multitasking no virtual memory limits in optimization

10 Where are the Locks in the Kernel? Scheduling Queues / Heaps object header PP P P ready queues array NIL NIL NIL P P P Memory Management 260

11 atomic CAS (again) Compare old with data at memory location If and only if data at memory equals old overwrite data with new int CAS (memref a, int old, int new) previous = mem[a]; if (old == previous) Mem[a] = new; return previous; Return previous memory value CAS is implemented wait-free(!) by hardware. Parallel Programming SS

12 Memory Model for Lockfree Active Oberon Only two rules 1. Data shared between two or more activities at the same time has to be protected using exclusive blocks unless the data is read or modified using the compare-and-swap operation 2. Changes to shared data visible to other activities after leaving an exclusive block or executing a compare-and-swap operation. Implementations are free to reorder all other memory accesses as long as their effect equals a sequential execution within a single activity. 262

13 Inbuilt CAS CAS instruction as statement of the language PROCEDURE CAS(variable, old, new: BaseType): BaseType Operation executed atomically, result visible instantaneously to other processes CAS(variable, x, x) constitutes an atomic read Compiler required to implement CAS as a synchronisation barrier Portability, even for non-blocking algorithms Consistent view on shared data, even for systems that represent words using bytes 263

14 Simple Example: Non-blocking counter PROCEDURE Increment(VAR counter: LONGINT): LONGINT; VAR previous, value: LONGINT; BEGIN REPEAT previous := CAS(counter,0,0); value := CAS(counter, previous, previous + 1); UNTIL value = previous; return previous; END Increment; 264

15 Lock-Free Programming Performance of CAS on the H/W level, CAS triggers a memory barrier performance suffers with increasing number of contenders to the same variable Successful CAS Operations [10 6 ] #Processors

16 CAS with backoff Successful CAS Operations [10 6 ] constant backoff with 10 6 iterations 10 5 iterations 10 4 iterations 10 3 iterations #Processors 266

17 Stack Node = POINTER TO RECORD item: Object; next: Node; END; Stack = OBJECT VAR top: Node; PROCEDURE Pop(VAR head: Node): BOOLEAN; PROCEDURE Push(head: Node); END; top item next item next item next NIL 267

18 Stack -- Blocking PROCEDURE Push(node: Node): BOOLEAN; BEGIN{EXCLUSIVE} node.next := top; top := node; END Push; PROCEDURE Pop(VAR head: Node): BOOLEAN; VAR next: Node; BEGIN{EXCLUSIVE} head := top; IF head = NIL THEN RETURN FALSE ELSE top := head.next; RETURN TRUE; END; END Pop; 268

19 Stack -- Lockfree PROCEDURE Pop(VAR head: Node): BOOLEAN; VAR next: Node; BEGIN LOOP head := CAS(top, NIL, NIL); IF head = NIL THEN RETURN FALSE END; next := CAS(head.next, NIL, NIL); IF CAS(top, head, next) = head THEN RETURN TRUE END; CPU.Backoff END; END Pop; head top next A B C NIL 269

20 Stack -- Lockfree PROCEDURE Push(new: Node); BEGIN LOOP head := CAS(top, NIL, NIL); CAS(new.next, new.next, head); IF CAS(top, head, new) = head THEN EXIT END; CPU.Backoff; END; END Push; new top head A B C NIL 270

21 Node Reuse Assume we do not want to allocate a new node for each Push and maintain a Node-pool instead. Does this work? NO! WHY NOT? 271

22 ABA Problem Thread X in the middle of pop: after read but before CAS Thread Y pops A Thread Z pushes B Pool Thread Z' pushes A Thread X completes pop head Pool A head A top A top A top B top B top B next next NIL NIL NIL NIL NIL time

23 The ABA-Problem "The ABA problem... occurs when one activity fails to recognise that a single memory location was modified temporarily by another activity and therefore erroneously assumes that the overal state has not been changed." X observes Variable V as A meanwhile V changes to B..... and back to A X observes A again and assumes the state is unchanged A B A A time 273

24 How to solve the ABA problem? DCAS (double compare and swap) not available on most platforms Hardware transactional memory not available on most platforms Garbage Collection relies on the existence of a GC impossible to use in the inner of a runtime kernel can you implement a lock-free garbage collector relying on garbage collection? Pointer Tagging does not cure the problem, rather delay it can be practical Hazard Pointers 274

25 Pointer Tagging ABA problem usually occurs with CAS on pointers Aligned addresses (values of pointers) make some bits available for pointer tagging. Example: pointer aligned modulo 32 5 bits available for tagging MSB... X X X X X X X X Each time a pointer is stored in a data structure, the tag is increased by one. Access to a data structure via address x x mod 32 This makes the ABA problem very much less probable because now 32 versions of each pointer exist. 275

26 Hazard Pointers The ABA problem stems from reuse of a pointer P that has been read by some thread X but not yet written with CAS by the same thread. Modification takes place meanwhile by some other thread Y. Idea to solve: Before X reads P, it marks it hazarduous by entering it in a threaddedicated slot of the n (n= number threads) slots of an array associated with the data structure (e.g. the stack) When finished (after the CAS), process X removes P from the array Before a process Y tries to reuse P, it checks all entries of the hazard array 276

27 Unbounded Queue (FIFO) item item item item item item first last 277

28 Enqueue case last!= NIL item item item item item item 1 new first 2 last case last = NIL new 1 first 2 last 278

29 Dequeue last!= first item item item item item item first 1 last last == first item 1 2 first last 279

30 Naive Approach Enqueue (q, new) REPEAT last := CAS(q.last, NIL, NIL); e1 UNTIL CAS(q.last, last, new) = last; IF last!= NIL THEN e2 CAS(last.next, NIL, new); ELSE e3 CAS(q.first, NIL, new); END Dequeue (q) REPEAT first= CAS(q.first, null, null); d1 IF first = NIL THEN RETURN NIL END; next = CAS(first.next, NIL,NIL) d2 UNTIL CAS(q.first, first, next) = first; IF next == NIL THEN d3 CAS(q.last, first, NIL); END A B A A B A first e1 last first last first last first + e3 e1 + e2 d2 + d3 d2 last 280

31 Scenario Process P enqueues A Process Q dequeues A A A first last first last first last first last initial P: Q: e1 d1 e3 P: 281

32 Scenario Process P enqueues A Process Q dequeues B B A B A B A first last first last first last first last initial P: e1 Q: d2 P: e2 282

33 Analysis The problem is that enqueue and dequeue do under some circumstances have to update several pointers at once [first, last, next] The transient inconsistency can lead to permanent data structure corruption Solutions to this particular problem are not easy to find if no double compare and swap (or similar) is available Need another approach: Decouple enqueue and dequeue with a sentinel. A consequence is that the queue cannot be in-place. 283

34 Queues with Sentinel sentinel item S next A B C Queue empty: Queue nonempty: Invariants: first = last first # last first # NIL last # NIL first last 284

35 Node Reuse simple idea: link from node to item and from item to node 2 B 285

36 Enqueue and Dequeue with Sentinel S A B C next S A B first last first last Item enqueued together with associated node. A becomes the new sentinel. S associated with free item. 286

37 Enqueue 2 3 PROCEDURE Enqueue- (item: Item; VAR queue: Queue); VAR node, last, next: Node; BEGIN node := Allocate(); node.item := Item: LOOP last := CAS (queue.last, NIL, NIL); next := CAS (last.next, NIL, node); IF next = NIL THEN EXIT END; Set last node's next pointer IF CAS (queue.last, last, next) # last THEN CPU.Backoff END; END; ASSERT (CAS (queue.last, last, node) # NIL); END Enqueue; If setting last pointer failed, then help other processes to update last node Progress guarantee B last Set last node, can fail but then others have already helped C 287

38 Dequeue 1 2 PROCEDURE Dequeue- (VAR item: Item; VAR queue: Queue): BOOLEAN; VAR first, next, last: Node; BEGIN LOOP first := CAS (queue.first, NIL, NIL); next := CAS (first.next, NIL, NIL); IF next = NIL THEN RETURN FALSE END; last := CAS (queue.last, first, next); item := next.item; IF CAS (queue.first, first, next) = first THEN EXIT END; CPU.Backoff; END; item.node := first; RETURN TRUE; END Dequeue; Remove potential inconsistency, help other processes to set last pointer set first pointer first associate node with first S A B last 288

39 ABA Problems of unbounded lock-free queues unboundedness dynamic memory allocation is inevitable if the memory system is not lock-free, we are back to square 1 reusing nodes to avoid memory issues causes the ABA problem (where?!) Employ Hazard Pointers now. 289

40 Hazard Pointers Store pointers of memory references about to be accessed by a thread Memory allocation checks all hazard pointers to avoid the ABA problem thread A - hp1 - hp2 thread B - hp1 - hp2 thread C - hp1 - hp2 Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping!

41 Key idea of Cooperative MT & Lock-free Algorithms Use the guarantees of cooperative multitasking to implement efficient unbounded lock-free queues

42 Time Sharing time user mode thread A thread B timer IRQ kernel mode - save processor registers (assembly) - call timer handler (assembly) - lock scheduling queue - pick new process to schedule - unlock scheduling queue - restore processor registers (assembly) - interrupt return (assembly) inherently hardware dependent (timer programming context save/restore) inherently non-parallel (scheduler lock)

43 Cooperative Multitasking user mode function call user mode hardware independent (no timer required, standard procedure calling convention takes care of register save/restore) time thread A thread B - save processor registers (assembly) - call timer handler (assembly) - lock scheduling queue - pick new process to schedule (lockfree) - unlock scheduling queue - switch base pointer - return from function call finest granularity (no lock)

44 Implicit Cooperative Multitasking Ensure cooperation Compiler automatically inserts code at specific points in the code Details Each process has a quantum At regular intervals, the compiler inserts code to decrease the quantum and calls the scheduler if necessary implicit cooperative multitasking AMD64

45 uncooperative PROCEDURE Enqueue- (item: Item; VAR queue: Queue); BEGIN {UNCOOPERATIVE}... (* no scheduling here! *)... END Enqueue; zero overhead processor local "locks" 295

46 Implicit Cooperative Multitasking Pros Cons extremely light-weight cost of a regular function call allow for global optimization calls to scheduler known to the compiler zero overhead processor local locks overhead of inserted scheduler code currently sacrifice one hardware register (rcx) require a special compiler and access to the source code

47 Cooperative MT & Lock-free Algorithms Guarantees of cooperative MT No more than M threads are executing inside an uncooperative block (M = # of processors) No thread switch occurs while a thread is running on a processor hazard pointers can be associated with the processor Number of hazard pointers limited by M Search time constant thread-local storage processor local storage

48 No Interrupts? Device drivers are interrupt-driven breaks all assumptions made so far (number of contenders limited by the number of processors) Key idea: model interrupt handlers as virtual processors M = # of physical processors + # of potentially concurrent interrupts

Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs 3.

Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs 3. Whatever can go wrong will go wrong. attributed to Edward A. Murphy Murphy was an optimist. authors of lock-free programs 3. LOCK FREE KERNEL 309 Literature Maurice Herlihy and Nir Shavit. The Art of Multiprocessor

More information

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2 Hazard Pointers Store pointers of memory references about to be accessed by a thread Memory allocation checks all hazard pointers to avoid the ABA problem thread A - hp1 - hp2 thread B - hp1 - hp2 thread

More information

On the Design and Implementation of an Efficient Lock-Free Scheduler

On the Design and Implementation of an Efficient Lock-Free Scheduler On the Design and Implementation of an Efficient Lock-Free Scheduler Florian Negele 1, Felix Friedrich 1, Suwon Oh 2, and Bernhard Egger 2 1 Dept. of Computer Science, ETH Zürich, Switzerland 2 Dept. of

More information

On the Design and Implementation of an Efficient Lock-Free Scheduler

On the Design and Implementation of an Efficient Lock-Free Scheduler On the Design and Implementation of an Efficient Lock-Free Scheduler Florian Negele 1, Felix Friedrich 1,SuwonOh 2, and Bernhard Egger 2(B) 1 Department of Computer Science, ETH Zürich, Zürich, Switzerland

More information

Non-blocking Array-based Algorithms for Stacks and Queues!

Non-blocking Array-based Algorithms for Stacks and Queues! Non-blocking Array-based Algorithms for Stacks and Queues! Niloufar Shafiei! Department of Computer Science and Engineering York University ICDCN 09 Outline! Introduction! Stack algorithm! Queue algorithm!

More information

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei Non-blocking Array-based Algorithms for Stacks and Queues Niloufar Shafiei Outline Introduction Concurrent stacks and queues Contributions New algorithms New algorithms using bounded counter values Correctness

More information

Concurrent Queues and Stacks. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Concurrent Queues and Stacks. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Queues and Stacks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit The Five-Fold Path Coarse-grained locking Fine-grained locking Optimistic synchronization

More information

LOCKLESS ALGORITHMS. Lockless Algorithms. CAS based algorithms stack order linked list

LOCKLESS ALGORITHMS. Lockless Algorithms. CAS based algorithms stack order linked list Lockless Algorithms CAS based algorithms stack order linked list CS4021/4521 2017 jones@scss.tcd.ie School of Computer Science and Statistics, Trinity College Dublin 2-Jan-18 1 Obstruction, Lock and Wait

More information

Fine-grained synchronization & lock-free programming

Fine-grained synchronization & lock-free programming Lecture 17: Fine-grained synchronization & lock-free programming Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Minnie the Moocher Robbie Williams (Swings Both Ways)

More information

Programming Paradigms for Concurrency Lecture 3 Concurrent Objects

Programming Paradigms for Concurrency Lecture 3 Concurrent Objects Programming Paradigms for Concurrency Lecture 3 Concurrent Objects Based on companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Thomas Wies New York University

More information

Concurrent Queues, Monitors, and the ABA problem. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Concurrent Queues, Monitors, and the ABA problem. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Queues, Monitors, and the ABA problem Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Queues Often used as buffers between producers and consumers

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

Unit 6: Indeterminate Computation

Unit 6: Indeterminate Computation Unit 6: Indeterminate Computation Martha A. Kim October 6, 2013 Introduction Until now, we have considered parallelizations of sequential programs. The parallelizations were deemed safe if the parallel

More information

Overview of Lecture 4. Memory Models, Atomicity & Performance. Ben-Ari Concurrency Model. Dekker s Algorithm 4

Overview of Lecture 4. Memory Models, Atomicity & Performance. Ben-Ari Concurrency Model. Dekker s Algorithm 4 Concurrent and Distributed Programming http://fmt.cs.utwente.nl/courses/cdp/ Overview of Lecture 4 2 Memory Models, tomicity & Performance HC 4 - Tuesday, 6 December 2011 http://fmt.cs.utwente.nl/~marieke/

More information

Design of Concurrent and Distributed Data Structures

Design of Concurrent and Distributed Data Structures METIS Spring School, Agadir, Morocco, May 2015 Design of Concurrent and Distributed Data Structures Christoph Kirsch University of Salzburg Joint work with M. Dodds, A. Haas, T.A. Henzinger, A. Holzer,

More information

An Almost Non- Blocking Stack

An Almost Non- Blocking Stack An Almost Non- Blocking Stack Hans-J. Boehm HP Labs 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Motivation Update data structures

More information

Lecture 10: Avoiding Locks

Lecture 10: Avoiding Locks Lecture 10: Avoiding Locks CSC 469H1F Fall 2006 Angela Demke Brown (with thanks to Paul McKenney) Locking: A necessary evil? Locks are an easy to understand solution to critical section problem Protect

More information

Linked Lists: The Role of Locking. Erez Petrank Technion

Linked Lists: The Role of Locking. Erez Petrank Technion Linked Lists: The Role of Locking Erez Petrank Technion Why Data Structures? Concurrent Data Structures are building blocks Used as libraries Construction principles apply broadly This Lecture Designing

More information

Lock-Free Techniques for Concurrent Access to Shared Objects

Lock-Free Techniques for Concurrent Access to Shared Objects This is a revised version of the previously published paper. It includes a contribution from Shahar Frank who raised a problem with the fifo-pop algorithm. Revised version date: sept. 30 2003. Lock-Free

More information

Fine-grained synchronization & lock-free data structures

Fine-grained synchronization & lock-free data structures Lecture 19: Fine-grained synchronization & lock-free data structures Parallel Computer Architecture and Programming Redo Exam statistics Example: a sorted linked list struct Node { int value; Node* next;

More information

An Introduction to Parallel Systems

An Introduction to Parallel Systems Lecture 4 - Shared Resource Parallelism University of Bath December 6, 2007 When Week 1 Introduction Who, What, Why, Where, When? Week 2 Data Parallelism and Vector Processors Week 3 Message Passing Systems

More information

30. Parallel Programming IV

30. Parallel Programming IV 924 30. Parallel Programming IV Futures, Read-Modify-Write Instructions, Atomic Variables, Idea of lock-free programming [C++ Futures: Williams, Kap. 4.2.1-4.2.3] [C++ Atomic: Williams, Kap. 5.2.1-5.2.4,

More information

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations

More information

Chapter 6: Process [& Thread] Synchronization. CSCI [4 6] 730 Operating Systems. Why does cooperation require synchronization?

Chapter 6: Process [& Thread] Synchronization. CSCI [4 6] 730 Operating Systems. Why does cooperation require synchronization? Chapter 6: Process [& Thread] Synchronization CSCI [4 6] 730 Operating Systems Synchronization Part 1 : The Basics Why is synchronization needed? Synchronization Language/Definitions:» What are race conditions?»

More information

A Non-Blocking Concurrent Queue Algorithm

A Non-Blocking Concurrent Queue Algorithm A Non-Blocking Concurrent Queue Algorithm Bruno Didot bruno.didot@epfl.ch June 2012 Abstract This report presents a new non-blocking concurrent FIFO queue backed by an unrolled linked list. Enqueue and

More information

! Why is synchronization needed? ! Synchronization Language/Definitions: ! How are locks implemented? Maria Hybinette, UGA

! Why is synchronization needed? ! Synchronization Language/Definitions: ! How are locks implemented? Maria Hybinette, UGA Chapter 6: Process [& Thread] Synchronization CSCI [4 6] 730 Operating Systems Synchronization Part 1 : The Basics! Why is synchronization needed?! Synchronization Language/Definitions:» What are race

More information

Virtual Machine Design

Virtual Machine Design Virtual Machine Design Lecture 4: Multithreading and Synchronization Antero Taivalsaari September 2003 Session #2026: J2MEPlatform, Connected Limited Device Configuration (CLDC) Lecture Goals Give an overview

More information

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 9: Multiprocessor OSs & Synchronization CSC 469H1F Fall 2006 Angela Demke Brown The Problem Coordinated management of shared resources Resources may be accessed by multiple threads Need to control

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution

More information

Blocking Non-blocking Caveat:

Blocking Non-blocking Caveat: Overview of Lecture 5 1 Progress Properties 2 Blocking The Art of Multiprocessor Programming. Maurice Herlihy and Nir Shavit. Morgan Kaufmann, 2008. Deadlock-free: some thread trying to get the lock eventually

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 Process creation in UNIX All processes have a unique process id getpid(),

More information

Concurrent Objects and Linearizability

Concurrent Objects and Linearizability Chapter 3 Concurrent Objects and Linearizability 3.1 Specifying Objects An object in languages such as Java and C++ is a container for data. Each object provides a set of methods that are the only way

More information

Order Is A Lie. Are you sure you know how your code runs?

Order Is A Lie. Are you sure you know how your code runs? Order Is A Lie Are you sure you know how your code runs? Order in code is not respected by Compilers Processors (out-of-order execution) SMP Cache Management Understanding execution order in a multithreaded

More information

a. A binary semaphore takes on numerical values 0 and 1 only. b. An atomic operation is a machine instruction or a sequence of instructions

a. A binary semaphore takes on numerical values 0 and 1 only. b. An atomic operation is a machine instruction or a sequence of instructions CSE 306 -- Operating Systems Spring 2002 Solutions to Review Questions for the Final Exam 1. [20 points, 1 each] rue or False, circle or F. a. A binary semaphore takes on numerical values 0 and 1 only.

More information

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Real world concurrency Understanding hardware architecture What is contention How to

More information

Background. The Critical-Section Problem Synchronisation Hardware Inefficient Spinning Semaphores Semaphore Examples Scheduling.

Background. The Critical-Section Problem Synchronisation Hardware Inefficient Spinning Semaphores Semaphore Examples Scheduling. Background The Critical-Section Problem Background Race Conditions Solution Criteria to Critical-Section Problem Peterson s (Software) Solution Concurrent access to shared data may result in data inconsistency

More information

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott Presentation by Joe Izraelevitz Tim Kopp Synchronization Primitives Spin Locks Used for

More information

The Universality of Consensus

The Universality of Consensus Chapter 6 The Universality of Consensus 6.1 Introduction In the previous chapter, we considered a simple technique for proving statements of the form there is no wait-free implementation of X by Y. We

More information

2.3. ACTIVITY MANAGEMENT

2.3. ACTIVITY MANAGEMENT 2.3. ACTIVITY MANAGEMENT 204 Life Cycle of Activities Running Awaiting Condition Awaiting Lock NIL Ready Terminated 205 Runtime Data Structures running array global 1 2 3 4 processor id object header object

More information

Agenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008

Agenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008 Agenda Threads CSCI 444/544 Operating Systems Fall 2008 Thread concept Thread vs process Thread implementation - user-level - kernel-level - hybrid Inter-process (inter-thread) communication What is Thread

More information

CS 241 Honors Concurrent Data Structures

CS 241 Honors Concurrent Data Structures CS 241 Honors Concurrent Data Structures Bhuvan Venkatesh University of Illinois Urbana Champaign March 27, 2018 CS 241 Course Staff (UIUC) Lock Free Data Structures March 27, 2018 1 / 43 What to go over

More information

Lock-Free Concurrent Data Structures, CAS and the ABA-Problem

Lock-Free Concurrent Data Structures, CAS and the ABA-Problem Lock-Free Concurrent Data Structures, CAS and the ABA-Problem 2 Motivation Today almost all PCs and Laptops have a multi-core ( e.g. quad-core ) processor Dr. Wolfgang Koch using SMP (symmetric multiprocessing)

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski Operating Systems Design Fall 2010 Exam 1 Review Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 To a programmer, a system call looks just like a function call. Explain the difference in the underlying

More information

Allocating memory in a lock-free manner

Allocating memory in a lock-free manner Allocating memory in a lock-free manner Anders Gidenstam, Marina Papatriantafilou and Philippas Tsigas Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers

More information

Lecture 5: Synchronization w/locks

Lecture 5: Synchronization w/locks Lecture 5: Synchronization w/locks CSE 120: Principles of Operating Systems Alex C. Snoeren Lab 1 Due 10/19 Threads Are Made to Share Global variables and static objects are shared Stored in the static

More information

Boot Procedure. BP (boot processor) Start BIOS Firmware Load A2 Bootfile Initialize modules Module Machine Module Heaps

Boot Procedure. BP (boot processor) Start BIOS Firmware Load A2 Bootfile Initialize modules Module Machine Module Heaps Boot rocedure Start BIOS Firmware Load A2 Bootfile Initialize modules Module Machine Module Heaps Module Objects Setup scheduler and self process Module Kernel Start all processors Module Bootconsole read

More information

Lecture Notes on Advanced Garbage Collection

Lecture Notes on Advanced Garbage Collection Lecture Notes on Advanced Garbage Collection 15-411: Compiler Design André Platzer Lecture 21 November 4, 2010 1 Introduction More information on garbage collection can be found in [App98, Ch 13.5-13.7]

More information

Review: Easy Piece 1

Review: Easy Piece 1 CS 537 Lecture 10 Threads Michael Swift 10/9/17 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1 Review: Easy Piece 1 Virtualization CPU Memory Context Switch Schedulers

More information

Advance Operating Systems (CS202) Locks Discussion

Advance Operating Systems (CS202) Locks Discussion Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static

More information

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract A simple correctness proof of the MCS contention-free lock Theodore Johnson Krishna Harathi Computer and Information Sciences Department University of Florida Abstract Mellor-Crummey and Scott present

More information

CSE 153 Design of Operating Systems Fall 2018

CSE 153 Design of Operating Systems Fall 2018 CSE 153 Design of Operating Systems Fall 2018 Lecture 5: Threads/Synchronization Implementing threads l Kernel Level Threads l u u All thread operations are implemented in the kernel The OS schedules all

More information

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Processes and Non-Preemptive Scheduling. Otto J. Anshus Processes and Non-Preemptive Scheduling Otto J. Anshus Threads Processes Processes Kernel An aside on concurrency Timing and sequence of events are key concurrency issues We will study classical OS concurrency

More information

CPSC/ECE 3220 Summer 2017 Exam 2

CPSC/ECE 3220 Summer 2017 Exam 2 CPSC/ECE 3220 Summer 2017 Exam 2 Name: Part 1: Word Bank Write one of the words or terms from the following list into the blank appearing to the left of the appropriate definition. Note that there are

More information

Synchronization I. Jo, Heeseung

Synchronization I. Jo, Heeseung Synchronization I Jo, Heeseung Today's Topics Synchronization problem Locks 2 Synchronization Threads cooperate in multithreaded programs To share resources, access shared data structures Also, to coordinate

More information

Running. Time out. Event wait. Schedule. Ready. Blocked. Event occurs

Running. Time out. Event wait. Schedule. Ready. Blocked. Event occurs Processes ffl Process: an abstraction of a running program. ffl All runnable software is organized into a number of sequential processes. ffl Each process has its own flow of control(i.e. program counter,

More information

CMSC 341. Linked Lists, Stacks and Queues. CMSC 341 Lists, Stacks &Queues 1

CMSC 341. Linked Lists, Stacks and Queues. CMSC 341 Lists, Stacks &Queues 1 CMSC 341 Linked Lists, Stacks and Queues CMSC 341 Lists, Stacks &Queues 1 Goal of the Lecture To complete iterator implementation for ArrayList Briefly talk about implementing a LinkedList Introduce special

More information

Operating Systems Comprehensive Exam. Spring Student ID # 3/16/2006

Operating Systems Comprehensive Exam. Spring Student ID # 3/16/2006 Operating Systems Comprehensive Exam Spring 2006 Student ID # 3/16/2006 You must complete all of part I (60%) You must complete two of the three sections in part II (20% each) In Part I, circle or select

More information

The Five-Fold Path. Queues & Stacks. Bounded vs Unbounded. Blocking vs Non-Blocking. Concurrent Queues and Stacks. Another Fundamental Problem

The Five-Fold Path. Queues & Stacks. Bounded vs Unbounded. Blocking vs Non-Blocking. Concurrent Queues and Stacks. Another Fundamental Problem Concurrent Queues and Stacks The Five-Fold Path Coarse-grained locking Fine-grained locking Optimistic synchronization Lazy synchronization Lock-free synchronization 2 Another Fundamental Problem Queues

More information

Synchronization for Concurrent Tasks

Synchronization for Concurrent Tasks Synchronization for Concurrent Tasks Minsoo Ryu Department of Computer Science and Engineering 2 1 Race Condition and Critical Section Page X 2 Synchronization in the Linux Kernel Page X 3 Other Synchronization

More information

Modern High-Performance Locking

Modern High-Performance Locking Modern High-Performance Locking Nir Shavit Slides based in part on The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Locks (Mutual Exclusion) public interface Lock { public void lock();

More information

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 4 - Concurrency and Synchronization Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Mutual exclusion Hardware solutions Semaphores IPC: Message passing

More information

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion

More information

Module 14: "Directory-based Cache Coherence" Lecture 31: "Managing Directory Overhead" Directory-based Cache Coherence: Replacement of S blocks

Module 14: Directory-based Cache Coherence Lecture 31: Managing Directory Overhead Directory-based Cache Coherence: Replacement of S blocks Directory-based Cache Coherence: Replacement of S blocks Serialization VN deadlock Starvation Overflow schemes Sparse directory Remote access cache COMA Latency tolerance Page migration Queue lock in hardware

More information

Spin Locks and Contention

Spin Locks and Contention Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified for Software1 students by Lior Wolf and Mati Shomrat Kinds of Architectures

More information

1 Process Coordination

1 Process Coordination COMP 730 (242) Class Notes Section 5: Process Coordination 1 Process Coordination Process coordination consists of synchronization and mutual exclusion, which were discussed earlier. We will now study

More information

PROCESS STATES AND TRANSITIONS:

PROCESS STATES AND TRANSITIONS: The kernel contains a process table with an entry that describes the state of every active process in the system. The u area contains additional information that controls the operation of a process. The

More information

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430 Implementing Mutual Exclusion Sarah Diesburg Operating Systems CS 3430 From the Previous Lecture The too much milk example shows that writing concurrent programs directly with load and store instructions

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

Advanced Topic: Efficient Synchronization

Advanced Topic: Efficient Synchronization Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is

More information

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points Name: CS520 Final Exam 12 December 2018, 120 minutes, 26 questions, 100 points The exam is closed book and notes. Please keep all electronic devices turned off and out of reach. Note that a question may

More information

Concurrent Objects. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Concurrent Objects. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Companion slides for The by Maurice Herlihy & Nir Shavit Concurrent Computation memory object object 2 Objectivism What is a concurrent object? How do we describe one? How do we implement

More information

Linked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Linked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Linked Lists: Locking, Lock-Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Adding threads should not lower throughput Contention

More information

IT 540 Operating Systems ECE519 Advanced Operating Systems

IT 540 Operating Systems ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (5 th Week) (Advanced) Operating Systems 5. Concurrency: Mutual Exclusion and Synchronization 5. Outline Principles

More information

Processes and Threads Implementation

Processes and Threads Implementation Processes and Threads Implementation 1 Learning Outcomes An understanding of the typical implementation strategies of processes and threads Including an appreciation of the trade-offs between the implementation

More information

Solution: a lock (a/k/a mutex) public: virtual void unlock() =0;

Solution: a lock (a/k/a mutex) public: virtual void unlock() =0; 1 Solution: a lock (a/k/a mutex) class BasicLock { public: virtual void lock() =0; virtual void unlock() =0; ; 2 Using a lock class Counter { public: int get_and_inc() { lock_.lock(); int old = count_;

More information

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014 GC (Chapter 14) Eleanor Ainy December 16 th 2014 1 Outline of Today s Talk How to use parallelism in each of the 4 components of tracing GC: Marking Copying Sweeping Compaction 2 Introduction Till now

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

Lecture 10: Multi-Object Synchronization

Lecture 10: Multi-Object Synchronization CS 422/522 Design & Implementation of Operating Systems Lecture 10: Multi-Object Synchronization Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous

More information

Intel Thread Building Blocks, Part IV

Intel Thread Building Blocks, Part IV Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for

More information

Multitasking. Embedded Systems

Multitasking. Embedded Systems Multitasking in Embedded Systems 1 / 39 Multitasking in Embedded Systems v1.0 Multitasking in ES What is Singletasking? What is Multitasking? Why Multitasking? Different approaches Realtime Operating Systems

More information

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Spin Locks and Contention. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized

More information

Universality of Consensus. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Universality of Consensus. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Universality of Consensus Companion slides for The by Maurice Herlihy & Nir Shavit Turing Computability 1 0 1 1 0 1 0 A mathematical model of computation Computable = Computable on a T-Machine 2 Shared-Memory

More information

Using Software Transactional Memory In Interrupt-Driven Systems

Using Software Transactional Memory In Interrupt-Driven Systems Using Software Transactional Memory In Interrupt-Driven Systems Department of Mathematics, Statistics, and Computer Science Marquette University Thesis Defense Introduction Thesis Statement Software transactional

More information

Analyzing Real-Time Systems

Analyzing Real-Time Systems Analyzing Real-Time Systems Reference: Burns and Wellings, Real-Time Systems and Programming Languages 17-654/17-754: Analysis of Software Artifacts Jonathan Aldrich Real-Time Systems Definition Any system

More information

* What are the different states for a task in an OS?

* What are the different states for a task in an OS? * Kernel, Services, Libraries, Application: define the 4 terms, and their roles. The kernel is a computer program that manages input/output requests from software, and translates them into data processing

More information

ArdOS The Arduino Operating System Reference Guide Contents

ArdOS The Arduino Operating System Reference Guide Contents ArdOS The Arduino Operating System Reference Guide Contents 1. Introduction... 2 2. Error Handling... 2 3. Initialization and Startup... 2 3.1 Initializing and Starting ArdOS... 2 4. Task Creation... 3

More information

Dealing with Issues for Interprocess Communication

Dealing with Issues for Interprocess Communication Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

Dr. Rafiq Zakaria Campus. Maulana Azad College of Arts, Science & Commerce, Aurangabad. Department of Computer Science. Academic Year

Dr. Rafiq Zakaria Campus. Maulana Azad College of Arts, Science & Commerce, Aurangabad. Department of Computer Science. Academic Year Dr. Rafiq Zakaria Campus Maulana Azad College of Arts, Science & Commerce, Aurangabad Department of Computer Science Academic Year 2015-16 MCQs on Operating System Sem.-II 1.What is operating system? a)

More information

1995 Paper 10 Question 7

1995 Paper 10 Question 7 995 Paper 0 Question 7 Why are multiple buffers often used between producing and consuming processes? Describe the operation of a semaphore. What is the difference between a counting semaphore and a binary

More information

Midterm Exam Amy Murphy 6 March 2002

Midterm Exam Amy Murphy 6 March 2002 University of Rochester Midterm Exam Amy Murphy 6 March 2002 Computer Systems (CSC2/456) Read before beginning: Please write clearly. Illegible answers cannot be graded. Be sure to identify all of your

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Lecture 4: Mechanism of process execution. Mythili Vutukuru IIT Bombay

Lecture 4: Mechanism of process execution. Mythili Vutukuru IIT Bombay Lecture 4: Mechanism of process execution Mythili Vutukuru IIT Bombay Low-level mechanisms How does the OS run a process? How does it handle a system call? How does it context switch from one process to

More information

Following are a few basic questions that cover the essentials of OS:

Following are a few basic questions that cover the essentials of OS: Operating Systems Following are a few basic questions that cover the essentials of OS: 1. Explain the concept of Reentrancy. It is a useful, memory-saving technique for multiprogrammed timesharing systems.

More information

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Anders Gidenstam Håkan Sundell Philippas Tsigas School of business and informatics University of Borås Distributed

More information

Threads Tuesday, September 28, :37 AM

Threads Tuesday, September 28, :37 AM Threads_and_fabrics Page 1 Threads Tuesday, September 28, 2004 10:37 AM Threads A process includes an execution context containing Memory map PC and register values. Switching between memory maps can take

More information

Marwan Burelle. Parallel and Concurrent Programming

Marwan Burelle.   Parallel and Concurrent Programming marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 Solutions Overview 1 Sharing Data First, always try to apply the following mantra: Don t share data! When non-scalar data are

More information

Programming Languages

Programming Languages TECHNISCHE UNIVERSITÄT MÜNCHEN FAKULTÄT FÜR INFORMATIK Programming Languages Concurrency: Atomic Executions, Locks and Monitors Dr. Michael Petter Winter term 2016 Atomic Executions, Locks and Monitors

More information

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017 CS 471 Operating Systems Yue Cheng George Mason University Fall 2017 Outline o Process concept o Process creation o Process states and scheduling o Preemption and context switch o Inter-process communication

More information