Distributed Operating Systems

Similar documents
Distributed Systems Synchronization. Marcus Völp 2007

Distributed Operating Systems

Distributed Operating Systems Memory Consistency

CS377P Programming for Performance Multicore Performance Synchronization

Advance Operating Systems (CS202) Locks Discussion

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019

The complete license text can be found at

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Adaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < >

CSC2/458 Parallel and Distributed Systems Scalable Synchronization

1 RCU. 2 Improving spinlock performance. 3 Kernel interface for sleeping locks. 4 Deadlock. 5 Transactions. 6 Scalable interface design

Fine-grained synchronization & lock-free data structures

Multiprocessors and Locking

Readers-Writers Problem. Implementing shared locks. shared locks (continued) Review: Test-and-set spinlock. Review: Test-and-set on alpha

Advanced Topic: Efficient Synchronization

Synchronization I. Jo, Heeseung

Dept. of CSE, York Univ. 1

Scalable Locking. Adam Belay

Fine-grained synchronization & lock-free programming

Other consistency models

G52CON: Concepts of Concurrency

Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors

The need for atomicity This code sequence illustrates the need for atomicity. Explain.

Synchronization I. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Hardware: BBN Butterfly

AST: scalable synchronization Supervisors guide 2002

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

COMP 3430 Robert Guderian

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

[537] Locks. Tyler Harter

l12 handout.txt l12 handout.txt Printed by Michael Walfish Feb 24, 11 12:57 Page 1/5 Feb 24, 11 12:57 Page 2/5

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors

Synchronization for Concurrent Tasks

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott

Lecture 19: Synchronization. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Multiprocessor Systems. Chapter 8, 8.1

Module 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms.

Review: Test-and-set spinlock

Lecture 10: Multi-Object Synchronization

GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs

Locks. Dongkun Shin, SKKU

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

Advanced Operating Systems (CS 202) Memory Consistency, Cache Coherence and Synchronization

SMP & Locking. NICTA 2010 From imagination to impact

CS3733: Operating Systems

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Process Synchronization

Concurrency: Mutual Exclusion (Locks)

Implementing Locks. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Advanced Operating Systems (CS 202) Memory Consistency, Cache Coherence and Synchronization

SMP & Locking. NICTA 2010 From imagination to impact

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430

Advanced Operating Systems (CS 202)

Threads. Concurrency. What it is. Lecture Notes Week 2. Figure 1: Multi-Threading. Figure 2: Multi-Threading

Reminder from last time

Operating Systems. Designed and Presented by Dr. Ayman Elshenawy Elsefy

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Systèmes d Exploitation Avancés

CS5460: Operating Systems

6.852: Distributed Algorithms Fall, Class 15

CSE 120 Principles of Operating Systems

CS758 Synchronization. CS758 Multicore Programming (Wood) Synchronization

CPSC/ECE 3220 Summer 2017 Exam 2

Administrivia. Review: Thread package API. Program B. Program A. Program C. Correct answers. Please ask questions on Google group

Preemptive Scheduling and Mutual Exclusion with Hardware Support

Relative Performance of Preemption-Safe Locking and Non-Blocking Synchronization on Multiprogrammed Shared Memory Multiprocessors

Concurrency: Locks. Announcements

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Operating Systems: William Stallings. Starvation. Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4

Multiprocessor Synchronization

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 3 Nov 2017

Chapter 6: Process Synchronization

Mutex Implementation

Chapters 5 and 6 Concurrency

Multiprocessor Support

Synchronization 1. Synchronization

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return?

CS370 Operating Systems

COREMU: a Portable and Scalable Parallel Full-system Emulator

Spinlocks. Spinlocks. Message Systems, Inc. April 8, 2011

MULTITHREADING AND SYNCHRONIZATION. CS124 Operating Systems Fall , Lecture 10

Conventions. Barrier Methods. How it Works. Centralized Barrier. Enhancements. Pitfalls

Dealing with Issues for Interprocess Communication

Programming Languages

Fine-Grained Synchronization and Lock-Free Programming

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Concurrent Preliminaries

Operating Systems. Synchronization

Lecture 5: Synchronization w/locks

CS 537 Lecture 11 Locks

Transcription:

Distributed Operating Systems Synchronization in Parallel Systems Marcus Völp 2009 1

Topics Synchronization Locking Analysis / Comparison Distributed Operating Systems 2009 Marcus Völp 2

Overview Introduction Hardware Primitives Locking Spin Lock (Test & Set Lock) Test & Test & Set Lock Ticket Locks MCS Locks Lock-free Synchronization Special Issues Timeouts Reader Writer Locks Lockholder Preemption Monitor, Mwait Performance Distributed Operating Systems 2009 Marcus Völp 3

Introduction An example: Request Queue Circular Buffer Head Tail Request Tail Head free free free free Distributed Operating Systems 2009 Marcus Völp 4

Introduction A 1) A,B create list elements 2) A,B set next pointer to head B Distributed Operating Systems 2009 Marcus Völp 5

Introduction A 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer B Distributed Operating Systems 2009 Marcus Völp 6

Introduction A 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer B Distributed Operating Systems 2009 Marcus Völp 7

Introduction A B 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer 5) A update head pointer Distributed Operating Systems 2009 Marcus Völp 8

Introduction A B 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer 5) A update head pointer 6) B update head pointer Distributed Operating Systems 2009 Marcus Völp 9

Introduction First Solution Locks coarse grain: lock entire list fine grain: lock list elements lock_list; insert_element; unlock_list; lock head if (try_lock head->next) { insert_element unlock head->next unlock head retry if trylock failed Distributed Operating Systems 2009 Marcus Völp 10

Hardware Primitives How to make instructions atomic Bus lock Lock memory bus for duration of single instruction (e.g., lock add) Cache Lock x86 > Pentium 4: processor may not grab the bus lock but execute atomic operation on single cacheline; subsequent accesses to this cacheline are delayed Observe Cache (ARM v6, Alpha, x86: monitor, mwait) Load Linked: Load value and watch location Store Conditional: Store value if no other store has accessed location Atomic operations loads, stores swap (XCHG) bit test and set (BTS) if bit clear then set bit; return true else return false compare and swap (CAS m, old, new) if m == old then m := new ; return true else new := m, return false Distributed Operating Systems 2009 Marcus Völp 11

Load Linked Store Conditional 0 1 (!I:) S S load linked a reg modify reg store conditional reg a Distributed Operating Systems 2009 Marcus Völp 12

Load Linked Store Conditional 0 1 write a S S->M load linked a reg modify reg store a store conditional reg a Distributed Operating Systems 2009 Marcus Völp 13

Load Linked Store Conditional 0 1 write a S I S M load linked a reg modify reg store a store conditional reg a Distributed Operating Systems 2009 Marcus Völp 14

Mutual Exclusion without Locks Last week: Decker / Peterson only atomic stores, atomic loads if memory is sequentially consistent (memory fences otherwise) bool flag[2] = {false, false; int turn = 0; void entersection(int thread) { int other = 1 - thread; /* id of other thread; thread in {0,1*/ flag[thread] = true; /* show interest */ turn= other; /* give precedence to other thread */ while (turn == other && flag[other]) {; /* wait */ void leavesection(int thread) { flag[thread] = false; Distributed Operating Systems 2009 Marcus Völp 15

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap lock (lock_var & l) { do { reg = 1; 0 1 2 3 swap (l, reg) while (reg == 1); unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 16

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap Lock lock (lock_var & l) { do { reg = 1; swap (l, reg) while (reg == 1); 0 1 2 3 l = M l = I l = I l = I unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 17

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap lock (lock_var & l) { do { reg = 1; swap (l, reg) while (reg == 1); 0 Swap 1 Lock 2 3 l = I l = M l = I l = I unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 18

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap Lock Swap lock (lock_var & l) { do { reg = 1; swap (l, reg) while (reg == 1); 0 1 2 3 l = I l = I l = I l = M unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 19

Locking Algorithms Spin Lock (Test and Test and Set Lock) atomic swap Lock lock (lock_var & l) { do { reg = 1; do { while (l == 1); swap (l, reg) while (reg == 1); 0 1 2 3 l = I l = I l = M l = I unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 20

Locking Algorithms Spin Lock (Test and Test and Set Lock) atomic swap lock (lock_var & l) { do { reg = 1; do { while (l == 1); swap (l, reg) while (reg == 1); test 0 test 1 Lock 2 3 l = S l = S l = M l = I unlock (lock_var & l) { l = 0; Distributed Operating Systems 2009 Marcus Völp 21

Locking Algorithms Fairness 0 1 2 3 free lock unlock test test lock test test unlock free test lock test Distributed Operating Systems 2009 Marcus Völp 22

Locking Algorithms Fairness: Ticket Lock fetch and add (xadd) lock_struct { next_ticket, current_ticket ticket_lock (lock_struct & l) { my_ticket = xadd (&l.next_ticket, 1) do { while (l.current_ticket!= my_ticket); unlock (lock_var & l) { current_ticket ++; 0 1 [my_ticket] current next 0 0 L.0 [0]: 0 1 => Lockholder = 0 L.1 [1]: 0 2 L.2 [2]: 0 3 U.0 [0]: 1 3 => Lockholder = 1 L.3 [3]: 1 4 L.0 [4]: 1 5 2 3 U.1 [1]: 2 5 => Lockholder = 2 Distributed Operating Systems 2009 Marcus Völp 23

Locking Algorithms Fairness: Ticket Lock fetch and add (xadd) lock_struct { next_ticket, current_ticket 0 1 2 my_ticket: 0 my_ticket: 1 write 3 ticket_lock (lock_struct & l) { my_ticket = xadd (&l.next_ticket, 1) do { while (l.current_ticket!= my_ticket); 1, 3 updates not required (not next) Spin on global variable unlock (lock_var &l) { current_ticket ++; However: - Signal all s not only next - Preemption of enlisted threads Distributed Operating Systems 2009 Marcus Völp 24

Local Spinning 0 1 2 3 write parallel reads 0 1 Need to propagate write on Bus 2-3 (or 1 3) 2 3 0 1 msg 3 Network Messages 2 3 Distributed Operating Systems 2009 Marcus Völp 25

MCS-Lock by Mellor-Crummey and Scott Fair Lock with Local Spinning 0 1 2 3 l next l next l next Lock L Distributed Operating Systems 2009 Marcus Völp 26

MCS-Lock (Mellor Crummey Scott) Fair Lock with Local Spinning 0 1 2 3 l next l next l next l next Lock L Distributed Operating Systems 2009 Marcus Völp 27

MCS-Lock (Mellor Crummey Scott) Fair Lock with Local Spinning 0 1 2 3 l next l next l next l next Lock L Distributed Operating Systems 2009 Marcus Völp 28

MCS Locks Fair, local spinning atomic compare exchange: cmpxchg (L == Old, New) lock(node * & L, Node * I) { I->next = null; I->lock = false; Node * prev = swap(l, I); If (prev) { prev->next = I; do { while (I->lock == false); unlock (Node * & L, Node * I) { if (!I->next) { if (cmpxchg (L == I, 0)) return; // no waiting cpu do { while (!I->next); // spin until the following process updates the next pointer I->next->lock = true; Distributed Operating Systems 2009 Marcus Völp 29

Lock-free synchronization prev next new insert(new, prev) { retry: new.next = next; if (not CAS(prev.next == next, new)) goto retry; prev next new insert(new, prev) { retry: new.next = next; new.prev = prev; if (not DCAS(prev.next == next && next.prev ==prev, prev.next = new, next.prev = new)) goto retry; Distributed Operating Systems 2009 Marcus Völp 30

Lock-free Synchronization Load Linked, Store Conditional insert (prev, new) { retry: ll (prev.next); new.next = prev.next; if (not stc (prev.next, new)) goto retry; Distributed Operating Systems 2009 Marcus Völp 31

Overview Introduction Hardware Primitives Locking Spin Lock (Test & Set Lock) Test & Test & Set Lock Ticket Locks MCS Locks Lock-free Synchronization Special Issues Timeouts Reader Writer Locks Lockholder Preemption Monitor, Mwait Performance Distributed Operating Systems 2009 Marcus Völp 32

Special Issues Timeouts No longer apply for lock after timeout Dequeue from MCS queue Reader Writer Locks Lock differentiates two types of lockers: reader, writer Multiple readers may hold the lock at the same time A writer can hold the lock only exclusively Fairness Improve reader latency by allowing readers to overtake writers (unfair lock) Distributed Operating Systems 2009 Marcus Völp 33

Special Issues Fair Ticket Reader-Writer Lock combine read, write ticket in single word lock read (next, current) { my_ticket = xadd (next, 1); do { while (current.write!= my_ticket.write); write read lock write (next, current) { my_ticket = xadd (next.write, 1); do { while (current!= my_ticket); unlock_read () { xadd (current.read, 1); current next R0 R1 W2 R3 0 0 0 0 0 0 0 1 0 1 0 2 0 2 1 2 1 2 unlock write () { current.write ++; Distributed Operating Systems 2009 Marcus Völp 34

Special Issues Lockholder preemption Spinning-time of other s increase by the preemption time of lockholders E.g., no packets can be sent when the OS network thread is preempted while it holds the lock of the xmit queue => do not preempt lock holders spin_lock(lock_var) { pushf; // store whether interrupts were already closed do { popf; reg = 1; do { while (lock_var == 1); spin_unlock(lock_var) { pushf; lock_var = 0; cli; popf; swap(lock_var, reg); while (reg == 1); Distributed Operating Systems 2009 Marcus Völp 35

Special Issues Monitor, Mwait Stop / HT while waiting for lock (signal) Saves power Frees up processor resources (HT) Monitor: watch cacheline Mwait: stop / HT until: cacheline has been written, or interrupt occurs while (trigger[0]!= value) { monitor (&trigger[0]) if (trigger[0]!= value) { mwait mwait 0 1 write to trigger t Distributed Operating Systems 2009 Marcus Völp 36

Performance Source: Mellor Crummey, Scott [1990]: Algorithms for Scalable Synchronization on Shared Memory Multiprocessors on BBN Butterfly: 256 nodes, local memory; each node can access other memory through log4(depth) switched network Anderson: array-based queue lock Distributed Operating Systems 2009 Marcus Völp 37

References Scheduler-Conscious Synchronization LEONIDAS I. KONTOTHANASSIS, ROBERT W. WISNIEWSKI, MICHAEL L. SCOTT Scalable Reader- Writer Synchronization for Shared-Memory Multiprocessors John M. Mellor-Crummey, Michael L. Scottt Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors JOHN M. MELLOR-CRUMMEY, MICHAEL L. Concurrent Update on Multiprogrammed Shared Memory Multiprocessors Maged M. Michael, Michael L. Scott Scalable Queue-Based Spin Locks with Timeout Michael L. Scott and William N. Scherer III Reactive Synchronization Algorithms for Multiprocessors B. Lim, A. Agarwal Lock Free Data Structures John D. Valois (PhD Thesis) Distributed Operating Systems 2009 Marcus Völp 38