Distributed Systems Synchronization. Marcus Völp 2007

Similar documents
Distributed Operating Systems

Distributed Operating Systems

Distributed Operating Systems Memory Consistency

Advance Operating Systems (CS202) Locks Discussion

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019

CS377P Programming for Performance Multicore Performance Synchronization

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Synchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CSC2/458 Parallel and Distributed Systems Scalable Synchronization

1 RCU. 2 Improving spinlock performance. 3 Kernel interface for sleeping locks. 4 Deadlock. 5 Transactions. 6 Scalable interface design

Adaptive Lock. Madhav Iyengar < >, Nathaniel Jeffries < >

Readers-Writers Problem. Implementing shared locks. shared locks (continued) Review: Test-and-set spinlock. Review: Test-and-set on alpha

Scalable Locking. Adam Belay

The complete license text can be found at

Other consistency models

AST: scalable synchronization Supervisors guide 2002

Algorithms for Scalable Synchronization on Shared Memory Multiprocessors by John M. Mellor Crummey Michael L. Scott

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Fine-grained synchronization & lock-free data structures

The need for atomicity This code sequence illustrates the need for atomicity. Explain.

Advanced Topic: Efficient Synchronization

Synchronization I. Jo, Heeseung

Dept. of CSE, York Univ. 1

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors

Fine-grained synchronization & lock-free programming

l12 handout.txt l12 handout.txt Printed by Michael Walfish Feb 24, 11 12:57 Page 1/5 Feb 24, 11 12:57 Page 2/5

G52CON: Concepts of Concurrency

Lecture 10: Multi-Object Synchronization

CSE 120 Principles of Operating Systems

GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs

Synchronization I. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Synchronization for Concurrent Tasks

[537] Locks. Tyler Harter

Reminder from last time

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Lecture 19: Synchronization. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Hardware: BBN Butterfly

Algorithms for Scalable Lock Synchronization on Shared-memory Multiprocessors

Concurrency: Mutual Exclusion (Locks)

CS5460: Operating Systems

Mutex Implementation

Review: Test-and-set spinlock

Locks. Dongkun Shin, SKKU

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Advanced Operating Systems (CS 202)

Multiprocessors and Locking

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430

COMP 3430 Robert Guderian

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

Multiprocessor Systems. Chapter 8, 8.1

SMP & Locking. NICTA 2010 From imagination to impact

Advanced Operating Systems (CS 202) Memory Consistency, Cache Coherence and Synchronization

Process Synchronization

CS3733: Operating Systems

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.

6.852: Distributed Algorithms Fall, Class 15

Relative Performance of Preemption-Safe Locking and Non-Blocking Synchronization on Multiprogrammed Shared Memory Multiprocessors

SMP & Locking. NICTA 2010 From imagination to impact

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Lecture: Coherence and Synchronization. Topics: synchronization primitives, consistency models intro (Sections )

Implementing Locks. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

MULTITHREADING AND SYNCHRONIZATION. CS124 Operating Systems Fall , Lecture 10

Advanced Operating Systems (CS 202) Memory Consistency, Cache Coherence and Synchronization

Operating Systems: William Stallings. Starvation. Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

CS758 Synchronization. CS758 Multicore Programming (Wood) Synchronization

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Module 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms.

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return?

Operating Systems. Designed and Presented by Dr. Ayman Elshenawy Elsefy

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Last Class: Synchronization

IV. Process Synchronisation

Synchronization for Concurrent Tasks

Systèmes d Exploitation Avancés

Synchronization 1. Synchronization

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4

Concurrency: Locks. Announcements

Threads. Concurrency. What it is. Lecture Notes Week 2. Figure 1: Multi-Threading. Figure 2: Multi-Threading

Programming Languages

A simple correctness proof of the MCS contention-free lock. Theodore Johnson. Krishna Harathi. University of Florida. Abstract

COREMU: a Portable and Scalable Parallel Full-system Emulator

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Chapters 5 and 6 Concurrency

CS 537 Lecture 11 Locks

Lecture 5: Synchronization w/locks

Operating Systems. Synchronization

Lecture 19: Coherence and Synchronization. Topics: synchronization primitives (Sections )

Multiprocessor Support

CS370 Operating Systems

Preemptive Scheduling and Mutual Exclusion with Hardware Support

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 3 Nov 2017

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Today: Synchronization. Recap: Synchronization

Lecture 10: Avoiding Locks

Lock free Algorithm for Multi-core architecture. SDY Corporation Hiromasa Kanda SDY

Transcription:

Distributed Systems Synchronization Marcus Völp 2007 1

Purpose of this Lecture Synchronization Locking Analysis / Comparison Distributed Operating Systems 2007 Marcus Völp 2

Overview Introduction Hardware Primitives Locking Spin Lock (Test & Set Lock) Test & Test & Set Lock Ticket Locks MCS Locks Lock-free Synchronization Special Issues Timeouts Reader Writer Locks Lockholder Preemption Monitor, Mwait Performance Distributed Operating Systems 2007 Marcus Völp 3

Introduction An example: Request Queue Circular Buffer Head Tail Request Tail Head free free free free Distributed Operating Systems 2007 Marcus Völp 4

Introduction A 1) A,B create list elements 2) A,B set next pointer to head B Distributed Operating Systems 2007 Marcus Völp 5

Introduction A 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer B Distributed Operating Systems 2007 Marcus Völp 6

Introduction A 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer B Distributed Operating Systems 2007 Marcus Völp 7

Introduction A B 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer 5) A update head pointer Distributed Operating Systems 2007 Marcus Völp 8

Introduction A B 1) A,B create list elements 2) A,B set next pointer to head 3) B set prev pointer 4) A set prev pointer 5) A update head pointer 6) B update head pointer Distributed Operating Systems 2007 Marcus Völp 9

Introduction First Solution Locks List Lock List Element Lock lock_list; insert_element; unlock list; Distributed Operating Systems 2007 Marcus Völp 10

Hardware Primitives How to make instructions atomic Bus lock Lock memory bus for duration of single instruction (e.g., lock add) Observe Cache (ARM v6, Alpha, x86: monitor, mwait) Load Linked: Load value and watch location Store Conditional: Store value if no other store has accessed location Atomic operations loads, stores swap (XCHG)!! x86 implementation requires no bus lock!! bit test and set (BTS) if bit clear then set bit; return true else return false compare and swap (CAS m, old, new) if m == old then m := new ; return true else new := m, return false Distributed Operating Systems 2007 Marcus Völp 11

Locking Algorithms Peterson's Algorithm Works only for 2 Threads atomic stores, atomic loads sequential consistency (memory fences) bool interested[2]; int blocked; void entersection(int thread) { int other; /* number of other thread */ other = 1 - thread; /* the other thread: 1 for thread 0, 0 for thread 1 */ interested[thread] = true; /* show that you are interested */ blocked=thread; while (blocked == thread && interested[other] == true){;/*wait*/ void leavesection(int thread) { interested[thread] = false; Distributed Operating Systems 2007 Marcus Völp 12

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap lock (lock_var l) { do { reg = 1; swap (l, reg) while (reg == 1); CPU 0 CPU 1 CPU 2 CPU 3 unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 13

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap Lock lock (lock_var l) { do { reg = 1; swap (l, reg) while (reg == 1); CPU 0 CPU 1 CPU 2 CPU 3 l = M l = I l = I l = I unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 14

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap lock (lock_var l) { do { reg = 1; swap (l, reg) while (reg == 1); Swap Lock CPU 0 CPU 1 CPU 2 CPU 3 l = I l = M l = I l = I unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 15

Locking Algorithms Spin Lock (Test and Set Lock) atomic swap Lock Swap lock (lock_var l) { do { reg = 1; swap (l, reg) while (reg == 1); CPU 0 CPU 1 CPU 2 CPU 3 l = I l = I l = I l = M unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 16

Locking Algorithms Spin Lock (Test and Test and Set Lock) atomic swap Lock lock (lock_var l) { do { reg = 1; do { while (l == 1); swap (l, reg) while (reg == 1); CPU 0 CPU 1 CPU 2 CPU 3 l = I l = I l = M l = I unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 17

Locking Algorithms Spin Lock (Test and Test and Set Lock) atomic swap lock (lock_var l) { do { reg = 1; do { while (l == 1); swap (l, reg) while (reg == 1); test test Lock CPU 0 CPU 1 CPU 2 CPU 3 l = S l = S l = M l = I unlock (lock_var l) { l = 0; Distributed Operating Systems 2007 Marcus Völp 18

Locking Algorithms Fairness CPU 0 CPU 1 CPU 2 CPU 3 free lock unlock test test lock test test unlock free test lock test Distributed Operating Systems 2007 Marcus Völp 19

Locking Algorithms Fairness: Ticket Lock fetch and add (xadd) lock_struct { next_ticket, current_ticket current next 0 0 L.CPU0 [0]: 0 1 => Lockholder = CPU0 L.CPU1 [1]: 0 2 ticket_lock (lock_struct l) { L.CPU2 [2]: 0 3 my_ticket = xadd (l.next_ticket, 1) U.CPU0 [0]: 1 3 => Lockholder = CPU1 do { while (l.current_ticket!= my_ticket); unlock (lock_var l) { current_ticket ++; CPU 0 CPU 1 CPU 2 CPU 3 L.CPU3 [3]: 1 4 L.CPU0 [4]: 1 5 U.CPU1 [1]: 2 5 => Lockholder = CPU 2 Distributed Operating Systems 2007 Marcus Völp 20

Locking Algorithms Fairness: Ticket Lock fetch and add (xadd) lock_struct { next_ticket, current_ticket Spin on global variable CPU 0 CPU 1 CPU 2 CPU 3 my_ticket: 0 my_ticket: 1 write ticket_lock (lock_struct l) { my_ticket = xadd (l.next_ticket, 1) do { while (l.current_ticket!= my_ticket); CPU1, CPU3 updates not required (not next) unlock (lock_var l) { current_ticket ++; Distributed Operating Systems 2007 Marcus Völp 21

Local Spinning CPU 0 CPU 1 CPU 2 CPU 3 write parallel reads CPU 0 CPU 1 Need to propagate write on Bus 2-3 (or 1 3) CPU 2 CPU 3 CPU 0 CPU 1 msg 3 Network Messages CPU 2 CPU 3 Distributed Operating Systems 2007 Marcus Völp 22

MCS-Lock (Mellor Crummey Scott) Fair Lock with Local Spinning CPU 0 CPU 1 CPU 2 CPU 3 l next l next l next Lock L Distributed Operating Systems 2007 Marcus Völp 23

MCS-Lock (Mellor Crummey Scott) Fair Lock with Local Spinning CPU 0 CPU 1 CPU 2 CPU 3 l next l next l next l next Lock L Distributed Operating Systems 2007 Marcus Völp 24

MCS-Lock (Mellor Crummey Scott) Fair Lock with Local Spinning CPU 0 CPU 1 CPU 2 CPU 3 l next l next l next l next Lock L Distributed Operating Systems 2007 Marcus Völp 25

MCS Locks Fair, local spinning atomic compare exchange: cmpxchg (L == Old, New) lock (L, lock_element I) { I.next = null; I.lock = false; prev = xchg (L, I); if (prev!= null) { prev.next = I; do { while (I.lock == false); unlock (L, I) { if (I.next == Null) { if (cmpxchg (L == I, Null)) return; // no waiting cpu do { while (I.next == Null); // spin until the following process updates the next pointer I.next.lock = true; Distributed Operating Systems 2007 Marcus Völp 26

Lock-free synchronization prev next new insert(new, prev) { retry: new.next = next; if (not CAS(prev.next == next, new)) goto retry; prev next new insert(new, prev) { retry: new.next = next; new.prev = prev; if (not DCAS(prev.next == next && next.prev ==prev, prev.next = new, next.prev = new)) goto retry; Distributed Operating Systems 2007 Marcus Völp 27

Lock-free Synchronization Load Linked, Store Conditional insert (prev, new) { retry: ll (prev.next); new.next = prev.next; if (not stc (prev.next, new)) goto retry; Distributed Operating Systems 2007 Marcus Völp 28

Overview Introduction Hardware Primitives Locking Spin Lock (Test & Set Lock) Test & Test & Set Lock Ticket Locks MCS Locks Lock-free Synchronization Special Issues Timeouts Reader Writer Locks Lockholder Preemption Monitor, Mwait Performance Distributed Operating Systems 2007 Marcus Völp 29

Special Issues Timeouts No longer apply for lock after timeout Dequeue from MCS queue Reader Writer Locks Lock differentiates two types of lockers: reader, writer Multiple readers may hold lock at same time Writers hold lock exclusively Fairness Improve reader latency by allowing readers to overtake writers (unfair lock) Distributed Operating Systems 2007 Marcus Völp 30

Special Issues Fair Ticket Reader-Writer Lock combine read, write ticket in single word lock read (next, current) { my_ticket = xadd (next, 1); do { while (current.write!= my_ticket.write); write read lock write (next, current) { my_ticket = xadd (next.write, 1); do { while (current!= my_ticket); unlock_read () { xadd (current.read, 1); current next R0 R1 W2 R3 0 0 0 0 0 0 0 1 0 1 0 2 0 2 1 2 1 2 unlock write () { current.write ++; Distributed Operating Systems 2007 Marcus Völp 31

Special Issues Lockholder preemption Spin-time of other CPUs increases by preemption time of lockholder E.g., no packets can be sent when OS network code is preempted while holding xmit queue lock spin_lock(lock_var) { pushf; // store whether interrupts were already closed do { popf; reg = 1; do { while (lock_var == 1); spin_unlock(lock_var) { pushf; lock_var = 0; cli; popf; swap(lock_var, reg); while (reg == 1); Distributed Operating Systems 2007 Marcus Völp 32

Special Issues Monitor, Mwait Stop CPU / HT while waiting for lock (signal) Saves power Frees up processor resources (HT) Monitor: watch cacheline Mwait: stop CPU / HT until: cacheline has been written, or interrupt occurs while (trigger[0]!= value) { monitor (&trigger[0]) if (trigger[0]!= value) { mwait mwait CPU 0 CPU 1 write to trigger t Distributed Operating Systems 2007 Marcus Völp 33

Performance Source: Mellor Crummey, Scott : Algorithms for Scalable Synchronization on Shared Memory Multiprocessors Distributed Operating Systems 2007 Marcus Völp 34

References Scheduler-Conscious Synchronization LEONIDAS I. KONTOTHANASSIS, ROBERT W. WISNIEWSKI, MICHAEL L. SCOTT Scalable Reader- Writer Synchronization for Shared-Memory Multiprocessors John M. Mellor-Crummey, Michael L. Scottt Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors JOHN M. MELLOR-CRUMMEY, MICHAEL L. Concurrent Update on Multiprogrammed Shared Memory Multiprocessors Maged M. Michael, Michael L. Scott Scalable Queue-Based Spin Locks with Timeout Michael L. Scott and William N. Scherer III Reactive Synchronization Algorithms for Multiprocessors B. Lim, A. Agarwal Lock Free Data Structures John D. Valois (PhD Thesis) Distributed Operating Systems 2007 Marcus Völp 35