Synchronization (CSE 548, Winter 2006)

Synchronization

Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Coherency protocols do not:
- make sure that only one thread accesses shared data, or a shared hardware or software resource, at a time (critical sections order thread access to shared data)
- force threads to start executing particular sections of code together (barriers do this)

Critical Sections

A critical section is a sequence of code that only one thread can execute at a time. It provides mutual exclusion:
- a thread has exclusive access to the code & the data that it accesses
- only one thread can update the data at a time

To execute a critical section, a thread:
- acquires a lock that guards it
- executes its code
- releases the lock

The effect is to synchronize/order threads' accesses to shared data.
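As a concrete illustration, here is a minimal C sketch of the acquire/execute/release pattern using a POSIX mutex; the shared counter and the function name are illustrative, not from the slides.

    /* One critical section guarding a shared counter (POSIX threads). */
    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long shared_counter = 0;      /* data guarded by the lock */

    void increment_shared(void)
    {
        pthread_mutex_lock(&lock);       /* acquire the lock that guards the section */
        shared_counter++;                /* critical section: one thread at a time */
        pthread_mutex_unlock(&lock);     /* release the lock */
    }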

Barriers

Barrier synchronization:
- a barrier is a point in a program that all threads must reach before any thread can cross it
- threads reach the barrier & then wait until all other threads arrive
- all threads are released at once & begin executing the code beyond the barrier

Example implementation of a barrier (see the sketch below):
- set a lock-protected counter to the number of processors
- each thread (assuming 1 per processor) decrements it
- when the counter reaches 0, all threads have crossed the barrier
- the code that implements a barrier is a critical section

Useful for:
- programs that execute in phases
- synchronizing after a parallel loop
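The counter-based scheme above can be sketched in C with a mutex and a condition variable; the type and function names (barrier_t, barrier_wait) are illustrative, and the generation counter is an added detail so the barrier can be reused across phases.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  all_arrived;
        int count;        /* threads still expected at the barrier */
        int nthreads;     /* total number of participating threads */
        int generation;   /* distinguishes successive uses of the barrier */
    } barrier_t;

    void barrier_init(barrier_t *b, int nthreads)
    {
        pthread_mutex_init(&b->lock, NULL);
        pthread_cond_init(&b->all_arrived, NULL);
        b->count = b->nthreads = nthreads;
        b->generation = 0;
    }

    void barrier_wait(barrier_t *b)
    {
        pthread_mutex_lock(&b->lock);        /* the barrier code is a critical section */
        int my_generation = b->generation;
        if (--b->count == 0) {               /* last thread to arrive */
            b->count = b->nthreads;          /* reset for the next phase */
            b->generation++;
            pthread_cond_broadcast(&b->all_arrived);  /* release all threads at once */
        } else {
            while (my_generation == b->generation)    /* wait for the last arrival */
                pthread_cond_wait(&b->all_arrived, &b->lock);
        }
        pthread_mutex_unlock(&b->lock);
    }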

Locking

Locking facilitates access to a critical section. Locking protocol:
- a synchronization variable, or lock:
  0: lock is available
  1: lock is unavailable because another thread holds it
- a thread obtains the lock (sets it to 1) before it can enter a critical section
- a thread releases the lock (clears it) before it leaves the critical section

Acquiring a Lock

Atomic exchange instruction: swaps a value in a register & a value in memory in one operation.
- set the register to 1
- swap the register value & the lock value in memory
- the new register value determines whether the thread got the lock

    AcquireLock: li   R3, #1          /* create lock value */
                 swap R3, 0(R4)       /* exchange register & lock */
                 bnez R3, AcquireLock /* lock was held; have to try again */

This is also known as an atomic read-modify-write of a location in memory.

Other examples:
- test & set: tests the value in a memory location & sets it to 1
- fetch & increment: returns the value of a memory location & atomically increments it
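A portable C11 sketch of the same acquire loop, using the standard library's atomic exchange; lock_t and the function names are illustrative, not from the slides.

    #include <stdatomic.h>

    typedef atomic_int lock_t;            /* 0 = available, 1 = held */

    void acquire_lock(lock_t *lock)
    {
        /* atomically write 1 and read the old value; 0 means we got the lock */
        while (atomic_exchange(lock, 1) != 0)
            ;                             /* lock was already held: try again */
    }

    void release_lock(lock_t *lock)
    {
        atomic_store(lock, 0);            /* store a 0 in the lock (next slide) */
    }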

Releasing a Lock

Store a 0 in the lock.

Load-linked & Store-conditional

Performance problem with atomic read-modify-write: it is 2 memory operations in one instruction, so the processor must hold the bus until both operations complete.

Alternative: a pair of instructions, load-linked & store-conditional, that together appear atomic & avoid the need for an uninterruptible memory read & write.
- load-linked returns the original (lock) value in memory
- if the contents of the lock's memory location have not changed when the store-conditional executes, the store-conditional succeeds & the processor gets the lock
- store-conditional returns a 1 if it succeeded

    GetLk: li   R3, #1      /* create lock value */
           ll   R2, 0(R1)   /* read lock variable */
           ...
           sc   R3, 0(R1)   /* try to lock it */
           beqz R3, GetLk   /* R3 cleared if sc failed */
           ...              /* critical section */
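On machines with ll/sc, C11's weak compare-and-swap is typically compiled down to exactly such a pair, so the loop above can be sketched portably as follows; lock_t, try_acquire & acquire are illustrative names, not from the slides.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef atomic_int lock_t;             /* 0 = free, 1 = held */

    bool try_acquire(lock_t *lock)
    {
        int expected = 0;                  /* only take the lock if it is free */
        /* the weak CAS may fail spuriously, like an sc that loses its reservation */
        return atomic_compare_exchange_weak(lock, &expected, 1);
    }

    void acquire(lock_t *lock)
    {
        while (!try_acquire(lock))
            ;                              /* retry, as the beqz loop does */
    }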

Load-linked & Store-conditional

Implemented with special lock-flag & lock-address registers:
- load-linked sets the lock-address register to the memory address & the lock-flag register to 1
- store-conditional updates memory only if the lock-flag register is still set, & returns the lock-flag register's value to the store's register
- the lock-flag register is cleared when the address is written by another processor
- the lock-flag register is cleared on a context switch or interrupt

Synchronization APIs

User-level software synchronization: library routines constructed with atomic hardware primitives.

Spin locks:
- busy-wait until the lock is obtained
- contention with atomic exchange causes invalidations (for the write) & coherency misses (for the rereads)
- these are avoided if reading the lock is separated from trying to set it, so the spinning is done in the cache rather than over the bus:

    getlk:    li   R2, #1           /* create lock value */
    spinloop: ll   R1, lockvariable /* read the lock */
              blbs R1, spinloop     /* spin while the low (lock) bit is set */
              sc   R2, lockvariable /* lock looks free: try to lock it */
              beqz R2, getlk        /* sc failed: start over */
              ...                   /* critical section */
              st   R0, lockvariable /* release the lock: store 0 */

Blocking locks: block the thread after a certain number of spins.
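The same "test before test & set" idea (often called test-and-test-and-set) in portable C11: contending threads spin on a plain read that hits in their caches & only attempt the atomic exchange when the lock looks free. Names are illustrative.

    #include <stdatomic.h>

    typedef atomic_int lock_t;             /* 0 = free, 1 = held */

    void spin_acquire(lock_t *lock)
    {
        for (;;) {
            /* test: read-only spinning stays in the local cache */
            while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
                ;
            /* test & set: one coherence transaction per attempt */
            if (atomic_exchange(lock, 1) == 0)
                return;                    /* got the lock */
        }
    }

    void spin_release(lock_t *lock)
    {
        atomic_store(lock, 0);             /* store a 0 in the lock */
    }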

Synchronization Performance

An example overall synchronization/coherence strategy:
- design the cache coherency protocol for little interprocessor contention for locks (the common case)
- add techniques that avoid performance loss when there is contention for a lock & still provide low latency when there is no contention

There is a race to acquire a lock when it becomes unlocked: O(n^2) bus transactions for n contending processors (with write-invalidate).
- exponential back-off (software solution): each processor retries at a different time; successive retries are done an exponentially increasing time later
- queuing locks (hardware solution): the lock is passed from the unlocking processor to a waiting processor; also addresses fairness
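A sketch of exponential back-off layered on the test-and-set loop from earlier: after each failed attempt a processor waits roughly twice as long before retrying, so contenders retry at different times. The delay loop and the cap are illustrative choices, not from the slides.

    #include <stdatomic.h>

    #define MAX_DELAY (1u << 16)

    static void pause_for(unsigned iters)
    {
        for (volatile unsigned i = 0; i < iters; i++)
            ;                              /* crude busy delay */
    }

    void backoff_acquire(atomic_int *lock)
    {
        unsigned delay = 1;
        while (atomic_exchange(lock, 1) != 0) {  /* failed attempt */
            pause_for(delay);
            if (delay < MAX_DELAY)
                delay <<= 1;               /* exponentially increasing wait */
        }
    }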

Atomic Exchange in Practice

- Alpha: load-linked & store-conditional
- UltraSPARC (V9 architecture): several primitives: compare & swap, test & set, etc.
- Pentium Pro: compare & swap