Symmetric Multiprocessors: Synchronization and Sequential Consistency

Similar documents
CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

Consistency & Coherence. 4/14/2016 Sec5on 12 Colin Schmidt

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

CS252 Graduate Computer Architecture Fall 2015 Lecture 14: Synchroniza>on and Memory Models

CS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2

Dr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

Beyond Sequential Consistency: Relaxed Memory Models

Relaxed Memory-Consistency Models

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

Page 1. Cache Coherence

Using Relaxed Consistency Models

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 19 Memory Consistency Models

Unit 12: Memory Consistency Models. Includes slides originally developed by Prof. Amir Roth

Computer Architecture and Parallel Computing 并行结构与计算. Lecture 6 Coherence Protocols

CS5460: Operating Systems

Portland State University ECE 588/688. Memory Consistency Models

Memory Consistency Models

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important

Module 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency

M4 Parallelism. Implementation of Locks Cache Coherence

Last Class: Synchronization

Shared Memory Consistency Models: A Tutorial

Distributed Operating Systems Memory Consistency

Distributed Shared Memory and Memory Consistency Models

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Lecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models

Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>

Lecture 25: Thread Level Parallelism -- Synchronization and Memory Consistency

Overview: Memory Consistency

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Today: Synchronization. Recap: Synchronization

SHARED-MEMORY COMMUNICATION

Systèmes d Exploitation Avancés

Cache Coherence and Atomic Operations in Hardware

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Other consistency models

Concept of a process

Role of Synchronization. CS 258 Parallel Computer Architecture Lecture 23. Hardware-Software Trade-offs in Synchronization and Data Layout

EC 513 Computer Architecture

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

Locks. Dongkun Shin, SKKU

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE 153 Design of Operating Systems

Administrivia. p. 1/20

Relaxed Memory-Consistency Models

Constructing a Weak Memory Model

Lecture #7: Implementing Mutual Exclusion

Advanced Operating Systems (CS 202)

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Relaxed Memory Consistency

Implementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430

Sequential Consistency & TSO. Subtitle

Computer Architecture and Engineering CS152 Quiz #5 May 2th, 2013 Professor Krste Asanović Name: <ANSWER KEY>

Shared Memory Architecture

ECE/CS 757: Advanced Computer Architecture II

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

A Memory Model for RISC-V

COMP Parallel Computing. CC-NUMA (2) Memory Consistency

The Cache-Coherence Problem

CSE Traditional Operating Systems deal with typical system software designed to be:

Preemptive Scheduling and Mutual Exclusion with Hardware Support

Chapter 5: Process Synchronization. Operating System Concepts 9 th Edition

EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Synchronization I. Jo, Heeseung

Coherence and Consistency

Memory Consistency Models. CSE 451 James Bornholt

Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

Switch Gear to Memory Consistency

Synchronization. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Reminder from last time

Shared Memory Consistency Models: A Tutorial

Multiprocessors and Locking

MULTITHREADING AND SYNCHRONIZATION. CS124 Operating Systems Fall , Lecture 10

EE382 Processor Design. Processor Issues for MP

Mutex Implementation

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming

Lecture: Coherence and Synchronization. Topics: synchronization primitives, consistency models intro (Sections )

The MESI State Transition Graph

Problem Set 5 Solutions CS152 Fall 2016

Shared Memory Consistency Models: A Tutorial

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

SPIN, PETERSON AND BAKERY LOCKS

Page 1. Goals for Today" Atomic Read-Modify-Write instructions" Examples of Read-Modify-Write "

Lecture 33: Multiprocessors Synchronization and Consistency Professor Randy H. Katz Computer Science 252 Spring 1996

CS162 Operating Systems and Systems Programming Lecture 7. Mutual Exclusion, Semaphores, Monitors, and Condition Variables

Recap: Thread. What is it? What does it need (thread private)? What for? How to implement? Independent flow of control. Stack

Lecture 7: Implementing Cache Coherence. Topics: implementation details

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

The need for atomicity This code sequence illustrates the need for atomicity. Explain.

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 8: Semaphores, Monitors, & Condition Variables

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Transcription:

Constructive Computer Architecture Symmetric Multiprocessors: Synchronization and Sequential Consistency Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-1

Symmetric Multiprocessors Processor Processor CPU-Memory bus bridge Memory I/O controller I/O bus I/O controller I/O controller Symmetric? All memory is equally far away from all processors Graphics output Any processor can do any I/O operation Networks November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-2

Synchronization needed even in single-processor systems The need for synchronization arises whenever there are parallel processes in a system Forks and Joins: A parallel process may want to wait until several events have occurred P1 fork P2 Producer-Consumer: A consumer process must wait until the producer process has produced data Mutual Exclusion: Operating system has to ensure that a resource is used by only one process at a given time producer consumer join November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-3

A Producer-Consumer Example Producer tail head Consumer R tail R tail R head R Producer posting Item x: Load R tail, tail Store (R tail ), x R tail =R tail +1 Store tail, R tail The program is written assuming instructions are executed in order. Consumer: Load R head, head spin: Load R tail, tail if R head ==R tail goto spin Load R, (R head ) R head =R head +1 Store head, R head process(r) Problems? November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-4

A Producer-Consumer Example continued Producer posting Item x: Load R tail, (tail) 1 Store (R tail ), x R tail =R tail +1 2 Store tail, R tail Can the tail pointer get updated before the item x is stored? Consumer: Load R head, head spin: Load R tail, tail 3 if R head ==R tail goto spin Load R, (R head ) 4 R head =R head +1 Store head, R head process(r) Programmer assumes that if 3 happens after 2, then 4 happens after 1. Problem sequences are: 2, 3, 4, 1 4, 1, 2, 3 November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-5

Sequential Consistency A Memory Model P P P P P P M A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program Leslie Lamport Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-6

Sequential Consistency Sequential concurrent tasks: T1, T2 Shared variables: X, Y (initially X = 0, Y = 0) T1: T2: Store X, 1 (X = 1) Load R 1, Y Store Y, 2 (Y = 2) Store Y, R 1 (Y = Y) Load R 2, X Store X, R 2 (X = X) what are the legitimate answers for X and Y? (X,Y ) {(1,2), (0,0), (1,0), (0,2)}? If y is 2 then x cannot be 1 November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-7

Sequential Consistency Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies ( ) What are these in our example? additional SC requirements ( ) T1: T2: Store X, 1 (X = 1) Load R 1, Y Store Y, 2 (Y = 2) Store Y, R 1 (Y = Y) Load R 2, X Store X, R 2 (X = X) High-performance processor implementations often violate SC Example Store Buffer November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-8

Store Buffers A processor considers a Store to have been executed as soon as it is stored in the Store buffer, that is, before it is put in L1 Stores can be moved from the store buffer to L1 in a different order A load can read values from the local store buffer (forwarding) P Cache P Cache The net effect of store buffers is that Loads/Stores can appear to be ordered differently to different processors breaks SC Memory November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-9

Violations of SC Example 1 Process 1 Process 2 Store flag 1,1; Store flag 2,1; Load r 1, flag 2 ; Load r 2, flag 1 ; Question: Is it possible that r 1 =0 and r 2 =0? Sequential consistency: No Initially, all memory locations contain zeros Suppose Stores don t leave the store buffers before the Loads are executed: Yes! Total Store Order (TSO): IBM 370, Sparc s TSO memory model, x86 November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-10

Violations of SC Example 2: Non-FIFO Store buffers Process 1 Process 2 Store a, 1; Load r 1, flag; Store flag, 1; Load r 2, a; Question: Is it possible that r 1 =1 but r 2 =0? Sequential consistency: No With non-fifo store buffers: Yes Sparc s PSO memory model November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-11

Violations of SC Example 3: Non-Blocking Caches Process 1 Process 2 Store a, 1; Load r 1, flag; Store flag, 1; Load r 2, a; Question: Is it possible that r 1 =1 but r 2 =0? Sequential consistency: No Assuming stores are ordered: Yes because Loads can be reordered Sparc s RMO, PowerPC s WO, Alpha November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-12

Memory Model Issue Architectural optimizations that are correct for uniprocessors, often violate sequential consistency and result in a new memory model for multiprocessors Memory model issues are subtle and contentious because most ISA specifications (X86, ARM, PowerPC, Sparc, MIPS) are ambiguous For the rest of the lecture we will assume the architecture is SC and focus on synchronization issues November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-13

Multiple Consumer Example Producer tail head Consumer 1 R head R tail R R tail Consumer 2 R head R tail R Producer posting Item x: Load R tail, tail Store (R tail ), x R tail =R tail +1 Store tail, R tail Critical section: Needs to be executed atomically by one consumer locks Consumer: Load R head, head spin: Load R tail, tail if R head ==R tail goto spin Load R, (R head ) R head =R head +1 Store head, R head process(r) What is wrong with this code? November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-14

Locks or Semaphores E. W. Dijkstra, 1965 Process i lock(s) <critical section> unlock(s) The execution of the critical section is protected by lock s. Only one process can hold the lock. Suppose the lock s can have only two values: s=0 means that no process has the lock s=1 means that exactly one process has the lock and therefore can access the critical section Once a process successfully acquires a lock, it executes the critical section and then sets s to zero by executing unlock(s) Implementation of locks is quite difficult using just Loads and Stores. ISAs provide special atomic instructions to implement locks November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-15

atomic read-modify-write instructions m is a memory location, R is a register Test&Set m, R: R M[m]; if R==0 then M[m] 1; Location m can be set to one only if it contains a zero Swap m, R: R t M[m]; M[m] R; R R t ; Location m is first read and then set to the new value; the old value is returned in a register November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-16

Multiple Consumers Example using the Test&Set Instruction lock: spin: Test&Set mutex, R temp if (R temp =1) goto lock Load R head, head Load R tail, tail if R head ==R tail goto spin Load R, (R head ) R head =R head +1 Store head, R head unlock: Store mutex, 0 process(r) Critical Section What if the process stops or is swapped out while in the critical section? November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-17

Nonblocking Synchronization Load-reserve & Store-conditional Special register(s) to hold reservation flag and address, and the outcome of store-conditional Load-reserve R, m: <flag, adr> <1, m>; R M[m]; Store-conditional m, R: if <flag, adr> == <1, m> then cancel other procs reservation on m; M[m] R; status succeed; else status fail; try: spin: Load-reserve R head, head Load R tail, tail if R head ==R tail goto spin Load R, (R head ) R head = R head + 1 Store-conditional head, R head if (status==fail) goto try process(r) The corresponding instructions in RISC V are called lr and sc, respectively November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-18

Nonblocking Synchronization Load-reserve R, (m): <flag, adr> <1, m>; R M[m]; Store-conditional (m), R: if <flag, adr> == <1, m> then cancel other procs reservation on m; M[m] R; status succeed; else status fail; The flag is cleared in other processors on a Store using the CC protocol s invalidation mechanism Usually address m is not remembered by Load-reserve; the flag is cleared on any invalidation works as long as the Load-reserve instructions are not used in a nested manner These instructions won t work properly if Loads and Stores can be dynamically reordered November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-19

Memory Fences Instructions to sequentialize memory accesses Processors with weak or non-sequentially-consistent memory models need to provide memory fence instructions to force the serialization of memory accesses Producer posting Item x: Load R tail, (tail) Store (R tail ), x Membar SS R tail =R tail +1 Store tail, R tail ensures that tail ptr is not updated before x has been stored ensures that R is not loaded before x has been stored Consumer: Load R head, (head) spin: Load R tail, (tail) if R head ==R tail goto spin Membar LL Load R, (R head ) R head =R head +1 Store head, R head process(r) RISC-V has one instruction called fence November 23, 2015 http://www.csg.csail.mit.edu/6.175 L24-20