COMP Parallel Computing. CC-NUMA (2) Memory Consistency

COMP 633 - Parallel Computing
Lecture 11, September 26, 2017
Memory Consistency

Reading: Patterson & Hennessy, Computer Architecture (2nd Ed.), section 8.6
  - a condensed treatment of consistency models

Coherence and Consistency

Memory coherence
  - behavior of a single memory location M
  - viewed from one or more processors
  - informally: all writes to M are seen in the same order by all processors

Memory consistency
  - behavior of multiple memory locations read and written by multiple processors
  - viewed from one or more processors
  - informally: concerned with the order in which writes on different locations may be seen

Coherence of memory location x

Defined by three properties (assume x = 0 initially; time runs left to right)

(a) P1:  W(x,1)    1 = R(x)
    with no intervening write of x by P1 or any other processor

(b) P1:  W(x,1)
    P2:            1 = R(x)
    after a sufficiently large interval and with no other write of x

(c) P1:  W(x,1)    a = R(x)
    P2:  W(x,2)    a = R(x)
    P3:            a = R(x)
    a ∈ {1,2}, after a sufficiently large interval and with no other writes of x

Consistency Models

The consistency problem
  - Performance motivates replication
      - keep data in caches close to processors
  - Replication of read-only blocks is easy
      - no consistency problem
  - Replication of read/write blocks is hard
      - What are the semantics of overlapping read and write operations?
      - What are the semantics of overlapping write operations?

Fundamental trade-offs
  - Programmer-friendly models perform poorly

Consistency Models

The importance of a memory consistency model

  initially A = B = 0

  P1                            P2
  A := 1;                       B := 1;
  if (B == 0) ... P1 wins       if (A == 0) ... P2 wins

  - P1 and P2 may both win in some consistency models!
  - Violates our (simplistic) mental model of the order of events

Some consistency models
  - Strict consistency
  - Sequential consistency
  - Processor consistency
  - Release consistency
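The "both win" outcome can be checked mechanically. The sketch below is an illustration only, not part of the lecture: it enumerates every sequentially consistent interleaving of the two programs, then repeats the exploration under an assumed machine model in which each processor's writes wait in a private FIFO store buffer before reaching memory. The function names and the store-buffer model are assumptions made for this example. A result tuple is (value P1 read from B, value P2 read from A), so (0, 0) means both win.

```python
from itertools import permutations

# Programs in program order: ("W", variable, value) or ("R", variable).
P1 = [("W", "A", 1), ("R", "B")]
P2 = [("W", "B", 1), ("R", "A")]

def sc_outcomes(programs):
    """Read results over all sequentially consistent interleavings."""
    ids = [p for p, prog in enumerate(programs) for _ in prog]
    outcomes = set()
    for schedule in set(permutations(ids)):
        memory = {"A": 0, "B": 0}
        pc = [0] * len(programs)
        reads = [None] * len(programs)
        for p in schedule:
            op = programs[p][pc[p]]
            pc[p] += 1
            if op[0] == "W":
                memory[op[1]] = op[2]
            else:
                reads[p] = memory[op[1]]
        outcomes.add(tuple(reads))
    return outcomes

def store_buffer_outcomes(programs):
    """Same programs, but writes sit in a per-processor FIFO buffer
    (hypothetical model) and reach memory at an arbitrary later time."""
    n = len(programs)
    outcomes = set()

    def explore(memory, buffers, pc, reads):
        done = all(pc[p] == len(programs[p]) for p in range(n))
        if done and not any(buffers):
            outcomes.add(reads)
            return
        for p in range(n):
            if buffers[p]:  # drain the oldest buffered write to memory
                var, val = buffers[p][0]
                new_mem = dict(memory)
                new_mem[var] = val
                new_buf = buffers[:p] + (buffers[p][1:],) + buffers[p + 1:]
                explore(new_mem, new_buf, pc, reads)
            if pc[p] < len(programs[p]):
                op = programs[p][pc[p]]
                new_pc = pc[:p] + (pc[p] + 1,) + pc[p + 1:]
                if op[0] == "W":
                    entry = buffers[p] + ((op[1], op[2]),)
                    new_buf = buffers[:p] + (entry,) + buffers[p + 1:]
                    explore(memory, new_buf, new_pc, reads)
                else:
                    # A read sees the processor's own newest buffered write,
                    # otherwise the value currently in memory.
                    own = [v for (x, v) in buffers[p] if x == op[1]]
                    val = own[-1] if own else memory[op[1]]
                    new_reads = reads[:p] + (val,) + reads[p + 1:]
                    explore(memory, buffers, new_pc, new_reads)

    explore({"A": 0, "B": 0}, ((),) * n, (0,) * n, (None,) * n)
    return outcomes

if __name__ == "__main__":
    print("sequentially consistent:", sorted(sc_outcomes([P1, P2])))
    print("with store buffers:     ", sorted(store_buffer_outcomes([P1, P2])))
```

Under these assumptions (0, 0) never appears in the sequentially consistent set, but it does appear once writes can be delayed in a buffer past the following read.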

Strict Consistency

Uniprocessor memory semantics
  - Any read of memory location x returns the value stored by the most recent write operation to x
  - Natural, simple to program

  Strictly consistent:
    P1:  W(x,1)
    P2:            1 = R(x)

  Non-strictly consistent:
    P1:  W(x,1)
    P2:            0 = R(x)    1 = R(x)

Strict Consistency

Implementable in a real system?
  - Requires...
      - absolute measure of time (i.e., global time)
      - slow operation, else violation of the theory of relativity!

  [Figure: P1 writes to a remote memory 1 km away; P2 reads that memory from 1 m away]

  - Claim: not what we really wanted (or needed) in the first place!
      - bad to have correctness depend on relative execution speeds

Sequential Consistency

Mapping concurrent operations into a single total ordering
  - The result of any execution is the same as if the operations of each processor were performed in sequential (program) order and the operations of all processors were interleaved in some fashion to define the total order

  Execution 1:
    P1:  W(x,1)
    P2:            0 = R(x)    1 = R(x)

  Execution 2:
    P1:  W(x,1)
    P2:            1 = R(x)    1 = R(x)

  Both executions are sequentially consistent

Sequential Consistency: Example

Earlier in time does not imply earlier in the merged sequence
  - is the following sequence of observations sequentially consistent?
  - what is the value of y?

    P1:  W(x,1)    ? = R(y)
    P2:  W(y,2)
    P3:  2 = R(y)    0 = R(x)    1 = R(x)
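One way to answer the question is to search for a witness: a single interleaving that respects each processor's program order and reproduces every observed read. The sketch below is a brute-force illustration, assuming x = y = 0 initially (the slide does not state the initial value of y) and using a hypothetical helper name. `None` marks the read whose result is unknown.

```python
from itertools import permutations

# Observed execution: ("W", var, value written) or ("R", var, value observed).
P1 = [("W", "x", 1), ("R", "y", None)]   # None: the unknown "? = R(y)"
P2 = [("W", "y", 2)]
P3 = [("R", "y", 2), ("R", "x", 0), ("R", "x", 1)]

def possible_unknowns(programs):
    """Values the unknown reads can take over all interleavings that
    reproduce every known read (assumes all locations start at 0)."""
    ids = [p for p, prog in enumerate(programs) for _ in prog]
    found = set()
    for schedule in set(permutations(ids)):
        memory = {"x": 0, "y": 0}
        pc = [0] * len(programs)
        unknown = []
        consistent = True
        for p in schedule:
            kind, var, val = programs[p][pc[p]]
            pc[p] += 1
            if kind == "W":
                memory[var] = val
            elif val is None:
                unknown.append(memory[var])
            elif memory[var] != val:
                consistent = False   # this interleaving contradicts an observation
                break
        if consistent:
            found.add(tuple(unknown))
    return found

if __name__ == "__main__":
    print(possible_unknowns([P1, P2, P3]))
```

A non-empty result means the observations are sequentially consistent. Here the only surviving value is 2: P3 sees y = 2 before it sees x = 0, so W(y,2) precedes W(x,1) in every witness, and P1's read of y follows its own write of x.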

Processor Consistency

Concurrent writes by different processors on different variables may be observed in different orders
  - there may not be a single total order of operations observed by all processors

Writes from a given processor are seen in the same order at all other processors
  - writes on a processor are pipelined

    P1:  W(x,1)      0 = R(y)    1 = R(y)
    P2:  W(y,1)      0 = R(x)    1 = R(x)
    P3:  1 = R(x)    0 = R(y)    1 = R(y)
    P4:  0 = R(x)    1 = R(y)    1 = R(x)

Processor consistency

Typical level of consistency found in shared memory multiprocessors
  - insufficient to ensure correct operation of many programs
  - Ex: Peterson's mutual exclusion algorithm

  program mutex
    var enter1, enter2 : Boolean; turn : Integer

    process P1
      repeat forever
        enter1 := true
        turn := 2
        while enter2 and turn=2 do skip end
        ... critical section ...
        enter1 := false
        ... non-critical section ...
      end repeat
    end P1;

    process P2
      repeat forever
        enter2 := true
        turn := 1
        while enter1 and turn=1 do skip end
        ... critical section ...
        enter2 := false
        ... non-critical section ...
      end repeat
    end P2;

  begin
    enter1, enter2, turn := false, false, 1
    cobegin
      P1
      P2
    coend
  end
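Why Peterson's algorithm breaks can be shown with a small exhaustive search. The sketch below is an illustration under stated assumptions, not the lecture's own analysis: processes are numbered 0 and 1, each makes a single pass through the protocol, and the weaker machine is modeled by a private FIFO store buffer per processor (one mechanism that lets a read overtake an earlier write to a different location). The function name and state encoding are hypothetical.

```python
def peterson_can_fail(buffered):
    """Explore every schedule of one pass of Peterson's algorithm for two
    processes. Return True if both can be in the critical section at once.
    buffered=False: writes reach memory immediately (sequential consistency).
    buffered=True:  writes wait in a per-processor FIFO store buffer."""
    def replace(tup, i, value):
        return tup[:i] + (value,) + tup[i + 1:]

    TURN = 2                                    # memory = (enter0, enter1, turn)
    start = ((0, 0), (0, 0, 0), ((), ()))       # (program counters, memory, buffers)
    seen, stack = {start}, [start]
    while stack:
        pcs, mem, bufs = stack.pop()
        if pcs == (4, 4):                       # both in the critical section
            return True
        successors = []
        for i in (0, 1):
            other, pc, buf = 1 - i, pcs[i], bufs[i]
            if buf:                             # drain the oldest buffered store
                addr, val = buf[0]
                successors.append((pcs, replace(mem, addr, val),
                                   replace(bufs, i, buf[1:])))

            def load(addr):
                # A processor sees its own newest buffered store first.
                hits = [v for a, v in buf if a == addr]
                return hits[-1] if hits else mem[addr]

            def store(addr, val, next_pc):
                new_pcs = replace(pcs, i, next_pc)
                if buffered:
                    return (new_pcs, mem, replace(bufs, i, buf + ((addr, val),)))
                return (new_pcs, replace(mem, addr, val), bufs)

            if pc == 0:                         # enter[i] := true
                successors.append(store(i, 1, 1))
            elif pc == 1:                       # turn := other
                successors.append(store(TURN, other, 2))
            elif pc == 2:                       # while enter[other] ...
                nxt = 3 if load(other) else 4
                successors.append((replace(pcs, i, nxt), mem, bufs))
            elif pc == 3:                       # ... and turn = other do skip
                nxt = 2 if load(TURN) == other else 4
                successors.append((replace(pcs, i, nxt), mem, bufs))
            elif pc == 4:                       # leave: enter[i] := false
                successors.append(store(i, 0, 5))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return False

if __name__ == "__main__":
    print("violation without buffers:", peterson_can_fail(buffered=False))
    print("violation with buffers:   ", peterson_can_fail(buffered=True))
```

In the failing schedule each process buffers its writes to its enter flag and to turn, then reads the other's flag from memory while it is still false, so both pass the test.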

Weak Consistency

Observation
  - if all memory operations up to a checkpoint are known to have completed, the detailed completion order may not be of importance

Defining a checkpoint
  - a synchronizing operation S issued by processor Pi
      - e.g. acquiring a lock, passing a barrier, or being released from a condition wait
  - delays Pi until all outstanding memory operations from Pi have been completed in other processors

Execution rules
  - synchronizing operations exhibit sequential consistency
  - a synchronizing operation is a memory fence
  - if Pi and Pj are synchronized then all memory operations in Pi complete before any memory operations in Pj can start

Weak Consistency: Examples

  Weakly consistent:
    P1:  W(x,1)      W(y,2)      S
    P2:  1 = R(x)    0 = R(y)    S    1 = R(x), 2 = R(y)
    P3:  0 = R(x)    2 = R(y)    S    1 = R(x), 2 = R(y)

  Not weakly consistent:
    P1:  W(x,1)    W(x,2)    S
    P2:                      S    1 = R(x)

Memory consistency: processor-centric definition

A memory consistency model defines which orderings of memory references made by a processor are preserved for external observers

Reference order defined by
  - instruction order
  - reference type {R,W} or synchronizing operation (S)
  - location referenced {a,b}

A memory consistency model preserves some of the reference orders
  - Sequential consistency (SC), Processor consistency = Total store ordering (TSO), Partial store ordering (PSO), weak consistency

  reference        a = b          a ≠ b (consistency model)
  order            (coherence)    SC     TSO    PSO    weak
  Ra → Rb                         *      *      *
  Ra → Wb          *              *      *      *
  Wa → Wb          *              *      *
  Wa → Rb          *              *
  ?a → S → ?b      *              *      *      *      *
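The different-location part of such a table can be written down as data. The sketch below is an assumption-laden illustration: it encodes the usual textbook relaxations (TSO gives up write-to-read order, PSO additionally gives up write-to-write order, weak consistency keeps none of the four between synchronizing operations), and the names `PRESERVED` and `may_reorder` are made up for this example.

```python
# Orders between two references to DIFFERENT locations that each model
# preserves, as (earlier reference type, later reference type).
# Assumption: the usual textbook definitions of SC, TSO, PSO and weak consistency.
PRESERVED = {
    "SC":   {("R", "R"), ("R", "W"), ("W", "W"), ("W", "R")},
    "TSO":  {("R", "R"), ("R", "W"), ("W", "W")},
    "PSO":  {("R", "R"), ("R", "W")},
    "weak": set(),
}

def may_reorder(model, first, second):
    """True if an external observer may see `second` take effect before
    `first`, when neither is a synchronizing operation."""
    return (first, second) not in PRESERVED[model]

if __name__ == "__main__":
    for model in PRESERVED:
        relaxed = [f"{a}->{b}" for a in "RW" for b in "RW" if may_reorder(model, a, b)]
        print(model, "relaxes:", relaxed)
```

Orders through a synchronizing operation S are preserved by every model and are therefore not listed.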

Consistency models: ordering of writes

Sequential consistency
  - all processors see all writes in the same order

Processor consistency
  - all processors see writes from a given processor in the order they were performed (TSO) or in some unknown but fixed order (PSO)
  - writes from different processors may be observed in varying interleavings at different processors

Weak consistency
  - all processors see the same state only after explicit synchronization

Memory consistency: Summary

Memory consistency
  - contract between parallel programmer and parallel processor regarding observable order of memory operations
  - with multiple processors and shared memory, more opportunities to observe behavior
      - therefore more complex contracts

Where is memory consistency critical?
  - fine-grained parallel programs in a shared memory
      - concurrent garbage collection
      - avoiding race conditions: Java instance constructors
      - constructing high-level synchronization primitives
      - wait-free and lock-free programs

Memory consistency: Summary

Why memory consistency contracts are difficult to use
  - What memory references does a program perform?
      - need to understand the output of optimizing compilers
  - In what order may they be observed?
      - need to understand the memory consistency model
  - How can we construct correct parallel programs that accommodate these possibilities?
      - need deep thought and formal methods

What is a parallel programmer to do, then?
  - Use higher-level concurrency constructs such as loop-level parallelization and synchronized methods (Java)
      - the synchronization inherent in these constructs enables weak consistency models to be used
  - Use machines that provide sequential consistency
      - increasingly hard to find
  - Leave fine-grained unsynchronized memory interaction to the pros