Lecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models

Similar documents
Lecture 12: Relaxed Consistency Models. Topics: sequential consistency recap, relaxing various SC constraints, performance comparison

Lecture 11: Relaxed Consistency Models. Topics: sequential consistency recap, relaxing various SC constraints, performance comparison

Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Lecture: Consistency Models, TM

Portland State University ECE 588/688. Memory Consistency Models

Shared Memory Consistency Models: A Tutorial

Module 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency

Overview: Memory Consistency

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve

Relaxed Memory-Consistency Models

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

Memory Consistency Models

Lecture 7: Implementing Cache Coherence. Topics: implementation details

Lecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing

Memory Consistency Models. CSE 451 James Bornholt

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

CS533 Concepts of Operating Systems. Jonathan Walpole

Lecture 26: Multiprocessors. Today s topics: Synchronization Consistency Shared memory vs message-passing

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

Using Relaxed Consistency Models

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Shared Memory Consistency Models: A Tutorial

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

Beyond Sequential Consistency: Relaxed Memory Models

Shared Memory Consistency Models: A Tutorial

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Advanced Operating Systems (CS 202)

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations

Relaxed Memory Consistency

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)

Distributed Operating Systems Memory Consistency

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

A Unified Formalization of Four Shared-Memory Models

Multiprocessor Synchronization

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important

Lecture: Coherence and Synchronization. Topics: synchronization primitives, consistency models intro (Sections )

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Computer Architecture

Lecture 25: Multiprocessors

EE382 Processor Design. Processor Issues for MP

Sequential Consistency & TSO. Subtitle

Lecture 17: Transactional Memories I

CSE502: Computer Architecture CSE 502: Computer Architecture

Page 1. Cache Coherence

COMP Parallel Computing. CC-NUMA (2) Memory Consistency

Administrivia. p. 1/20

Relaxed Memory-Consistency Models

Lecture: Transactional Memory. Topics: TM implementations

The Cache-Coherence Problem

Topic C Memory Models

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture: Coherence, Synchronization. Topics: directory-based coherence, synchronization primitives (Sections )

Handout 3 Multiprocessor and thread level parallelism

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 5: Directory Protocols. Topics: directory-based cache coherence implementations

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Foundations of the C++ Concurrency Memory Model

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Distributed Shared Memory and Memory Consistency Models

TSO-CC: Consistency-directed Coherence for TSO. Vijay Nagarajan

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Lecture 24: Virtual Memory, Multiprocessors

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James

The need for atomicity This code sequence illustrates the need for atomicity. Explain.

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming

RELAXED CONSISTENCY 1

Lecture 19: Coherence and Synchronization. Topics: synchronization primitives (Sections )

ECE/CS 757: Advanced Computer Architecture II

The memory consistency. model of a system. affects performance, programmability, and. portability. This article. describes several models

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)

Efficient Sequential Consistency via Conflict Ordering

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)

Shared memory. Caches, Cache coherence and Memory consistency models. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16

Computer Science 146. Computer Architecture

Unit 12: Memory Consistency Models. Includes slides originally developed by Prof. Amir Roth

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Recent Advances in Memory Consistency Models for Hardware Shared Memory Systems

EECS 570 Final Exam - SOLUTIONS Winter 2015

Lecture 2: Parallel Programs. Topics: consistency, parallel applications, parallelization process

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2014

Lecture 33: Multiprocessors Synchronization and Consistency Professor Randy H. Katz Computer Science 252 Spring 1996

CSE 506: Opera.ng Systems Memory Consistency

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model

Transcription:

Lecture 13: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models 1

Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors, and (ii) write serialization (all processors see writes to the same location in the same order) The consistency model defines the ordering of writes and reads to different memory locations the hardware guarantees a certain consistency model and the programmer attempts to write correct programs with those assumptions 2

Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section critical section P1 P2 Data = 2000 while (Head == 0) Head = 1 { } = Data Initially, A = B = 0 P1 P2 P3 A = 1 if (A == 1) B = 1 if (B == 1) register = A 3

Consistency Example - I Consider a multiprocessor with bus-based snooping cache coherence and a write buffer between CPU and cache Initially A = B = 0 P1 P2 A 1 B 1 if (B == 0) if (A == 0) Crit.Section Crit.Section The programmer expected the above code to implement a lock because of write buffering, both processors can enter the critical section The consistency model lets the programmer know what assumptions they can make about the hardware s reordering capabilities 4

Consistency Example - 2 P1 P2 Data = 2000 while (Head == 0) { } Head = 1 = Data Sequential consistency requires program order -- the write to Data has to complete before the write to Head can begin -- the read of Head has to complete before the read of Data can begin 5

Consistency Example - 3 P1 P2 P3 P4 A = 1 A = 2 while (B!= 1) { } while (B!= 1) { } B = 1 C = 1 while (C!= 1) { } while (C!= 1) { } register1 = A register2 = A register1 and register2 having different values is a violation of sequential consistency possible if updates to A appear in different orders Cache coherence guarantees write serialization to a single memory location 6

Consistency Example - 4 Initially, A = B = 0 P1 P2 P3 A = 1 if (A == 1) B = 1 if (B == 1) register = A Sequential consistency can be had if a process makes sure that everyone has seen an update before that value is read else, write atomicity is violated 7

Implementing Atomic Updates The above problem can be eliminated by not allowing a read to proceed unless all processors have seen the last update to that location Easy in an invalidate-based system: memory will not service the request unless it has received acks from all processors In an update-based system: a second set of messages is sent to all processors informing them that all acks have been received; reads cannot be serviced until the processor gets the second message 8

Sequential Consistency A multiprocessor is sequentially consistent if the result of the execution is achieveable by maintaining program order within a processor and interleaving accesses by different processors in an arbitrary fashion The multiprocessors in the previous examples are not sequentially consistent Can implement sequential consistency by requiring the following: program order, write serialization, everyone has seen an update before a value is read very intuitive for the programmer, but extremely slow 9

Performance Optimizations Program order is a major constraint the following try to get around this constraint without violating seq. consistency if a write has been stalled, prefetch the block in exclusive state to reduce traffic when the write happens allow out-of-order reads with the facility to rollback if the ROB detects a violation Get rid of sequential consistency in the common case and employ relaxed consistency models if one really needs sequential consistency in key areas, insert fence instructions between memory operations 10

Relaxed Consistency Models We want an intuitive programming model (such as sequential consistency) and we want high performance We care about data races and re-ordering constraints for some parts of the program and not for others hence, we will relax some of the constraints for sequential consistency for most of the program, but enforce them for specific portions of the code Fence instructions are special instructions that require all previous memory accesses to complete before proceeding (sequential consistency) 11

Potential Relaxations Program Order: (all refer to different memory locations) Write to Read program order Write to Write program order Read to Read and Read to Write program orders Write Atomicity: (refers to same memory location) Read others write early Write Atomicity and Program Order: Read own write early 12

Relaxations Relaxation W R Order W W Order R RW Order Rd others Wr early Rd own Wr early IBM 370 X TSO X X PC X X X SC X IBM 370: a read can complete before an earlier write to a different address, but a read cannot return the value of a write unless all processors have seen the write SPARC V8 Total Store Ordering (TSO): a read can complete before an earlier write to a different address, but a read cannot return the value of a write by another processor unless all processors have seen the write (it returns the value of own write before others see it) Processor Consistency (PC): a read can complete before an earlier write (by any processor to any memory location) has been made visible to all 13

Safety Nets To explicitly enforce sequential consistency, safety nets or fence instructions can be used Note that read-modify-write operations can double up as fence instructions replacing the read or write with a r-m-w effectively achieves sequential consistency the read and write of the r-m-w can have no intervening operations and successive reads or successive writes must be ordered in some of the memory models 14

Release Consistency RCsc relaxes constraints similar to WO, while RCpc also allows reading others writes early More distinctions among memory operations RCsc maintains SC between special, while RCpc maintains PC between special ops RCsc maintains orders: acquire all, all release, special special RCpc maintains orders: acquire all, all release, special special, except for sp.wr followed by sp.rd shared special sync nsync acquire release ordinary 15

Performance Comparison Taken from Gharachorloo, Gupta, Hennessy, ASPLOS 91 Studies three benchmark programs and three different architectures: MP3D: 3-D particle simulator LU: LU-decomposition for dense matrices PTHOR: logic simulator LFC: aggressive; lockup-free caches, write buffer with bypassing RDBYP: only write buffer with bypassing BASIC: no write buffer, no lockup-free caches 16

Performance Comparison 17

Summary Sequential Consistency restricts performance (even more when memory and network latencies increase relative to processor speeds) Relaxed memory models relax different combinations of the five constraints for SC Most commercial systems are not sequentially consistent and rely on the programmer to insert appropriate fence instructions to provide the illusion of SC 18

Title Bullet 19