Lecture 12: Relaxed Consistency Models. Topics: sequential consistency recap, relaxing various SC constraints, performance comparison

Relaxed Memory Models
- Recall that sequential consistency has two requirements: program order and write atomicity
- Different consistency models can be defined by relaxing some of these constraints
- Relaxing constraints can improve performance, but the programmer must have a good understanding of the program and the hardware
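
The two SC requirements can be made concrete with a small enumeration. The sketch below is only an illustration, not part of the lecture: it lists every interleaving of two threads that respects program order and applies each write to a single shared memory in one step (write atomicity). The litmus test and the names `sc_outcomes`, `r1`, and `r2` are assumptions chosen for the example.

```python
from itertools import combinations

def sc_outcomes(prog1, prog2):
    """Enumerate every sequentially consistent execution of two threads.

    Each program is a list of ("W", location, value) or ("R", location, register).
    Returns the set of final register assignments as sorted tuples.
    """
    n1, n2 = len(prog1), len(prog2)
    outcomes = set()
    # Choose which slots of the global order belong to thread 1;
    # program order inside each thread is preserved by construction.
    for slots in combinations(range(n1 + n2), n1):
        it1, it2 = iter(prog1), iter(prog2)
        memory, regs = {}, {}
        for step in range(n1 + n2):
            op, loc, arg = next(it1) if step in slots else next(it2)
            if op == "W":
                memory[loc] = arg               # visible to everyone at once
            else:
                regs[arg] = memory.get(loc, 0)  # all locations start at 0
        outcomes.add(tuple(sorted(regs.items())))
    return outcomes

# Hypothetical litmus test: each thread sets its flag, then reads the other flag.
p1 = [("W", "Flag1", 1), ("R", "Flag2", "r1")]
p2 = [("W", "Flag2", 1), ("R", "Flag1", "r2")]
results = sc_outcomes(p1, p2)
```

Under these assumptions `results` never contains the outcome in which both registers are 0, which is exactly the guarantee that relaxing the write-to-read order gives up.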

Potential Relaxations
- Program order (all refer to different memory locations):
  - Write to Read program order
  - Write to Write program order
  - Read to Read and Read to Write program orders
- Write atomicity (refers to the same memory location):
  - Read others' write early
- Write atomicity and program order:
  - Read own write early

Write → Read Program Order
Consider three example implementations that relax the write-to-read program order:
- IBM 370: a read can complete before an earlier write to a different address, but a read cannot return the value of a write unless all processors have seen the write
- SPARC V8 Total Store Ordering (TSO): a read can complete before an earlier write to a different address, but a read cannot return the value of a write by another processor unless all processors have seen the write (a processor can read the value of its own write before others see it)
- Processor Consistency (PC): a read can complete before an earlier write (by any processor to any memory location) has been made visible to all

Relaxations

Relaxation | W → R order | W → W order | R → RW order | Read others' write early | Read own write early
IBM 370    | yes         |             |              |                          |
TSO        | yes         |             |              |                          | yes
PC         | yes         |             |              | yes                      | yes

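Examples

Initially, A = Flag1 = Flag2 = 0
P1                      P2
Flag1 = 1               Flag2 = 1
A = 1                   A = 2
register1 = A           register3 = A
register2 = Flag2       register4 = Flag1
Result: register1 = 1; register3 = 2; register2 = register4 = 0

Initially, A = B = 0
P1          P2                  P3
A = 1       if (A == 1)         if (B == 1)
                B = 1               register1 = A
Result: B = 1, register1 = 0

The first result requires reading one's own write early, so it is possible under TSO and PC but not under IBM 370. The second requires reading another processor's write early, so it is possible only under PC.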
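
The three-processor example can be explored with a small state-space search. This is a rough sketch under stated assumptions, not a full model of PC: each processor reads from its own view of memory, and a non-atomic write reaches the other processors one at a time (in FIFO order per sender). The function `outcomes`, the `"IF"` op, and the temporaries `t2` and `t3` are hypothetical names introduced for the illustration; coherence for multiple writes to one location is not modeled, which is harmless here because each location is written once.

```python
def outcomes(programs, atomic):
    """Explore every execution in which each processor reads its own view of memory.

    Ops: ("W", loc, val), ("R", loc, reg), ("IF", reg, val); a failed IF skips
    the rest of that processor's program.
    atomic=True : a write updates every processor's view in one step.
    atomic=False: a write updates the writer's view, then is delivered to each
                  other processor separately, in FIFO order per sender.
    """
    n = len(programs)
    start = ((0,) * n, ((),) * n, ((),) * (n * n), ())  # pcs, views, channels, regs
    seen, results, stack = set(), set(), [start]

    def updated(pairs, key, value):          # immutable "dict with one key changed"
        return tuple(sorted({**dict(pairs), key: value}.items()))

    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, views, chans, regs = state
        if not any(chans) and all(pcs[i] == len(programs[i]) for i in range(n)):
            results.add(regs)
        for c, chan in enumerate(chans):     # deliver one in-flight write
            if chan:
                dst = c % n                  # channel index is src * n + dst
                loc, val = chan[0]
                new_views = views[:dst] + (updated(views[dst], loc, val),) + views[dst + 1:]
                stack.append((pcs, new_views, chans[:c] + (chan[1:],) + chans[c + 1:], regs))
        for i in range(n):                   # or let processor i execute one op
            if pcs[i] == len(programs[i]):
                continue
            op, nxt = programs[i][pcs[i]], pcs[i] + 1
            new_views, new_chans, new_regs = list(views), list(chans), regs
            if op[0] == "W":
                for j in range(n):
                    if atomic or j == i:
                        new_views[j] = updated(views[j], op[1], op[2])
                    else:
                        new_chans[i * n + j] = chans[i * n + j] + (op[1:],)
            elif op[0] == "R":
                new_regs = updated(regs, op[2], dict(views[i]).get(op[1], 0))
            elif dict(regs)[op[1]] != op[2]:  # failed IF: abandon the program
                nxt = len(programs[i])
            stack.append((pcs[:i] + (nxt,) + pcs[i + 1:],
                          tuple(new_views), tuple(new_chans), new_regs))
    return results

p1 = [("W", "A", 1)]
p2 = [("R", "A", "t2"), ("IF", "t2", 1), ("W", "B", 1)]
p3 = [("R", "B", "t3"), ("IF", "t3", 1), ("R", "A", "register1")]

def stale_read_seen(results):
    """True if some execution has P3 observing B == 1 and then reading A == 0."""
    return any(dict(r).get("t3") == 1 and dict(r).get("register1") == 0 for r in results)
```

With `atomic=False` the search finds an execution in which P3 sees B = 1 but still reads A = 0. With `atomic=True` no such execution exists, which is why only a model that allows reading others' writes early admits this result.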

Safety Nets
- To explicitly enforce sequential consistency, safety nets or fence instructions can be used
- Note that read-modify-write (r-m-w) operations can double up as fence instructions
  - replacing the read or write with an r-m-w effectively achieves sequential consistency
  - the read and write of the r-m-w can have no intervening operations, and successive reads or successive writes must be ordered in some of the memory models
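
The effect of a fence can be checked on the two-processor example from the Examples slide. The following is a minimal sketch that assumes a simplified machine with one FIFO store buffer per processor: `forward=True` lets a read take its value from the processor's own buffer (TSO-like), `forward=False` makes such a read wait (IBM 370-like), and a fence op `("F",)` may execute only when the buffer is empty. The function name, op encoding, and register names are assumptions made for the illustration.

```python
def outcomes(programs, forward=True):
    """All final register values on a machine with per-processor FIFO store buffers.

    Ops: ("W", loc, val), ("R", loc, reg), ("F",).
    """
    n = len(programs)
    seen, results = set(), set()
    stack = [((0,) * n, ((),) * n, (), ())]   # pcs, store buffers, memory, registers
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if not any(bufs) and all(pcs[i] == len(programs[i]) for i in range(n)):
            results.add(regs)
        for i in range(n):
            def put(tup, value):            # copy of tup with slot i replaced
                return tup[:i] + (value,) + tup[i + 1:]
            if bufs[i]:                     # drain the oldest buffered write
                loc, val = bufs[i][0]
                new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                stack.append((pcs, put(bufs, bufs[i][1:]), new_mem, regs))
            if pcs[i] == len(programs[i]):
                continue
            op = programs[i][pcs[i]]
            step = put(pcs, pcs[i] + 1)
            if op[0] == "W":                # writes go to the buffer, not to memory
                stack.append((step, put(bufs, bufs[i] + (op[1:],)), mem, regs))
            elif op[0] == "R":
                own = [v for a, v in bufs[i] if a == op[1]]
                if own and not forward:
                    continue                # 370-like: stall until own write drains
                val = own[-1] if own else dict(mem).get(op[1], 0)
                new_regs = tuple(sorted({**dict(regs), op[2]: val}.items()))
                stack.append((step, bufs, mem, new_regs))
            elif not bufs[i]:               # fence passes only with an empty buffer
                stack.append((step, bufs, mem, regs))
    return results

p1 = [("W", "Flag1", 1), ("W", "A", 1), ("R", "A", "reg1"), ("R", "Flag2", "reg2")]
p2 = [("W", "Flag2", 1), ("W", "A", 2), ("R", "A", "reg3"), ("R", "Flag1", "reg4")]
fenced1 = p1[:2] + [("F",)] + p1[2:]        # fence between the writes and the reads
fenced2 = p2[:2] + [("F",)] + p2[2:]
target = (("reg1", 1), ("reg2", 0), ("reg3", 2), ("reg4", 0))
```

In this model `target` is reachable with forwarding and no fences, but not without forwarding, and not once both processors place a fence between their writes and their reads.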

Optimizations Enabled
- W → R: takes writes off the critical path
- W → W: memory parallelism (bandwidth utilization)
- R → RW: non-blocking caches, overlaps other useful work with a read miss

Weak Ordering
- An example of a model that relaxes all of the above constraints (except reading others' writes early)
- Operations are classified as data and synchronization
- A counter tracks the number of outstanding data operations; the processor does not issue a synchronization operation until the counter is zero, and data operations cannot begin until the previous synchronization operation has completed
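
A back-of-the-envelope timing model shows what the counter rule buys. This sketch rests on simplifying assumptions that are not from the lecture: one operation can issue per cycle, every operation has a fixed latency, and the outstanding-data counter is represented by the list of completion times of in-flight data operations. The function name `finish_time` and the latencies are hypothetical.

```python
def finish_time(ops, model):
    """Cycle at which the last operation completes.

    ops: list of (kind, latency) with kind "data" or "sync".
    model "SC": each operation waits for the previous one to complete.
    model "WO": data operations issue one per cycle and overlap; a sync
    operation waits until no data operation is outstanding, and later data
    operations wait until that sync has completed.
    """
    clock = 0          # earliest cycle at which the next operation may issue
    in_flight = []     # completion times of data ops issued since the last sync
    last_done = 0
    for kind, latency in ops:
        if model == "SC":
            done = clock + latency
            clock = done
        elif kind == "data":
            done = clock + latency
            in_flight.append(done)
            clock += 1
        else:  # sync under weak ordering
            issue = max([clock] + in_flight)   # the counter reaches zero here
            done = issue + latency
            in_flight = []
            clock = done
        last_done = max(last_done, done)
    return last_done

ops = [("data", 10), ("data", 10), ("data", 10), ("sync", 5), ("data", 10)]
sc_cycles = finish_time(ops, "SC")   # 45
wo_cycles = finish_time(ops, "WO")   # 27
```

The three data operations before the synchronization overlap under weak ordering, while the synchronization itself still acts as a barrier in both directions.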

Release Consistency
- RCsc relaxes constraints similar to WO, while RCpc also allows reading others' writes early
- More distinctions among memory operations:
    shared
      special
        sync
          acquire
          release
        nsync
      ordinary
- RCsc maintains SC between special ops, while RCpc maintains PC between special ops
- RCsc maintains orders: acquire → all, all → release, special → special
- RCpc maintains orders: acquire → all, all → release, special → special, except for a special write followed by a special read
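
The ordering rules above can be written as a small predicate. This is only a sketch of the rules as listed on the slide, for two operations to different locations issued by the same processor. The function name `must_order` and the `(label, access)` encoding are assumptions made for the illustration.

```python
SPECIAL = {"acquire", "release", "nsync"}

def must_order(first, second, model):
    """True if `first` must be ordered before `second`, which follows it in program order.

    Each operation is (label, access): label is "ordinary", "acquire",
    "release", or "nsync"; access is "R" or "W". model is "RCsc" or "RCpc".
    """
    (label1, access1), (label2, access2) = first, second
    if label1 == "acquire":                 # acquire -> all
        return True
    if label2 == "release":                 # all -> release
        return True
    if label1 in SPECIAL and label2 in SPECIAL:   # special -> special
        if model == "RCpc" and access1 == "W" and access2 == "R":
            return False                    # RCpc drops special write -> special read
        return True
    return False                            # ordinary ops may be reordered freely

release = ("release", "W")
acquire = ("acquire", "R")
data_write = ("ordinary", "W")
```

For example, `must_order(release, acquire, "RCsc")` holds but `must_order(release, acquire, "RCpc")` does not, and an ordinary operation that precedes an acquire is not ordered with it under either variant.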

Programmer Viewpoint
- Weak ordering will yield high performance, but the programmer has to identify data and synch operations
- An operation is defined as a synch operation if it forms a race with another operation in any sequentially consistent execution
- Given a sequentially consistent execution, an operation forms a race with another operation if the two operations access the same location, at least one of them is a write, and there are no other intervening operations between them

P1                  P2
Data = 2000         while (Head == 0) { }
Head = 1            ... = Data
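
The race definition can be applied mechanically to the Data/Head program. The sketch below enumerates sequentially consistent executions (with the spin loop bounded by a hypothetical `max_spins` parameter so that the enumeration is finite) and reports the locations on which two adjacent operations from different processors conflict. The function names are assumptions made for the illustration.

```python
def sc_executions(max_spins=2):
    """Yield SC executions of the program as lists of (processor, op, location)."""
    p1 = [("P1", "W", "Data"), ("P1", "W", "Head")]
    spin = ("P2", "R", "Head")

    def explore(done, spins, trace):
        if done == len(p1):
            # Head is now 1: P2's next read exits the loop, then it reads Data.
            yield trace + [spin, ("P2", "R", "Data")]
            return
        yield from explore(done + 1, spins, trace + [p1[done]])   # P1 runs next
        if spins < max_spins:
            yield from explore(done, spins + 1, trace + [spin])   # P2 reads Head == 0

    yield from explore(0, 0, [])

def racing_locations(trace):
    """Locations accessed by adjacent conflicting operations of different processors."""
    found = set()
    for a, b in zip(trace, trace[1:]):
        if a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1]):
            found.add(a[2])
    return found

sync_locations = set()
for trace in sc_executions():
    sync_locations |= racing_locations(trace)
```

Under these assumptions `sync_locations` is `{"Head"}`: the accesses to Head must be labeled as synchronization, while the accesses to Data never race because an access to Head always separates them.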

Performance Comparison
- Taken from Gharachorloo, Gupta, Hennessy, ASPLOS '91
- Studies three benchmark programs and three different architectures
- Benchmarks:
  - MP3D: 3-D particle simulator
  - LU: LU-decomposition for dense matrices
  - PTHOR: logic simulator
- Architectures:
  - LFC: aggressive; lockup-free caches, write buffer with bypassing
  - RDBYP: only write buffer with bypassing
  - BASIC: no write buffer, no lockup-free caches

Summary
- Sequential consistency restricts performance (even more when memory and network latencies increase relative to processor speeds)
- Relaxed memory models relax different combinations of the five constraints for SC
- Most commercial systems are not sequentially consistent and rely on the programmer to insert appropriate fence instructions to provide the illusion of SC
