Unit 12: Memory Consistency Models. Includes slides originally developed by Prof. Amir Roth

Similar documents
Symmetric Multiprocessors: Synchronization and Sequential Consistency

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

Overview: Memory Consistency

Module 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency

Coherence and Consistency

Relaxed Memory Consistency

Memory Consistency Models. CSE 451 James Bornholt

Relaxed Memory-Consistency Models

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming

Portland State University ECE 588/688. Memory Consistency Models

Using Relaxed Consistency Models

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Advanced Operating Systems (CS 202)

Beyond Sequential Consistency: Relaxed Memory Models

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Memory Consistency Models

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

Distributed Operating Systems Memory Consistency

Shared Memory Consistency Models: A Tutorial

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Sequential Consistency & TSO. Subtitle

Computer Science 146. Computer Architecture

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Other consistency models

Systèmes d Exploitation Avancés

CS533 Concepts of Operating Systems. Jonathan Walpole

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Program logics for relaxed consistency

Release Consistency. Draft material for 3rd edition of Distributed Systems Concepts and Design

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 19 Memory Consistency Models

Foundations of the C++ Concurrency Memory Model

Lecture 13: Memory Consistency. + course-so-far review (if time) Parallel Computer Architecture and Programming CMU /15-618, Spring 2017

Multiplying Performance. Application Domains for Multiprocessors. Multicore: Mainstream Multiprocessors. Identifying Parallelism

Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations

Lecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models

Operating Systems. Synchronization

Lecture: Consistency Models, TM

The Cache-Coherence Problem

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

CS5460: Operating Systems

Thread-Level Parallelism. Shared Memory. This Unit: Shared Memory Multiprocessors. U. Wisconsin CS/ECE 752 Advanced Computer Architecture I

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Synchronization. CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han

EE382 Processor Design. Processor Issues for MP

Chapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST

Multiprocessor Synchronization

Lock-Free, Wait-Free and Multi-core Programming. Roger Deran boilerbay.com Fast, Efficient Concurrent Maps AirMap

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2014

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

CS 152, Spring 2012 Section 11

CS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2

Shared Memory Consistency Models: A Tutorial

Memory Consistency Models

CS510 Advanced Topics in Concurrency. Jonathan Walpole

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

GPU Concurrency: Weak Behaviours and Programming Assumptions

RELAXED CONSISTENCY 1

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Bus-Based Coherent Multiprocessors

C++ 11 Memory Consistency Model. Sebastian Gerstenberg NUMA Seminar

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return?

Chapter 2. OS Overview

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

CSE502: Computer Architecture CSE 502: Computer Architecture

The Java Memory Model

Wednesday, October 15, 14. Functions

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

Relaxed Memory-Consistency Models

Consistency & Coherence. 4/14/2016 Sec5on 12 Colin Schmidt

Multiprocessors and Locking

Lecture 11 Reducing Cache Misses. Computer Architectures S

C++ Memory Model. Don t believe everything you read (from shared memory)

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

Software-Controlled Multithreading Using Informing Memory Operations

Hardware Memory Models: x86-tso

CS252 Graduate Computer Architecture Fall 2015 Lecture 14: Synchroniza>on and Memory Models

Replication and Consistency. Fall 2010 Jussi Kangasharju

Cache Coherence and Atomic Operations in Hardware

CSE 506: Opera.ng Systems Memory Consistency

Reasoning about the C/C++ weak memory model

Administrivia. p. 1/20

This Unit: Shared Memory Multiprocessors. CIS 501 Computer Architecture. Beyond Implicit Parallelism. Readings

Atomic Instructions! Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University. P&H Chapter 2.11

I/O Devices. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Lecture 4: Instruction Set Design/Pipelining

Role of Synchronization. CS 258 Parallel Computer Architecture Lecture 23. Hardware-Software Trade-offs in Synchronization and Data Layout

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Taming release-acquire consistency

A Memory Model for RISC-V

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 19 Summary

Potential violations of Serializability: Example 1

Improving Cache Performance. Reducing Misses. How To Reduce Misses? 3Cs Absolute Miss Rate. 1. Reduce the miss rate, Classifying Misses: 3 Cs

Relaxed Memory: The Specification Design Space

Dr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

Transcription:

Unit 12: Memory Consistency Models Includes slides originally developed by Prof. Amir Roth 1

Example #1 int x = 0;! int y = 0;! thread 1 y = 1;! thread 2 int t1 = x;! x = 1;! int t2 = y;! print(t1,t2)! What outcomes are allowed? 2

Example #2 (Somewhat More Real) int A = 0;! int flag = 0;! thread 1 thread 2 A = 1;! while (flag == 0) {! flag = 1;! // do nothing! print(a);! What outcomes are allowed? 3

Example #3 int x = 0;! int y = 0;! thread 1 y = 1;! thread 2 x = 1;! print(x);! print(y);! What outcomes are allowed? 4

Example #4 (Double Checked Locking) int* ptr = NULL;! int val = 0;! thread 1 if (ptr == NULL) {! acquire(lock);! if (ptr == NULL) {! val = 1;! ptr = &val;! release(lock);! print(*ptr);! What outcomes are allowed? thread 2 if (ptr == NULL) {! acquire(lock);! if (ptr == NULL) {! val = 1;! ptr = &val;! release(lock);! print(*ptr);! 5

6

Example #4 (Double Checked Locking) int* ptr = NULL;! int val = 0;! thread 1 if (ptr == NULL) {! acquire(lock);! if (ptr == NULL) {! val = 1;! ptr = &val;! release(lock);! print(*ptr);! What outcomes are allowed? thread 2 if (ptr == NULL) {! acquire(lock);! if (ptr == NULL) {! val = 1;! ptr = &val;! release(lock);! print(*ptr);! 7

Example #5 int x = 0;! int y = 0;! thread 1 if (x!= 0) {! thread 2 y = 2;! y = 1;! print(y);! What outcomes are allowed? 8

Example #6 int x = 0;! int y = 0;! thread 1 thread 2 y = ((x!= 0)? 1 : y)! y = 2;! print(y);! What outcomes are allowed? 9

What is Going On? Memory reordering In the compiler Compiler is generally allowed to re-order memory operations to different addresses Many other compiler optimizations also cause problems In the hardware To tolerate write latency Processes don t wait for writes to complete And why should they? No reason on a uniprocessors To simplify out-of-order execution 10

Memory Consistency Memory coherence Creates globally uniform (consistent) view Of a single memory location (in other words: cache line) Not enough Cache lines A and B can be individually consistent But inconsistent with respect to each other Memory consistency Creates globally uniform (consistent) view Of all memory locations relative to each other Who cares? Programmers Globally inconsistent memory creates mystifying behavior

Coherence vs. Consistency Processor 0 A=1; flag=1; A=0 flag=0 Intuition says: P1 prints A=1 Coherence says: absolutely nothing P1 can see P0 s write of flag before write of A!!! How? P0 has a coalescing store buffer that reorders writes Or out-of-order execution Or compiler re-orders instructions Processor 1 while (!flag); // spin print A; Imagine trying to figure out why this code sometimes works and sometimes doesn t Real systems act in this strange manner What is allowed is defined as part of the ISA of the processor

Store Buffers & Consistency Processor 0 A=1; flag=1; Consider the following execution: Processor 0 s write to A, misses the cache. Put in store buffer Processor 0 keeps going Processor 0 write 1 to flag hits, completes Processor 1 reads flag sees the value 1 Processor 1 exits loop A=0 flag=0 Processor 1 prints 0 for A Ramification: store buffers can cause strange behavior How strange depends on lots of things Processor 1 while (!flag); // spin print A;

Hardware Memory Consistency Models Sequential consistency (SC) (MIPS, PA-RISC) Formal definition of memory view programmers expect Processors see their own loads and stores in program order + Provided naturally, even with out-of-order execution But also: processors see others loads and stores in program order And finally: all processors see same global load/store ordering Last two conditions not naturally enforced by coherence Corresponds to some sequential interleaving of uniprocessor orders Indistinguishable from multi-programmed uni-processor Processor consistency (PC) (x86, SPARC) Allows a in-order store buffer Stores can be deferred, but must be put into the cache in order Release consistency (RC) (ARM, Itanium, PowerPC) Allows an un-ordered store buffer Stores can be put into cache in any order. Loads re-ordered, too.

Reordering Loads and Stores int x = 0;! int y = 0;! thread 1 y = 1;! thread 2 int t1 = x;! x = 1;! int t2 = y;! print(t1,t2)! Allowed by some processors today PowerPC, ARM, Itanium 15

Delaying Stores (Past Loads) int x = 0;! int y = 0;! thread 1 y = 1;! thread 2 x = 1;! print(x);! print(y);! Allowed by most hardware today PowerPC, ARM, Itanium Plus: SPARC TSO, Intel/AMD x86 16

Restoring Order Sometimes we need ordering (mostly we don t) Prime example: ordering between lock and data How? insert Fences (memory barriers) Special instructions, part of ISA Example Ensure that loads/stores don t cross lock acquire/release operation acquire fence critical section fence release How do fences work? They stall execution until write buffers are empty Makes lock acquisition and release slow(er) Use synchronization library, don t write your own