Using Relaxed Consistency Models

CS&G discuss relaxed consistency models from two standpoints:

- The system specification, which tells how a consistency model works and what guarantees of ordering it provides.
- The programmer's interface, which tells what code needs to be written to invoke the operations provided by the system specification.

The system specification [CS&G §9.1.1]

CS&G divide relaxed consistency models into three classes, based on the kinds of reorderings they allow.

- Write→read reordering. These models allow only a read to bypass (complete before) an incomplete earlier write. Examples: total store ordering, processor consistency.
- Write→write reordering. These models also allow writes to bypass previous writes. Example: partial store ordering.
- All reorderings. These models also allow reads and writes to bypass previous reads. Examples: weak ordering, release consistency.

Write→read reordering

Models that allow reads to bypass pending writes are helpful because they hide the latency of write operations. While a write miss is still in the write buffer, and not yet visible to other processors, the processor can perform reads that hit in its cache, or a single read that misses in its cache.

These models require minimal changes to programs. For example, no change is needed to this code for spinning on a flag:

    P1                      P2
    A = 1;                  while (flag == 0);
    flag = 1;               print A;

(In this and later code fragments, we assume all variables are initialized to 0.)

No change is needed because the model does not permit writes to be reordered. In particular, the write to A will not be allowed to complete out of order with respect to the write to flag.

What would write→read reordering models guarantee about this fragment?

    P1                  P2
    A = 1;              B = 1;
    print B;            print A;

Recall from Lecture 22 that processor consistency does not require that all processors see writes in the same order if those writes come from different processors. Under processor consistency (PC), what value will be printed by the following code?

    P1              P2                  P3
    A = 1;          while (A == 0);     while (B == 0);
                    B = 1;              print A;

However, this problem does not occur with total store ordering (TSO). TSO provides:

- Store atomicity: there is a global order defined over all writes.
- A processor's own reads and writes are ordered with respect to itself: X = 3; print X; will print 3.

Even so, the processor issuing a write may observe it sooner than the other processors do.
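To see how the lack of store atomicity in the three-processor fragment above can surface in practice, here is a minimal sketch using C11 atomics with memory_order_relaxed. This mapping is my own illustration, not CS&G's: relaxed atomics, like PC, provide no store atomicity, so P3 is permitted to print 0.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    /* Shared variables, initialized to 0 as in the lecture's fragments. */
    atomic_int A = 0, B = 0;

    int p1(void *arg) {                      /* P1: A = 1; */
        atomic_store_explicit(&A, 1, memory_order_relaxed);
        return 0;
    }

    int p2(void *arg) {                      /* P2: while (A == 0); B = 1; */
        while (atomic_load_explicit(&A, memory_order_relaxed) == 0)
            ;
        atomic_store_explicit(&B, 1, memory_order_relaxed);
        return 0;
    }

    int p3(void *arg) {                      /* P3: while (B == 0); print A; */
        while (atomic_load_explicit(&B, memory_order_relaxed) == 0)
            ;
        /* With relaxed ordering, there is no guarantee that P1's write to A
           is visible here yet: printing 0 is a legal outcome. */
        printf("%d\n", atomic_load_explicit(&A, memory_order_relaxed));
        return 0;
    }

    int main(void) {
        thrd_t t1, t2, t3;
        thrd_create(&t1, p1, NULL);
        thrd_create(&t2, p2, NULL);
        thrd_create(&t3, p3, NULL);
        thrd_join(t1, NULL); thrd_join(t2, NULL); thrd_join(t3, NULL);
        return 0;
    }

If every operation is changed to memory_order_seq_cst, store atomicity is restored and P3 must print 1, matching the TSO behavior described above.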

Special membar instructions are used to impose global ordering between a write and a following read. Now consider this code fragment:

    P1                  P2
    1a. A = 1;          2a. B = 1;
    1b. print B;        2b. print A;

With sequential consistency, what will be printed? How do we know this? Might something else happen under PC? What about under TSO?

To guarantee SC semantics when desired, we need to use the membar (or "fence") instruction. This instruction prevents a following read from completing before previous writes have completed. If an architecture doesn't have a membar instruction, an atomic read-modify-write operation can be substituted for a read. Why do we know this will work? Such an instruction is provided on the Sun SPARC V9.
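As a concrete illustration, here is a hedged sketch of the fragment above in C11 atomics (my mapping, not the lecture's): a full fence, atomic_thread_fence(memory_order_seq_cst), between each processor's write and its subsequent read plays the role of the membar, preventing the read from completing before the earlier write.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    atomic_int A = 0, B = 0;        /* both initialized to 0 */

    int p1(void *arg) {
        atomic_store_explicit(&A, 1, memory_order_relaxed);    /* 1a */
        atomic_thread_fence(memory_order_seq_cst);              /* membar: W -> R */
        printf("P1 read B = %d\n",
               atomic_load_explicit(&B, memory_order_relaxed)); /* 1b */
        return 0;
    }

    int p2(void *arg) {
        atomic_store_explicit(&B, 1, memory_order_relaxed);    /* 2a */
        atomic_thread_fence(memory_order_seq_cst);              /* membar: W -> R */
        printf("P2 read A = %d\n",
               atomic_load_explicit(&A, memory_order_relaxed)); /* 2b */
        return 0;
    }

    int main(void) {
        thrd_t t1, t2;
        thrd_create(&t1, p1, NULL);
        thrd_create(&t2, p2, NULL);
        thrd_join(t1, NULL);
        thrd_join(t2, NULL);
        return 0;
    }

Without the fences, both processors may print 0; with them, at least one is guaranteed to print 1, as under SC.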

Write→write reordering

What is the advantage of allowing write→write reordering? The write buffer can merge or retire writes before previous writes in program order complete. What would it mean to merge two writes? Multiple write misses (to different locations) can thus become fully overlapped, and visible out of program order.

What happens to our first code fragment under this model?

    P1                  P2
    1a. A = 1;          2a. while (flag == 0);
    1b. B = 1;          2b. u = A;
    1c. flag = 1;       2c. v = B;

To make the model work in general, we need an instruction that enforces write-to-write ordering when necessary. On the Sun SPARC V9, the membar instruction has a write-to-write flavor that can be turned on to provide this. Where would we need to insert this instruction in the code fragment above? (A sketch follows below.)
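Here is one plausible placement, sketched with C11 atomics (again my mapping, not the lecture's): the write-to-write barrier belongs between 1b and 1c, so that the writes to A and B become visible before the write to flag. In C11 terms this is a release fence on the writer and a matching acquire fence on the reader.

    #include <stdatomic.h>

    int A = 0, B = 0;        /* ordinary data (1a, 1b) */
    atomic_int flag = 0;     /* synchronization flag (1c, 2a) */
    int u, v;                /* results read by P2 (2b, 2c) */

    void p1(void) {
        A = 1;                                                    /* 1a */
        B = 1;                                                    /* 1b */
        /* Write-to-write barrier goes here, between 1b and 1c,
           so A and B are visible before flag. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&flag, 1, memory_order_relaxed);   /* 1c */
    }

    void p2(void) {
        while (atomic_load_explicit(&flag, memory_order_relaxed) == 0) /* 2a */
            ;
        /* Matching read-side fence so 2b and 2c observe 1a and 1b. */
        atomic_thread_fence(memory_order_acquire);
        u = A;                                                    /* 2b */
        v = B;                                                    /* 2c */
    }

On SPARC V9 the writer's fence would correspond to a membar with the store-store flavor set; the read-side fence only matters on models that also relax read ordering, which is the next topic.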

Relaxing all program orders

Models in this class don't guarantee that anything becomes visible to other processors in program order. This allows multiple read and write requests to be outstanding at the same time, and to complete out of order. Read latency can thus be hidden as well as write latency. (In the earlier models, writes couldn't bypass earlier reads, so everything in a process had to wait for a pending read to finish.) These models are the only ones that allow optimizing compilers to use many key reorderings and to eliminate unnecessary accesses.

Two models, and three implementations, fall into this category:

- Weak ordering
- Release consistency
- Digital Alpha implementation
- SPARC V9 relaxed memory ordering (RMO) implementation
- IBM PowerPC implementation

Exercise: What categories do the other models from Lecture 22 fit into? (Causal consistency, PRAM consistency, and processor consistency.)

We have already seen the two models in Lecture 22. To get a better appreciation for the differences between the two, let's consider the two examples from CS&G.

Here is a piece of code that links a new task onto the head of a doubly linked list. It may be executed in parallel by all processors, so some sort of concurrency control is needed:

    lock(taskq);
    newtask->next = head;
    if (head != NULL)
        head->prev = newtask;
    head = newtask;
    unlock(taskq);

In this code, taskq serves as a synchronization variable. If accesses to it are kept in order, and accesses by a single processor are kept in program order, then it will serve to prevent two processes from manipulating the queue simultaneously. (A sketch of such a lock in C11 follows.)
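To make the role of the synchronization variable concrete, here is a minimal spinlock sketch in C11 atomics. The lecture's lock()/unlock() primitives are abstract; the names below and the atomic_flag representation are my assumptions. The key point is that acquiring the lock is an acquire operation and releasing it is a release operation, exactly the distinction release consistency exploits.

    #include <stdatomic.h>

    /* A one-bit spinlock guarding the task queue (hypothetical name). */
    atomic_flag taskq = ATOMIC_FLAG_INIT;

    void lock(atomic_flag *l) {
        /* Acquire: spin until the lock is obtained; later accesses
           may not be reordered before this point. */
        while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
            ;  /* spin */
    }

    void unlock(atomic_flag *l) {
        /* Release: earlier accesses must complete before the lock
           becomes visible as free. */
        atomic_flag_clear_explicit(l, memory_order_release);
    }

The list manipulation between lock(&taskq) and unlock(&taskq) can then be reordered freely by the hardware and compiler, as long as it stays inside the acquire/release bracket.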

The second example uses two flag variables to implement a producer/consumer interaction:

    P1                              P2
    TOP: while (flag2 == 0);        TOP: while (flag1 == 0);
         A = 1;                          x = A;
         u = B;                          y = D;
         v = C;                          B = 3;
         D = B * C;                      C = D / B;
         flag2 = 0;                      flag1 = 0;
         flag1 = 1;                      flag2 = 1;
         goto TOP;                       goto TOP;

Which shared variables are produced by P1? Which shared variables are produced by P2? What are the synchronization variables here?

Again, as long as one access to the synchronization variables completes throughout the system before the next one starts, other reads and writes can complete out of order, and the program will still work. (A C11 sketch of this handshake follows.)
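For concreteness, here is a hedged C11 rendering of one round of the handshake (mapping mine, not CS&G's): each flag is an atomic variable, spun on with acquire loads and set with release stores, while A, B, C, D, u, v, x, y remain ordinary variables that may be reordered freely between synchronization points. One assumption is labeled below: flag2 starts at 1 so that P1 can run the first round (with both flags 0, as in the lecture's blanket initialization, both loops would wait forever).

    #include <stdatomic.h>

    int A, B, C, D, u, v, x, y;        /* ordinary shared data, initially 0 */
    atomic_int flag1 = 0;
    atomic_int flag2 = 1;  /* assumption: start at 1 so P1 runs first */

    void p1_round(void) {
        while (atomic_load_explicit(&flag2, memory_order_acquire) == 0)
            ;                          /* wait for the consumer */
        A = 1;
        u = B;
        v = C;
        D = B * C;
        atomic_store_explicit(&flag2, 0, memory_order_relaxed);
        atomic_store_explicit(&flag1, 1, memory_order_release);  /* hand off */
    }

    void p2_round(void) {
        while (atomic_load_explicit(&flag1, memory_order_acquire) == 0)
            ;                          /* wait for the producer */
        x = A;
        y = D;
        B = 3;
        C = D / B;
        atomic_store_explicit(&flag1, 0, memory_order_relaxed);
        atomic_store_explicit(&flag2, 1, memory_order_release);  /* hand off */
    }

The release store on one side and the acquire load on the other are the only operations that must be kept in order; everything between them may complete out of program order, just as the text says.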

The diagram below (similar to the one at the end of Lecture 22) illustrates the difference between weak ordering and release consistency. Each block of reads and writes represents a run of non-synchronization operations from a single processor.

With weak ordering, before a synchronization operation the processor waits for all previous reads and writes to complete. Also, a synchronization operation has to complete before later reads and writes can be issued.

Release consistency distinguishes between an acquire, performed to gain access to a critical section, and a release, performed to leave a critical section. An acquire can be issued before previous reads (in block 1) complete. After a release is issued, instructions in block 3 can be issued without waiting for it to complete.

    Release consistency        Weak ordering
    block 1                    block 1
    acquire (read)             sync
    block 2                    block 2
    release (write)            sync
    block 3                    block 3

Modern processors provide instructions that can be used to implement these two models. The Alpha architecture supports two fence instructions:

- The memory barrier (MB) operation is like a sync operation in WO.
- The write memory barrier (WMB) operation imposes program order only between writes. A read issued after a WMB can bypass (complete before) a write issued before the WMB.

The SPARC V9 relaxed memory order (RMO) mode provides a membar (fence) instruction with four flavor bits. Each flavor bit enforces ordering between a certain combination of previous and following memory operations:

- read to read
- read to write
- write to read
- write to write

The PowerPC provides only a single sync (fence) instruction; it can implement only WO.

The programmer's interface

When writing code for a system that uses a relaxed memory-consistency model, the programmer must label synchronization operations. This can be done using system-specified programming primitives, such as locks and barriers.

For example, how are lock operations implemented in release consistency? A lock operation translates to an acquire; an unlock operation translates to a release. Arrival at a barrier indicates that previous operations have completed, so it is treated as a release; leaving a barrier indicates that new accesses may begin, so it is treated as an acquire.

What about accesses to ordinary variables, like the flag variables earlier in this lecture? Sometimes the programmer will just know how they should be labeled. But if not, here's how to determine it.

1. Decide which operations are conflicting. Two memory operations from different processes conflict if they access the same memory location and at least one of them is a write.

2. Decide which conflicting operations are competing. Two conflicting operations from different processes compete if they could appear next to each other in a sequentially consistent total order; that is, one could immediately follow the other with no intervening memory operations on shared data.

3. Label all competing memory operations as synchronization operations of the appropriate type.

Notice that we can decide where synchronization operations are needed by assuming SC, even if we are not using an SC memory model.

To see how this is done, consider our first code fragment again:

    P1                  P2
    1a. A = 1;          2a. while (flag == 0);
    1b. B = 1;          2b. u = A;
    1c. flag = 1;       2c. v = B;

Which operations are conflicting? Which operations are competing? Which operations need to be labeled as synchronization operations? How should these operations be labeled in weak ordering? How should they be labeled in release consistency? (One plausible labeling is sketched below.)
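As a worked sketch of the last questions (my reading, in C11 terms rather than the lecture's notation): in any SC interleaving, 2b and 2c execute only after 2a has seen 1c, so the accesses to A and B are conflicting but never adjacent, and only the accesses to flag (1c and 2a) compete. Labeling 1c as a release-type synchronization write and 2a as an acquire-type synchronization read is one plausible release-consistency labeling.

    #include <stdatomic.h>

    int A = 0, B = 0;            /* ordinary (conflicting, non-competing) data */
    atomic_int flag = 0;         /* the competing accesses: labeled as sync */
    int u, v;

    void p1(void) {
        A = 1;                                            /* 1a: ordinary */
        B = 1;                                            /* 1b: ordinary */
        atomic_store_explicit(&flag, 1,
                              memory_order_release);      /* 1c: release (sync write) */
    }

    void p2(void) {
        while (atomic_load_explicit(&flag,
                                    memory_order_acquire) == 0)  /* 2a: acquire (sync read) */
            ;
        u = A;                                            /* 2b: ordinary */
        v = B;                                            /* 2c: ordinary */
    }

Under weak ordering, 1c and 2a would instead both be labeled as plain sync operations (e.g., memory_order_seq_cst in this C11 analogy), which is also correct but forbids more reordering than release consistency requires.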