Implementing the C11 memory model for ARM processors. Will Deacon February 2015


2 Introduction

- ARM ships intellectual property, specialising in RISC microprocessors
- Over 50 billion chips shipped, around 2.5 billion per quarter
- Upstream kernel developer at ARM, Cambridge (UK!)
- Enable new architectural features in Linux before silicon
- Influence future hardware designs with feedback and prototypes

I'm going to talk about memory models, which form a crucial part of low-level system architecture and are needed to ensure portability of high-level multi-threaded user code.

3 What is memory ordering? (1)

We expect a single CPU, executing a single thread of execution, to operate in program order. Easy to reason about, but terribly slow!

- Prohibits common compiler transformations (e.g. hoisting)
- Forbids common hardware optimisations (e.g. store buffers and caches)
- Increases memory subsystem bottleneck

Instead, allow the program to run out-of-order as long as the programmer can't tell.

4 What is memory ordering? (2)

We can't have our cake and eat it. With multiple CPUs, we can observe many of the tricks being played on us!

SB (Dekker's) - Initially: A = B = 0

    p0              p1
    a: A = 1;       c: B = 1;
    b: C = B;       d: D = A;

Results:
    (C, D) == (1, 1)
    (C, D) == (0, 1)
    (C, D) == (1, 0)
    (C, D) == (0, 0)

Question: What can cause this apparent reordering in practice?

5 Store buffering

Buffering allows a variable to be live in multiple locations at once.

The memory model defines the set of permitted behaviours that may be observed in a system. Interesting cases are expressed as litmus tests.

6 Sequential Consistency

Sequential consistency is easy to reason about, as there is a single global ordering.

Sequential Consistency (SC): "A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." Leslie Lamport (1979)

Question: How does this constrain SB?

7 SB+SC

No valid interleaving for (C, D) == (0, 0), therefore forbidden by SC.

    p0              p1
    a: A = 1;       c: B = 1;
    b: C = B;       d: D = A;

Interleavings that respect program order, with results (C, D):
    (a,b,c,d) : (0,1)
    (c,d,a,b) : (1,0)
    (a,c,b,d) : (1,1)
    ...

Orderings that would give (0,0). Each one places b before a or d before c, so none respects program order:
    (b,c,d,a) : (0,0)
    (b,d,c,a) : (0,0)
    (b,d,a,c) : (0,0)
    (b,a,d,c) : (0,0)
    (d,c,b,a) : (0,0)
    (d,b,c,a) : (0,0)
    (d,b,a,c) : (0,0)
    (d,a,b,c) : (0,0)
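The claim that SC forbids (C, D) == (0, 0) can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every total order of the four SB events, keeps those that respect program order, and records the outcomes they produce. The names `sb_allowed_under_sc`, `enumerate` and `run_order` are made up for this illustration.

```c
#include <stdbool.h>

/* Events of the SB litmus test.
 * p0: a: A = 1;  b: C = B;
 * p1: c: B = 1;  d: D = A; */
enum { EV_A, EV_B, EV_C, EV_D, NUM_EVENTS };

/* Position of event ev within a total order. */
static int position(const int order[NUM_EVENTS], int ev)
{
    for (int i = 0; i < NUM_EVENTS; i++)
        if (order[i] == ev)
            return i;
    return -1;
}

/* Execute one total order against a single shared memory and
 * record the (C, D) outcome it produces. */
static void run_order(const int order[NUM_EVENTS], bool seen[2][2])
{
    int A = 0, B = 0, C = 0, D = 0;

    for (int i = 0; i < NUM_EVENTS; i++) {
        switch (order[i]) {
        case EV_A: A = 1; break;
        case EV_B: C = B; break;
        case EV_C: B = 1; break;
        case EV_D: D = A; break;
        }
    }
    seen[C][D] = true;
}

/* Visit every permutation of the events, keeping only those in which each
 * thread's operations appear in program order (a before b, c before d). */
static void enumerate(int order[NUM_EVENTS], int k, bool seen[2][2])
{
    if (k == NUM_EVENTS) {
        if (position(order, EV_A) < position(order, EV_B) &&
            position(order, EV_C) < position(order, EV_D))
            run_order(order, seen);
        return;
    }
    for (int i = k; i < NUM_EVENTS; i++) {
        int tmp = order[k]; order[k] = order[i]; order[i] = tmp;
        enumerate(order, k + 1, seen);
        tmp = order[k]; order[k] = order[i]; order[i] = tmp;
    }
}

/* True if some SC interleaving of SB yields the outcome (C, D) == (c, d). */
bool sb_allowed_under_sc(int c, int d)
{
    int order[NUM_EVENTS] = { EV_A, EV_B, EV_C, EV_D };
    bool seen[2][2] = { { false, false }, { false, false } };

    enumerate(order, 0, seen);
    return seen[c][d];
}
```

Calling `sb_allowed_under_sc(0, 0)` returns false, while the other three outcomes return true. This models SC only; it says nothing about what weaker hardware may do.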

8 C11

"Threads cannot be implemented as a library." Hans Boehm [1]

The C11 standard introduced native threads and atomic operations:
- Atomic types (atomic_int) defined in <stdatomic.h>
- Atomic operations such as atomic_compare_exchange_strong, atomic_load
- Formal relations to describe ordering and data races

This necessitates a memory model, the default behaviour being SC-DRF (sequential consistency for data-race-free programs).

9 memory_order_*

Maintaining SC is expensive! Atomic operations are parameterised by enum memory_order for finer-grained control:
- memory_order_seq_cst: sequential consistency (default)
- memory_order_acquire, memory_order_release, memory_order_acq_rel: RCpc [2] LOAD-acquire/STORE-release semantics
- memory_order_consume: data dependent acquisition [3]
- memory_order_relaxed: no inter-thread synchronisation

This provides a portable mechanism to expose weakly-ordered hardware to applications for optimal performance.

    atomic_int foo;
    ...
    return atomic_load_explicit(&foo, memory_order_acquire);
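As a small illustration of choosing an ordering per operation, here is a sketch of a statistics counter where only the count matters, so relaxed ordering is sufficient. The names `hits`, `record_hit`, `read_hits` and `read_hits_sc` are hypothetical and not taken from the slides.

```c
#include <stdatomic.h>

/* Hypothetical statistics counter; zero-initialised as a static object. */
static atomic_int hits;

/* The increment must be atomic, but it does not publish any other data,
 * so no inter-thread synchronisation is requested. Returns the new count. */
int record_hit(void)
{
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed) + 1;
}

/* Relaxed read of the counter. */
int read_hits(void)
{
    return atomic_load_explicit(&hits, memory_order_relaxed);
}

/* The non-_explicit form uses memory_order_seq_cst, the default. */
int read_hits_sc(void)
{
    return atomic_load(&hits);
}
```

If the counter were also used to signal that some other data is ready, relaxed ordering would no longer be enough and an acquire/release pair would be needed instead.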

10 Relations

The C11 memory model can be described by a series of relations:
- sequenced-before (sb)
- reads-from (rf)
- synchronizes-with (sw)
- happens-before (hb)
- and some complications thanks to consume (cad, dob, ithb)

All writes to an atomic variable form part of a total modification order mo for that location, consistent with hb.

We'll focus on SC and acquire/release operations, ignoring consume.

11 sequenced-before (sb)

sb: Describes intra-thread evaluation order and applies to operations on arbitrary types.

    static int x;
    static int y;

    /* The store to x is sequenced-before the store to y */
    int main(void)
    {
        x = 1;
        y = 2;
        return 0;
    }

Matches the single-threaded intuition already present in the language.

12 reads-from (rf)

rf: An operation reads a value written by another.
- Not strictly defined by the standard, but useful as a building block
- Can be applied to arbitrary types
- Can be applied between threads for atomic types
- No ordering implications on its own

    static atomic_int x;

    /* T1's load of x reads-from T0's store iff y == 1 */
    void t0(void) { x = 1; }
    void t1(void) { int y = x; ... }

13 synchronizes-with (sw)

sw: Applies only to operations on atomic types and is defined differently for each family of memory orderings. If A and B are atomic operations of the specified memory order, then:
- SC: A sw B if A rf B
- acquire/release: A sw B if A rf B and A is a release and B is an acquire

or

$sw = (W_{sc} \xrightarrow{rf} R_{sc}) \cup (W_{rel} \xrightarrow{rf} R_{acq})$

Relaxed atomics do not participate in sw!

14 happens-before (hb)

hb: An operation A happens-before B if A sb B or A sw B (consume adds complications). The relation is transitive, meaning that atomic variables can be used to stitch together thread-local code:

$hb = (sb \cup sw)^{+}$

A data race exists if a program contains two actions, at least one of which is a write and one of which is not atomic, in different threads on the same memory location, neither of which happens-before the other. A program exhibiting such a race has undefined behaviour (SC-DRF).
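The transitive closure can be computed mechanically. The sketch below is an illustration rather than anything from the slides: it assumes four events, c (non-atomic write of x), d (store-release of y), e (load-acquire of y) and f (non-atomic read of x), with c sb d in one thread and e sb f in another. It builds hb from sb and sw with Warshall's algorithm and asks whether c happens-before f. The function names and the event encoding are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical event set: c: x = 1 (non-atomic), d: store-release y = 1,
 * e: load-acquire of y, f: r = x (non-atomic). */
enum { EV_C, EV_D, EV_E, EV_F, NUM_EVENTS };

/* hb = (sb U sw)+ : take the union of the two relations, then close it
 * transitively (Warshall's algorithm). */
void happens_before(bool sb[NUM_EVENTS][NUM_EVENTS],
                    bool sw[NUM_EVENTS][NUM_EVENTS],
                    bool hb[NUM_EVENTS][NUM_EVENTS])
{
    for (int i = 0; i < NUM_EVENTS; i++)
        for (int j = 0; j < NUM_EVENTS; j++)
            hb[i][j] = sb[i][j] || sw[i][j];

    for (int k = 0; k < NUM_EVENTS; k++)
        for (int i = 0; i < NUM_EVENTS; i++)
            for (int j = 0; j < NUM_EVENTS; j++)
                if (hb[i][k] && hb[k][j])
                    hb[i][j] = true;
}

/* Does the write c happen-before the read f?  The sw edge d -> e exists
 * only if the acquire load e reads from the release store d. */
bool write_hb_read(bool e_reads_from_d)
{
    bool sb[NUM_EVENTS][NUM_EVENTS] = { { false } };
    bool sw[NUM_EVENTS][NUM_EVENTS] = { { false } };
    bool hb[NUM_EVENTS][NUM_EVENTS];

    sb[EV_C][EV_D] = true;      /* same thread, program order */
    sb[EV_E][EV_F] = true;      /* same thread, program order */
    if (e_reads_from_d)
        sw[EV_D][EV_E] = true;  /* release/acquire pair via rf */

    happens_before(sb, sw, hb);
    return hb[EV_C][EV_F];
}
```

With the sw edge present, `write_hb_read(true)` is true and the two accesses to x are ordered. Without it, `write_hb_read(false)` is false: a non-atomic write and read of x in different threads with no hb between them, which is exactly the data race defined above.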

15 modification-order (mo)

mo: The modification-order of an atomic variable indicates the sequence of visible side-effects (i.e. writes) to that variable. It is a single total order consistent with hb and is strictly per-location.

    static atomic_int x;

    /* MO of x is either {0, 1, 2} or {0, 2, 1} */
    void t0(void) { x = 1; }
    void t1(void) { x = 2; }

Additionally, there is a single total order sc on all SC operations that is consistent with hb and mo.
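A runnable version of this example might look as follows. It is a sketch that assumes POSIX threads are available (C11 `<threads.h>` would serve equally well); `last_in_modification_order` is a name invented here, and error handling is kept to a minimum.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x;

static void *t0(void *arg) { (void)arg; atomic_store(&x, 1); return NULL; }
static void *t1(void *arg) { (void)arg; atomic_store(&x, 2); return NULL; }

/* Run both writers and return the value that ended up last in the
 * modification order of x. Returns -1 if a thread cannot be created. */
int last_in_modification_order(void)
{
    pthread_t a, b;

    atomic_store(&x, 0);        /* first in mo: ordered before both threads */
    if (pthread_create(&a, NULL, t0, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, t1, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* Both stores happen-before this load, so it reads the final value. */
    return atomic_load(&x);
}
```

The result is 1 when the modification order was {0, 2, 1} and 2 when it was {0, 1, 2}. Which of the two occurs may differ from run to run, but it is never 0.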

16 Formal tools

Recall the SB example:

    int main() {
      atomic_int x=0;
      atomic_int y=0;
      {{{ { y.store(1,memory_order_seq_cst);
            r1=x.load(memory_order_seq_cst); }
      ||| { x.store(1,memory_order_seq_cst);
            r2=y.load(memory_order_seq_cst); }
      }}}
      return 0;
    }

We can feed this to a formal model and visualise the set of consistent executions.

17 CppMem

[Figure: CppMem execution graph for SB with SC atomics. Events: a:Wna x=0, b:Wna y=0, c:Wsc y=1, d:Rsc x=0, e:Wsc x=1, f:Rsc y=1. Edges are labelled sb, hb, rf, mo, sw and sc.]
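The same test can be written directly against `<stdatomic.h>`. The sketch below assumes POSIX threads and uses invented names (`sb_sc_count_both_zero`, `thread0`, `thread1`). It runs SB repeatedly with memory_order_seq_cst and counts how often both loads return 0, which the model says should never happen.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;   /* written by the threads, read after joining them */

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

/* Run SB `runs` times and count the executions with r1 == r2 == 0.
 * Returns -1 if a thread cannot be created. */
int sb_sc_count_both_zero(int runs)
{
    int count = 0;

    for (int i = 0; i < runs; i++) {
        pthread_t t0, t1;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t0, NULL, thread0, NULL) != 0)
            return -1;
        if (pthread_create(&t1, NULL, thread1, NULL) != 0) {
            pthread_join(t0, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r1 == 0 && r2 == 0)
            count++;
    }
    return count;
}
```

A count of zero is what the standard requires here. Testing of this kind can only fail to find a counterexample; it is the formal model that shows the outcome is forbidden.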

18 SB+acq+rel

We can modify SB to use acquire/release:

    int main() {
      atomic_int x=0;
      atomic_int y=0;
      {{{ { y.store(1,memory_order_release);
            r1=x.load(memory_order_acquire); }
      ||| { x.store(1,memory_order_release);
            r2=y.load(memory_order_acquire); }
      }}}
      return 0;
    }

r1 == r2 == 0 is now permitted.

19 Acquire/release

[Figure: CppMem execution graph for SB with acquire/release. Events: a:Wna x=0, b:Wna y=0, c:Wrel y=1, d:Racq x=0, e:Wrel x=1, f:Racq y=0. Edges are labelled sb, hb, rf, mo and sw.]

Acquire/release in C11 is not SC!

20 Message Passing (1)

Passing messages between a producer and a consumer thread is ideally suited to acquire/release:

MP

    int main() {
      int x=0;
      atomic_int y=0;
      {{{ { x=1;
            y.store(1,memory_order_release); }
      ||| { r1=y.load(memory_order_acquire).readsvalue(1);  /* If we read y == 1 */
            r2=x; }                                         /* Then we must read x == 1 */
      }}}
      return 0;
    }

y is an atomic flag indicating the validity of the data x.

21 Message Passing (2)

Only one consistent execution:

[Figure: CppMem execution graph for MP. Events: a:Wna x=0, b:Wna y=0, c:Wna x=1, d:Wrel y=1, e:Racq y=1, f:Rna x=1. Edges are labelled sb, hb, rf, mo and sw.]

Question: What happens if e reads y == 0 instead?

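22 Message Passing (3)

Data race on x between c and f!

[Figure: CppMem execution graph for MP when the flag is read as 0. Events: a:Wna x=0, b:Wna y=0, c:Wna x=1, d:Wrel y=1, e:Racq y=0, f:Rna x=0. Edges are labelled sb, hb, rf, mo and sw, with a dr (data race) edge between c and f.]

Intuition: don't read data if !valid.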
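Following that intuition, a race-free consumer waits until the flag is set before touching the data. The sketch below assumes POSIX threads; `data` and `valid` play the roles of x and y from the slides, and `pass_message`, `producer` and `consumer` are names invented for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;          /* plain, non-atomic payload (x in the slides) */
static atomic_int valid;  /* atomic flag guarding the payload (y) */

static void *producer(void *arg)
{
    data = *(int *)arg;                                   /* write the data */
    atomic_store_explicit(&valid, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Spin until the release store is observed; only then read the data. */
    while (!atomic_load_explicit(&valid, memory_order_acquire))
        ;
    *out = data;
    return NULL;
}

/* Pass `value` from a producer thread to a consumer thread and return
 * what the consumer received, or -1 if a thread cannot be created. */
int pass_message(int value)
{
    pthread_t p, c;
    int received = -1;

    data = 0;
    atomic_store(&valid, 0);
    if (pthread_create(&c, NULL, consumer, &received) != 0)
        return -1;
    if (pthread_create(&p, NULL, producer, &value) != 0) {
        atomic_store(&valid, 1);   /* release the spinning consumer */
        pthread_join(c, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return received;
}
```

Because the consumer reads `data` only after its acquire load has seen the producer's release store, the write to `data` happens-before the read and `pass_message(42)` returns 42. Reading `data` without checking `valid` first would reintroduce the data race shown in the slide.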

23 Acquire/release instructions

ARMv8 introduced native LDAR and STLR instructions:
- LDAR <Xt>, [<Xn>]: ordered against subsequent accesses in program order
- STLR <Xt>, [<Xn>]: ordered against prior accesses in program order and any prior observed writes

A variation on MP, memory initialised to zero:

    p0                      p1
    LDR  X0, [Xa, #4]       1: LDAR X0, [Xa]
    ADD  X0, X0, #1            CBZ  X0, 1b
    STR  X0, [Xa, #4]          LDR  X1, [Xa, #4]
    STLR #1, [Xa]

Result: X1 == 1

Looks an awful lot like hb!

24 SC acquire/release

The fun doesn't stop here:
- An STLR followed by an LDAR is globally observed in program order
- STLR is multi-copy atomic when observed by LDAR

A variation on SB, memory initialised to zero:

    p0                  p1
    STLR #1, [Xa]       STLR #1, [Xb]
    LDAR X0, [Xb]       LDAR X1, [Xa]

X0 == X1 == 0 forbidden.

Unlike C11 acquire/release, LDAR and STLR provide SC when paired and map directly onto memory_order_seq_cst.

             Rsc     Wsc     Racq    Wrel
    ARMv8    LDAR    STLR    LDAR    STLR

25 Conclusion

We've only scratched the surface of the C11 and ARM memory models:
- Compound atomic operations (cmpxchg)
- memory_order_consume
- Explicit fences

However, there is a deliberate mapping from ARMv8 to C11 SC and formal tools for the ARM model are under active development.

26 Thank You

[1] Hans-J. Boehm, "Threads Cannot be Implemented as a Library"
[2] K. Gharachorloo, "Shared Memory Consistency Models: A Tutorial"
[3] Paul McKenney et al., "N4215: Towards Implementation and Use of memory_order_consume"

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

27 Sequential Consistency (2)

[Figure: a program of operations A, B and C, shown running in parallel on p0 (A), p1 (B) and p2 (C) and as a single interleaving.]

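28 SC example

SC is easy to reason about, as there is a single global ordering. This can be demonstrated by the IRIW litmus test:

Initially: X = Y = 0

    T0          T1          T2              T3
    X = 1       Y = 1       X2 = X          Y3 = Y
                            Y2 = Y          X3 = X

- (X2, Y2) = (1, 0) with (X3, Y3) = (0, 1): not permitted, since T2 and T3 would disagree on the order of the two writes.
- (X2, Y2) = (0, 1) with (X3, Y3) = (1, 0): consistent with a single global order in which both initial reads precede both writes (X2 = X, Y3 = Y, X = 1, Y = 1, Y2 = Y, X3 = X).

SC actually forbids any reordering of reads and writes.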
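IRIW outcomes can be checked against SC by exhaustive search. The sketch below is an illustration with invented names (`iriw_reachable_under_sc`, `search`). It tries every total order of the six events that keeps T2's reads and T3's reads in program order, and reports whether a requested outcome is produced by any of them.

```c
#include <stdbool.h>

/* Events of IRIW: T0: X = 1;  T1: Y = 1;
 * T2: X2 = X; Y2 = Y;  T3: Y3 = Y; X3 = X; */
enum { W_X, W_Y, T2_RX, T2_RY, T3_RY, T3_RX, NUM_EVENTS };

static int position(const int order[NUM_EVENTS], int ev)
{
    for (int i = 0; i < NUM_EVENTS; i++)
        if (order[i] == ev)
            return i;
    return -1;
}

/* Does this total order respect program order and produce the outcome
 * want = { X2, Y2, X3, Y3 }? */
static bool matches(const int order[NUM_EVENTS], const int want[4])
{
    int X = 0, Y = 0, got[4] = { 0, 0, 0, 0 };

    if (position(order, T2_RX) > position(order, T2_RY) ||
        position(order, T3_RY) > position(order, T3_RX))
        return false;

    for (int i = 0; i < NUM_EVENTS; i++) {
        switch (order[i]) {
        case W_X:   X = 1;      break;
        case W_Y:   Y = 1;      break;
        case T2_RX: got[0] = X; break;
        case T2_RY: got[1] = Y; break;
        case T3_RX: got[2] = X; break;
        case T3_RY: got[3] = Y; break;
        }
    }
    for (int i = 0; i < 4; i++)
        if (got[i] != want[i])
            return false;
    return true;
}

/* Depth-first search over all permutations of the events. */
static bool search(int order[NUM_EVENTS], int k, const int want[4])
{
    if (k == NUM_EVENTS)
        return matches(order, want);

    for (int i = k; i < NUM_EVENTS; i++) {
        int tmp = order[k]; order[k] = order[i]; order[i] = tmp;
        bool found = search(order, k + 1, want);
        tmp = order[k]; order[k] = order[i]; order[i] = tmp;
        if (found)
            return true;
    }
    return false;
}

/* True if some SC interleaving yields (X2, Y2) and (X3, Y3) as given. */
bool iriw_reachable_under_sc(int x2, int y2, int x3, int y3)
{
    int order[NUM_EVENTS] = { W_X, W_Y, T2_RX, T2_RY, T3_RY, T3_RX };
    int want[4] = { x2, y2, x3, y3 };

    return search(order, 0, want);
}
```

`iriw_reachable_under_sc(1, 0, 0, 1)` returns false, matching the outcome in which the two readers disagree on the order of the writes. Any other combination of values can be queried the same way.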

29 The ARM weak memory model

The ARM architecture features a relatively weak memory model:
- No multi-copy atomicity requirement (unlike TSO)
- Arbitrary reordering of independent reads and writes
- Explicit barriers and instructions to enforce ordering
- System-level ordering (e.g. MMIO, cache/TLB maintenance, ...)

Like many (all?) other architectures, the memory model is not formally defined and is driven by pragmatism rather than pure mathematics.

30 Observability

Ordering is defined in terms of observability by memory masters (observers).

Writes: A write to a location in memory is said to be observed by an observer when:
(1) A subsequent read of the location by the same observer will return the value written by the observed write, or written by a write to that location by any observer that is sequenced in the coherence order of the location after the observed write, and
(2) A subsequent write of the location by the same observer will be sequenced in the coherence order of the location after the observed write.

This is actually pretty intuitive...

31 Observability (2)

...but reads are observable too!

Reads: A read of a location in memory is said to be observed by an observer when a subsequent write to the location by the same observer will have no effect on the value returned by the read.

These definitions clearly have relations with rf and mo.

32 Global Observability and Completion

- A normal memory access is globally observed for a shareability domain when it is observed by all observers in that domain.
- A table walk is complete for a shareability domain when its accesses are globally observed in that domain and the TLB is updated.
- An access is complete for a shareability domain when it is globally observed in that domain and any table walks associated with it have completed in the same domain.

Much more difficult to correlate with C11, which cares only about application-level ordering.

33 Explicit barriers

The ARM architecture defines three barrier instructions:
- ISB: pipeline flush and context synchronisation
- DMB <option>: ensure ordering of memory accesses
- DSB <option>: ensure completion of memory accesses

The <option> argument specifies the required shareability domain (NSH, ISH, OSH, SY) and access type (ST). Defaults to full system, all access types if omitted.

Userspace runs in the same inner-shareable domain.

34 Dependencies

In the absence of explicit barriers, dependencies define observation order of normal memory accesses.

- Address: value returned by a read is used to compute the address of a subsequent access.
- Control: value returned by a read is used to determine the condition flags and the flags are used in the condition code checking that determines the address of a subsequent access.
- Data: value returned by a read is used as data written by a subsequent write.

There are also a few other rules (RaR, store speculation).

35 Dependency Examples

(address)
    ldr r1, [r0, #4]
    and r1, #0xfff
    ldr r3, [r2, r1]

(control)
    ldr r1, [r0, #4]
    cmp r1, #1
    addeq r2, #4
    ldr r3, [r2]

(data)
    ldr r1, [r0, #4]
    add r1, #5
    str r1, [r2]

Question: Which dependencies enforce ordering of observability?

36 Mapping to C11

Typically, architectures provide either stronger (x86) or weaker (PowerPC) guarantees than those required by the C11 relaxed memory models.

[Table: Architecture against SC, Acq/rel and Relaxed for x86, ARMv7, PowerPC, ia64 and ARMv8. Only "=" entries survive: one each for ARMv7, PowerPC and ia64, two for ARMv8.]

Explicit fences are used to convert weaker guarantees into stronger ones.

             Rsc         Wsc              Racq        Wrel
    ARMv7    LDR; DMB    DMB; STR; DMB    LDR; DMB    DMB; STR
    ARMv8    LDAR        STLR             LDAR        STLR


More information

COMP Parallel Computing. CC-NUMA (2) Memory Consistency

COMP Parallel Computing. CC-NUMA (2) Memory Consistency COMP 633 - Parallel Computing Lecture 11 September 26, 2017 Memory Consistency Reading Patterson & Hennesey, Computer Architecture (2 nd Ed.) secn 8.6 a condensed treatment of consistency models Coherence

More information

Formal Specification of RISC-V Systems Instructions

Formal Specification of RISC-V Systems Instructions Formal Specification of RISC-V Systems Instructions Arvind Andy Wright, Sizhuo Zhang, Thomas Bourgeat, Murali Vijayaraghavan Computer Science and Artificial Intelligence Lab. MIT RISC-V Workshop, MIT,

More information

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today

More information

Beyond Sequential Consistency: Relaxed Memory Models

Beyond Sequential Consistency: Relaxed Memory Models 1 Beyond Sequential Consistency: Relaxed Memory Models Computer Science and Artificial Intelligence Lab M.I.T. Based on the material prepared by and Krste Asanovic 2 Beyond Sequential Consistency: Relaxed

More information

Other consistency models

Other consistency models Last time: Symmetric multiprocessing (SMP) Lecture 25: Synchronization primitives Computer Architecture and Systems Programming (252-0061-00) CPU 0 CPU 1 CPU 2 CPU 3 Timothy Roscoe Herbstsemester 2012

More information

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Symmetric Multiprocessors: Synchronization and Sequential Consistency Constructive Computer Architecture Symmetric Multiprocessors: Synchronization and Sequential Consistency Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November

More information

Using Relaxed Consistency Models

Using Relaxed Consistency Models Using Relaxed Consistency Models CS&G discuss relaxed consistency models from two standpoints. The system specification, which tells how a consistency model works and what guarantees of ordering it provides.

More information

Multiprocessor Synchronization

Multiprocessor Synchronization Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory

More information

Advanced OpenMP. Memory model, flush and atomics

Advanced OpenMP. Memory model, flush and atomics Advanced OpenMP Memory model, flush and atomics Why do we need a memory model? On modern computers code is rarely executed in the same order as it was specified in the source code. Compilers, processors

More information

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

SELECTED TOPICS IN COHERENCE AND CONSISTENCY SELECTED TOPICS IN COHERENCE AND CONSISTENCY Michel Dubois Ming-Hsieh Department of Electrical Engineering University of Southern California Los Angeles, CA90089-2562 dubois@usc.edu INTRODUCTION IN CHIP

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit

More information

Weak memory models. Mai Thuong Tran. PMA Group, University of Oslo, Norway. 31 Oct. 2014

Weak memory models. Mai Thuong Tran. PMA Group, University of Oslo, Norway. 31 Oct. 2014 Weak memory models Mai Thuong Tran PMA Group, University of Oslo, Norway 31 Oct. 2014 Overview 1 Introduction Hardware architectures Compiler optimizations Sequential consistency 2 Weak memory models TSO

More information

Audience. Revising the Java Thread/Memory Model. Java Thread Specification. Revising the Thread Spec. Proposed Changes. When s the JSR?

Audience. Revising the Java Thread/Memory Model. Java Thread Specification. Revising the Thread Spec. Proposed Changes. When s the JSR? Audience Revising the Java Thread/Memory Model See http://www.cs.umd.edu/~pugh/java/memorymodel for more information 1 This will be an advanced talk Helpful if you ve been aware of the discussion, have

More information

Overhauling SC atomics in C11 and OpenCL

Overhauling SC atomics in C11 and OpenCL Overhauling SC atomics in C11 and OpenCL Mark Batty, Alastair F. Donaldson and John Wickerson INCITS/ISO/IEC 9899-2011[2012] (ISO/IEC 9899-2011, IDT) Provisional Specification Information technology Programming

More information

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

More information

Relaxed Memory-Consistency Models

Relaxed Memory-Consistency Models Relaxed Memory-Consistency Models [ 9.1] In small multiprocessors, sequential consistency can be implemented relatively easily. However, this is not true for large multiprocessors. Why? This is not the

More information

The Java Memory Model

The Java Memory Model The Java Memory Model The meaning of concurrency in Java Bartosz Milewski Plan of the talk Motivating example Sequential consistency Data races The DRF guarantee Causality Out-of-thin-air guarantee Implementation

More information

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University 740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess

More information

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L17: Memory Model Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements HW4 / Lab4 1 Overview Symmetric Multi-Processors (SMPs) MIMD processing cores

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

Repairing Sequential Consistency in C/C++11

Repairing Sequential Consistency in C/C++11 Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org Jeehoon Kang Seoul National University, Korea jeehoon.kang@sf.snu.ac.kr

More information

G52CON: Concepts of Concurrency

G52CON: Concepts of Concurrency G52CON: Concepts of Concurrency Lecture 6: Algorithms for Mutual Natasha Alechina School of Computer Science nza@cs.nott.ac.uk Outline of this lecture mutual exclusion with standard instructions example:

More information

Hardware Memory Models: x86-tso

Hardware Memory Models: x86-tso Hardware Memory Models: x86-tso John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 9 20 September 2016 Agenda So far hardware organization multithreading

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency CS 152 Computer Architecture and Engineering Lecture 19: Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA TriCheck: Verification at the Trisection of Software, Hardware, and ISA Caroline Trippel, Yatin A. Manerkar, Daniel Lustig*, Michael Pellauer*, Margaret Martonosi Princeton University *NVIDIA ASPLOS 2017

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Sequential Consistency for Heterogeneous-Race-Free

Sequential Consistency for Heterogeneous-Race-Free Sequential Consistency for Heterogeneous-Race-Free DEREK R. HOWER, BRADFORD M. BECKMANN, BENEDICT R. GASTER, BLAKE A. HECHTMAN, MARK D. HILL, STEVEN K. REINHARDT, DAVID A. WOOD JUNE 12, 2013 EXECUTIVE

More information

Repairing Sequential Consistency in C/C++11

Repairing Sequential Consistency in C/C++11 Technical Report MPI-SWS-2016-011, November 2016 Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org Jeehoon

More information

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important

Page 1. Outline. Coherence vs. Consistency. Why Consistency is Important Outline ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Memory Consistency Models Copyright 2006 Daniel J. Sorin Duke University Slides are derived from work by Sarita

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252

More information

The C++ Memory Model. Rainer Grimm Training, Coaching and Technology Consulting

The C++ Memory Model. Rainer Grimm Training, Coaching and Technology Consulting The C++ Memory Model Rainer Grimm Training, Coaching and Technology Consulting www.grimm-jaud.de Multithreading with C++ C++'s answers to the requirements of the multicore architectures. A well defined

More information

Topic C Memory Models

Topic C Memory Models Memory Memory Non- Topic C Memory CPEG852 Spring 2014 Guang R. Gao CPEG 852 Memory Advance 1 / 29 Memory 1 Memory Memory Non- 2 Non- CPEG 852 Memory Advance 2 / 29 Memory Memory Memory Non- Introduction:

More information

SharedArrayBuffer and Atomics Stage 2.95 to Stage 3

SharedArrayBuffer and Atomics Stage 2.95 to Stage 3 SharedArrayBuffer and Atomics Stage 2.95 to Stage 3 Shu-yu Guo Lars Hansen Mozilla November 30, 2016 What We Have Consensus On TC39 agreed on Stage 2.95, July 2016 Agents API (frozen) What We Have Consensus

More information

Language- Level Memory Models

Language- Level Memory Models Language- Level Memory Models A Bit of History Here is a new JMM [5]! 2000 Meyers & Alexandrescu DCL is not portable in C++ [3]. Manson et. al New shiny C++ memory model 2004 2008 2012 2002 2006 2010 2014

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Example: The Dekker Algorithm on SMP Systems. Memory Consistency The Dekker Algorithm 43 / 54

Example: The Dekker Algorithm on SMP Systems. Memory Consistency The Dekker Algorithm 43 / 54 Example: The Dekker Algorithm on SMP Systems Memory Consistency The Dekker Algorithm 43 / 54 Using Memory Barriers: the Dekker Algorithm Mutual exclusion of two processes with busy waiting. //flag[] is

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information