Implementing the C11 memory model for ARM processors. Will Deacon February 2015
2 Introduction
- ARM ships intellectual property, specialising in RISC microprocessors
- Over 50 billion chips shipped, around 2.5 billion per quarter
- Upstream kernel developer at ARM, Cambridge (UK!)
- Enable new architectural features in Linux before silicon
- Influence future hardware designs with feedback and prototypes
I'm going to talk about memory models, which form a crucial part of low-level system architecture and are needed to ensure portability of high-level multi-threaded user code.
3 What is memory ordering? (1)
We expect a single CPU, executing a single thread of execution, to operate in program order.
- Easy to reason about but terribly slow!
- Prohibits common compiler transformations (e.g. hoisting)
- Forbids common hardware optimisations (e.g. store buffers and caches)
- Increases memory subsystem bottleneck
Instead, allow the program to run out-of-order as long as the programmer can't tell.
4 What is memory ordering? (2)
We can't have our cake and eat it. With multiple CPUs, we can observe many of the tricks being played on us!
SB (Dekker's) - Initially: A = B = 0
    p0              p1
    a: A = 1;       c: B = 1;
    b: C = B;       d: D = A;
Results:
    (C, D) == (1, 1)
    (C, D) == (0, 1)
    (C, D) == (1, 0)
    (C, D) == (0, 0)
Question: What can cause this apparent reordering in practice?
5 Store buffering
Buffering allows a variable to be live in multiple locations at once.
The memory model defines the set of permitted behaviours that may be observed in a system. Interesting cases are expressed as litmus tests.
6 Sequential Consistency
Sequential consistency is easy to reason about, as there is a single global ordering.
Sequential Consistency (SC): "A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." Leslie Lamport (1979)
Question: How does this constrain SB?
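7 SB+SC
No valid interleaving for (C,D) == (0,0), therefore forbidden by SC.
    p0              p1
    a: A = 1;       c: B = 1;
    b: C = B;       d: D = A;
Interleaving : Results (C,D)
    (a,b,c,d) : (0,1)
    (c,d,a,b) : (1,0)
    (a,c,b,d) : (1,1)
    ...
The outcome (0,0) needs b before c and d before a, which breaks program order in at least one thread, so none of these orders is a valid interleaving:
    (b,c,d,a) : (0,0)
    (b,d,c,a) : (0,0)
    (b,d,a,c) : (0,0)
    (d,b,c,a) : (0,0)
    (d,b,a,c) : (0,0)
    (d,a,b,c) : (0,0)
The orders (b,a,d,c) and (d,c,b,a) also break program order, but yield (0,1) and (1,0) rather than (0,0).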
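The argument on this slide can be checked mechanically. The following is a minimal sketch, not part of the original talk: it enumerates every ordering of the four SB operations, keeps only those that respect program order (a before b, c before d), and counts how often a given (C, D) result occurs. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Execute one ordering of the SB operations; `order` is a string over a, b, c, d. */
static void run_order(const char *order, int *C, int *D)
{
    int A = 0, B = 0;

    assert(strlen(order) == 4);
    *C = 0;
    *D = 0;
    for (int i = 0; i < 4; i++) {
        switch (order[i]) {
        case 'a': A = 1;  break;   /* p0: A = 1 */
        case 'b': *C = B; break;   /* p0: C = B */
        case 'c': B = 1;  break;   /* p1: B = 1 */
        case 'd': *D = A; break;   /* p1: D = A */
        }
    }
}

/* An interleaving is valid under SC only if each thread keeps its program order. */
static bool respects_program_order(const char *order)
{
    return strchr(order, 'a') < strchr(order, 'b') &&
           strchr(order, 'c') < strchr(order, 'd');
}

/* Count the valid interleavings whose result is (C, D) == (want_c, want_d). */
int count_sc_outcomes(int want_c, int want_d)
{
    static const char labels[] = "abcd";
    int count = 0;

    for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
    for (int k = 0; k < 4; k++)
    for (int l = 0; l < 4; l++) {
        /* Skip index tuples that are not permutations. */
        if (i == j || i == k || i == l || j == k || j == l || k == l)
            continue;
        char order[5] = { labels[i], labels[j], labels[k], labels[l], '\0' };
        if (!respects_program_order(order))
            continue;
        int C, D;
        run_order(order, &C, &D);
        if (C == want_c && D == want_d)
            count++;
    }
    return count;
}
```

Calling count_sc_outcomes(0, 0) returns 0, matching the claim that SC forbids (0,0); the six valid interleavings split into one (0,1), one (1,0) and four (1,1).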
8 C11
"Threads cannot be implemented as a library." Hans Boehm [1]
The C11 standard introduced native threads and atomic operations:
- Atomic types (atomic_int) defined in <stdatomic.h>
- Atomic operations such as atomic_compare_exchange_strong, atomic_load
- Formal relations to describe ordering and data races
This necessitates a memory model, the default behaviour being SC-DRF.
9 memory_order_*
Maintaining SC is expensive! Atomic operations are parameterised by enum memory_order for finer-grained control:
- memory_order_seq_cst: sequential consistency (default)
- memory_order_acquire, memory_order_release, memory_order_acq_rel: RCpc [2] LOAD-acquire/STORE-release semantics
- memory_order_consume: data dependent acquisition [3]
- memory_order_relaxed: no inter-thread synchronisation
This provides a portable mechanism to expose weakly-ordered hardware to applications for optimal performance.
    atomic_int foo;
    ...
    return atomic_load_explicit(&foo, memory_order_acquire);
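As an illustration of the explicit-order API, here is a minimal sketch of an event counter that only needs atomicity and therefore uses memory_order_relaxed. The counter and function names are hypothetical and not taken from the talk; the last function shows that the non-explicit form defaults to memory_order_seq_cst.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int hits;   /* hypothetical event counter, zero-initialised */

/* Relaxed increment: atomic, but implies no inter-thread synchronisation. */
void record_hit(void)
{
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Relaxed read of the counter. */
int read_hits(void)
{
    int n = atomic_load_explicit(&hits, memory_order_relaxed);

    assert(n >= 0);   /* the counter is only ever incremented in this sketch */
    return n;
}

/* atomic_load() without an explicit order is a memory_order_seq_cst load. */
int read_hits_sc(void)
{
    return atomic_load(&hits);
}
```

Relaxed ordering is adequate here only because no other data is published through the counter; a flag that guards other data would need release/acquire instead.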
10 Relations
The C11 memory model can be described by a series of relations:
- sequenced-before (sb)
- reads-from (rf)
- synchronizes-with (sw)
- happens-before (hb)
- and some complications thanks to consume (cad, dob, ithb)
All writes to an atomic variable form part of a total modification order (mo) for that location, consistent with hb.
We'll focus on SC and acquire/release operations, ignoring consume.
11 sequenced-before (sb)
Describes intra-thread evaluation order and applies to operations on arbitrary types.
    static int x;
    static int y;

    /* The store to x is sequenced-before the store to y */
    int main(void)
    {
        x = 1;
        y = 2;
        return 0;
    }
Matches the single-threaded intuition already present in the language.
12 reads-from (rf)
An operation reads a value written by another.
- Not strictly defined by the standard, but useful as a building block
- Can be applied to arbitrary types
- Can be applied between threads for atomic types
- No ordering implications on its own
    static atomic_int x;

    /* T1's load of x reads-from T0's store iff y == 1 */
    void t0(void) { x = 1; }
    void t1(void) { int y = x; ... }
13 synchronizes-with (sw)
Applies only to operations on atomic types and is defined differently for each family of memory orderings. If A and B are atomic operations of the specified memory order, then:
- SC: A sw B if A rf B
- acquire/release: A sw B if A rf B and A is a release and B is an acquire.
or
$$sw = (W_{sc} \xrightarrow{rf} R_{sc}) \cup (W_{rel} \xrightarrow{rf} R_{acq})$$
Relaxed atomics do not participate in sw!
14 happens-before (hb)
An operation A happens-before B if A sb B or A sw B (consume adds complications). The relation is transitive, meaning that atomic variables can be used to stitch together thread-local code:
$$hb = (sb \cup sw)^{+}$$
A data race exists if a program contains two actions, at least one of which is a write and one of which is not atomic, in different threads on the same memory location, neither of which happens-before the other. A program exhibiting such a race has undefined behaviour (SC-DRF).
15 modification-order (mo)
The modification-order of an atomic variable indicates the sequence of visible side-effects (i.e. writes) to that variable. It is a single total order consistent with hb and is strictly per-location.
    static atomic_int x;

    /* MO of x is either {0, 1, 2} or {0, 2, 1} */
    void t0(void) { x = 1; }
    void t1(void) { x = 2; }
Additionally, there is a single total order sc on all SC operations that is consistent with hb and mo.
16 Formal tools
Recall the SB example:
    int main() {
      atomic_int x=0; atomic_int y=0;
      {{{ { y.store(1,memory_order_seq_cst);
            r1=x.load(memory_order_seq_cst); }
      ||| { x.store(1,memory_order_seq_cst);
            r2=y.load(memory_order_seq_cst); }
      }}}
      return 0;
    }
We can feed this to a formal model and visualise the set of consistent executions.
17 CppMem
[Figure: CppMem execution graph for SB with SC atomics. Nodes: a:Wna x=0, b:Wna y=0, c:Wsc y=1, d:Rsc x=0, e:Wsc x=1, f:Rsc y=1. Edges are labelled sb, hb, rf, mo, sw and sc.]
18 SB+acq+rel
We can modify SB to use acquire/release:
    int main() {
      atomic_int x=0; atomic_int y=0;
      {{{ { y.store(1,memory_order_release);
            r1=x.load(memory_order_acquire); }
      ||| { x.store(1,memory_order_release);
            r2=y.load(memory_order_acquire); }
      }}}
      return 0;
    }
r1 == r2 == 0 is now permitted.
19 Acquire/release
[Figure: CppMem execution graph for SB with acquire/release. Nodes: a:Wna x=0, b:Wna y=0, c:Wrel y=1, d:Racq x=0, e:Wrel x=1, f:Racq y=0. Edges are labelled sb, hb, rf, mo and sw.]
Acquire/release in C11 is not SC!
20 Message Passing (1)
Passing messages between a producer and a consumer thread is ideally suited to acquire/release:
MP
    int main() {
      int x=0; atomic_int y=0;
      {{{ { x=1;
            y.store(1,memory_order_release); }
      ||| { r1=y.load(memory_order_acquire).readsvalue(1); /* If we read y == 1 */
            r2=x; }                                        /* Then we must read x == 1 */
      }}}
      return 0;
    }
y is an atomic flag indicating the validity of the data x.
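The same pattern can be written as ordinary C11. This is a minimal sketch under some assumptions: it uses POSIX threads rather than CppMem's parallel-composition syntax, the names data, ready and run_message_passing are hypothetical, and the consumer spins until it reads 1 instead of constraining the read with readsvalue(1).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int data;           /* plain payload, x on the slide */
static atomic_int ready;   /* atomic validity flag, y on the slide */

static void *producer(void *arg)
{
    (void)arg;
    data = 1;                                                 /* non-atomic write */
    atomic_store_explicit(&ready, 1, memory_order_release);   /* publish */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Spin until the acquire load reads-from the producer's release store. */
    while (atomic_load_explicit(&ready, memory_order_acquire) != 1)
        ;
    *out = data;   /* happens-after the producer's write to data */
    return NULL;
}

/* Run one producer/consumer pair and return the value the consumer read. */
int run_message_passing(void)
{
    pthread_t p, c;
    int r2 = -1;
    int err;

    data = 0;
    atomic_store(&ready, 0);
    err = pthread_create(&c, NULL, consumer, &r2);
    assert(err == 0);
    err = pthread_create(&p, NULL, producer, NULL);
    assert(err == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return r2;
}
```

Because the consumer only touches data after observing ready == 1, the read is ordered after the write and there is no data race; on some toolchains the program needs to be built with -pthread.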
21 Message Passing (2)
Only one consistent execution:
[Figure: CppMem execution graph. Nodes: a:Wna x=0, b:Wna y=0, c:Wna x=1, d:Wrel y=1, e:Racq y=1, f:Rna x=1. Edges are labelled sb, hb, rf, mo and sw.]
Question: What happens if e reads y == 0 instead?
22 Message Passing (3)
Data race on x between c and f!
[Figure: CppMem execution graph. Nodes: a:Wna x=0, b:Wna y=0, c:Wna x=1, d:Wrel y=1, e:Racq y=0, f:Rna x=0. Edges are labelled sb, hb, rf, mo and sw, with a data-race (dr) edge between c and f.]
Intuition: don't read data if !valid.
23 Acquire/release instructions
ARMv8 introduced native LDAR and STLR instructions:
- LDAR <Xt>, [<Xn>]: ordered against subsequent accesses in program order
- STLR <Xt>, [<Xn>]: ordered against prior accesses in program order and any prior observed writes
A variation on MP, memory initialised to zero:
    LDR  X0, [Xa, #4]        1: LDAR X0, [Xa]
    ADD  X0, X0, #1             CBZ  X0, 1b
    STR  X0, [Xa, #4]           LDR  X1, [Xa, #4]
    STLR #1, [Xa]
Result: X1 == 1
Looks an awful lot like hb!
24 SC acquire/release
The fun doesn't stop here:
- STLR -> LDAR globally observed in program order
- STLR is multi-copy atomic when observed by LDAR
A variation on SB, memory initialised to zero:
    STLR #1, [Xa]        STLR #1, [Xb]
    LDAR X0, [Xb]        LDAR X1, [Xa]
X0 == X1 == 0 forbidden
Unlike C11 acquire/release, LDAR and STLR provide SC when paired and map directly onto memory_order_seq_cst.
            Rsc     Wsc     Racq    Wrel
    ARMv8   LDAR    STLR    LDAR    STLR
25 Conclusion
We've only scratched the surface of the C11 and ARM memory models:
- Compound atomic operations (cmpxchg)
- memory_order_consume
- Explicit fences
However, there is a deliberate mapping from ARMv8 to C11 SC and formal tools for the ARM model are under active development.
26 Thank You
[1] Hans-J. Boehm, Threads Cannot be Implemented as a Library
[2] K. Gharachorloo, Shared Memory Consistency Models: A Tutorial
[3] Paul McKenney et al., N4215: Towards Implementation and Use of memory_order_consume
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
27 Sequential Consistency (2)
[Figure: panels labelled "Program", "Parallel" and "Interleaving", showing operations A, B and C and processors p0, p1 and p2.]
28 SC example
SC is easy to reason about, as there is a single global ordering. This can be demonstrated by the IRIW litmus test:
X = Y = 0
    T0          T1          T2            T3
    X = 1       Y = 1       X2 = X        Y3 = Y
                            Y2 = Y        X3 = X
(X2, Y2) = (1, 0) with (X3, Y3) = (0, 1): not permitted, since T2 and T3 would disagree on the order of the two writes.
(X2, Y2) = (0, 1) with (X3, Y3) = (1, 0): permitted with the read order shown, since both reads of 0 can precede both writes.
SC actually forbids any reordering of reads and writes.
29 The ARM weak memory model
The ARM architecture features a relatively weak memory model:
- No multi-copy atomicity requirement (unlike TSO)
- Arbitrary reordering of independent reads and writes
- Explicit barriers and instructions to enforce ordering
- System-level ordering (e.g. MMIO, Cache/TLB maintenance, ...)
Like many (all?) other architectures, the memory model is not formally defined and is driven by pragmatism rather than pure mathematics.
30 Observability
Ordering is defined in terms of observability by memory masters (observers).
Writes: A write to a location in memory is said to be observed by an observer when:
(1) A subsequent read of the location by the same observer will return the value written by the observed write, or written by a write to that location by any observer that is sequenced in the coherence order of the location after the observed write, and
(2) A subsequent write of the location by the same observer will be sequenced in the coherence order of the location after the observed write.
This is actually pretty intuitive...
31 Observability (2)
...but reads are observable too!
Reads: A read of a location in memory is said to be observed by an observer when a subsequent write to the location by the same observer will have no effect on the value returned by the read.
These definitions clearly have relations with rf and mo.
32 Global Observability and Completion
- A normal memory access is globally observed for a shareability domain when it is observed by all observers in that domain.
- A table walk is complete for a shareability domain when its accesses are globally observed in that domain and the TLB is updated.
- An access is complete for a shareability domain when it is globally observed in that domain and any table walks associated with it have completed in the same domain.
Much more difficult to correlate with C11, which cares only about application-level ordering.
33 Explicit barriers
The ARM architecture defines three barrier instructions:
- ISB: Pipeline flush and context synchronisation
- DMB <option>: Ensure ordering of memory accesses
- DSB <option>: Ensure completion of memory accesses
The <option> argument specifies the required shareability domain (NSH, ISH, OSH, SY) and access type (ST). Defaults to full system, all access types if omitted. Userspace runs in the same inner-shareable domain.
34 Dependencies
In the absence of explicit barriers, dependencies define observation order of normal memory accesses.
- Address: value returned by a read is used to compute the address of a subsequent access.
- Control: value returned by a read is used to determine the condition flags and the flags are used in the condition code checking that determines the address of a subsequent access.
- Data: value returned by a read is used as data written by a subsequent write.
There are also a few other rules (RaR, store speculation).
35 Dependency Examples
(address)
    ldr r1, [r0, #4]
    and r1, #0xfff
    ldr r3, [r2, r1]
(control)
    ldr r1, [r0, #4]
    cmp r1, #1
    addeq r2, #4
    ldr r3, [r2]
(data)
    ldr r1, [r0, #4]
    add r1, #5
    str r1, [r2]
Question: Which dependencies enforce ordering of observability?
36 Mapping to C11
Typically, architectures provide either stronger (x86) or weaker (PowerPC) guarantees than those required by the C11 relaxed memory models:
[Table: columns Architecture, SC, Acq/rel, Relaxed; rows x86, ARMv7, PowerPC, ia64, ARMv8. Only "=" entries survive for ARMv7, PowerPC and ia64 (one each) and ARMv8 (two); the remaining cell contents are missing.]
Explicit fences are used to convert weaker guarantees into the required ones.
            Rsc         Wsc              Racq        Wrel
    ARMv7   LDR; DMB    DMB; STR; DMB    LDR; DMB    DMB; STR
    ARMv8   LDAR        STLR             LDAR        STLR
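The instruction mapping can be read alongside the C11 operations it implements. The following is a minimal sketch: the variable flag and the function names are hypothetical, and the instruction sequences in the comments are copied from the mapping table on this slide rather than from any particular compiler's output.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int flag;   /* hypothetical shared variable */

/* Racq: LDAR on ARMv8; LDR; DMB on ARMv7. */
int load_acquire(void)
{
    return atomic_load_explicit(&flag, memory_order_acquire);
}

/* Wrel: STLR on ARMv8; DMB; STR on ARMv7. */
void store_release(int v)
{
    atomic_store_explicit(&flag, v, memory_order_release);
}

/* Rsc: LDAR on ARMv8; LDR; DMB on ARMv7. */
int load_sc(void)
{
    return atomic_load_explicit(&flag, memory_order_seq_cst);
}

/* Wsc: STLR on ARMv8; DMB; STR; DMB on ARMv7. */
void store_sc(int v)
{
    atomic_store_explicit(&flag, v, memory_order_seq_cst);
}

/* Single-threaded sanity check that each accessor round-trips a value. */
void mapping_self_check(void)
{
    store_release(1);
    assert(load_acquire() == 1);
    store_sc(2);
    assert(load_sc() == 2);
}
```

On ARMv8 the acquire/release and SC variants share the same instructions in this table, which is why the SC accessors need no extra barrier there, while ARMv7 needs a trailing DMB on the SC store.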
Overhauling SC atomics in C11 and OpenCL Mark Batty, Alastair F. Donaldson and John Wickerson INCITS/ISO/IEC 9899-2011[2012] (ISO/IEC 9899-2011, IDT) Provisional Specification Information technology Programming
More informationDistributed Systems. Distributed Shared Memory. Paul Krzyzanowski
Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In small multiprocessors, sequential consistency can be implemented relatively easily. However, this is not true for large multiprocessors. Why? This is not the
More informationThe Java Memory Model
The Java Memory Model The meaning of concurrency in Java Bartosz Milewski Plan of the talk Motivating example Sequential consistency Data races The DRF guarantee Causality Out-of-thin-air guarantee Implementation
More information740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University
740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess
More informationAnnouncements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory
ECE4750/CS4420 Computer Architecture L17: Memory Model Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements HW4 / Lab4 1 Overview Symmetric Multi-Processors (SMPs) MIMD processing cores
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationRepairing Sequential Consistency in C/C++11
Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org Jeehoon Kang Seoul National University, Korea jeehoon.kang@sf.snu.ac.kr
More informationG52CON: Concepts of Concurrency
G52CON: Concepts of Concurrency Lecture 6: Algorithms for Mutual Natasha Alechina School of Computer Science nza@cs.nott.ac.uk Outline of this lecture mutual exclusion with standard instructions example:
More informationHardware Memory Models: x86-tso
Hardware Memory Models: x86-tso John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 9 20 September 2016 Agenda So far hardware organization multithreading
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency
CS 152 Computer Architecture and Engineering Lecture 19: Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More informationTriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
TriCheck: Verification at the Trisection of Software, Hardware, and ISA Caroline Trippel, Yatin A. Manerkar, Daniel Lustig*, Michael Pellauer*, Margaret Martonosi Princeton University *NVIDIA ASPLOS 2017
More informationLecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,
More informationSequential Consistency for Heterogeneous-Race-Free
Sequential Consistency for Heterogeneous-Race-Free DEREK R. HOWER, BRADFORD M. BECKMANN, BENEDICT R. GASTER, BLAKE A. HECHTMAN, MARK D. HILL, STEVEN K. REINHARDT, DAVID A. WOOD JUNE 12, 2013 EXECUTIVE
More informationRepairing Sequential Consistency in C/C++11
Technical Report MPI-SWS-2016-011, November 2016 Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org Jeehoon
More informationPage 1. Outline. Coherence vs. Consistency. Why Consistency is Important
Outline ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Memory Consistency Models Copyright 2006 Daniel J. Sorin Duke University Slides are derived from work by Sarita
More informationROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define
More informationCS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II
CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252
More informationThe C++ Memory Model. Rainer Grimm Training, Coaching and Technology Consulting
The C++ Memory Model Rainer Grimm Training, Coaching and Technology Consulting www.grimm-jaud.de Multithreading with C++ C++'s answers to the requirements of the multicore architectures. A well defined
More informationTopic C Memory Models
Memory Memory Non- Topic C Memory CPEG852 Spring 2014 Guang R. Gao CPEG 852 Memory Advance 1 / 29 Memory 1 Memory Memory Non- 2 Non- CPEG 852 Memory Advance 2 / 29 Memory Memory Memory Non- Introduction:
More informationSharedArrayBuffer and Atomics Stage 2.95 to Stage 3
SharedArrayBuffer and Atomics Stage 2.95 to Stage 3 Shu-yu Guo Lars Hansen Mozilla November 30, 2016 What We Have Consensus On TC39 agreed on Stage 2.95, July 2016 Agents API (frozen) What We Have Consensus
More informationLanguage- Level Memory Models
Language- Level Memory Models A Bit of History Here is a new JMM [5]! 2000 Meyers & Alexandrescu DCL is not portable in C++ [3]. Manson et. al New shiny C++ memory model 2004 2008 2012 2002 2006 2010 2014
More informationLecture: Consistency Models, TM
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency
More informationExample: The Dekker Algorithm on SMP Systems. Memory Consistency The Dekker Algorithm 43 / 54
Example: The Dekker Algorithm on SMP Systems Memory Consistency The Dekker Algorithm 43 / 54 Using Memory Barriers: the Dekker Algorithm Mutual exclusion of two processes with busy waiting. //flag[] is
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More information