Multicore Programming: C++0x


p. 1 Multicore Programming: C++0x
Mark Batty, University of Cambridge
In collaboration with Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber
November 2010

p. 2 C++0x: the next C++
- Specified by the C++ Standards Committee
- Defined in The Standard, a 1300-page prose document
- The design is a detailed compromise:
    - performance, optimisations and hardware
    - usability
    - compatibility with the next C, C1X
    - legacy code

p. 3 C++0x: the next C++
Our mathematical model is faithful to the intent of, and has influenced, The Standard.
The model:
- syntactically separates out expert features
- has a weak memory
- defines a happens-before relation
- requires non-atomic reads and writes to be data-race free (DRF)
- provides atomic reads and writes for racy programs

p. 4 The syntactic divide: an example of the syntax
    // for regular programmers:
    atomic_int x = 0;
    x.store(1);
    y = x.load();
    // for experts:
    x.store(2, memory_order);
    y = x.load(memory_order);
    atomic_thread_fence(memory_order);
With a choice of memory order:
    mo_seq_cst, mo_release, mo_acquire, mo_acq_rel, mo_consume, mo_relaxed
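As a rough illustration of the two styles on this slide, the sketch below uses the spellings of the finalised C++11 library, where the slide's mo_* abbreviations are written std::memory_order_*. The function name syntax_demo is hypothetical and only serves to make the fragment compilable.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the slides): exercises both API styles.
int syntax_demo() {
    std::atomic<int> x{0};

    // For regular programmers: the memory order defaults to seq_cst.
    x.store(1);
    int y = x.load();
    assert(y == 1);

    // For experts: the memory order is passed explicitly.
    x.store(2, std::memory_order_release);
    int z = x.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    return y * 10 + z;  // single-threaded, so this is 12
}
```

Calling syntax_demo() from a main function is enough to try it; there is no concurrency here, so the choice of memory order does not change the result.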

p. 5 A model of two parts
An operational semantics:
- Processes programs, identifying memory actions
- Constructs candidate executions, E_opsem
An axiomatic memory model:
- Judges E_opsem paired with a memory ordering, X_witness
- Searches the consistent executions for races and unconstrained reads

p. 6 Judgement of the axiomatic model
    cpp_memory_model opsem (p : program) =
      let pre_executions =
        {(E_opsem, X_witness). opsem p E_opsem ∧
                               consistent_execution (E_opsem, X_witness)} in
      if ∃X ∈ pre_executions.
           (indeterminate_reads X ≠ {}) ∨
           (unsequenced_races X ≠ {}) ∨
           (data_races X ≠ {})
      then NONE
      else SOME pre_executions

p. 7 The relations of a pre-execution
An E_opsem part containing:
- sequenced-before: program order
- asw: additional synchronizes-with, inter-thread ordering
- dd: data dependence
An X_witness part containing:
- rf: relates a write to any reads that take its value
- sc: a total order over mo_seq_cst and mutex actions
- mo: modification order, a per-location total order of writes

p. 8 A single threaded program
    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }
[Figure: execution of ../examples/t1.c. Actions: a:W_na x=2, b:W_na y=0, c:R_na x=2, d:R_na x=2, e:W_na y=1. Two rf edges, from the write a to the reads c and d.]

p. 9 Memory actions
    action ::= a:R_na x=v          non-atomic read
             | a:W_na x=v          non-atomic write
             | a:R_mo x=v          atomic read
             | a:W_mo x=v          atomic write
             | a:RMW_mo x=v1/v2    atomic read-modify-write
             | a:L x               lock
             | a:U x               unlock
             | a:F_mo              fence

p. 10 Memory orders
Memory orders are shown as follows:
    mo ::= SC     memory_order_seq_cst
         | RLX    memory_order_relaxed
         | REL    memory_order_release
         | ACQ    memory_order_acquire
         | CON    memory_order_consume
         | A/R    memory_order_acq_rel

p. 11 Location kinds
    location_kind = MUTEX | NON_ATOMIC | ATOMIC

    actions_respect_location_kinds =
      ∀a. case location a of
            SOME l → (case location_kind l of
                        MUTEX      → is_lock_or_unlock a
                      | NON_ATOMIC → is_load_or_store a
                      | ATOMIC     → is_load_or_store a ∨ is_atomic_action a)
          | NONE → T

p. 12 That single threaded program again
    int main() {
      int x = 2;
      int y = 0;
      y = (x == x);
      return 0;
    }
[Figure: execution of ../examples/t1.c. Actions: a:W_na x=2, b:W_na y=0, c:R_na x=2, d:R_na x=2, e:W_na y=1. Two rf edges, from the write a to the reads c and d.]

p. 13 Unsequenced race
    unsequenced_races =
      {(a, b). is_load_or_store a ∧ is_load_or_store b ∧
               (a ≠ b) ∧ same_location a b ∧
               (is_write a ∨ is_write b) ∧
               same_thread a b ∧
               ¬(a sequenced-before b ∨ b sequenced-before a)}

p. 14 An unsequenced race
    int main() {
      int x = 2;
      int y = 0;
      y = (x == (x=3));
      return 0;
    }
[Figure: execution. Actions: a:W_na x=2, b:W_na y=0, c:W_na x=3, d:R_na x=2, e:W_na y=0. An rf edge from a to d, and a ur (unsequenced race) edge between d and c.]

p. 15 A multi-threaded program
    void foo(int* p) {*p=3;}
    int main() {
      int x = 2;
      int y;
      thread t1(foo, &x);
      y = 3;
      t1.join();
      return 0;
    }
becomes:
    int main() {
      int x = 2;
      int y;
      {{{ x = 3; ||| y = 3; }}}
      return 0;
    }
[Figure: execution of ../examples/t3-parallel.c. Actions: a:W_na x=2, b:W_na x=3, c:W_na y=3. Two asw edges, from a to b and from a to c.]
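The first form of this program can be made compilable with the standard <thread> header, roughly as follows. This is a sketch, assuming the slide's thread is std::thread; the wrapper name run_parallel_writes is hypothetical, and it returns x + y so the effect of both writes can be checked after the join.

```cpp
#include <cassert>
#include <thread>

void foo(int* p) { *p = 3; }

// Hypothetical wrapper around the slide's main(): returns x + y after the join.
int run_parallel_writes() {
    int x = 2;
    int y;
    std::thread t1(foo, &x);  // thread creation orders "x = 2" before foo's body
    y = 3;                    // a different location from x, so no race with the child
    t1.join();                // join orders the child's write to x before the reads below
    assert(x == 3);
    return x + y;
}
```

The parent and the child write to different locations between creation and join, which is why the program is race free even though both writes are non-atomic.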

p. 16 Synchronizes-with and happens-before
The parent thread has synchronization edges, labeled asw, to its child threads. There are other ways to synchronize. We will define the happens-before relation later. It contains the transitive closure of all synchronization edges and all sequenced-before edges (amongst other things).

p. 17 Data race
    data_races =
      {(a, b). (a ≠ b) ∧ same_location a b ∧
               (is_write a ∨ is_write b) ∧
               ¬ same_thread a b ∧
               ¬(is_atomic_action a ∧ is_atomic_action b) ∧
               ¬(a happens-before b ∨ b happens-before a)}

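p. 18 A data race
    int main() {
      int x = 2;
      int y;
      {{{ x=3; ||| y=(x==3); }}};
      return 0;
    }
[Figure: execution. Actions: a:W_na x=2, b:W_na x=3, c:R_na x=2, d:W_na y=0. An asw edge from a to b, an asw,rf edge from a to c, and a dr (data race) edge between b and c.]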
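The data_races definition can be transcribed almost literally into code. The sketch below is an illustration only: the Action record, the integer action indices and the function names are assumptions made for this example, and the happens-before relation is supplied by hand as an already transitively closed set of pairs. It is applied to the four actions of the p. 18 execution.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified action record (not the slides' formal type).
struct Action {
    int thread;            // id of the issuing thread
    std::string location;  // name of the memory location
    bool is_write;
    bool is_atomic;
};

using Edge = std::pair<int, int>;  // (index of a, index of b)

// Mirrors data_races from p. 17; hb must already be the full happens-before relation.
std::set<Edge> data_races(const std::vector<Action>& acts, const std::set<Edge>& hb) {
    std::set<Edge> races;
    const int n = static_cast<int>(acts.size());
    for (int a = 0; a < n; ++a) {
        for (int b = 0; b < n; ++b) {
            const Action& x = acts[a];
            const Action& y = acts[b];
            const bool ordered = hb.count({a, b}) > 0 || hb.count({b, a}) > 0;
            if (a != b && x.location == y.location && (x.is_write || y.is_write) &&
                x.thread != y.thread && !(x.is_atomic && y.is_atomic) && !ordered) {
                races.insert({a, b});
            }
        }
    }
    return races;
}

// The p. 18 execution: a (parent), b (first child), c and d (second child).
std::set<Edge> p18_races() {
    const std::vector<Action> acts = {
        {0, "x", true, false},   // 0 = a: W_na x=2
        {1, "x", true, false},   // 1 = b: W_na x=3
        {2, "x", false, false},  // 2 = c: R_na x=2
        {2, "y", true, false},   // 3 = d: W_na y=0
    };
    // asw from a to both children, sequenced-before from c to d, closed transitively.
    const std::set<Edge> hb = {{0, 1}, {0, 2}, {0, 3}, {2, 3}};
    return data_races(acts, hb);
}
```

Because the definition is symmetric, the race between b and c is reported in both directions, as the pairs (1, 2) and (2, 1).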

p. 19 Modification order
A total order of the writes at each atomic location, similar to coherence order on Power.
    int main() {
      atomic_int x = 0;
      int y = 0;
      {{{ { x.store(1); x.store(2); }
      ||| { y = 1; } }}}
      return 0;
    }
[Figure: execution of ../examples/t70-na-mo.c. Actions: a:W_na x=0, b:W_na y=0, c:W_SC x=1, d:W_SC x=2, e:W_na y=1. Edge labels: asw edges to the child threads and mo edges over the writes to x.]
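A compilable approximation of this example, assuming std::thread in place of the {{{ ||| }}} parallel composition, is sketched below. The function name is hypothetical. Since the two stores to x are sequenced in one thread, the modification order of x has to place the store of 1 before the store of 2, so a load that happens after the join can only return 2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical wrapper: returns 10 * x + y once both child threads have finished.
int final_value_after_ordered_stores() {
    std::atomic<int> x{0};
    int y = 0;
    std::thread t1([&] { x.store(1); x.store(2); });  // mo of x: 0, then 1, then 2
    std::thread t2([&] { y = 1; });
    t1.join();
    t2.join();
    return x.load() * 10 + y;  // both joins happen before this load, so 2 * 10 + 1
}
```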

p. 20 SC order
There is a total order over all sequentially consistent atomic actions. SC atomics read the last prior write in SC order (or a non-SC write).
    consistent_sc_order =
      let sc_happens_before = happens-before restricted to all_sc_actions in
      let sc_mod_order = modification-order restricted to all_sc_actions in
      strict_total_order_over all_sc_actions (sc) ∧
      sc_happens_before ⊆ sc ∧
      sc_mod_order ⊆ sc
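One observable consequence of the total SC order is that the store-buffering outcome, where two threads each store to one location and then read 0 from the other, cannot occur when every access is seq_cst. The sketch below runs that test repeatedly; the function names and the trial count are assumptions made for illustration, and a run that finds no forbidden outcome is of course only evidence, not a proof.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One store-buffering trial with seq_cst atomics.
// Returns true if both threads read 0, which the SC total order forbids.
bool sb_trial_both_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Hypothetical driver: counts how many trials showed the forbidden outcome.
int count_forbidden_outcomes(int trials) {
    int forbidden = 0;
    for (int i = 0; i < trials; ++i) {
        if (sb_trial_both_zero()) ++forbidden;
    }
    return forbidden;
}
```

Whichever store comes first in the SC order, the load in the other thread follows that store in the SC order and therefore cannot read the initial 0.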

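p. 21 Atomic actions do not race
    int main() {
      atomic_int x;
      x.store(2, mo_seq_cst);
      int y = 0;
      {{{ x.store(3); ||| y = ((x.load()) == 3); }}};
      return 0;
    }
[Figure: execution. Actions: a:W_SC x=2, b:W_na y=0, c:W_SC x=3, d:R_SC x=2, e:W_na y=0. Edge labels: two asw edges to the child threads, an rf,sc edge into the read d, and a further sc edge.]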

p. 22 The release-acquire idiom
    // sender
    x = ...
    y = 1;
    // receiver
    while (0 == y);
    r = x;
[Figure: execution of ../examples/t15.c. Actions: a:W_na x=1, b:W_REL y=1, c:R_ACQ y=1, d:R_na x=1. An sw edge from b to c.]
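Written out with explicit memory orders, the idiom might look like the sketch below. It assumes that y is a std::atomic<int> written with memory_order_release and read with memory_order_acquire, as the REL and ACQ labels in the figure suggest, and that the payload x is an ordinary int. The function name message_passing is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical wrapper: returns the value the receiver reads from x.
int message_passing() {
    int x = 0;              // non-atomic payload
    std::atomic<int> y{0};  // flag
    int r = -1;

    std::thread sender([&] {
        x = 1;                                  // a: W_na x=1
        y.store(1, std::memory_order_release);  // b: W_REL y=1
    });
    std::thread receiver([&] {
        while (0 == y.load(std::memory_order_acquire)) {  // c: R_ACQ y=1 ends the loop
            std::this_thread::yield();
        }
        r = x;                                  // d: R_na x=1
    });
    sender.join();
    receiver.join();
    return r;
}
```

The release store synchronizes with the acquire load that reads from it, so the write to x happens before the read of x and the receiver always obtains 1.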

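p. 23 Release-acquire synchronization
[Figure: execution of ../examples/t8a.c. Actions: a:W_na x=1, b:W_REL y=1, c:W_RLX y=2, d:R_ACQ y=2, e:R_na x=1. An mo,rs edge from b to c, an rf edge from c to d, and an sw edge from b to d.]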

p. 24 The release sequence
The release sequence is a sub-sequence of the modification order following a release.
    rs_element rs_head a = same_thread a rs_head ∨ is_atomic_rmw a

    a_rel release-sequence b =
      is_at_atomic_location b ∧ is_release a_rel ∧
      ((b = a_rel) ∨
       (rs_element a_rel b ∧ a_rel modification-order b ∧
        (∀c. a_rel modification-order c modification-order b ⟹ rs_element a_rel c)))
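The is_atomic_rmw case of rs_element can be illustrated with a read-modify-write from a different thread that extends the release sequence. In the sketch below, all names are hypothetical: a reader acquires the value written by a relaxed fetch_add and still synchronizes with the original release store, because the fetch_add immediately follows that store in modification order and is therefore part of its release sequence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the value of data seen by the reader thread.
int release_sequence_demo() {
    int data = 0;
    std::atomic<int> flag{0};
    int seen = -1;

    std::thread releaser([&] {
        data = 42;
        flag.store(1, std::memory_order_release);  // head of the release sequence
    });
    std::thread bumper([&] {
        // Wait for the release store so the RMW follows it in modification order.
        while (flag.load(std::memory_order_relaxed) == 0) std::this_thread::yield();
        flag.fetch_add(1, std::memory_order_relaxed);  // RMW: an rs_element from another thread
    });
    std::thread reader([&] {
        // Reading 2 means reading from the RMW, which is in the release sequence.
        while (flag.load(std::memory_order_acquire) != 2) std::this_thread::yield();
        seen = data;
    });
    releaser.join();
    bumper.join();
    reader.join();
    return seen;
}
```

If the bumper used a plain relaxed store instead of an RMW, it would be neither in the same thread as the head nor an RMW, so the reader would no longer synchronize with the releaser and the read of data would race.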

p. 25 An execution with a release sequence
[Figure: execution of ../examples/t8a-no-sw.c. Actions: a:W_na x=1, b:W_REL y=1, c:W_RLX y=2, d:R_ACQ y=2, e:R_na x=1. An mo,rs edge from b to c and an rf edge from c to d.]

p. 26 Synchronizes-with
    a synchronizes-with b =
      (* additional synchronization, from thread create etc. *)
      a additional-synchronized-with b ∨
      (same_location a b ∧ a ∈ actions ∧ b ∈ actions ∧ (
        (* mutex synchronization *)
        (is_unlock a ∧ is_lock b ∧ a sc b) ∨
        (* release/acquire synchronization *)
        (is_release a ∧ is_acquire b ∧ ¬ same_thread a b ∧
         (∃c. a release-sequence c rf b)) ∨
        [...]))

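p. 27 Release-acquire synchronization
[Figure: execution of ../examples/t8a.c. Actions: a:W_na x=1, b:W_REL y=1, c:W_RLX y=2, d:R_ACQ y=2, e:R_na x=1. An mo,rs edge from b to c, an rf edge from c to d, and an sw edge from b to d.]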

p. 28 Happens-before (without consume)
    simple_happens_before =
      (sequenced-before ∪ synchronizes-with)+

    consistent_simple_happens_before =
      irreflexive (simple_happens_before)
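The definition above is a transitive closure of a union followed by an irreflexivity check, which can be computed directly on a small execution. The sketch below is an illustration under stated assumptions: relations are represented as boolean adjacency matrices, and the names Relation, empty_relation and p22_happens_before are hypothetical. The example input is the four-action release-acquire execution from p. 22 (a, b, c, d numbered 0 to 3).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Relation = std::vector<std::vector<bool>>;  // r[i][j] is true when i -> j

Relation empty_relation(std::size_t n) {
    return Relation(n, std::vector<bool>(n, false));
}

// simple_happens_before = (sequenced-before union synchronizes-with)+
Relation simple_happens_before(const Relation& sb, const Relation& sw) {
    const std::size_t n = sb.size();
    Relation hb = empty_relation(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            hb[i][j] = sb[i][j] || sw[i][j];
    // Warshall's algorithm for the transitive closure.
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (hb[i][k] && hb[k][j]) hb[i][j] = true;
    return hb;
}

// consistent_simple_happens_before: no action happens before itself.
bool irreflexive(const Relation& r) {
    for (std::size_t i = 0; i < r.size(); ++i)
        if (r[i][i]) return false;
    return true;
}

// p. 22 execution: 0 = a:W_na x, 1 = b:W_REL y, 2 = c:R_ACQ y, 3 = d:R_na x.
Relation p22_happens_before() {
    Relation sb = empty_relation(4);
    Relation sw = empty_relation(4);
    sb[0][1] = true;  // a sequenced-before b (sender)
    sb[2][3] = true;  // c sequenced-before d (receiver)
    sw[1][2] = true;  // b synchronizes-with c
    return simple_happens_before(sb, sw);
}
```

In the result, action a happens before action d, which is what makes the receiver's non-atomic read of x race free.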

p. 29 Happens-before
    inter-thread-happens-before =
      let r = synchronizes-with ∪
              dependency-ordered-before ∪
              (synchronizes-with ∘ sequenced-before) in
      (r ∪ (sequenced-before ∘ r))+

    consistent_inter_thread_happens_before =
      irreflexive (inter-thread-happens-before)

    happens-before =
      sequenced-before ∪ inter-thread-happens-before

p. 30 Visible side effect
Non-atomic reads read from one of their visible side effects.
    a visible-side-effect b =
      a happens-before b ∧
      is_write a ∧ is_read b ∧ same_location a b ∧
      ¬(∃c. (c ≠ a) ∧ (c ≠ b) ∧
            is_write c ∧ same_location c b ∧
            a happens-before c happens-before b)

p. 31 Visible sequence of side effects
Atomic reads read from a write in one of their visible sequences of side effects.
    visible_sequence_of_side_effects_tail vsse_head b =
      {c. vsse_head modification-order c ∧
          ¬(b happens-before c) ∧
          (∀a. vsse_head modification-order a modification-order c
               ⟹ ¬(b happens-before a))}

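p. 32 An atomic read
[Figure: execution of ../examples/t8a.c. Actions: a:W_na x=1, b:W_REL y=1, c:W_RLX y=2, d:R_ACQ y=2, e:R_na x=1. An mo,rs edge from b to c, an rf edge from c to d, and an sw edge from b to d.]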

p. 33 Consistent reads-from mapping
    consistent_reads_from_mapping =
      (∀b. (is_read b ∧ is_at_non_atomic_location b) ⟹
             (if (∃a_vse. a_vse visible-side-effect b)
              then (∃a_vse. a_vse visible-side-effect b ∧ a_vse rf b)
              else ¬(∃a. a rf b))) ∧
      (∀b. (is_read b ∧ is_at_atomic_location b) ⟹
             (if (∃(b', vsse) ∈ visible-sequences-of-side-effects. (b' = b))
              then (∃(b', vsse) ∈ visible-sequences-of-side-effects.
                      (b' = b) ∧ (∃c ∈ vsse. c rf b))
              else ¬(∃a. a rf b))) ∧
      (∀(x, a) ∈ rf. ∀(y, b) ∈ rf.
             a happens-before b ∧
             same_location a b ∧ is_at_atomic_location b
             ⟹ (x = y) ∨ x modification-order y) ∧
      (∀(a, b) ∈ rf. is_atomic_rmw b ⟹ a modification-order b) ∧
      (∀(a, b) ∈ rf. is_seq_cst b ⟹
             ¬ is_seq_cst a ∨
             a sc|(λc. is_write c ∧ same_location b c) b) ∧
      [...]

p. 34 Coherence
Coherence is defined as the absence of four execution fragments:
- ../examples/coherence-axiom-1.exc: a:W_RLX x=1, b:W_RLX x=2, c:R_RLX x=1, d:R_RLX x=2 (edge labels: rf, rf, mo, hb)
- ../examples/coherence-axiom-2.exc: b:W_RLX x=2, c:W_RLX x=1, d:R_RLX x=2 (edge labels: mo, hb, rf)
- ../examples/coherence-axiom-4.exc: a:W_RLX x=1, b:W_RLX x=2 (edge labels: hb, mo)
- ../examples/coherence-axiom-3.exc: a:W_RLX x=1, c:R_RLX x=1, d:W_RLX x=2 (edge labels: rf, mo, hb)
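The fragment with two reads can be probed experimentally: if one thread writes increasing values to a single atomic location, another thread's successive relaxed reads of it should never observe the values going backwards. The sketch below is an illustrative test rather than anything from the slides; the function name and the iteration count are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical test: returns true if the reader never saw x decrease.
bool reads_never_go_backwards(int last) {
    std::atomic<int> x{0};
    bool monotone = true;

    std::thread writer([&] {
        // Stores are sequenced in one thread, so modification order is 1, 2, ..., last.
        for (int v = 1; v <= last; ++v) x.store(v, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        int prev = 0;
        while (prev < last) {
            const int cur = x.load(std::memory_order_relaxed);
            if (cur < prev) {
                monotone = false;  // a later read saw an mo-earlier write
            } else {
                prev = cur;
            }
        }
    });
    writer.join();
    reader.join();
    return monotone;
}
```

Even with memory_order_relaxed on every access, coherence guarantees that reads ordered by happens-before respect the modification order of the location, so the function is expected to return true.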

p. 35 Concurrency examples that can be observed
The model allows the following non-SC behaviour:
- message passing (RLX, REL-CON)
- store buffering (REL-ACQ, RLX, REL-CON)
- load buffering (RLX, CON)
- write-to-read causality (RLX, CON)
- IRIW (REL-ACQ, RLX, REL-CON)
...but DRF programs that use only the memory_order_seq_cst atomics should be sequentially consistent.

p. 36 An execution compiler
    Operation        x86 implementation
    Load non-SC      mov
    Load seq_cst     lock xadd(0)  OR: mfence, mov
    Store non-SC     mov
    Store seq_cst    lock xchg     OR: mov, mfence
    Fence non-SC     no-op
    Fence seq_cst    mfence

p. 37 Theorem
[Diagram: relates consistent_execution over (E_opsem, X_witness) to valid_execution over (E_x86, X_x86) via evt_comp and evt_comp^-1.]

p. 38 Conclusion
- C++0x offers a simple model to normal programmers, while experts get a highly configurable language that abstracts the hardware memory model
- We have arrived just in time to point out a few bugs, and many changes have been made as a result of our work
- The intricacy of such models makes tools important; CPPMEM helps in exploring and understanding the model
- Formal models provide an opportunity to provide guarantees about programs based on the specification, like our compiler correctness result