C++ Concurrency - Formalised
Salomon Sickert, Technische Universität München, 26th April 2013

Mutex Algorithms
At most one thread is in the critical section at any time.

Dekker's Mutex Algorithm [2]

Initialisation:
    x = 0; y = 0;

Thread 1:
    x = 1;
    if (y == 1) {
        ... // busy wait
    }
    // critical section

Thread 2:
    y = 1;
    if (x == 1) {
        ... // busy wait
    }
    // critical section

The first two statements of each thread correspond to these machine instructions:

Thread 1:
    MOV [x] <- 1
    MOV r1 <- [y]

Thread 2:
    MOV [y] <- 1
    MOV r2 <- [x]

Modern x86 Multiprocessors - simplified (based on [4])
[Figure: two hardware threads (H/W Thread 1, H/W Thread 2), each with registers r1 and r2 and its own write buffer, sitting in front of one shared memory.]

Modern x86 Multiprocessors and Dekker's Algorithm

Each step shows the registers and write buffer of both hardware threads and the shared memory ("-" means empty):

Step  Thread 1 registers  Thread 1 write buffer  Thread 2 registers  Thread 2 write buffer  Shared memory
1     -                   -                      -                   -                      x = 0, y = 0
2     -                   x = 1                  -                   -                      x = 0, y = 0
3     -                   x = 1                  -                   y = 1                  x = 0, y = 0
4     r1 = 0              x = 1                  -                   y = 1                  x = 0, y = 0
5     r1 = 0              x = 1                  r2 = 0              y = 1                  x = 0, y = 0
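
The trace above can be replayed with a toy model of the simplified machine. This is only a sketch under stated assumptions: `ToyTso`, `store`, `load`, `flush` and `replay_dekker_trace` are illustrative names that do not come from the slides or from [4], and the model covers nothing beyond FIFO write buffers in front of one shared memory.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Toy model (assumption, not from the slides): one FIFO write buffer per
// hardware thread in front of a single shared memory.
struct ToyTso {
    std::map<std::string, int> memory{{"x", 0}, {"y", 0}};
    std::deque<std::pair<std::string, int>> buffer[2];

    // A store is only queued in the issuing thread's write buffer.
    void store(int thread, const std::string& loc, int value) {
        buffer[thread].emplace_back(loc, value);
    }

    // A load returns the newest buffered write of the same thread to that
    // location and falls back to shared memory otherwise.
    int load(int thread, const std::string& loc) const {
        for (auto it = buffer[thread].rbegin(); it != buffer[thread].rend(); ++it) {
            if (it->first == loc) return it->second;
        }
        return memory.at(loc);
    }

    // Drain one thread's buffer into shared memory, oldest write first.
    void flush(int thread) {
        while (!buffer[thread].empty()) {
            memory[buffer[thread].front().first] = buffer[thread].front().second;
            buffer[thread].pop_front();
        }
    }
};

// Replays the interleaving from the slides and returns (r1, r2).
std::pair<int, int> replay_dekker_trace() {
    ToyTso m;
    m.store(0, "x", 1);       // Thread 1: x = 1, stays in its write buffer
    m.store(1, "y", 1);       // Thread 2: y = 1, stays in its write buffer
    int r1 = m.load(0, "y");  // Thread 1 sees only shared memory for y
    int r2 = m.load(1, "x");  // Thread 2 sees only shared memory for x
    m.flush(0);               // the buffers drain too late to matter
    m.flush(1);
    return {r1, r2};
}
```

Calling `replay_dekker_trace()` yields the register values of the last step: neither thread observes the other's buffered write, so neither would take the busy-wait branch.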

Modern x86 Multiprocessors
- Both threads may enter the critical section at the same time!
- Because of the write buffers, threads may see subtly different views of memory.
- The processor exhibits a relaxed memory model.
- Solutions:
  - Assembler: fence instructions (e.g. MFENCE on x86).
  - C++11: special types (atomics).
- Sequential consistency is restored by paying a performance penalty.
- Some non-x86 architectures exhibit even weaker models, e.g. ARM.
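
The C++11 route can be sketched with `std::atomic`. The code below is an illustration, not the slides' own example: the function names are made up, and it relies on the fact that `store` and `load` default to `memory_order_seq_cst`, under which the outcome r1 == 0 and r2 == 0 is not allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// One round of the store-then-load pattern at the heart of Dekker's
// algorithm. Returns (r1, r2). The function name is illustrative.
std::pair<int, int> store_then_load_round() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1);     // defaults to memory_order_seq_cst
        r1 = y.load();
    });
    std::thread t2([&] {
        y.store(1);
        r2 = x.load();
    });
    t1.join();
    t2.join();
    return {r1, r2};
}

// Repeats the round and reports whether r1 == 0 && r2 == 0 was ever seen.
bool both_zero_observed(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        auto r = store_then_load_round();
        if (r.first == 0 && r.second == 0) return true;
    }
    return false;
}
```

With sequentially consistent atomics `both_zero_observed` should never return true. Replacing the atomics with plain `int` variables would make the program racy and its behaviour undefined, so that variant is deliberately not shown as runnable code.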

Mathematizing C++ Concurrency ([3])
Given a program p, what are the possible ways to execute it?
1. Calculate X_opsem from p. Loosely speaking: the structure of the program.
2. Find all X_witness consistent with X_opsem. Loosely speaking: the different executions of the program.
3. Check for undefined behaviour:
   - Reading from uninitialised variables
   - Unsequenced races (e.g. x == (x = 2))
   - Data races
   - ...

X_opsem: An Example

    int main() {
      int x, y, z, a;
      {{{ x = 1;
      ||| y = 2;
      }}};
      a = 1;
      z = (x == y);
      return 0;
    }

Here {{{ ... ||| ... }}} is the CppMem notation for running the two statements in parallel threads.

Actions:
    e: W x=1
    f: W y=2
    a: W a=1
    b: R x=?
    c: R y=?
    d: W z=(x? == y?)

Relations:
    asw: e -> a, f -> a
    sb:  a -> b, a -> c, b -> d, c -> d

X_opsem
- Independent of the architecture
- Composed of (among other parts):
  - Actions (simplified): aid : R l = v and aid : W l = v
  - Binary relations: sequenced before (sb), additional synchronized with (asw)
- In this special case: $\text{simple happens before} = (\text{sequenced before} \cup \text{additional synchronized with})^{+}$
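
As a worked instance of that definition, the sketch below computes the transitive closure of sb and asw for the six actions of the example. The edge lists are one reading of the example graph and should be treated as an assumption, as should all identifiers; Warshall's algorithm is used simply because it is the shortest way to write a transitive closure.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Actions of the example, named as in the graph.
enum Action { a, b, c, d, e, f, kCount };

using Relation = std::array<std::array<bool, kCount>, kCount>;

// Builds (sb union asw)^+ for the example with Warshall's algorithm.
Relation simple_happens_before() {
    // Edge lists are an assumed reading of the example graph.
    const std::vector<std::pair<Action, Action>> sb  = {{a, b}, {a, c}, {b, d}, {c, d}};
    const std::vector<std::pair<Action, Action>> asw = {{e, a}, {f, a}};

    Relation hb{};  // value-initialised: every entry is false
    for (auto edge : sb)  hb[edge.first][edge.second] = true;
    for (auto edge : asw) hb[edge.first][edge.second] = true;

    // If i reaches k and k reaches j, then i reaches j.
    for (int k = 0; k < kCount; ++k)
        for (int i = 0; i < kCount; ++i)
            for (int j = 0; j < kCount; ++j)
                if (hb[i][k] && hb[k][j]) hb[i][j] = true;
    return hb;
}
```

In the result both writes e and f happen before the reads b and c and before the final write d, while e and f stay unordered with respect to each other, as do b and c.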

X_witness: An Example
Same program as in the X_opsem example, now with values chosen for the reads.

Actions:
    e: W x=1
    f: W y=2
    a: W a=1
    b: R x=1
    c: R y=2
    d: W z=0

Relations:
    hb: e -> a, f -> a, a -> b, a -> c, b -> d, c -> d
    rf: e -> b, f -> c

X_witness
- Dependent on the architecture
- Composed of binary relations:
  - reads from (rf)
  - sequential consistency (sc) (not applicable in the example)
  - modification order (mo) (not applicable in the example)

X = (X_opsem, X_witness): An execution candidate
[Figure: the example program next to its execution graph, with hb edges e -> a, f -> a, a -> b, a -> c, b -> d, c -> d and rf edges e -> b, f -> c; the reads are b: R x=1 and c: R y=2, the final write is d: W z=0.]

X = (X_opsem, X_witness): Undefined Behaviour?
Checks on the execution candidate:
- Uninitialised reads?
- Unsequenced races?
- Data races?
[Figure: the execution graph of the candidate, with hb and rf edges; b: R x=1, c: R y=2, d: W z=0.]

Undefined Behaviour: Data Races

    int main() {
      int x = 0;
      {{{ x = 1;
      ||| x = 2;
      }}};
      return x;
    }

Actions:
    a: W x=0
    c: W x=1
    d: W x=2
    b: R x=?

Relations:
    asw (and hence hb): a -> c, a -> d, c -> b, d -> b
    dr: c and d (two writes to x in different threads that are not ordered by hb)
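
A rough standard-C++ analogue of this program, with the race removed, might look as follows. The {{{ ... ||| ... }}} notation is not C++, so the use of `std::thread` and the function name are assumptions made for illustration; the only change that matters is that x becomes a `std::atomic<int>`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Analogue of the slide's program with x made atomic, so the two
// concurrent writes no longer form a data race. Name is illustrative.
int racing_writes_with_atomic() {
    std::atomic<int> x{0};

    std::thread t1([&] { x.store(1); });
    std::thread t2([&] { x.store(2); });
    t1.join();
    t2.join();

    // Either write may come last in the modification order of x.
    return x.load();
}
```

The result is still nondeterministic (1 or 2), but the program now has defined behaviour: the two writes are ordered by the modification order of x instead of racing.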

The Formalised C++ Memory Model

    cpp_memory_model opsem (p : program) =
      let pre_executions =
        {(X_opsem, X_witness) | opsem p X_opsem ∧ consistent_execution X_opsem X_witness}
      in
        if ∃X ∈ pre_executions. undefined_behaviour X
        then NONE
        else SOME pre_executions
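
The shape of this definition can be mirrored in C++17 with `std::optional`. This is a loose sketch, not the formal model: `CandidateExecution` and `allowed_executions` are hypothetical names, and the per-execution check for undefined behaviour is reduced to a precomputed flag.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a consistent pair (X_opsem, X_witness); the flag
// records whether a check for undefined behaviour fired on it.
struct CandidateExecution {
    int id;
    bool has_undefined_behaviour;
};

// Mirrors the shape of cpp_memory_model: a single faulty candidate is enough
// to leave the whole program without defined behaviour (NONE).
std::optional<std::vector<CandidateExecution>>
allowed_executions(const std::vector<CandidateExecution>& pre_executions) {
    for (const auto& x : pre_executions) {
        if (x.has_undefined_behaviour) return std::nullopt;
    }
    return pre_executions;  // SOME pre_executions
}
```

The point of the sketch is the quantifier: undefined behaviour in any one consistent execution discards all of them, including those that look harmless on their own.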

C++11 Concurrency / Language Features
Concurrency idioms (atomics, mutexes, threads)

Pre-C++11:
- No memory model for multi-threaded code
- Concurrency idioms provided by third parties: pthreads, OpenMP
- Drawback: no formalised standard, so the compiler may produce incorrect code.

C++11:
- A memory model for multi-threaded code
- Concurrency idioms are part of the language (std::atomic<T> [1], std::mutex, std::thread)
- Similar to Java
- Benefit: the compiler is able to produce correct code.
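
The three C++11 facilities named above can be seen together in a short sketch. The struct and function names are illustrative; the example merely contrasts a counter guarded by `std::mutex` with one held in a `std::atomic`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Totals {
    long with_mutex;
    long with_atomic;
};

// Several threads bump two counters: one guarded by std::mutex, one atomic.
Totals count_in_parallel(int threads, int increments_per_thread) {
    long plain_counter = 0;
    std::mutex guard;
    std::atomic<long> atomic_counter{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                {
                    // The lock makes the read-modify-write exclusive.
                    std::lock_guard<std::mutex> lock(guard);
                    ++plain_counter;
                }
                atomic_counter.fetch_add(1);  // indivisible increment
            }
        });
    }
    for (auto& th : pool) th.join();
    return {plain_counter, atomic_counter.load()};
}
```

For example, `count_in_parallel(4, 10000)` gives 40000 for both counters; without the lock the increment of `plain_counter` would be a data race.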

Applications of the Formal Memory Model
- Corrections to the C++0x standard: memory_order_seq_cst was in fact not sequentially consistent.
- Confidence in the memory model and specification.
- Verify correctness of prototype implementations.
- Developer support.
[Figure taken from [3]]

Questions?

Bonus: Atomics vs. Mutexes - short presentation

Bonus: Dekker's algorithm in CppMem
http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/index.html

References
[1] C++ atomic types and operations, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html.
[2] Dekker's algorithm, http://www.cs.utexas.edu/users/ewd/transcriptions/ewd01xx/ewd123.h
[3] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizing C++ concurrency. SIGPLAN Not., 46(1):55-66, January 2011.
[4] Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. Commun. ACM, 53(7):89-97, July 2010.