Declarative semantics for concurrency. 28 August 2017

Similar documents
Taming release-acquire consistency

An introduction to weak memory consistency and the out-of-thin-air problem

Effective Stateless Model Checking for C/C++ Concurrency

Repairing Sequential Consistency in C/C++11

Repairing Sequential Consistency in C/C++11

Program logics for relaxed consistency

Taming Release-Acquire Consistency

Relaxed Memory: The Specification Design Space

Repairing Sequential Consistency in C/C++11

Reasoning about the C/C++ weak memory model

The C1x and C++11 concurrency model

On Parallel Snapshot Isolation and Release/Acquire Consistency

C11 Compiler Mappings: Exploration, Verification, and Counterexamples

Overhauling SC atomics in C11 and OpenCL

COMP Parallel Computing. CC-NUMA (2) Memory Consistency

Multicore Programming: C++0x

The C/C++ Memory Model: Overview and Formalization

RELAXED CONSISTENCY 1

Load-reserve / Store-conditional on POWER and ARM

Overview: Memory Consistency

Hardware models: inventing a usable abstraction for Power/ARM. Friday, 11 January 13

Foundations of the C++ Concurrency Memory Model

CS5460: Operating Systems

Library Abstraction for C/C++ Concurrency

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

A separation logic for a promising semantics

Reasoning About The Implementations Of Concurrency Abstractions On x86-tso. By Scott Owens, University of Cambridge.

Handout 3 Multiprocessor and thread level parallelism

Deterministic Parallel Programming

<atomic.h> weapons. Paolo Bonzini Red Hat, Inc. KVM Forum 2016

GPU Concurrency: Weak Behaviours and Programming Assumptions

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

Overhauling SC atomics in C11 and OpenCL

CSE502: Computer Architecture CSE 502: Computer Architecture

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Proving liveness. Alexey Gotsman IMDEA Software Institute

Lecture 12. Lecture 12: The IO Model & External Sorting

Shared Memory Consistency Models: A Tutorial

Relaxed Memory-Consistency Models

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Overhauling SC atomics in C11 and OpenCL

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Transaction Management and Concurrency Control. Chapter 16, 17

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Stability in Weak Memory Models

Overhauling SC Atomics in C11 and OpenCL

Concurrency Control. Chapter 17. Comp 521 Files and Databases Spring

Analyzing Snapshot Isolation

Goal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3

Concurrency Control Serializable Schedules and Locking Protocols

A Concurrency Semantics for Relaxed Atomics that Permits Optimisation and Avoids Thin-Air Executions

Weak memory models. Mai Thuong Tran. PMA Group, University of Oslo, Norway. 31 Oct. 2014

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

New Programming Abstractions for Concurrency in GCC 4.7. Torvald Riegel Red Hat 12/04/05

Compositional Verification of Compiler Optimisations on Relaxed Memory

THÈSE DE DOCTORAT de l Université de recherche Paris Sciences Lettres PSL Research University

Common Compiler Optimisations are Invalid in the C11 Memory Model and what we can do about it

The Java Memory Model

C++ Concurrency - Formalised

arxiv: v2 [cs.pl] 9 Jul 2016

CS510 Advanced Topics in Concurrency. Jonathan Walpole

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Implementing the C11 memory model for ARM processors. Will Deacon February 2015

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali

The Cache-Coherence Problem

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Lectures 8 & 9. Lectures 7 & 8: Transactions

TRANSACTION MANAGEMENT

Relaxed Separation Logic: A Program Logic for C11 Concurrency

Understanding POWER multiprocessors

New Programming Abstractions for Concurrency. Torvald Riegel Red Hat 12/04/05

A Revisionist History of Denotational Semantics

Pattern-based Synthesis of Synchronization for the C++ Memory Model

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

Coherence and Consistency

An operational semantics for C/C++11 concurrency

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue

GPS: Navigating Weak Memory with Ghosts, Protocols, and Separation

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

Multicore Programming Java Memory Model

Lecture 21. Lecture 21: Concurrency & Locking

High-level languages

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

Lecture 1: Conjunctive Queries

From IMP to Java. Andreas Lochbihler. parts based on work by Gerwin Klein and Tobias Nipkow ETH Zurich

arxiv: v2 [cs.pl] 30 Aug 2016

Page 1. Cache Coherence

Sequential Consistency & TSO. Subtitle

A Denotational Semantics for Weak Memory Concurrency

Memory Consistency Models

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

Memory Consistency Models

Programming Language Memory Models: What do Shared Variables Mean?

Beyond Sequential Consistency: Relaxed Memory Models

Transaction Management: Introduction (Chap. 16)

TSO-CC: Consistency-directed Coherence for TSO. Vijay Nagarajan

Transcription:

Declarative semantics for concurrency Ori Lahav Viktor Vafeiadis 28 August 2017

An alternative way of defining the semantics 2 Declarative/axiomatic concurrency semantics Define the notion of a program execution (generalization of an execution trace) Map a program to a set of executions Define a consistency predicate on executions Semantics = set of consistent executions of a program Exception: catch-fire semantics Existence of at least one bad consistent execution implies undefined behavior.

Executions 3 Events Reads, Writes, Updates, Fences Relations Program order, po (also called sequenced-before, sb) Reads-from, rf W x 0 W y 0 W x 1 R x 1 rf po R y 0 rf R y 1 R x 0 rf W y 1

Executions 4 Definition (Label) A label has one of of the following forms: R x v r W x v w U(x v r v w ) F where x Loc and v r, v w Val. Definition (Event) An event is a triple id, i, l where id N is an event identifier, i Tid {0} is a thread identifier, and l is a label.

Executions 5 Definition (Execution graph) An execution graph is a tuple E, po, rf where: E is a finite set of events po ( program order ) is a partial order on E rf ( reads-from ) is a binary relation on E such that: For every w, r rf typ(w) {W, U} typ(r) {R, U} loc(w) = loc(r) valw (w) = val r (r) rf 1 is a function (that is: if w 1, r, w 2, r rf then w 1 = w 2 )

Some notations 6 Let G = E, po, rf be an execution graph. G.E = E G.po = po G.rf = rf G.R = {r E typ(r) = R typ(r) = U} G.W = {w E typ(w) = W typ(w) = U} G.RMW = {u E typ(u) = U} G.F = {f E typ(f ) = F} G.R x = G.R {r E loc(r) = x}...

Mapping programs to executions: Example 7 Store buffering (SB) x := 1 a := y x = y = 0 y := 1 b := x W x 0 W y 0 W x 0 W y 0 W x 0 W y 0 W x 1 W y 1 W x 1 W y 1 W x 1 W y 1 R y 0 R x 0 R y 1 R x 1 R y 0 R x 42

Mapping programs to executions: Definition 8 The thread subsystem associates a sequential execution graph to every command. A program execution is obtained by joining the sequential execution graphs of the constituent threads. Definition An execution graph G is called sequential if the following hold: tid(a) = 0 for every a G.E G.po is a total order on G.E G.rf =

From commands to sequential execution graphs 9 Initial execution graph: G - the empty graph silent c, s ε c, s c, s, G = c, s, G non-silent c, s l c, s l ε a = n, 0, l n {id(b) b G.E} c, s, G = c, s, Add(G, a) where Add(G, a) is the execution graph G given by: G.E = G.E {a} G.po = G.po (G.E {a}) G.rf = G.rf Definition (Execution graph of a command) G is a an execution graph of a command c with a final store s if c, s 0, G = skip, s, G.

Mapping programs to executions: Definition 10 Definition (Thread restriction) Given i Tid and an execution graph G, G i denotes the sequential execution graph obtained by restricting G to the events {a G.E tid(a) = i}, modifying their thread identifiers to 0, and discarding all rf-edges. Definition (Execution graph of a program) G is an execution graph of a program P (with an outcome O) if G i is an execution of P(i) (with final store O(i)) for every i Tid.

Consistency predicate 11 Let X be some consistency predicate (on execution graphs) Definition (Allowed outcome under a declarative model) An outcome O is allowed for a program P under X if there exists an execution graph G such that: G is an execution graph of P with outcome O. G is X-consistent. Exception: catch-fire semantics... or if there exists an execution graph G such that: G is an execution graph of P. G is X-consistent. G is bad.

Completeness 12 The most basic consistency condition: Definition (Completeness) An execution graph G is called complete if codom(g.rf) = G.R i.e., every read reads from some write.

Sequential consistency 13 the result of any execution is the same as if the operations of all the processors were executed in some sequential order, respecting the order specified by the program [Lamport, 1979]

Sequential consistency [Lamport] 14 Definition Let sc be a total order on G.E. G is called SC-consistent wrt sc if the following hold: If a, b G.po then a, b sc. If a, b G.rf then a, b sc and there does not exist c G.W loc(b) such that a, c sc and c, b sc. Definition An execution graph G is called SC-consistent if the following hold: G is complete. G is SC-consistent wrt some total order sc on G.E.

SB example 15 Store buffering (SB) x := 1 a := y x = y = 0 y := 1 b := x Allowed W x 0 W y 0 Forbidden W x 0 W y 0 W x 1 W y 1 W x 1 W y 1 R y 0 R x 1 R y 0 R x 0

Sequential consistency (Alternative) 16 Definition (Modification order (aka coherence order)) mo is called a modification order for an execution graph G if mo = x Loc mo x where each mo x is a total order on G.W x. Definition (Alternative SC definition) An execution graph G is called SC-consistent if the following hold: G is complete There exists a modification order mo for G such that G.po G.rf mo rb is acyclic where: rb = G.rf 1 ; mo \ id (from-reads / reads-before)

SB example 17 Store buffering (SB) x := 1 a := y x = y = 0 y := 1 b := x Allowed W x 0 W y 0 Forbidden W x 0 W y 0 W x 1 W y 1 W x 1 W y 1 R y 0 R x 1 R y 0 R x 0

Equivalence 18 Theorem The two SC definitions are equivalent. Proof (sketch). Lamport SC alternative SC: Take mo x = [W x ]; sc; [W x ]. Then, G.po G.rf mo rb sc. Alternative SC Lamport SC: Take sc to be any total order extending G.po G.rf mo rb.

Relaxing sequential consistency 19 SC is very expensive to implement in hardware. It also forbids various optimizations that are sound for sequential code. What most hardware guarantee and compilers preserve is SC-per-location (aka coherence). Definition An execution graph G is called coherent if the following hold: G is complete For every location x, there exists a total order sc x on all accesses to x such that: If a, b [RWx ]; G.po; [RW x ] then a, b sc x If a, b [Wx ]; G.rf; [R x ] then a, b sc x and there does not exist c G.W x such that a, c sc x and c, b sc x.

Alternative definition of coherence I 20 SC: po rf mo rb is acyclic COH: po loc rf mo rb is acyclic Definition Let mo be a modification order for an execution graph G. G is called coherent wrt mo if G.po loc G.rf mo rb is acyclic (where rb = G.rf 1 ; mo \ id). Theorem An execution graph G is coherent iff the following hold: G is complete G is coherent wrt some modification order mo for G.

Bad patterns I 21 R x rf po W x no-future-read a := x / 1 x := 1 rf U x rmw-1 r := CAS(x, 1, 1) / 1 Recall: W is either a write or an RMW. R is either a read or an RMW.

Bad patterns II 22 x := 1 x := 2 a := x / 2 a := x / 1 W x mo po W x coherence-ww W x mo W x rf po R x coherence-wr W x rf mo R x W x coherence-rw W x W x rf mo rf R x R x coherence-rr po po

Bad patterns III 23 W x mo rf rmw-2 U x mo mo W x W x U x rf atomicity In coherent executions, an RMW event may only read from its immediate mo-predecessor.

Alternative definition of coherence II 24 Theorem Let mo be a modification order for an execution graph G. G is coherent wrt mo iff the following hold: rf; po is irreflexive. mo; po is irreflexive. mo; rf; po is irreflexive. rf 1 ; mo; po is irreflexive. rf 1 ; mo; rf; po is irreflexive. rf is irreflexive. mo; rf is irreflexive. rf 1 ; mo; mo is irreflexive. (no-future-read) (coherence-ww) (coherence-rw) (coherence-wr) (coherence-rr) (rmw-1) (rmw-2) (rmw-atomicity)

Examples (aka litmus tests ) 25 Coherence test x := 1 a := x / 2 x = 0 x := 2 b := x / 1 Store buffering x := 1 a := y /0 x = y = 0 y := 1 b := x /0

Atomicity 26 Parallel increment x = 0 a := FAA(x, 1) b := FAA(x, 1) Guarantees that a = 1 b = 1. Can we implement locks in this semantics? Spinlock implementation lock(l) : r := 0; while r do r := CAS(l, 0, 1) unlock(l) : l := 0

Implementing locks? 27 Under COH, the spinlock implementation does not guarantee mutual exclusion. Spinlock implementation lock(l) : r := 0; while r do r := CAS(l, 0, 1) unlock(l) : l := 0 Lock example lock(l) x := 1 a := y /0 unlock(l) lock(l) y := 1 b := x /0 unlock(l)

Message passing More generally, COH is often too weak: x := 42; y := 1 x = y = 0 a := y; while a do a := y; b := x /0 x := 42; y := 1 x = y = 0 a := y; /1 b := x /0 MP is a common programming idiom. How can we disallow the weak behavior? 28

Supporting message passing 29 W x mo W x rf po R x coherence-wr W x mo W x rf (po rf) + R x coherence-wr Solution: Strengthen the notion of an observed write. In other words, make rf-edges synchronizing.

Release/acquire (RA) memory model 30 SC: po rf mo rb is acyclic COH: po loc rf mo rb is acyclic RA: (po rf) + loc mo rb is acyclic Definition Let mo be a modification order for an execution graph G. G is called RA-consistent wrt mo if (po rf) + loc mo rb is acyclic for some modification order mo for G (where rb = G.rf 1 ; mo \ id). Definition An execution graph G is RA-consistent if the following hold: G is complete G is RA-consistent wrt some modification order mo for G.

Alternative definition of RA consistency 31 Theorem Let mo be a modification order for an execution graph G. G is RA-consistent wrt mo iff the following hold: (po rf) + is irreflexive. mo; (po rf) + is irreflexive. rf 1 ; mo; (po rf) + is irreflexive. rf 1 ; mo; mo is irreflexive. (no-future-read) (coherence-ww) (coherence-wr) (rmw-atomicity)

Hardware implementation of RA 32 RA is cheaper to implement than SC, but some architectures still require some fences. On Power, a lightweight fence (lwsync) suffices, and RA is still cheaper than SC. Release write lwsync ; store Acquire read load ; lwsync (or load ; bc ; isync) ARMv7 is like Power, but has no lightweight fence. Release write dmb ; store Acquire read load ; dmb (or load ; bc ; isb) ARMv8 has special release/acquire accesses. Alternative: acquire read load ; dmb ld On TSO, no fences are needed. (See also exercise.)

Mixing the models 33 COH < RA < SC Revisit the MP idiom: x := 42 y := 1 a := y while a do a := y b := x /0 We only need the last read of y to synchronize. Idea: introduce access modes. x := rlx 42 y := rel 1 a := y rlx while a do a := y a := y acq b := x rlx /0

Happens-before 34 Each memory accesses has a mode: Reads: rlx or acq Writes: rlx or rel RMWs: rlx, acq, rel or acq-rel Strength order is given by (the transitive closure of): Synchronization: rlx acq rel acq-rel Happens-before: G.sw = [W rel ]; G.rf; [R acq ] G.hb = (G.po G.sw) +

Towards C/C++11 memory model 35 SC: po rf mo rb is acyclic COH: po loc rf mo rb is acyclic RA: (po rf) + loc mo rb is acyclic C11: hb loc rf mo rb is acyclic Definition Let mo be a modification order for an execution graph G. G is called C11-consistent wrt mo if hb loc rf mo rb is acyclic (where rb = G.rf 1 ; mo \ id). Definition An execution graph G is C11-consistent if the following hold: G is complete G is C11-consistent wrt some modification order mo for G.

The C/C++11 memory model 36 nonatomic relaxed release/ acquire sc The full C/C++11 is more general: Non-atomics for non-racy code (the default!) Four types of fences for fine grained control SC accesses to ensure sequential consistency if needed More elaborate definition of sw ( release sequences )

Summary 37 A declarative approach for (weak) concurrency semantics Uniformly and modularly handle various models Important weakenings of SC (coherence, RA) with alternative formulations based on bad patterns Introduction to the C/C++11 memory model

Exercise: Extended coherence order 38 Let G be a coherent execution wrt some modification order mo for G. Let eco = (rf mo rb) +. 1. Does eco totally order all accesses to a given location x? 2. Provide a simplification of eco that avoids the use of transitive closure.

Exercise: RA vs. TSO 39 Do RA and TSO have the same behaviors? 1. Construct a program with two threads that has different outcomes under TSO and RA. 2. Construct a program without write-write races that distinguishes the two models.

Exercise: Write-before 40 Let G be a complete execution without RMW events and let: wb = [W]; (po rf) + loc ; [W] ([W]; (po rf) + loc ; rf 1 ; [W]\id) 1. Show that G is RA-consistent iff wb is acyclic. 2. (Optional, difficult) Extend the definition of wb to work for executions with RMW events. 3. (Optional, difficult) Can one define an analogue wb relation in terms of just G, such that G is SC-consistent iff wb is acyclic?