New Programming Abstractions for Concurrency. Torvald Riegel Red Hat 12/04/05

Similar documents
New Programming Abstractions for Concurrency in GCC 4.7. Torvald Riegel Red Hat 12/04/05

The C/C++ Memory Model: Overview and Formalization

CS510 Advanced Topics in Concurrency. Jonathan Walpole

Foundations of the C++ Concurrency Memory Model

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

RELAXED CONSISTENCY 1

C++ Memory Model. Don t believe everything you read (from shared memory)

Improving the Practicality of Transactional Memory

C11 Compiler Mappings: Exploration, Verification, and Counterexamples

1 Publishable Summary

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

Can Seqlocks Get Along with Programming Language Memory Models?

Relaxed Memory: The Specification Design Space

<atomic.h> weapons. Paolo Bonzini Red Hat, Inc. KVM Forum 2016

Draft Specification of Transactional Language Constructs for C++

Taming release-acquire consistency

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve

Transactional Memory in C++ Hans-J. Boehm. Google and ISO C++ Concurrency Study Group chair ISO C++ Transactional Memory Study Group participant

Lecture 12 Transactional Memory

Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois

The C1x and C++11 concurrency model

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

Program logics for relaxed consistency

The Java Memory Model

Multicore Programming: C++0x

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013

Reasoning about the C/C++ weak memory model

Shared Memory Consistency Models: A Tutorial

Memory barriers in C

Chí Cao Minh 28 May 2008

Threads and Locks. Chapter Introduction Locks

An Update on Haskell H/STM 1

Conflict Detection and Validation Strategies for Software Transactional Memory

Linearizability of Persistent Memory Objects

CS5460: Operating Systems

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

University of Karlsruhe (TH)

Linearizability of Persistent Memory Objects

High-level languages

An introduction to weak memory consistency and the out-of-thin-air problem

11/19/2013. Imperative programs

CSE332: Data Abstractions Lecture 23: Programming with Locks and Critical Sections. Tyler Robison Summer 2010

Threads Cannot Be Implemented As a Library

The Java Memory Model

Declarative semantics for concurrency. 28 August 2017

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Failure-atomic Synchronization-free Regions

Mohamed M. Saad & Binoy Ravindran

[Potentially] Your first parallel application

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Summary: Issues / Open Questions:

C++ Concurrency - Formalised

Transactionalizing Legacy Code: An Experience Report Using GCC and Memcached. Trilok Vyas, Yujie Liu, and Michael Spear Lehigh University

Memory Consistency Models

Portland State University ECE 588/688. Memory Consistency Models

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II)

Lock vs. Lock-free Memory Project proposal

C++ Memory Model Tutorial

Understanding Hardware Transactional Memory

Don t Start with Dekker s Algorithm: Top-Down

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Lecture 25: Synchronization. James C. Hoe Department of ECE Carnegie Mellon University

Audience. Revising the Java Thread/Memory Model. Java Thread Specification. Revising the Thread Spec. Proposed Changes. When s the JSR?

Relaxed Memory-Consistency Models

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

Scalable Software Transactional Memory for Chapel High-Productivity Language

The Multicore Transformation

A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 5 Programming with Locks and Critical Sections

CS 179: GPU Programming LECTURE 5: GPU COMPUTE ARCHITECTURE FOR THE LAST TIME

DBT Tool. DBT Framework

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Speculative Synchronization

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Exploiting Distributed Software Transactional Memory

Kernel Synchronization I. Changwoo Min

Memory Consistency Models: Convergence At Last!

CLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level

Distributed Operating Systems Memory Consistency

Programmazione di sistemi multicore

Other consistency models

Order Is A Lie. Are you sure you know how your code runs?

Repairing Sequential Consistency in C/C++11

6.852: Distributed Algorithms Fall, Class 20

Kotlin/Native concurrency model. nikolay

Enforcing Textual Alignment of

Scheduling Transactions in Replicated Distributed Transactional Memory

Advanced concurrent programming in Java Shared objects

Repairing Sequential Consistency in C/C++11

The New Java Technology Memory Model

Thread-Local. Lecture 27: Concurrency 3. Dealing with the Rest. Immutable. Whenever possible, don t share resources

Nondeterminism is Unavoidable, but Data Races are Pure Evil

EMBEDDED SYSTEMS PROGRAMMING More About Languages

SELF-TUNING HTM. Paolo Romano

Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Safe Optimisations for Shared-Memory Concurrent Programs. Tomer Raz

Transcription:

New Programming Abstractions for Concurrency Red Hat 12/04/05 1

Concurrency and atomicity C++11 atomic types Transactional Memory Provide atomicity for concurrent accesses by different threads Both based on C++11 memory model Single memory location Low-level abstraction, exposes HW primitives Any number of memory locations High-level abstraction, mixed SW/HW runtime support Talk s focus is on C++ but C11 has (very) similar support 2

Atomic types and accesses Making a type T atomic: atomic<t> Load, store: atomic<int> a; a = a + 1; a.store(a.load() + 1); CAS and other atomic read-modify-write: int exp = 0; a.compare_exchange_strong(exp, 1); previous = a.fetch_add(23); Sequential consistency is default All s-c ops in total order that is consistent with per-thread program orders Other weaker memory orders can be specified locked_flag.store(false, memory_order_release); Orders: acquire, acq_rel, release, relaxed, seq_cst, consume 3

Why a memory model? Defines multi-threaded executions (undefined pre C++11) Normal, nonatomic memory accesses Ordering of all operations enforced by atomic/synchronizing memory accesses Common ground for programmers and compilers Formalizations of the model exist (Batty et al. [1]) Base for testing tools, compiler testing, verification,... Unified abstraction for HW memory models Portable concurrent code (across HW and compilers) Simpler than several HW memory models 4

Happens-before (HB) Order of operations in a particular execution of a program Derived from / related to other relations: Sequenced-before (SB): single-thread program order Reads-from: which store op s value a load op reads Synchronizes with (SW) Example: acquire-load reads from release-store (both atomic) Total orders for seq_cst operations, lock acquisition/release Simplified: HB = transitive closure of SB U SW Compiler generates code that ensures some valid HB: Must be acyclic and consistent with all other relations/rules Generated code ensures HB on top of HW memory model 5

Data-race freedom (DRF) Data race: Nonatomic accesses, same location, at least one a store, not ordered by HB Any valid execution has a data race? => Undefined behavior Programs must be DRF Allows compiler to optimize Compiler must preserve DRF Access granularity (e.g., bitfields) Speculative stores, reordering, hoisting,... 6

Example: Programmer s POV Thread1 Publication: init(data); data_public.store(true, mo_release); Thread2 +SB reads-from + rel/acq = sync-with if (data_public.load(mo_acquire)) use(data); +SB =init happens-before use Programmers must beware of data races: temp = data; if (data_public.load(mo_acquire)) use(temp); Races with init Program behavior is undefined 7

Example: Can the compiler hoist a load? Can we load data earlier than the conditional? if (data_public.load(mo_acquire)) use(data); No! Introduces data race, defeats acquire HW barrier. if (data_public.load(mo_relaxed)) use(data); Yes! mo_relaxed doesn t contribute to happens-before. if (data_public.load(mo_acquire)) use(data); use_again(data); Yes!? data always loaded, nonatomic, no ordering by other sync. Can assume DRF program, so no concurrent write. 8

GCC status Atomics: efficient code generated on the major archs New atomic* builtins (replace sync*) Frontend: C++ works, C11 atomics are WIP libatomic: library fallback for non-native atomic ops Issues: Back-off in CAS loops? CAS_strong on LL/SC archs vs. C++ progress guarantees libatomic: Don t use 2-word CAS for 2-word atomics? Memory model: seems to work, more or less Need to audit GCC passes Is optimizing across atomics worthwhile? Need comprehensive testing to prevent future regressions 9

Transactional Memory (TM): What is it? TM is a programming abstraction Declare that several actions are atomic But don t have to implement how this is achieved TM implementations Are generic, not application-specific Several implementation possibilities: STM: Pure SW TM algorithms Blocking or nonblocking, fine- or coarse-granular locking,... HTM/HyTM: using additional HW support for TM E.g., Intel TSX, AMD ASF 10

Transactional language constructs for C/C++ Declare that compound statements, expressions, or functions must execute atomically transaction_atomic { if (x < 10) y++; } No data annotations or special data types required Existing (sequential) code can be used in transactions Language-level txns are a portable interface for HTM/STM HTM support can be delivered by a TM runtime library update Draft specification for C++ [2] HP, IBM, Intel, Oracle, Red Hat C++ standard study group on TM (SG5) C will be similar (GCC supports txns in C and C++) Feedback welcome! 11

How to synchronize with transactions? TM extends the C++11 memory model All transactions totally ordered Order contributes to Happens-Before (HB) TM implementation ensures some valid order that is consistent with HB Does not imply sequential execution! Data-race freedom still required init(data); transaction_atomic { data_public = true; } Correct: transaction_atomic { if (data_public) use(data); } Incorrect: transaction_atomic { temp = data; // Data race if (data_public) use(temp); } 12

Atomic vs. relaxed transactions Atomic Relaxed Atomic wrt.: All other code Only other transactions Restrictions on txnal code: No other synchronization (conservative, WIP) None Keyword: transaction_atomic transaction_relaxed Restrictions/safety of atomic txns checked at compile time Compiler analyzes code Additional function attributes to deal with multiple CUs WIP: dynamic checking at runtime instead of static checking (optional) 13

TM supports a modular programming model Programmers don t need to manage association between shared data and synchronization metadata (e.g., locks) TM implementation takes care of that Functions containing only txnal sync compose w/o deadlock, nesting order does not matter Example: void move(list& l1, list& l2, element e) { if (l1.remove(e)) l2.insert(e); } TM: transaction_atomic { move(a, B, 23); } Locks:? Early university user studies [3,4] suggest that txns lead to simpler programs with fewer errors compared to locking 14

GCC status Runs common TM benchmarks correctly Compiler: most of TM spec implemented Publication safety not guaranteed Need to restrict reordering across conditionals Uninstrumented code path not yet created TM type annotations not checked when casting libitm: general-purpose TM algorithms Common ABI with ICC Some divergence since most recent version of the spec Performance tuning (problem: lack of real world usage) Use HTMs that become available We need your workloads and use cases! 15

References [1] http://www.cl.cam.ac.uk/~pes20/cpp [2] https://sites.google.com/site/tmforcplusplus/ [3] Pankratius & Adl-Tabatabai, A study of transactional memory vs. locks in practice, in SPAA 2011 [4] Rossbach et al., Is Transactional Programming Actually Easier?, in PPoPP 2010 16