Efficient waiting for concurrent programs


Doc number: P0514R2
Revises: P0514R0-1, P0126R0-2, and N4195
Date: 2017-10-09
Project: Programming Language C++, Concurrency Working Group
Reply-to: Olivier Giroux <ogiroux@nvidia.com>

The current atomic objects make it easy to implement inefficient blocking synchronization in C++, because they offer no way to wait that is more efficient than polling. One resulting problem is poor system performance under oversubscription and/or contention. Another is high energy consumption under contention, regardless of oversubscription.

The current atomic_flag object does nothing to help with this problem, despite a name that suggests it is suitable for this use. Its interface is tightly fitted to the demands of the simplest spinlocks, without contention or energy mitigation beyond what timed back-off can achieve.

We propose to create new specialized atomic operations, and thread synchronization object types, that will likely replace atomic_flag in practice.

A simple abstraction for scalable waiting

Semaphores are lightweight synchronization primitives that control concurrent access to a shared resource. A binary semaphore, then, is analogous to a mutex with no thread ownership semantics. This concept is behind our new proposed type: std::binary_semaphore.

Objects of class binary_semaphore are easily adapted to serve the role of a mutex:

    struct semaphore_mutex {
        void lock() { s.acquire(); }
        void unlock() { s.release(); }
    private:
        std::binary_semaphore s{1};
    };

A counting semaphore type, std::counting_semaphore, is proposed alongside it. It regulates shared access to a resource that is not mutually exclusive but is bounded by a maximum degree of concurrency.

Moving beyond new semaphore types, we propose atomic free functions that enable pre-existing algorithms expressed in terms of atomics to benefit from the same efficient support behind semaphores:

    struct simple_lock {
        void lock() {
            bool old;
            while (!b.compare_exchange_weak(old = false, true))
                std::atomic_wait(&b, true);
        }
        void unlock() {
            b = false;
            std::atomic_notify_one(&b);
        }
    private:
        std::atomic<bool> b = ATOMIC_VAR_INIT(false);
    };

Note that in high-quality implementations this necessitates a semaphore table owned by the implementation, which causes some unavoidable interference due to aliasing of unrelated atomic updates.

For greater control over this sort of interference, we introduce the final type in this proposal: class condition_variable_atomic. With this last facility, we can manage false sharing of synchronization state and achieve higher performance:

    struct improved_simple_lock {
        void lock() {
            bool old;
            while (!b.compare_exchange_weak(old = false, true))
                s.wait(b, true);
        }
        void unlock() {
            b = false;
            s.notify_one(b);
        }
    private:
        std::atomic<bool> b = ATOMIC_VAR_INIT(false);
        std::condition_variable_atomic s;
    };

Reference implementation

A reference implementation is available at https://github.com/ogiroux/semaphore. Please see P0514R0, P0514R1, P0126 and N4195 for additional analysis not repeated here.
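The "semaphore table owned by the implementation" mentioned above can be pictured with a small emulation. The following is only a sketch under stated assumptions, not the reference implementation: the namespace sketch, the type wait_slot, the helper slot_for, the table size of 16 and the function run_simple_lock_demo are all hypothetical names chosen for illustration, and blocking is emulated with a mutex and condition variable rather than a native operating-system facility. It shows how the address of an atomic object selects a slot, and why unrelated atomics that alias to the same slot interfere with each other.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

namespace sketch {

// One entry of the hypothetical implementation-owned wait table.
struct wait_slot {
    std::mutex m;
    std::condition_variable cv;
};

// Maps an object's address to a slot. Unrelated atomics can alias to the
// same slot, which is the interference the text describes.
inline wait_slot& slot_for(const void* address) {
    static wait_slot table[16];  // table size is an arbitrary assumption
    const auto bits = reinterpret_cast<std::uintptr_t>(address);
    return table[(bits >> 4) % 16];
}

// Step 1: return once the loaded value differs from old.
// Step 2: otherwise block on the slot and repeat.
template <class T>
void atomic_wait(const std::atomic<T>* object, T old) {
    wait_slot& slot = slot_for(object);
    std::unique_lock<std::mutex> guard(slot.m);
    while (object->load() == old)
        slot.cv.wait(guard);
}

template <class T>
void atomic_notify_all(const std::atomic<T>* object) {
    wait_slot& slot = slot_for(object);
    // Passing through the mutex orders this call after any waiter that has
    // already checked the value but not yet entered the waiting state.
    { std::lock_guard<std::mutex> guard(slot.m); }
    slot.cv.notify_all();
}

// Because several objects may share a slot, waking a single thread could
// wake a waiter on the wrong object, so this sketch wakes the whole slot.
template <class T>
void atomic_notify_one(const std::atomic<T>* object) {
    sketch::atomic_notify_all(object);
}

// The simple_lock from the text, expressed against the sketch.
struct simple_lock {
    void lock() {
        bool old;
        while (!b.compare_exchange_weak(old = false, true))
            sketch::atomic_wait(&b, true);
    }
    void unlock() {
        assert(b.load());  // unlock is only valid while the lock is held
        b = false;
        sketch::atomic_notify_one(&b);
    }
private:
    std::atomic<bool> b{false};
};

// Four threads increment a plain int under the lock.
inline int run_simple_lock_demo() {
    simple_lock lock;
    int counter = 0;
    std::thread workers[4];
    for (auto& w : workers)
        w = std::thread([&] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& w : workers) w.join();
    return counter;
}

}  // namespace sketch
```

Calling sketch::run_simple_lock_demo() is expected to return 4000, since every increment happens while the lock is held. The design choice to wake every waiter in a slot trades extra wake-ups for correctness under aliasing; a per-object facility avoids that trade-off.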

C++ Proposed Wording

Apply the following edits to N4687, the working draft of the Standard. The feature test macro __cpp_lib_semaphore should be added.

Modify 32.2 Header <atomic> synopsis [atomics.syn]

    // 32.9, fences
    extern "C" void atomic_thread_fence(memory_order) noexcept;
    extern "C" void atomic_signal_fence(memory_order) noexcept;

    // 32.10, waiting and notifying functions
    template <class T> void atomic_notify_one(const volatile atomic<T>*);
    template <class T> void atomic_notify_one(const atomic<T>*);
    template <class T> void atomic_notify_all(const volatile atomic<T>*);
    template <class T> void atomic_notify_all(const atomic<T>*);
    template <class T> void atomic_wait_explicit(const volatile atomic<T>*,
                                                 typename atomic<T>::value_type,
                                                 memory_order);
    template <class T> void atomic_wait_explicit(const atomic<T>*,
                                                 typename atomic<T>::value_type,
                                                 memory_order);
    template <class T> void atomic_wait(const volatile atomic<T>*,
                                        typename atomic<T>::value_type);
    template <class T> void atomic_wait(const atomic<T>*,
                                        typename atomic<T>::value_type);

Add 32.10 Waiting and notifying functions [atomics.waitnotify]

1 This section provides a mechanism to wait for the value of an atomic object to change more efficiently than can be achieved with polling. Waiting functions in this facility may block until they are unblocked by notifying functions, according to each function's effects. [Note: Programs are not guaranteed to observe transient atomic values, an issue known as the A-B-A problem, resulting in continued blocking if a condition is only temporarily met. - end note]

2 The functions atomic_wait and atomic_wait_explicit are waiting functions. The functions atomic_notify_one and atomic_notify_all are notifying functions.

    template <class T> void atomic_notify_one(const volatile atomic<T>* object);
    template <class T> void atomic_notify_one(const atomic<T>* object);

3 Effects: Unblocks up to one thread that blocked after observing the result of an atomic operation X, if there exists another atomic operation Y, such that X precedes Y in the modification order of *object, and Y happens before this call.

    template <class T> void atomic_notify_all(const volatile atomic<T>* object);
    template <class T> void atomic_notify_all(const atomic<T>* object);

4 Effects: Unblocks each thread that blocked after observing the result of an atomic operation X, if there exists another atomic operation Y, such that X precedes Y in the modification order of *object, and Y happens before this call.

    template <class T> void atomic_wait_explicit(const volatile atomic<T>* object,
                                                 typename atomic<T>::value_type old,
                                                 memory_order order);
    template <class T> void atomic_wait_explicit(const atomic<T>* object,
                                                 typename atomic<T>::value_type old,
                                                 memory_order order);

5 Requires: The order argument shall be neither memory_order_release nor memory_order_acq_rel.

6 Effects: Repeatedly performs the following steps, in order:
   1. Evaluates object->load(order) != old then, if the result is true, returns.
   2. Blocks until an implementation-defined condition has been met. [Note: Consequently, it may unblock for reasons other than a call to a notifying function. - end note]

    template <class T> void atomic_wait(const volatile atomic<T>* object,
                                        typename atomic<T>::value_type old);
    template <class T> void atomic_wait(const atomic<T>* object,
                                        typename atomic<T>::value_type old);

7 Effects: Equivalent to: atomic_wait_explicit(object, old, memory_order_seq_cst);

Modify 33.1 General [thread.general]

Table 140 Thread support library summary

    Subclause                    Header(s)
    33.2  Requirements
    33.3  Threads                <thread>
    33.4  Mutual exclusion       <mutex> <shared_mutex>
    33.5  Condition variables    <condition_variable>
    33.6  Futures                <future>
    33.7  Semaphores             <semaphore>

Modify 33.5 Condition variables [thread.condition]

1 Condition variables provide synchronization primitives used to block a thread until notified by some other thread that some condition is met or until a system time is reached. Class condition_variable provides a condition variable that can only wait on an object of type unique_lock<mutex>, allowing maximum efficiency on some platforms. Class condition_variable_any provides a general condition variable that can wait on objects of user-supplied lock types. Class condition_variable_atomic provides a specialized condition variable that evaluates predicates over a single object of class atomic<T>, without using a lock.

2 Condition variables permit concurrent invocation of the wait, wait_for, wait_until, notify_one and notify_all member functions.

3 The execution of notify_one and notify_all shall be atomic. The execution of wait, wait_for, and wait_until shall be performed in up to three atomic parts:
   1. the release of any user-supplied lock, or the evaluation of a predicate over an object of class atomic<T>, and entry into the waiting state;
   2. the unblocking of the wait; and
   3. the reacquisition of any user-supplied lock.

Modify 33.5.1 Header <condition_variable> synopsis [condition_variable.syn]

    namespace std {
      class condition_variable;
      class condition_variable_any;
      class condition_variable_atomic;

      void notify_all_at_thread_exit(condition_variable& cond, unique_lock<mutex> lk);

      enum class cv_status { no_timeout, timeout };
    }

Add 33.5.5 Class condition_variable_atomic [thread.condition.condvaratomic]

1 Class condition_variable_atomic is used with an object of class atomic<T>, without the need to hold a lock. It is unspecified whether operations on class condition_variable_atomic are lock-free.

2 The member functions wait, wait_for, and wait_until are waiting functions. The member functions notify_one and notify_all are notifying functions.

    namespace std {
      class condition_variable_atomic {
      public:
        condition_variable_atomic();
        ~condition_variable_atomic();

        condition_variable_atomic(const condition_variable_atomic&) = delete;
        condition_variable_atomic& operator=(const condition_variable_atomic&) = delete;

        template <class T>
          void notify_one(const atomic<T>&) noexcept;
        template <class T>
          void notify_one(const volatile atomic<T>&) noexcept;
        template <class T>
          void notify_all(const atomic<T>&) noexcept;
        template <class T>
          void notify_all(const volatile atomic<T>&) noexcept;

        template <class T>
          void wait(const volatile atomic<T>&, typename atomic<T>::value_type,
                    memory_order = memory_order_seq_cst);
        template <class T>
          void wait(const atomic<T>&, typename atomic<T>::value_type,
                    memory_order = memory_order_seq_cst);
        template <class T, class Predicate>
          void wait(const volatile atomic<T>&, Predicate pred,
                    memory_order = memory_order_seq_cst);
        template <class T, class Predicate>
          void wait(const atomic<T>&, Predicate pred,
                    memory_order = memory_order_seq_cst);

        template <class T, class Clock, class Duration>
          bool wait_until(const volatile atomic<T>&, typename atomic<T>::value_type,
                          chrono::time_point<Clock, Duration> const&,
                          memory_order = memory_order_seq_cst);
        template <class T, class Clock, class Duration>
          bool wait_until(const atomic<T>&, typename atomic<T>::value_type,
                          chrono::time_point<Clock, Duration> const&,
                          memory_order = memory_order_seq_cst);
        template <class T, class Predicate, class Clock, class Duration>
          bool wait_until(const volatile atomic<T>&, Predicate pred,
                          chrono::time_point<Clock, Duration> const&,
                          memory_order = memory_order_seq_cst);
        template <class T, class Predicate, class Clock, class Duration>
          bool wait_until(const atomic<T>&, Predicate pred,
                          chrono::time_point<Clock, Duration> const&,
                          memory_order = memory_order_seq_cst);

        template <class T, class Rep, class Period>
          bool wait_for(const volatile atomic<T>&, typename atomic<T>::value_type,
                        chrono::duration<Rep, Period> const&,
                        memory_order = memory_order_seq_cst);
        template <class T, class Rep, class Period>
          bool wait_for(const atomic<T>&, typename atomic<T>::value_type,
                        chrono::duration<Rep, Period> const&,
                        memory_order = memory_order_seq_cst);
        template <class T, class Predicate, class Rep, class Period>
          bool wait_for(const volatile atomic<T>&, Predicate pred,
                        chrono::duration<Rep, Period> const&,
                        memory_order = memory_order_seq_cst);
        template <class T, class Predicate, class Rep, class Period>
          bool wait_for(const atomic<T>&, Predicate pred,
                        chrono::duration<Rep, Period> const&,
                        memory_order = memory_order_seq_cst);
      };
    }

    condition_variable_atomic();

1 Effects: Constructs an object of type condition_variable_atomic.

2 Throws: system_error when an exception is required (33.2.2).

3 Error conditions: resource_unavailable_try_again if some non-memory resource limitation prevents initialization.

    ~condition_variable_atomic();

4 Requires: For every function call that blocks on *this, a function call that will cause it to unblock and return shall happen before this call. [Note: This relaxes the usual rules, which would have required all wait calls to happen before destruction. - end note]

5 Effects: Destroys the object.
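To make the interface declared in the synopsis above more concrete, here is a minimal emulation of a small subset of it. It is a sketch under loud assumptions, not the proposed implementation: the class lives in a hypothetical namespace sketch, only the non-volatile value-based wait and wait_for overloads are modeled, and the internal mutex and std::condition_variable are artifacts of the emulation (the proposal itself is about waiting without a user-visible lock). The helper functions times_out_when_unchanged and wakes_when_changed are likewise invented for illustration. The point it shows is that the synchronization state belongs to one object chosen by the user rather than to a shared table.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

namespace sketch {

class condition_variable_atomic {
public:
    condition_variable_atomic() = default;
    condition_variable_atomic(const condition_variable_atomic&) = delete;
    condition_variable_atomic& operator=(const condition_variable_atomic&) = delete;

    // Wakes every thread blocked on *this. Waiters re-check their own
    // atomic object, so threads waiting on other objects go back to sleep.
    template <class T>
    void notify_all(const std::atomic<T>&) noexcept {
        { std::lock_guard<std::mutex> guard(m_); }
        cv_.notify_all();
    }

    // A single internal condition variable cannot target one object, so
    // this sketch forwards to notify_all to avoid waking the wrong waiter.
    template <class T>
    void notify_one(const std::atomic<T>& object) noexcept {
        notify_all(object);
    }

    // Returns once object.load(order) != old.
    template <class T>
    void wait(const std::atomic<T>& object, T old,
              std::memory_order order = std::memory_order_seq_cst) {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [&] { return object.load(order) != old; });
    }

    // Returns true if the value changed, false if the timeout expired first.
    template <class T, class Rep, class Period>
    bool wait_for(const std::atomic<T>& object, T old,
                  const std::chrono::duration<Rep, Period>& rel_time,
                  std::memory_order order = std::memory_order_seq_cst) {
        std::unique_lock<std::mutex> guard(m_);
        return cv_.wait_for(guard, rel_time,
                            [&] { return object.load(order) != old; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
};

// Nobody changes the value, so the timed wait reports a timeout.
inline bool times_out_when_unchanged() {
    condition_variable_atomic cva;
    std::atomic<int> value{0};
    return !cva.wait_for(value, 0, std::chrono::milliseconds(10));
}

// A second thread stores a new value and notifies; the waiter then returns.
inline int wakes_when_changed() {
    condition_variable_atomic cva;
    std::atomic<int> value{0};
    std::thread writer([&] {
        value.store(7);
        cva.notify_all(value);
    });
    cva.wait(value, 0);
    assert(value.load() != 0);
    writer.join();  // join before cva is destroyed
    return value.load();
}

}  // namespace sketch
```

Joining the writer before the object goes out of scope mirrors the destructor requirement above: every call that could block on the object has already been unblocked when destruction begins.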

    template <class T>
      void notify_one(const volatile atomic<T>& object) noexcept;
    template <class T>
      void notify_one(const atomic<T>& object) noexcept;

6 Effects: If any threads are blocked on *this and object, unblocks one of those threads.

    template <class T>
      void notify_all(const volatile atomic<T>& object) noexcept;
    template <class T>
      void notify_all(const atomic<T>& object) noexcept;

7 Effects: Unblocks all threads that are blocked on *this and object.

    template <class T>
      void wait(const volatile atomic<T>& object, typename atomic<T>::value_type old,
                memory_order order = memory_order_seq_cst);
    template <class T>
      void wait(const atomic<T>& object, typename atomic<T>::value_type old,
                memory_order order = memory_order_seq_cst);
    template <class T, class Predicate>
      void wait(const volatile atomic<T>& object, Predicate pred,
                memory_order order = memory_order_seq_cst);
    template <class T, class Predicate>
      void wait(const atomic<T>& object, Predicate pred,
                memory_order order = memory_order_seq_cst);

8 Effects: Repeatedly performs the following steps, in order:
   a) For the overloads that take Predicate, evaluates pred(object.load(order)), and for the others, evaluates object.load(order) != old. If the result is true, returns.
   b) Blocks on *this and object until an implementation-defined condition has been met. [Note: Consequently, it may unblock for reasons other than a call to a notifying function. - end note]

    template <class T, class Clock, class Duration>
      bool wait_until(const volatile atomic<T>& object, typename atomic<T>::value_type old,
                      chrono::time_point<Clock, Duration> const& abs_time,
                      memory_order order = memory_order_seq_cst);
    template <class T, class Clock, class Duration>
      bool wait_until(const atomic<T>& object, typename atomic<T>::value_type old,
                      chrono::time_point<Clock, Duration> const& abs_time,
                      memory_order order = memory_order_seq_cst);
    template <class T, class Predicate, class Clock, class Duration>
      bool wait_until(const volatile atomic<T>& object, Predicate pred,
                      chrono::time_point<Clock, Duration> const& abs_time,
                      memory_order order = memory_order_seq_cst);
    template <class T, class Predicate, class Clock, class Duration>
      bool wait_until(const atomic<T>& object, Predicate pred,
                      chrono::time_point<Clock, Duration> const& abs_time,
                      memory_order order = memory_order_seq_cst);
    template <class T, class Rep, class Period>
      bool wait_for(const volatile atomic<T>& object, typename atomic<T>::value_type old,
                    chrono::duration<Rep, Period> const& rel_time,
                    memory_order order = memory_order_seq_cst);
    template <class T, class Rep, class Period>
      bool wait_for(const atomic<T>& object, typename atomic<T>::value_type old,
                    chrono::duration<Rep, Period> const& rel_time,
                    memory_order order = memory_order_seq_cst);
    template <class T, class Predicate, class Rep, class Period>
      bool wait_for(const volatile atomic<T>& object, Predicate pred,
                    chrono::duration<Rep, Period> const& rel_time,
                    memory_order order = memory_order_seq_cst);
    template <class T, class Predicate, class Rep, class Period>
      bool wait_for(const atomic<T>& object, Predicate pred,
                    chrono::duration<Rep, Period> const& rel_time,
                    memory_order order = memory_order_seq_cst);

9 Effects: Repeatedly performs the following steps, in order:
   a) For the overloads that take Predicate, evaluates pred(object.load(order)), and for the others, evaluates object.load(order) != old. If the result is true, or with low probability if the result is false, returns the result.
   b) Blocks on *this and object until the timeout expires or an implementation-defined condition has been met. If the timeout expired, returns false. [Note: Consequently, it may unblock for reasons other than a call to a notifying function. - end note]

10 Throws: Timeout-related exceptions (33.2.4).

Add 33.7 Semaphores [thread.semaphores]

1 Semaphores are lightweight synchronization primitives that control concurrent access to a shared resource. They are widely used to implement other synchronization primitives and, whenever both are applicable, may be more efficient than condition variables. Class counting_semaphore models a non-negative resource count. Class binary_semaphore has only two states, also known as available and unavailable, and may be even more efficient than class counting_semaphore.

2 For purposes of determining the existence of a data race, all member functions of binary_semaphore and counting_semaphore (other than construction and destruction) behave as atomic operations on *this.
Add 33.7.1 Header <semaphore> synopsis [semaphore.syn]

    namespace std {
      class binary_semaphore;
      class counting_semaphore;
    }

Add 33.7.2 Class binary_semaphore [semaphore.binary]

    namespace std {
      class binary_semaphore {
      public:
        using count_type = implementation-defined; // see 33.7.2.1
        static constexpr count_type max = 1;

        binary_semaphore(count_type = 0);
        ~binary_semaphore();

        binary_semaphore(const binary_semaphore&) = delete;
        binary_semaphore& operator=(const binary_semaphore&) = delete;

        void release();
        void acquire();
        bool try_acquire();
        template <class Clock, class Duration>
          bool try_acquire_until(chrono::time_point<Clock, Duration> const&);
        template <class Rep, class Period>
          bool try_acquire_for(chrono::duration<Rep, Period> const&);

      private:
        count_type counter; // exposition only
      };
    }
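As an illustration of the class declared above, the following sketch emulates a binary semaphore with a mutex and a condition variable. It is an assumption-laden model for reading purposes only: the namespace sketch, the choice of int for count_type, and the helper run_handoff_demo are hypothetical, try_acquire_until is omitted, and a real implementation would be expected to block far more cheaply. The demo highlights the lack of thread ownership: one thread releases the semaphore and a different thread acquires it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

namespace sketch {

class binary_semaphore {
public:
    using count_type = int;  // assumption: any integral type would do
    static constexpr count_type max = 1;

    explicit binary_semaphore(count_type desired = 0) : counter_(desired) {
        assert(desired >= 0 && desired <= max);
    }
    binary_semaphore(const binary_semaphore&) = delete;
    binary_semaphore& operator=(const binary_semaphore&) = delete;

    // Makes the semaphore available and wakes one blocked thread, if any.
    void release() {
        {
            std::lock_guard<std::mutex> guard(m_);
            assert(counter_ < max);
            ++counter_;
        }
        cv_.notify_one();
    }

    // Takes the semaphore only if it is available right now.
    bool try_acquire() {
        std::lock_guard<std::mutex> guard(m_);
        if (counter_ == 0) return false;
        --counter_;
        return true;
    }

    // Blocks until the semaphore is available, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return counter_ >= 1; });
        --counter_;
    }

    // Like acquire, but gives up and returns false after rel_time.
    template <class Rep, class Period>
    bool try_acquire_for(const std::chrono::duration<Rep, Period>& rel_time) {
        std::unique_lock<std::mutex> guard(m_);
        if (!cv_.wait_for(guard, rel_time, [this] { return counter_ >= 1; }))
            return false;
        --counter_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    count_type counter_;
};

// The producer releases; the calling thread acquires. No thread "owns"
// the semaphore, unlike a mutex.
inline int run_handoff_demo() {
    binary_semaphore ready(0);
    int payload = 0;
    std::thread producer([&] {
        payload = 42;
        ready.release();
    });
    ready.acquire();
    producer.join();
    return payload;
}

}  // namespace sketch
```

Constructing the sketch with an initial count of 1 instead of 0 gives the mutex-like usage from the introduction, where acquire plays the role of lock and release the role of unlock.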

    using count_type = implementation-defined;

1 An integral type able to represent any value of type int between zero and max, inclusive.

    static constexpr count_type max = 1;

2 The maximum value that the semaphore can hold.

    constexpr binary_semaphore(count_type desired = 0);

3 Requires: desired is not negative, and no greater than max.

4 Effects: Initializes counter with the value desired.

    ~binary_semaphore();

5 Requires: For every function call that blocks on counter, a function call that will cause it to unblock and return shall happen before this call. [Note: This relaxes the usual rules, which would have required all wait calls to happen before destruction. - end note]

6 Effects: Destroys the object.

    void release();

7 Requires: counter < max.

8 Effects: Atomically increments counter by 1 then, if any threads are blocked on counter, unblocks at least one among them.

9 Synchronization: Synchronizes with invocations of try_acquire() that observe the result of the effects.

    bool try_acquire();

10 Effects: Atomically subtracts 1 from counter then, if the result is positive or zero, updates counter with the result. An implementation may spuriously fail to replace the value if there are contending invocations in other threads.

11 Returns: true if the value was replaced, otherwise false.

    void acquire();

12 Effects: Repeatedly performs the following steps, in order:
   a) Evaluates try_acquire() then, if the result is true, returns.
   b) Blocks until counter >= 1.

    template <class Clock, class Duration>
      bool try_acquire_until(chrono::time_point<Clock, Duration> const& abs_time);
    template <class Rep, class Period>
      bool try_acquire_for(chrono::duration<Rep, Period> const& rel_time);

13 Effects: Repeatedly performs the following steps, in order:
   a) Evaluates try_acquire(). If the result is true, returns true.
   b) Blocks until the timeout expires or counter >= 1. If the timeout expired, returns false.

14 Throws: Timeout-related exceptions (33.2.4).

Add 33.7.3 Class counting_semaphore [semaphore.counting]

    namespace std {
      class counting_semaphore {
      public:
        using count_type = implementation-defined; // see 33.7.3.1
        static constexpr count_type max = implementation-defined; // see 33.7.3.2

        counting_semaphore(count_type = 0);
        ~counting_semaphore();

        counting_semaphore(const counting_semaphore&) = delete;
        counting_semaphore& operator=(const counting_semaphore&) = delete;

        void release(count_type = 1);
        void acquire();
        bool try_acquire();
        template <class Clock, class Duration>
          bool try_acquire_until(chrono::time_point<Clock, Duration> const&);
        template <class Rep, class Period>
          bool try_acquire_for(chrono::duration<Rep, Period> const&);

      private:
        count_type counter; // exposition only
      };
    }

    using count_type = implementation-defined;

14 An integral type able to represent any value of type int between zero and max, inclusive.

    static constexpr count_type max = implementation-defined;

15 The maximum value that the semaphore can hold. [Note: max should be at least as large as the maximum number of threads the implementation can support. - end note]

    constexpr counting_semaphore(count_type desired = 0);

16 Requires: desired is not negative, and no greater than max.

17 Effects: Initializes counter with the value desired.

    ~counting_semaphore();

18 Requires: For every function call that blocks on *this, a function call that will cause it to unblock and return shall happen before this call. [Note: This relaxes the usual rules, which would have required all wait calls to happen before destruction. - end note]

19 Effects: Destroys the object.

    void release(count_type update = 1);

20 Requires: update > 0, and counter + update <= max.

21 Effects: Atomically increments counter by update. If any threads are blocked on counter, unblocks at least update among them.

22 Synchronization: Synchronizes with invocations of try_acquire() that observe the result of the effects.

    bool try_acquire();

23 Effects: Atomically decrements counter by 1 then, if the result is positive or zero, updates counter with the result. An implementation may spuriously fail to replace the value if there are contending invocations in other threads.

24 Returns: true if the value was replaced, otherwise false.

    void acquire();

25 Effects: Repeatedly performs the following steps, in order:
   a) Evaluates try_acquire(). If the result is true, returns.
   b) Blocks until counter >= 1.

    template <class Clock, class Duration>
      bool try_acquire_until(chrono::time_point<Clock, Duration> const& abs_time);
    template <class Rep, class Period>
      bool try_acquire_for(chrono::duration<Rep, Period> const& rel_time);

26 Effects: Repeatedly performs the following steps, in order:
   a) Evaluates try_acquire(). If the result is true, returns true.
   b) Blocks until the timeout expires or counter >= 1. If the timeout expired, returns false.

27 Throws: Timeout-related exceptions (33.2.4).
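The counting semaphore wording above can be exercised with a small model that bounds how many threads use a resource at once. This is a sketch under stated assumptions rather than a conforming implementation: the namespace sketch, the int count_type and the helper peak_concurrency are hypothetical, the upper bound max and the timed overloads are not modeled, and blocking is emulated with a mutex and condition variable. The demo records the largest number of threads that were inside the guarded region at the same time.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

class counting_semaphore {
public:
    using count_type = int;  // assumption: any integral type would do

    explicit counting_semaphore(count_type desired = 0) : counter_(desired) {
        assert(desired >= 0);
    }
    counting_semaphore(const counting_semaphore&) = delete;
    counting_semaphore& operator=(const counting_semaphore&) = delete;

    // Adds update permits and wakes blocked threads so that at least
    // update of them can proceed.
    void release(count_type update = 1) {
        {
            std::lock_guard<std::mutex> guard(m_);
            assert(update > 0);
            counter_ += update;
        }
        cv_.notify_all();
    }

    // Takes one permit only if one is available right now.
    bool try_acquire() {
        std::lock_guard<std::mutex> guard(m_);
        if (counter_ == 0) return false;
        --counter_;
        return true;
    }

    // Blocks until a permit is available, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return counter_ >= 1; });
        --counter_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    count_type counter_;
};

// Runs `threads` workers that each enter a region guarded by `permits`
// permits, and returns the highest number observed inside at once.
inline int peak_concurrency(int permits, int threads) {
    counting_semaphore slots(permits);
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < 100; ++i) {
                slots.acquire();
                const int now = ++inside;
                int seen = peak.load();
                // Raise the recorded peak if this thread saw a higher count.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                }
                --inside;
                slots.release();
            }
        });
    for (auto& w : workers) w.join();
    return peak.load();
}

}  // namespace sketch
```

With three permits and eight workers, the value returned by sketch::peak_concurrency(3, 8) never exceeds 3, because a thread can only be inside the region while it holds a permit.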