C++ 11 Memory Consistency Model. Sebastian Gerstenberg NUMA Seminar

Similar documents
The C++ Memory Model. Rainer Grimm Training, Coaching and Technology Consulting

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

Chenyu Zheng. CSCI 5828 Spring 2010 Prof. Kenneth M. Anderson University of Colorado at Boulder

Memory Consistency Models

Distributed Operating Systems Memory Consistency

Unit 12: Memory Consistency Models. Includes slides originally developed by Prof. Amir Roth

Enhancing std::atomic_flag for waiting.

Shared Memory Programming with OpenMP. Lecture 8: Memory model, flush and atomics

Overview: Memory Consistency

Memory Consistency Models. CSE 451 James Bornholt

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

Relaxed Memory Consistency

SPIN, PETERSON AND BAKERY LOCKS

Weak memory models. Mai Thuong Tran. PMA Group, University of Oslo, Norway. 31 Oct. 2014

Memory Models for C/C++ Programmers

Relaxed Memory-Consistency Models

Advanced concurrent programming in Java Shared objects

Memory barriers in C

Implementing the C11 memory model for ARM processors. Will Deacon February 2015

C++ Memory Model. Martin Kempf December 26, Abstract. 1. Introduction What is a Memory Model

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Synchronization SPL/2010 SPL/20 1

Parallel Numerical Algorithms

Beyond Sequential Consistency: Relaxed Memory Models

CS5460: Operating Systems

Lock-Free, Wait-Free and Multi-core Programming. Roger Deran boilerbay.com Fast, Efficient Concurrent Maps AirMap

Example: The Dekker Algorithm on SMP Systems. Memory Consistency The Dekker Algorithm 43 / 54

Foundations of the C++ Concurrency Memory Model

HSA MEMORY MODEL HOT CHIPS TUTORIAL - AUGUST 2013 BENEDICT R GASTER

CMSC 433 Programming Language Technologies and Paradigms. Synchronization

Language- Level Memory Models

P1202R1: Asymmetric Fences

Order Is A Lie. Are you sure you know how your code runs?

CS510 Advanced Topics in Concurrency. Jonathan Walpole

Multiprocessor Synchronization

Multithreading Deep Dive. my twitter. a deep investigation on how.net multithreading primitives map to hardware and Windows Kernel.

Using Relaxed Consistency Models

The C/C++ Memory Model: Overview and Formalization

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Audience. Revising the Java Thread/Memory Model. Java Thread Specification. Revising the Thread Spec. Proposed Changes. When s the JSR?

Portland State University ECE 588/688. Memory Consistency Models

G52CON: Concepts of Concurrency

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

SHARED-MEMORY COMMUNICATION

Advanced OpenMP. Memory model, flush and atomics

A brief introduction to OpenMP

11/19/2013. Imperative programs

G52CON: Concepts of Concurrency

Weak Memory Models: an Operational Theory

Other consistency models

Memory model for multithreaded C++: August 2005 status update

Atomic operations in C

Manchester University Transactions for Scala

Sharing Objects Ch. 3

Module 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency

C++ Memory Model. Don t believe everything you read (from shared memory)

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

7/6/2015. Motivation & examples Threads, shared memory, & synchronization. Imperative programs

Concurrent Programming Lecture 10

CS252 Spring 2017 Graduate Computer Architecture. Lecture 15: Synchronization and Memory Models Part 2

Motivation & examples Threads, shared memory, & synchronization

Concurrent Objects. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Programming Languages

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Review of last lecture. Goals of this lecture. DPHPC Overview. Lock-based queue. Lock-based queue

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue

Shared Memory Consistency Models: A Tutorial

Nondeterminism is Unavoidable, but Data Races are Pure Evil

C++ Memory Model Tutorial

Multiprocessors and Locking

Copyright Khronos Group Page 1. OpenCL BOF SIGGRAPH 2013

Adding std::*_semaphore to the atomics clause.

The Java Memory Model

GPU Concurrency: Weak Behaviours and Programming Assumptions

New Programming Abstractions for Concurrency in GCC 4.7. Torvald Riegel Red Hat 12/04/05

The New Java Technology Memory Model

OpenMP dynamic loops. Paolo Burgio.

Martin Kruliš, v

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

CSE 160 Lecture 7. C++11 threads C++11 memory model

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Cost of Your Programs

Overhauling SC atomics in C11 and OpenCL

Introduction to Parallel Computing

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers

Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

Outline. 1 Processes and Threads. 2 Synchronization. 3 Memory Management 1 / 45

SELECTED TOPICS IN COHERENCE AND CONSISTENCY

Parallel Algorithm Engineering

Concurrency. Johan Montelius KTH

The Art of Parallel Processing

CS533 Concepts of Operating Systems. Jonathan Walpole

Efficient waiting for concurrent programs

<atomic.h> weapons. Paolo Bonzini Red Hat, Inc. KVM Forum 2016

Overview of Lecture 4. Memory Models, Atomicity & Performance. Ben-Ari Concurrency Model. Dekker s Algorithm 4

Memory Consistency Models

What is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.

Transcription:

C++ 11 Memory Gerstenberg NUMA Seminar

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 2

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 3

Sequential Consistency "... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. -Leslie Lamport Chart 4

Sequential Consistency - Ordering Maintaining program order among operations on individual processors Dekkers Algorithm: P1 P2 Flag1 = 1; Flag2 = 1; If(Flag2 == 0) if(flag1 == 0) critical section critical section Chart 5

Sequential Consistency - Atomicity Maintaining a single sequential order among operations of all processors A = B = C = D = 0 P1 P2 P3 P4 A = 1; A = 2; while(b!=1){;} while(b!=1){;} B = 1; C = 1; while(c!=1){;} while(c!=1){;} print(a); print(a); Chart 6

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 7

Violation in UMA Systems P1 P2 Flag1 = 1; Flag2 = 1; If(Flag2 == 0) if(flag1 == 0) critical section critical section Chart 8 http://www.hpl.hp.com/techreports/compaq-dec/wrl-95-7.pdf

Violation in NUMA Systems P1 P2 Data = 1000; while(!head) {;} Head = 1; work on data Chart 9 http://www.hpl.hp.com/techreports/compaq-dec/wrl-95-7.pdf

Compiler Dekkers Algorithm, g++ -O2, read and write are switched Chart 10

Out of Order Execution Processor avoids being idle by executing instructions out of order Weak Memory Model (PowerPC, ARM) may reorder any instructions exception: data dependency ordering: x = 1; x = 1; y = 2; y = x; may be reordered may not be reordered Strong Memory Model (X86, SPARC) stricter rules apply to reordering (x86 allows only store-load reordering) Chart 11

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 12

C++ 11 std::atomic Strictly enforces Sequential Consistency (default) by giving three guarantees: Operations on std::atomic is atomic No instruction reordering past std::atomic operations No out-of-order execution of std::atomic operations Similar to Java & C# volatile keyword (not similar to C++ volatile!) Chart 13

C++ 11 std::atomic header <atomic> Template load, store, compare_exchange operations allow a specific memory order sequential consistency by default Specialization for integral types (int, char, bool ) specialized instructions (and operator overloading) for integral types fetch_add/sub (+=, -=) fetch_and/or/xor (&=, =, ^=) operator++/-- Chart 14

C++ 11 std::atomic - assembler Chart 15

C++ 11 std::atomic - assembler Chart 16

C++ 11 std::atomic - assembler Chart 17

C++ 11 Atomics Memory Ordering Different memory models can be applied to specific operations memory_order_seq_cst: default enforces sequential consistency memory_order_acquire: load only (needs associated release) all writes before release are visible side effects after this operation memory_order_release: store only (needs associated acquire) preceding writes are visible after associated acquire operation memory_order_acq_rel: combination of both acquire and release memory_order_relaxed: no memory ordering, atomicity only Chart 18

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 19

Memory Barriers void produce() { payload = 42; guard.store(1, std::memory_order_release) } void consume(int iterations) { for(int i = 0; i < iterations; i++){ if(guard.load(std::memory_order_acquire)) result[i] = payload; } } Chart 20

Memory Barriers Intel x86 ARM V7 PowerPC Chart 21 http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/

Memory Barries 1000 iterations: Intel x86: strong memory model implicit acquire-release consistency ARM v7, PowerPC: weak memory model casual consistency needs memory barriers for acquire-release consistency Chart 22 http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/

Memory Models CPU Architecture Chart 23 http://preshing.com/20120930/weak-vs-strong-memory-models/

Agenda 1. Sequential Consistency 2. Violation of Sequential Consistency Non-Atomic Operations Instruction Reordering 3. C++ 11 Memory 4. Trade-Off - Examples 5. Conclusion Chart 24

Conclusion std::atomics provide simple, multiplatform, lock-free thread synchronization at the cost of runtime performance through enforcing atomicity of longrunning operations locally disabling compiler optimization locally disabling out-of-order execution the performance impact can be reduced by using atomics sparcely (obviously) specifying special memory ordering when ever possible. Chart 25

Sources http://en.cppreference.com/w/cpp/atomic/atomic http://www.hpl.hp.com/techreports/compaq-dec/wrl-95-7.pdf http://preshing.com https://peeterjoot.wordpress.com/tag/memory-barrier/ http://www.intel.com/content/www/us/en/architecture-andtechnology/64-ia-32-architectures-software-developer-vol-2amanual.html http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memorymodel-and-modern-hardware/ Image sources as listed below each image Chart 26

Thank you for your attention! Gerstenberg NUMA Seminar