CS510 Advanced Topics in Concurrency. Jonathan Walpole

Similar documents
Threads Cannot Be Implemented As a Library

Foundations of the C++ Concurrency Memory Model

Synchronising Threads

Introduction to Parallel Programming Part 4 Confronting Race Conditions

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Parallel Computer Architecture Spring Memory Consistency. Nikos Bellas

Shared Memory Parallel Programming with Pthreads An overview

CSE 451 Midterm 1. Name:

Shared Memory Programming with OpenMP. Lecture 8: Memory model, flush and atomics

CS533 Concepts of Operating Systems. Jonathan Walpole

M4 Parallelism. Implementation of Locks Cache Coherence

Memory Consistency Models

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

CS533 Concepts of Operating Systems. Jonathan Walpole

Programming Language Memory Models: What do Shared Variables Mean?

C++ Memory Model. Don t believe everything you read (from shared memory)

The C/C++ Memory Model: Overview and Formalization

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

Coherence and Consistency

Threading and Synchronization. Fahd Albinali

Programming in Parallel COMP755

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4

CS 450 Exam 2 Mon. 4/11/2016

The New Java Technology Memory Model

Relaxed Memory-Consistency Models

CS5460: Operating Systems

Dealing with Issues for Interprocess Communication

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Operating Systems. Synchronization

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Review of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue

Advanced OpenMP. Memory model, flush and atomics

Advanced Operating Systems (CS 202)

RCU in the Linux Kernel: One Decade Later

The Java Memory Model

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

Programming Languages

ENCM 501 Winter 2019 Assignment 9

CS 241 Honors Concurrent Data Structures

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

Threads and Synchronization. Kevin Webb Swarthmore College February 15, 2018

New Programming Abstractions for Concurrency. Torvald Riegel Red Hat 12/04/05

The Java Memory Model

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

C++ Memory Model Tutorial

Using Relaxed Consistency Models

CMSC421: Principles of Operating Systems

An introduction to weak memory consistency and the out-of-thin-air problem

Section 5: Thread Synchronization

Other consistency models

Pre-lab #2 tutorial. ECE 254 Operating Systems and Systems Programming. May 24, 2012

CS 153 Design of Operating Systems Winter 2016

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Operating systems. Lecture 3 Interprocess communication. Narcis ILISEI, 2018

Multiprocessor Synchronization

CSE332: Data Abstractions Lecture 23: Programming with Locks and Critical Sections. Tyler Robison Summer 2010

Lecture 5: Synchronization w/locks

Solving the Producer Consumer Problem with PThreads

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

CSci 4061 Introduction to Operating Systems. Synchronization Basics: Locks

Thread-Local. Lecture 27: Concurrency 3. Dealing with the Rest. Immutable. Whenever possible, don t share resources

THREADS: (abstract CPUs)

Concurrent Processes Rab Nawaz Jadoon

Introducing Shared-Memory Concurrency

Advanced Threads & Monitor-Style Programming 1/24

Memory model for multithreaded C++: August 2005 status update

Concurrency. Glossary

Shared Memory Consistency Models: A Tutorial

CIS Operating Systems Synchronization based on Busy Waiting. Professor Qiang Zeng Spring 2018

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

Operating Systems, Assignment 2 Threads and Synchronization

Synchronization. Before We Begin. Synchronization. Credit/Debit Problem: Race Condition. CSE 120: Principles of Operating Systems.

C++ Memory Model. Martin Kempf December 26, Abstract. 1. Introduction What is a Memory Model

Synchronization. Before We Begin. Synchronization. Example of a Race Condition. CSE 120: Principles of Operating Systems. Lecture 4.

CS-537: Midterm Exam (Spring 2001)

GPU Concurrency: Weak Behaviours and Programming Assumptions

Concurrency: a crash course

Chapter 5: Process Synchronization. Operating System Concepts 9 th Edition

Part II Process Management Chapter 6: Process Synchronization

Chapter 6: Process Synchronization. Operating System Concepts 8 th Edition,

Shared-Memory Programming

CSE 374 Programming Concepts & Tools

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 8: Semaphores, Monitors, & Condition Variables

CSE Traditional Operating Systems deal with typical system software designed to be:

CS 220: Introduction to Parallel Computing. Introduction to CUDA. Lecture 28

New Programming Abstractions for Concurrency in GCC 4.7. Torvald Riegel Red Hat 12/04/05

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

CMSC 330: Organization of Programming Languages

Operating Systems (234123) Spring (Homework 3 Wet) Homework 3 Wet

CSE 153 Design of Operating Systems

Overview: Memory Consistency

CSE 120: Principles of Operating Systems. Lecture 4. Synchronization. October 7, Prof. Joe Pasquale

Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>

Systèmes d Exploitation Avancés

Administrivia. p. 1/20

DISTRIBUTED COMPUTER SYSTEMS

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

Transcription:

CS510 Advanced Topics in Concurrency Jonathan Walpole

Threads Cannot Be Implemented as a Library

Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to contain 0? Initialization: x = y = 0; Thread 1 x = 1; r1 = y; Thread 2 y = 1; r2 = x;

Sequential Consistency If your intuition is based on sequential consistency, you may think that r1 = r2 = 0 is invalid But sequential consistency does not hold on most (any?) modern architectures - Compilers may reorder memory operations under the (invalid) assumption that the program is sequential! - Hardware may reorder memory operations, as we will see in the next class - You are expected to use memory barriers if this hardware reordering would be problematic for your program!

The Problem Languages such as C and C++ do not support concurrency - C/C++ compilers do not implement it - Threads are implemented in library routines (i.e. Pthreads) - The compiler, unaware of threads, reorders operations as if it was dealing with a sequential program - Some of this reordering is invalid for concurrent programs But C programs with Pthreads work just fine don t they?

The Pthreads Solution Pthreads defines synchronization primitives - Pthread_mutex_lock - Pthread_mutex_unlock Programmers must use them to prevent concurrent access to memory - Define critical sections and protect them with mutex locks - Do not allow data races in the program

Pthreads Synch. Primitives Lock and unlock must be implemented carefully because they themselves contain data races! - Use memory barriers to prevent hardware reordering - Disable optimizations to prevent compiler reordering This seems ok, doesn t it? - Lock and unlock are implemented carefully - The code between lock and unlock is executed sequentially due to mutual exclusion enforced by locks - The code outside the critical section has no races

The Catch The programmer needs to determine where there is a race in order to place the locking primitives How can the programmer tell whether there is a data race? - Need to reason about ordering of memory operations to determine if there is a race - This requires the programming language to formally define a memory model - C / C++ do not!... well, they didn t before now - Without a memory model, how can you tell if there is a race?

Example Does this program contain a race? - Is it possible for either x or y to be modified? - Is x == y == 1 a possible outcome? Initially: x = y = 0; if (x == 1) ++y; if (y == 1) ++x;

Valid Compiler Transformations Assuming sequential consistency, there is no race But sequential consistency is a bad assumption! Also, a compiler could legally transform if (x == 1) ++y; if (y == 1) ++x; to ++y; if (x!= 1) --y; ++x; if (y!= 1) --x; This transformation produces a race! How can the compiler know not to do this?

The Programmer s Problem Programmer needs to know the constraints on compiler or hardware memory reordering in order to determine whether there is a race - Then needs to prevent the race by using the mutual exclusion primitives - Some CPUs define a formal memory consistency model - But the C programming language doesn t define a formal memory model, so its not possible to answer the first question with confidence in all cases!

The Compiler Developer s Problem Compilers need to know about concurrency in order to know that its not OK to use certain sequential optimizations - Code transformations that are valid for sequential programs but not concurrent ones - Alternatively, the compiler must be conservative all the time, leading to performance degradation for sequential programs

Another Example Rewriting adjacent data (bit fields) struct { int a:17; int b:15 } x; If thread 1 updates x.a and thread 2 updates x.b, is there a race?

Another Example Rewriting adjacent data (bit fields) struct { int a:17; int b:15 } x; x.a = 42 possibly implemented by the compiler as { } tmp = x; // Read both fields into a // 32 bit temporary variable tmp &= ~0x1ffff; // Mask off old a; tmp = 42; // Or in new value x = tmp; // Overwrite all of x

The Programmer s Problem An update to x.a also updates x.b by side effect! - There appears to be no race in the program, but this transparent compiler optimization creates one - Mutual exclusion primitives should have been used! but a separate lock for x.a and x.b will not do either! If x.a and x.b are protected using separate locks a thread updating only x.a would need to also acquire the lock for x.b - How is the programmer to know this?

The Compiler Developer s Problem How can you know that this optimization (which is fine for a sequential program) is not ok? - If your language has no concept of concurrency? The granularity of memory operations (bit, byte, word etc) is architecture specific - Programs that need to know this kind of detail are not portable! - If the language defined a minimum granularity for atomic updates, what should it be?

Register Promotion Register promotion can also introduce updates where there were none before. Example: conditional locking for multithreading: for (...) {... if (mt) pthread_mutex_lock (...); x =... x... if (mt) pthread_mutex_unlock (...); }

Register Promotion Register promotion can also introduce updates where there were non before. Example: conditional locking for multithreading: Transformed to

Register Promotion Register promotion can also introduce updates where there were non before. Example: conditional locking for multithreading r = x; for (...) {... if (mt) { x = r; pthread_mutex_lock (...); r = x; } r =... r... if (mt) { x = r; pthread_mutex_unlock (...); r = x; } }

The Problem The variable x is now accessed outside the critical section The compiler caused this problem (i.e., broke the program) - but how is the compiler supposed to know this is a mistake if it is unaware of concurrency and hence unaware of critical sections? Bottom line: the memory abstraction is under-defined!

Other Problems Sticking to the Pthreads rules prevents important performance optimizations - not just in the compiler It also makes programs based on non-blocking synchronization illegal! - NBS programs contain data races, by definition! Even with a formal memory model, how would we extend the Pthreads approach to cover NBS?

C++11 Memory Model Compiler optimizations are not allowed to introduce data races! - can be controlled via compilation flags Atomic types and operations - implemented via mutual exclusion - or by lock-free algorithms - or directly by underlying atomic instructions

Avoiding Data Races Compiler optimizations must not introduce stores on code paths that don t have them Packed data: - each field/object is its own memory location - compiler not allowed to write other memory locations - packing small fields in to a word is no longer allowed - use half-word or byte operations - bit fields can still race... could be implemented using CAS

Avoiding Data Races (cont.) Ongoing analysis of existing optimizations to see if they generate races or not...

Atomic Types Operations on atomic types are indivisible Memory coherence enforced per object - each object has a single modification order - atomic writes to the object are serialized Choice of synchronization modes - memory ordering semantics defined per operation Ongoing work on lock-free implementations of atomic types

Synchronization Modes Memory model synchronization modes give a choice of memory ordering semantics - Sequentially Consistent (strong, the default) - Acquire Release (weaker) - Consume (even weaker) - Relaxed (weakest)

Sequentially Consistent

Sequentially Consistent The assert can not fail Happens-before order exists between all operations

Sequentially Consistent The final loop in thread 1 is not infinite! Atomic operations are optimization barriers (like opaque functions) Reordering can happen between them, but not across them!

Sequentially Consistent

Sequentially Consistent Neither assert can fail

Relaxed

Relaxed Either assert can fail! No ordering is enforced (no happens before edges) Only coherence order per variable is enforced

Relaxed

Relaxed This assert can not fail (stores to same variable in same thread) Once thread 2 has seen 2 in x, it can not see an earlier value. Coherence order of x matches order of stores to x from thread 1

Acquire Release

Acquire Release Like sequential consistency, but only for dependent variables, not between independent reads of independent writes Both asserts can pass, because no ordering is implied between thread 1 and 2 Sequential consistency would require that if one passes the other must fail

Acquire Release

Acquire Release Assert can not fail, because store to y happens before store to x, even though y is not atomic.

Consume No happens-before ordering on non-dependent variables Assert in thread 2 is true Assert in thread 3 can fail

Consume

Review: Sequentially Consistent

Review: Sequentially Consistent All threads see the same state Both asserts are true

Review: Acquire Release

Review: Acquire Release Only the two threads involved see the same state Thread 2 s assert is true Thread 3 s assert can fail, since thread 1 and 3 have not synchronized

Review: Relaxed

Review: Relaxed Both asserts can fail