COMP6771 Advanced C++ Programming

1. COMP6771 Advanced C++ Programming
Week 9: Multithreading (continued)
2016
www.cse.unsw.edu.au/~cs6771

2. So Far
Program workflows: sequential, parallel, embarrassingly parallel.
Memory: shared memory, local memory, message passing.
Atomic operations: generally one CPU clock cycle.
Concepts and problems: race conditions, mutual exclusion, mutex objects and deadlocks.

3. threadlamdarace.cpp

#include <iostream>
#include <thread>

// NOTE: do not compile this with -O2; it optimises out the ++
// call and prevents the race condition from happening.

int main() {
  int i = 1;
  const long numiterations = 1000000;
  std::thread t1{[&] {
    for (int j = 0; j < numiterations; ++j) {
      i++;
    }
  }};
  std::thread t2{[&] {
    for (int j = 0; j < numiterations; ++j) {
      i--;
    }
  }};
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

4. threadlamdarace.cpp Output

$ ./threadlamdarace
847046
$ ./threadlamdarace
-18494
$ ./threadlamdarace
25156
$ ./threadlamdarace
-433633
$ ./threadlamdarace
-8622
$ ./threadlamdarace
-365615
$ ./threadlamdarace
364766
$ ./threadlamdarace
1
$ ./threadlamdarace
1
$ ./threadlamdarace
1000001

5. Atomic types automatically synchronise and prevent race conditions. The C++11 standard provides atomic primitive types:

Typedef name    Full specialisation
atomic_bool     atomic<bool>
atomic_char     atomic<char>
atomic_int      atomic<int>
atomic_uint     atomic<unsigned int>
atomic_long     atomic<long>

6. Example
Our problematic race condition fixed with an atomic int:

#include <iostream>
#include <thread>
#include <atomic>

int main() {
  std::atomic<int> i{1};
  const long numiterations = 1000000;
  std::thread t1([&i, numiterations] {
    for (int j = 0; j < numiterations; ++j) {
      i++;
    }
  });
  std::thread t2([&i, numiterations] {
    for (int j = 0; j < numiterations; ++j) {
      i--;
    }
  });
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

Is this a good approach?

7. Atomic Performance Problem
The purpose of multi-threaded code is to improve performance.
Atomic operations slow down our code, as every operation has to synchronise with the other threads.
How you structure multi-threaded code has a large influence on performance.
In our example we pay that synchronisation cost on every single increment; how might we do this better?

8. Improved Performance Example

int main() {
  std::atomic<int> i{1};
  const long numiterations = 1000000;
  std::thread t1([&i, numiterations] {
    int k = 0;  // use a local variable
    for (int j = 0; j < numiterations; ++j) {
      k++;  // generally more complex maths
    }
    i += k;
  });
  std::thread t2([&i, numiterations] {
    int k = 0;
    for (int j = 0; j < numiterations; ++j) {
      k--;
    }
    i += k;  // k is negative here, so this applies the decrements
  });
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

Only write to the atomic int once per thread!

9. Atomic Library
The standard defines a number of operations on atomic types. These operations will generally be faster than equivalent code you write yourself.
Example, fetch_add():

#include <iostream>
#include <atomic>
#include <thread>

int main() {
  std::atomic<int> i{10};
  std::cout << "Initial Value: " << i << std::endl;
  int fetchedvalue = i.fetch_add(5);
  std::cout << "Fetched Value: " << fetchedvalue << std::endl;
  std::cout << "Current Value: " << i << std::endl;
}

Output:
Initial Value: 10
Fetched Value: 10
Current Value: 15

10. Atomic Classes
The std::atomic library does not provide much support for creating atomic wrappers around your own data types.
It can wrap a TriviallyCopyable type T, i.e. a class with no user-defined copy/move constructors, assignment operators or destructor.
It can also wrap raw pointer types (but dereferences are not atomic!).
See: http://en.cppreference.com/w/cpp/atomic/atomic
See: http://stackoverflow.com/questions/15886308/stdatomic-with-custom-class-c-11
Problem: synchronising on a whole class doesn't seem very efficient; what if only part of a class had race conditions?
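
As a rough sketch of what wrapping your own type looks like (the Point struct and the names used below are made up for illustration, not from the lecture): std::atomic<T> stores and replaces the whole object at once, and a read-modify-write on the whole struct is usually written as a compare-exchange loop.

#include <atomic>
#include <iostream>

// Hypothetical example type: trivially copyable, so std::atomic<Point> is allowed.
struct Point {
  int x;
  int y;
};

int main() {
  // May not be lock-free; the implementation can fall back to an internal lock.
  std::atomic<Point> p{Point{0, 0}};

  // Every update replaces the whole object; a compare-exchange loop applies
  // a read-modify-write to the entire struct.
  Point expected = p.load();
  Point desired;
  do {
    desired = expected;
    desired.x += 1;
  } while (!p.compare_exchange_weak(expected, desired));

  Point result = p.load();
  std::cout << result.x << " " << result.y << std::endl;  // prints: 1 0
}

Note that the whole object is synchronised on every access, which is exactly the inefficiency the slide points out when only part of a class is subject to races.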

11. C++11 provides mutex objects in the <mutex> header file.
General idea:
A thread that wants to read/write shared memory first tries to lock the mutex object.
If another thread currently holds the lock on that mutex, the first thread waits until the mutex is unlocked (or, for timed mutexes, until a timer expires).
Once the thread obtains the lock it can safely read/write the shared memory.
When the thread has finished with the shared memory it releases the lock.
Note: if two or more threads are waiting for a lock to be released, there is no guaranteed order in which they will obtain it next.
It is assumed that all threads use the mutex to correctly access the shared memory.

12. std::mutex
Non-timed mutex class. Member functions:
lock(): tries to obtain the lock on the mutex and blocks indefinitely until the lock has been acquired.
try_lock(): tries to obtain the lock on the mutex; if the mutex is already locked it immediately returns false, otherwise it obtains the lock and returns true.
unlock(): releases the lock currently held.
Note: use std::recursive_mutex if your code is recursive and the same thread needs to repeatedly lock the same mutex.
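
A minimal sketch (not from the slides) contrasting lock() and try_lock(): because the main thread below already holds the mutex, the spawned thread's try_lock() returns false immediately instead of blocking.

#include <iostream>
#include <mutex>
#include <thread>

int main() {
  std::mutex m;

  m.lock();  // main thread now holds the mutex

  std::thread t([&m] {
    if (m.try_lock()) {
      // not reached here: main still holds the lock
      std::cout << "got the lock" << std::endl;
      m.unlock();
    } else {
      // try_lock() returned false immediately, so we can do other work
      std::cout << "lock busy, doing something else" << std::endl;
    }
  });

  t.join();
  m.unlock();
}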

13. std::mutex example

#include <iostream>
#include <thread>
#include <mutex>

int main() {
  int i = 1;
  const long numiterations = 1000000;
  std::mutex imutex;
  std::thread t1([&] {
    for (int j = 0; j < numiterations; ++j) {
      imutex.lock();
      i++;
      imutex.unlock();
    }
  });
  std::thread t2([&] {
    for (int j = 0; j < numiterations; ++j) {
      imutex.lock();
      i--;
      imutex.unlock();
    }
  });
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

14. Mutex Performance Problem
We now have the same problem we had with atomic operations! std::mutex locking takes exclusive ownership, so our program slows down while threads sit waiting to acquire the lock.
Exclusive ownership over a critical section is required for safely writing data, but what about reading it?
Readers-writers locks allow multiple threads to read data in parallel (shared ownership), but exclusive ownership of the mutex must still be acquired before the data can be written.

15. Readers-writers lock (C++14)
C++14's std::shared_timed_mutex is a form of readers-writers lock.
Shared ownership (read lock): lock_shared() blocks until the read lock is acquired. Multiple threads can lock_shared() the mutex in parallel.
Exclusive ownership (write lock): lock() blocks until the write lock is acquired. Both the read and write locks must be released before exclusive ownership can be acquired, and while a thread has exclusive ownership no other thread can obtain any lock (read or write).
Unlocking: make sure you release the correct lock: unlock() pairs with lock(), and unlock_shared() pairs with lock_shared().
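
The slides give no code for this, so here is a minimal sketch assuming C++14 and the <shared_mutex> header; the names shared_value, rw_mutex, reader and writer are made up for illustration. Readers take shared ownership with lock_shared()/unlock_shared(), while the writer takes exclusive ownership with lock()/unlock().

#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>

int shared_value = 0;
std::shared_timed_mutex rw_mutex;

void reader(int id) {
  rw_mutex.lock_shared();    // shared ownership: other readers may hold this at the same time
  std::cout << "reader " << id << " sees " << shared_value << std::endl;
  rw_mutex.unlock_shared();  // release the read lock with unlock_shared()
}

void writer() {
  rw_mutex.lock();           // exclusive ownership: blocks until all readers have released
  ++shared_value;
  rw_mutex.unlock();
}

int main() {
  std::vector<std::thread> threads;
  threads.emplace_back(writer);
  for (int id = 0; id < 3; ++id) {
    threads.emplace_back(reader, id);
  }
  for (auto& t : threads) {
    t.join();
  }
  // Note: output from concurrent readers may interleave.
}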

16. Exceptions and Locking
What happens to the mutex lock if an exception is thrown out of a critical section?

#include <iostream>
#include <thread>
#include <mutex>

int main() {
  int i = 1;
  const long numiterations = 1000000;
  std::mutex imutex;
  std::thread t1([&] {
    for (int j = 0; j < numiterations; ++j) {
      imutex.lock();
      i++;
      imutex.unlock();
    }
  });
  std::thread t2([&] {
    for (int j = 0; j < numiterations; ++j) {
      imutex.lock();
      if (i == 0) throw "i cannot be less than 0";
      i--;
      imutex.unlock();
    }
  });
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

17. Exceptions and Locking
Major problem: the mutex lock is never released if we throw out of a critical section.
Stack unwinding occurs until the exception is caught, but only inside the throwing thread; if the exception isn't caught, the program terminates.
Even if the exception is caught, if the mutex isn't unlocked then deadlocks and/or undefined program behaviour may occur.
We've seen these types of issues before with dynamic memory. How did we fix them?

18. RAII wrapper class around a mutex.

#include <iostream>
#include <thread>
#include <mutex>

int main() {
  int i = 1;
  const long numiterations = 1000000;
  std::mutex imutex;
  std::thread t1([&] {
    for (int j = 0; j < numiterations; ++j) {
      std::lock_guard<std::mutex> guard(imutex);
      i++;
    }
  });
  std::thread t2([&] {
    for (int j = 0; j < numiterations; ++j) {
      std::lock_guard<std::mutex> guard(imutex);
      i--;
    }
  });
  t1.join();
  t2.join();
  std::cout << i << std::endl;
}

19. Lock guards are similar to smart pointers:
On construction they take ownership of (lock) a mutex.
When they go out of scope they are destroyed, and their destructor unlocks the mutex.
They are exception safe!
You should never manually lock/unlock a mutex: always use lock guards.

20. Lock guard limitations
Scenario: consider a box holding some number of elements:

struct Box {
  explicit Box(int num) : num_things{num} {}

  int num_things;
  std::mutex m;
};

and a function to transfer items between two boxes:

void transfer(Box &from, Box &to, int num) {
  std::lock_guard<std::mutex> lock1(from.m);
  std::lock_guard<std::mutex> lock2(to.m);

  from.num_things -= num;
  to.num_things += num;
}

Problem: what happens if two threads in parallel attempt to move items between the same two boxes in opposite directions?

21. Lock guard limitations
Problem: what happens if two threads in parallel attempt to move items between the same two boxes in opposite directions?

int main() {
  Box acc1(100);
  Box acc2(50);

  std::thread t1(transfer, std::ref(acc1), std::ref(acc2), 10);
  std::thread t2(transfer, std::ref(acc2), std::ref(acc1), 5);

  t1.join();
  t2.join();

  std::cout << acc1.num_things << std::endl;
}

Remember: a std::lock_guard locks a mutex on construction and releases it on destruction.
If t1 locks acc1.m while t2 locks acc2.m, each thread then blocks forever waiting for the mutex the other holds: deadlock.

22. std::unique_lock
Possible deadlocks can occur with multiple std::lock_guard objects.
std::unique_lock is a more flexible lock guard: it can defer locking so that several locks can be acquired together without deadlock.

void transfer(Box &from, Box &to, int num) {
  // don't actually lock the mutexes yet
  std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
  std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);

  // lock both unique_locks without deadlock
  std::lock(lock1, lock2);

  from.num_things -= num;
  to.num_things += num;
}

Example (modified) from: http://en.cppreference.com/w/cpp/thread/unique_lock