Concurrency. Johan Montelius, KTH, 2017

What is concurrency?
Concurrency: (the illusion of) happening at the same time. A property of the programming model. Why would we want to do things concurrently?

What is parallelism?
Parallelism: the ability to do several things at the same time. A property of the execution. Why would we want to do things in parallel?

concurrency vs parallelism
[diagram: a 2x2 grid relating the programming model (sequential or concurrent) to the execution (sequential or parallel)]

Why in this course?
The problem of concurrency was first encountered in the implementation of operating systems. It has since been a central part of any course on operating systems. Today, concurrency is such an important topic that it could (and often does) fill a course of its own.

What is the problem?
If concurrent activities are not manipulating a shared resource then there is no problem. We often want to share resources between concurrent activities. What do two UNIX processes share?

A process
As we have learned, the unit of a computation: a program, an instruction pointer, a computation stack, a data segment for static data structures, a heap for dynamic data structures, a file table of open files, signal handlers, ...

A thread
[diagram: a process holds the code and the data/heap; each of its threads has its own instruction pointer (IP) and its own stack]

Virtual memory layout
[diagram: the process address space with code (.text), data, heap, one stack per thread, and the kernel region]

threads API

#include <pthread.h>
#include <stdio.h>

int loop = 10;
int count = 0;

void *hello(void *arg) {
  char *name = arg;
  for (int i = 0; i < loop; i++) {
    count++;
    printf("hello %s %d\n", name, count);
  }
  return NULL;
}

int main() {
  pthread_t p1;
  pthread_create(&p1, NULL, hello, "A");
  pthread_join(p1, NULL);
  return 0;
}

Concurrency
What is the problem?

Cache coherence
The CPU uses caches to improve performance; a cache protocol must provide coherence. All write operations to a single memory location: are atomic, are performed in program order, and are seen by all processes in a total order. The C compiler can do optimizations that we are not prepared for. There are several alternatives for how coherence is defined; this is one example.

More problems
What is the expected outcome of an execution?

Sequential consistency
The outcome is the same as if all the operations of the program were executed: as atomic operations in some sequence, consistent with the program order of each thread.

the code

int loop = 10;
int count = 0;

void *hello(void *arg) {
  /* ... */
  for (int i = 0; i < loop; i++) {
    count++;
  }
  /* ... */
}

.L3:
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        addl    $1, -4(%rbp)
        movl    loop(%rip), %eax
        cmpl    %eax, -4(%rbp)
        jl      .L3
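
The increment is a separate load, add and store, so two threads running this loop can interleave and lose updates. A minimal sketch of that experiment (my own, not from the slides; the loop count is made large so the race is actually visible):

#include <pthread.h>
#include <stdio.h>

int loop = 1000000;   /* large enough that the threads really interleave */
int count = 0;        /* shared, unsynchronized counter */

void *work(void *arg) {
  for (int i = 0; i < loop; i++) {
    count++;          /* load, add, store: not atomic */
  }
  return NULL;
}

int main() {
  pthread_t p1, p2;
  pthread_create(&p1, NULL, work, NULL);
  pthread_create(&p2, NULL, work, NULL);
  pthread_join(p1, NULL);
  pthread_join(p2, NULL);
  /* with the race, this typically prints far less than 2 * loop */
  printf("count = %d (expected %d)\n", count, 2 * loop);
  return 0;
}

Compile with gcc -pthread; without optimization the generated code is the load/add/store sequence above and the lost updates are easy to observe.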


What about this?

int count = 7;
volatile int a = 0;
volatile int b = 0;

void critical(...) {
  /* ... */
  while (1) {
    my = 1;
    if (your == 0) {
      count++;
      my = 0;
      break;
    } else {
      my = 0;
    }
  }
}

[trace: Thread 1 and Thread 2 both run critical(); for Thread 1 the flag my is a and your is b, for Thread 2 the roles are swapped. Starting from a = 0, count = 7, b = 0, Thread 1 sets my = 1, reads your (0), reads count, writes count = 8 and finally sets my = 0.]
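
The parameter list is elided on the slide; one plausible completion (my guess, not the original code, with helper names invented for the example) passes the two flags as pointers, each thread using its own flag as my and the other's as your:

#include <pthread.h>
#include <stdio.h>

int count = 7;
volatile int a = 0;   /* Thread 1's flag */
volatile int b = 0;   /* Thread 2's flag */

void critical(volatile int *my, volatile int *your) {
  while (1) {
    *my = 1;                /* announce that I want to enter */
    if (*your == 0) {       /* enter only if the other thread is not trying */
      count++;              /* the critical section */
      *my = 0;
      break;
    } else {
      *my = 0;              /* back off and try again */
    }
  }
}

void *thread1(void *arg) { critical(&a, &b); return NULL; }
void *thread2(void *arg) { critical(&b, &a); return NULL; }

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, thread1, NULL);
  pthread_create(&t2, NULL, thread2, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("count = %d\n", count);
  return 0;
}

Under sequential consistency at most one thread can be inside the critical section at a time (though both may keep backing off); the next slides show that under Total Store Order each thread's read of the other's flag may bypass its own pending write, so both can read 0 and enter at the same time.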

Total Store Order (TSO)
Modern CPUs do not provide sequential consistency; they only provide Total Store Order. Write operations are performed in a total order. A process will immediately see its own store operations but... a read operation might bypass a write operation to another memory location. There are operations provided by the hardware that will give us better guarantees.

Total Store Order
WARNING: the following sequence contains scenes that some viewers may find disturbing.
[trace: initially a = 0 and b = 0; P1 executes a = 1 and then read b, P2 executes b = 1 and then read a. Under TSO both reads can return 0: each store is still in its core's store buffer when the other core performs its read, and only later do a = 1 and b = 1 reach memory.]
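
A small litmus test in the spirit of the slide (my own sketch, not from the lecture): run the two operation pairs in two threads many times and count how often both reads return 0, an outcome that sequential consistency forbids. Since thread start-up is slow compared to the race window, you may need many iterations (or threads spinning on a start flag) to observe it.

#include <pthread.h>
#include <stdio.h>

volatile int a, b;        /* the two shared variables */
volatile int r1, r2;      /* what each thread read */

void *p1(void *arg) { a = 1; r1 = b; return NULL; }   /* a = 1; read b */
void *p2(void *arg) { b = 1; r2 = a; return NULL; }   /* b = 1; read a */

int main() {
  int both_zero = 0;
  for (int i = 0; i < 100000; i++) {
    a = 0; b = 0; r1 = -1; r2 = -1;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    if (r1 == 0 && r2 == 0)
      both_zero++;        /* forbidden under sequential consistency */
  }
  printf("both reads returned 0 in %d runs\n", both_zero);
  return 0;
}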

Hardware support - TGH
TGH - Thank God for Hardware
Fences, barriers etc.: all load and store operations before a fence are guaranteed to be performed before any operations after the fence. Atomic-swap, test-and-set etc.: instructions that read and write a memory location in one atomic operation. Modern CPUs provide very weak consistency guarantees if these operations are not used. Don't rely on the program order of your code. Better still: if possible, use a library that handles synchronization.
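
As an illustration (my own sketch, not part of the lecture), a spinlock built from a test-and-set style operation using C11 atomics; atomic_flag_test_and_set is the atomic read-and-write, and the acquire/release orderings play the role of the fences:

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_flag lock = ATOMIC_FLAG_INIT;
int count = 0;

void spin_lock(void) {
  /* atomically set the flag and get its old value; spin while it was already set */
  while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
    ;
}

void spin_unlock(void) {
  atomic_flag_clear_explicit(&lock, memory_order_release);
}

void *work(void *arg) {
  for (int i = 0; i < 1000000; i++) {
    spin_lock();
    count++;              /* only one thread at a time executes this */
    spin_unlock();
  }
  return NULL;
}

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, work, NULL);
  pthread_create(&t2, NULL, work, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("count = %d\n", count);   /* 2000000: no updates are lost */
  return 0;
}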

How to synchronize
Next week.

How to implement threads
[diagram: threads in user space (a scheduler inside the process multiplexes its threads, the kernel sees only the process) vs threads in kernel space (the kernel schedules the threads of the processes directly)]

pros and cons
Threads in user space:
+ You can change the scheduler.
+ Very fast task switching.
- If the process is suspended, all threads are.
- A process cannot utilize multiple cores.
Threads in kernel space:
+ One thread can suspend while others continue to execute.
+ A process can utilize multiple cores.
- Thread scheduling requires a trap to the kernel.
- No way to change the scheduler for a process.
Which approach is taken by GNU/Linux?

Java, Haskell and Erlang
How is this handled in high-level languages? Java: each Java thread is mapped to one operating system thread. Erlang and Haskell: language threads are scheduled by the virtual machine, which uses several operating system threads to have several outstanding system calls, utilize multiple cores etc. Java originally had user space threads, and introduced the name green threads; these were later replaced by native threads, i.e. each Java thread attached to a kernel operating system thread.

an experiment
How long does it take to send a message around a ring of a hundred threads?
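
The lecture does not show the implementation; one possible sketch (mine, with made-up constants) connects the members of the ring with pipes, lets the main thread act as member 0, and times a number of laps:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

#define N 100        /* members of the ring (main acts as member 0) */
#define LAPS 1000    /* how many times the message goes around */

int pipes[N][2];     /* pipes[i] is read by member i */

void *member(void *arg) {
  long i = (long)arg;
  char token;
  for (;;) {
    if (read(pipes[i][0], &token, 1) != 1) break;              /* wait for the message */
    if (write(pipes[(i + 1) % N][1], &token, 1) != 1) break;   /* pass it to the next member */
  }
  return NULL;
}

int main() {
  for (int i = 0; i < N; i++)
    if (pipe(pipes[i]) != 0) { perror("pipe"); exit(1); }

  pthread_t tid[N];
  for (long i = 1; i < N; i++)
    pthread_create(&tid[i], NULL, member, (void *)i);

  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  char token = '!';
  for (int lap = 0; lap < LAPS; lap++) {
    write(pipes[1][1], &token, 1);   /* main, as member 0, starts a lap */
    read(pipes[0][0], &token, 1);    /* ...and waits until the message comes back */
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  printf("%d laps around %d members took %.3f s\n", LAPS, N, s);
  return 0;   /* exiting the process also takes down the blocked ring threads */
}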

pthread_create() - from man pages

#include <pthread.h>

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);

pthread_t *thread : a pointer to a thread structure.
const pthread_attr_t *attr : a pointer to a structure holding the attributes of the thread.
void *(*start_routine) (void *) : a pointer to a function that takes one argument, a void *, and returns a void *.
void *arg : the argument to the function, given as a void *.

Compile and link with -pthread.
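
A small usage sketch (mine, not from the slides; the struct and field names are invented for the example): pack the arguments in a struct, pass it through the void * parameter, and collect the result with pthread_join.

#include <pthread.h>
#include <stdio.h>

struct args { int from; int to; long sum; };   /* hypothetical argument/result record */

void *sum_range(void *arg) {
  struct args *a = arg;                        /* cast back from the void * argument */
  a->sum = 0;
  for (int i = a->from; i < a->to; i++)
    a->sum += i;
  return arg;                                  /* the pointer is handed back by pthread_join */
}

int main() {
  struct args a = {0, 1000, 0};
  pthread_t t;
  pthread_create(&t, NULL, sum_range, &a);
  void *res;
  pthread_join(t, &res);
  printf("sum = %ld\n", ((struct args *)res)->sum);
  return 0;
}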

Pthreads in Linux
How do we implement threads in Linux? In Linux, both fork() and pthread_create() are implemented using the system call clone(). What is clone()?

clone() - from man pages
Unlike fork(2), clone() allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
The system call clone() allows us to define how much should be shared:
fork(): copy the table of file descriptors, the memory space and the signal handlers, i.e. a perfect copy.
pthread_create(): share the table of file descriptors and the memory, copy the signal handlers.
Using clone() directly you can pick and choose among more than twenty flags what the clone should share.
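
A rough sketch of calling clone() directly (my own illustration; the real pthread_create() passes a larger set of CLONE_* flags): here the child shares the parent's memory through CLONE_VM but keeps its own file descriptor table and signal handlers.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

int shared = 0;                  /* visible to the child because of CLONE_VM */

int child(void *arg) {
  shared = 42;                   /* writes directly into the parent's memory space */
  return 0;
}

int main() {
  char *stack = malloc(STACK_SIZE);            /* the clone needs its own stack */
  if (stack == NULL) exit(1);
  /* the stack grows downwards on x86, so pass the top of the allocation */
  pid_t pid = clone(child, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
  if (pid == -1) { perror("clone"); exit(1); }
  waitpid(pid, NULL, 0);
  printf("shared = %d\n", shared);             /* prints 42: the memory was shared */
  free(stack);
  return 0;
}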

Thread Local Storage (TLS)
All threads have their own stack; the heap is shared. Would it not be nice to have some thread local storage?

__thread int local = 42;

TLS implementation

__thread int local = 0;
int global = 1;

void *hello(void *name) {
  int stk = 2;
  int sum = local + global + stk;
}

        pushq   %rbp
        movq    %rsp, %rbp
        movq    %rdi, -24(%rbp)
        movl    $2, -8(%rbp)
        movl    %fs:local@tpoff, %edx
        movl    global(%rip), %eax
        addl    %eax, %edx
        movl    -8(%rbp), %eax
        addl    %edx, %eax
        movl    %eax, -4(%rbp)
        nop
        popq    %rbp
        ret
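
A small demonstration (my own sketch) that each thread gets its own copy of a __thread variable while a plain global is shared:

#include <pthread.h>
#include <stdio.h>

__thread int local = 0;   /* one instance per thread */
int global = 0;           /* one instance shared by all threads (and raced on here) */

void *bump(void *name) {
  for (int i = 0; i < 5; i++) {
    local++;
    global++;
  }
  printf("%s: local = %d, global = %d\n", (char *)name, local, global);
  return NULL;
}

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, bump, "t1");
  pthread_create(&t2, NULL, bump, "t2");
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return 0;   /* both threads print local = 5; global ends up anywhere from 5 to 10 */
}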

remember segmentation
The TLS is referenced using the segment selector fs:. When we change thread, the kernel sets the fs selector register. The TLS has an original copy that is copied by each thread (even the mother thread) before any write operations. You can take the address of a TLS structure and pass it to another thread.

Summary
Concurrency vs parallelism?
What is a thread?
What do the threads of a process share?
Sequential consistency vs Total Store Order.
Threads in kernel or user space?
Threads in GNU/Linux and clone().
What is Thread Local Storage?