ECE 574 Cluster Computing Lecture 8

Similar documents
ECE 598 Advanced Operating Systems Lecture 23

ECE 598 Advanced Operating Systems Lecture 11

ECE 598 Advanced Operating Systems Lecture 22

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Concurrency, Thread. Dongkun Shin, SKKU

CS 326: Operating Systems. Process Execution. Lecture 5

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS533 Concepts of Operating Systems. Jonathan Walpole

Locks. Dongkun Shin, SKKU

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Chapter 2 Processes and Threads

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

EECS 482 Introduction to Operating Systems

Chapter 4: Multi-Threaded Programming

Recap: Thread. What is it? What does it need (thread private)? What for? How to implement? Independent flow of control. Stack

Threads. CS3026 Operating Systems Lecture 06

Preview. The Thread Model Motivation of Threads Benefits of Threads Implementation of Thread

High Performance Computing Course Notes Shared Memory Parallel Programming

CS333 Intro to Operating Systems. Jonathan Walpole

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

Parallel Programming Languages COMP360

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio

Thread. Disclaimer: some slides are adopted from the book authors slides with permission 1

Chapter 4: Threads. Chapter 4: Threads

Motivation of Threads. Preview. Motivation of Threads. Motivation of Threads. Motivation of Threads. Motivation of Threads 9/12/2018.

Definition Multithreading Models Threading Issues Pthreads (Unix)

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads

Notes. CS 537 Lecture 5 Threads and Cooperation. Questions answered in this lecture: What s in a process?

Last class: Today: Thread Background. Thread Systems

CSC369 Lecture 2. Larry Zhang

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

CSC369 Lecture 2. Larry Zhang, September 21, 2015

Synchronization for Concurrent Tasks

Dealing with Issues for Interprocess Communication

CISC2200 Threads Spring 2015

Introduction to pthreads

Chapter 5: Threads. Overview Multithreading Models Threading Issues Pthreads Windows XP Threads Linux Threads Java Threads

CS510 Operating System Foundations. Jonathan Walpole

Processes The Process Model. Chapter 2. Processes and Threads. Process Termination. Process Creation

Systèmes d Exploitation Avancés

Concurrency: a crash course

CSE 153 Design of Operating Systems

Processes, Threads, SMP, and Microkernels

Processes. CS3026 Operating Systems Lecture 05

Linux Driver and Embedded Developer

CS510 Operating System Foundations. Jonathan Walpole

Processes The Process Model. Chapter 2 Processes and Threads. Process Termination. Process States (1) Process Hierarchies

Synchronization for Concurrent Tasks

Questions from last time

CS 5523: Operating Systems

Process Description and Control

Shared Memory Parallel Programming with Pthreads An overview

Locks and semaphores. Johan Montelius KTH

OPERATING SYSTEM. Chapter 4: Threads

Outline. Threads. Single and Multithreaded Processes. Benefits of Threads. Eike Ritter 1. Modified: October 16, 2012

CS 261 Fall Mike Lam, Professor. Threads

Timers 1 / 46. Jiffies. Potent and Evil Magic

Lecture Topics. Announcements. Today: Threads (Stallings, chapter , 4.6) Next: Concurrency (Stallings, chapter , 5.

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CSCE 313: Intro to Computer Systems

Multithreaded Programming

CS Lecture 3! Threads! George Mason University! Spring 2010!

Chapter 4: Threads. Chapter 4: Threads

Processes. Process Scheduling, Process Synchronization, and Deadlock will be discussed further in Chapters 5, 6, and 7, respectively.

Chapter 4: Threads. Operating System Concepts 9 th Edition

CMSC421: Principles of Operating Systems

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Message-Passing Shared Address Space

PROCESSES & THREADS. Charles Abzug, Ph.D. Department of Computer Science James Madison University Harrisonburg, VA Charles Abzug

Chapter 4: Threads. Operating System Concepts 9 th Edition

pthread Tutorial August 23, Creating and Destroying Threads Creating Threads Returning Results from Threads...

pthreads CS449 Fall 2017

Application Programming

Page 1. Analogy: Problems: Operating Systems Lecture 7. Operating Systems Lecture 7

SYNCHRONIZATION M O D E R N O P E R A T I N G S Y S T E M S R E A D 2. 3 E X C E P T A N D S P R I N G 2018

Processes and Threads

Multicore and Multiprocessor Systems: Part I

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Chapter 6 Concurrency: Deadlock and Starvation

Lecture 9: Midterm Review

CS604 - Operating System Solved Subjective Midterm Papers For Midterm Exam Preparation

Threads Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

recap, what s the problem Locks and semaphores Total Store Order Peterson s algorithm Johan Montelius 0 0 a = 1 b = 1 read b read a

Multiprocessor Systems. COMP s1

B. V. Patel Institute of Business Management, Computer &Information Technology, UTU

Intro to POSIX Threads with FLTK

Synchronization I. Jo, Heeseung

Modern Computers often do lots of things at the same time (web servers, accessing disks, background processes waiting for s,...).

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Operating Systems. Sina Meraji U of T

Programming in Parallel COMP755

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

4. Concurrency via Threads

CSE 153 Design of Operating Systems Fall 2018

Thread Concept. Thread. No. 3. Multiple single-threaded Process. One single-threaded Process. Process vs. Thread. One multi-threaded Process

Lecture 17: Threads and Scheduling. Thursday, 05 Nov 2009

Transcription:

ECE 574 Cluster Computing Lecture 8 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 16 February 2017

Announcements Too many snow days Posted a video with HW#4 Review HW#5 will be posted 1

Parallel Programming! 2

Processes a Review Multiprogramming multiple processes run at once Process has one view of memory, one program counter, one set of registers, one stack Context switch each process has own program counter saved and restored as well as other state (registers) OSes often have many things running, often in background. 3

On Linux/UNIX sometimes called daemons Can use top or ps to view them. Creating new: on Unix its fork/exec, windows CreateProcess Children live in different address space, even though it is a copy of parent Process termination: what happens? Resources cleaned up. atexit routines run. How does it happen? 4

exit() syscall (or return from main). Killed by a signal. Error Unix process hierarchy. Parents can wait for children to finish, find out what happened not strictly possible to give your children away, although init inherits orphans Process control block. 5

Threads Each process has one address space and single thread of control. It might be useful to have multiple threads share one address space GUI: interface thread and worker thread? Game: music thread, AI thread, display thread? Webserver: can handle incoming connections then pass serving to worker threads Why not just have one process that periodically switches? 6

Lightweight Process, multithreading Implementation: Each has its own PC Each has its own stack Why do it? shared variables, faster communication multiprocessors? mostly if does I/O that blocks, rest of threads can keep going allows overlapping compute and I/O 7

Problems: What if both wait on same resource (both do a scanf from the keyboard?) On fork, do all threads get copied? What if thread closes file while another reading it? 8

Thread Implementations Cause of many flamewars over the years 9

User-Level Threads (N:1 one process many Benefits threads) Kernel knows nothing about them. Can be implemented even if kernel has no support. Each process has a thread table When it sees it will block, it switches threads/pc in user space Different from processes? When thread yield() called it can switch without calling into the kernel (no slow 10

kernel context switch) Can have own custom scheduling algorithm Scale better, do not cause kernel structures to grow Downsides How to handle blocking? Can wrap things, but not easy. Also can t wrap a pagefault. Co-operative, threads won t stop unless voluntarily give up. Can request periodic signal, but too high a rate is inefficient. 11

Kernel-Level Threads (1:1 process to Benefits thread) Kernel tracks all threads in system Handle blocking better Downsides Thread control functions are syscalls When yielding, might yield to another process rather than a thread 12

Might be slower 13

Hybrid (M:N) Can have kernel threads with user on top of it. Fast context switching, but can have odd problems like priority inversion. 14

Linux Posix Threads Originally used only userspace implementations. portable threads. GNU LinuxThreads use clone syscall, SIGUSR1 SIGUSR2 for communicating. Could not implement full POSIX threads, especially with signals. Replaced by NPTL Hard thread-local storage 15

Needed extra helper thread to handle signals Problems, what happens if helper thread killed? Signals broken? 8192 thread limit? proc/top clutter up with processed, not clear they are subthreads NPTL New POSIX Thread Library Kernel threads Clone. Add new futex system calls. Drepper and Molnar at RedHat Why kernel? Linux has very fast context switch compared to some OSes. Need new C library/abi to handle location of thread- 16

local storage On x86 the fs/gs segment used. Others need spare register. Signal handling in kernel Clone handles setting TID (thread ID) exit group() syscall added that ends all threads in process, exit() just ends thread. exec() kills all threads before execing Only main thread gets entry in proc 17

Pthread Programming based on this really good tutorial here: https://computing.llnl.gov/tutorials/pthreads/ 18

Pthread Programming Changes to shared system resources affect all threads in a process (such as closing a file) Identical pointers point to same data Reading and writing to same memory is possible simultaneously (with unknown origin) so locking must be used 19

When can you use? Work on data that can be split among multiple tasks Work that blocks on I/O Work that has to handle asynchronous events 20

Models Pipeline task broken into a set of subtasks that each execute serial on own thread Manager/worker a manager thread assigns work to a set of worker threads. Also manager usually handles I/O static worker pool constant number of threads dynamic worker pool threads started and stopped as needed Peer like manager/worker but the manager also does calculations 21

Shared Memory Model All threads have access to shared memory Threads also have private data Programmers must properly protect shared data 22

Thread Safeness Is a function called thread safe? Can the code be executed multiple times simultaneously? The main problem is if there is global state that must be remembered between calls. For example, the strtok() function. As long as only local variables (on stack) usually not an issue. Can be addressed with locking. 23

POSIX Threads 1995 standard Various interfaces: 1. Thread management: Routines for manipulating threads creating, detaching, joining, etc. Also for setting thread attributes. 2. Mutexes: (mutual exclusion) Routines for creating mutex locks. 3. Condition variables allow having threads wait on a lock 24

4. Synchronization: lock and barrier management 25

POSIX Threads (pthreads) A C interface. There are wrappers for Fortran. Over 100 functions, all starting with pthread Involve opaque data structures that are passed around. Include pthread.h header Include -pthread in linker command to compiler 26

Creating Threads Your function, as per normal, only includes one thread pthread create() creates a new thread You can call it anywhere, as many times as you want pthread create (thread,attr,start routine,arg) You pass is a pointer to a thread object (which is opaque), an attr object (which can be NULL), a 27

start routine which is a C function called when it starts, an an arg argument to pass to the routine. Only can pass one argument. How can you pass more? pointer to a structure. With attributes you can set things like scheduling policies No routines for binding threads to specific cores, but some implementations include optional non-portable way. Also Linux has sched setaffinity routine. 28

pthread exit() Terminating Threads Returns normally from its starting routine another thread uses pthread cancel() in it The entire process is terminated (by ending, or calling exit(), etc) 29

Thread Management pthread join() lets a thread block until another one finishes So master can join all the children and wait until they are done before continuing. Argument to a join is a specific thread to wait on (so if waiting on four, have to have four calls to pthread join() 30

Stack Management Manage your own stack? Can get and set size. Be careful allocating too much on stack. 31

Mutexes Type of lock, only one thread can own it at a time. Can be used to avoid race conditions. 32

Condition Variables A way to avoid spinning on a mutex 33

Debugging 34

Race Conditions Shared counter address RMW on ARM Thread A reads value into reg Context switch happens Thread B reads value into reg, increments, writes out Context switch back to A increments value, writes out What happened? What should value be? 35

Critical Sections Want mutual exclusion, only one can access structure at once 1. no two processes can be inside critical section at once 2. no assumption can be made about speed of CPU 3. no process not in critical section may block other processes 4. no process should wait forever 36

How to avoid Disable interrupts. Heavy handed, only works on singlecore machines. Locks/mutex/semaphore 37

Mutex mutex lock: if unlocked (0), then it sets lock and returns if locked, returns 1, does not enter. what do we do if locked? Busy wait? (spinlock) reschedule (yield)? mutex unlock: sets variable to zero 38

Semaphore Up/Down Wait in queue Blocking As lock frees, the job waiting is woken up 39

Locking Primitives fetch and add (bus lock for multiple cores), xadd (x86) test and set (atomically test value and set to 1) test and test and set compare-and-swap Atomic swap instruction SWP (ARM before v6, deprecated) x86 CMPXCHG Does both load and store in one instruction! 40

Why bad? Longer interrupt latency (can t interrupt atomic op) Especially bad in multi-core load-link/store conditional Load a value from memory Later store instruction to same memory address. Only succeeds if no other stores to that memory location in interim. ldrex/strex (ARMv6 and later) Transactional Memory 41

Locking Primitives can be shown to be equivalent how swap works: lock is 0 (free). r1=1; swap r1,lock now r1=0 (was free), lock=1 (in use) lock is 1 (not-free). r1=1, swap r1,lock now r1=1 (not-free), lock still==1 (in use) 42

Memory Barriers Not a lock, but might be needed when doing locking Modern out-of-order processors can execute loads or stores out-of-order What happens a load or store bypasses a lock instruction? Processor Memory Ordering Models, not fun Technically on BCM2835 we need a memory barrier any time we switch between I/O blocks (i.e. from serial 43

to GPIO, etc.) according to documentation, otherwise loads could return out of order 44

Deadlock Two processes both waiting for the other to finish, get stuck One possibility is a bad combination of locks, program gets stuck P1 takes Lock A. P2 takes Lock B. P1 then tries to take lock B and P2 tries to take Lock A. 45

Livelock Processes change state, but still no forward progress. Two people trying to avoid each other in a hall. Can be harder to detect 46

Starvation Not really a deadlock, but if there s a minor amount of unfairness in the locking mechanism one process might get starved (i.e. never get a chance to run) even though the other processes are properly taking and freeing the locks. 47

How to avoid Deadlock Don t write buggy code Pre-emption (let one of the stuck processes run anyway) Rollback (checkpoint occasionally) What to do if it happens? Reboot the system Kill off stuck processes 48