Lec 26: Parallel Processing. Announcements


Lec 26: Parallel Processing
Kavita Bala, CS 3410, Fall 2008
Computer Science, Cornell University

Announcements
- Pizza party: Tuesday Dec 2, 6:30-9:00, location TBA
- Final project (parallel ray tracer) out next week
- Demos: Dec 16 (Tuesday)
- Prelim 2: Dec 4 (Thursday), Hollister 110, 7:30-9:00/9:30

Amdahl's Law
A task has a serial part and a parallel part. As the number of processors increases:
- time to execute the parallel part goes to zero
- time to execute the serial part remains the same
The serial part eventually dominates, so we must parallelize ALL parts of the task.

Amdahl's Law: consider an improvement E where a fraction F of the execution time is affected and S is the speedup of that fraction:

    T_new = T_old x ((1 - F) + F/S)
    Speedup = 1 / ((1 - F) + F/S)

Amdahl's Law
The sequential part can limit speedup.
Example: with 100 processors, can we get a 90x speedup?

    T_new = T_parallelizable/100 + T_sequential
    Speedup = 1 / ((1 - F_parallelizable) + F_parallelizable/100) = 90

Solving: F_parallelizable = 0.999. The sequential part must be only 0.1% of the original execution time.

Shared Memory
SMP: shared memory multiprocessor.
- Hardware provides a single physical address space for all processors
- Shared variables are synchronized using locks

Cache Coherence Problem
Suppose two CPU cores share a physical address space, with write-through caches:

    Time step | Event               | CPU A's cache | CPU B's cache | Memory
    0         |                     |               |               | 0
    1         | CPU A reads X       | 0             |               | 0
    2         | CPU B reads X       | 0             | 0             | 0
    3         | CPU A writes 1 to X | 1             | 0             | 1

After step 3, CPU B's cached copy of X is stale.

Coherence Defined
Informally: reads return the most recently written value.
Formally:
- P writes X, then P reads X (no intervening writes): the read returns the written value.
- P1 writes X, then P2 reads X (sufficiently later): the read returns the written value. (E.g., CPU B reading X after step 3 in the example.)
- P1 writes X and P2 writes X: all processors see the writes in the same order and end up with the same final value for X (sequential consistency).

Cache Coherence Protocols
Operations performed by caches in multiprocessors to ensure coherence:
- Migration of data to local caches: reduces bandwidth demand on shared memory
- Replication of read-shared data: reduces contention for access
Snooping protocols: each cache monitors the bus for reads and writes.

Snooping Caches
- Read: respond if you have the data
- Write: invalidate or update your copy

Invalidating Snooping Protocols
A cache gets exclusive access to a block when the block is to be written:
- It broadcasts an invalidate message on the bus
- A subsequent read in another cache misses, and the owning cache supplies the updated value

    CPU activity        | Bus activity     | CPU A's cache | CPU B's cache | Memory
                        |                  |               |               | 0
    CPU A reads X       | Cache miss for X | 0             |               | 0
    CPU B reads X       | Cache miss for X | 0             | 0             | 0
    CPU A writes 1 to X | Invalidate for X | 1             |               | 0
    CPU B reads X       | Cache miss for X | 1             | 1             | 1

Writing
- Use write-back policies for bandwidth.
- Write-invalidate coherence policy: first invalidate all other copies of the data, then write it in the cache line. Anybody else can then read it. This permits one writer, multiple readers.
In reality there are many coherence protocols. Snooping doesn't scale; in directory-based protocols, caches and memory record the sharing status of blocks in a directory.

Parallel Programming and Synchronization

Processes
Hundreds of things are going on in the system (nfsd, ls, gcc, emacs, www, lpr, the OS itself): how do we manage them?
- How to make things simple? Decompose the computation into separate processes.
- How to make things reliable? Isolate processes from each other, to protect them from each other's faults.
- How to speed things up? Overlap the I/O bursts of one process with the CPU bursts of another.

What is a process?
A program being executed: sequential, one instruction at a time. A process is an OS abstraction:
- a thread of execution running in a restricted virtual environment
- a virtual CPU and virtual memory environment, interfacing with the OS via system calls
It is the unit of execution and the unit of scheduling: a thread of execution plus an address space. Roughly the same as a "job," "task," or "sequential process"; closely related to a thread.

Process != Program
- A program is passive: code + static data. (Diagram: an executable with mapped segments: header, code, initialized data, BSS, symbol table, line numbers, external references.)
- A process is a running program: stack, registers, heap, program counter. (Diagram: process address space with DLLs, stack, heap, BSS, initialized data, code.)
Example: we both run IE on one machine: same program, separate processes, same virtual address space, different physical memory.

Context Switch
A context switch is the act of switching the CPU from one process to another. The state of the running process must be saved and restored: program counter, stack pointer, general-purpose registers.
- Suspending a process: the OS saves its state (saves register values)
- To execute another process, the OS restores its state (loads register values)

Details of Context Switching
Context-switching code is architecture-dependent (it depends on the registers) and very tricky to implement: the OS must save state without changing state, i.e., run without changing any user program registers.
- CISC: a single instruction saves all state
- RISC: reserve registers for the kernel
Overheads: the CPU is idle during a context switch.
- Explicit: the direct cost of loading/storing registers to/from main memory
- Implicit: the opportunity cost of flushing useful caches (cache, TLB, etc.) and waiting for the pipeline to drain in pipelined processors

How to create a process?
Double-click on an icon? After boot, the OS starts the first process, e.g. init. The first process creates other processes:
- the creator is called the parent process
- the created is called the child process
- the parent/child relationships form a process tree

Processes Under UNIX
A new child process is created by the fork() system call: int fork()
- creates a new address space, copying the parent's address space into the child's (using copy-on-write to avoid copying memory that is only read)
- starts a new thread of control in the child's address space
- parent and child are almost identical: in the parent, fork() returns a non-zero integer; in the child, fork() returns zero. This difference allows parent and child to distinguish themselves.
int fork() returns TWICE!

Example

    main(int argc, char **argv) {
        char *myname = argv[1];
        int cpid = fork();
        if (cpid == 0) {
            printf("The child of %s is %d\n", myname, getpid());
            exit(0);
        } else {
            printf("My child is %d\n", cpid);
            exit(0);
        }
    }

What does this program print?

Bizarre But Real

    lace:tmp<15> cc a.c
    lace:tmp<16> ./a.out foobar
    The child of foobar is 23874
    My child is 23874

(Diagram: fork() returns twice; the operating system returns v=23874 to the parent and v=0 to the child.)

Cooperating Processes
Processes can be independent or can work cooperatively. Cooperating processes can be used for:
- speedup, by spreading computation over multiple processors/cores
- speedup and improved interactivity: one process can work while others are stopped waiting for I/O
- better structuring of an application into separate concerns, e.g. a pipeline of processes processing data
But cooperating processes need ways to communicate information and to coordinate (synchronize) their activities.

Shared Memory
By default, processes have disjoint physical memory; complete isolation prevents communication. Processes can set up a segment of memory as shared with other process(es): typically it is part of the memory of the process creating the shared memory, and other processes attach it to their memory space. This allows high-bandwidth communication between processes by just writing into memory.

Example

    #include <stdio.h>
    #include <sys/shm.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        char *shared_memory;
        const int size = 4096;
        int segment_id = shmget(IPC_PRIVATE, size, S_IRUSR | S_IWUSR);
        int cpid = fork();
        if (cpid == 0) {
            shared_memory = (char *) shmat(segment_id, NULL, 0);
            sprintf(shared_memory, "Hi from process %d", getpid());
        } else {
            shared_memory = (char *) shmat(segment_id, NULL, 0);
            wait(NULL);
            printf("Process %d read: %s\n", getpid(), shared_memory);
            shmdt(shared_memory);
            shmctl(segment_id, IPC_RMID, NULL);
        }
    }

Processes are heavyweight
When parallel programming with processes, the processes share almost everything: they all share the same code and any data in shared memory (so process isolation is not useful), and they all share the same privileges. What don't they share? Each has its own PC, registers, and stack.
Idea: why don't we separate the idea of a process (address space, accounting, etc.) from that of the minimal thread of control (PC, SP, registers)?

Threads vs. Processes
Most operating systems therefore support two entities:
- the process, which defines the address space and general process attributes
- the thread, which defines a sequential execution stream within a process
A thread is bound to a single process; for each process, however, there may be many threads. Threads are the unit of scheduling; processes are containers in which threads execute.

Multithreaded Processes

Threads

    #include <pthread.h>
    #include <stdio.h>
    #define NUM_THREADS 5

    int hits = 0;

    void *PrintHello(void *threadid) {
        int tid = (int)(long)threadid;
        printf("Hello World! It's me, thread #%d! hits %d\n", tid, ++hits);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[]) {
        pthread_t threads[NUM_THREADS];
        int t;
        for (t = 0; t < NUM_THREADS; t++) {
            printf("In main: creating thread %d\n", t);
            pthread_create(&threads[t], NULL, PrintHello, (void *)(long)t);
        }
        pthread_exit(NULL);
    }

What would the output look like if these were processes? If they are threads?

Programming with Threads
We need threads:
- to exploit multiple processing units
- to provide interactive applications
- to write servers that handle many clients
Problem: this is hard, even for experienced programmers. Behavior can depend on subtle timing differences, and bugs may be impossible to reproduce. What is needed: synchronization of threads.

Goals
Concurrency poses challenges for:
- Correctness: threads accessing shared memory should not interfere with each other
- Liveness: threads should not get stuck; they should make forward progress
- Efficiency: the program should make good use of available computing resources (e.g., processors)
- Fairness: resources should be apportioned fairly between threads

Two Threads, One Counter
Web servers use concurrency: multiple threads handle client requests in parallel, with some shared state, e.g. hit counts. Each thread increments a shared counter to track the number of hits:

    hits = hits + 1;

What happens when two threads execute this concurrently?

Shared Counters
Possible result: a lost update! With hits = 0:

    time  T1               T2
    |     read hits (0)
    |                      read hits (0)
    |     hits = 0 + 1
    v                      hits = 0 + 1

Final value: hits = 1. This is a timing-dependent failure, a race condition: hard to reproduce and difficult to debug.

Race Conditions
Definition: a timing-dependent error involving access to shared state. Whether it happens depends on how the threads are scheduled: who wins the race between the instruction that updates the state and the instruction that accesses it.
- Races are intermittent and may occur rarely; "timing-dependent" means small changes can hide the bug.
- A program is correct only if all possible schedules are safe. The number of possible schedule permutations is huge, so imagine an adversary who switches contexts at the worst possible time.

Critical Sections
To eliminate races, use critical sections that only one thread can be in at a time; contending threads must wait to enter:

    T1                    T2
    CSEnter();            CSEnter();
    /* critical section */    /* critical section */
    CSExit();             CSExit();

Mutexes
Critical sections are typically associated with mutual exclusion locks (mutexes): only one thread can hold a given mutex at a time.
- Acquire (lock) the mutex on entry to the critical section, or block if another thread already holds it
- Release (unlock) the mutex on exit, allowing one waiting thread (if any) to acquire it and proceed

    pthread_mutex_init(&m, NULL);

    T1:                          T2:
    pthread_mutex_lock(&m);      pthread_mutex_lock(&m);
    hits = hits + 1;             hits = hits + 1;
    pthread_mutex_unlock(&m);    pthread_mutex_unlock(&m);

Using Atomic Hardware Primitives
Mutex implementations usually rely on special hardware instructions that atomically do a read and a write. This requires special memory-system support on multiprocessors.

    Mutex init: lock = false;

    while (test_and_set(&lock))
        ;                        /* spin until the old value was false */
    /* critical section */
    lock = false;

test_and_set uses a special hardware instruction that sets the lock and returns the OLD value (true: locked; false: unlocked). Alternative instructions: compare-and-swap, load-linked/store-conditional. On a multiprocessor/multicore this is expensive and needs hardware support.

Happy Thanksgiving!