Lecture 25: Parallelism & Concurrency


Program Graph
Lecture 25: Parallelism & Concurrency
CS 62 Fall 2015, Kim Bruce & Michael Bannister
Some slides based on those from Dan Grossman, U. of Washington

A program using fork and join can be seen as a directed acyclic graph (DAG).
- Nodes: pieces of work
- Edges: dependencies -- the source must finish before the destination starts
- A fork finishes its node and creates two outgoing edges: the new thread and the continuation of the old one
- A join finishes its node and creates a new node with two incoming edges

Performance: What does it mean?
- Let T_P be the running time if there are P processors
- Work = T_1 = sum of the run-times of all nodes in the DAG
- Span = T_∞ = sum of the run-times of all nodes on the most expensive path in the DAG
- Speed-up on P processors = T_1 / T_P
- Guarantee: T_P = O((T_1 / P) + T_∞)
  - No implementation can beat O(T_∞) by more than a constant factor
  - No implementation on P processors can beat O(T_1 / P)
  - So the framework on average gives the best we can do, assuming the user did the best possible
Bottom line: focus on your algorithms, data structures, and cut-offs rather than on the number of processors and scheduling. You only need T_1, T_∞, and P to analyze running time.
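To make the fork/join picture concrete, here is a minimal sketch (not the course's own code) of a parallel array sum using Java's ForkJoinPool; SumTask and CUTOFF are illustrative names chosen for this example.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Minimal sketch: each compute() call is a node in the DAG; fork() and join()
    // create the edges described above.
    class SumTask extends RecursiveTask<Long> {
        private static final int CUTOFF = 1000;   // below this, just sum sequentially
        private final int[] arr;
        private final int lo, hi;

        SumTask(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

        @Override
        protected Long compute() {
            if (hi - lo < CUTOFF) {                 // base case: sequential work
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += arr[i];
                return sum;
            }
            int mid = (lo + hi) / 2;
            SumTask left = new SumTask(arr, lo, mid);
            SumTask right = new SumTask(arr, mid, hi);
            left.fork();                            // fork: new node for the left half
            long rightSum = right.compute();        // continuation: this thread does the right half
            long leftSum = left.join();             // join: wait for the forked node to finish
            return leftSum + rightSum;
        }

        public static void main(String[] args) {
            int[] data = new int[1_000_000];
            java.util.Arrays.fill(data, 1);
            long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(total);              // prints 1000000
        }
    }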

Examples
Recall: T_P = O((T_1 / P) + T_∞)
- For summing: T_1 = O(n), T_∞ = O(log n), so we expect T_P = O(n/P + log n)
- If instead T_1 = O(n^2) and T_∞ = O(n), then we expect T_P = O(n^2/P + n)

Amdahl's Law: an upper bound on speed-up!
- Suppose the work (time to run with one processor) is 1 unit of time.
- Let S be the portion of the execution that cannot be parallelized, so T_1 = S + (1 - S) = 1
- Suppose we get perfect speedup on the parallel portion: T_P = S + (1 - S)/P
- Then the overall speedup with P processors (Amdahl's law) is T_1 / T_P = 1 / (S + (1 - S)/P)
- The parallelism (speedup with unlimited processors) is T_1 / T_∞ = 1 / S

Bad News!
- T_1 / T_∞ = 1 / S: if 33% of a program is sequential, then even millions of processors won't give a speedup beyond 3.
- From 1980-2005, every 12 years gave a 100x speedup. Now suppose the clock speed stays the same but we have 256 processors instead of 1.
- To get a 100x speedup, we need 100 <= 1 / (S + (1 - S)/P). Solving gives S <= 0.0061, so 99.4% of the program must be perfectly parallel (the algebra is worked below).

Moral
- We may not be able to speed up existing algorithms much, but we might find new parallel algorithms.
- We can change what we compute: computer graphics is now much better in video games thanks to GPUs -- not so much faster as much more detailed.
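The algebra behind the S <= 0.0061 figure, with P = 256 and the same notation as above:

    100 <= 1 / (S + (1 - S)/256)
    S + (1 - S)/256 <= 1/100
    256*S + (1 - S) <= 2.56
    255*S <= 1.56
    S <= 0.0061   (so the parallel fraction 1 - S must be at least 0.9939, about 99.4%)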

A Last Example: Sorting
Quicksort: sequential, in-place, expected time O(n log n).
- Pick a pivot element: O(1)
- Partition the data into: O(n)
  - A: less than the pivot
  - B: the pivot
  - C: greater than the pivot
- Recursively sort A and C: 2*T(n/2)
Now sort A and C in parallel, so the recursive cost is just T(n/2): the total is n + n/2 + n/4 + ... = 2n, which is O(n). With more work, this can be improved further to O(log^2 n). (A fork-join code sketch appears after this slide group.)

Shared Memory Concurrency: Sharing Resources
- We have been studying parallel algorithms using fork-join, reducing span via parallel tasks.
- Those algorithms all had a very simple structure that avoids race conditions: each thread had memory only it accessed (example: an array sub-range).
- On fork, a thread loaned some of its memory to the forkee and did not access that memory again until after joining the forkee.
But this strategy won't work well when:
- memory accessed by threads is overlapping or unpredictable, or
- threads are doing independent tasks needing access to the same resources (rather than implementing the same algorithm).
How do we control access?
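Returning to the parallel quicksort above: a minimal sketch using Java's fork-join framework. ParallelQuicksort and CUTOFF are illustrative names; the partition step here is still sequential, so this version has span O(n), not the O(log^2 n) mentioned above.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Sort the "less than" and "greater than" partitions in parallel.
    class ParallelQuicksort extends RecursiveAction {
        private static final int CUTOFF = 1000;
        private final int[] a;
        private final int lo, hi;   // sorts a[lo..hi] inclusive

        ParallelQuicksort(int[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override
        protected void compute() {
            if (hi - lo < CUTOFF) { java.util.Arrays.sort(a, lo, hi + 1); return; }
            int p = partition();
            // recursively sort the two partitions in parallel
            invokeAll(new ParallelQuicksort(a, lo, p - 1),
                      new ParallelQuicksort(a, p + 1, hi));
        }

        private int partition() {   // standard Lomuto partition around a[hi]
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            return i;
        }
    }

    // Usage: new ForkJoinPool().invoke(new ParallelQuicksort(data, 0, data.length - 1));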

Concurrent Programming
Concurrency: allowing simultaneous or interleaved access to shared resources from multiple clients.
- Requires coordination, particularly synchronization, to avoid incorrect simultaneous access: make somebody block.
- join is not what we want: we want to block until another thread is done using what we need, not until it has completely finished executing.

Non-Deterministic Computation
- Even correct concurrent applications are usually highly non-deterministic: how threads are scheduled affects what operations from other threads they see and when they see them.
- Non-repeatability complicates testing and debugging. (A small demonstration follows.)

Examples: Threads again?!?
Multiple threads:
- Processing different bank-account operations. What if 2 threads change the same account at the same time?
- Using a shared cache of recent files. What if 2 threads insert the same file at the same time?
- Creating a pipeline with a queue for handing work to the next thread in sequence. What if the enqueuer and dequeuer adjust a circular array queue at the same time?

Not about speed, but:
- Code structure for responsiveness. Example: respond to GUI events in one thread while another thread performs an expensive computation.
- Processor utilization (mask I/O latency): if one thread goes to disk, have something else to do.
- Failure isolation: a convenient structure if you want to interleave multiple tasks and don't want an exception in one to stop the others.
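A minimal sketch of the non-determinism point: two unsynchronized threads incrementing a shared counter can print a different final value on different runs. The class name and iteration count are illustrative.

    class RaceDemo {
        static int counter = 0;   // shared and unsynchronized: a data race

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 100_000; i++) counter++;   // read-modify-write, not atomic
            };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Expected 200000, but lost updates mean the result can vary from run to run.
            System.out.println(counter);
        }
    }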

Sharing is the Key: Canonical Example
Common to have:
- Different threads access the same resources in an unpredictable order, or even at about the same time.
- But program correctness requires that simultaneous access be prevented using synchronization.
- Simultaneous access is rare, which makes testing difficult. We must be much more disciplined when designing and implementing a concurrent program; we will discuss common idioms known to work.
Example: several ATMs accessing the same account. See ATM2.

Bad Interleavings
Interleaved changeBalance(-100) calls on the same account, assuming an initial balance of 150 (a sketch of the unsynchronized code appears below):

    Time   Thread 1                       Thread 2
      1    int nb = b + amount;
      2                                   int nb = b + amount;
      3                                   if (nb < 0) throw new ...;
      4                                   balance = nb;
      5    if (nb < 0) throw new ...;
      6    balance = nb;

A lost withdraw -- unhappy bank: both threads read the balance as 150 and compute nb = 50, so 200 is withdrawn but the balance only drops by 100.

Interleaving is the Problem
Suppose thread T1 calls changeBalance(-100) and thread T2 calls changeBalance(-100).
- If the second call starts before the first finishes, we say the calls interleave. This could happen even with one processor, since a thread can be pre-empted at any point for time-slicing.
- If the two calls are on different accounts x and y, there is no problem: you cook in your kitchen while I cook in mine.
- But if x and y alias, there is possible trouble.
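A minimal sketch of the unsynchronized account code the interleaving above assumes. The field and method names follow the slides, but the GUI display update is omitted and the exception type is a placeholder.

    class Account {
        private int balance = 150;

        int getBalance() { return balance; }

        // Unsynchronized: two concurrent calls can interleave as in the table above,
        // both reading the same old balance and losing one of the withdrawals.
        void changeBalance(int amount) {
            int nb = balance + amount;
            if (nb < 0) throw new IllegalArgumentException("insufficient funds");
            balance = nb;
        }
    }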

Problems with Account
- We get wrong answers!
- Trying to fix it by reading the balance again, rather than using newBalance, still allows interleaving, though it is less likely.
- The balance can still go negative with the wrong interleaving!

Solve with Mutual Exclusion
- Allow at most one thread to withdraw from account A at a time.
- The areas where we don't want two threads executing are called critical sections.
- The programmer needs to decide where they are, as the compiler doesn't know our intentions.

Java Solution
Re-entrant locks via synchronized blocks. Syntax:

    synchronized (expression) { statements }

- Evaluates expression to an object and tries to acquire it as a lock.
- If no other thread is holding it, this thread grabs it, executes the statements, and releases the lock when the statements finish.
- If another thread is holding it, this thread waits until the lock is released.
- Net result: only one thread at a time can execute a synchronized block guarded by the same lock.

Correct Code

    public class Account {
        private Object myLock = new Object();
        ...

        // return balance
        public int getBalance() {
            synchronized (myLock) {
                return balance;
            }
        }

        // update balance by adding amount
        public void changeBalance(int amount) {
            synchronized (myLock) {
                int newBalance = balance + amount;
                display.setText("" + newBalance);
                balance = newBalance;
            }
        }
    }

Better Code

    public class Account {
        ...

        // return balance
        public int getBalance() {
            synchronized (this) {
                return balance;
            }
        }

        // update balance by adding amount
        public void changeBalance(int amount) {
            synchronized (this) {
                int newBalance = balance + amount;
                display.setText("" + newBalance);
                balance = newBalance;
            }
        }
    }

Best Code

    public class Account {
        ...

        // return balance
        synchronized public int getBalance() {
            return balance;
        }

        // update balance by adding amount
        synchronized public void changeBalance(int amount) {
            int newBalance = balance + amount;
            display.setText("" + newBalance);
            balance = newBalance;
        }
    }

Reentrant Locks
- If a thread holds a lock while executing code, then further method calls within the block don't need to reacquire the same lock.
- E.g., suppose methods m and n are both synchronized with the same lock (say, this), and executing m results in a call to n. Then once a thread holds the lock while executing m, there is no delay in calling n (see the sketch below).
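A minimal sketch of the reentrancy point, assuming two synchronized methods on one object where one calls the other; the class and method names are illustrative.

    class Counter {
        private int count = 0;

        // Both methods lock on "this".
        synchronized void increment() { count++; }

        synchronized void incrementTwice() {
            // We already hold the lock on "this", so these nested calls
            // re-enter the same lock without blocking or deadlocking.
            increment();
            increment();
        }
    }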

Responsiveness

Maze Program
- Uses a stack to solve a maze.
- When the user clicks the "solve maze" button, the program spawns a Thread to solve the maze.
- What happens if we call run instead of start? (See the sketch below.)

Non-Event-Driven Programming
- The program is in control: it can ask for input at any point, with program control depending on that input.
- But the user can't interrupt the program and can only give input when the program is ready.

Event-Driven Programming
- Control is inverted: the user takes an action, and the program responds.
- GUI components (buttons, mouse, etc.) have listeners associated with them that are notified when the component generates an event.
- Listeners then take action to respond to the event.

Event-Driven Programming in Java
- When an event occurs, it is posted to the appropriate event queue. Java GUI components share an event queue.
- Any thread can post to the queue, but only the event thread can remove events from it.
- When an event is removed from the queue, the event thread executes the appropriate method of the listener, with the event as a parameter.
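A minimal sketch of the start-vs-run question above, using a Swing button; MazeDemo and solveMaze are illustrative placeholders, not the course's Maze code.

    import javax.swing.JButton;
    import javax.swing.JFrame;
    import javax.swing.SwingUtilities;

    class MazeDemo {
        static void solveMaze() {
            // stand-in for a long-running maze search
            try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
        }

        public static void main(String[] args) {
            SwingUtilities.invokeLater(() -> {
                JFrame frame = new JFrame("Maze");
                JButton solve = new JButton("Solve maze");
                solve.addActionListener(e -> {
                    Thread solver = new Thread(MazeDemo::solveMaze);
                    solver.start();     // new thread does the work; the event thread stays responsive
                    // solver.run();    // WRONG: runs solveMaze() on the event thread, freezing the GUI
                });
                frame.add(solve);
                frame.pack();
                frame.setVisible(true);
            });
        }
    }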

Example: Maze-Solver Listeners
- Start button: StartListener object
- Clear button: ClearAndChooseListener
- Maze choice: ClearAndChooseListener
- Speed slider: SpeedListener
Different kinds of GUI items require different kinds of listeners:
- Button -- ActionListener
- Mouse -- MouseListener, MouseMotionListener
- Slider -- ChangeListener
See the GUI cheatsheet on the documentation web page.

Event Thread
- Removes events from the queue and executes the appropriate methods in the listeners.
- Also handles repaint events.
- Must remain responsive! Listener code must complete and return quickly; if it can't, spawn a new thread!

Why did the Maze Freeze?
- The solver animation was being run by the event thread.
- Because it didn't return until the maze was solved, the event thread was not available to remove events from the queue.
- So it could not respond to GUI controls and could not paint the screen.

Off to the Races
- A race condition occurs when the computation result depends on scheduling (how threads are interleaved): the answer depends on shared state.
- These are bugs that exist only due to concurrency; there is no interleaved scheduling with one thread.
- Typically, the problem is some intermediate state that messes up a concurrent thread that sees that state.

Example

    class Stack<E> {
        synchronized void push(E val) { ... }

        synchronized E pop() {
            if (isEmpty()) throw new StackEmptyException();
            ...
        }

        E peek() {
            E ans = pop();
            push(ans);
            return ans;
        }
    }

Sequentially Fine, Concurrently Flawed
- peek is correct in a sequential world, and we may need to write it this way if we only have access to the push, pop, and isEmpty methods.
- peek() has no overall effect on the data structure: it reads rather than writes.
- But the way it is implemented creates an inconsistent intermediate state, even though the calls to push and pop are synchronized, so there are no data races on the underlying array/list/whatever. (A data race is a simultaneous, unsynchronized read/write or write/write of the same memory: more on this soon.)
- This intermediate state should not be exposed; it leads to several wrong interleavings.

Lose Invariants
- Want: if there is at least one push and no pops, then isEmpty always returns false. This fails with two threads if one is doing a peek, the other an isEmpty, and the timing is unlucky.
- It gets worse: we can lose the LIFO property. The problem: doing a push while a peek is in progress.
- Want: if the number of pushes exceeds the number of pops, then peek never throws an exception. This can fail if two threads do simultaneous peeks.

Solution
- Make peek synchronized (with the same lock). There is no problem with its internal calls to push and pop, because locks are reentrant.
- Note: just because all changes to state are done within synchronized pushes and pops doesn't prevent exposing intermediate state.

From within the Stack class:

    class Stack<E> {
        ...
        synchronized E peek() {
            E ans = pop();
            push(ans);
            return ans;
        }
    }

From outside the Stack class:

    class C {
        <E> E myPeek(Stack<E> s) {
            synchronized (s) {
                E ans = s.pop();
                s.push(ans);
                return ans;
            }
        }
    }

Beware of Accessing Changing Data
Unsynchronized methods can observe bad intermediate state even if they don't change anything:

    class Stack<E> {
        private E[] array = (E[]) new Object[SIZE];
        int index = -1;

        boolean isEmpty() {            // unsynchronized: wrong?!
            return index == -1;
        }

        synchronized void push(E val) {
            array[++index] = val;
        }

        synchronized E pop() {
            return array[index--];
        }

        E peek() {                     // unsynchronized: wrong!
            return array[index];
        }
    }

Providing Safe Access
For every memory location (e.g., object field) in your program, you must obey at least one of the following:
- Thread-local: don't access the location from more than one thread.
- Immutable: don't write to the memory location.
- Synchronized: use synchronization to control access to the location.
[Figure: of all memory, only the portion that is neither thread-local nor immutable needs synchronization.]
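A minimal sketch, using an illustrative class, of the three options above applied to three fields:

    class SafeAccessExample {
        // Thread-local: each thread gets its own copy, so no sharing and no locking.
        private static final ThreadLocal<StringBuilder> scratch =
            ThreadLocal.withInitial(StringBuilder::new);

        // Immutable: written once at construction and never changed, so safe to share freely.
        private final String name;

        // Synchronized: mutable and shared, so every access goes through the same lock.
        private int count = 0;

        SafeAccessExample(String name) { this.name = name; }

        String getName() { return name; }                  // immutable, no lock needed

        synchronized void increment() { count++; }         // guarded by "this"
        synchronized int getCount() { return count; }      // guarded by "this"

        String format(int n) {
            StringBuilder sb = scratch.get();               // thread-local, no lock needed
            sb.setLength(0);
            return sb.append(name).append(": ").append(n).toString();
        }
    }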