Understanding Task Scheduling Algorithms. Kenjiro Taura



Contents
1 Introduction
2 Work stealing scheduler
3 Analyzing execution time of work stealing
4 Analyzing cache misses of work stealing
5 Summary


Introduction

[Figure: the task DAG of a parallel merge sort, comprising roughly 200 tasks T0 through T199]

in this part, we study how tasks in task parallel programs are scheduled, and what we can expect about their performance

void ms(elem * a, elem * a_end, elem * t, int dest) {
  long n = a_end - a;
  if (n == 1) { ... }
  else {
    ...
    create_task(ms(a, c, t, 1 - dest));
    ms(c, a_end, t + nh, 1 - dest);
    wait_tasks;
  }
}

Goals

understand a representative scheduling algorithm (the work stealing scheduler):

execution time (without modeling communication): how much time does a scheduler take to finish a computation? in particular, how close is it to a greedy scheduler?

data access (communication) cost: when a computation is executed in parallel by a scheduler, how much data are transferred (cache ↔ memory, cache ↔ cache)? in particular, how much worse (or better) is it than the serial execution?


Model of computation

assume a program performs the following operations:

create_task(S): create a task that performs S
wait_tasks: wait for completion of the tasks the current task has created (but has not yet waited for)

e.g.,

int fib(int n) {
  if (n < 2) return 1;
  else {
    int x, y;
    create_task({ x = fib(n - 1); });  // share x
    y = fib(n - 2);
    wait_tasks;
    return x + y;
  }
}
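these primitives are pseudocode. as a rough illustration only, the same fib can be written with standard C++ futures, where std::async plays the role of create_task and get() the role of wait_tasks (a minimal sketch under that mapping; std::async launches OS threads, so a small serial cutoff is added to keep the thread count sane, and a real task-parallel runtime would be far cheaper):

#include <cstdio>
#include <future>

long fib(long n) {
  if (n < 2) return 1;
  if (n < 10) return fib(n - 1) + fib(n - 2);          // serial below cutoff
  auto x = std::async(std::launch::async, fib, n - 1); // "create_task"
  long y = fib(n - 2);
  return x.get() + y;                                  // "wait_tasks"
}

int main() { std::printf("%ld\n", fib(20)); }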

Model of computation

model an execution as a DAG (directed acyclic graph):

node: a sequence of instructions
edge: dependency

assume no dependencies other than those induced by create_task(S) and wait_tasks; e.g. (note C1 and C2 may be subgraphs, not single nodes):

P1
create_task(C1);
P2
create_task(C2);
P3
wait_tasks;
P4

[Figure: the resulting DAG, in which P1 precedes C1 and P2, P2 precedes C2 and P3, and P3, C1, and C2 all precede P4]

Terminologies and remarks

a single node in the DAG represents a sequence of instructions performing no task-related operations

note that a task ≠ a single node; a task = a sequence of nodes

we say a node is ready when all its predecessors have finished; we say a task is ready to mean a node of it becomes ready

Work stealing scheduler

a state-of-the-art scheduler for task parallel systems

the main ideas were invented in 1990: Mohr, Kranz, and Halstead. "Lazy task creation: a technique for increasing the granularity of parallel programs." ACM Conference on LISP and Functional Programming, 1990.

originally termed "lazy task creation"; essentially the same strategy is nowadays called work stealing

Work stealing scheduler: data structure

[Figure: workers W_0, W_1, ..., W_{n-1}, each with a ready deque holding ready tasks; "top" and "bottom" label the two ends]

each worker maintains its own ready deque, which contains ready tasks

the top entry of each ready deque is the task the worker is currently executing
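as a concrete, hedged sketch (the names and the locking strategy are my assumptions, not the deck's): each worker's ready deque can be modeled as a small structure in which the owner pushes and pops at the top while thieves steal from the bottom. production schedulers use lock-free deques (e.g., the Chase-Lev deque), but a mutex keeps the sketch simple:

#include <deque>
#include <mutex>
#include <optional>

struct Task;  // opaque task type; its contents are irrelevant here

// One ready deque per worker: the owner operates on the top,
// thieves steal from the bottom (the deck's convention).
struct ReadyDeque {
  std::deque<Task*> q;  // front = top, back = bottom
  std::mutex m;

  void push_top(Task* t) {
    std::lock_guard<std::mutex> g(m);
    q.push_front(t);
  }
  std::optional<Task*> pop_top() {          // called by the owner
    std::lock_guard<std::mutex> g(m);
    if (q.empty()) return std::nullopt;
    Task* t = q.front(); q.pop_front(); return t;
  }
  std::optional<Task*> steal_bottom() {     // called by a thief
    std::lock_guard<std::mutex> g(m);
    if (q.empty()) return std::nullopt;
    Task* t = q.back(); q.pop_back(); return t;
  }
};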

Work stealing scheduler in a nutshell

work-first: when creating a task, the created task gets executed first (before the parent)

LIFO execution order within a worker: without work stealing, the order of execution is as if the program were serial, with create_task(S) behaving like a plain call to S and wait_tasks becoming a no-op

FIFO stealing: thieves take the oldest task in a deque, so the computation is partitioned at points close to the root of the task tree

it is a practical approximation of a greedy scheduler, in the sense that any ready task can (eventually) be stolen by any idle worker

Work stealing scheduler in action

describing a scheduler boils down to defining the action taken on each of the following events:

(1) create_task
(2) a worker becoming idle
(3) wait_tasks
(4) a task termination

we will see them in detail

(1) worker W encounters create_task(S):
1. W pushes S onto its deque
2. and immediately starts executing S

(2) a worker whose deque is empty repeats work stealing:
1. it picks a random worker V as the victim
2. and steals the task at the bottom of V's deque

(3) worker W encounters wait_tasks: there are two cases
1. the tasks to wait for have all finished → W just continues the task
2. otherwise → W pops the task from its deque (the task is now blocked, and W starts work stealing)

(4) when W encounters the termination of a task T, W pops T from its deque. there are two cases concerning T's parent P:
1. P was blocked and now becomes ready again → W enqueues P and continues executing P
2. otherwise → no particular action; W continues with the next task in its deque, or starts work stealing if the deque has become empty
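put together, a worker alternates between running the top of its own deque and stealing from random victims. the following toy, single-threaded simulation sketches rules (1) and (2) only (the integer "tasks", the constants, and the round-robin stepping are my inventions; blocking at wait_tasks and parent bookkeeping are elided):

#include <cstdio>
#include <deque>
#include <random>
#include <vector>

// A "task" is an integer n; if n > 1 it splits into two subtasks
// (as create_task would), otherwise it is one unit of leaf work.
// front = top of the deque, back = bottom.
int main() {
  const int P = 4;
  std::vector<std::deque<int>> dq(P);
  dq[0].push_front(64);                     // root task on worker 0
  std::mt19937 rng(42);
  std::uniform_int_distribution<int> pick(0, P - 1);
  long steals = 0;
  for (bool busy = true; busy; ) {
    busy = false;
    for (int w = 0; w < P; ++w) {           // one "time step" per worker
      if (!dq[w].empty()) {
        int n = dq[w].front(); dq[w].pop_front();
        if (n > 1) {                        // split: children run LIFO
          dq[w].push_front(n - n / 2);
          dq[w].push_front(n / 2);
        }
      } else {
        int v = pick(rng);                  // random victim
        if (v != w && !dq[v].empty()) {     // steal from the bottom
          dq[w].push_front(dq[v].back());
          dq[v].pop_back();
          ++steals;
        }
      }
      busy = busy || !dq[w].empty();
    }
  }
  std::printf("total steals: %ld\n", steals);
}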

A note about the cost of operations

with work stealing, a task creation is cheap, unless the parent gets stolen:
1. a task gets created
2. control jumps to the new task
3. when it finishes, control returns to the parent (as the parent has not been stolen)

it is much like a procedure call, except:

the parent and the child each need a separate stack, as the parent might be executed concurrently with the child

as the parent might be executed without returning from the child, the parent generally cannot assume callee-save registers are preserved

the net overhead, from task creation to termination, is a small constant number of instructions

What you must remember when using work stealing systems

when using a (good) work stealing scheduler, don't try to match the number of tasks to the number of processors:

bad idea 1: create exactly P tasks when you have P processors
bad idea 2 (task pooling): keep exactly P tasks alive all the time and let each grab work

these strategies are effective with OS-managed threads or processes, but not with work stealing schedulers

remember: keep the granularity of a task above a constant factor of the task creation overhead, so that the relative overhead is a sufficiently small constant (e.g., 2%); a good rule of thumb is to make the granularity ≥ 5000 cycles or so (see the sketch below)
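as a hedged illustration of that advice (the grain size, std::async, and all names here are my own choices, not the deck's): a recursive parallel sum whose leaves are large enough that task creation overhead stays a small fraction of useful work:

#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

// GRAIN sets the leaf size: each leaf sums 10000 elements, i.e.
// thousands of cycles of work, so the relative cost of spawning a
// task stays a small constant. Note that the number of tasks is
// determined by the grain, NOT by the number of processors.
static const long GRAIN = 10000;

long psum(const long* a, long n) {
  if (n <= GRAIN) return std::accumulate(a, a + n, 0L);  // serial leaf
  auto left = std::async(std::launch::async, psum, a, n / 2);
  long right = psum(a + n / 2, n - n / 2);
  return left.get() + right;
}

int main() {
  std::vector<long> v(1 << 20, 1);
  std::printf("sum = %ld\n", psum(v.data(), (long)v.size()));
}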


Analyzing execution time of work stealing

we analyze the execution time of the work stealing scheduler in terms of T_1 (total work) and T_∞ (critical path)

Blumofe et al. "Scheduling multithreaded computations by work stealing." Journal of the ACM 46(5), 1999.

due to the random choices of victims, the upper bound is necessarily probabilistic (e.g., of the form "T_P ≤ ⋯ holds with probability ≥ ⋯")

for mathematical simplicity, we are satisfied with a result about the average (expected) execution time

the main result: with P processors,

  E(T_P) ≤ T_1/P + c·T_∞,

with c a small constant reflecting the cost of a work steal

remember the greedy scheduler theorem? T_P ≤ T_1/P + T_∞

Recap: DAG model

recall the DAG model of computation:

T_1: total work (= execution time on a single processor)
T_∞: critical path (= execution time on arbitrarily many processors)
T_P: execution time on P processors

two obvious lower bounds on the execution time of any scheduler:

  T_P ≥ T_1/P,  T_P ≥ T_∞
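both quantities are properties of the DAG alone, so they are easy to compute given the DAG; a minimal sketch with unit-time nodes (the adjacency-list encoding and the tiny example DAG are assumptions for illustration):

#include <algorithm>
#include <cstdio>
#include <vector>

// Nodes are numbered in topological order and take unit time:
// T1 = number of nodes, Tinf = number of nodes on a longest path.
int main() {
  // example: 0 -> {1,2}, 1 -> {3}, 2 -> {3} (a fork-join diamond)
  std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {}};
  int n = (int)succ.size();
  std::vector<int> len(n, 1);   // longest path ending at each node
  for (int u = 0; u < n; ++u)
    for (int v : succ[u])
      len[v] = std::max(len[v], len[u] + 1);
  int Tinf = *std::max_element(len.begin(), len.end());
  std::printf("T1 = %d, Tinf = %d\n", n, Tinf);
  // lower bounds for any scheduler: T_P >= T1/P and T_P >= Tinf
}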

Recap: greedy scheduler

greedy scheduler: a worker is never idle as long as any ready task is left

the greedy scheduler theorem: any greedy scheduler achieves the upper bound

  T_P ≤ T_1/P + T_∞

considering that both T_1/P and T_∞ are lower bounds, this shows any greedy scheduler is within a factor of two of optimal

Proof of the greedy scheduler theorem: settings

for the sake of simplifying the analysis, assume:

all nodes take a unit time to execute (longer nodes can be modeled by a chain of unit-time nodes)
there are P workers
workers execute in lockstep

in each time step, one of the following happens:

(S) there are ≥ P ready nodes → each worker executes some ready node (Saturated)
(U) there are < P ready nodes → each ready node is executed by some worker (Unsaturated)

Proof of the greedy scheduler theorem

consider a path from the start node to the end node on which some node is always ready (call such a path a ready path); exercise: prove that a ready path exists

at each step, one of the following must happen:
(S) all P workers execute a node, or
(U) the ready node on the path gets executed

(S) can happen ≤ T_1/P times and (U) ≤ T_∞ times; therefore, the end node is executed within

  T_P ≤ T_1/P + T_∞

steps
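a quick numeric instance of the theorem (the numbers are invented for illustration):

\[
T_1 = 10^9,\quad T_\infty = 10^4,\quad P = 100
\;\Rightarrow\;
T_P \le \frac{T_1}{P} + T_\infty = 10^7 + 10^4 \approx 1.001\times 10^7,
\]

within a factor of two of the lower bound \(\max(T_1/P,\, T_\infty) = 10^7\).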

What about the work stealing scheduler?

what is the difference between the genuine greedy scheduler and the work stealing scheduler?

the greedy scheduler finds ready tasks with zero delay; any practically implementable scheduler inevitably incurs some delay between the time a node becomes ready and the time it gets executed

in the work stealing scheduler, the delay is the time to randomly search other workers' deques for ready tasks, without knowing which deques have tasks to steal

Analyzing the work stealing scheduler: settings

a setting similar to the greedy scheduler analysis: all nodes take a unit time, and there are P workers

each worker does the following in each time step:
its deque is not empty → it executes the node designated by the algorithm
its deque is empty → it attempts to steal a node; if the attempt succeeds, the stolen node is executed in the next step

remarks on steal attempts:
if the chosen victim has no ready tasks (besides the one it is executing), the attempt fails
when two or more workers choose the same victim, only one can succeed in a single time step

The overall strategy of the proof

in each time step, each processor either executes a node or attempts a steal; counting both kinds of processor-steps,

  P·T_P = T_1 + (number of steal attempts),

so if we can estimate the number of steal attempts (successful or not), we can estimate the execution time by

  T_P = (T_1 + number of steal attempts) / P

Analyzing the work stealing scheduler

how many steal attempts occur?

similarly to the proof for the greedy scheduler, consider a ready path (a path from the start node to the end node on which some node is always ready)

the crux is how many steal attempts suffice to make progress along a ready path

The number of steal attempts: the key observation

a task at the bottom of a deque is stolen by a given steal attempt with probability 1/P; thus, on average, such a task is stolen within P steal attempts and executed in the next step

we are going to extend this argument to any ready task along a ready path, and establish an average number of steal attempts within which any ready task gets executed
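the "within P steal attempts on average" step is just the mean of a geometric distribution; treating attempts as independent with success probability 1/P (a simplification: the full proof in Blumofe et al. uses a balls-and-bins argument to handle simultaneous attempts),

\[
E[\text{attempts until success}]
= \sum_{k \ge 1} k \cdot \frac{1}{P}\left(1 - \frac{1}{P}\right)^{k-1}
= P .
\]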

How many steal attempts occur before any ready task gets executed?

there are five types of edges:
(A) create_task → the child
(B) create_task → the continuation
(C) wait_tasks → the continuation
(D) the last node of a task → the parent's continuation after the corresponding wait
(E) a non-task node → the continuation

the successor of a type (A), (C), (D), or (E) edge is executed immediately after its predecessor (recall the path is a ready path)

only the successor of a type (B) edge may need a steal attempt to get executed; as discussed, a task at the bottom of a deque needs ≤ P attempts on average

a subtlety arises because ready tasks are not necessarily at the bottom of a deque: if x sits below y in the same deque, y cannot be stolen until x has been stolen

When does a task become stealable?

in general, for the continuation of a create_task node n to be stealable, the continuations of all create_task nodes along the path from the start node to n must have been stolen

so we define a node n to be critical iff n is ready and, for every create_task node along the path from the start node to n, its continuation node has already been executed

then we have: a critical node is at the bottom of a deque; and a critical node is, on average, executed within P steal attempts

Summary of the proof

(1) there is a ready path
(2) define a critical node as in the previous slide
(3) with this definition, a critical node is stolen, on average, within P steal attempts
(4) a critical node is executed one time step after being stolen, so it finishes within 2P steal attempts on average
(5) the length of any ready path is ≤ T_∞
(6) from (4) and (5), finishing any path takes ≤ 2·T_∞·P steal attempts on average
(7) therefore,

  E(T_P) ≤ (T_1 + 2·T_∞·P)/P = T_1/P + 2·T_∞

Extensions

we assumed a steal attempt takes a single time step, but this can be generalized: if a steal attempt takes a time steps, then

  E(T_P) ≤ T_1/P + 2a·T_∞

we can also probabilistically bound the execution time; the basis is that the probability that a critical node needs ≥ cP steal attempts to be executed is

  (1 − 1/P)^{cP} ≤ e^{−c}

based on this, we bound the probability that a path of length l takes ≥ ClP steal attempts, for a large enough constant C


Analyzing cache misses of work stealing

we would like to know the amount of data transferred between a processor's cache and { main memory, other caches } under a task parallel scheduler

in particular, we would like to understand how much worse (or better) it is than the serial execution

[Figure: a multicore memory hierarchy; hardware threads (virtual cores, CPUs) with private L1 and L2 caches, a shared L3 cache, and a memory controller]

An analysis methodology for serial computation

we have learned how to analyze data transfer between the cache and main memory on single-processor machines

the key was to identify cache-fitting subcomputations (those with working set size ≤ C words); a cache-fitting subcomputation induces ≤ C words of data transfer

[Figure: a cache of capacity C backed by main memory; each cache-fitting subcomputation transfers at most C words]

Minor remarks (data transfer vs. cache misses)

we hereafter identify a single data transfer from/to a cache with a single cache miss

in real machines, some data transfers do not lead to cache misses, due to prefetching; we nevertheless use the simpler term "cache misses" to mean any data transfer from/to a cache

What's different in parallel execution?

the argument that a cache-fitting subcomputation induces ≤ C cache misses no longer holds in parallel execution

consider two subcomputations A and B:

create_task({ A });
B

assume A and B together fit in the cache; a parallel execution may run A and B on different processors, which may turn what would have been a cache hit in B (in the serial execution) into a miss

so a parallel execution might increase cache misses, but by how much?

Acar, Blelloch, and Blumofe. "The data locality of work stealing." SPAA '00: Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures.

Problem settings

P processors (= P workers)
caches are private to each processor (no shared caches)
consider only a single-level cache, with a capacity of C words
LRU replacement: i.e., a cache holds the most recently accessed C distinct words

[Figure: P private caches, each of capacity C, all backed by main memory]
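the LRU model ("the most recently accessed C distinct words") can be made concrete with a tiny miss-counting simulator; a sketch under the slide's fully associative, word-granularity assumptions (the code and its names are my own):

#include <cstdio>
#include <list>
#include <unordered_map>

// Fully associative LRU cache of C words: an access is a miss iff
// the word is absent; the accessed word becomes most recently used,
// evicting the least recently used word when over capacity.
struct LRUCache {
  size_t C;
  std::list<long> order;                               // front = MRU
  std::unordered_map<long, std::list<long>::iterator> pos;
  long misses = 0;

  explicit LRUCache(size_t cap) : C(cap) {}
  void access(long addr) {
    auto it = pos.find(addr);
    if (it != pos.end()) {
      order.erase(it->second);                         // hit: refresh
    } else {
      ++misses;
      if (order.size() == C) {                         // evict LRU word
        pos.erase(order.back());
        order.pop_back();
      }
    }
    order.push_front(addr);
    pos[addr] = order.begin();
  }
};

int main() {
  LRUCache c(4);
  for (long a : {0, 1, 2, 3, 0, 1, 4, 0}) c.access(a);
  std::printf("misses = %ld\n", c.misses);             // prints 5
}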

The key observation

we have learned that the work stealing scheduler tends to preserve much of the serial execution order; extra cache misses are caused by work stealing

a work steal essentially brings a subcomputation to a processor whose cache is in an unknown state

The key questions

key question 1: how many extra misses can occur when a subcomputation runs on a cache in an unknown state (identical instructions executed, but a different initial cache state)?

key question 2: how many work steals happen? (we already know an answer)

Roadmap

(1) bound the number of extra cache misses incurred each time a node is drifted (i.e., executed in a different order than in the serial execution)
(2) we know an upper bound on the number of steals
(3) from (2), bound the number of drifted nodes
(4) combine (1) and (3) to derive an upper bound on the total number of extra cache misses

Extra misses per drifted node

when caches are LRU, two cache states converge to an identical state after no more than C cache misses have occurred in either cache

this is because an LRU cache holds exactly the most recently accessed C distinct words

⇒ extra cache misses per drifted node ≤ C

A bound on the drifted nodes

let v be a node in the DAG and u the node that immediately precedes v in the serial execution; we say v is drifted when u and v are not executed consecutively on the same processor

without a detailed proof, we note that each steal makes at most two nodes drifted, so:

  the number of drifted nodes in the work stealing scheduler ≤ 2 × the number of steals

The main result

the average number of work steals ≤ 2P·T_∞
⇒ the average number of drifted nodes ≤ 4P·T_∞
⇒ the average number of extra cache misses ≤ 4CP·T_∞

average execution time:

  T_P ≤ T_1/P + 4mC·T_∞,

where m is the cost of a single cache miss
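plugging in illustrative numbers (all invented, to get a feel for the magnitudes):

\[
C = 4096 \text{ words (a 32 KiB cache of 8-byte words)},\quad
m = 100 \text{ cycles},\quad
T_\infty = 10^4,\quad
T_1 = 10^{13},\quad
P = 100
\]
\[
T_P \;\le\; \frac{T_1}{P} + 4mC\,T_\infty
\;=\; 10^{11} + 1.64\times 10^{10}
\;\approx\; 1.16\times 10^{11},
\]

i.e., the communication term adds roughly 16% here; since that term scales with \(T_\infty\), keeping the critical path short keeps the cache penalty modest.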


Summary

the basics of the work stealing scheduler:
work-first (preserves the serial execution order)
steals tasks from near the root

average execution time (without the cost of communication):

  T_P ≤ T_1/P + 2·T_∞

with the cost of communication:

  T_P ≤ T_1/P + 4mC·T_∞,

where mC essentially represents the time to fill the cache
