The Manticore Project
1 The Manticore Project John Reppy University of Chicago January 20, 2009
2 Introduction People The Manticore project is a joint project between the University of Chicago and the Toyota Technological Institute: Lars Bergstrom (University of Chicago), Matthew Fluet (Toyota Technological Institute at Chicago), Mike Rainey (University of Chicago), Adam Shaw (University of Chicago), and Yingqi Xiao (University of Chicago), with help from Nic Ford, Jon Riehl, and Ridg Scott (all University of Chicago). DAMP 2009 The Manticore Project 2
7 Introduction Background and motivation The Manticore project is our effort to address the programming needs of commodity applications running on commodity hardware. Hardware supports parallelism at multiple levels: SIMD, SMT, multicore, and small-scale SMP. Likewise, many commodity applications exhibit parallelism at multiple levels. To maximize the performance of these applications, parallel languages should support parallelism at multiple levels. Existing languages span a spectrum from explicitly threaded (e.g., CML, Erlang, Java) through implicitly threaded (e.g., Manticore, OpenMP) to implicitly parallel (e.g., NESL, Nepal, Id, pH, SISAL). DAMP 2009 The Manticore Project 3
8 Introduction Talk overview This talk is divided into two parts: 1. Language design for heterogeneous parallel programming. 2. The Manticore implementation. Please see http://manticore.cs.uchicago.edu for papers. DAMP 2009 The Manticore Project 4
9 Design Design DAMP 2009 The Manticore Project 5
10 Design Design overview Language design Our initial design is purposefully conservative. It can be summarized as the combination of three distinct sub-languages: A mutation-free subset of SML (no refs or arrays, but includes exceptions). Language mechanisms for implicitly-threaded parallel programming. Language mechanisms for explicitly-threaded parallel programming (a.k.a. concurrent programming). DAMP 2009 The Manticore Project 6
11 Design Implicit threading Language design (continued...) Manticore provides several light-weight syntactic forms for introducing parallel computation. These forms are hints to the compiler and runtime that a computation is a good candidate for parallel execution. Parallel arrays provide fine-grain data-parallel computations over sequences. Parallel tuples provide a basic fork-join parallel computation. Parallel bindings provide data-flow parallelism with cancelation of unused subcomputations. Parallel case provides non-deterministic speculative parallelism. DAMP 2009 The Manticore Project 7
12 Design Implicit threading Parallel arrays We support fine-grained data-parallel computation using a parallel array comprehension form (NESL/Nepal/DPH):

    [| exp | pat1 in exp1, ..., patn in expn where pred |]

For example, the parallel point-wise summing of two arrays:

    [| x+y | x in xs, y in ys |]

NOTE: zip semantics, not Cartesian-product semantics. DAMP 2009 The Manticore Project 8
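For reference, the zip semantics can be made concrete with a sequential Standard ML analogue over lists (plain Basis Library code, not Manticore's parallel arrays):

    (* Sequential analogue of [| x+y | x in xs, y in ys |]: ListPair.map
       pairs elements positionally, i.e., zip semantics. *)
    val sumPairwise = ListPair.map (op +)
    (* sumPairwise ([1, 2, 3], [4, 5, 6]) evaluates to [5, 7, 9] *)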
13 Design Implicit threading Nested data parallelism (continued...) Mandelbrot set computation:

    fun x i = x0 + dx * itof i;
    fun y j = y0 - dy * itof j;
    fun pixel (cr, ci) = let
          fun loop (cnt, re, im) =
                if (cnt < 255) andalso (re*re + im*im <= 4.0)
                  then loop (cnt+1, re*re - im*im + cr, 2.0*re*im + ci)
                  else cnt
          in loop (0, cr, ci) end;
    [| [| pixel (x i, y j) | i in [| 0..N |] |] | j in [| 0..N |] |]

DAMP 2009 The Manticore Project 9
14 Design Implicit threading Parallel tuples Parallel tuples provide a basic fork-join parallelism. For example, consider summing the leaves of a binary tree.

    datatype tree
      = LF of long
      | ND of tree * tree

    fun treeadd (LF n) = n
      | treeadd (ND (t1, t2)) = (op +) (| treeadd t1, treeadd t2 |)

DAMP 2009 The Manticore Project 10
15 Design Implicit threading Parallel bindings Parallel bindings provide more flexibility than parallel tuples. For example, consider computing the product of the leaves of a binary tree.

    fun treemul (LF n) = n
      | treemul (ND (t1, t2)) = let
          pval b = treemul t2
          val a = treemul t1
          in
            if (a = 0) then 0 else a*b
          end

NOTE: the computation of b is speculative. DAMP 2009 The Manticore Project 11
17 Design Implicit threading Parallel case Parallel case supports speculative parallelism when we want the quickest answer (e.g., search problems). For example, consider picking a leaf of the tree:

    fun treepick (LF n) = n
      | treepick (ND (t1, t2)) = (
          pcase treepick t1 & treepick t2
           of ? & n => n
            | n & ? => n)

There is some similarity with join patterns. DAMP 2009 The Manticore Project 13
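For comparison, here is a rough analogue of this pick-the-first-answer pattern written against SML/NJ's CML library; this is explicitly-threaded code, not Manticore's pcase, and the two-worker race is my own sketch:

    (* Race two computations; the first to finish wins. The loser's iPut
       raises Put, which we discard (as would any exception the loser raises). *)
    fun race (f, g) = let
          val iv = SyncVar.iVar ()
          fun worker h () = SyncVar.iPut (iv, h ()) handle _ => ()
          in
            CML.spawn (worker f);
            CML.spawn (worker g);
            SyncVar.iGet iv
          end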
19 Design Explicit threading Explicit threading Explicit threading with preemptive scheduling. Threads do not share state; instead they communicate and synchronize via message passing. Synchronization and communication abstraction are supported by the mechanism of first-class synchronous operations (called events). In addition to supporting abstraction, events also provide a uniform framework for many kinds of communication and synchronization primitives (e.g., buffered and unbuffered channels, I-variables, and M-variables). DAMP 2009 The Manticore Project 15
23 Design Explicit threading Interprocess communication The most common mechanism of communication is by channels:

    type 'a chan
    val channel : unit -> 'a chan
    val recv : 'a chan -> 'a
    val send : ('a chan * 'a) -> unit

Manticore provides both unbuffered (i.e., blocking send) and buffered channels, but we focus on the unbuffered channels here. DAMP 2009 The Manticore Project 16
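As a usage sketch, here is a simple accumulator thread built from just this channel interface. It is written against SML/NJ's CML library, where the operations are CML.channel, CML.recv, CML.send, and CML.spawn; Manticore's surface syntax may differ slightly.

    (* A thread that owns a running sum; clients interact with it only by
       sending on its channel. *)
    fun startAdder () = let
          val ch : int CML.chan = CML.channel ()
          fun loop sum = loop (sum + CML.recv ch)
          in
            CML.spawn (fn () => loop 0);
            ch
          end
    (* val ch = startAdder ()
       CML.send (ch, 5)  (* the adder thread receives 5 and updates its sum *) *)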
24 Design Explicit threading Interprocess communication (continued...) CML's events are motivated by the following observations: Most interactions between processes involve multiple messages. A process may need to interact with multiple partners (nondeterministic choice). These two properties of IPC cause a conflict. DAMP 2009 The Manticore Project 17
28 Design Explicit threading Interprocess communication (continued...) For example, consider a possible interaction between a client and two servers. [figure: message-sequence diagram in which the client sends a request to each of Server1 and Server2, accepts one server's reply (with an ack) and cancels the other (nack)] DAMP 2009 The Manticore Project 18
29 Design Explicit threading Interprocess communication (continued...) Without abstraction, the code is a mess.

    let
      val replch1 = channel() and nack1 = cvar()
      val replch2 = channel() and nack2 = cvar()
    in
      send (reqch1, (req1, replch1, nack1));
      send (reqch2, (req2, replch2, nack2));
      select [
        (replch1, fn repl1 => (set nack2; act1 repl1)),
        (replch2, fn repl2 => (set nack1; act2 repl2))
      ]
    end

But traditional abstraction mechanisms do not support choice! DAMP 2009 The Manticore Project 19
30 Design Explicit threading Events We use event values to package up protocols as first-class abstractions. An event is an abstraction of a synchronous operation, such as receiving a message or a timeout.

    type 'a event

Base-event constructors create event values for communication primitives:

    val recvevt : 'a chan -> 'a event

DAMP 2009 The Manticore Project 20
31 Design Explicit threading Events (continued...) CML event operations. Event wrappers for post-synchronization actions:

    val wrap : ('a event * ('a -> 'b)) -> 'b event

Event generators for pre-synchronization actions and cancellation:

    val guard : (unit -> 'a event) -> 'a event
    val withnack : (unit event -> 'a event) -> 'a event

Choice for managing multiple communications:

    val choose : 'a event list -> 'a event

Synchronization on an event value:

    val sync : 'a event -> 'a

DAMP 2009 The Manticore Project 21
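As a small example of these combinators in action, here is a receive that gives up after a timeout, written with SML/NJ CML's names (the base events there are CML.recvEvt and CML.timeOutEvt, and CML.select is sync o choose):

    (* Wait for a message on ch, but return NONE if t elapses first. *)
    fun recvWithTimeout (ch : 'a CML.chan, t : Time.time) : 'a option =
          CML.select [
              CML.wrap (CML.recvEvt ch, SOME),
              CML.wrap (CML.timeOutEvt t, fn () => NONE)
            ]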
32 Design Explicit threading Two-server interaction using events Server abstraction:

    type server
    val rpcevt : server * req -> repl event

The client code is no longer a mess.

    select [
      wrap (rpcevt server1, fn repl1 => act1 repl1),
      wrap (rpcevt server2, fn repl2 => act2 repl2)
    ]

Note that select is shorthand for sync o choose. DAMP 2009 The Manticore Project 22
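One plausible way to build an rpcevt-style abstraction, following the abortable-RPC pattern from the CML book, is sketched below under assumptions: the server is reached through a request channel reqch, and negative acknowledgments come from CML's withNack rather than the cvars of the unabstracted code.

    (* On synchronization the request is sent along with a fresh reply
       channel and the nack event; if the client commits to another choice,
       the server sees nack fire instead of an ack. *)
    fun rpcevt (reqch, req) =
          CML.withNack (fn nack => let
                val replch = CML.channel ()
                in
                  CML.spawn (fn () => CML.send (reqch, (req, replch, nack)));
                  CML.recvEvt replch
                end)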
33 Implementation DAMP 2009 The Manticore Project 23
34 Process abstractions Process abstractions The runtime model is based on heap-allocated first-class continuations and has three distinct notions of process abstraction: Fibers are unadorned threads of control; a suspended fiber's state is represented as a continuation. Threads correspond to language-level threads and have an ID. Virtual Processors (VProcs) are an abstraction of a computational resource. The runtime maintains a dynamic binding between fibers and fiber-local storage (FLS). DAMP 2009 The Manticore Project 24
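To make the fiber representation concrete, here is a minimal sketch using SML/NJ's first-class continuations (SMLofNJ.Cont); the Manticore runtime uses its own heap-allocated continuations, but the idea is the same:

    (* A fiber is a continuation expecting unit; resuming it transfers
       control to the suspended computation and never returns. *)
    structure Fiber = struct
      open SMLofNJ.Cont     (* callcc : ('a cont -> 'a) -> 'a; throw *)
      type fiber = unit cont
      fun resume (f : fiber) : 'a = throw f ()
    end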
35 Memory model and GC Memory model The GC is a combination of the Appel semi-generational collector and the Doligez-Leroy-Gonthier parallel collector. Each VProc has a local heap that can be collected independently. Objects that are shared between VProcs must be promoted to the global heap. Minor GCs are completely asynchronous. Major GCs are mostly asynchronous. Global GCs are parallel stop-the-world. DAMP 2009 The Manticore Project 25
41 Memory model and GC Memory model (continued...) [figure: each VProc has its own local heap and VProc state; all VProcs share the global heap] DAMP 2009 The Manticore Project 26
42 Schedulers VProcs In addition to its heap, each VProc has local state that provides the execution context for running fibers. This state includes: a ready queue of fibers paired with FLS (local access only); a signal mask for local masking of preemption; a landing pad for incoming threads (a lock-free stack); and a stack of scheduler actions. DAMP 2009 The Manticore Project 27
43 Schedulers Schedulers The Manticore runtime system provides a flexible framework for nested schedulers. Schedulers are implemented using scheduler actions. This framework has been used to implement a wide range of scheduler policies. DAMP 2009 The Manticore Project 28
46 Schedulers Scheduler actions A scheduler action is a function that implements context switching for a VProc.

    type fiber = unit cont
    datatype signal = STOP | PREEMPT of fiber
    type action = signal cont

The runtime model defines three operations on a VProc's action stack. DAMP 2009 The Manticore Project 29
47 Schedulers Action-stack transitions: running a fiber

    val run : action -> fiber -> 'a

run pushes the action on the stack and starts running the fiber: evaluating E[run act f] pushes act onto the action stack and transfers control via throw f (). [figure: before/after views of the VProc's thread queue, signal mask, and action stack] DAMP 2009 The Manticore Project 30
48 Schedulers Action-stack transitions: preemption Preemption captures the current state as a fiber f and delivers a preemption signal to the top action act on the stack (which is popped): control transfers to act (PREEMPT f). [figure: before/after views of the VProc's thread queue, signal mask, and action stack] DAMP 2009 The Manticore Project 31
49 Schedulers Action-stack transitions: forwarding a signal

    val forward : signal -> 'a

forward delivers the signal sig to the top action act on the stack (which is popped): evaluating E[forward sig] transfers control to act sig. [figure: before/after views of the VProc's thread queue, signal mask, and action stack] DAMP 2009 The Manticore Project 32
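Putting the three operations together, a round-robin scheduler can be expressed as a scheduler action in roughly this style. This is a pseudocode sketch: run and forward are the runtime operations above, enqueue and dequeue on the VProc's ready queue are assumed primitives, and the action is written as a function over signals rather than a continuation.

    (* On STOP, dispatch the next ready fiber; on PREEMPT, requeue the
       interrupted fiber and then dispatch. *)
    fun switch STOP = dispatch ()
      | switch (PREEMPT k) = (enqueue k; dispatch ())
    and dispatch () = run switch (dequeue ())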
50 Compiler support The Manticore compiler The compiler is organized as a series of transformations between IRs: Typed AST, an explicitly-typed, polymorphic abstract-syntax tree; BOM, a direct-style, normalized λ-calculus; CPS, a continuation-passing-style λ-calculus; and CFG, a first-order control-flow graph. The pipeline: Source -> (front end) -> Typed AST -> (translate) -> BOM -> (convert) -> CPS -> (closure) -> CFG -> (codegen) -> Asm. DAMP 2009 The Manticore Project 33
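To illustrate the BOM-to-CPS step, here is what CPS conversion does to a tiny direct-style function; this is plain SML for exposition, not the compiler's internal IR:

    fun add1 x = x + 1                (* direct style, as in BOM *)
    fun add1cps (x, k) = k (x + 1)    (* CPS: the result is passed to continuation k *)
    (* add1cps (41, fn r => print (Int.toString r ^ "\n")) prints 42 *)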
51 Compiler support AST transformations Introduction of compensation code for exceptions. Flattening of nested-data-parallelism (AOS to SOA). Introduction of futures for other implicitly-threaded parallel constructs. Compilation of pattern matching. DAMP 2009 The Manticore Project 34
52 Compiler support Introducing futures

    fun treemul (LF n) = n
      | treemul (ND (t1, t2)) = let
          (* pval b = treemul t2 *)
          val b = future (treemul t2)
          val a = treemul t1
          in
            if (a = 0) then (cancel b; 0) else a * (touch b)
          end

DAMP 2009 The Manticore Project 35
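The future operations used above would have signatures along these lines; this is an assumption based on the slide (where future appears as a compiler-introduced form applied directly to an expression), not a documented Manticore interface:

    val future : (unit -> 'a) -> 'a future   (* suspend a speculative computation *)
    val touch  : 'a future -> 'a             (* demand the future's value *)
    val cancel : 'a future -> unit           (* cancel an unneeded future *)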
53 Compiler support BOM transformations Standard functional-compiler optimizations (e.g., uncurrying, inlining, and contraction). High-level operator expansion. Rewriting-based domain-specific optimizations (under construction). DAMP 2009 The Manticore Project 36
54 Compiler support High-level operations BOM types and code embedded in ML modules. Used to implement concurrency and parallel features (e.g., spawn, future, and touch). Used to import scheduler code into a program and to implement scheduler operations (e.g., run and forward). Rewriting to implement DSOs (e.g., fusion). DAMP 2009 The Manticore Project 37
55 Implementation status Implementation status Core SML plus a simple module system is implemented. All of the IRs (AST, BOM, CPS, and CFG) are implemented (with invariant checkers), as are the translations between IRs. Code generation for the x86-64 is implemented using the MLRISC framework. The runtime system is implemented on Linux and Mac OS X. A number of basic BOM optimizations are implemented. Protocols for asymmetric CML have been designed and coded (see DAMP '08). Protocols for symmetric CML have been designed and model checked, but not implemented. DAMP 2009 The Manticore Project 38
62 Implementation status Preliminary performance data We have done some preliminary benchmarking of the system to test the efficiency of the scheduler framework. The benchmarks use Cilk's version of lazy-task creation (PLDI '98) and gang scheduling. By-hand chunking was used for small problem sizes. Scaling is good, but our baseline performance is still poor. We have implemented very few sequential optimizations, so we expect significant improvement in the baseline. DAMP 2009 The Manticore Project 39
67 Implementation status Preliminary performance data (continued...) [figure: speedup vs. sequential Manticore as a function of the number of processors for Barnes-Hut (LTC), Bitonic Sort (LTC), Merge Sort (LTC), and Ray Tracer, plotted against perfect speedup] DAMP 2009 The Manticore Project 40
68 Implementation status Preliminary performance data (continued...) [figure: speedup vs. SML/NJ as a function of the number of processors for Barnes-Hut (LTC), Bitonic Sort (LTC), Merge Sort (LTC), and Ray Tracer, plotted against perfect speedup] DAMP 2009 The Manticore Project 41
69 Optimizations Optimizations We are investigating several kinds of optimizations in the Manticore compiler and runtime: traditional sequential optimizations, such as arity raising; effects analysis to eliminate compensation code; aggregation of work to reduce overhead (chunking); and techniques to reduce the cost of promotion. DAMP 2009 The Manticore Project 42
73 Optimizations Reducing the cost of object promotion The requirement that shared objects get promoted to the global heap is a significant source of overhead. For example, 47% of the overhead of work stealing is from object promotion. We are investigating three ways of reducing this cost: (1) direct global allocation, (2) lazy promotion for work stealing, and (3) proxy objects. DAMP 2009 The Manticore Project 43
76 Optimizations Lazy promotion for work stealing [figure: work lists kept in the shared global heap, visible to all VProcs] In our implementation of work stealing, each worker has a globally visible work list that other workers can steal from. Thus, the units of work (i.e., continuations) must be promoted to the global heap. DAMP 2009 The Manticore Project 44
77 Optimizations Lazy promotion for work stealing (continued...) [figure: work lists kept in the VProcs' local heaps] An alternative is to keep the work list in the local heap and only promote stolen work. But this choice means that the work list cannot be visible to workers running on other vprocs. Our protocol: send a thief to the vproc that we wish to steal work from; the thief promotes the work; the thief sends the work back to its master. DAMP 2009 The Manticore Project 45
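A pseudocode sketch of this thief protocol, under assumed runtime primitives; sendToVProc, popLocalWorkList, promote, and the reply channel are all hypothetical names, and send is the channel send from earlier:

    (* The master sends a thief fiber to run on the victim vproc; only
       stolen work is promoted, and only by the thief running locally. *)
    fun steal (victim, replyCh) =
          sendToVProc (victim, fn () =>
            case popLocalWorkList ()
             of NONE => send (replyCh, NONE)
              | SOME work => send (replyCh, SOME (promote work)))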
81 Optimizations Evaluation of lazy promotion Lazy promotion and software polling reduce the overhead of stealing work. Chunking increases the amount of work per steal. We have benchmarked the combination of these features for mergesort. DAMP 2009 The Manticore Project 46
82 Optimizations Evaluation of lazy promotion (continued...) [figure: speedup vs. sequential Manticore as a function of the number of processors for Mergesort with LTC, LTC+Lazy, LTC+Chunk, and LTC+Lazy+Chunk] DAMP 2009 The Manticore Project 47
83 Optimizations Proxy continuations Another source of unnecessary promotion is when continuations are put into the scheduling queues of synchronization objects (e.g., channels). Often, the continuations may not get used or may be resumed on the same processor as they were allocated on. DAMP 2009 The Manticore Project 48
84 Optimizations Proxy continuations (continued...) [figure: a proxy continuation in the global heap, with per-VProc proxy tables mapping it to the real continuation in a local heap] Instead of copying the continuation, and everything reachable from it, copy a proxy. DAMP 2009 The Manticore Project 49
85 Conclusion Questions? DAMP 2009 The Manticore Project 50