The Manticore Project


The Manticore Project. John Reppy, University of Chicago. January 20, 2009.

Introduction: People. The Manticore project is a joint project between the University of Chicago and the Toyota Technological Institute at Chicago. The people involved are Lars Bergstrom (University of Chicago), Matthew Fluet (Toyota Technological Institute at Chicago), Mike Rainey (University of Chicago), Adam Shaw (University of Chicago), and Yingqi Xiao (University of Chicago), with help from Nic Ford, Jon Riehl, and Ridg Scott (all University of Chicago).

Introduction: Background and motivation. The Manticore project is our effort to address the programming needs of commodity applications running on the commodity hardware of the near future. Hardware supports parallelism at multiple levels: SIMD, SMT, multicore, and small-scale SMP. Likewise, many commodity applications exhibit parallelism at multiple levels. To maximize the performance of these applications, parallel languages should support parallelism at multiple levels. [Diagram: CML, Erlang, Java, Manticore, OpenMP, NESL, Nepal, Id, pH, and SISAL arranged along a spectrum from explicitly threaded to implicitly threaded to implicitly parallel.]

Introduction: Talk overview. This talk is divided into two parts: (1) language design for heterogeneous parallel programming, and (2) the Manticore implementation. Please see http://manticore.cs.uchicago.edu for papers.

Design

Design overview. Our initial language design is purposefully conservative. It can be summarized as the combination of three distinct sub-languages: a mutation-free subset of SML (no refs or arrays, but including exceptions); language mechanisms for implicitly-threaded parallel programming; and language mechanisms for explicitly-threaded parallel programming (a.k.a. concurrent programming).

Implicit threading: Language design (continued). Manticore provides several lightweight syntactic forms for introducing parallel computation. These forms are hints to the compiler and runtime that a computation is a good candidate for parallel execution. Parallel arrays provide fine-grain data-parallel computations over sequences. Parallel tuples provide a basic fork-join parallel computation. Parallel bindings provide data-flow parallelism with cancellation of unused subcomputations. Parallel case provides non-deterministic speculative parallelism.

Implicit threading: Parallel arrays. We support fine-grained data-parallel computation using a parallel array comprehension form (as in NESL, Nepal, and DPH):

    [| exp | pat1 in exp1, ..., patn in expn where pred |]

For example, the parallel point-wise summing of two arrays:

    [| x + y | x in xs, y in ys |]

NOTE: multiple generators have zip semantics, not Cartesian-product semantics.
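
Following the template above, a hedged illustration of the optional where clause (not from the slides; the filter keeps only the positive sums):

    [| x + y | x in xs, y in ys where x + y > 0 |]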

Implicit threading: Nested data parallelism (continued). Mandelbrot set computation:

    fun x i = x0 + dx * itof i
    fun y j = y0 - dy * itof j
    (* (cx, cy) is the point being tested; (re, im) is the current iterate *)
    fun loop (cnt, cx, cy, re, im) =
          if (cnt < 255) andalso (re*re + im*im < 4.0)
            then loop (cnt+1, cx, cy, re*re - im*im + cx, 2.0*re*im + cy)
            else cnt

    [| [| loop (0, x i, y j, x i, y j) | i in [| 0 .. N |] |] | j in [| 0 .. N |] |]

Implicit threading: Parallel tuples. Parallel tuples provide fork-join parallelism. For example, consider summing the leaves of a binary tree:

    datatype tree = LF of long | ND of tree * tree

    fun treeadd (LF n) = n
      | treeadd (ND (t1, t2)) = (op +) (| treeadd t1, treeadd t2 |)
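
As a usage sketch (the tree value here is illustrative, not from the slides), the two recursive sums may be evaluated on different processors:

    val t = ND (ND (LF 1, LF 2), LF 3)
    val sum = treeadd t    (* evaluates to 6; the two subtree sums may run in parallel *)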

Implicit threading: Parallel bindings. Parallel bindings provide more flexibility than parallel tuples. For example, consider computing the product of the leaves of a binary tree:

    fun treemul (LF n) = n
      | treemul (ND (t1, t2)) = let
          pval b = treemul t2
          val a = treemul t1
          in
            if (a = 0) then 0 else a*b
          end

NOTE: the computation of b is speculative.

Implicit threading: Parallel case. Parallel case supports speculative parallelism when we want the quickest answer (e.g., search problems). For example, consider picking a leaf of the tree:

    fun treepick (LF n) = n
      | treepick (ND (t1, t2)) = (
          pcase treepick t1 & treepick t2
           of ? & n => n
            | n & ? => n)

There is some similarity with join patterns.
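
As a usage sketch (the tree value is illustrative), whichever subcomputation finishes first supplies the answer:

    val t = ND (LF 1, ND (LF 2, LF 3))
    val leaf = treepick t    (* may be 1, 2, or 3, depending on which branch completes first *)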

Explicit threading. Manticore provides explicit threading with preemptive scheduling. Threads do not share state; instead they communicate and synchronize via message passing. Synchronization and communication abstraction are supported by the mechanism of first-class synchronous operations (called events). In addition to supporting abstraction, events also provide a uniform framework for many kinds of communication and synchronization primitives (e.g., buffered and unbuffered channels, I-variables, and M-variables).

Explicit threading: Interprocess communication. The most common mechanism of communication is the channel:

    type 'a chan
    val channel : unit -> 'a chan
    val recv    : 'a chan -> 'a
    val send    : ('a chan * 'a) -> unit

Manticore provides both unbuffered (i.e., blocking send) and buffered channels, but we focus on unbuffered channels here.
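
A hedged usage sketch of this interface (the doubler service and the spawn call are illustrative assumptions, not from this slide; spawn is only mentioned later, on the high-level operations slide):

    (* a tiny service thread that doubles each request it receives *)
    fun doubler (reqCh : int chan, replCh : int chan) = (
          send (replCh, 2 * recv reqCh);
          doubler (reqCh, replCh))

    val reqCh  = channel ()
    val replCh = channel ()
    val _ = spawn (fn () => doubler (reqCh, replCh))
    val _ = send (reqCh, 21)
    val answer = recv replCh    (* 42 *)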

Explicit threading: Interprocess communication (continued). CML's events are motivated by the following observations: most interactions between processes involve multiple messages, and a process may need to interact with multiple partners (nondeterministic choice). These two properties of IPC cause a conflict.

Explicit threading: Interprocess communication (continued). For example, consider a possible interaction between a client and two servers. [Diagram: the client sends a request to Server1 and a request to Server2; the reply that arrives first is acknowledged (ack), and the other server receives a negative acknowledgment (nack).]

Explicit threading: Interprocess communication (continued). Without abstraction, the code is a mess:

    let
      val replch1 = channel() and nack1 = cvar()
      val replch2 = channel() and nack2 = cvar()
    in
      send (reqch1, (req1, replch1, nack1));
      send (reqch2, (req2, replch2, nack2));
      select [
          (replch1, fn repl1 => (set nack2; act1 repl1)),
          (replch2, fn repl2 => (set nack1; act2 repl2))
        ]
    end

But traditional abstraction mechanisms do not support choice!

Explicit threading: Events. We use event values to package up protocols as first-class abstractions. An event is an abstraction of a synchronous operation, such as receiving a message or a timeout.

    type 'a event

Base-event constructors create event values for communication primitives:

    val recvevt : 'a chan -> 'a event

Explicit threading: Events (continued). The CML event operations are as follows. Event wrappers for post-synchronization actions:

    val wrap : ('a event * ('a -> 'b)) -> 'b event

Event generators for pre-synchronization actions and cancellation:

    val guard    : (unit -> 'a event) -> 'a event
    val withnack : (unit event -> 'a event) -> 'a event

Choice for managing multiple communications:

    val choose : 'a event list -> 'a event

Synchronization on an event value:

    val sync : 'a event -> 'a

Explicit threading: Two-server interaction using events. Server abstraction:

    type server
    val rpcevt : server * req -> repl event

The client code is no longer a mess:

    select [
        wrap (rpcevt (server1, req1), fn repl1 => act1 repl1),
        wrap (rpcevt (server2, req2), fn repl2 => act2 repl2)
      ]

Note that select is shorthand for sync o choose.
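
The slides do not show how rpcevt is implemented; the following is a hedged sketch of the standard CML pattern it suggests, assuming a server is represented by its request channel and that spawn is available as on the earlier channel example:

    (* hypothetical representation: each request carries a reply channel and a
       negative-acknowledgment event that fires if the client chooses another server *)
    type server = (req * repl chan * unit event) chan

    fun rpcevt (serverCh, request) =
          withnack (fn nack => let
            val replCh = channel ()
            in
              (* do the send from a helper thread so that building the event never blocks *)
              spawn (fn () => send (serverCh, (request, replCh, nack)));
              recvevt replCh
            end)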

Implementation

Process abstractions. The runtime model is based on heap-allocated first-class continuations and has three distinct notions of process abstraction: fibers are unadorned threads of control, and a suspended fiber's state is represented as a continuation; threads correspond to language-level threads and have an ID; and virtual processors (VProcs) are an abstraction of a computational resource. The runtime maintains a dynamic binding between fibers and fiber-local storage (FLS).

Memory model and GC: Memory model. The GC is a combination of the Appel semi-generational collector and the Doligez-Leroy-Gonthier parallel collector. Each VProc has a local heap that can be independently collected. Objects that are shared between VProcs must be promoted to the global heap. Minor GCs are completely asynchronous, major GCs are mostly asynchronous, and global GCs are parallel stop-the-world.

Memory model and GC: Memory model (continued). [Diagram: a single global heap shared by all VProcs; each VProc has its own local heap and VProc state.]

Schedulers: VProcs. In addition to its heap, each VProc has local state that provides the execution context for running fibers. This state includes a ready queue of fibers paired with FLS (local access only), a signal mask for local masking of preemption, a landing pad for incoming threads (a lock-free stack), and a stack of scheduler actions.
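
A rough picture of that state as an SML-style record; the field names and the queue and fls types are illustrative assumptions (the real runtime represents this state in C), and fiber and action are the types defined on the scheduler-actions slide below:

    (* hypothetical sketch of per-VProc scheduling state *)
    type vproc_state = {
        readyQ     : (fls * fiber) queue,      (* fibers paired with their fiber-local storage *)
        sigMask    : bool ref,                 (* true when preemption is masked *)
        landingPad : (fls * fiber) list ref,   (* a lock-free stack of incoming threads *)
        actions    : action list ref           (* the stack of scheduler actions *)
      }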

Schedulers. The Manticore runtime system provides a flexible framework for nested schedulers. Schedulers are implemented using scheduler actions. This framework has been used to implement a wide range of scheduling policies.

Schedulers: Scheduler actions. A scheduler action is a function that implements context switching for a VProc.

    type fiber = unit cont
    datatype signal = STOP | PREEMPT of fiber
    type action = signal cont

The runtime model defines three operations on a VProc's action stack.

Schedulers: Action-stack transitions, running a fiber.

    val run : action -> fiber -> 'a

run pushes the action onto the VProc's action stack and starts running the fiber. [Diagram: evaluating E[run act f] pushes act onto the action stack and transfers control to the fiber with throw f ().]

Schedulers: Action-stack transitions, preemption. Preemption captures the current state as a fiber and delivers a preemption signal to the top action on the stack (which is popped). [Diagram: when the running fiber is captured as f, the top action act is popped and resumed with the signal PREEMPT f.]

Schedulers: Action-stack transitions, forwarding a signal.

    val forward : signal -> 'a

forward delivers the signal to the top action on the stack (which is popped). [Diagram: evaluating E[forward sig] pops the top action act and resumes it with the signal sig.]
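
To show how run and forward fit together, here is a hedged sketch of a minimal round-robin scheduler action. For readability it treats an action as an ordinary function on signals rather than a first-class continuation, and it assumes hypothetical enqueue and dequeue operations on the host VProc's ready queue; the actual Manticore schedulers are written in BOM. A fiber that finishes would announce this by evaluating forward STOP.

    (* minimal round-robin scheduler action (sketch) *)
    fun switch STOP        = dispatch ()                 (* the current fiber finished: run the next one *)
      | switch (PREEMPT k) = (enqueue k; dispatch ())    (* a preempted fiber goes to the back of the queue *)
    and dispatch () = run switch (dequeue ())            (* reinstall this action and run the next fiber *)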

Compiler support: The Manticore compiler. The compiler is organized as a series of transformations between IRs: Typed AST, an explicitly-typed, polymorphic abstract-syntax tree; BOM, a direct-style, normalized λ-calculus; CPS, a continuation-passing-style λ-calculus; and CFG, a first-order control-flow graph. [Diagram: Source → front end → AST → translate → BOM → convert → CPS → closure → CFG → codegen → Asm.]

Compiler support: AST transformations. These include introduction of compensation code for exceptions; flattening of nested data parallelism (AOS to SOA); introduction of futures for the other implicitly-threaded parallel constructs; and compilation of pattern matching.

Compiler support: Introducing futures. The parallel binding in treemul is translated into future, touch, and cancel operations:

    fun treemul (LF n) = n
      | treemul (ND (t1, t2)) = let
          (* pval b = treemul t2 *)
          val b = future (treemul t2)
          val a = treemul t1
          in
            if (a = 0) then (cancel b; 0) else a * (touch b)
          end

Compiler support: BOM transformations. These include standard functional-compiler optimizations (e.g., uncurrying, inlining, and contraction); high-level operator expansion; and rewriting-based domain-specific optimizations (under construction).

Compiler support: High-level operations. High-level operations are BOM types and code embedded in ML modules. They are used to implement concurrency and parallelism features (e.g., spawn, future, and touch), and to import scheduler code into a program and implement scheduler operations (e.g., run and forward). Rewriting is used to implement domain-specific optimizations (e.g., fusion).

Implementation status. Core SML plus a simple module system is implemented. All of the IRs (AST, BOM, CPS, and CFG) are implemented (with invariant checkers), as are the translations between IRs. Code generation for the x86-64 is implemented using the MLRISC framework. The runtime system is implemented on Linux and Mac OS X. A number of basic BOM optimizations are implemented. Protocols for asymmetric CML have been designed and coded (see DAMP '08). Protocols for symmetric CML are designed and model checked, but not implemented.

Implementation status: Preliminary performance data. We have done some preliminary benchmarking of the system to test the efficiency of the scheduler framework. The benchmarks use Cilk's version of lazy task creation (PLDI '98) and gang scheduling. By-hand chunking was used for small problem sizes. Scaling is good, but our baseline performance is still poor. We have implemented very few sequential optimizations, so we expect significant improvement in the baseline.
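
To illustrate the general idea of by-hand chunking (this is not the benchmark code; size, cutoff, and seqadd are hypothetical helpers), work below a cutoff is done sequentially so that each parallel task is large enough to pay for its overhead:

    (* hedged sketch: aggregate small subproblems into sequential work *)
    fun treeaddChunked (LF n) = n
      | treeaddChunked (nd as ND (t1, t2)) =
          if size nd <= cutoff
            then seqadd nd    (* plain sequential sum of the subtree *)
            else (op +) (| treeaddChunked t1, treeaddChunked t2 |)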

Implementation status: Preliminary performance data (continued). [Plot: speedup versus sequential Manticore as the number of processors grows, for Barnes-Hut (LTC), Bitonic Sort (LTC), Merge Sort (LTC), and Ray Tracer, with a perfect-speedup reference line.]

Implementation status: Preliminary performance data (continued). [Plot: speedup versus SML/NJ as the number of processors grows, for the same benchmarks, with a perfect-speedup reference line.]

Optimizations. We are investigating several kinds of optimizations in the Manticore compiler and runtime: traditional sequential optimizations, such as arity raising; effects analysis to eliminate compensation code; aggregation of work to reduce overhead (chunking); and techniques to reduce the cost of promotion.

Optimizations: Reducing the cost of object promotion. The requirement that shared objects be promoted to the global heap is a significant source of overhead. For example, 47% of the overhead of work stealing is from object promotion. We are investigating three ways of reducing this cost: (1) direct global allocation, (2) lazy promotion for work stealing, and (3) proxy objects.

Optimizations: Lazy promotion for work stealing. In our implementation of work stealing, each worker has a globally visible work list that other workers can steal from. Thus, the units of work (i.e., continuations) must be promoted to the global heap.

Optimizations: Lazy promotion for work stealing (continued). An alternative is to keep the work list in the local heap and only promote stolen work. But this choice means that the work list cannot be visible to workers running on other VProcs. Our protocol is to send a thief to the VProc that we wish to steal work from; the thief promotes the work and then sends it back to its master.
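
A hedged sketch of that protocol in SML-like pseudocode; sendToVProc, popOldestLocal, and promote are hypothetical names standing in for the runtime primitives the slide alludes to:

    (* the master asks the victim VProc to run a thief fiber on its behalf *)
    fun steal (victimVP, resultCh) =
          sendToVProc (victimVP, fn () =>
            (* this code runs on the victim VProc, so it may read the local work list *)
            case popOldestLocal ()
             of NONE      => send (resultCh, NONE)
              | SOME work => send (resultCh, SOME (promote work)))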

Optimizations: Evaluation of lazy promotion. Lazy promotion and software polling reduce the overhead of stealing work, and chunking increases the amount of work per steal. We have benchmarked the combination of these features on mergesort.

Optimizations: Evaluation of lazy promotion (continued). [Plot: speedup versus sequential Manticore against the number of processors for mergesort under four configurations: LTC, LTC+Lazy, LTC+Chunk, and LTC+Lazy+Chunk.]

Optimizations: Proxy continuations. Another source of unnecessary promotion is when continuations are put into the scheduling queues of synchronization objects (e.g., channels). Often, these continuations are never used, or are resumed on the same processor on which they were allocated.

Optimizations: Proxy continuations (continued). Instead of copying the continuation, and everything reachable from it, into the global heap, copy a proxy. [Diagram: a small proxy continuation lives in the global heap and refers, through a per-VProc proxy table, to the real continuation in the owner's local heap.]
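
One plausible shape for such a proxy, purely as an illustration (the field names, the vproc_id type, and the proxy-table indirection are assumptions, not taken from the slides):

    (* hypothetical: a proxy is a small, cheaply promotable record that names the
       owning VProc and a slot in that VProc's proxy table; the table slot points
       at the real continuation, which stays in the local heap *)
    type proxy = { owner : vproc_id, slot : int }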

Conclusion. Questions?


UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP

UvA-SARA High Performance Computing Course June Clemens Grelck, University of Amsterdam. Parallel Programming with Compiler Directives: OpenMP Parallel Programming with Compiler Directives OpenMP Clemens Grelck University of Amsterdam UvA-SARA High Performance Computing Course June 2013 OpenMP at a Glance Loop Parallelization Scheduling Parallel

More information

Project 3. Building a parallelism profiler for Cilk computations CSE 539. Assigned: 03/17/2015 Due Date: 03/27/2015

Project 3. Building a parallelism profiler for Cilk computations CSE 539. Assigned: 03/17/2015 Due Date: 03/27/2015 CSE 539 Project 3 Assigned: 03/17/2015 Due Date: 03/27/2015 Building a parallelism profiler for Cilk computations In this project, you will implement a simple serial tool for Cilk programs a parallelism

More information

Efficient, Deterministic, and Deadlock-free Concurrency

Efficient, Deterministic, and Deadlock-free Concurrency Efficient, Deterministic Concurrency p. 1/31 Efficient, Deterministic, and Deadlock-free Concurrency Nalini Vasudevan Columbia University Efficient, Deterministic Concurrency p. 2/31 Data Races int x;

More information

Part V. Process Management. Sadeghi, Cubaleska RUB Course Operating System Security Memory Management and Protection

Part V. Process Management. Sadeghi, Cubaleska RUB Course Operating System Security Memory Management and Protection Part V Process Management Sadeghi, Cubaleska RUB 2008-09 Course Operating System Security Memory Management and Protection Roadmap of Chapter 5 Notion of Process and Thread Data Structures Used to Manage

More information

Introduction to Parallel Programming Models

Introduction to Parallel Programming Models Introduction to Parallel Programming Models Tim Foley Stanford University Beyond Programmable Shading 1 Overview Introduce three kinds of parallelism Used in visual computing Targeting throughput architectures

More information

Composable Asynchronous Events

Composable Asynchronous Events Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2010 Composable Asynchronous Events Lukasz Ziarek Purdue University, lziarek@cs.purdue.edu

More information

Handout 2 August 25, 2008

Handout 2 August 25, 2008 CS 502: Compiling and Programming Systems Handout 2 August 25, 2008 Project The project you will implement will be a subset of Standard ML called Mini-ML. While Mini- ML shares strong syntactic and semantic

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013! Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Threading Issues Operating System Examples

More information

Continuations and Continuation-Passing Style

Continuations and Continuation-Passing Style Continuations and Continuation-Passing Style Lecture 4 CS 390 1/16/08 Goal Weʼre interested in understanding how to represent the state of a co-routine Insight into what a thread really means How fundamental

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Expressing Parallel Com putation

Expressing Parallel Com putation L1-1 Expressing Parallel Com putation Laboratory for Computer Science M.I.T. Lecture 1 Main St r eam Par allel Com put ing L1-2 Most server class machines these days are symmetric multiprocessors (SMP

More information

Introduction II. Overview

Introduction II. Overview Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and

More information

Scheduling Parallel Programs by Work Stealing with Private Deques

Scheduling Parallel Programs by Work Stealing with Private Deques Scheduling Parallel Programs by Work Stealing with Private Deques Umut Acar Carnegie Mellon University Arthur Charguéraud INRIA Mike Rainey Max Planck Institute for Software Systems PPoPP 25.2.2013 1 Scheduling

More information

Compiler Construction Lent Term 2015 Lectures 10, 11 (of 16)

Compiler Construction Lent Term 2015 Lectures 10, 11 (of 16) Compiler Construction Lent Term 15 Lectures 10, 11 (of 16) 1. Slang.2 (Lecture 10) 1. In lecture code walk of slang2_derive 2. Assorted topics (Lecture 11) 1. Exceptions 2. Objects 3. Stacks vs. Register

More information

Review: Creating a Parallel Program. Programming for Performance

Review: Creating a Parallel Program. Programming for Performance Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

Chapter 4: Threads. Operating System Concepts 9 th Edition

Chapter 4: Threads. Operating System Concepts 9 th Edition Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples

More information

Concurrent Programming. Alexandre David

Concurrent Programming. Alexandre David Concurrent Programming Alexandre David 1.2.05 adavid@cs.aau.dk Disclaimer Overlap with PSS Threads Processes Scheduling Overlap with MVP Parallelism Thread programming Synchronization But good summary

More information

Chapter 3: Processes

Chapter 3: Processes Chapter 3: Processes Silberschatz, Galvin and Gagne 2013 Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Silberschatz, Galvin and Gagne 2013

More information

Space Profiling for Parallel Functional Programs

Space Profiling for Parallel Functional Programs Space Profiling for Parallel Functional Programs Daniel Spoonhower 1, Guy Blelloch 1, Robert Harper 1, & Phillip Gibbons 2 1 Carnegie Mellon University 2 Intel Research Pittsburgh 23 September 2008 ICFP

More information

4.8 Summary. Practice Exercises

4.8 Summary. Practice Exercises Practice Exercises 191 structures of the parent process. A new task is also created when the clone() system call is made. However, rather than copying all data structures, the new task points to the data

More information

Chapter 3: Processes. Operating System Concepts 8 th Edition,

Chapter 3: Processes. Operating System Concepts 8 th Edition, Chapter 3: Processes, Silberschatz, Galvin and Gagne 2009 Chapter 3: Processes Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Silberschatz, Galvin and Gagne 2009

More information

Threads SPL/2010 SPL/20 1

Threads SPL/2010 SPL/20 1 Threads 1 Today Processes and Scheduling Threads Abstract Object Models Computation Models Java Support for Threads 2 Process vs. Program processes as the basic unit of execution managed by OS OS as any

More information

xtc Robert Grimm Making C Safely Extensible New York University

xtc Robert Grimm Making C Safely Extensible New York University xtc Making C Safely Extensible Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to

More information