MELD: Merging Execution- and Language-level Determinism. Joe Devietti, Dan Grossman, Luis Ceze University of Washington

Size: px

Start display at page:

Download "MELD: Merging Execution- and Language-level Determinism. Joe Devietti, Dan Grossman, Luis Ceze University of Washington"

Hugo Mason
6 years ago
Views:

1 MELD: Merging Execution- and Language-level Determinism Joe Devietti, Dan Grossman, Luis Ceze University of Washington

2 CoreDet results overhead normalized to nondet 800% 600% 400% 200% 0% barnes blacksch lu radix streamcl Bergan et al., ASPLOS 10; Devietti et al. ASPLOS 11 2

3 CoreDet results overhead normalized to nondet 800% 600% 400% 200% 0% barnes blacksch lu radix streamcl Bergan et al., ASPLOS 10; Devietti et al. ASPLOS 11 2

4 determinism recipe isolate threads updates merge updates i. in a deterministic way ii. at deterministic times 3

5 determinism recipe use store isolate threads updates buffers to buffer updates merge updates i. in a deterministic way ii. at deterministic times 3

6 determinism recipe use store isolate threads updates buffers to buffer updates merge updates i. in a deterministic way parallel merge algorithm ii. at deterministic times 3

7 determinism recipe use store isolate threads updates buffers to buffer updates merge updates i. in a deterministic way parallel merge algorithm ii. at deterministic times count fixed # of instructions 3

8 determinism via store buffers parallel T 1 T 2 T 3 time 4

9 determinism via store buffers parallel parallel mode: buffer all stores, T 1 sync via [Olszewski et al, ASPLOS 09] T 2 T 3 time 4

10 determinism via store buffers parallel T 1 T 2 wr A rd A parallel mode: buffer all stores, sync via [Olszewski et al, ASPLOS 09] T 3 time 4

11 determinism via store buffers parallel T 1 T 2 wr A rd A parallel mode: buffer all stores, sync via [Olszewski et al, ASPLOS 09] T 3 time 4

12 determinism via store buffers parallel parallel mode: buffer all stores, T 1 wr A rd A sync via [Olszewski et al, ASPLOS 09] T 2 rd A T 3 time 4

13 determinism via store buffers parallel parallel mode: buffer all stores, T 1 wr A rd A sync via [Olszewski et al, ASPLOS 09] T 2 T 3 rd A wr A time 4

14 determinism via store buffers parallel parallel mode: buffer all stores, T 1 wr A rd A sync via [Olszewski et al, ASPLOS 09] T 2 rd A T 3 time 4

15 determinism via store buffers parallel parallel mode: buffer all stores, T 1 T 2 sync via [Olszewski et al, ASPLOS 09] commit mode: deterministically publish buffers T 3 time 4

16 determinism via store buffers parallel commit parallel mode: buffer all stores, T 1 T 2 sync via [Olszewski et al, ASPLOS 09] commit mode: deterministically publish buffers T 3 time 4

17 sources of overhead parallel commit T 1 T 2 T 3 time 5

18 sources of overhead parallel commit store buffer T 1 instrumentation T 2 T 3 time 5

19 sources of overhead parallel commit store buffer T 1 instrumentation T 2 imbalance T 3 time 5

20 sources of overhead parallel commit store buffer T 1 instrumentation T 2 imbalance T 3 time 5

21 MELD Merging Execution- and Language-level Determinism languagelevel executionlevel examples Jade, DPJ DMP, Kendo 6

22 MELD Merging Execution- and Language-level Determinism languagelevel executionlevel examples Jade, DPJ DMP, Kendo runtime overhead? none moderate-high 6

23 MELD Merging Execution- and Language-level Determinism languagelevel executionlevel examples Jade, DPJ DMP, Kendo runtime overhead? supports all code? none no moderate-high yes 6

24 MELD Merging Execution- and Language-level Determinism languagelevel executionlevel examples Jade, DPJ DMP, Kendo runtime overhead? supports all code? sequential semantics? none no yes moderate-high yes no 6

25 MELD Merging Execution- and Language-level Determinism languagelevel executionlevel hybrid examples Jade, DPJ DMP, Kendo MELD runtime overhead? none moderate-high low supports all code? no yes yes sequential semantics? yes no no 6

26 90% of execution time is spent in 10% of the code 7

27 90% of execution time is spent in 10% of the code often data-parallel 7

28 program composition 8

29 program composition locks condition variables queues flags pointers privatization 8

30 program composition locks condition variables queues flags pointers privatization regular data parallel computation 8

31 program composition locks condition variables queues flags pointers privatization regular data parallel computation 8

32 program composition execution-level locks condition variables determinism queues flags pointers privatization regular data parallel computation 8

33 program composition execution-level locks condition variables determinism queues flags language-level determinism pointers privatization regular data parallel computation 8

34 program composition execution-level locks condition variables determinism queues flags language-level determinism pointers privatization regular data parallel computation 8

35 what could possibly go wrong? mergesort(int* array) { // verified by det lang } 9

36 what could possibly go wrong? what other threads call mergesort concurrently? mergesort(int* array) { // verified by det lang } what aliases array? can other threads concurrently access array? 9

37 what could possibly go wrong? what other threads call mergesort concurrently? langdet mergesort(int* array) { // verified by det lang } what aliases array? can other threads concurrently access array? 9

38 compilation flow 10

39 compilation flow lightweight type qualifier system 10

40 compilation flow lightweight type qualifier system store buffer instrumentation 10

41 compilation flow lightweight type qualifier system store buffer instrumentation verifier (done manually) 10

42 compilation flow lightweight type qualifier system store buffer instrumentation verifier insn counting instrumentation (done manually) 10

43 example: radix int dest =...; // implicitly exdet 11

44 example: radix int dest =...; // implicitly exdet langdet int langdet _source = cast(source); 11

45 example: radix int dest =...; // implicitly exdet langdet int langdet _source = cast(source); BARRIER(); for (int i =...) { dest[complicated] = _source[i]; } BARRIER(); 11

46 experimental setup 8-core 2.4GHz Intel Nehalem, 10GB RAM C benchmarks from SPLASH2, PARSEC CoreDet compiler with consistency optimizations from [Devietti et al., ASPLOS 11] 12

47 MELD results overhead normalized to nondet 800% 600% 400% 200% 0% CoreDet MELD barnes blacksch lu radix streamcl 13

48 MELD results overhead normalized to nondet 800% 600% 400% 200% 0% CoreDet MELD barnes blacksch lu radix streamcl 13

49 MELD results overhead normalized to nondet 800% 600% 400% 200% 0% CoreDet MELD barnes blacksch lu radix streamcl 13

50 characterization workload LOC explicit sync ops (static) barnes blackscholes lu radix streamcluster

51 usability workload annotations casts barnes 6 7 blackscholes 8 0 lu 10 3 radix 2 2 streamcluster

52 future work build fully integrated system supporting nondeterminism via information flow tracking type system find gainful employment 16

53 Questions?

Deterministic Shared Memory Multiprocessing

Deterministic Shared Memory Multiprocessing Luis Ceze, University of Washington joint work with Owen Anderson, Tom Bergan, Joe Devietti, Brandon Lucia, Karin Strauss, Dan Grossman, Mark Oskin. Safe MultiProcessing