AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM


1 FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM. SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ. ISCA 2017

2 Current speculative systems scale poorly. Speculative parallelization, e.g., transactional memory (TM), simplifies parallel programming, but it performs poorly on real-world applications because those applications comprise large atomic tasks.

3 Large atomic tasks limit performance. Example: a database transaction that runs for millions of cycles is prone to aborts, challenging to track, and serial (it misses parallelism).

4 Large atomic tasks have abundant nested parallelism! How to extract parallelism, maintain atomicity, and achieve high performance? [Figure: a large atomic task broken into nested operations labeled qry K, qry Y, qry Y, upd J, qry S, upd L, qry M, qry U, upd V.]

5 Prior TMs fail to exploit nested parallelism. 1. Merging of nested speculative state with the parent: large speculative state, prone to aborts. 2. Cyclic dependence between parent and nested children: deadlock and livelock issues. See the paper for more details! [Figure: timeline of tasks X, Y, A, B on Cores 1-4.]

6 Ordering tasks to guarantee atomicity. [Figure: timeline of tasks X, A, B, Y on Cores 1-4.]

7 Fractal decouples atomicity from parallelism. 1. It decouples the unit of atomicity from the unit of parallelism. Domain: all tasks belonging to a domain appear to execute atomically. 2. The implementation guarantees atomicity by ordering tasks, with no merging of speculative state. Benefits of Fractal: tiny tasks that are easy to track; composable speculative parallelism.

8 Fractal Execution Model: DECOUPLING ATOMICITY FROM PARALLELISM

9 Domains to group tasks into atomic units. Fractal programs consist of atomic tasks. Tasks may access arbitrary data and may create child tasks. Tasks belong to a hierarchy of nested domains.

10 Semantics across domains. Each task can create a single subdomain and can enqueue child tasks to that subdomain or to its current domain. All tasks in a domain, together with the creator of the domain, appear to execute as a single atomic unit. [Figure: root domain with nested subdomains; task labels A, B, C, D, E, L, M, N, O, P, X, Y.]

11 Semantics within a domain. Unordered: tasks execute in an arbitrary order while respecting parent-child dependences. Timestamp-ordered: tasks appear to execute in increasing timestamp order. Children appear to execute after their parent. [Figure: root domain with nested subdomains; task labels A, B, C, D, E, L, M, N, O, P, X, Y.]

12 Fractal software API. Creating and enqueuing tasks: fractal::enqueue(function_pointer, timestamp, arguments...); Creating subdomains: fractal::create_subdomain(<domain_type>); A high-level programming interface is also available, e.g., forall(), callcc(), parallel_reduce().
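The semantics behind these two calls can be illustrated with a small serial mock. This is only a sketch, not the Fractal runtime: the `mock` namespace, `Domain`, `run`, and `run_example` are hypothetical names, the timestamps are made up, and nothing here is speculative or parallel. It models just the ordering rules from the slides: a timestamp-ordered domain runs tasks in increasing timestamp order, and a task's single subdomain finishes before any later task of the enclosing domain, so the creator and its subdomain look like one atomic unit.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical serial mock of the slide's API; NOT the real Fractal runtime.
namespace mock {

struct Domain;
// A task receives its current domain and the single subdomain it may fill.
using Task = std::function<void(Domain& current, Domain& sub)>;

// A timestamp-ordered domain. Equal timestamps run in insertion order.
struct Domain {
    std::multimap<std::uint64_t, Task> pending;
    std::uint64_t now = 0;  // timestamp of the task currently running here

    void enqueue(std::uint64_t timestamp, Task task) {
        // Children must not be ordered before their parent.
        assert(timestamp >= now);
        pending.emplace(timestamp, std::move(task));
    }
};

// Run tasks in increasing timestamp order. Each task's subdomain is drained
// right after the task, before any later task of the enclosing domain, so the
// creator plus its subdomain appear as one atomic unit.
inline void run(Domain& domain) {
    while (!domain.pending.empty()) {
        auto first = domain.pending.begin();
        domain.now = first->first;
        Task task = std::move(first->second);
        domain.pending.erase(first);
        Domain sub;
        task(domain, sub);
        run(sub);
    }
}

}  // namespace mock

// Two transactions in the root domain; TXN 1 expands into nested tasks.
// Task names follow the slides; the timestamps are made up.
inline std::vector<std::string> run_example() {
    std::vector<std::string> log;
    auto note = [&log](std::string name) {
        return [&log, name](mock::Domain&, mock::Domain&) { log.push_back(name); };
    };
    mock::Domain root;
    root.enqueue(1, [&log, note](mock::Domain&, mock::Domain& sub) {
        log.push_back("TXN 1");
        sub.enqueue(5, note("upd V"));  // enqueued first, runs second
        sub.enqueue(4, note("qry U"));
    });
    root.enqueue(2, note("TXN 2"));
    mock::run(root);
    return log;
}
```

In this mock, `run_example()` logs TXN 1, then qry U and upd V in timestamp order, and only then TXN 2. A real Fractal system would instead run these tasks speculatively in parallel and only guarantee that the outcome is equivalent to such an order.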

13 Example: Database transactions in Fractal. [Figure: root domain holding TXN 1 and TXN 2; nested subdomains hold timestamped tasks qry Z, qry U, upd V, qry A, qry B, upd C.]

14 Fractal Implementation: ATOMICITY THROUGH ORDERING

15 Fractal Virtual Time (VT). Fractal assigns a fractal virtual time (VT) to each task. The VT captures the ordering of tasks both across domains and within a domain. [Figure: fractal VT layout, labeled "Fractal VT = Domain VT".]

16 Example: Database transactions in Fractal. [Figure: the task hierarchy from slide 13, with each task annotated with its fractal VT.]

17 Example: Database transactions in Fractal. Fractal VT captures all ordering information.
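One way to see how a single VT can capture ordering both across and within domains is the sketch below. It assumes a fractal VT is formed by concatenating one domain VT per nesting level, root first, which is what the slide's truncated "Fractal VT = Domain VT" label appears to indicate. The type `FractalVT`, the helper names, and the timestamp values are hypothetical, and details such as tiebreakers or the hardware bit layout are left out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model, not the hardware encoding: a fractal VT is the list of
// domain VTs from the root domain down to the task's own domain.
using FractalVT = std::vector<std::uint64_t>;

// VT of a task placed in the subdomain created by the task with VT `creator`.
inline FractalVT in_subdomain(FractalVT creator, std::uint64_t domain_vt) {
    creator.push_back(domain_vt);
    return creator;
}

// std::vector compares lexicographically, and a proper prefix compares less
// than any of its extensions, so a creator precedes its whole subdomain.
inline bool ordered_before(const FractalVT& a, const FractalVT& b) {
    return a < b;
}

// Checks that TXN 1, its nested tasks, and TXN 2 fall in the expected order.
inline bool example_order_holds() {
    const FractalVT txn1{1}, txn2{2};               // root-domain tasks
    const FractalVT qry_u = in_subdomain(txn1, 4);  // made-up timestamps
    const FractalVT upd_v = in_subdomain(txn1, 5);
    return ordered_before(txn1, qry_u) && ordered_before(qry_u, upd_v) &&
           ordered_before(upd_v, txn2);
}
```

Under this model every task nested under TXN 1 sorts after TXN 1 and before TXN 2, which is the atomicity property the domains are meant to provide, obtained purely by comparing VTs.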

18 Swarm [MICRO 2015]: An efficient substrate for ordered speculation. Swarm executes tasks speculatively and out of order, with large hardware task queues, scalable ordered commits, and scalable ordered speculation. It efficiently supports tiny speculative tasks. [Figure: 64-tile, 256-core chip with Mem/IO at the edges; each tile has a router, an L3 slice, an L2, four cores with L1I/D caches, and a task unit.]

19 Fractal features. Fractal VT construction requires no centralized structures. Fractal VT assigns order dynamically. Hardware supports a small number of concurrent depths. Zooming operations allow for unbounded nesting by spilling tasks from shallower domains to memory. Parallelism compounds quickly with depth. See the paper for more details!

20-40 [Animation: execution timeline of the database example on Cores 1-4. TXN 1 and TXN 2 run first, followed by their nested tasks (qry U, qry Z, upd V, qry A, qry B, upd C); one task is marked "Abort task".] Tracking and conflict detection (CD) happen at the level of fine-grain tasks. Selective aborts waste less work. A parent can commit before its child completes. Fractal unlocks the benefits of fine-grain parallelism.

41 Evaluation

42 Methodology. Event-driven, Pin-based simulator. Target system: 256-core, 64-tile chip. Scalability experiments from 1 to 256 cores; scaled-down systems have fewer tiles. Applications: unordered (STAMP): labyrinth, bayes; ordered: color, msf, silo, maxflow, mis. System parameters: 64 MB shared L3 (1 MB/tile); 16K task queue entries (64/core); 4K commit queue entries (16/core); 256 KB per-tile L2s; 16 KB per-core L1s; in-order, single-issue, scoreboarded cores. [Figure: chip and tile organization, as on slide 18.]
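The system parameters on this slide can be cross-checked against each other. The sketch below assumes four cores per tile, as the tile diagram suggests, and reads the core count as 256 (the extracted text shows "56-core", which appears to have lost a digit). The constant names are hypothetical.

```cpp
#include <cstdint>

// Values as read from the methodology slide; treat them as assumptions.
constexpr std::uint64_t kTiles = 64;
constexpr std::uint64_t kCoresPerTile = 4;  // from the tile diagram
constexpr std::uint64_t kCores = kTiles * kCoresPerTile;

constexpr std::uint64_t kTaskQueueEntries = 64 * kCores;    // 64 per core
constexpr std::uint64_t kCommitQueueEntries = 16 * kCores;  // 16 per core
constexpr std::uint64_t kL3MiB = 1 * kTiles;                // 1 MB per tile

// Compile-time checks that the per-core and per-tile figures match the totals.
static_assert(kCores == 256, "64 tiles x 4 cores should give 256 cores");
static_assert(kTaskQueueEntries == 16 * 1024, "expected 16K task queue entries");
static_assert(kCommitQueueEntries == 4 * 1024, "expected 4K commit queue entries");
static_assert(kL3MiB == 64, "expected 64 MB of shared L3");
```

All four totals line up only with a 256-core reading, which supports that interpretation of the slide.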

43 Fractal uncovers abundant nested parallelism. Flat: large atomic tasks. Fractal: nested parallelism exposed through fine-grained tasks.

44 Fractal uncovers abundant nested parallelism. [Figure: speedup vs. core count for the Flat and Fractal versions of maxflow, bayes, and labyrinth, alongside the average task length (cycles) of each version.]

45 Fractal avoids over-serialization. [Figure: speedup vs. core count for the Flat, Swarm, and Fractal versions of mis, color, and msf, alongside the average task length (cycles) of the Flat and Fractal versions.]

46 Conclusion. Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications. Fractal is an execution model for fine-grain nested speculative parallelism: it decouples atomicity from parallelism and guarantees atomicity by ordering tasks. Fractal unlocks the benefits of fine-grain speculative parallelism: it parallelizes many challenging workloads and enables composition of speculative parallel algorithms.

47 Thank You! Questions?
