Revisiting the Sequential Programming Model for Multi-Core


1 Revisiting the Sequential Programming Model for Multi-Core. Matthew J. Bridges, Neil Vachharajani, Yun Zhang, Thomas Jablin, and David I. August. The Liberty Research Group, Princeton University.


3 [Figure. Source: Intel/Wikipedia]

4 [Figure: Intel Core2 Duo die photo. Labels: App, OS. Source: Intel]

5 [Figures: die photos of the SUN Niagara 2 (source: SUN), the AMD Phenom (source: AMD), and the Terascale 80-core chip (source: Intel). Labels: App, OS, and an increasing number of "?" marks from one chip to the next.]

6 Parallel Programming Languages or Automatic Thread Extraction? Is easily debuggable, maintainable, etc.? Is performance retargetable? Programmer-managed speculation? Parallelism hard to extract? Legacy application?

7 CCSP, TLS, DSWP, SpecDSWP. What prevents the automatic extraction of parallelism? Lack of an Aggressive Compilation Framework.

8 [Figure: iterations 1 to 4 scheduled over time across cores, for Scientific Programs and for General Purpose Programs.] Iteration-level parallelism ... prevented by Loop-Carried Dependences.

9 Loop body:
    A: X++;
    B: Work( ); printf( );
    C: if (rare) break;
    D: printf( );
[Figure: schedule over time of A, B, C, D for iterations 1 to 3.]
An Aggressive Compilation Framework must parallelize inside of the loop body.

10 Loop body:
    A: X++;
    B: Work( ); printf( );
    C: if (rare) misspec;
    D: printf( );
[Figure: schedule over time of A, B, C, D for iterations 1 to 3.]
An Aggressive Compilation Framework must speculate rare or predictable dependences.
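The speculation on the rarely taken exit can be illustrated with a minimal C sketch. It is not the compiler's actual transformation: `work`, `CHUNK`, `run_sequential`, and `run_speculative` are hypothetical names chosen for illustration, and the `rare` array is an assumed stand-in for the loop's exit condition. The sketch executes a chunk of iterations as if the exit never fires, then commits results in order and discards the iterations after a misspeculation.

```c
#include <assert.h>

#define CHUNK 4  /* iterations speculated together; an illustrative choice */

/* Hypothetical stand-in for Work(): no loop-carried dependence. */
static long work(int i) { return (long)i * i; }

/* Sequential loop in the shape of the slide: stop at the first rare event. */
long run_sequential(const int *rare, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += work(i);      /* B */
        if (rare[i]) break;  /* C: the rarely taken exit */
    }
    return sum;
}

/* Speculative version: run a chunk assuming C never fires, buffer the
   results, then commit in order. On a misspeculation the remaining
   buffered iterations of the chunk are squashed. The first inner loop
   is the part that could run on separate cores. */
long run_speculative(const int *rare, int n, int *squashed) {
    long sum = 0;
    *squashed = 0;
    for (int base = 0; base < n; base += CHUNK) {
        long buffered[CHUNK];
        int count = (n - base < CHUNK) ? n - base : CHUNK;
        for (int k = 0; k < count; k++)      /* speculative execution */
            buffered[k] = work(base + k);
        for (int k = 0; k < count; k++) {    /* in-order commit */
            sum += buffered[k];
            if (rare[base + k]) {            /* misspeculation detected */
                *squashed = count - k - 1;   /* wasted speculative work */
                return sum;
            }
        }
    }
    return sum;
}
```

Both functions return the same value for any input, so the speculative version preserves the sequential result; `squashed` reports how much speculative work was thrown away, which stays small as long as the exit really is rare.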

11 Loop body:
    A: X++;
    B: Work( );
    C: if (rare) misspec;
    D: printf( ); printf( );
[Figure: schedule over time of A, B, C, D for iterations 1 to 3.]
An Aggressive Compilation Framework must schedule dependences to reduce synchronization.

12 Loop body:
    A: X++; read( );
    B: Work( );
    C: if (rare) misspec;
    D: printf( ); printf( );
[Figure: schedule over time of A, B, C, D for iterations 1 to 3.]
An Aggressive Compilation Framework must be able to optimize deep into the call tree.

13 [Figure: DSWP/SpecDSWP and DOACROSS/TLS schedules over time across cores, showing A, B, C, D for iterations 1 to 4.] An Aggressive Compilation Framework must be able to parallelize loops to efficiently utilize the available cores.

14 [Figure: DSWP/SpecDSWP and DOACROSS/TLS schedules over time across cores, showing stages A, B, and CD for iterations 1 to 6, with Pipeline Fill and Stalled periods marked.] 6 Iterations completed. 5 Iterations completed.
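A simplified timing model in C gives a feel for why inter-core latency affects the two schemes differently. This is an assumed analytic model, not the simulator or the schedules from the slides: `t_a` is the time of the stage carrying the loop-carried dependence, `t_rest` the time of the remaining loop body, and `lat` the core-to-core communication latency; all names are hypothetical. In this model DOACROSS pays the latency once per iteration, while a two-stage DSWP pipeline pays it once, during pipeline fill.

```c
#include <assert.h>

/* DOACROSS/TLS model: consecutive iterations run on different cores, so
   stage A of iteration i+1 waits for stage A of iteration i plus the
   communication latency. Assumes enough cores that the rest of the body
   never delays the next A. */
long doacross_time(long n, long t_a, long t_rest, long lat) {
    assert(n >= 1);
    return (n - 1) * (t_a + lat) + t_a + t_rest;
}

/* DSWP model: every A runs on one core and the rest of the body on a
   second core. Latency is paid once to fill the pipeline; after that
   the slower stage sets the rate. */
long dswp_time(long n, long t_a, long t_rest, long lat) {
    assert(n >= 1);
    long bottleneck = (t_a > t_rest) ? t_a : t_rest;
    return t_a + lat + t_rest + (n - 1) * bottleneck;
}
```

With `t_a = 2`, `t_rest = 2`, and six iterations, both models give 14 time units at zero latency. Raising `lat` to 2 moves the DOACROSS model to 24 and the DSWP model to 16, because only the former has the latency on its critical path in every iteration.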


15 Performance Potential. Trace-based simulation with regions of the trace summarized by a single-threaded run on native hardware. What prevents the automatic extraction of parallelism? Lack of an Aggressive Compilation Framework. Sequential Programming Model.

16 [Figure: High Level View versus Low Level Reality, iterations scheduled over time across cores.]

17 Low Level Reality
    char *memory;
    void *alloc(int size);
    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }
[Figure: calls alloc 1 to alloc 6 ordered over time.]
Can't speculate the dependence.

18 Low Level Reality
    char *memory;
    void *alloc(int size);
    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }
[Figure: calls alloc 1 to alloc 6 ordered over time.]
Can't speculate the dependence. Can't schedule the dependence.

19 Low Level Reality
    char *memory;
    void *alloc(int size);
    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }
[Figure: calls alloc 1 to alloc 6 ordered over time.]
Can't speculate the dependence. Can't schedule the dependence. Can reorder the dependence.

20 Low Level Reality
    char *memory;
    void *alloc(int size);
    void *alloc(int size) {
        void *ptr = memory;
        memory = memory + size;
        return ptr;
    }
[Figure: the alloc calls executed over time in an order different from the original one.]
Compiler does not preserve the existing sequential order, but does guarantee the existence of a sequential ordering.
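The reordering argument can be checked with a small C sketch built around the `alloc` function from the slides. The fixed `arena`, its size, and the helpers `reset_arena`, `disjoint`, and `run_order` are assumptions added so the example is self-contained. Two calls made in either order return different addresses, but in both orders the blocks do not overlap, which is all a caller of `alloc` relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Bump allocator from the slides, backed by a fixed arena (the arena
   and its size are assumptions for this sketch). */
#define ARENA_SIZE 1024
static char arena[ARENA_SIZE];
static char *memory = arena;

void *alloc(int size) {
    assert(size >= 0 && size <= (arena + ARENA_SIZE) - memory);
    void *ptr = memory;
    memory = memory + size;
    return ptr;
}

/* Hypothetical helper: rewind the arena so two call orders can be compared. */
void reset_arena(void) { memory = arena; }

/* Hypothetical helper: 1 if [a, a + a_size) and [b, b + b_size) do not overlap. */
int disjoint(const void *a, int a_size, const void *b, int b_size) {
    const char *pa = a, *pb = b;
    return pa + a_size <= pb || pb + b_size <= pa;
}

/* Perform two allocations in the given order, record each block's offset
   in the arena, and report whether the two blocks are disjoint. */
int run_order(int first_size, int second_size,
              ptrdiff_t *off_first, ptrdiff_t *off_second) {
    reset_arena();
    char *p = alloc(first_size);
    char *q = alloc(second_size);
    *off_first = p - arena;
    *off_second = q - arena;
    return disjoint(p, first_size, q, second_size);
}
```

Calling `run_order(16, 32, ...)` models the original order and `run_order(32, 16, ...)` the swapped one. The offsets differ between the two runs, yet each run corresponds to some valid sequential execution of the two calls.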

21 Performance Potential. What prevents the automatic extraction of parallelism? Lack of an Aggressive Compilation Framework. Sequential Programming Model.


23 The Liberty Research Group

REVISITING THE SEQUENTIAL PROGRAMMING MODEL FOR THE MULTICORE ERA

... Automatic parallelization has thus far not been successful at extracting scalable parallelism from general programs. An aggressive ...