Evaluation of a Speculative Multithreading Compiler by Characterizing Program Dependences

1 Evaluation of a Speculative Multithreading Compiler by Characterizing Program Dependences
Anasua Bhowmik, Indian Institute of Science
Manoj Franklin, University of Maryland
Supported by the National Science Foundation, Intel Corp., and IBM Research

2 Introduction
- Reducing the execution time of a single program is a major challenge.
- Speculative multithreaded (SpMT) execution is becoming important.
- Proper thread generation is crucial for performance in SpMT execution.
- We developed a compiler for program partitioning.
- The compiler needs to be evaluated by characterizing inter-thread program dependences.

3 Speculative Multithreaded Processor
[Figure: processing elements PE 0 to PE 3 connected through an interconnection network (ICN), with centralized resources. PE: Processing Element; ICN: Interconnection Network.]
- Examples: Multiscalar, Superthreading, Dynamic Multithreading (DMT)

4 Speculative Multithreading (SpMT)
- Executes multiple flows of control (threads) from a single program in parallel.
- The processor speculates on dependences and continues execution.
- The processor preserves the sequential semantics of the program.
- It rolls back or recovers when it detects a dependence violation.
- The compiler can therefore perform aggressive thread partitioning even in the presence of ambiguous dependences.
- Suitable for non-numeric applications.
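
The speculate, detect, and recover cycle above can be sketched for memory dependences. This is a minimal Python model under stated assumptions, not the mechanism of any particular SpMT processor: each thread is identified by its position in sequential program order and records the addresses it has loaded speculatively; when an earlier thread stores to an address that a later thread has already loaded, that later thread and every thread after it are squashed. The names `SpecThread` and `threads_to_squash` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    """A speculative thread (hypothetical model)."""
    order: int  # position in sequential program order; smaller = earlier
    loads: set = field(default_factory=set)  # addresses loaded speculatively so far


def threads_to_squash(threads, writer, addr):
    """Return the threads to roll back after `writer` stores to `addr`.

    Assumed policy: the earliest later thread that already loaded `addr`
    consumed a stale value, so it and all threads after it are squashed.
    """
    later = sorted((t for t in threads if t.order > writer.order),
                   key=lambda t: t.order)
    for i, t in enumerate(later):
        if addr in t.loads:
            return later[i:]
    return []  # no violation: speculation on this address was correct


# Thread 1 loaded address 0x100 before thread 0 stored to it.
t0, t1, t2 = SpecThread(0), SpecThread(1, {0x100}), SpecThread(2)
print([t.order for t in threads_to_squash([t0, t1, t2], t0, 0x100)])  # [1, 2]
print(threads_to_squash([t0, t1, t2], t0, 0x200))                     # []
```

Squashing everything after the first violating thread is the simplest recovery policy; a real design might re-execute only the dependent instructions instead.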

5 SpMT Threads
[Figure: a single program divided into Thread 0, Thread 1, and Thread 2, with two calls to foo().]

6 Importance of Inter-thread Data Dependences
- Nature of threads
- Thread selection criteria
- Spawning strategies

7 Nature of Threads
- Threads can be loop-centric, speculative, or non-speculative.
[Figure: basic blocks A to F partitioned into Thread 1, Thread 2, and Thread 3, illustrating a loop-centric thread, a speculative thread, and a non-speculative thread.]

8 Thread Selection Criteria
- Inter-thread data dependence
- Load balancing
- Thread granularity
Spawning Strategies
- Spawning point: at the beginning of an earlier thread, or from anywhere in the earlier thread
- Spawning order: sequential program order, or out-of-order spawning of threads

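9 Out-of-order Thread Spawning
[Figure, Example 1: basic blocks A to D partitioned into Thread 1, Thread 2, and Thread 3. Example 2: main() calls f() and then g(); the body of f() contains thread 1 and thread 2, and thread 3 is marked in main() at the call to g().]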
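
With out-of-order spawning, a thread that comes earlier in program order can be spawned after a later one, so the processor must still place each new thread in its correct sequential position. The sketch below assumes, as the second example appears to show, that thread 1 spawns thread 3 (the code at the call to g()) before it spawns thread 2 (the rest of f()), and that a newly spawned child always belongs immediately after its parent. The function `spawn` and the list-based ordering are hypothetical, for illustration only.

```python
def spawn(active, parent, child):
    """Insert `child` into `active`, the list of running threads kept in
    sequential program order.

    Assumption: a child spawned later by the same parent lies earlier in
    program order than the children it spawned before, so the new child goes
    directly after its parent, ahead of those earlier-spawned children.
    """
    active.insert(active.index(parent) + 1, child)
    return active


order = ["thread 1"]
spawn(order, "thread 1", "thread 3")  # spawned first, but last in program order
spawn(order, "thread 1", "thread 2")  # spawned second, but placed before thread 3
print(order)  # ['thread 1', 'thread 2', 'thread 3']
```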

10 Our Compiler Framework
- Partitions C programs into threads
- Goal: minimizing the total execution time
- Main features:
  - Generates all types of threads
  - Considers data dependence, thread size, and load balancing
  - Explicitly exploits the parallelism available in control-independent regions
  - Supports out-of-order spawning

11 Block Diagram of Our Compiler Framework
[Figure: C Program -> SUIF Front-end -> SUIF IR -> SUIF Optimizer -> Optimized IR -> Interprocedural Analysis -> Annotated IR -> Thread Generator -> Threaded program -> MachSUIF Back-end -> Threaded Alpha Assembly Program. A Profiler also produces profile information for the flow.]

12 Data Dependence Distance (DDD)
- Assumes that instructions are executed sequentially in each PE.
- The compiler starts a new thread so that the DDD is small.
Example:
  T1: 1. x = a + b; ... 5. y = ...;
  T2: 1. r = ...; 2. p = x + r; 3. q = y + r;
- Distance for x = 1 - 2 = -1
- Distance for y = 5 - 3 = 2
- DDD between T2 and T1 = max(-1, 2) = 2
- The DDD may not be very accurate when instructions execute out of order.
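
The DDD calculation in the example can be written out directly. This is a minimal Python sketch assuming, as the slide does, that each PE executes its thread's instructions sequentially, so a dependence's distance is the producer's position in the earlier thread minus the consumer's position in the later thread. The function names are hypothetical.

```python
def dependence_distance(producer_pos, consumer_pos):
    """Distance of one inter-thread dependence: producer position in the
    earlier thread minus consumer position in the later thread (positions
    counted from the start of each thread)."""
    return producer_pos - consumer_pos


def ddd(deps):
    """DDD between two threads: the largest distance over all dependences.

    `deps` is a list of (producer_pos, consumer_pos) pairs; returns None if
    the threads have no inter-thread dependences.
    """
    return max((dependence_distance(p, c) for p, c in deps), default=None)


# Slide example: x is produced at T1[1] and used at T2[2];
#                y is produced at T1[5] and used at T2[3].
print(ddd([(1, 2), (5, 3)]))  # 2
```

Under this sequential-execution assumption, a positive DDD roughly indicates how many instruction slots the later thread may wait for a value, while a negative distance means the value is produced before it is needed.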

13 Data Dependence Count (DDC)
- DDC is the weighted count of the data dependence arcs coming into a thread from its predecessor threads: DDC = Σ w_i × (# arcs from T_i).
- Sometimes even a single dependence arc can cause a significant performance bottleneck.
Example:
  T1: x = ...;
  T2: z = ...; y = ...;
  T3: p = x + z; q = y + ...;
- Number of arcs into T3 = 3 (x from T1; z and y from T2)
- DDC = 1 × 2 + (1/2) × 1 = 2.5 (weight 1 for the two arcs from T2, weight 1/2 for the one arc from T1)
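
The DDC example can be reproduced the same way. The slide does not give a general rule for the weights w_i; this Python sketch assumes w = 1/d, where d is how many threads back the producing thread lies (d = 1 for the immediate predecessor), which is one weighting consistent with the example's total of 2.5. The function `ddc` and this weighting are assumptions, not the compiler's actual heuristic.

```python
def ddc(arcs_by_distance, weight=lambda d: 1.0 / d):
    """Weighted count of data dependence arcs entering a thread.

    `arcs_by_distance` maps d (1 = immediate predecessor, 2 = the thread
    before that, ...) to the number of arcs from that predecessor.
    `weight(d)` is an ASSUMED weighting (default 1/d).
    """
    return sum(weight(d) * n for d, n in arcs_by_distance.items())


# Slide example for T3: two arcs (z, y) from T2 (d = 1), one arc (x) from T1 (d = 2).
arcs = {1: 2, 2: 1}
print(sum(arcs.values()))  # 3
print(ddc(arcs))           # 2.5
```

A plain arc count gives 3; the weighting discounts the arc from the more distant T1, whose value is likely to be available earlier.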

14 Illustration of Program Partitioning
[Figure: alternative partitionings of basic blocks A, B, and C into Thread 1 and Thread 2.]

15 Experimental Evaluations
- We developed a trace-driven simulator that models an SpMT processor.
- The SpMT processor consists of multiple PEs.
- Each PE has a program counter, fetch unit, decode unit, and execution unit.
- The memory hierarchy includes a shared L1 data cache.
- The processor includes a branch predictor and a data value predictor.

16 Performance of Basic Thread Generation Scheme
[Figure: performance for (a) all types of threads; (b) loop threads plus non-speculative threads; (c) loop threads only.]

17 Span of Resolved Register Dependences

18 Span of Unresolved Register Dependences

19 Span of Resolved Memory Dependences

20 Span of Unresolved Memory Dependences

21 Conclusion
- Judicious partitioning of sequential programs is important.
- A partitioning compiler should model inter-thread control dependences and data dependences accurately.
- This study helps in understanding program behavior in an SpMT execution model.

Evaluation of a Speculative Multithreading Compiler by Characterizing Program Dependences
Anasua Bhowmik
Department of Computer Science and Automation
Indian Institute of Science
Bangalore, India
anasua@csa.iisc.