The Evaluation of Parallel Compilers and Trapezoidal Self-Scheduling

1 The Evaluation of Parallel Compilers and Trapezoidal Self-Scheduling. Will Smith and Elizabeth Fehrmann. May 23, 2006. Multiple Processor Systems, Dr. Muhammad Shaaban.

2 Overview: Serial Compilers; Parallel Compilers; Self-Scheduling Algorithms (Pure, Chunk, Guided, Trapezoid); Algorithm Performance; Nanothreads.

3 A Quick Look at Compilers. 6 Steps to Serial Compilation: (1) Lexical Analysis; (2) Syntax Checking; (3) Intermediate Code Generation; (4) Optimizations (if requested); (5) Machine Code Generation; (6) Linking (if applicable) to produce the .bin file.

4 What is a Parallel Compiler? It performs the previous 6 steps and, in addition: load-balances for an arbitrary number of processors; minimizes scheduling overhead and contention; performs parallel program partitioning. Recent efforts include investigations into exploiting functional-level parallelism.

5 What is Self-Scheduling? A means by which an arbitrary number of loop iterations may be distributed among processors at run time. The idea is to achieve good load balancing while minimizing scheduling overhead.

6 Self-Scheduling Algorithms. Four common self-scheduling algorithms exist: Pure Self-Scheduling; Chunk Self-Scheduling; Guided Self-Scheduling; Trapezoid Self-Scheduling.

7 Pure Self-Scheduling (SS): One parallel loop iteration is allocated to a processor each time the processor goes idle. This achieves excellent (near-perfect) load balancing. Pitfalls include an enormous amount of contention and scheduling overhead.
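A minimal Python sketch of pure self-scheduling, offered as an illustration rather than the presenters' implementation. The names `pure_self_schedule`, `body`, and `claims` are hypothetical. A shared counter guarded by a lock stands in for the scheduler, so every single iteration costs one trip through the lock, which is where the contention comes from.

```python
import threading

def pure_self_schedule(num_iterations, num_workers, body):
    """Each idle worker claims exactly one loop iteration at a time."""
    next_index = 0
    lock = threading.Lock()
    claims = [0] * num_workers  # scheduling operations performed by each worker

    def worker(wid):
        nonlocal next_index
        while True:
            with lock:  # one lock acquisition per iteration: the contention point
                if next_index >= num_iterations:
                    return
                i = next_index
                next_index += 1
            claims[wid] += 1
            body(i)  # run the loop body outside the lock

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claims

# Usage: 20 iterations shared among 4 workers.
results = [0] * 20

def square(i):
    results[i] = i * i

claims = pure_self_schedule(20, 4, square)
```

The total number of scheduling operations equals the number of iterations, regardless of how the work ends up divided among the workers.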

8 Chunk Self-Scheduling (CSS): An extension of pure self-scheduling in which a chunk of k iterations is assigned to a processor each time it goes idle. Contention costs decrease significantly for large k, but load balance suffers. Choosing an optimal k is difficult at runtime and nearly impossible at compile time.
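The trade-off in k can be seen by listing the chunk sizes that get handed out. This is a small sketch with a hypothetical helper name (`css_chunks`); it models only the chunk sequence, not the processors that consume it.

```python
def css_chunks(total_iterations, k):
    """Return the chunk sizes handed out by chunk self-scheduling with fixed k."""
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = min(k, remaining)  # the final chunk may be shorter than k
        chunks.append(size)
        remaining -= size
    return chunks

# k = 1 degenerates to pure self-scheduling: one scheduling operation per iteration.
fine = css_chunks(1000, 1)
# A large k means few scheduling operations, but coarse units to balance.
coarse = css_chunks(1000, 250)
print(len(fine), len(coarse))
```

With 1000 iterations, k = 1 needs 1000 scheduling operations while k = 250 needs only 4, but with k = 250 a single slow chunk can leave the other processors idle for a long time.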

9 Guided Self-Scheduling (GSS): Attempts to determine k dynamically by dividing the number of remaining loop iterations by the number of processors in the system. k is initially large and decreases as the loop finishes; in this manner, a tradeoff is established between load balancing and scheduling overhead. The small chunks near the end of the loop will still cause contention at fine levels of granularity.
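A short sketch of the guided chunk sequence, assuming the quotient is rounded up so that every chunk has at least one iteration (the slide says only "dividing", so the rounding rule is an assumption). The helper name `gss_chunks` is hypothetical.

```python
import math

def gss_chunks(total_iterations, num_processors):
    """Chunk sizes for guided self-scheduling: ceil(remaining / P) each time."""
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / num_processors)  # assumed rounding rule
        chunks.append(size)
        remaining -= size
    return chunks

chunks = gss_chunks(100, 4)
print(chunks)
```

The sequence starts at a quarter of the loop and shrinks toward single-iteration chunks, which is why the tail of a GSS loop behaves much like pure self-scheduling.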

10 Trapezoidal Self-Scheduling (TSS): Aims to capture the benefits of CSS and GSS without their drawbacks. Minimizes scheduling overhead and CPU idle time while maintaining a good load balance. Chunk size follows a downward-sloping line, starting at an upper bound of f iterations and terminating at a lower bound of l iterations.
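The linear decrease can be sketched as follows. The slide gives only the bounds f and l, so the details here are assumptions: the number of chunks is taken as N = ceil(2I / (f + l)) (the area of the trapezoid must cover I iterations), the step is (f - l) / (N - 1), sizes are rounded to integers, and the final chunk is truncated to whatever remains. The helper name `tss_chunks` and the choice f = I / (2P), l = 1 in the usage lines are likewise illustrative.

```python
import math

def tss_chunks(total_iterations, first, last):
    """Chunk sizes decreasing linearly from `first` toward `last`.

    Assumes first >= last >= 1.
    """
    n = math.ceil(2 * total_iterations / (first + last))  # planned number of chunks
    delta = (first - last) / (n - 1) if n > 1 else 0.0    # decrement per chunk
    chunks = []
    remaining = total_iterations
    t = 0
    while remaining > 0:
        size = max(last, round(first - t * delta))  # point on the sloping line
        size = min(size, remaining)                 # never hand out more than is left
        chunks.append(size)
        remaining -= size
        t += 1
    return chunks

# Usage: I = 1000 iterations, P = 4 processors, f = I / (2P), l = 1 (assumed defaults).
I, P = 1000, 4
chunks = tss_chunks(I, first=I // (2 * P), last=1)
print(len(chunks), chunks[0], chunks[-1])
```

Because the decrement is a fixed step rather than a division by P, the number of chunks, and therefore the number of scheduling operations, is known before the loop starts.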

11 Trapezoid Theory. Make the following 9 assumptions: (1) Shared-memory system with P processors; (2) I iterations to parallelize; (3) Execution time of the i-th iteration is L(i); (4) Parallelization spawns N tasks; (5) Load of the t-th task is T(t); (6) Time in computation state is X; (7) Total scheduling overhead is O; (8) Total wait time is W; (9) Division of loop iterations is determined by function C(t).

12 Self-Scheduling: An Illustration

13 Algorithm Performance: All four algorithms were compiled and run on a 96-node GP-1000 system for the load profiles on the right. Examine speedup, load imbalance, and overhead. Overhead: Θ = (O * P) / (X + O + W). Load imbalance: (W * P) / (X + O + W).
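The two ratios on this slide can be evaluated directly from the quantities X, O, W, and P defined under Trapezoid Theory. The sketch below uses hypothetical function names and made-up input values purely to show the arithmetic; the numbers are not measurements from the GP-1000 runs.

```python
def overhead_degree(X, O, W, P):
    """Scheduling overhead ratio: (O * P) / (X + O + W)."""
    return (O * P) / (X + O + W)

def load_imbalance(X, O, W, P):
    """Load imbalance ratio: (W * P) / (X + O + W)."""
    return (W * P) / (X + O + W)

# Illustrative values only: 90 time units computing, 6 scheduling, 4 waiting, 4 processors.
theta = overhead_degree(X=90, O=6, W=4, P=4)
imbalance = load_imbalance(X=90, O=6, W=4, P=4)
print(theta, imbalance)
```

Both ratios share the same denominator, so they can be compared directly: a scheduler that trades waiting time for scheduling time moves weight from one ratio to the other.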

14 Algorithm Performance [figure: results for the uniform load and random load profiles]

15 Algorithm Performance Continued [figure: results for the increasing load and decreasing load profiles]

16 Nanothreads Programming Model. Nanothread: a user-level entity that corresponds to a task that needs to be executed. The model looks for functional parallelism as well as loop parallelism. It is able to adapt to changes in resources during runtime, and therefore achieves good load balancing.

17 Nanothreads. The programming environment decomposes code into nodes, connecting dependent tasks with edges to form a hierarchical task graph (HTG). [figure: an example program (PROGRAM ... ENDPROGRAM, with m = Z(), G(m), and a DO loop assigning A(i) and B(i)) shown beside its HTG; legend: simple HTG node, compound HTG node, dependence arc]
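As a rough illustration of tasks connected by dependence arcs, the sketch below builds a small, flat task graph and groups tasks into waves that could run in parallel. It is not the Nanothreads runtime: the graph contents are hypothetical (loosely named after the Z(), G(m), A(i), and B(i) fragments on the slide), the dependence arcs are assumed, and the hierarchy of compound nodes is not modeled. It relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_graph = {
    "Z": set(),
    "G": {"Z"},       # G(m) consumes the value produced by m = Z()
    "A_body": {"G"},  # assumed arcs, for illustration only
    "B_body": {"G"},
}

def ready_waves(graph):
    """Group tasks into waves; tasks within one wave have no mutual dependences."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves

waves = ready_waves(task_graph)
print(waves)
```

Tasks that land in the same wave are candidates for functional parallelism, since no dependence arc connects them.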

18 Conclusion. Parallel compilers are key to realizing the benefits of parallel programs running on parallel machines. In the results presented, the Trapezoidal Self-Scheduling method exhibits the benefits of the other self-scheduling algorithms without their pitfalls. The Nanothread environment looks like the next logical step in exploiting parallelism in programs.

19 Questions?
