Practical Parallel Processing


1 Practical Parallel Processing
Talk at Royal Military Academy, Brussels, May 2004
Jan Lemeire, Parallel Systems lab, Vrije Universiteit Brussel (VUB)

2 Example: Matrix Multiplication
$C = A \cdot B$, with $C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$ for $i, j = 1..n$
Sequential algorithm: $T_{\text{computation}} = \delta_{\text{mm}} \cdot n^3$
[Figure: element C_ij is computed from row i of A (A_i1 .. A_in) and column j of B (B_1j .. B_nj)]
Parallel@RMA
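The sequential algorithm on this slide can be sketched as a triple loop. This is a minimal illustration, not code from the talk: the function name `matmul` and the operation counter are assumptions added to show where the n^3 term in the computation time comes from.

```python
def matmul(A, B):
    """Multiply two n x n matrices given as lists of rows.

    Returns the product C and the number of multiply-add steps performed.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += A[i][k] * B[k][j]  # one multiply-add per (i, j, k)
                ops += 1
            C[i][j] = total
    return C, ops


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, ops = matmul(A, B)
print(C)    # [[19.0, 22.0], [43.0, 50.0]]
print(ops)  # 8, i.e. n**3 for n = 2
```

The counter grows as n^3, so if one multiply-add takes a roughly constant time, the run time follows the same cubic law.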


3 Parallel Matrix Multiplication
Parallel system
- Partitioning: p blocks of C, each with $n^2/p$ elements
- Submatrix $C_{i,j} = \sum_{k=1}^{\sqrt{p}} A_{i,k} \cdot B_{k,j}$, built from block row $i$ of A and block column $j$ of B
- Communication: each processor needs the rows of A and the columns of B for its block
[Figure: block C_ij of C is computed from a row strip of A and a column strip of B]
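The partitioning can be illustrated with a serial sketch that computes C block by block, the way p workers would. It assumes p is a perfect square whose root divides n; the function name `block_matmul` is hypothetical, and no real communication takes place here. The row strip and column strip built for each block stand for the data a worker would have to receive.

```python
import math


def block_matmul(A, B, p):
    """Compute C = A * B one block at a time, one block per (simulated) worker.

    Assumes p is a perfect square and sqrt(p) divides n.
    Returns C and the number of elements of C owned by each worker.
    """
    n = len(A)
    q = math.isqrt(p)  # number of blocks per row and per column
    if q * q != p or n % q != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    b = n // q  # edge length of one block
    C = [[0.0] * n for _ in range(n)]
    for bi in range(q):
        for bj in range(q):  # each (bi, bj) pair is one worker's task
            # Data this worker needs: a row strip of A and a column strip of B.
            rows = A[bi * b:(bi + 1) * b]
            cols = [[B[k][j] for k in range(n)] for j in range(bj * b, (bj + 1) * b)]
            for r, row in enumerate(rows):
                for c, col in enumerate(cols):
                    C[bi * b + r][bj * b + c] = float(sum(x * y for x, y in zip(row, col)))
    return C, b * b


A = [[i + j for j in range(4)] for i in range(4)]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
C, per_worker = block_matmul(A, identity, p=4)
print(per_worker)  # 4, i.e. n**2 / p for n = 4 and p = 4
```

Each worker performs about n^3/p multiply-adds but needs 2·n^2/√p input elements, which is why communication becomes relatively more expensive as p grows.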

4 Parallel Matrix Multiplication
Execution profile (n = 150)
- Extra work = overhead

5 Parallel Matrix Multiplication
Memory usage ~ $n^2$

6 Why Parallel Processing?
- Speedup (time)
  - for long runs
  - real time (e.g. simulations)
  - as much as possible (e.g. weather forecasting)
- Memory usage (space)

7 Parallel Systems
A collection of processors, memory, and an interconnection network.
1. Shared-Memory Architecture
- fast communication
- dedicated machines
2. Message-Passing Architecture
- slower communication
- simple, cheap
- general-purpose PCs

8 How? Communication Layer
PVM (Parallel Virtual Machine) or MPI (Message Passing Interface)
- transparent
- platform-independent
Functions
- create processes on other machines
- send & receive messages
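The two kinds of functions listed here (create processes, send and receive messages) can be illustrated with a small master/worker sketch. This is not PVM or MPI code: it uses Python threads and `queue.Queue` objects from the standard library as a stand-in for processes on other machines and for the communication layer, and the names `worker` and `run_master` are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (task_id, numbers) messages and reply with (task_id, sum)."""
    while True:
        msg = inbox.get()  # blocking receive
        if msg is None:    # shutdown message from the master
            break
        task_id, numbers = msg
        outbox.put((task_id, sum(numbers)))  # send the result back


def run_master(tasks, n_workers=2):
    """Create workers, send each task as a message, and collect the replies."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(inbox, outbox)) for inbox in inboxes]
    for t in threads:
        t.start()
    # Distribute the tasks round-robin over the workers.
    for task_id, numbers in enumerate(tasks):
        inboxes[task_id % n_workers].put((task_id, numbers))
    # Replies may arrive in any order, so index them by task id.
    results = dict(outbox.get() for _ in tasks)
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(tasks))]


print(run_master([[1, 2, 3], [10, 20], [5]]))  # [6, 30, 5]
```

With a real message-passing library the structure stays the same: only the process creation and the send/receive calls are replaced by the library's own functions.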

9 Aspects of Practical Parallelization
1. System-dependency
2. Inherent Parallelism
3. Software Engineering
4. Performance Analysis


10 1. System-dependency
Network topology
- Mesh network
- Star network
Heterogeneous systems
- different processing powers
- different communication speeds
- combinations of shared-memory & message-passing architectures

11 2. Inherent Parallelism
Trivially parallelizable
- replicated trials (multiple experiments)
  => script multiple jobs
  => job management
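Replicated trials need no communication between jobs, so a simple job pool is enough. The sketch below is an illustration under assumptions: the workload (a Monte Carlo estimate of pi) and the names `trial` and `run_trials` are made up for the example. It uses `ThreadPoolExecutor` to stay self-contained; for CPU-bound trials, `ProcessPoolExecutor` offers the same `map` interface.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def trial(seed, samples=10_000):
    """One independent experiment: estimate pi from random points, using its own seed."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples


def run_trials(seeds, max_workers=4):
    """Run one trial per seed; results come back in the order of the seeds."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(trial, seeds))


estimates = run_trials(range(8))
print(sum(estimates) / len(estimates))  # close to 3.14
```

Giving every trial its own seed keeps the experiments independent and reproducible, whatever order the jobs run in.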

12 2. Inherent Parallelism II
Difficult to parallelize
- Simulations: synchronization protocol, model-dependent
- Virtual 3D world: tessellation, lighting calculations, rendering
> Performance depends on various aspects, such as data structures
> Optimizations are possible, but strongly depend on the problem/algorithm

13 Example: Parallel Simulation
- Detailed IP-switch
- Execution profile

14 3. Software Engineering
- Understandable, maintainable: tangled code!
- Flexible: separate parallel code. E.g.: reuse the sequential algorithm, so it can be adapted
- Reusable: trade-off generic program <> performance

15 4. Performance Analysis
- Detection of performance bottlenecks, for example
  - communication-computation ratio
  - load imbalances
- Scalability analysis: bigger problem => more computers
- Calculation of the optimal number of processors
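These quantities can be made concrete with a toy cost model. The model below is an assumption for illustration, not the one used in the talk: computation time divides evenly over p processors, while communication time grows linearly with p. All function and parameter names are hypothetical.

```python
def parallel_time(p, work, comm_cost):
    """Assumed model: computation divides over p processors, communication grows with p."""
    return work / p + comm_cost * p


def speedup(p, work, comm_cost):
    """Sequential time (taken to be `work`, with no communication) over parallel time."""
    return work / parallel_time(p, work, comm_cost)


def efficiency(p, work, comm_cost):
    """Fraction of the ideal p-fold speedup that is actually reached."""
    return speedup(p, work, comm_cost) / p


def comm_comp_ratio(p, work, comm_cost):
    """Communication time divided by computation time on p processors."""
    return (comm_cost * p) / (work / p)


def optimal_processors(work, comm_cost, max_p):
    """Integer p in 1..max_p that minimizes the modelled run time."""
    return min(range(1, max_p + 1), key=lambda p: parallel_time(p, work, comm_cost))


best = optimal_processors(work=100.0, comm_cost=1.0, max_p=64)
print(best)                             # 10
print(speedup(best, 100.0, 1.0))        # 5.0
print(efficiency(best, 100.0, 1.0))     # 0.5
```

In this model the optimum lies where communication time equals computation time; adding processors beyond that point makes the run slower, not faster.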

16 Performance Analysis Tools
Automated analysis
- Simple: XPVM
- Complex
However: user-friendliness => EPPA

17 Our Performance Analysis Tool
1. Causal models to structure the relations between the variables

18 Our Performance Analysis Tool II
2. Refinement strategy
- Start: first-order approximation $T_{\text{computation}} = \delta_{\text{comp}} \cdot \#\text{operations}$
- Refine when necessary
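A first-order approximation of this kind treats computation time as a constant cost per operation times the number of operations. The sketch below shows one way to estimate that constant from measurements and to decide when the model needs refining. It is an assumption-laden illustration, not the tool described in the talk: the function names are hypothetical, and the timings are synthetic values generated from a made-up constant, not real measurements.

```python
def fit_delta(operation_counts, run_times):
    """Least-squares fit of T = delta * ops (a line through the origin)."""
    num = sum(o * t for o, t in zip(operation_counts, run_times))
    den = sum(o * o for o in operation_counts)
    return num / den


def relative_errors(delta, operation_counts, run_times):
    """Relative gap between the model's prediction and each measured time."""
    return [abs(delta * o - t) / t for o, t in zip(operation_counts, run_times)]


def needs_refinement(errors, tolerance=0.10):
    """The first-order model is refined only if some prediction is off by more than tolerance."""
    return max(errors) > tolerance


# Synthetic example: matrix sizes, n**3 operations each, and times built from
# an invented cost of 2e-8 seconds per operation.
sizes = [50, 100, 150]
ops = [n ** 3 for n in sizes]
times = [2e-8 * o for o in ops]

delta = fit_delta(ops, times)
errors = relative_errors(delta, ops, times)
print(needs_refinement(errors))  # False: the first-order model fits this data
```

If real timings deviated from the straight line, for instance because of cache effects at larger sizes, `needs_refinement` would return `True` and an extra term would be added to the model.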

19 Theoretical Conclusions
Sequential world
- Separation hardware / program (3GL)
- With an abstract model for the architecture: Von Neumann
- Java: platform independence
- .NET: language independence
Parallel world
- Ultimate goal: match software and hardware
- No universal abstract model for parallel architectures!
- Conflict generality <> efficiency
- Performance is program- and hardware-dependent
- Efficient programs should be developed specifically

20 Practical Conclusions
Successful parallel processing is a complex issue
But not.
Thus:
- Is it worth it?
- Is it possible?
- Is it easy?
Effort ~ Benefit
