Performance models for master/slave parallel programs

Size: px

Start display at page:

Download "Performance models for master/slave parallel programs"

Myron Heath
5 years ago
Views:

1 Performance models for master/slave parallel programs PASM 2004 Lucas Baldo Leonardo Brenner Luiz G. Fernandes Paulo Fernandes Afonso Sales {lbaldo, lbrenner, gustavo, paulof, PUCRS - FACIN - PPGCC Porto Alegre, , Brazil HP Brazil - CNPq/Brazil PASM 2004: Performance models for master/slave parallel programs p.1/22

2 Motivation Foresee implementation behavior in early stages of the parallel program development Identification of performance indices Provide an easy-to-use (or at least feasible) stochastic modeling Any modeling technique could be used (PEPA, SPN, etc) Very dependable on programmer previous experience PASM 2004: Performance models for master/slave parallel programs p.2/22

3 Outline Stochastic Automata Networks (SAN) Case Study: Propagation Algorithm Proposed models Asynchronous Synchronous Performance Indices Conclusions PASM 2004: Performance models for master/slave parallel programs p.3/22

4 SAN (Concept) Proposed by B. Plateau Each subsystem is represented by a stochastic automaton Local events change the local state of only one automaton Synchronizing events change the local state of several automata simultaneously Interaction among automata: Synchronizing events Functional rates and/or functional probabilities PASM 2004: Performance models for master/slave parallel programs p.4/22

5 SAN (Example) A (1) A (2) 0 (1) e 1 e 2 1 (1) 0 (2) e 4 e 2 (π 1 ) e 2 (π 2 ) e 5 2 (2) 1 (2) Type Event Rate loc e 1 f syn e 2 µ loc e 3 σ loc e 4 δ loc e 5 τ f = [( st A (2) == 0 (2)) λ ] + [( st A (2) == 2 (2)) γ ] e 3 Reachability =! [( st A (1) == 1 (1)) && ( st A (2) == 1 (2))] PASM 2004: Performance models for master/slave parallel programs p.5/22

6 SAN (Markov) λ 0 (1) 0 (2) 1 (1) 0 (2) δ τ µπ 2 µπ 1 δ τ 0 (1) 2 (2) σ 0 (1) 1 (2) 1 (1) 2 (2) σ 1 (1) 1 (2) γ PASM 2004: Performance models for master/slave parallel programs p.6/22

7 Master/Slave Paradigm One master node generates and allocates work (tasks) to N slave nodes Each slave node receives tasks, computes them, and sends the result back Master node receives and analyses the computed results by slaves (summation) PASM 2004: Performance models for master/slave parallel programs p.7/22

8 Master/Slave Generic SAN Model Master T x c s r 1..r N Rx up IT x up begins the program Slave (i) s c p i r i execution ends the parallel application (returns to the initial situation) sends tasks to the slaves performs the summation finishes a task by the i th slave sends the result from i th slave back to the master r i (π) I P r T x up s p i r i (1 π) PASM 2004: Performance models for master/slave parallel programs p.8/22

9 Case Study Image interpolation application method to create smooth and realistic virtual views starts with two source images Propagation Algorithm input: seed pairs based on a region growing technique goal: match the largest possible region PASM 2004: Performance models for master/slave parallel programs p.9/22

10 Asynchronous Implementation One master, one buffer and N slaves The results sent by the slaves are stored on a repository (buffer) Master does not need to synchronize the reception of results PASM 2004: Performance models for master/slave parallel programs p.10/22

11 SAN Model of Asynchronous Implementation Master Buffer Slave (1) Slave (i) Slave (N) T x K I I I c(g2) c(g1) Rx s 1..s N r 1..r N r 1..r N K 1 c c P r up s 1 r 1 (1 π) up s i... P r r i (1 π)... P r up s N r N (1 π) up r 1..r N. c r 1 (π) p 1 r i (π) p i r N (π) p N IT x 0 T x T x T x g 1 = (nb [Slave (1)..Slave (N) ] I == 0) g 2 = (nb [Slave (1)..Slave (N) ] I > 0) Type Event Rate Type Event Rate Type Event Rate syn up λ syn µ syn c σ syn s i δ syn r i α loc p i γ PASM 2004: Performance models for master/slave parallel programs p.11/22

12 Parameterization for Asynchronous Implementation Input values: BL slaves buffer length (5.500 points) P S percentage of slices extension over its neighborhoods (0.5) N P number of points ( ) N S RN P N B number of slices (granularity) [ ] real number of points 2 (1 + P S) + (NS 2) (1 + 2P S) number of buffer for each slice RNP BL NS NP NS PASM 2004: Performance models for master/slave parallel programs p.12/22

13 Parameterization for Asynchronous Implementation Rates: s i δ = 1 T T up λ = 1 T T N T T send a new task N number of slaves µ = 1000 (insignificant time) p i γ = 1 CM c σ = 1 ER r i α = 1 T P CM compute final matches ER evaluate results T P send a result pack PASM 2004: Performance models for master/slave parallel programs p.13/22

14 Synchronous Implementation One master and N slaves The results are sending to the master Master waits all slaves have finished their tasks before it starts a new task distribution PASM 2004: Performance models for master/slave parallel programs p.14/22

15 SAN Model of Synchronous Implementation Master Slave (1) Slave (i) Slave (N) T x I I I c s up s up s up s r 1..r N Rx P r r 1 (1 π)... P r r i (1 π)... P r r N (1 π) up r 1 (π) p 1 r i (π) p i r N (π) p N IT x T x T x T x f c = (nb [Slave (1)..Slave (N) ] I == N) σ Type Event Rate Type Event Rate Type Event Rate syn up λ syn µ loc c f c syn s δ syn r i α loc p i γ PASM 2004: Performance models for master/slave parallel programs p.15/22

16 Parameterization for Synchronous Implementation Rates: s δ = 1 T T up λ = 1 T T N T T send a new task N number of slaves µ = 1000 (insignificant time) p i γ = 1 CM c σ = 1 ER r i α = 1 T P N CM compute final matches ER evaluate results T P send a result pack PASM 2004: Performance models for master/slave parallel programs p.16/22

17 Performance Indices Stationary analysis Local state probabilities: Slave in T x state master unavailable Master in Rx state master unavailable Master in T x state slaves unavailable PASM 2004: Performance models for master/slave parallel programs p.17/22

18 Synchronous vs. Asynchronous Synchronous Model Asynchronous Model 0.4 coarse grain fine grain 0.4 coarse grain fine grain Probability of slave in Tx state Probability of slave in Tx state number of slave nodes number of slave nodes N Coarse Fine Synchronous N Coarse Fine Asynchronous PASM 2004: Performance models for master/slave parallel programs p.18/22

19 Buffer Size Asynchronous Implementation with coarse grain Asynchronous Implementation with fine grain slaves 3 slaves 4 slaves 5 slaves 6 slaves 7 slaves slaves 3 slaves 4 slaves 5 slaves 6 slaves 7 slaves Probability of slave in Tx state Probability of slave in Tx state Buffer size Buffer size Buffer size 2 Slaves 3 Slaves 4 Slaves 5 Slaves 6 Slaves 7 Slaves Asynchronous (Coarse) Buffer size 2 Slaves 3 Slaves 4 Slaves 5 Slaves 6 Slaves 7 Slaves Asynchronous (Fine) PASM 2004: Performance models for master/slave parallel programs p.19/22

20 Granularity Asynchronous Model Asynchronous Model 0.98 coarse grain fine grain 0.14 coarse grain fine grain Probability of master in Rx state Probability of master in Tx state number of slave nodes number of slave nodes N Coarse Fine Receiving N Coarse Fine Transmitting PASM 2004: Performance models for master/slave parallel programs p.20/22

21 Conclusions We can: Choose the paradigm Choose the buffer size Choose the granularity Find possible bottlenecks before implementation We cannot (future work): Estimate execution time Transient analysis Provide an easy-to-use stochastic modeling Automatic modeling tool PASM 2004: Performance models for master/slave parallel programs p.21/22

Contact Performance Evaluation Group www.inf.pucrs.

22 Contact Performance Evaluation Group PASM 2004: Performance models for master/slave parallel programs p.22/22

Performance models for master/slave parallel programs 1

PASM 2004 Preliminary Version Performance models for master/slave parallel programs 1 Lucas Baldo 2 Leonardo Brenner 3 Luiz Gustavo Fernandes 4 Paulo Fernandes 5 Afonso Sales 6 Faculdade de Informática