Mapping pipeline skeletons onto heterogeneous platforms


Mapping pipeline skeletons onto heterogeneous platforms
Anne Benoit and Yves Robert
GRAAL team, LIP, École Normale Supérieure de Lyon
January 2007

Introduction and motivation
Mapping applications onto parallel platforms is a difficult challenge, and heterogeneous clusters or fully heterogeneous platforms make it even more difficult. A structured programming approach is easier to program (no deadlocks, no process starvation) and offers a range of well-known paradigms (pipeline, farm). Algorithmic skeletons help with the mapping.

Why restrict to pipelines?
Chains-on-chains partitioning problem:
- no communications
- identical processors
Extensions (done):
- with communications
- with heterogeneous processors/links
- goal: assess complexity, design heuristics
Extensions (current work):
- deal with DEALs
- deal with DAGs

Chains-on-chains
Load-balance contiguous tasks. With p = 4 processors?
Back to the Bokhari and Iqbal partitioning papers; see the survey by Pinar and Aykanat, JPDC 64, 8 (2004).
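The basic chains-on-chains problem (contiguous tasks, identical processors, no communications) can be solved exactly by a parametric search: bisect on the bottleneck value and check feasibility greedily. A minimal sketch with illustrative names, assuming integer task weights:

```python
def feasible(weights, p, period):
    # Greedy check: sweep left to right, opening a new block
    # whenever adding the next task would exceed the period.
    blocks, current = 1, 0
    for x in weights:
        if x > period:
            return False
        if current + x > period:
            blocks, current = blocks + 1, x
        else:
            current += x
    return blocks <= p

def chains_on_chains(weights, p):
    # Bisect on the bottleneck value (integer weights assumed);
    # the answer lies between max(weights) and sum(weights).
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(weights, p, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, `chains_on_chains([4, 1, 3, 2, 6, 1, 3], 4)` returns 6: the task of weight 6 forces the bottleneck, and a 4-block partition achieving it exists.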

Rule of the game
Map each pipeline stage on a single processor (no deals). Goal: minimize execution time.
Several mapping strategies for the pipeline application S_1, S_2, ..., S_k, ..., S_n: One-to-one Mapping, Interval Mapping, General Mapping.

Major contributions
Theory: formal approach to the problem; problem complexity; integer linear program for exact resolution.
Practice: heuristics for Interval Mapping on clusters; experiments to compare the heuristics and evaluate their absolute performance.

Outline
1 Framework
2 Complexity results
3 Heuristics
4 Experiments
5 Linear programming formulation
6 Conclusion

The application
n stages S_k, 1 ≤ k ≤ n. Stage S_k:
- receives an input of size δ_{k-1} from S_{k-1}
- performs w_k computations
- outputs data of size δ_k to S_{k+1}
S_0 and S_{n+1}: virtual stages representing the outside world.

The platform
p processors P_u, 1 ≤ u ≤ p, fully interconnected.
- s_u: speed of processor P_u
- bidirectional link link_{u,v}: P_u → P_v, of bandwidth b_{u,v}
- one-port model: each processor can either send, receive or compute at any time-step
- P_in: input data; P_out: output data

Different platforms
- Fully Homogeneous: identical processors (s_u = s) and links (b_{u,v} = b): typical parallel machines
- Communication Homogeneous: different-speed processors (s_u ≠ s_v), identical links (b_{u,v} = b): networks of workstations, clusters
- Fully Heterogeneous: different-speed processors and links, s_u ≠ s_v and b_{u,v} ≠ b_{u',v'}: hierarchical platforms, grids

Mapping problem: One-to-one Mapping
Requires n ≤ p: map each stage S_k onto a distinct processor P_alloc(k).
Period of P_alloc(k): minimum delay between the processing of two consecutive tasks.
With t = alloc(k-1), u = alloc(k), v = alloc(k+1), the cycle-time of P_u is
cycle_u = δ_{k-1} / b_{t,u} + w_k / s_u + δ_k / b_{u,v}
Optimization problem: find the allocation function alloc : [1, n] → [1, p] which minimizes
T_period = max_{1 ≤ k ≤ n} cycle_alloc(k)
(with alloc(0) = in and alloc(n+1) = out).
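The cycle-time formula is straightforward to evaluate for a given allocation. A small sketch, under assumed conventions (1-based stage arrays with a dummy entry at index 0; alloc given as a dict including the virtual 'in'/'out' endpoints; these names are illustrative, not from the paper):

```python
def one_to_one_period(delta, w, s, b, alloc):
    # Period of a one-to-one mapping on a fully heterogeneous platform.
    # delta[k]: data size between stage k and k+1 (delta[0] = input,
    # delta[n] = output); w[k]: work of stage k (w[0] unused);
    # s[u]: speed of processor u; b[(t, u)]: bandwidth of link t -> u;
    # alloc[k]: processor of stage k, alloc[0] = 'in', alloc[n+1] = 'out'.
    n = len(w) - 1
    cycles = []
    for k in range(1, n + 1):
        t, u, v = alloc[k - 1], alloc[k], alloc[k + 1]
        cycles.append(delta[k - 1] / b[(t, u)]
                      + w[k] / s[u]
                      + delta[k] / b[(u, v)])
    return max(cycles)
```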

Mapping problem: Interval Mapping
Several consecutive stages onto the same processor: increases the computational load but reduces communications; mandatory when p < n.
Partition [1..n] into m intervals I_j = [d_j, e_j] (with d_j ≤ e_j for 1 ≤ j ≤ m, d_1 = 1, d_{j+1} = e_j + 1 for 1 ≤ j ≤ m-1, and e_m = n). Interval I_j is mapped onto processor P_alloc(j), and
T_period = max_{1 ≤ j ≤ m} { δ_{d_j - 1} / b_{alloc(j-1),alloc(j)} + (Σ_{i=d_j}^{e_j} w_i) / s_alloc(j) + δ_{e_j} / b_{alloc(j),alloc(j+1)} }
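On a Communication Homogeneous platform (a single link bandwidth b), the period formula above reduces to a simple evaluation. A sketch with illustrative names, where each interval is a (d, e, speed) triple covering stages 1..n in order:

```python
def interval_period(delta, w, b, intervals):
    # T_period of an interval mapping with homogeneous links:
    # input of the interval, its total work on its processor,
    # and its output; delta[k] is the data size out of stage k
    # (delta[0] = input), w is 1-based with a dummy w[0].
    return max(delta[d - 1] / b + sum(w[d:e + 1]) / s + delta[e] / b
               for d, e, s in intervals)
```

A One-to-one Mapping is the special case where every interval has length 1.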

Mapping problem: General Mapping
Does not suit the one-port model very well: on Communication Homogeneous platforms, a General Mapping can always be replaced by an Interval Mapping that is as good. It can be the optimal mapping on Fully Heterogeneous platforms in some particular cases. More general, but requires threads, and may lead to idle times and races with the one-port model.

Complexity results

                     Fully Hom.    Comm. Hom.
One-to-one Mapping   polynomial    polynomial
Interval Mapping     polynomial    NP-complete
General Mapping      same complexity as Interval Mapping

- Binary search polynomial algorithm for One-to-one Mapping
- Dynamic programming algorithm for Interval Mapping on Homogeneous platforms
- General Mapping: same complexity as Interval Mapping
- All problem instances are NP-complete on Fully Heterogeneous platforms

Back to chains-on-chains
- Chains-on-chains + homogeneous communications: polynomial
- Chains-on-chains + different-speed processors: NP-complete

One-to-one / Comm. Hom.: binary search algorithm
Greedy assignment procedure for a prescribed period T_period:
- Work with the fastest n processors, numbered P_1 to P_n, where s_1 ≤ s_2 ≤ ... ≤ s_n
- Mark all stages S_1 to S_n as free
- For u = 1 to n: pick any free stage S_k s.t. δ_{k-1}/b + w_k/s_u + δ_k/b ≤ T_period, assign S_k to P_u, and mark S_k as already assigned; if no stage is found, return failure
Proof: exchange argument.
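The greedy check above, wrapped in a binary search over the finitely many candidate periods (the n·p possible cycle-times), yields the optimal One-to-one Mapping. A sketch under the same assumed conventions as before (1-based stage arrays with a dummy entry at index 0; illustrative names):

```python
def fits(delta, w, b, k, s, period):
    # Cycle-time of stage k on a processor of speed s (hom. links b).
    return delta[k - 1] / b + w[k] / s + delta[k] / b <= period

def feasible(delta, w, speeds, b, period):
    # Greedy check from the slide: the fastest n processors, taken
    # slowest first, each grab any still-free stage that fits.
    n = len(w) - 1
    if len(speeds) < n:
        return False
    free = set(range(1, n + 1))
    for s in sorted(speeds)[-n:]:
        fit = [k for k in sorted(free) if fits(delta, w, b, k, s, period)]
        if not fit:
            return False
        free.remove(fit[0])
    return True

def optimal_one_to_one(delta, w, speeds, b):
    # The optimal period is one of the n*p possible cycle-times,
    # so a binary search over that sorted set suffices.
    n = len(w) - 1
    cands = sorted({delta[k - 1] / b + w[k] / s + delta[k] / b
                    for k in range(1, n + 1) for s in speeds})
    lo, hi, best = 0, len(cands) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if feasible(delta, w, speeds, b, cands[mid]):
            best, hi = cands[mid], mid - 1
        else:
            lo = mid + 1
    return best
```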

Interval, Fully Hom.: dynamic programming algorithm
c(i, j, k): optimal period to map stages S_i to S_j using exactly k processors. Goal: min_{1 ≤ k ≤ p} c(1, n, k).
c(i, j, k) = min_{q + r = k, 1 ≤ q, r ≤ k-1} min_{i ≤ l ≤ j-1} max( c(i, l, q), c(l+1, j, r) )
c(i, j, 1) = δ_{i-1}/b + (Σ_{k=i}^{j} w_k)/s + δ_j/b
c(i, j, k) = +∞ if k > j - i + 1
Proof: search over all possible partitionings into two subintervals, using every possible number of processors for each interval. Complexity: O(n³ p²).
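The recurrence transcribes directly into a memoized function (illustrative names; 1-based w with a dummy w[0]); the loops over i, j, l and the processor counts give the O(n³p²) bound:

```python
from functools import lru_cache

def interval_dp(delta, w, s, b, p):
    # DP from the slide, Fully Homogeneous case (speed s, bandwidth b);
    # returns the optimal interval-mapping period for stages 1..n.
    n = len(w) - 1

    @lru_cache(maxsize=None)
    def c(i, j, k):
        if k > j - i + 1:              # more processors than stages
            return float("inf")
        if k == 1:                     # whole interval on one processor
            return delta[i - 1] / b + sum(w[i:j + 1]) / s + delta[j] / b
        # split into two subintervals using q and k - q processors
        return min(max(c(i, l, q), c(l + 1, j, k - q))
                   for q in range(1, k)
                   for l in range(i, j))

    return min(c(1, n, k) for k in range(1, p + 1))
```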

Heterogeneous platforms: NP-complete
Reduction from MINIMUM METRIC BOTTLENECK WANDERING SALESPERSON PROBLEM.
[Figure: reduction gadget, a platform built from P_in, processors P_1..P_4 and P_{u,v}, and P_out, with link costs derived from the distances d(c_i, c_j).]

Greedy heuristics (1/2)
Target clusters: Communication Homogeneous platforms and Interval Mapping.
L = ⌈n/p⌉ consecutive stages per processor: the set of intervals is fixed, and ⌈n/L⌉ processors are used. One-to-one Mapping when n ≤ p (L = 1).
This rule is applied in all greedy heuristics except random interval length.

Greedy heuristics (2/2)
- H1a-GR (random): random choice of a free processor for each interval
- H1b-GRIL (random interval length): idem with random interval sizes: average length L, 1 ≤ length ≤ 2L-1
- H2-GSW (biggest w): place the interval with the most computations on the fastest processor
- H3-GSD (biggest δ_in + δ_out): intervals sorted by communications (δ_in + δ_out); in: first stage of the interval, out-1: last one
- H4-GP (biggest period on fastest processor): balance computation and communication: processors sorted by decreasing speed s_u; for the current processor u, choose the interval with the biggest period (δ_in + δ_out)/b + Σ_{i ∈ Interval} w_i / s_u
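As an illustration, a sketch in the spirit of H2-GSW under the fixed-interval rule L = ⌈n/p⌉ (names and data layout are assumptions, not the authors' code): the intervals are fixed, then matched to processors by decreasing work against decreasing speed.

```python
import math

def greedy_sum_w(w, speeds):
    # H2-GSW-style sketch: cut stages 1..n into consecutive intervals
    # of length L = ceil(n/p); the interval with the most computation
    # goes to the fastest processor, and so on.  w is 1-based with a
    # dummy w[0]; returns {(d, e): processor index}.
    n, p = len(w) - 1, len(speeds)
    L = math.ceil(n / p)
    intervals = [(d, min(d + L - 1, n)) for d in range(1, n + 1, L)]
    by_work = sorted(intervals, key=lambda iv: -sum(w[iv[0]:iv[1] + 1]))
    by_speed = sorted(range(p), key=lambda u: -speeds[u])
    return {iv: by_speed[i] for i, iv in enumerate(by_work)}
```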

Sophisticated heuristics
- H5-BS121 (binary search for One-to-one Mapping): the optimal algorithm for One-to-one Mapping; when p < n, the application is cut into fixed intervals of length L.
- H6-SPL (splitting intervals): processors sorted by decreasing speed, all stages on the first processor. At each step, select the used processor j with the largest period and split its interval, giving a fraction of its stages to a free processor j', so as to minimize max(period(j), period(j')); split only if the maximum period improves.
- H7a-BSL and H7b-BSC (binary search, longest/closest): binary search on the period P: starting from stage s = 1, build intervals (s, s') fitting on processors. For each u and each s' ≥ s, compute period(s..s', u) and check whether it is smaller than P. H7a maximizes s'; H7b chooses the closest period.
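A simplified sketch of the splitting idea behind H6-SPL for the Communication Homogeneous case (illustrative names; as a simplification of the heuristic described above, it only ever splits the worst interval and only tries the next fastest free processor):

```python
def period_of(delta, w, b, d, e, s):
    # Period of interval [d, e] on a processor of speed s (hom. links).
    return delta[d - 1] / b + sum(w[d:e + 1]) / s + delta[e] / b

def splitting(delta, w, speeds, b):
    # All stages start on the fastest processor; repeatedly split the
    # interval with the largest period, giving its tail to the next
    # fastest unused processor, as long as the maximum period improves.
    n = len(w) - 1
    order = sorted(speeds, reverse=True)
    assign = [[1, n, order[0]]]          # mutable [d, e, speed] triples
    unused = order[1:]
    while unused:
        worst = max(assign, key=lambda a: period_of(delta, w, b, *a))
        d, e, s = worst
        if d == e:                       # single stage: cannot split
            break
        s2 = unused[0]
        best = min((max(period_of(delta, w, b, d, l, s),
                        period_of(delta, w, b, l + 1, e, s2)), l)
                   for l in range(d, e))
        if best[0] >= period_of(delta, w, b, d, e, s):
            break                        # no improvement: stop
        worst[1] = best[1]
        assign.append([best[1] + 1, e, s2])
        unused.pop(0)
    return max(period_of(delta, w, b, *a) for a in assign)
```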

Plan of experiments
- Assess the performance of the polynomial heuristics
- Random applications, n = 1 to 50 stages
- Random platforms, p = 10 and p = 100 processors
- b = 10 (comm. hom.), processor speed between 1 and 20
- Relevant parameters: the ratios δ/b and w/s
- Average over 100 similar random application/platform pairs

Experiment 1 - balanced comm/comp, hom comm
δ_i = 10, computation time between 1 and 20; p = 10 and p = 100 processors.
[Plots: maximum period vs. number of stages, for p = 10 and p = 100, comparing H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest.]

Experiment 2 - balanced comm/comp, het comm
Communication time between 1 and 100, computation time between 1 and 20.
[Plots: maximum period vs. number of stages for the nine heuristics, p = 10 and p = 100.]

Experiment 3 - large computations
Communication time between 1 and 20, computation time between 10 and 1000.
[Plots: maximum period vs. number of stages for the nine heuristics, p = 10 and p = 100.]

Experiment 4 - small computations
Communication time between 1 and 20, computation time between 0.01 and 10.
[Plots: maximum period vs. number of stages for the nine heuristics, p = 10 and p = 100.]

Summary of experiments
Much more efficient than random mappings. Three dominant heuristics for different cases:
- Insignificant communications (hom. or small) and many processors: H5-BS121 (One-to-one Mapping)
- Insignificant communications (hom. or small) and few processors: H7b-BSC (clever choice of where to split)
- Important communications (het. or big): H6-SPL (splitting choice relevant for any number of processors)

Integer linear programming
Integer LP to solve Interval Mapping on Fully Heterogeneous platforms. Many integer variables, so there is no efficient algorithm to solve it: the approach is limited to small problem instances, for which it yields the absolute performance of the heuristics.

Linear program: variables
- x_{k,u}: 1 if S_k is on P_u, 0 otherwise
- y_{k,u}: 1 if S_k and S_{k+1} are both on P_u, 0 otherwise
- z_{k,u,v}: 1 if S_k is on P_u and S_{k+1} is on P_v, 0 otherwise
- first_u and last_u: integers denoting the first and last stage assigned to P_u (to enforce interval constraints)
- T_period: period of the pipeline
Objective function: minimize T_period.

Linear program: constraints
- ∀k ∈ [0..n+1]: Σ_u x_{k,u} = 1
- ∀k ∈ [0..n]: Σ_{u≠v} z_{k,u,v} + Σ_u y_{k,u} = 1
- ∀k ∈ [0..n], ∀u, v ∈ [1..p] ∪ {in, out}, u ≠ v: x_{k,u} + x_{k+1,v} ≤ 1 + z_{k,u,v}
- ∀k ∈ [0..n], ∀u ∈ [1..p] ∪ {in, out}: x_{k,u} + x_{k+1,u} ≤ 1 + y_{k,u}
- ∀k ∈ [1..n], ∀u ∈ [1..p]: first_u ≤ k · x_{k,u} + n · (1 − x_{k,u})
- ∀k ∈ [1..n], ∀u ∈ [1..p]: last_u ≥ k · x_{k,u}
- ∀k ∈ [1..n−1], ∀u, v ∈ [1..p], u ≠ v: last_u ≤ k · z_{k,u,v} + n · (1 − z_{k,u,v})
- ∀k ∈ [1..n−1], ∀u, v ∈ [1..p], u ≠ v: first_v ≥ (k+1) · z_{k,u,v}
- ∀u ∈ [1..p]: Σ_{k=1}^{n} ( Σ_{t≠u} δ_{k−1} z_{k−1,t,u} / b_{t,u} + w_k x_{k,u} / s_u + Σ_{v≠u} δ_k z_{k,u,v} / b_{u,v} ) ≤ T_period

Linear program: experiments
O(np²) variables and as many constraints: experiments only on small problem instances, averaged over 10 instances of each application, solved with GLPK. Largest experiment: p = 8, n = 4, a 14-hour computation time. Parameters similar to Experiment 1: homogeneous communications and balanced comm/comp.
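For instances as small as those the LP can handle, the exact Interval Mapping optimum on a Communication Homogeneous platform can also be obtained by brute force, which is another way to measure the heuristics' absolute performance. A sketch with illustrative names (1-based w with a dummy w[0]):

```python
from itertools import combinations, permutations

def exact_interval_mapping(delta, w, speeds, b):
    # Exhaustive optimum for Interval Mapping with homogeneous links:
    # enumerate every partition of stages 1..n into m contiguous
    # intervals and every choice of m distinct processors.
    # Only viable for tiny n and p.
    n, p = len(w) - 1, len(speeds)
    best = float("inf")
    for m in range(1, min(n, p) + 1):
        for cuts in combinations(range(1, n), m - 1):
            bounds = [0, *cuts, n]   # interval j = stages bounds[j]+1 .. bounds[j+1]
            for chosen in permutations(speeds, m):
                period = max(delta[bounds[j]] / b
                             + sum(w[bounds[j] + 1:bounds[j + 1] + 1]) / chosen[j]
                             + delta[bounds[j + 1]] / b
                             for j in range(m))
                best = min(best, period)
    return best
```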

Linear program: experiment p = 8
[Plot: maximum period vs. number of stages (p = 8) for the Linear Program and heuristics H3-GreedySumDinDout, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest.]

Linear program: experiment p = 8
[Table: period obtained by the LP vs. H5-BS121 and H7b-BSC for each value of n; numerical values not recovered.]

Linear program: experiment p = 4
Homogeneous communications (Experiment 1).
[Plot: maximum period vs. number of stages (p = 4) for the Linear Program and heuristics H3, H5, H6, H7a, H7b.]
H7b very close to the optimal (< 3% error).

Linear program: experiment p = 4
Heterogeneous communications (Experiment 2).
[Plot: maximum period vs. number of stages (p = 4) for the Linear Program and heuristics H3, H5, H6, H7a, H7b.]
H6 very close to the optimal (< 0.05% error).

Related work
- Scheduling task graphs on heterogeneous platforms: acyclic task graphs scheduled on different-speed processors [Topcuoglu et al.]; communication contention under the 1-port model [Beaumont et al.]
- Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]; fault-tolerance for embedded systems [Zhu et al.]
- Mapping pipelined computations onto clusters and grids: DAG [Taura et al.], DataCutter [Saltz et al.]
- Mapping skeletons onto clusters and grids: use of stochastic process algebra [Benoit et al.]

86 Conclusion
Theoretical side: complexity for different mapping strategies and different platform types
Practical side:
Optimal polynomial algorithm for One-to-one Mapping
Design of several heuristics for Interval Mapping on Communication-Homogeneous platforms
Comparison of their performance
Linear program to assess the absolute performance of the heuristics, which turns out to be quite good
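The optimal One-to-one Mapping algorithm in the paper handles communication costs via a binary search; when communications are negligible, the optimal pairing reduces to a simple greedy rule, pairing the largest stage with the fastest processor. A hedged sketch under that assumption (no communication costs, at most one stage per processor, n ≤ p):

```python
def one_to_one_period(work, speeds):
    """Optimal one-to-one mapping when communications are negligible:
    pair the largest stage with the fastest processor (requires n <= p)."""
    if len(work) > len(speeds):
        raise ValueError("one-to-one mapping needs a processor per stage")
    # Sort both descending; the slowest stage/processor pair sets the period.
    pairs = zip(sorted(work, reverse=True), sorted(speeds, reverse=True))
    return max(w / s for w, s in pairs)

# 4 stages on 4 processors of speeds 1, 2, 2 and 4
print(one_to_one_period([4, 3, 2, 1], [1, 2, 2, 4]))  # -> 1.5
```

Optimality of the greedy pairing follows from a standard exchange argument: swapping any two stages between their processors can only increase the larger of the two stage/speed ratios.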

87 Future work
Short term:
Heuristics for Fully Heterogeneous platforms
Extension to DAG-trees (a DAG which is a tree when un-oriented)
Extension to stage replication
LP with replication and DAG-trees
Longer term:
Real experiments on heterogeneous clusters, using an already-implemented skeleton library and MPI
Comparison of effective performance against theoretical performance


More information

Greedy Algorithms. T. M. Murali. January 28, Interval Scheduling Interval Partitioning Minimising Lateness

Greedy Algorithms. T. M. Murali. January 28, Interval Scheduling Interval Partitioning Minimising Lateness Greedy Algorithms T. M. Murali January 28, 2008 Algorithm Design Start discussion of dierent ways of designing algorithms. Greedy algorithms, divide and conquer, dynamic programming. Discuss principles

More information

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Tony Maciejewski, Kyle Tarplee, Ryan Friese, and Howard Jay Siegel Department of Electrical and Computer Engineering Colorado

More information

CSC 373: Algorithm Design and Analysis Lecture 3

CSC 373: Algorithm Design and Analysis Lecture 3 CSC 373: Algorithm Design and Analysis Lecture 3 Allan Borodin January 11, 2013 1 / 13 Lecture 3: Outline Write bigger and get better markers A little more on charging arguments Continue examples of greedy

More information

1 The Traveling Salesperson Problem (TSP)

1 The Traveling Salesperson Problem (TSP) CS 598CSC: Approximation Algorithms Lecture date: January 23, 2009 Instructor: Chandra Chekuri Scribe: Sungjin Im In the previous lecture, we had a quick overview of several basic aspects of approximation

More information